Nick Pentreath MLnick

  • Automattic
  • Cape Town, South Africa
  • X @MLnick
@MLnick
MLnick / onnx.pb
Last active October 17, 2019 11:39
graph {
node {
input: "X"
input: "W"
output: "Y"
name: "matmult"
op_type: "Mul"
}
input {
name: "X"
1. Error: gapply() and gapplyCollect() on a DataFrame (@test_sparkSQL.R#2569) --
org.apache.spark.SparkException: Job aborted due to stage failure: Task 114 in stage 957.0 failed 1 times, most recent failure: Lost task 114.0 in stage 957.0 (TID 13209, localhost, executor driver): org.apache.spark.SparkException: R computation failed with
[1] 1
[1] 3
[1] 2
[1][1] 1 2
[1] 3
[1] 2
[1] 2
MLnick / Ensemble.scala
Created August 30, 2017 07:28
Ensemble pipeline component in Spark
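import org.apache.spark.ml.Model
import org.apache.spark.ml.param.ParamMap
import org.apache.spark.ml.regression.RegressionModel
import org.apache.spark.ml.util.Identifiable
import org.apache.spark.sql.{DataFrame, Dataset}
// Model is F-bounded (Model[M <: Model[M]]), so its type parameter is the ensemble class itself.
class Ensemble(val uid: String, models: Seq[RegressionModel[_, _]]) extends Model[Ensemble] {
import org.apache.spark.sql.functions._
def this(models: Seq[RegressionModel[_, _]]) = this(Identifiable.randomUID("ensemble"), models)
override def copy(extra: ParamMap): Ensemble = ???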
override def transform(
dataset: Dataset[_]): DataFrame = {
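The preview stops at the start of `transform`, so how `Ensemble` combines its member models is not shown. A minimal sketch, assuming the simplest rule of averaging the members' predictions (a common choice, not confirmed by the gist). The class `AveragingEnsemble` is hypothetical, and the "models" are plain Python callables rather than Spark `RegressionModel`s.

```python
from statistics import fmean
from typing import Callable, List, Sequence

# Hypothetical stand-in for a fitted regression model: feature vector -> prediction.
Regressor = Callable[[Sequence[float]], float]


class AveragingEnsemble:
    """Combine regressors by averaging their predictions (an assumed combination rule)."""

    def __init__(self, models: Sequence[Regressor]):
        if not models:
            raise ValueError("an ensemble needs at least one model")
        self.models = list(models)

    def predict(self, features: Sequence[float]) -> float:
        # Every member scores the same row; the ensemble output is the unweighted mean.
        return fmean(model(features) for model in self.models)

    def transform(self, rows: Sequence[Sequence[float]]) -> List[float]:
        # Row-wise scoring, loosely mirroring a transform that appends a prediction column.
        return [self.predict(row) for row in rows]


ensemble = AveragingEnsemble([
    lambda x: 2.0 * x[0],  # toy model A
    lambda x: x[0] + 1.0,  # toy model B
])
print(ensemble.transform([[1.0], [3.0]]))  # [2.0, 5.0]
```

Weighted averaging or stacking would slot into `predict` without changing the interface.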
Failed -------------------------------------------------------------------------
1. Error: gapply() and gapplyCollect() on a DataFrame (@test_sparkSQL.R#2853) --
org.apache.spark.SparkException: Job aborted due to stage failure: Task 40 in stage 985.0 failed 1 times, most recent failure: Lost task 40.0 in stage 985.0 (TID 13694, localhost, executor driver): org.apache.spark.SparkException: R computation failed with
[1] 2
[1] 1
[1] 3
[1] 2
[1] 1
[1] 3
[1] 2[1]
MLnick / dask-ps.py
Created May 31, 2017 08:57
Dask Parameter Server - Initial WIP
# ==== dask-ps
import dask
import dask.array as da
from dask import delayed
from dask_glm import families
from dask_glm.algorithms import lbfgs
from distributed import LocalCluster, Client, worker_client
import numpy as np
import time
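The preview shows only the imports of this work-in-progress, so its actual design is not visible. As a rough illustration of the general parameter-server pattern, here is a pure-Python sketch assuming synchronous updates and a least-squares objective: each worker pulls the current weights, computes a gradient on its own data shard, and the server averages the gradients and takes one step. It does not use dask or the imported `dask_glm` `lbfgs`/`families`; workers are simulated as function calls, and `ParameterServer`, `worker_gradient` and `train` are hypothetical names.

```python
from typing import List, Sequence, Tuple

Shard = List[Tuple[List[float], float]]  # (features, target) pairs held by one worker


class ParameterServer:
    """Holds the shared weights and applies averaged gradient updates."""

    def __init__(self, dim: int, learning_rate: float):
        self.weights = [0.0] * dim
        self.learning_rate = learning_rate

    def pull(self) -> List[float]:
        # Workers get a copy so they cannot mutate server state directly.
        return list(self.weights)

    def push(self, gradients: Sequence[Sequence[float]]) -> None:
        # Synchronous update: average the workers' gradients, then take one gradient step.
        n = len(gradients)
        for j in range(len(self.weights)):
            avg = sum(g[j] for g in gradients) / n
            self.weights[j] -= self.learning_rate * avg


def worker_gradient(weights: Sequence[float], shard: Shard) -> List[float]:
    # Gradient of the mean of 0.5 * (w.x - y)^2 over this worker's shard.
    grad = [0.0] * len(weights)
    for x, y in shard:
        residual = sum(w * xi for w, xi in zip(weights, x)) - y
        for j, xi in enumerate(x):
            grad[j] += residual * xi / len(shard)
    return grad


def train(shards: Sequence[Shard], dim: int, steps: int = 500, lr: float = 0.1) -> List[float]:
    server = ParameterServer(dim, lr)
    for _ in range(steps):
        weights = server.pull()
        server.push([worker_gradient(weights, shard) for shard in shards])
    return server.weights


# Two workers holding data generated from y = 2*x0 - x1.
shards = [
    [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0)],
    [([1.0, 1.0], 1.0), ([2.0, 1.0], 3.0)],
]
print([round(w, 3) for w in train(shards, dim=2)])  # [2.0, -1.0]
```

Because both shards have the same size, averaging the per-shard mean gradients equals the full-batch gradient; an asynchronous server would instead apply each worker's push as it arrives.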
distributed.protocol.core - CRITICAL - Failed to Serialize
Traceback (most recent call last):
File "/Users/nick/miniconda2/envs/dask_glm/lib/python3.5/site-packages/distributed/protocol/core.py", line 33, in dumps
small_header, small_payload = dumps_msgpack(msg)
File "/Users/nick/miniconda2/envs/dask_glm/lib/python3.5/site-packages/distributed/protocol/core.py", line 134, in dumps_msgpack
payload = msgpack.dumps(msg, use_bin_type=True)
File "/Users/nick/miniconda2/envs/dask_glm/lib/python3.5/site-packages/msgpack/__init__.py", line 47, in packb
return Packer(**kwargs).pack(o)
File "msgpack/_packer.pyx", line 231, in msgpack._packer.Packer.pack (msgpack/_packer.cpp:3661)
File "msgpack/_packer.pyx", line 233, in msgpack._packer.Packer.pack (msgpack/_packer.cpp:3503)
MLnick / SQLTransformerWithJoin.scala
Created August 18, 2016 07:55
Using SQLTransformer to join DataFrames in ML Pipeline
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 2.0.0
/_/
Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_77)
Type in expressions to have them evaluated.
Type :help for more information.
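Spark's `SQLTransformer` runs its statement after substituting the `__THIS__` placeholder with a temporary view of the input dataset, so any other table the statement joins against must already be registered by name in the same session. The sketch below mimics that mechanism with the standard-library `sqlite3` module purely for illustration; it is not Spark's API, and `sql_transform` plus the `users` table are hypothetical.

```python
import sqlite3
import uuid
from typing import List, Sequence

THIS = "__THIS__"  # placeholder token, as used by Spark's SQLTransformer


def sql_transform(conn: sqlite3.Connection, statement: str,
                  columns: Sequence[str], rows: Sequence[tuple]) -> List[tuple]:
    """Register rows under a random table name, substitute it for __THIS__, run the statement."""
    table = "t_" + uuid.uuid4().hex
    conn.execute(f"CREATE TEMP TABLE {table} ({', '.join(columns)})")
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
    try:
        return conn.execute(statement.replace(THIS, table)).fetchall()
    finally:
        # Drop the temporary table, as Spark drops its temporary view after use.
        conn.execute(f"DROP TABLE {table}")


conn = sqlite3.connect(":memory:")
# The second "DataFrame" is registered up front so the statement can refer to it by name.
conn.execute("CREATE TABLE users (user_id INTEGER, country TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "ZA"), (2, "US")])

events = [(1, "click"), (2, "view"), (1, "buy")]
result = sql_transform(
    conn,
    "SELECT e.user_id, e.action, u.country FROM __THIS__ e "
    "JOIN users u ON e.user_id = u.user_id ORDER BY e.user_id, e.action",
    columns=["user_id", "action"],
    rows=events,
)
print(result)  # [(1, 'buy', 'ZA'), (1, 'click', 'ZA'), (2, 'view', 'US')]
```

The key point carried over from Spark is that only the pipeline's input is anonymous (`__THIS__`); the joined table is referenced by its session-wide name.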
MLnick / HyperLogLogStoreUDAF.scala
Last active March 16, 2022 05:31
Experimenting with Spark SQL UDAF - HyperLogLog UDAF for distinct counts, that stores the actual HLL for each row to allow further aggregation
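import org.apache.spark.sql.Row
import org.apache.spark.sql.expressions.{MutableAggregationBuffer, UserDefinedAggregateFunction}
import org.apache.spark.sql.types._
class HyperLogLogStoreUDAF extends UserDefinedAggregateFunction {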
override def inputSchema = new StructType()
.add("stringInput", BinaryType)
override def update(buffer: MutableAggregationBuffer, input: Row) = {
// This input Row has a single column holding the input value as a String encoded to bytes (or other binary data).
// We only update the buffer when the input value is not null.
if (!input.isNullAt(0)) {
if (buffer.isNullAt(0)) {
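Storing the HyperLogLog sketch itself, rather than only its count, is what allows "further aggregation": two sketches merge by taking the register-wise maximum, so, for example, stored daily sketches can be combined into a weekly distinct count without rescanning raw data. A minimal pure-Python sketch of that idea, assuming a 64-bit BLAKE2b hash, 2^12 registers and the standard estimator with the small-range (linear counting) correction; `HyperLogLog` and `sketch` here are hypothetical illustrations, not the gist's implementation.

```python
import hashlib
import math
from typing import Iterable


class HyperLogLog:
    """Minimal HyperLogLog: 2**p registers, each holding the max rank seen."""

    def __init__(self, p: int = 12):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, value: bytes) -> None:
        # 64-bit hash: the top p bits pick a register, the remaining bits give the rank.
        h = int.from_bytes(hashlib.blake2b(value, digest_size=8).digest(), "big")
        index = h >> (64 - self.p)
        rest = h & ((1 << (64 - self.p)) - 1)
        # Rank = 1-based position of the leftmost 1-bit within the remaining (64 - p) bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[index] = max(self.registers[index], rank)

    def merge(self, other: "HyperLogLog") -> "HyperLogLog":
        # Sketch of the union = register-wise max; this makes stored sketches re-aggregatable.
        if other.p != self.p:
            raise ValueError("can only merge sketches with the same precision")
        merged = HyperLogLog(self.p)
        merged.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return merged

    def count(self) -> float:
        alpha = 0.7213 / (1 + 1.079 / self.m)
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction (linear counting).
            return self.m * math.log(self.m / zeros)
        return raw


def sketch(values: Iterable[str], p: int = 12) -> HyperLogLog:
    hll = HyperLogLog(p)
    for v in values:
        hll.add(v.encode("utf-8"))
    return hll


day1 = sketch(f"user-{i}" for i in range(0, 6000))
day2 = sketch(f"user-{i}" for i in range(4000, 10000))
both_days = day1.merge(day2)  # distinct users over both days, without the raw data
print(round(day1.count()), round(both_days.count()))
```

With p = 12 the relative standard error is about 1.04/√4096 ≈ 1.6%, and the merged sketch is identical to one built directly over the union of both days' values.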
MLnick / call_graphflow_shortcode_in_snippet.php
Last active August 29, 2015 14:16
Custom Graphflow Shortcode using Code Snippets Plugin
add_shortcode('graphflow_shortcode_demo', 'graphflow_shortcode_demo_display');
function graphflow_shortcode_demo_display($attr) {
ob_start();
if ( isset( $_REQUEST['cat'] ) ) {
$cat_id = absint( $_REQUEST['cat'] ); // sanitize the request parameter: category IDs are non-negative integers
$term = get_term_by( 'id', $cat_id, 'product_cat', ARRAY_A );
$cat_name = $term['name'];
echo do_shortcode( '[graphflow_recommendations title="Recommended for you in ' . $cat_name . '" columns=4 per_page=4 product_cat=' . $cat_id . ']' );
} else {
echo do_shortcode( '[graphflow_recommendations columns=4 per_page=4]' );
MLnick / logstash.txt
Created October 16, 2014 10:07
Logstash crash
Failed to flush outgoing items {:outgoing_count=>1, :exception=>#<SocketError: initialize: java.net.SocketException: Too many open files>, :backtrace=>["org/jruby/ext/socket/RubySocket.java:190:in `initialize'", "org/jruby/RubyIO.java:852:in `new'", "/home/ec2-user/logstash-1.4.2/vendor/bundle/jruby/1.9/gems/ftw-0.0.39/lib/ftw/connection.rb:146:in `connect'", "org/jruby/RubyArray.java:1613:in `each'", "/home/ec2-user/logstash-1.4.2/vendor/bundle/jruby/1.9/gems/ftw-0.0.39/lib/ftw/connection.rb:139:in `connect'", "/home/ec2-user/logstash-1.4.2/vendor/bundle/jruby/1.9/gems/ftw-0.0.39/lib/ftw/agent.rb:406:in `connect'", "org/jruby/RubyProc.java:271:in `call'", "/home/ec2-user/logstash-1.4.2/vendor/bundle/jruby/1.9/gems/ftw-0.0.39/lib/ftw/pool.rb:48:in `fetch'", "/home/ec2-user/logstash-1.4.2/vendor/bundle/jruby/1.9/gems/ftw-0.0.39/lib/ftw/agent.rb:403:in `connect'", "/home/ec2-user/logstash-1.4.2/vendor/bundle/jruby/1.9/gems/ftw-0.0.39/lib/ftw/agent.rb:319:in `execute'", "/home/ec2-user/logstash-1.4.2/vendor/bund