Jeremy Freeman (freeman-lab): public gists
@freeman-lab
freeman-lab / lightning-for-loop.ipynb
Created September 5, 2015 04:25
Example using a for loop with Lightning in the notebook
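The notebook preview fails to render on GitHub, so here is a minimal sketch of the idea under stated assumptions: the lightning-python client drawing one plot per pass of a for loop. The server address, session name, and data below are illustrative, not from the original notebook.

# Minimal sketch (not the original notebook): one Lightning plot per loop iteration.
# Assumes a Lightning server is running and the lightning-python client is installed.
from lightning import Lightning
import numpy as np

lgn = Lightning(host='http://localhost:3000')   # assumed server address
lgn.create_session('for-loop-example')          # hypothetical session name

for i in range(5):
    series = np.random.randn(100).cumsum()   # fake data for illustration
    lgn.line(series)                          # each iteration adds a new line plot to the session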
@freeman-lab
freeman-lab / sklearn-mllib-local.ipynb
Created July 30, 2015 17:47
Comparing sklearn & mllib locally
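This notebook preview also fails to render; below is a hedged sketch of what a local sklearn-versus-MLlib comparison can look like, fitting k-means with both libraries on the same array. The data, k, and iteration counts are illustrative, not taken from the notebook.

# Sketch of a local sklearn vs. MLlib k-means comparison (illustrative, not the original notebook).
import numpy as np
from sklearn.cluster import KMeans
from pyspark import SparkContext
from pyspark.mllib.clustering import KMeans as MLlibKMeans

X = np.random.randn(1000, 5)   # fake data for illustration
k = 3

skl = KMeans(n_clusters=k, max_iter=20).fit(X)   # scikit-learn: in-memory fit

sc = SparkContext('local', 'sklearn-vs-mllib')   # MLlib: same data via a local Spark context
mllib = MLlibKMeans.train(sc.parallelize(X), k, maxIterations=20)

print(skl.cluster_centers_)
print(mllib.clusterCenters)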
@freeman-lab
freeman-lab / channel-parallel.ipynb
Created May 24, 2015 21:30
Multi-channel time series parallelization in Thunder
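Once more the file can't be displayed, so here is a generic PySpark sketch of the pattern the title describes: distributing a multi-channel recording so that each channel's time series is processed in parallel. Thunder's own loading and analysis calls are omitted; the per-channel function below is a stand-in.

# Generic per-channel parallelization sketch (Thunder-specific API omitted).
import numpy as np
from pyspark import SparkContext

sc = SparkContext('local[4]', 'channel-parallel')

data = np.random.randn(32, 10000)   # fake recording: 32 channels x 10000 time points

def analyze(channel):
    # Stand-in per-channel analysis: z-score the trace and return its peak.
    z = (channel - channel.mean()) / channel.std()
    return float(z.max())

# One record per channel; each channel is analyzed independently on the workers.
peaks = sc.parallelize(list(data)).map(analyze).collect()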
var THREE = require('three.js');
var _ = require('lodash');
var $ = require('jquery');   // assumed: the $(selector) call below needs jQuery in scope

var ParticleTest = function(selector, data, images, opts) {
  // Size the canvas from the container's width, with a fixed 0.7 aspect ratio
  var width = $(selector).width();
  var height = width * 0.7;
@freeman-lab
freeman-lab / FixedLengthBinaryInputFormat.scala
Created August 12, 2014 23:54
Binary input with fixed record length
import java.io.{FileFilter, File}
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{LongWritable, BytesWritable}
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
import org.apache.hadoop.mapreduce.{JobContext, InputSplit, RecordReader, TaskAttemptContext}
/**
 * Custom input format for reading and splitting flat binary files that contain records, each of
 * which is a fixed size in bytes. The fixed record size is specified through the parameter
 * recordLength in the Hadoop configuration.
 */
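For context, PySpark exposes the same fixed-record-length reading through sc.binaryRecords; a small usage sketch follows (the path and record size here are made up):

# Reading fixed-length binary records with PySpark (hypothetical path and sizes).
import numpy as np
from pyspark import SparkContext

sc = SparkContext('local', 'fixed-length-binary')

record_length = 8 * 100   # assumed layout: 100 float64 values per record
records = sc.binaryRecords('data/records.bin', record_length)   # hypothetical path
arrays = records.map(lambda b: np.frombuffer(b, dtype='float64'))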
@freeman-lab
freeman-lab / bisecting.scala
Last active December 29, 2015 07:45
Bisecting k-means for hierarchical clustering in Spark
/**
 * bisecting <master> <input> <nNodes> <subIterations>
 *
 * Divisive hierarchical clustering using bisecting k-means.
 * Assumes the input is a text file in which each row is a data point,
 * given as numbers separated by spaces.
 */
import org.apache.spark.SparkContext
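Only the header of this gist is shown above, so here is a compact NumPy/scikit-learn sketch of the bisecting k-means idea itself: repeatedly split the largest cluster in two with plain k-means until the target number of clusters is reached. This is a generic illustration, not the gist's Spark implementation.

# Generic bisecting k-means sketch (not the gist's Spark implementation).
import numpy as np
from sklearn.cluster import KMeans

def bisecting_kmeans(X, n_clusters):
    clusters = [X]
    while len(clusters) < n_clusters:
        # Pick the largest remaining cluster and split it with 2-means.
        i = max(range(len(clusters)), key=lambda j: len(clusters[j]))
        target = clusters.pop(i)
        labels = KMeans(n_clusters=2).fit_predict(target)
        clusters.append(target[labels == 0])
        clusters.append(target[labels == 1])
    return clusters

parts = bisecting_kmeans(np.random.randn(500, 2), 4)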
@freeman-lab
freeman-lab / StreamingKMeans.scala
Last active February 26, 2019 07:13
Spark Streaming + MLLib integration examples
package thunder.streaming
import org.apache.spark.{SparkConf, Logging}
import org.apache.spark.rdd.RDD
import org.apache.spark.SparkContext._
import org.apache.spark.streaming._
import org.apache.spark.streaming.dstream.DStream
import org.apache.spark.mllib.clustering.KMeansModel
import scala.util.Random.nextDouble
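The gist implements streaming k-means on top of Spark Streaming; MLlib later shipped a StreamingKMeans of its own, and a minimal PySpark sketch of the same integration pattern looks like this (the queue-backed stream and all parameters are illustrative):

# Minimal Spark Streaming + MLlib streaming k-means sketch (illustrative parameters).
import numpy as np
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.mllib.clustering import StreamingKMeans

sc = SparkContext('local[2]', 'streaming-kmeans')
ssc = StreamingContext(sc, batchDuration=1)

# A queue-backed stream of fake batches stands in for a real source.
batches = [sc.parallelize(np.random.randn(50, 2)) for _ in range(5)]
stream = ssc.queueStream(batches)

model = StreamingKMeans(k=2, decayFactor=1.0).setRandomCenters(dim=2, weight=1.0, seed=42)
model.trainOn(stream)   # cluster centers update as each batch arrives

ssc.start()
ssc.awaitTerminationOrTimeout(10)
ssc.stop()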