darkseed’s gists

darkseed / nearest_neighbor.py

Last active August 29, 2015 14:24 — forked from Newmu/nearest_neighbor.py

	import numpy as np
	from scipy.spatial.distance import cdist

	def nearest_neighbor(samples, targets, samples_to_classify, metric='euclidean'):
	    # For each row to classify, return the target of the closest training sample.
	    return targets[np.argmin(cdist(samples_to_classify, samples, metric=metric), axis=1)]
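
As a rough illustration of what the one-liner above computes, here is a dependency-free sketch of the same 1-nearest-neighbor rule in plain Python. The name `nearest_neighbor_plain` and the toy data are made up for this example, and it assumes Euclidean distance only, whereas the original accepts any `cdist` metric.

```python
import math

def nearest_neighbor_plain(samples, targets, samples_to_classify):
    """Label each query point with the target of its closest training sample."""
    labels = []
    for query in samples_to_classify:
        # Index of the training sample with the smallest Euclidean distance;
        # on ties, min() keeps the first index, as np.argmin does.
        best = min(range(len(samples)), key=lambda i: math.dist(query, samples[i]))
        labels.append(targets[best])
    return labels

# Toy data (hypothetical): two small clusters labeled "a" and "b".
train_x = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
train_y = ["a", "a", "b", "b"]
predicted = nearest_neighbor_plain(train_x, train_y, [(0.2, 0.4), (5.5, 4.0)])
print(predicted)  # ['a', 'b']
```

The loop makes the cost visible: every query is compared against every training sample, which is also what the `cdist` distance matrix does in vectorized form.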

darkseed / getcolor.py

Last active February 25, 2017 10:25 — forked from zollinger/getcolor.py

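	from PIL import Image, ImageDraw  # Pillow exposes these modules under the PIL package

	def get_colors(infile, outfile, numcolors=10, swatchsize=20, resize=150):

	    image = Image.open(infile)
	    # Shrink to a resize x resize thumbnail before extracting the palette.
	    image = image.resize((resize, resize))
	    # Quantize to an adaptive palette with at most numcolors entries.
	    result = image.convert('P', palette=Image.ADAPTIVE, colors=numcolors)
	    result.putalpha(0)
	    # getcolors returns (count, color) pairs; the argument caps how many it reports.
	    colors = result.getcolors(resize*resize)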

darkseed / lm_example

Last active August 29, 2015 14:23 — forked from yoavg/lm_example

	{
	"cells": [
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"# The unreasonable effectiveness of Character-level Language Models\n",
	"## (and why RNNs are still cool)\n",
	"\n",
	"###[Yoav Goldberg](http://www.cs.biu.ac.il/~yogo)\n",

darkseed / cluster_example.py

Last active August 29, 2015 14:22 — forked from xim/cluster_example.py

	import sys

	import numpy
	from nltk.cluster import KMeansClusterer, GAAClusterer, euclidean_distance
	import nltk.corpus
	from nltk import decorators
	import nltk.stem

	stemmer_func = nltk.stem.EnglishStemmer().stem
	stopwords = set(nltk.corpus.stopwords.words('english'))

darkseed / gist:9b2361a551c4bfeb635d

Last active August 29, 2015 14:20 — forked from debasishg/gist:8172796

Probabilistic Data Structures for Web Analytics and Data Mining : A great overview of the space of probabilistic data structures and how they are used in approximation algorithm implementation.
Models and Issues in Data Stream Systems
Philippe Flajolet’s contribution to streaming algorithms : A presentation by Jérémie Lumbroso that revisits some of the historical perspectives and how it all began with Flajolet.
Approximate Frequency Counts over Data Streams by Gurmeet Singh Manku & Rajeev Motwani : One of the early papers on the subject.
[Methods for Finding Frequent Items in Data Streams](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.187.9800&rep

darkseed / gist:838072bec58a9da9f526

Last active August 29, 2015 14:18 — forked from gwenshap/gist:505b3fa6e478282e03c9

	ADD JAR /opt/cloudera/parcels/CDH/lib/hive/lib/hive-contrib.jar;

	DROP TABLE raw_log;

	CREATE EXTERNAL TABLE raw_log(
	IP STRING,
	timestamp STRING,
	URL STRING,
	referrer STRING,
	user_agent STRING)

darkseed / 00-Setup-IPython-PySpark.ipynb

Last active August 29, 2015 14:18 — forked from fperez/00-Setup-IPython-PySpark.ipynb


darkseed / StanfordNERExample.scala

Last active August 29, 2015 14:16 — forked from seralf/StanfordNERExample.scala

	package ner

	import edu.stanford.nlp.ie.crf.CRFClassifier
	import scala.collection.JavaConversions._
	import scala.collection.JavaConverters._
	import edu.stanford.nlp.ling.CoreAnnotations
	import java.util.ArrayList
	import java.util.HashMap
	import java.util.Map
	import scala.xml.XML

darkseed / KMeansJob.scala

Last active August 29, 2015 14:15 — forked from azymnis/KMeansJob.scala

	import com.twitter.algebird.{Aggregator, Semigroup}
	import com.twitter.scalding._

	import scala.util.Random

	/**
	* This job is a tutorial of sorts for scalding's Execution[T] abstraction.
	* It is a simple implementation of Lloyd's algorithm for k-means on 2D data.
	*
	* http://en.wikipedia.org/wiki/K-means_clustering
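
The preview above stops before the job body, so the following is only a minimal sketch of Lloyd's algorithm on 2D points, written in plain Python and not in Scalding. The function name `lloyd_kmeans`, its parameters, and the toy data are assumptions made for illustration; it does not reproduce the `Execution[T]` structure of the original job.

```python
import random

def lloyd_kmeans(points, k, iterations=20, seed=0):
    """Cluster 2D points with Lloyd's algorithm; returns (centroids, assignments)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k input points chosen at random
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its closest centroid.
        assignments = [
            min(range(k),
                key=lambda c: (x - centroids[c][0]) ** 2 + (y - centroids[c][1]) ** 2)
            for x, y in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                new_centroids.append((sum(x for x, _ in members) / len(members),
                                      sum(y for _, y in members) / len(members)))
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster where it is
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, assignments

# Toy data (hypothetical): two well-separated groups of three points each.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, assignments = lloyd_kmeans(points, k=2)
print(centroids, assignments)
```

The two alternating steps (assign, then recompute means) are the whole algorithm; a distributed version mainly changes how the per-cluster sums and counts are aggregated.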

darkseed / ItemSimilarity.scala

Last active August 29, 2015 14:15 — forked from azymnis/ItemSimilarity.scala

	import com.twitter.scalding._
	import com.twitter.algebird.{ MinHasher, MinHasher32, MinHashSignature }

	/**
	* Computes similar items (with a string itemId), based on approximate
	* Jaccard similarity, using LSH.
	*
	* Assumes an input data TSV file of the following format:
	*
	* itemId userId
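
To make the idea of approximate Jaccard similarity concrete, here is a small standalone sketch of MinHash signatures in plain Python. It does not use Algebird's `MinHasher` and leaves out the LSH bucketing step; the function names, the choice of salted BLAKE2b as the hash family, and the toy user sets are all assumptions made for this example. Each item is assumed to have a non-empty set of user ids.

```python
import hashlib

def minhash_signature(items, num_hashes=128):
    """Return one minimum hash value per salted hash function."""
    signature = []
    for seed in range(num_hashes):
        salt = seed.to_bytes(8, "little")  # a different salt acts as a different hash function
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(item.encode("utf-8"), digest_size=8, salt=salt).digest(),
                "big")
            for item in items
        ))
    return signature

def estimated_jaccard(sig_a, sig_b):
    """Fraction of signature positions on which the two signatures agree."""
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)

def exact_jaccard(set_a, set_b):
    """Exact Jaccard similarity, for comparison with the estimate."""
    return len(set_a & set_b) / len(set_a | set_b)

# Toy data (hypothetical): the users who interacted with each of two items.
users_item1 = {"u1", "u2", "u3", "u4"}
users_item2 = {"u3", "u4", "u5", "u6"}
estimate = estimated_jaccard(minhash_signature(users_item1), minhash_signature(users_item2))
print(exact_jaccard(users_item1, users_item2), estimate)
```

For any single hash function, the probability that two sets share the same minimum equals their Jaccard similarity, so the fraction of matching positions is an estimate whose variance shrinks as `num_hashes` grows.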

Tom Mulder darkseed