Skip to content

Instantly share code, notes, and snippets.

View eliasah's full-sized avatar

Elie A. eliasah

View GitHub Profile
@clintongormley
clintongormley / load_test_data.sh
Last active January 5, 2024 07:32
Run these commands in your shell to setup the test data for Chapter 5
curl -XPUT 'http://localhost:9200/us/user/1?pretty=1' -d '
{
"email" : "[email protected]",
"name" : "John Smith",
"username" : "@john"
}
'
curl -XPUT 'http://localhost:9200/gb/user/2?pretty=1' -d '
{
@jeremyfelt
jeremyfelt / open-source-search-compare.md
Last active January 3, 2025 13:33
Comparing open source search solutions
@debasishg
debasishg / gist:8172796
Last active April 16, 2025 13:43
A collection of links for streaming algorithms and data structures

General Background and Overview

  1. Probabilistic Data Structures for Web Analytics and Data Mining : A great overview of the space of probabilistic data structures and how they are used in approximation algorithm implementation.
  2. Models and Issues in Data Stream Systems
  3. Philippe Flajolet’s contribution to streaming algorithms : A presentation by Jérémie Lumbroso that visits some of the hostorical perspectives and how it all began with Flajolet
  4. Approximate Frequency Counts over Data Streams by Gurmeet Singh Manku & Rajeev Motwani : One of the early papers on the subject.
  5. [Methods for Finding Frequent Items in Data Streams](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.187.9800&rep=rep1&t
package demo;
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.MapWritable;
@vincentbernat
vincentbernat / elasticsearch.erl
Created November 30, 2013 07:50
Erlang module for Tsung to extract data from large files
-module(elasticsearch).
-export([autocomplete/1, autocomplete/0,
search/1, search/0,
start/0,
stop/0,
loop/1]).
-on_load(start/0).
%% Return an autocomplete request
autocomplete() ->
@guenter
guenter / Main.scala
Last active September 17, 2020 11:25
A simple Mesos "Hello World": downloads and starts a Python web server on every node in the cluster.
import mesosphere.mesos.util.FrameworkInfo
import org.apache.mesos.MesosSchedulerDriver
/**
* @author Tobi Knaup
*/
object Main extends App {
@monperrus
monperrus / Apriori.java
Last active August 11, 2023 11:22
Java implementation of the Apriori algorithm for mining frequent itemsets
import java.io.*;
import java.util.*;
/** The class encapsulates an implementation of the Apriori algorithm
* to compute frequent itemsets.
*
* Datasets contains integers (>=0) separated by spaces, one transaction by line, e.g.
* 1 2 3
* 0 9
* 1 9
@wpm
wpm / spark_parallel_boost.py
Last active December 3, 2018 02:56
A simple example of how to integrate the Spark parallel computing framework and the scikit-learn machine learning toolkit. This script randomly generates test and train data sets, trains an ensemble of decision trees using boosting, and applies the ensemble to the test set. The ensemble training is done in parallel.
from pyspark import SparkContext
import numpy as np
from sklearn.cross_validation import train_test_split, Bootstrap
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier
def run(sc):
@visenger
visenger / install_scala_sbt.sh
Last active January 31, 2023 19:10
Scala and sbt installation on ubuntu 12.04
#!/bin/sh
# one way (older scala version will be installed)
# sudo apt-get install scala
#2nd way
sudo apt-get remove scala-library scala
wget http://www.scala-lang.org/files/archive/scala-2.11.4.deb
sudo dpkg -i scala-2.11.4.deb
sudo apt-get update