@neilkod
neilkod / elephant bird demo.pig
Created June 8, 2012 22:30
elephant bird demo.pig
Sample Pig script; runs fine in local mode. The Elephant Bird magic is the JsonLoader() in the LOAD command, followed by converting user to a Java map so that I can extract screen_name. I haven't read the docs yet, so there may be a better way to do this. I'm sure the two GENERATE statements can be combined into one; this is just a first attempt.
REGISTER '/Users/nkodner/Downloads/cdh3/elephant-bird/build/elephant-bird-2.2.4-SNAPSHOT.jar';
REGISTER '/Users/nkodner/Downloads/cdh3/pig-0.8.1-cdh3u4/contrib/piggybank/java/lib/json-simple-1.1.jar';
REGISTER '/Users/nkodner/Downloads/cdh3/pig-0.8.1-cdh3u4/build/ivy/lib/Pig/guava-r06.jar';
raw = LOAD '/Users/nkodner/clean_tweets/with_deletedaa' using com.twitter.elephantbird.pig.load.JsonLoader();
bah = limit raw 100;
cc = foreach bah generate (chararray)$0#'text' as text,(long)$0#'id' as id,com.twitter.elephantbird.pig.piggybank.JsonStringToMap($0#'user') as user;
dd = foreach cc generat
@zoltanctoth
zoltanctoth / gist:5528402
Last active April 9, 2018 11:30
How to install twitter's elephant-bird on EMR
# Get a proper Maven
wget http://xenia.sote.hu/ftp/mirrors/www.apache.org/maven/maven-3/3.0.5/binaries/apache-maven-3.0.5-bin.tar.gz
tar xzf apache-maven-3.0.5-bin.tar.gz
export PATH=/home/hadoop/apache-maven-3.0.5/bin:$PATH
echo 'export PATH=/home/hadoop/apache-maven-3.0.5/bin:$PATH' >> ~/.bash_profile
# Install a supported version of protobuf
sudo apt-get remove protobuf-compiler
wget https://protobuf.googlecode.com/files/protobuf-2.4.1.tar.gz
tar xzf protobuf-2.4.1.tar.gz
@miketheman
miketheman / zook_grow.md
Created July 22, 2013 21:36
Adding nodes to a ZooKeeper ensemble

Adding 2 nodes to an existing 3-node ZooKeeper ensemble without losing the Quorum

Since many deployments start out with 3 nodes, and little is documented about how to grow a cluster from 3 members to 5 members without losing the existing quorum, here is an example of how this might be achieved.

In this example, all 5 nodes run on the same Vagrant host for the purpose of illustration, each with its own configuration (ports and data directory) and without any actual client load.
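
As a rough illustration of what "distinct ports and data directories" on a single host could look like, here is a minimal Python sketch that renders a `zoo.cfg` per node and computes the majority needed for quorum. The port numbers, the `/tmp/zookeeper` base directory, and the helper names `quorum_size` and `render_zoo_cfg` are assumptions made for this sketch, not values taken from the walkthrough itself.

```python
# Hypothetical sketch: ports, paths, and helper names are illustrative
# assumptions, not the configuration used in the original example.

def quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority of an ensemble of the given size."""
    return ensemble_size // 2 + 1


def render_zoo_cfg(node_id: int, ensemble_size: int,
                   base_dir: str = "/tmp/zookeeper") -> str:
    """Render a zoo.cfg for one node when every node shares one host."""
    if not 1 <= node_id <= ensemble_size:
        raise ValueError("node_id must be between 1 and ensemble_size")
    lines = [
        "tickTime=2000",
        "initLimit=10",
        "syncLimit=5",
        # Each node gets its own data directory and client port.
        f"dataDir={base_dir}/node{node_id}",
        f"clientPort={2180 + node_id}",
    ]
    # Every node lists all ensemble members; the peer and election ports
    # must differ per member because they all bind on localhost.
    for peer in range(1, ensemble_size + 1):
        lines.append(f"server.{peer}=localhost:{2887 + peer}:{3887 + peer}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    for size in (3, 5):
        print(f"{size}-node ensemble: quorum needs {quorum_size(size)} nodes")
    print(render_zoo_cfg(node_id=4, ensemble_size=5))
```

ZooKeeper also expects a `myid` file inside each `dataDir` containing that node's id, which this sketch does not write.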

YMMV. Caveat usufructuarius.

Step 1: Have a healthy 3-node ensemble

@debasishg
debasishg / gist:8172796
Last active April 20, 2025 12:45
A collection of links for streaming algorithms and data structures

General Background and Overview

  1. Probabilistic Data Structures for Web Analytics and Data Mining : A great overview of probabilistic data structures and how they are used to implement approximation algorithms.
  2. Models and Issues in Data Stream Systems
  3. Philippe Flajolet’s contribution to streaming algorithms : A presentation by Jérémie Lumbroso that revisits some of the historical perspectives and how it all began with Flajolet.
  4. Approximate Frequency Counts over Data Streams by Gurmeet Singh Manku & Rajeev Motwani : One of the early papers on the subject.
  5. [Methods for Finding Frequent Items in Data Streams](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.187.9800&rep=rep1&t
@vdel26
vdel26 / nginx
Last active March 16, 2023 20:31
Openresty init.d script
#!/bin/sh
#
# chkconfig: 2345 55 25
# Description: Nginx init.d script, put in /etc/init.d, chmod +x /etc/init.d/nginx
# For Debian, run: update-rc.d -f nginx defaults
# For CentOS, run: chkconfig --add nginx
#
### BEGIN INIT INFO
# Provides: nginx
# Required-Start: $all
@wangruohui
wangruohui / Install NVIDIA Driver and CUDA.md
Last active April 2, 2025 07:38
Install NVIDIA Driver and CUDA on Ubuntu / CentOS / Fedora Linux OS