Dhanasekaran Anbalagan bugcy013

#!/usr/bin/env bash
# -U already creates a matching "storm" group; useradd rejects -U combined with -g
useradd -r -U -s /bin/false storm
brew update
# install xcode 4.6.1
brew install storm
brew install leiningen
git clone https://github.com/nathanmarz/storm-starter.git
cd storm-starter
lein deps
lein compile
java -cp "$(lein classpath)" storm.starter.ExclamationTopology
curl -O http://mirror.cc.columbia.edu/pub/software/apache/incubator/kafka/kafka-0.7.2-incubating/kafka-0.7.2-incubating-src.tgz
ssh -l deploy IP
su root
/usr/sbin/visudo
# inside visudo (vi keys): 93gg jumps to line 93; add the rule below there
deploy ALL=(ALL) NOPASSWD: ALL
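Editing /etc/sudoers by line number is fragile, since line 93 will not hold the same content on every system. A minimal sketch of an alternative, assuming the system's sudoers file includes /etc/sudoers.d (common, but check for an includedir line first): write the same rule to a scratch file, let `visudo -cf` check its syntax, and only then install it. The helper name `write_sudoers_rule` and the scratch path are made up for illustration.

```shell
# Hypothetical helper: write a passwordless-sudo rule for user $1 into file $2.
write_sudoers_rule() {
  printf '%s ALL=(ALL) NOPASSWD: ALL\n' "$1" > "$2" && chmod 0440 "$2"
}

# Usage, run as root (paths are assumptions):
#   write_sudoers_rule deploy /tmp/deploy.sudoers
#   visudo -cf /tmp/deploy.sudoers && install -m 0440 /tmp/deploy.sudoers /etc/sudoers.d/deploy
```

Checking the scratch copy first means a typo cannot lock sudo out on the live machine.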
We encountered some issues using Python processes with Storm.
1) Storm uses the stdin and stdout of the Python processes to communicate with them. This makes them almost impossible to debug with pdb.set_trace().
2) The local topology (and, to a lesser extent, the real Storm) has trouble killing the Python processes it creates. Some of our rogue processes pin the CPU. I find myself running something like this on my local machine:
ps aux | grep '[p]ython' | awk '{ print $2 }' | xargs kill
3) We didn't want to install Python packages system-wide, because that adds dependencies on the system Storm is running on, and we also have different topologies with different bolts that depend on different versions of packages. We install the dependencies (with buildout or virtualenv) in a relative path before creating the uberjar and deploying it. Alternatively, the script Storm calls can be a shell script that installs the dependencies locally and then runs the Python entry point. Either way, it is a little painful, espe
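The shell-script route from point 3 could look roughly like the sketch below. It assumes an environment directory and a requirements file shipped next to the script; `ensure_env`, `run_entry`, `env/`, `requirements.txt` and `bolt.py` are hypothetical names, and `python3 -m venv` stands in for whichever of buildout or virtualenv the topology actually uses. Because Storm talks to the process over stdin and stdout (point 1), every install message is sent to stderr so it cannot corrupt that channel.

```shell
# Create the environment on first run only. All output goes to stderr:
# stdout is reserved for the messages exchanged with Storm.
ensure_env() {
  env_dir=$1
  requirements=$2
  if [ ! -x "$env_dir/bin/python" ]; then
    python3 -m venv "$env_dir" >&2 || return 1
    "$env_dir/bin/pip" install -r "$requirements" >&2 || return 1
  fi
}

# Replace the shell with the environment's interpreter, so the pipes and
# any kill signal from Storm reach Python itself rather than a parent shell.
run_entry() {
  env_dir=$1
  shift
  exec "$env_dir/bin/python" "$@"
}

# Typical last lines of such a wrapper (hypothetical layout):
#   here=$(cd "$(dirname "$0")" && pwd)
#   ensure_env "$here/env" "$here/requirements.txt" && run_entry "$here/env" "$here/bolt.py"
```

Using `exec` keeps a single process per bolt, which may also help with the stray processes described in point 2, since there is no intermediate shell left behind.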
bugcy013 / why.sh
Created January 1, 2014 22:04 — forked from l1x/why.sh
#why dont you go and do something else instead of writing software? thank you!!
# the fifth column is the queue, per the header listed below
while read -r job_id status start_time user queue priority _ _ used_mem _ need_mem _ ; do
  echo "$job_id $status $user $used_mem $(date -d "@$(( start_time / 1000 ))")"
done < <(mapred job -list all 2>/dev/null | grep RUNN) | sort -k 4 -n
#JobId State StartTime UserName Queue Priority UsedContainers RsvdContainers UsedMem RsvdMem NeededMem AM info
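To see what the loop does with a single row, the sketch below feeds it one made-up line in the column order given by the header comment. The job ID, user, memory figures and URL are invented; `format_job_line` is a hypothetical name; and `date -d @...` assumes GNU date, as the original does. The start time is divided by 1000 because the loop treats it as epoch milliseconds.

```shell
# Read rows in the column order of the header:
# JobId State StartTime UserName Queue Priority UsedContainers
# RsvdContainers UsedMem RsvdMem NeededMem AM-info
format_job_line() {
  while read -r job_id state start_ms user queue priority used_c rsvd_c used_mem rsvd_mem need_mem am_info; do
    start_s=$(( start_ms / 1000 ))   # milliseconds to seconds
    printf '%s %s %s %s %s\n' "$job_id" "$state" "$user" "$used_mem" \
      "$(date -u -d "@$start_s" '+%Y-%m-%dT%H:%M:%SZ')"
  done
}

# Invented sample row, not real cluster output
sample='job_1388600000000_0001 RUNNING 1388613840000 alice default NORMAL 3 0 6144M 0M 6144M http://example.invalid/proxy'
printf '%s\n' "$sample" | format_job_line
```

Naming every column instead of using `_` placeholders makes it easier to add, say, the queue to the output later.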
#gitz
git clone https://blah
git checkout -b local_branch origin/remote_branch
git add files
git commit -m"What the hack is this?"
git push origin local_branch #this will create a remote branch with the name "local_branch"
#submit the pull request using the UI
To grab and build, run the following:
~> git clone https://github.com/QwertyManiac/mahout-cdh4.git
~> cd mahout-cdh4/
~> mvn -Phadoop-0.23 -DskipTests -Dhadoop.version=2.0.0-mr1-cdh4.4.0 -Dmahout.skip.distribution=false clean package
~> cd distribution/target
~> ls -l mahout-distribution-0.8.tar.gz
The resulting release can be found under the distribution/target directory after the build succeeds.
git clone https://github.com/QwertyManiac/cassandra-cdh4.git
cd cassandra-cdh4
ant publish -Dversion=1.2.2
cp build/apache-cassandra-1.2.2-bin.tar.gz ~/
# ~/apache-cassandra-1.2.2-bin.tar.gz now holds the release binaries; use it to deploy Cassandra with CDH4
http://google.com/search?{google:RLZ}{google:acceptedSuggestion}{google:originalQueryForSuggestion}sourceid=chrome&ie={inputEncoding}&q=%s
#!/bin/sh
# one way
# sudo apt-get install scala
#2nd way
wget http://www.scala-lang.org/files/archive/scala-2.10.3.tgz
tar zxf scala-2.10.3.tgz
sudo mv scala-2.10.3 /usr/share/scala
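After the second route, the `scala` and `scalac` launchers sit in /usr/share/scala/bin, which is not normally on PATH. A small sketch of one way to add it from a shell profile without duplicating the entry each time the profile is sourced; `path_prepend` is a hypothetical helper, not something the script above defines.

```shell
# Hypothetical helper: put directory $1 at the front of PATH unless already present.
path_prepend() {
  case ":$PATH:" in
    *":$1:"*) ;;                 # already on PATH, leave it alone
    *) PATH="$1:$PATH" ;;
  esac
}

path_prepend /usr/share/scala/bin
export PATH
```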