danish-rehman

@danish-rehman
danish-rehman / ROLLING_COUNT_STATS.md
Last active July 14, 2016 06:30
Data rollups guidelines

Rolling count of stats using pyspark

CQL Query

CREATE TABLE rollups_min (
    event_min text,
    time timestamp,
    value int,
    PRIMARY KEY (event_min, time)
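
The per-minute rollup this table is meant to hold can be sketched in plain Python, without Spark or Cassandra. This is only an illustration under assumptions: it treats `event_min` as an event name and `time` as the start of the minute bucket, which the truncated schema above does not confirm, and the helper names `minute_bucket` and `rollup_per_minute` are hypothetical.

```python
from collections import Counter
from datetime import datetime


def minute_bucket(ts):
    """Truncate a datetime to the start of its minute."""
    return ts.replace(second=0, microsecond=0)


def rollup_per_minute(events):
    """Count (event_name, timestamp) pairs per event name and minute.

    Returns rows shaped like the rollups_min columns
    (event_min, time, value), sorted by event name, then time.
    Assumption: event_min is the event name and time is the minute bucket.
    """
    counts = Counter((name, minute_bucket(ts)) for name, ts in events)
    return sorted((name, bucket, n) for (name, bucket), n in counts.items())


if __name__ == "__main__":
    sample = [
        ("click", datetime(2016, 7, 14, 6, 30, 5)),
        ("click", datetime(2016, 7, 14, 6, 30, 59)),
        ("view", datetime(2016, 7, 14, 6, 30, 10)),
    ]
    for row in rollup_per_minute(sample):
        print(row)
```

In a streaming job the same grouping would typically be done per batch, with each resulting row written to the table.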
@danish-rehman
danish-rehman / Tweet Stream using pyspark.md
Last active July 19, 2017 19:49
Spark Cassandra Live Tweet Example 1

Count stats for twitter stream and store in Cassandra

cd $SPARK_HOME

./bin/spark-submit --packages TargetHolding/pyspark-cassandra:0.3.5 /Users/drehman/Apps/workspace/spark_cassandra_stream_example.py

python twitter_rolling_count.py -q data -d data 2>&1 | nc -lk 10.0.0.235 9999
@danish-rehman
danish-rehman / ISSUES.md
Last active July 5, 2016 09:08
Spark : Issues and fixes
ISSUE: ReceiverTracker: Deregistered receiver for stream 0: Restarting receiver with delay
FIX: This message appears whenever the driver stops. If it appears prematurely, increase the timeout.
ISSUE: Cannot obtain a new communication channel pyspark
@danish-rehman
danish-rehman / COMPILE_INSTALL_pyspark_cassandra_connector.md
Last active July 14, 2016 06:33
Pyspark Cassandra build and dist

Compile and install pyspark-cassandra connector.

#INSTALL JDK and EXPORT JAVA_HOME

brew install sbt
sbt compile 
sbt spPublishLocal
make dist
@danish-rehman
danish-rehman / RUN.md
Last active July 4, 2016 14:23
Cassandra spark connector
cd $SPARK_HOME
./bin/spark-submit --packages TargetHolding/pyspark-cassandra:0.3.5 /Users/drehman/Apps/workspace/spark_cassandra_example.py
@danish-rehman
danish-rehman / INSTALLATION.md
Last active July 27, 2016 19:49
Cassandra : Standalone setup on development box
@danish-rehman
danish-rehman / driver.py
Created July 2, 2016 21:09
Spark : Disable INFO log for Spark
log4j = sc._jvm.org.apache.log4j
log4j.LogManager.getRootLogger().setLevel(log4j.Level.ERROR)
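
The two lines above could be wrapped in a small helper so the level is configurable. This is a sketch, not part of the original gist: the function name `set_spark_log_level` is hypothetical, it assumes a log4j 1.x backend (where `Level.toLevel(String)` exists), and `sc._jvm` is a private PySpark attribute that may change between versions.

```python
def set_spark_log_level(sc, level="ERROR"):
    """Set the root log4j logger level via the SparkContext's JVM gateway.

    Assumes `sc` is an active SparkContext backed by log4j 1.x;
    `sc._jvm` is a private attribute, so treat this as a workaround.
    """
    log4j = sc._jvm.org.apache.log4j
    # Level.toLevel converts a name such as "ERROR" or "WARN" to a Level object.
    log4j.LogManager.getRootLogger().setLevel(log4j.Level.toLevel(level))
```

Where available, the public `sc.setLogLevel("ERROR")` method is likely the simpler option.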
@danish-rehman
danish-rehman / Cmd1.md
Last active July 2, 2016 09:10
Spark : Bot stream

python bot_stream.py 2>&1 | nc -lk 127.0.0.1 9999

@danish-rehman
danish-rehman / config.py
Created July 2, 2016 08:01
Twitter Stream using tweepy
consumer_key = 'your-consumer-key'
consumer_secret = 'your-consumer-secret'
access_token = 'your-access-token'
access_secret = 'your-access-secret'
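
As an alternative to hardcoding the four values, they could be read from environment variables. This is a minimal sketch under assumptions: the `TWITTER_*` variable names and the `load_twitter_credentials` helper are hypothetical and do not come from the original gist.

```python
import os

# Hypothetical environment variable names, one per credential in config.py.
ENV_NAMES = {
    "consumer_key": "TWITTER_CONSUMER_KEY",
    "consumer_secret": "TWITTER_CONSUMER_SECRET",
    "access_token": "TWITTER_ACCESS_TOKEN",
    "access_secret": "TWITTER_ACCESS_SECRET",
}


def load_twitter_credentials(env=None):
    """Return the four credentials as a dict, read from `env`.

    `env` defaults to os.environ. Raises KeyError naming every
    variable that is missing or empty.
    """
    env = os.environ if env is None else env
    missing = [var for var in ENV_NAMES.values() if not env.get(var)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return {key: env[var] for key, var in ENV_NAMES.items()}
```

The returned dict uses the same four names as the variables above, so `config.py` could assign from it directly.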