Emaad Manzoor emaadmanzoor

emaadmanzoor / ExpandEdinburghFSDCorpus.md
Last active October 31, 2020 20:30
Expand the Edinburgh Twitter FSD corpus

Expand The Edinburgh Twitter FSD Corpus

The attached Python scripts handle the following tedious work, so you can get started on the corpus quickly:

  • Respect the Twitter API rate limits and throttle API hits.
  • Don't hit the API for tweet IDs that have already been expanded, so you can resume tweet expansion after stopping midway.
  • Parse the API response and dump it into the correct column in the sqlite3 database.
  • Gracefully handle exceptions while acquiring tweets from the API.
  • Wrap version 1.1 of the Twitter API.
  • Start from a specified tweet ID, assuming the input file is sorted in increasing order of tweet ID.
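The resume-and-throttle behaviour listed above can be sketched roughly as follows. This is not the gist's code: `fetch_tweet`, the `tweets` table layout, `start_id`, and `min_interval` are hypothetical names chosen for illustration, and the fetch function is passed in as a parameter so that no particular Twitter client is assumed.

```python
import sqlite3
import time


def expand_tweets(conn, tweet_ids, fetch_tweet, start_id=0, min_interval=1.0,
                  sleep=time.sleep):
    """Expand tweet IDs into text, skipping IDs already in the database.

    fetch_tweet is a hypothetical callable: tweet_id -> text (may raise).
    tweet_ids is assumed to be sorted in increasing order.
    Returns the number of API calls attempted in this run.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tweets (id INTEGER PRIMARY KEY, text TEXT)"
    )
    attempted = 0
    for tweet_id in tweet_ids:
        if tweet_id < start_id:
            continue  # honour the requested starting point
        seen = conn.execute(
            "SELECT 1 FROM tweets WHERE id = ?", (tweet_id,)
        ).fetchone()
        if seen:
            continue  # already handled on an earlier run; no API hit
        try:
            text = fetch_tweet(tweet_id)
        except Exception:
            text = None  # store NULL for a failed fetch and keep going
        conn.execute(
            "INSERT INTO tweets (id, text) VALUES (?, ?)", (tweet_id, text)
        )
        conn.commit()  # commit per tweet so an interrupted run loses nothing
        attempted += 1
        sleep(min_interval)  # crude throttle: one request per interval
    return attempted
```

Storing NULL for failed fetches means deleted tweets are not retried on every resume; a real script might instead distinguish permanent failures from transient ones.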
emaadmanzoor / freivald.py
Created September 9, 2013 13:32
Freivalds' Algorithm
import random
import operator

t = int(raw_input())
randint = random.randint

def deterministic(a, b, c, n):
    no = 0
    for p in xrange(n):
        for q in xrange(n):
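For reference, a minimal Python 3 sketch of Freivalds' algorithm follows. It is not the code from the gist (the listing above is truncated and targets Python 2); the function names and the `rounds` and `rng` parameters are assumptions made for illustration. The idea is to check whether A·B = C by comparing A(Br) with Cr for random 0/1 vectors r, which costs O(n²) per round instead of a full matrix multiplication.

```python
import random


def matmul_vec(m, v):
    """Multiply an n x n matrix (list of rows) by a length-n vector."""
    return [sum(x * y for x, y in zip(row, v)) for row in m]


def freivalds(a, b, c, rounds=20, rng=random):
    """Probabilistically check whether a @ b == c for n x n matrices.

    A wrong c survives one round with probability at most 1/2, so it
    survives all rounds with probability at most 2 ** -rounds.
    """
    n = len(a)
    for _ in range(rounds):
        r = [rng.randint(0, 1) for _ in range(n)]
        # Compare a(b r) with c r without ever forming the product a @ b.
        if matmul_vec(a, matmul_vec(b, r)) != matmul_vec(c, r):
            return False  # a witness was found: definitely not equal
    return True  # equal with high probability
```

A `False` result is always correct; a `True` result can be wrong only with the small probability stated in the docstring.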
emaadmanzoor / AttentionPotentialValidation.md
Last active August 29, 2015 14:18
Attention Potential Validation Code

See the project website for more details.

Please report any issues to [email protected].

Correlation Results

The attention potential (as estimated in section 4), when evaluated on this Twitter dataset:

  • Is 73.61% correlated with the retweets obtained.
  • Is significantly correlated (p < 0.05).
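As a rough illustration of how a correlation and its significance could be checked, the sketch below computes a Pearson coefficient and a permutation-test p-value using only the standard library. The preview does not say which coefficient or test was used, so both choices are assumptions, as are the function names and parameters.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(xs, ys, trials=2000, seed=0):
    """Two-sided p-value: how often shuffled ys correlate at least as strongly."""
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if abs(pearson_r(xs, shuffled)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (trials + 1)
```

Here `xs` would hold the estimated attention potential per tweet and `ys` the observed retweet counts; a p-value below 0.05 would correspond to the significance threshold quoted above.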
emaadmanzoor / QuantifyingMonotonyAversion.md
Last active August 29, 2015 14:18
Quantifying Monotony Aversion

See the project website for more details.

Please report any issues to [email protected].

Execution

Running this requires having the following files in the same directory as calculate_cluster_statistics.py:

  • all_links.p
  • all_tweets.p
emaadmanzoor / 00-StreamSpot-Bootstrap-Clusters.md
Last active February 18, 2016 01:18
StreamSpot Bootstrap Clusters


www3.cs.stonybrook.edu/~emanzoor/streamspot/

Below are the bootstrap clusters used for the experiments in the StreamSpot paper for each of the following datasets:

  • all (01-C50_k10_all.txt): Chunk length of 50, 10 clusters.
  • ydc (02-C25_k5_ydc.txt): Chunk length of 25, 5 clusters.
  • gfc (03-C50_k5_gfc.txt): Chunk length of 50, 5 clusters.
#!/usr/bin/env python
# Copyright 2016 Emaad Ahmed Manzoor
# License: Apache License, Version 2.0
# http://www.eyeshalfclosed.com/blog/2016/07/22/spark-streaming-statistics/
"""
Get Spark Streaming microbatch statistics:
- Batch start time
- Scheduling delay (in seconds) for each microbatch
emaadmanzoor / 95-865-Model_Evaluation_Demo.md
Last active February 16, 2018 16:16
95865 Model Evaluation Demo
[Unit]
Requires=zookeeper.service
After=zookeeper.service
[Service]
Type=simple
User=gb760
ExecStart=/bin/sh -c '/home/gb760/kafka/bin/kafka-server-start.sh /home/gb760/kafka/config/server.properties > /home/gb760/kafka/kafka.log 2>&1'
ExecStop=/home/gb760/kafka/bin/kafka-server-stop.sh
Restart=on-abnormal