@jxnl
Last active August 29, 2015 14:10
Want to speak with me? I'm jason at jxnl.co

# First things first, Fizzbuzz
for i in range(1, 101): print("Fizz" * (not i % 3) + "Buzz" * (not i % 5) or i)

# Or if you want... Simpson's Rule
def simpson(a, b, f, N):
    # Simpson's rule as (2 * midpoint rule + trapezoid rule) / 3 over N panels.
    return (1.0 / 3.0) * (2 * (((b - a) / float(N)) * sum(f(v) for v in [a + i * \
    ((b - a) / (2. * N)) for i in range(2 * N + 1)][1:2 * N + 1:2])) + \
    (((b - a) / float(N)) * ((f(a) + f(b)) / 2.0 + sum(f(v * ((b - a) / float(N)) \
    + a) for v in range(1, N)))))
    
I swear I write good code.

Work Experience

Hack the North -- Data Scientist

Sole data scientist working on the Hack the North team.

  • Exploratory data analysis and visualization to study the interests of hackathon participants and improve upcoming events.
  • Designed experiments on various biases that might exist in the application/judging process.
  • Created reports and summary statistics of the event and its participants and gave suggestions to the directors.

NYU Global Institute of Public Health -- Research Intern (Current Position)

Supervisor: Dr. Rumi Chunara

Currently working on a paper discussing the use of machine learning models to find patterns of alcohol abuse on social media.

  • Exploratory analysis using Python and Gensim for topic modeling.
  • Built complex preprocessing and ingestion pipeline for machine learning with scikit-learn and Gensim.
  • Developed informative IPython notebooks that outline and document the body of work produced during the research project.
  • Using Amazon Mechanical Turk and crowdsourcing techniques to develop training data from the raw Twitter firehose.

Sysomos -- Data Scientist

  • Prototyped a proof-of-concept advertisement recommendation platform for offline targeted audience generation.
  • Developed extensible interfaces to our community detection and k-armed bandit service layer.
  • Improved clustering performance for Twitter community detection on Sysomos MAP and Heartbeat.
  • Created MapReduce and Spark applications for audience generation and various ad-hoc ETL.

Education

University of Waterloo 3B Honors B.Math, transferred from Mathematical Physics Computational Mathematics, Statistics Minor CO-OP

  • Applied Probability & Statistics, Linear Modeling, Data Visualization, Data Structures, Linear Algebra, Real Analysis, Computational Physics.

MOOCS (Coursera)

  • Machine Learning, from Andrew Ng
  • Data Science, from Bill Howe
  • Natural Language Processing, from Dan Jurafsky
  • Probabilistic Graphical Models, from Daphne Koller

Projects

Mark Sweep is a collection of services for automating Facebook group moderation. The prototype is currently being rewritten into a service-oriented design for reusability in other applications.

  • Won "Best use of Machine Learning" and placed Top 20 at PennAppsX.
  • Weighted reservoir sampling was used for group-based topic classification as a novel way to capture recency.
  • Topic/troll detection done using sklearn's SVC with bag-of-words features and watchwords.
  • Spam classification using hand-engineered features.
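
To illustrate the weighted reservoir sampling bullet above, here is a minimal sketch of one common one-pass scheme (Efraimidis-Spirakis "A-Res"). It is not the Mark Sweep code: the function name `weighted_reservoir_sample`, the `(item, weight)` input shape, and the idea of using weights that grow with post time to favor recent posts are all assumptions made for illustration.

```python
import heapq
import random


def weighted_reservoir_sample(pairs, k, rng=None):
    """Pick up to k items in one pass over (item, weight) pairs.

    Hypothetical helper, not Mark Sweep's API. Each item gets the key
    u ** (1 / weight) with u uniform in [0, 1); the k largest keys win,
    so heavier (for example, more recent) items are kept more often.
    """
    rng = rng or random.Random()
    heap = []  # min-heap of (key, arrival index, item); smallest key on top
    for index, (item, weight) in enumerate(pairs):
        if weight <= 0:
            continue  # non-positive weights can never be sampled
        key = rng.random() ** (1.0 / weight)
        entry = (key, index, item)
        if len(heap) < k:
            heapq.heappush(heap, entry)
        elif key > heap[0][0]:
            heapq.heapreplace(heap, entry)  # evict the current smallest key
    return [item for _, _, item in heap]


# Assumed recency weighting: later posts get exponentially larger weights.
posts = [("post-%d" % t, 1.1 ** t) for t in range(100)]
sample = weighted_reservoir_sample(posts, k=5, rng=random.Random(0))
```

The reservoir never holds more than k entries, so the stream of posts does not need to fit in memory; the arrival index in each heap entry only breaks ties between equal keys.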

Freelance
  • Provided various data services including web crawling, data analysis, and consulting.
  • Maintains a split of 70% paid client work and 30% pro bono nonprofit work.

Experiments
  • Itpy -- Lazy evaluated list processing with chained transformations such as streaming variance, groupbys, map, filter and more.
  • Reservoir -- Python module for uniform, exponential, and weighted reservoir sampling.
  • Bandit -- Java Implementation of various bandit algorithms.
  • Data Mining Canadian Government press releases (2002-2015) to uncover potential data journalism stories.
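
As a rough picture of what lazily evaluated, chained list processing with a streaming variance might look like, here is a small sketch. The `Stream` class and its method names are hypothetical and are not Itpy's actual API; the variance uses Welford's one-pass update, which is one standard choice and may differ from what Itpy does.

```python
class Stream(object):
    """Hypothetical lazy pipeline wrapper (illustration only, not Itpy)."""

    def __init__(self, iterable):
        self._iterable = iterable

    def map(self, fn):
        # Generator expression: fn runs only when the stream is consumed.
        return Stream(fn(x) for x in self._iterable)

    def filter(self, pred):
        return Stream(x for x in self._iterable if pred(x))

    def variance(self):
        """Sample variance in one pass (Welford's update), O(1) memory."""
        count, mean, m2 = 0, 0.0, 0.0
        for x in self._iterable:
            count += 1
            delta = x - mean
            mean += delta / count
            m2 += delta * (x - mean)
        return m2 / (count - 1) if count > 1 else 0.0

    def __iter__(self):
        return iter(self._iterable)


calls = []


def square(x):
    calls.append(x)  # record when the transformation actually runs
    return x * x


pipeline = Stream(range(10)).filter(lambda x: x % 2 == 0).map(square)
calls_before = len(calls)  # still 0: nothing has been evaluated yet
result = pipeline.variance()  # consumes 0, 4, 16, 36, 64
```

Each chained call only wraps the previous generator, so no intermediate list is built and the data is traversed once, when `variance()` pulls values through the chain.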

internet me : quora // github // dribbble // linkedin // twitter