@thinrhino
thinrhino / transactions_subset.py
Created April 20, 2014 19:19
Kaggle: Acquire Valued Shoppers Challenge: reducing the dataset from 22GB to 1GB
import pandas
df = pandas.read_csv('offers.csv.gz', compression='gzip')
categories = df.category.tolist()
subset = open('subset.csv', 'w')
fl = open('transactions.csv', 'r')
fl.readline()
while True:
    l = fl.readline()
    if l == '':
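The preview is cut off before the filtering step, so the following is only a minimal sketch of how such a reduction could work, not the gist's actual code. It assumes the goal is to keep the transaction rows whose category also appears in the offers file. The function name `filter_by_category`, the column position `category_column=3`, and the sample column names and values are all hypothetical and used for illustration only.

```python
import csv
import io


def filter_by_category(lines, categories, category_column=3):
    """Yield the header, then only rows whose category is in `categories`.

    `category_column` is an assumed position, not taken from the gist.
    """
    # CSV fields are read as strings, so compare against string categories.
    wanted = {str(c) for c in categories}
    reader = csv.reader(lines)
    yield next(reader)  # pass the header row through unchanged
    for row in reader:
        if row[category_column] in wanted:
            yield row


# Tiny in-memory stand-in for the transactions file (made-up values).
sample = io.StringIO(
    "id,chain,dept,category,company\n"
    "1,10,7,706,111\n"
    "2,10,9,999,222\n"
    "3,11,7,706,333\n"
)
kept = list(filter_by_category(sample, [706]))
for row in kept:
    print(",".join(row))
```

Because the rows are consumed one at a time, the same generator can be fed an open file handle instead of `io.StringIO`, which avoids loading a multi-gigabyte file into memory.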
thinrhino / raw_data_mixpanel.py
Created May 2, 2014 10:18
A piece of code to retrieve raw data from mixpanel and dump into a bucket on AWS S3
"""
Code to download and upload raw data from mix-panel
"""
import hashlib
import datetime
import time
import tempfile
import os
import bz2
# Script to populate data into MongoDB
import twitter
import time
import logging
from pymongo import MongoClient
CONSUMER_KEY = '<twitter_consumer_key>'
CONSUMER_SECRET = '<twitter_secret_key>'
thinrhino / send_email.py
Created May 2, 2014 10:34
Send mails using Mandrill API
import base64
import os
from mandrill import Mandrill
mail_client = Mandrill('<api_key>')
frm_email = '[email protected]'
frm_name = 'Given Name'
# Sending image as attachment
# open() does not expand '~', so resolve the home directory explicitly
with open(os.path.expanduser('~/sample_image.jpg'), 'rb') as img:
    img_attachment = base64.b64encode(img.read())
thinrhino / zipfs.py
Created May 5, 2014 13:07
Plot Zipf's law
from collections import defaultdict
import matplotlib.pyplot as plt
data = open('<data_file>', 'r')
r_data = []
# reading relevant data
while True:
    l = data.readline()
    if l == '':
thinrhino / tf-idf.py
Created May 5, 2014 13:09
Calculate tf-idf
# ref: http://www.tfidf.com/
# Example:
# Consider a document containing 100 words wherein the word cat appears 3 times.
# The term frequency (i.e., tf) for cat is then (3 / 100) = 0.03. Now, assume we
# have 10 million documents and the word cat appears in one thousand of these.
# Then, the inverse document frequency (i.e., idf) is calculated as log10(10,000,000 / 1,000) = 4 (base-10 logarithm).
# Thus, the Tf-idf weight is the product of these quantities: 0.03 * 4 = 0.12.
#
# Hence:
# 1. Calculate term frequency
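The worked example in the comment above can be checked with a few lines of Python. This is a minimal sketch, assuming a base-10 logarithm (which is what makes the idf come out to 4); the function names `term_frequency`, `inverse_document_frequency`, and `tf_idf` are hypothetical and are not taken from the gist.

```python
import math


def term_frequency(term_count, total_terms):
    """Share of the document's words that are the given term."""
    return term_count / total_terms


def inverse_document_frequency(total_docs, docs_with_term):
    """Base-10 log of (all documents / documents containing the term)."""
    return math.log10(total_docs / docs_with_term)


def tf_idf(term_count, total_terms, total_docs, docs_with_term):
    """Product of term frequency and inverse document frequency."""
    return (term_frequency(term_count, total_terms)
            * inverse_document_frequency(total_docs, docs_with_term))


# Numbers from the comment: "cat" appears 3 times in a 100-word document
# and in 1,000 of 10,000,000 documents.
weight = tf_idf(3, 100, 10_000_000, 1_000)
print(round(weight, 4))
```

With these inputs the term frequency is 0.03 and the idf is 4, so the printed weight matches the 0.12 given in the comment.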
thinrhino / vagrant-scp
Last active August 29, 2015 14:07 — forked from geedew/vagrant-scp
#!/bin/sh
# Change these settings to match what you are wanting to do
FILE=/File/To/Copy
SERVER=localhost
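# Named DEST rather than PATH: overriding PATH would stop the shell from finding vagrant, awk and scp
DEST=/Where/To/Put/File
OPTIONS=$(vagrant ssh-config | awk -v ORS=' ' '{print "-o " $1 "=" $2}')
scp ${OPTIONS} $FILE vagrant@$SERVER:$DEST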
$(document).ready(function() {
    $(chart_id).highcharts({
        chart: chart,
        title: title,
        xAxis: xAxis,
        yAxis: yAxis,
        series: series
    });
});
thinrhino / gae_shell.py
Last active August 29, 2015 14:13 — forked from djm/gae_shell.py
#!/usr/bin/env python -i
"""
A local interactive IPython shell for Google App Engine on Mac OSX.
Usage:
cd /to/project/folder/with/app.yaml
python gae_shell.py
Notes:
thinrhino / emacs
Last active August 29, 2015 14:13
Enable mouse in emacs
;; Enable mouse support
(unless window-system
  (require 'mouse)
  (xterm-mouse-mode t)
  (global-set-key [mouse-4] '(lambda ()
                              (interactive)
                              (scroll-down 1)))
  (global-set-key [mouse-5] '(lambda ()
                              (interactive)
                              (scroll-up 1)))