George Lewis ghl3

@ghl3
ghl3 / convert_nominal.py
Created April 20, 2014 00:57
Convert nominal (string, object) features in a pandas dataframe to integers
import pandas as pd


def get_nominal_integer_dict(nominal_vals):
    # Map each distinct value to an integer, in order of first appearance
    d = {}
    for val in nominal_vals:
        if val not in d:
            current_max = max(d.values()) if len(d) > 0 else -1
            d[val] = current_max + 1
    return d
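The preview stops once the mapping is built. As a rough sketch of how such a mapping might be applied, the block below rebuilds it with `len(d)` as the next code and encodes a plain list of values. `encode_nominal` is a hypothetical helper, not part of the gist. With pandas, `df[col].map(d)` would apply the same dictionary to a column.

```python
def get_nominal_integer_dict(nominal_vals):
    """Map each distinct value to an integer, in order of first appearance."""
    d = {}
    for val in nominal_vals:
        if val not in d:
            # Codes already used are 0..len(d)-1, so len(d) is the next free one
            d[val] = len(d)
    return d


def encode_nominal(nominal_vals):
    """Hypothetical helper: return (codes, mapping) for a sequence of values."""
    vals = list(nominal_vals)
    mapping = get_nominal_integer_dict(vals)
    return [mapping[v] for v in vals], mapping


codes, mapping = encode_nominal(["red", "green", "red", "blue"])
print(codes)    # [0, 1, 0, 2]
print(mapping)  # {'red': 0, 'green': 1, 'blue': 2}
```

Using `len(d)` avoids rescanning `d.values()` for every new value, which matters only when the number of distinct values is large.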
@ghl3
ghl3 / grouping.md
Last active August 29, 2015 13:55
pandas tricks
@ghl3
ghl3 / gist:6473103
Created September 7, 2013 05:46
Random String
import random
import string


def random_string():
    # Random alphanumeric string with a length between 1 and 10 (inclusive)
    N = random.randint(1, 10)
    return ''.join(random.choice(string.ascii_letters + string.digits) for x in range(N))
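A possible variant, sketched under the assumption that configurable length bounds and reproducible output are wanted. The `min_len`, `max_len`, and `rng` parameters are additions for illustration and do not appear in the gist.

```python
import random
import string

ALPHABET = string.ascii_letters + string.digits


def random_string(min_len=1, max_len=10, rng=random):
    """Return a random alphanumeric string with length in [min_len, max_len].

    `rng` is anything exposing randint() and choice(): the random module
    itself or a seeded random.Random instance.
    """
    n = rng.randint(min_len, max_len)
    return "".join(rng.choice(ALPHABET) for _ in range(n))


# A seeded generator gives the same string on every run
seeded = random.Random(42)
s = random_string(rng=seeded)
print(len(s), s.isalnum())
```

The `random` module is not suitable for security tokens; the standard library's `secrets` module is meant for that case.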
@ghl3
ghl3 / gist:6369556
Last active December 21, 2015 21:39
PSQL to CSV
/* Copy to a file as a CSV */
COPY (SELECT * FROM foo) TO '/tmp/test.csv' WITH CSV;
/* Copy to stdout as tab-separated values with a header row */
COPY (SELECT * FROM foo) TO STDOUT WITH DELIMITER E'\t' CSV HEADER;
@ghl3
ghl3 / gist:5605126
Created May 18, 2013 17:01
Get next matplotlib subplot
import matplotlib.pyplot as plt


def get_next_subplot():
    """
    Create and move to the next subplot
    in a grid of matplotlib subplots
    """
    # Get the current plot
    ax = plt.gca()
    # Check if the current plot is a subplot
@ghl3
ghl3 / gist:5407006
Created April 17, 2013 19:25
Remove ^M using emacs
M-x replace-string RET C-q C-m RET RET
@ghl3
ghl3 / gist:5201220
Created March 19, 2013 23:50
Get set of keys from an hstore
SELECT DISTINCT k FROM (SELECT skeys(our_database) as k FROM risk_profile) as dt;
@ghl3
ghl3 / pandas_plotting.py
Last active December 14, 2015 18:09
Standardized plotting of features for pandas data frame.
def scatter_plots(df, class_title, feature_names=None, cmap=None):
    # Create a pandas group view for each class
    class_names = list(set(df[class_title]))
    groups = [df[(df[class_title] == name)] for name in class_names]
    if feature_names is None:
        feature_names = [col for col in df.columns if col != class_title]
    NUM_COLORS = len(class_names)
@ghl3
ghl3 / gist:4611323
Created January 23, 2013 18:30
Open a local file or, if it doesn't exist, grab data from a url, cache it as a local file, and then open that file
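import urllib2  # Python 2 standard library

data_url = "https://raw.github.com/pydata/pydata-book/master/ch02/usagov_bitly_data2012-03-16-1331923249.txt"
data_path = "usagov_bitly_data2012-03-16-1331923249.txt"
try:
    data = open(data_path)
except IOError:
    # No local copy yet: download the data and cache it on disk
    request = urllib2.Request(data_url)
    contents = urllib2.urlopen(request).read()
    with open(data_path, "w+") as data_file:
        data_file.write(contents)
    # Open the cached file so that data is a file object in both branches
    data = open(data_path)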
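The snippet above targets Python 2 (`urllib2`). A minimal sketch of the same open-or-download-and-cache pattern for Python 3 might look like the following. The function name `open_cached` and its `fetch` parameter are hypothetical; `fetch` exists only so the download step can be swapped out, for example in tests.

```python
import os
import tempfile
import urllib.request


def open_cached(data_path, data_url, fetch=None):
    """Open data_path, downloading it from data_url first if it is missing.

    `fetch` is a hypothetical hook: a callable taking a URL and returning
    bytes. By default it downloads with urllib.request.
    """
    if fetch is None:
        def fetch(url):
            with urllib.request.urlopen(url) as response:
                return response.read()
    if not os.path.exists(data_path):
        # No local copy yet: fetch the bytes and cache them on disk
        payload = fetch(data_url)
        with open(data_path, "wb") as data_file:
            data_file.write(payload)
    # Either way, hand back a file object opened on the local copy
    return open(data_path, "rb")


# Demonstration with a stand-in fetcher, so no network access is needed
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.txt")
    with open_cached(path, "https://example.invalid/data.txt",
                     fetch=lambda url: b"hello\n") as f:
        print(f.read())  # b'hello\n'
```

The file is written and read in binary mode because `urlopen(...).read()` returns bytes in Python 3; decode after reading if text is needed.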