
Igor Brigadir igorbrigadir

@iaincollins
iaincollins / Bills with tags
Last active August 29, 2015 14:14
Combining UK Parliament data + NLP + BBC Things in node.js to tag Bills by topic
Tags for Employment Practices Bill:
{ 'Scottish Parliament':
   { label: 'Scottish Parliament',
     hint: 'The Scottish Parliament is the devolved national, unicameral legislature of Scotland, located in the Holyrood area of the capital, Edinburgh. ',
     uri: 'http://www.bbc.co.uk/things/59ab9b46-cb29-4394-bea7-59b2d6c74bc2#id',
     properties: [Function] },
  Wales:
   { label: 'Wales',
     hint: 'a nation of the United Kingdom of Great Britain and Northern Ireland',
     uri: 'http://www.bbc.co.uk/things/00eb010f-568a-4b89-bbfe-799d5b812bed#id',
# aggregate 100 random walks, with
# different start points:
# in each case take a walk of ten steps
# and add a 100-dimensional vector
# to an aggregator (allwalks)
# that has ten different entries,
# for the ten possible steps of each
# walk
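The comments above describe an aggregation whose code is not shown in this preview. A minimal sketch of one reading of them follows, assuming Gaussian steps with the same scale (0.1) used in the random walk gist below; the function name `aggregate_walks` and its parameters are hypothetical, and only `allwalks` comes from the comments.

```python
import numpy as np


def aggregate_walks(n_walks=100, n_steps=10, dim=100, scale=0.1, seed=0):
    """Sum the positions of many short random walks, step by step.

    Hypothetical helper: the step distribution and scale are assumptions.
    """
    rng = np.random.default_rng(seed)
    # One row per step of a walk, one column per dimension.
    allwalks = np.zeros((n_steps, dim))
    for _ in range(n_walks):
        # Each walk gets its own random start point.
        position = rng.normal(0, scale, dim)
        for step in range(n_steps):
            position = position + rng.normal(0, scale, dim)
            # Add this walk's position at this step to the aggregator.
            allwalks[step] += position
    return allwalks


allwalks = aggregate_walks()
print(allwalks.shape)
```

Dividing `allwalks` by the number of walks would give the mean position at each step, if an average rather than a sum is wanted.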
@dbamman
dbamman / gist:e7da85f8ee7d7b76061f
Last active August 29, 2015 14:12
PCA on random walk data
# generate 100-dimensional random walk data so that each data point in a sequence is similar to the last data point
import numpy as np

last = np.random.normal(0, .1, 100)
for i in range(1000):
    # each point is the previous point plus a small Gaussian step
    new = last + np.random.normal(0, .1, 100)
    last = new
    print(' '.join(str(x) for x in new))
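The preview stops before the PCA step named in the gist title. The following is a minimal sketch of how the walk could be projected onto its leading principal components, using an SVD of the centered data. This is an assumption about the approach, not the gist's own code, and the helper names `random_walk` and `pca` are hypothetical.

```python
import numpy as np


def random_walk(n_points=1000, dim=100, scale=0.1, seed=0):
    """Same construction as the gist: a start point plus cumulative Gaussian steps."""
    rng = np.random.default_rng(seed)
    steps = rng.normal(0, scale, (n_points + 1, dim))
    # Row 0 is the start point; the gist prints only the points after it.
    return np.cumsum(steps, axis=0)[1:]


def pca(data, n_components=2):
    """Project data onto its top principal components via SVD."""
    centered = data - data.mean(axis=0)
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    projected = centered @ vt[:n_components].T
    # Fraction of total variance captured by each component.
    explained = singular_values ** 2 / np.sum(singular_values ** 2)
    return projected, explained[:n_components]


walk = random_walk()
projected, explained = pca(walk)
print(projected.shape, explained)
```

Plotting the two columns of `projected` against each other shows the trajectory in the plane of the two leading components.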
@Newmu
Newmu / model.py
Created December 11, 2014 01:14
~0.96 on Kaggle IMDB using stupid learning instead of "deep learning"
import numpy as np
import pandas as pd
from lxml import html
from sklearn import metrics
from sklearn.model_selection import train_test_split  # formerly sklearn.cross_validation
from sklearn.linear_model import LogisticRegression as LR
from sklearn.feature_extraction.text import TfidfVectorizer
def clean(text):
    return html.fromstring(text).text_content().lower().strip()
@randyzwitch
randyzwitch / aa-ggplot2.R
Created November 7, 2014 02:08
Adobe Analytics Anomaly Detection ggplot
#Plot data using ggplot2
library(ggplot2)
#Calculate points crossing UCL or LCL
pageviews_w_forecast$outliers <-
  ifelse(pageviews_w_forecast$pageviews > pageviews_w_forecast$upperBound.pageviews, pageviews_w_forecast$pageviews,
         ifelse(pageviews_w_forecast$pageviews < pageviews_w_forecast$lowerBound.pageviews, pageviews_w_forecast$pageviews, NA))
#Add LCL and UCL labels
LCL <- vector(mode = "character", nrow(pageviews_w_forecast))
@greenwoodma
greenwoodma / mp_twitter_accounts.csv
Last active September 10, 2015 10:30 — forked from iaincollins/mp_twitter_accounts.csv
A hand corrected and updated list of MP twitter accounts and DBpedia pages, produced as part of a Nesta funded project.
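The rows in this file are separated by semicolons rather than commas, so a reader has to be told the delimiter explicitly. A minimal sketch using Python's standard `csv` module follows, with the header and two rows copied from this file; the function name `read_mps` and the choice to map `TWITTER_UNKNOWN` to `None` are assumptions made for illustration.

```python
import csv
import io

# Header and two rows copied from the file; a real script would open the file instead.
sample = '''"full_name";"party";"official_post";"constituency";"twitter_handle";"twitter_user_id";"uri";"last_updated";"notes"
"Ms Diane Abbott MP";"Labour";;"Hackney North and Stoke Newington";"https://twitter.com/HackneyAbbott";153810216;"http://dbpedia.org/resource/Diane_Abbott";"2014-10-18T10:04:00+01:00";
"Nigel Adams MP";"Conservative";;"Selby and Ainsty";"TWITTER_UNKNOWN";-1;"http://dbpedia.org/resource/Nigel_Adams";"2014-10-18T10:04:00+01:00";
'''


def read_mps(text):
    """Parse semicolon-delimited rows into dictionaries keyed by column name."""
    reader = csv.DictReader(io.StringIO(text), delimiter=";")
    rows = []
    for row in reader:
        # The file marks missing accounts with a sentinel value (assumed handling).
        if row["twitter_handle"] == "TWITTER_UNKNOWN":
            row["twitter_handle"] = None
        rows.append(row)
    return rows


mps = read_mps(sample)
for mp in mps:
    print(mp["full_name"], mp["party"], mp["twitter_handle"])
```

All values come back as strings, so `twitter_user_id` would need an explicit `int(...)` conversion before numeric use.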
"full_name";"party";"official_post";"constituency";"twitter_handle";"twitter_user_id";"uri";"last_updated";"notes"
"Ms Diane Abbott MP";"Labour";;"Hackney North and Stoke Newington";"https://twitter.com/HackneyAbbott";153810216;"http://dbpedia.org/resource/Diane_Abbott";"2014-10-18T10:04:00+01:00";
"Debbie Abrahams MP";"Labour";;"Oldham East and Saddleworth";"https://twitter.com/Debbie_abrahams";225857392;"http://dbpedia.org/resource/Debbie_Abrahams";"2014-10-18T10:04:00+01:00";
"Nigel Adams MP";"Conservative";;"Selby and Ainsty";"TWITTER_UNKNOWN";-1;"http://dbpedia.org/resource/Nigel_Adams";"2014-10-18T10:04:00+01:00";
"Adam Afriyie MP";"Conservative";;"Windsor";"https://twitter.com/AdamAfriyie";22031058;"http://dbpedia.org/resource/Adam_Afriyie";"2014-10-18T10:04:00+01:00";
"Rt Hon Bob Ainsworth MP";"Labour";;"Coventry North East";"TWITTER_UNKNOWN";-1;"http://dbpedia.org/resource/Bob_Ainsworth";"2014-10-18T10:04:00+01:00";
"Peter Aldous MP";"Conservative";;"Waveney";"https://twitter.com/peter_aldous";255998
@acolyer
acolyer / service-checklist.md
Last active September 24, 2025 07:57
Internet Scale Services Checklist


A checklist for designing and developing internet scale services, inspired by James Hamilton's 2007 paper "On Designing and Deploying Internet-Scale Services."

Basic tenets

  • Does the design expect failures to happen regularly and handle them gracefully?
  • Have we kept things as simple as possible?
@bsweger
bsweger / useful_pandas_snippets.md
Last active October 6, 2025 13:44
Useful Pandas Snippets


A personal diary of DataFrame munging over the years.

Data Types and Conversion

Convert Series datatype to numeric (will error if column has non-numeric values)
(h/t @makmanalp)
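The snippet itself is cut off in this preview. One way to do the conversion described is `pandas.to_numeric`, sketched below on a small made-up frame; the column names and values are hypothetical, and this may not be the exact call the gist uses.

```python
import pandas as pd

# Hypothetical data: one clean numeric-looking column, one with a bad value.
df = pd.DataFrame({"amount": ["1", "2.5", "3"], "mixed": ["1", "n/a", "3"]})

# Strings that all parse as numbers convert cleanly.
df["amount"] = pd.to_numeric(df["amount"])

# A non-numeric value raises by default (errors="raise").
try:
    pd.to_numeric(df["mixed"])
    raised = False
except (ValueError, TypeError):
    raised = True

# errors="coerce" turns unparseable values into NaN instead of raising.
coerced = pd.to_numeric(df["mixed"], errors="coerce")

print(df["amount"].dtype, raised)
print(coerced)
```

Whether to let the error surface or coerce to NaN depends on whether bad values should stop the pipeline or be handled later.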

import numpy as np
import marisa_trie
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.externals import six
class MarisaCountVectorizer(CountVectorizer):

    # ``CountVectorizer.fit`` method calls ``fit_transform`` so
    # ``fit`` is not provided
    def fit_transform(self, raw_documents, y=None):