Igor Brigadir igorbrigadir

@casebeer
casebeer / ema_gen.py
Last active July 6, 2022 07:54
Exponential moving average generator example in Python
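import functools

def consumer(func):
    '''
    Decorator taking care of initial next() call to "sending" generators
    From PEP-342
    http://www.python.org/dev/peps/pep-0342/
    '''
    @functools.wraps(func)
    def wrapper(*args, **kw):
        gen = func(*args, **kw)
        next(gen)  # advance to the first yield so send() works immediately
        return gen
    return wrapper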
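The decorator above primes a generator so values can be passed in with `send()` straight away. Below is a minimal sketch of an exponential moving average coroutine built on it. It repeats the decorator so the block runs on its own, and it assumes the common update `avg = alpha * x + (1 - alpha) * avg`, seeded with the first value. The `ema` name and the `alpha` parameter are hypothetical, not taken from the gist.

```python
import functools


def consumer(func):
    """Prime a generator-based coroutine by advancing it to its first yield (PEP 342 pattern)."""
    @functools.wraps(func)
    def wrapper(*args, **kw):
        gen = func(*args, **kw)
        next(gen)  # run up to the first `yield` so send() can be called right away
        return gen
    return wrapper


@consumer
def ema(alpha):
    """Hypothetical EMA coroutine: send() a value, receive the updated average."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    average = None
    while True:
        value = yield average
        if average is None:
            average = float(value)  # seed with the first observation
        else:
            average = alpha * value + (1 - alpha) * average


avg = ema(0.5)
print(avg.send(10))
print(avg.send(20))
print(avg.send(20))
```

Each `send()` returns the updated average: with `alpha=0.5`, sending 10, 20, 20 yields 10.0, 15.0, 17.5. A larger `alpha` weights recent values more heavily.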

import numpy as np
import marisa_trie
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.externals import six

class MarisaCountVectorizer(CountVectorizer):
    # ``CountVectorizer.fit`` method calls ``fit_transform`` so
    # ``fit`` is not provided
    def fit_transform(self, raw_documents, y=None):
@bsweger
bsweger / useful_pandas_snippets.md
Last active October 6, 2025 13:44
Useful Pandas Snippets

A personal diary of DataFrame munging over the years.

Data Types and Conversion

Convert Series datatype to numeric (will error if column has non-numeric values)
(h/t @makmanalp)
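Below is a minimal sketch, assuming a DataFrame whose columns hold numbers stored as strings (the column names and values are made up). `pd.to_numeric` raises `ValueError` on the first unparseable entry, while `errors="coerce"` turns such entries into `NaN` instead.

```python
import pandas as pd

df = pd.DataFrame({"a": ["1", "2.5", "3"], "b": ["4", "x", "6"]})

# Default errors="raise": "x" cannot be parsed, so this raises ValueError.
try:
    pd.to_numeric(df["b"])
except ValueError as err:
    print("conversion failed:", err)

df["a"] = pd.to_numeric(df["a"])                   # all entries parse: float64 column
df["b"] = pd.to_numeric(df["b"], errors="coerce")  # unparseable entries become NaN
print(df.dtypes)
```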

@acolyer
acolyer / service-checklist.md
Last active September 24, 2025 07:57
Internet Scale Services Checklist

A checklist for designing and developing internet scale services, inspired by James Hamilton's 2007 paper "On Designing and Deploying Internet-Scale Services."

Basic tenets

  • Does the design expect failures to happen regularly and handle them gracefully?
  • Have we kept things as simple as possible?
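One common way to handle routine, transient failures gracefully is to retry with capped exponential backoff and random jitter. The checklist does not prescribe this technique; the sketch below is an illustration only. The `call_with_retries` helper and its parameters are hypothetical, and it assumes `ConnectionError` and `TimeoutError` are the errors worth retrying.

```python
import random
import time


def call_with_retries(operation, attempts=4, base_delay=0.1, max_delay=2.0,
                      retry_on=(ConnectionError, TimeoutError), sleep=time.sleep):
    """Call operation(); on a transient error, back off and try again.

    Delays grow exponentially (base_delay * 2**attempt), are capped at max_delay,
    and are randomized ("full jitter") so many clients do not retry in lockstep.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller handle the failure
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))


calls = {"n": 0}


def flaky():
    """Fails twice, then succeeds: a stand-in for an unreliable dependency."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient")
    return "ok"


print(call_with_retries(flaky, sleep=lambda seconds: None))
```

Passing `sleep` in as a parameter keeps the helper testable without real waiting; errors outside `retry_on` propagate immediately rather than being retried.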
@greenwoodma
greenwoodma / mp_twitter_accounts.csv
Last active September 10, 2015 10:30 — forked from iaincollins/mp_twitter_accounts.csv
A hand-corrected and updated list of MP Twitter accounts and DBpedia pages, produced as part of a Nesta funded project.
"full_name";"party";"official_post";"constituency";"twitter_handle";"twitter_user_id";"uri";"last_updated";"notes"
"Ms Diane Abbott MP";"Labour";;"Hackney North and Stoke Newington";"https://twitter.com/HackneyAbbott";153810216;"http://dbpedia.org/resource/Diane_Abbott";"2014-10-18T10:04:00+01:00";
"Debbie Abrahams MP";"Labour";;"Oldham East and Saddleworth";"https://twitter.com/Debbie_abrahams";225857392;"http://dbpedia.org/resource/Debbie_Abrahams";"2014-10-18T10:04:00+01:00";
"Nigel Adams MP";"Conservative";;"Selby and Ainsty";"TWITTER_UNKNOWN";-1;"http://dbpedia.org/resource/Nigel_Adams";"2014-10-18T10:04:00+01:00";
"Adam Afriyie MP";"Conservative";;"Windsor";"https://twitter.com/AdamAfriyie";22031058;"http://dbpedia.org/resource/Adam_Afriyie";"2014-10-18T10:04:00+01:00";
"Rt Hon Bob Ainsworth MP";"Labour";;"Coventry North East";"TWITTER_UNKNOWN";-1;"http://dbpedia.org/resource/Bob_Ainsworth";"2014-10-18T10:04:00+01:00";
"Peter Aldous MP";"Conservative";;"Waveney";"https://twitter.com/peter_aldous";255998
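The file is semicolon-delimited. In the rows shown, `official_post` and `notes` are empty, and a missing account is recorded as `TWITTER_UNKNOWN` with a user id of `-1`. Below is a minimal loading sketch with Python's `csv` module, assuming those conventions hold for the whole file. The two sample rows are copied from the data above, and `load_mps` is a hypothetical helper.

```python
import csv
import io

SAMPLE = '''"full_name";"party";"official_post";"constituency";"twitter_handle";"twitter_user_id";"uri";"last_updated";"notes"
"Debbie Abrahams MP";"Labour";;"Oldham East and Saddleworth";"https://twitter.com/Debbie_abrahams";225857392;"http://dbpedia.org/resource/Debbie_Abrahams";"2014-10-18T10:04:00+01:00";
"Nigel Adams MP";"Conservative";;"Selby and Ainsty";"TWITTER_UNKNOWN";-1;"http://dbpedia.org/resource/Nigel_Adams";"2014-10-18T10:04:00+01:00";
'''


def load_mps(text):
    """Parse the semicolon-delimited export, mapping the unknown-account sentinels to None."""
    rows = []
    for row in csv.DictReader(io.StringIO(text), delimiter=";"):
        known = row["twitter_handle"] != "TWITTER_UNKNOWN"
        row["twitter_handle"] = row["twitter_handle"] if known else None
        row["twitter_user_id"] = int(row["twitter_user_id"]) if known else None
        rows.append(row)
    return rows


for mp in load_mps(SAMPLE):
    print(mp["full_name"], mp["party"], mp["twitter_handle"])
```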
@randyzwitch
randyzwitch / aa-ggplot2.R
Created November 7, 2014 02:08
Adobe Analytics Anomaly Detection ggplot
#Plot data using ggplot2
library(ggplot2)
#Flag points above the upper control limit (UCL) or below the lower control limit (LCL); points inside the band become NA
pageviews_w_forecast$outliers <-
ifelse(pageviews_w_forecast$pageviews > pageviews_w_forecast$upperBound.pageviews, pageviews_w_forecast$pageviews,
ifelse(pageviews_w_forecast$pageviews < pageviews_w_forecast$lowerBound.pageviews, pageviews_w_forecast$pageviews, NA))
#Add LCL and UCL labels
LCL <- vector(mode = "character", nrow(pageviews_w_forecast))
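The nested `ifelse` above keeps a pageview count only when it falls outside the forecast band and uses `NA` everywhere else. Below is the same flagging rule sketched in Python with NumPy, for comparison. The `flag_outliers` name is hypothetical, and its arguments stand in for the `pageviews`, `lowerBound.pageviews` and `upperBound.pageviews` columns of `pageviews_w_forecast`.

```python
import numpy as np


def flag_outliers(values, lower, upper):
    """Keep values outside [lower, upper]; replace in-band values with NaN (R's NA)."""
    values = np.asarray(values, dtype=float)
    outside = (values > upper) | (values < lower)
    return np.where(outside, values, np.nan)


print(flag_outliers([5, 12, 1], lower=[2, 2, 2], upper=[10, 10, 10]))
```

Here 5 lies inside the band and becomes NaN, 12 exceeds the upper limit, and 1 falls below the lower limit, so both are kept.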
@Newmu
Newmu / model.py
Created December 11, 2014 01:14
~0.96 on Kaggle IMDB using stupid learning instead of "deep learning"
import numpy as np
import pandas as pd
from lxml import html
from sklearn import metrics
from sklearn.model_selection import train_test_split  # sklearn.cross_validation was removed in scikit-learn 0.20
from sklearn.linear_model import LogisticRegression as LR
from sklearn.feature_extraction.text import TfidfVectorizer
def clean(text):
    return html.fromstring(text).text_content().lower().strip()
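The imports point to a bag-of-words baseline: TF-IDF features fed to logistic regression. Below is a minimal sketch of that kind of pipeline. It assumes scikit-learn 0.20 or later and uses a handful of made-up reviews in place of the Kaggle data. The n-gram range and `sublinear_tf` setting are illustrative choices, not necessarily the gist's.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy stand-in for the IMDB reviews (1 = positive, 0 = negative).
docs = [
    "great movie, loved it",
    "wonderful acting and a great story",
    "loved the film, great cast",
    "terrible movie, hated it",
    "awful acting and a boring story",
    "hated the film, terrible cast",
]
labels = [1, 1, 1, 0, 0, 0]

# Unigrams + bigrams, log-scaled term frequencies, then a linear classifier.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), sublinear_tf=True),
    LogisticRegression(),
)
model.fit(docs, labels)
print(model.predict(["great, loved"]))
print(model.predict_proba(["great, loved"]))
```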
@dbamman
dbamman / gist:e7da85f8ee7d7b76061f
Last active August 29, 2015 14:12
PCA on random walk data
# generate 100-dimensional random walk data so that each data point in a sequence is similar to the last data point
import numpy as np
last=np.random.normal(0, .1, 100)
for i in range(1000):
    new=last+np.random.normal(0, .1, 100)
    last=new
    print(' '.join(str(x) for x in new))
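The loop above builds the walk one step at a time. Below is a vectorized sketch of the same idea, plus the PCA step named in the gist's title, computed with NumPy's SVD on the mean-centered data. It assumes `numpy.random.default_rng` with a fixed seed for reproducibility, and it starts the walk from the first noise vector rather than from a separate initial point.

```python
import numpy as np

rng = np.random.default_rng(0)

# 1000 steps of a 100-dimensional Gaussian random walk: each row is the previous row plus noise.
walk = np.cumsum(rng.normal(0, 0.1, size=(1000, 100)), axis=0)

# PCA via SVD of the mean-centered data matrix; singular values come back in descending order.
centered = walk - walk.mean(axis=0)
U, S, Vt = np.linalg.svd(centered, full_matrices=False)
explained = S**2 / np.sum(S**2)   # fraction of variance per principal component
scores = U[:, :2] * S[:2]         # projection of each time step onto the first two components

print(explained[:3])
```

Because consecutive points are strongly correlated, the leading components of a random walk typically capture a large share of the variance. Plotting the columns of `scores` against time is a quick way to inspect them.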