@mittenchops
mittenchops / alldigits.py
Created February 12, 2014 19:45
Regexes for a number with an optional decimal point followed by decimal digits
# It took 15 tries.
# cleaned_text is the string being searched; it is not defined in this snippet.
import re
digits = re.findall(r'[0-9]+', cleaned_text)
digits2 = re.findall(r'[0-9]\d*(\.\d+)?', cleaned_text)
digits3 = re.findall(r'[0-9]+(\.([0-9]+))?', cleaned_text)  # parentheses balanced; the extra "(" raised re.error
digits4 = re.findall(r'[0-9]+(\.)?([0-9]+)?', cleaned_text)
digits5 = re.findall(r'\d+\.?\d*?', cleaned_text)
digits6 = re.findall(r'\d+(\.\d*)?', cleaned_text)
digits7 = re.findall(r'\d+(\.?\d*)?', cleaned_text)
digits8 = re.findall(r'\d+(\.?\d*)', cleaned_text)
digits9 = re.findall(r'\d+(\.{1}\d*)?', cleaned_text)
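Most of the attempts above contain a capture group, and `re.findall` returns the contents of the groups rather than the whole match whenever a pattern has groups. A minimal sketch of one way around that, using a non-capturing group `(?:...)`; the names `NUMBER_RE`, `find_numbers`, and `sample` are made up for illustration and are not part of the gist:

```python
import re

# (?:...) groups without capturing, so findall returns the full match.
NUMBER_RE = re.compile(r'\d+(?:\.\d+)?')


def find_numbers(text):
    """Return every integer or decimal number in text, as strings."""
    return NUMBER_RE.findall(text)


sample = "pi is 3.14, e is 2.718, and 42 is an integer."

# With a capturing group, only the optional fractional part comes back.
print(re.findall(r'\d+(\.\d+)?', sample))  # ['.14', '.718', '']

# With the non-capturing group, the whole number comes back.
print(find_numbers(sample))  # ['3.14', '2.718', '42']
```

Requiring `\d+` after the dot means a trailing period, as in "42.", is left out of the match.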
mittenchops / htmlclean.py
Created February 9, 2014 18:45
HTML Cleaner
import requests
from lxml.html.clean import Cleaner
url = "http://en.wikipedia.org/wiki/Zipf%27s_law"
html = requests.get(url).text
# remove_tags takes bare tag names such as 'div', not markup such as '<div>'
cleaner = Cleaner(allow_tags=[''], remove_unknown_tags=False, remove_tags=['div'])
cleaner.scripts = True
cleaner.page_structure = True
cleaner.javascript = True
cleaner.style = True
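The visible part of this snippet configures the cleaner but stops before any text is produced. For comparison, here is a rough sketch of the same goal (visible text only, no scripts or styles) using only the standard library's `html.parser`. This is a swapped-in technique, not lxml and not the gist's own method, and `TextExtractor` and `strip_tags` are hypothetical names:

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element.
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Join the chunks with spaces, then collapse runs of whitespace.
        return " ".join(" ".join(self._chunks).split())


def strip_tags(html):
    """Return the visible text of an HTML string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


print(strip_tags("<div><p>Hello <b>world</b></p><script>var x = 1;</script></div>"))
# Hello world
```

Joining chunks with a space keeps adjacent paragraphs apart, at the cost of splitting a word that has inline markup in the middle of it.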
mittenchops / rsplit.R
Created February 6, 2014 21:57
rsplit for R
rsplit <- function(mydf, chr){
  sapply(sapply(mydf, strsplit, chr, USE.NAMES=F), function(x){x[length(x)]})
}
# USAGE
# val <- rsplit(df$long_url,"/")
mittenchops / count_hist.sh
Created February 4, 2014 20:45
Count how often each URL occurs, where the URL is the first field of each JSON line, and get the top 10 repeats
cat file.json | awk '{print $2}' | sort | uniq -c | sort -rn | head
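The pipeline above relies on the URL being the second whitespace-separated token on each line. A sketch of the same count that parses the JSON instead, assuming one JSON object per line and a key named `url`; both the key name and the function name `top_urls` are assumptions, not something the gist states:

```python
import json
from collections import Counter


def top_urls(lines, key="url", n=10):
    """Count the values under `key` across JSON lines; return the n most common."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        record = json.loads(line)
        counts[record[key]] += 1
    return counts.most_common(n)


sample_lines = [
    '{"url": "http://a.example/", "hits": 1}',
    '{"url": "http://b.example/", "hits": 2}',
    '{"url": "http://a.example/", "hits": 3}',
]
print(top_urls(sample_lines))
# [('http://a.example/', 2), ('http://b.example/', 1)]
```

For a file on disk, `top_urls(open("file.json", encoding="utf-8"))` would iterate over its lines in the same way.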
mittenchops / to_acronym.sh
Last active January 4, 2016 01:58
sed to turn text to acronyms
# from http://ask.metafilter.com/255675/Decoding-cancer-addled-ramblings
sed 's/\B\w*//g;s/\s//g' full_file.txt > acronym_file.txt
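The sed command works in two passes: `\B\w*` deletes every run of word characters that does not start at a word boundary, which leaves the first character of each word, and `\s` then deletes the whitespace. A small Python sketch of the same two substitutions; the function name `to_acronym` is invented for illustration:

```python
import re


def to_acronym(text):
    """Keep the first character of each word and drop all whitespace."""
    # Pass 1: remove everything after the first character of each word.
    initials = re.sub(r'\B\w*', '', text)
    # Pass 2: remove the whitespace between the remaining initials.
    return re.sub(r'\s', '', initials)


print(to_acronym("as soon as possible"))        # asap
print(to_acronym("Portable Network Graphics"))  # PNG
```

As with the sed version, punctuation is not a word character and so survives both passes.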
mittenchops / emacs.txt
Last active January 2, 2016 00:59
emacs review!
# navigation
C-a # beginning of line
C-e # end of line
M-> # eof
M-< # head of file
M-g g 100 # goto line 100
# general
C-h f thing-name # describe a function named thing-name
M-x load-library RET icicles RET
mittenchops / defaultdictionaries.py
Created December 20, 2013 17:04
Function defaults done correctly in python
"""
# In addition, the use of mutable objects as default values may lead to unintended behavior:
def foo(x, items=[]):
    items.append(x)
    return items
foo(1) # returns [1]
foo(2) # returns [1, 2]
foo(3) # returns [1, 2, 3]
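The list grows across calls because a default value is evaluated once, when the function is defined, and every call that omits `items` shares that single list. A minimal sketch of the usual remedy, a `None` sentinel that is replaced by a fresh list inside the function; this is the common idiom and not necessarily the exact code in the rest of the gist:

```python
def foo(x, items=None):
    """Append x to items, creating a new list when none is passed in."""
    if items is None:
        items = []  # a fresh list on every call
    items.append(x)
    return items


print(foo(1))  # [1]
print(foo(2))  # [2]
print(foo(3, items=[1, 2]))  # [1, 2, 3]
```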
mittenchops / phoneparser.py
Created October 11, 2013 20:13
Useful phone number parser. Normalizes numbers to US national format!
# all straight from here:
# https://github.com/daviddrysdale/python-phonenumbers
import phonenumbers
def numberizer(num):
    z = phonenumbers.parse(num, "US")
    return phonenumbers.format_number(z, phonenumbers.PhoneNumberFormat.NATIONAL)
mittenchops / argmax.py
Last active December 25, 2015 07:19
argmax function: return the entry from list X where the argument arg is at a maximum.
def argmax(arg, X):
    """
    >>> zee = [{"cool":{"stuff":1,"things":0.5}},{"cool":{"stuff":2,"things":0.25}}]
    >>> argmax("cool.stuff",zee)
    {'cool': {'things': 0.25, 'stuff': 2}}
    >>> argmax("cool.things",zee)
    {'cool': {'things': 0.5, 'stuff': 1}}
    """
    ranked = sorted(X, key=lambda x: -getByDot(x,arg))
    leader = filter(lambda x: getByDot(x,arg) == reduce(max,[getByDot(r, arg) for r in ranked]),ranked)[0]
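The lines shown depend on a `getByDot` helper that is not visible here, and on Python 2 behavior: `filter` returning a list and `reduce` being a builtin. A Python 3 sketch of the same idea follows. `get_by_dot` is a hypothetical stand-in that assumes a dotted key simply walks nested dicts, and the built-in `max` with a `key` replaces the sort-then-filter step:

```python
from functools import reduce


def get_by_dot(record, dotted_key):
    """Hypothetical helper: follow 'a.b.c' through nested dicts."""
    return reduce(lambda node, key: node[key], dotted_key.split("."), record)


def argmax(arg, X):
    """Return the entry of X whose value at the dotted key arg is largest."""
    # max() returns the first entry on ties.
    return max(X, key=lambda item: get_by_dot(item, arg))


zee = [{"cool": {"stuff": 1, "things": 0.5}},
       {"cool": {"stuff": 2, "things": 0.25}}]

print(argmax("cool.stuff", zee))   # {'cool': {'stuff': 2, 'things': 0.25}}
print(argmax("cool.things", zee))  # {'cool': {'stuff': 1, 'things': 0.5}}
```

A single `max` pass is linear in the length of `X`, whereas the sort-and-filter form recomputes the maximum for every element it examines.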
mittenchops / unicodesolved.py
Created October 9, 2013 19:47
Encoding fixed everywhere (Python 2 only). Start every file with this to ensure that both interactive and script mode always work.
#!/usr/bin/env python
# -*- coding: utf-8 -*-
######################################
#### THIS IS IMPORTANT EVERYWHERE ####
import sys #
reload(sys) #
sys.setdefaultencoding("UTF-8") #
#sys.getdefaultencoding() #
######################################
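The header above only works on Python 2: Python 3 has no builtin `reload` and no `sys.setdefaultencoding`, and its `str` type is already Unicode. A rough Python 3 sketch of the nearest equivalent habit is to name the encoding explicitly wherever text crosses an I/O boundary. The helper names `write_text`, `read_text`, and `use_utf8_stdout` are invented for illustration:

```python
import sys


def write_text(path, text):
    """Write text to path, always encoded as UTF-8."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(text)


def read_text(path):
    """Read path back, always decoding as UTF-8."""
    with open(path, "r", encoding="utf-8") as handle:
        return handle.read()


def use_utf8_stdout():
    """Switch stdout to UTF-8 where the stream supports it (Python 3.7+)."""
    if hasattr(sys.stdout, "reconfigure"):
        sys.stdout.reconfigure(encoding="utf-8")
```

Passing `encoding="utf-8"` at each call keeps the behavior independent of the platform's locale, which is the problem the Python 2 header was working around.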