neil kodner neilkod
bash-3.1$ cat test.py
import gzip
datafile="/var/opt/sports_dw/dev/nz_sports/done/flat_customer_sport_preference_fct1.out.gz"
opn=gzip.open(datafile)
for i, line in enumerate(opn):
    pass
print i+1
bash-3.1$ time python test.py
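a rough python 3 sketch of the same line count, for comparison. the function name count_lines is made up for illustration and is not part of the original test.py; unlike the script above, it returns 0 for an empty file instead of failing on an undefined loop variable.

```python
import gzip


def count_lines(path):
    """Count the lines in a gzip-compressed file without loading it all into memory.

    count_lines is a hypothetical helper name, not taken from the original script.
    """
    count = 0
    with gzip.open(path, "rb") as handle:
        # enumerate from 1 so the last index seen is the number of lines read
        for count, _line in enumerate(handle, start=1):
            pass
    return count
```

note that this counts a final line with no trailing newline, which wc -l would not.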
[nkodner@c1 ~]$ sh -x measure_awk_performance.sh flat_community_cumulative_week_fct1.out.gz.bak
+ FILENAME=flat_community_cumulative_week_fct1.out.gz.bak
+ ls -ltrh /var/opt/sports_dw/live/nz_sports/done/flat_community_cumulative_week_fct1.out.gz.bak
-rwxrwxrwx 1 dwsports dwsports 4.6G Jul 21 10:16 /var/opt/sports_dw/live/nz_sports/done/flat_community_cumulative_week_fct1.out.gz.bak
+ echo 'zcat > /dev/null'
zcat > /dev/null
+ zcat /var/opt/sports_dw/live/nz_sports/done/flat_community_cumulative_week_fct1.out.gz.bak
real 3m49.037s
user 3m44.276s
neilkod / gist:1442908
Created December 7, 2011 14:01
tee + named pipe to gunzip and count number of lines in a file
attempt 1 - 3gb file
-rw-r--r-- 1 dwsports dwsports 3.0G Aug 25 16:58 flat_customer_dim1.out.gz
# count # of lines
-bash-3.1$ time gunzip -c flat_customer_dim1.out.gz | wc -l
39422185
real 1m53.595s
user 1m44.468s
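the idea behind tee plus a named pipe is to decompress the file once and feed that single stream to two consumers. the sketch below is not the shell approach from this gist; it is a python 3 stand-in, under the assumption that the two consumers are "write the decompressed file" and "count newlines". the names gunzip_and_count, src_path, dst_path and chunk_size are invented for illustration.

```python
import gzip


def gunzip_and_count(src_path, dst_path, chunk_size=1 << 20):
    """Decompress src_path into dst_path in one pass and return the newline count.

    Hypothetical helper: all names here are illustrative assumptions.
    """
    newlines = 0
    with gzip.open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            # same definition of "line" as wc -l: count newline bytes
            newlines += chunk.count(b"\n")
            dst.write(chunk)
    return newlines
```

reading in fixed-size chunks keeps memory flat regardless of file size, at the cost of doing both jobs in a single process rather than in parallel.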
neilkod / logging_with_date.py
Created November 30, 2011 14:32
how do i log to both stdout and to a file?
bash-3.1$ cat logging_with_date.py
import logging
logging_format = '%(asctime)s %(levelname)s:%(message)s'
logging.basicConfig(format=logging_format, filename = 'myprogram.log', level=logging.DEBUG)
logging.debug('is when %s event was logged.', 'out of memory')
logging.warning('is when %s event was logged.', 'out of memory')
logging.info('is when %s event was logged.', 'out of memory')
logging.critical('is when %s event was logged.', 'out of memory')
bash-3.1$ python logging_with_date.py
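one possible answer to the question in the title, as a python 3 sketch: basicConfig with filename= attaches only a file handler, so attach a FileHandler and a StreamHandler(sys.stdout) to the same logger instead. the function name build_logger and the logger name 'myprogram' are assumptions made for illustration; the format string and log file name are copied from the script above.

```python
import logging
import sys


def build_logger(name="myprogram", logfile="myprogram.log"):
    """Return a logger that writes every record to both a file and stdout.

    build_logger and its parameters are hypothetical names for this sketch.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    formatter = logging.Formatter("%(asctime)s %(levelname)s:%(message)s")
    # only attach handlers once, so repeated calls do not duplicate output
    if not logger.handlers:
        for handler in (logging.FileHandler(logfile),
                        logging.StreamHandler(sys.stdout)):
            handler.setFormatter(formatter)
            logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    log = build_logger()
    log.warning('is when %s event was logged.', 'out of memory')
```

both handlers share one formatter here; giving each its own level or format is a matter of calling setLevel or setFormatter per handler.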
neilkod / gist:1386098
Created November 22, 2011 16:30
what my ~ says about me, besides the fact that I'm unorganized
-bash-3.1$ ls |awk -F'.' '{print $NF}'|sort|uniq -c|sort -nr
17 sh
16 pig
8 py
5 gz
4 log
3 txt
3 ini
3 csv
2 yaml
> quantile(all_tweets$cnt,c(.10,.20,.30,.40,.50,.60,.70,.80,.90,.99,.999,.9999,.99999,.999999,1))
10%  20%  30%  40%  50%  60%  70%  80%  90%  99%    99.9%    99.99%    99.999%   99.9999%  100%
1.00 1.00 1.00 1.00 2.00 2.00 3.00 4.00 9.00 346.84 24177.48 224496.43 775218.76 855021.08 863888.00
neilkod / gist:1319966
Created October 27, 2011 15:56
my stopwords code, optimized for social media
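given a text file containing stopwords (one per line), return a python set of its contents.
this list is optimized for twitter/social media and filters out stuff
like RT, nowplaying, lastfm, 4sq etc.
def get_stopwords(file='stopwords.txt'):
    # read one stopword per line, stripping whitespace and newlines;
    # the with block closes the file once it has been read
    with open(file, 'r') as words:
        stopwords = [word.strip() for word in words]
    # a set gives fast membership tests when filtering tokens
    return set(stopwords)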
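a small sketch of how a stopword set like this might be applied to tokenized tweets. remove_stopwords is a made-up helper, not part of the gist, and the case-insensitive comparison is an assumption about how the list is meant to be used.

```python
def remove_stopwords(tokens, stopwords):
    """Return the tokens that are not stopwords, comparing case-insensitively.

    remove_stopwords is a hypothetical helper; stopwords can be any iterable
    of strings, such as the set returned by get_stopwords().
    """
    lowered = {word.lower() for word in stopwords}
    return [token for token in tokens if token.lower() not in lowered]
```

for example, remove_stopwords(['RT', 'love', 'this', 'song', 'nowplaying'], {'rt', 'this', 'nowplaying'}) keeps only 'love' and 'song'.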
>>> text.generate()
Bless you You are a genius. His death affects me to think differently.
Thank you Steve. I do n't come so far , limitless and unquantifiable.
You actually changed entire industries through his creative vision.
Let the world we live peacefully. This world would not be forgotten by
me , for the technology industry and the amazing songs , photos ,
communicate , what 's most complicated things " from news " email that
I understood what the launch of itunes earlier that day. While I never
had the privelage to have had an appreciation and fascination with the
world
neilkod / gist:1299438
Created October 19, 2011 19:42
steve jobs tribute scraper
#!/usr/bin/python
# downloaded messages can be found
# at http://www.neilkodner.com/stevejobs_tribute.txt
import urllib2
import simplejson as json
import time
import codecs
# scrapes messages from http://www.apple.com/stevejobs/
neilkod / python_from_pig.py
Created September 9, 2011 14:53
python_from_pig.py
#!/opt/cnet-python/default-2.6/bin/python
from org.apache.pig.scripting import *
P = Pig.compile("""
raw = load '$input_file' using PigStorage();
grpd = GROUP raw ALL;
cntd = FOREACH grpd GENERATE COUNT(raw);
store cntd INTO '$output_dir' USING PigStorage();
""")