hellpanderrr

hellpanderrr / OHE.py
Last active February 21, 2019 08:15
One hot encode a dataframe in pandas and sklearn
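import pandas as pd
from sklearn.feature_extraction import DictVectorizer as DV

# sparse=False returns a dense NumPy array instead of a SciPy sparse matrix
vectorizer = DV(sparse=False)
# One dict per row; 'records' avoids losing rows when index labels repeat
v = vectorizer.fit_transform(df.to_dict('records'))
new_df = pd.DataFrame(v, columns=vectorizer.feature_names_)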
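To make the resulting columns less opaque: `DictVectorizer` expands each string-valued feature into a `name=value` indicator column, passes numeric features through unchanged, and sorts the feature names by default. Below is a minimal pure-Python sketch of that expansion. The function `one_hot_records` and the sample records are illustrative names that appear in neither the gist nor sklearn, and the sketch ignores edge cases such as missing keys.

```python
def one_hot_records(records):
    """One-hot encode a list of dicts using `name=value` column names."""
    # Collect every column name: strings expand to name=value, numbers keep their key.
    names = set()
    for row in records:
        for key, value in row.items():
            names.add(f"{key}={value}" if isinstance(value, str) else key)
    columns = sorted(names)
    index = {name: i for i, name in enumerate(columns)}

    # Fill one dense row per record.
    matrix = []
    for row in records:
        encoded = [0.0] * len(columns)
        for key, value in row.items():
            if isinstance(value, str):
                encoded[index[f"{key}={value}"]] = 1.0
            else:
                encoded[index[key]] = float(value)
        matrix.append(encoded)
    return columns, matrix


records = [{"city": "Oslo", "temp": 3}, {"city": "Rome", "temp": 18}]
columns, matrix = one_hot_records(records)
print(columns)  # ['city=Oslo', 'city=Rome', 'temp']
print(matrix)   # [[1.0, 0.0, 3.0], [0.0, 1.0, 18.0]]
```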
hellpanderrr / string_agg.py
Last active January 26, 2016 13:54
Django PostgreSQL string_agg
# models.py
from django.db import models
from django.db.models import Aggregate

class Concat(Aggregate):
    def add_to_query(self, query, alias, col, source, is_summary):
        # Pass source=CharField to prevent Django from casting the string to int
        aggregate = SQLConcat(col, source=models.CharField(), is_summary=is_summary, **self.extra)
        query.aggregates[alias] = aggregate

    def __init__(self, col, distinct=False, **extra):
hellpanderrr / .py
Created February 11, 2016 06:36
Read Cyrillic from PyPDF2
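def fix_string(string):
    # Repair text that was decoded as cp1252 but is really cp1251 (Cyrillic)
    ret = ''
    for char in string:
        try:
            ret += char.encode('cp1252').decode('cp1251')
        except UnicodeError:
            # No cp1252/cp1251 mapping for this character; keep it unchanged
            ret += char
    return ret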
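A short round-trip check shows why the per-character re-encoding works. The sample word and the way the garbled text is produced here are assumptions for illustration: the gist does not show what PyPDF2 actually returns, only that Cyrillic bytes end up read as cp1252.

```python
def fix_string(string):
    """Re-interpret cp1252-decoded characters as cp1251, one character at a time."""
    ret = ''
    for char in string:
        try:
            ret += char.encode('cp1252').decode('cp1251')
        except UnicodeError:
            # Character has no mapping in one of the code pages; leave it alone.
            ret += char
    return ret


original = 'Привет'
# Simulate the damage: Cyrillic bytes (cp1251) misread as Western European (cp1252).
garbled = original.encode('cp1251').decode('cp1252')
print(garbled)              # Ïðèâåò
print(fix_string(garbled))  # Привет
```

Working character by character means that text which is already correct Cyrillic fails the cp1252 encode step and passes through untouched, rather than aborting the whole conversion.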
hellpanderrr / isiterable.py
Last active April 5, 2016 20:16
Check whether a Python object is iterable
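# 'Iterable' means the object has an __iter__() method.
# Python 2: str has no __iter__, so strings are checked separately via basestring.
isiterable = lambda obj: isinstance(obj, basestring) or bool(getattr(obj, '__iter__', False))
# Python 3: the ABC lives in collections.abc (the collections alias was removed in 3.10).
from collections.abc import Iterable
isiterable = lambda obj: isinstance(obj, Iterable)
# https://docs.python.org/3/library/collections.abc.html#collections.abc.Iterable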
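One caveat, noted in the linked documentation: the `Iterable` check only detects `__iter__`, so it misses objects that iterate through the older `__getitem__` sequence protocol. A sketch of the difference follows; the `Legacy` class and the `is_iterable` helper are hypothetical names used only for this illustration.

```python
from collections.abc import Iterable


class Legacy:
    """Hypothetical class that supports iteration only through __getitem__."""

    def __getitem__(self, index):
        if index >= 3:
            raise IndexError(index)
        return index * 10


def is_iterable(obj):
    """Return True if iter(obj) succeeds, which also covers __getitem__ sequences."""
    try:
        iter(obj)
    except TypeError:
        return False
    return True


legacy = Legacy()
abc_check = isinstance(legacy, Iterable)  # the ABC looks for __iter__ only
iter_check = is_iterable(legacy)          # iter() falls back to __getitem__
items = list(legacy)
print(abc_check, iter_check, items)  # False True [0, 10, 20]
```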
hellpanderrr / drop_duplicates.py
Last active February 21, 2019 08:17
python drop duplicates from an iterable while preserving the order
### http://www.peterbe.com/plog/uniqifiers-benchmark
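def uniqify(seq):
    seen = set()
    seen_add = seen.add  # bind the method once to avoid repeated attribute lookups
    # seen_add(x) returns None, so the `or` records x and keeps it on first sight
    return [x for x in seq if not (x in seen or seen_add(x))]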
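A quick usage check, alongside an alternative that relies on dicts preserving insertion order (guaranteed since Python 3.7). The name `uniqify_dict` is an illustrative addition that does not come from the gist or the linked benchmark. Both versions require hashable items.

```python
def uniqify(seq):
    """Drop duplicates while keeping the first occurrence of each item."""
    seen = set()
    seen_add = seen.add
    return [x for x in seq if not (x in seen or seen_add(x))]


def uniqify_dict(seq):
    """Same result using dict keys, which keep insertion order on Python 3.7+."""
    return list(dict.fromkeys(seq))


data = [3, 1, 3, 2, 1]
print(uniqify(data))       # [3, 1, 2]
print(uniqify_dict(data))  # [3, 1, 2]
```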
hellpanderrr / django server setup.md
Last active March 25, 2016 07:00
Ubuntu 14.04, Django 1.9, Nginx, Gunicorn setup without virtualenv

sudo apt-get install nginx

sudo pip install gunicorn

/home/ubuntu/my_project -- path to the folder containing manage.py

/home/ubuntu/my_project/my_project -- path to the folder containing settings.py

$USER -- your username

hellpanderrr / cling.bash
Created April 7, 2016 22:48
installing cling on ubuntu 14
# you might need to install cmake
#!/bin/bash
#
# axel@cern.ch, 2014-02-07
# which is not ideal, see http://stackoverflow.com/a/677212/1392758
python=`which python`
if type python2 > /dev/null 2>&1; then
hellpanderrr / python list split.py
Last active February 21, 2019 08:15
Python split list into list of sublists. A list of break words is used as the separator, the list is split between each break word.
def break_list_by_code_words(l, words):
    '''Breaks a 1-d list into groups by code words.
    ['Code_word_1','abc','123','sd','Code_word_2','34253','123','Code_word_3','1231','4545'] -->
    [['Code_word_1','abc','123','sd'], ['Code_word_2','34253','123'], ['Code_word_3','1231','4545']]
    '''
    flag = False
    ret = []
    block = []
    for i in l:
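The preview above is cut off at the loop, so the author's loop body is not shown. The following is an independent minimal sketch of the behaviour the docstring describes, not a reconstruction of the original code. The name `split_on_break_words` is hypothetical, and it assumes that items appearing before the first break word are kept in their own leading group, which the docstring does not specify.

```python
def split_on_break_words(items, words):
    """Split a flat list into sublists, starting a new sublist at each break word."""
    break_words = set(words)  # set membership keeps each lookup O(1)
    groups = []
    for item in items:
        # Open a new group at every break word, and for a leading item with no group yet.
        if item in break_words or not groups:
            groups.append([])
        groups[-1].append(item)
    return groups


data = ['Code_word_1', 'abc', '123', 'sd',
        'Code_word_2', '34253', '123',
        'Code_word_3', '1231', '4545']
result = split_on_break_words(data, ['Code_word_1', 'Code_word_2', 'Code_word_3'])
for group in result:
    print(group)
```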
hellpanderrr / gist:a751d16905d0c27d61ea334f6e9212c9
Last active May 8, 2016 20:24
pdfminer-20140328 profiling
DSD_Y41-650-D-PermitsCompleted.pdf 34 pages
Sun May 08 22:53:13 2016 restats_w
53819963 function calls (53796078 primitive calls) in 187.163 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
Mon May 09 05:55:14 2016 old
28055818 function calls (28031179 primitive calls) in 108.555 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
3666302 33.412 0.000 55.143 0.000 utils.py:307(find)
6232384 12.518 0.000 16.739 0.000 utils.py:266(_getrange)
165520 8.038 0.000 64.496 0.000 layout.py:601(isany)
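The listing has the shape of a `pstats` report sorted by internal time (`tottime`). As a rough sketch of how such a report can be produced with the standard library: the profiled function `busy_work` is a hypothetical stand-in, since the gist does not show how pdfminer was invoked or how the `restats_w` and `old` runs differed.

```python
import cProfile
import io
import pstats


def busy_work(n):
    """Hypothetical workload standing in for the real PDF parsing call."""
    return sum(i * i for i in range(n))


profiler = cProfile.Profile()
profiler.enable()
busy_work(100_000)
profiler.disable()

# Sort by time spent inside each function itself ("internal time") and
# print the five most expensive entries.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("tottime").print_stats(5)
report = buffer.getvalue()
print(report)
```

In the report, `tottime` excludes time spent in callees while `cumtime` includes it, which is why `layout.py:601(isany)` shows a small `tottime` next to a large `cumtime` in the listing above.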