Stas Bekman stas00

stas00 / bart-perf-test.ipynb
Last active August 1, 2020 02:50
wip test
stas00 / url_unshortener.py
Last active March 11, 2020 03:17
URL unshortener (resolves a shortened URL to its final destination)
# This code handles redirects, failed requests, etc. It can be tweaked to return a non-final URL as well.
from urllib.parse import urlsplit
import requests
# requests sends this dict as HTTP headers, so the key must be the header name 'User-Agent'
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:51.0) Gecko/20100101 Firefox/51.0'}
def resolve_url_req(url):
    """ If `url` is redirected, return the new URL; otherwise return None. """
    try:
        r = requests.head(url, headers=headers, allow_redirects=False, timeout=10)
        if r.status_code in [301, 302]:
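The preview stops right after the 301/302 check. Below is a minimal sketch of the overall idea, following redirects one hop at a time so that a redirect loop or a dead link cannot hang the caller. The function name `unshorten`, the `max_hops` limit, the extra 303/307/308 codes and the injectable `head` parameter (there only so the sketch can be tested offline) are assumptions, not part of the gist:

```python
from urllib.parse import urljoin

import requests

# Assumption: a short User-Agent; the gist's full Firefox string works the same way.
HEADERS = {'User-Agent': 'Mozilla/5.0'}
# Assumption: the gist checks only 301/302; 303/307/308 are also HTTP redirects.
REDIRECT_CODES = {301, 302, 303, 307, 308}


def unshorten(url, max_hops=10, head=requests.head):
    """Hypothetical helper: follow redirects manually, return the final URL or None."""
    for _ in range(max_hops):
        try:
            r = head(url, headers=HEADERS, allow_redirects=False, timeout=10)
        except requests.RequestException:
            return None  # DNS failure, timeout, refused connection, ...
        if r.status_code not in REDIRECT_CODES:
            return url  # not a redirect: this is the final URL (whatever its status)
        location = r.headers.get('Location')
        if not location:
            return url  # redirect status without a target; stop here
        url = urljoin(url, location)  # Location may be relative, e.g. '/b'
    return None  # too many hops, most likely a redirect loop
```

Stepping through the hops manually, rather than passing `allow_redirects=True`, makes it easy to return an intermediate URL instead of the final one. Note that some servers answer `HEAD` differently from `GET`.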
stas00 / gist:0ba5d30f0109967324f122bfcc8b52f5
Last active March 24, 2020 15:55
BERT training loop with validation-loss reporting (and more compact)
# drop-in replacement for the training loop in https://mccormickml.com/2019/07/22/BERT-fine-tuning/
# ---- cell 1 ----
import random
# This training code is based on the `run_glue.py` script here:
# https://github.com/huggingface/transformers/blob/5bfcd0485ece086ebcbed2d008813037968a9e58/examples/run_glue.py#L128
# Set the seed value everywhere to make this reproducible.
seed_val = 42
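The gist is cut off right after `seed_val = 42`, the point where every random number generator the loop touches gets seeded. Below is a minimal sketch of that seeding. It assumes the common Python `random` / NumPy / PyTorch combination, and `set_seed` is a hypothetical helper name:

```python
import random

import numpy as np


def set_seed(seed_val: int = 42) -> None:
    """Hypothetical helper: seed the RNGs a PyTorch training loop typically uses."""
    random.seed(seed_val)       # Python's built-in RNG (e.g. random.shuffle)
    np.random.seed(seed_val)    # NumPy's global RNG
    try:
        import torch            # optional: only seed torch if it is installed
    except ImportError:
        return
    torch.manual_seed(seed_val)           # torch's default generator
    torch.cuda.manual_seed_all(seed_val)  # all GPUs; silently ignored without CUDA
```

Call `set_seed(seed_val)` once before building the dataloaders and the model. Seeding alone does not make every GPU kernel deterministic.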
stas00 / pmi_discount
Last active December 13, 2019 03:39
vectorized fast implementation of PMI contextual discounting in pandas+numpy
def pmi_discount(df):
    """ Turney and Pantel (2010)
    From Frequency to Meaning: Vector Space Models of Semantics
    arXiv:1003.1141 [cs.CL] https://arxiv.org/abs/1003.1141
    p. 158 "contextual discounting" extension of PMI
        rc_min = min(rowsum, colsum)
        l = cell / (cell + 1) * rc_min / (rc_min + 1)
        newpmi = pmi * l
    in: pmi pandas df
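The docstring gives the per-cell discount, but the function body is cut off. Below is a minimal vectorized sketch of that formula. It assumes, which the gist does not state, that a raw co-occurrence count DataFrame `counts` supplies `cell`, `rowsum` and `colsum`, and that a second DataFrame `pmi` of the same shape holds the PMI values. The name `pmi_discount_sketch` is hypothetical:

```python
import numpy as np
import pandas as pd


def pmi_discount_sketch(counts: pd.DataFrame, pmi: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical sketch: apply Turney & Pantel's contextual discount to `pmi`."""
    f = counts.to_numpy(dtype=float)          # cell: raw co-occurrence counts
    rowsum = f.sum(axis=1, keepdims=True)     # shape (n_rows, 1)
    colsum = f.sum(axis=0, keepdims=True)     # shape (1, n_cols)
    rc_min = np.minimum(rowsum, colsum)       # broadcasts to (n_rows, n_cols)
    l = f / (f + 1) * rc_min / (rc_min + 1)   # discount factor per cell, in [0, 1)
    return pmi * l                            # keeps pmi's index and columns
```

Broadcasting a column vector of row sums against a row vector of column sums builds the whole `rc_min` matrix without a Python loop. Zero-count cells get a discount of 0, so if `pmi` holds `-inf` there the product is `NaN`. Clip or fill those cells first.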
stas00 / conftest.py
Created March 14, 2019 00:27
pytest: report general memory leakage of tests (beyond threshold)
# pytest: report general memory leakage of tests (beyond threshold)
# from https://nvbn.github.io/2017/02/02/pytest-leaking/
# add the following code to tests/conftest.py in your test suite and run `pytest`
import os
from psutil import Process
_proc = Process(os.getpid())
def get_consumed_ram():
    return _proc.memory_info().rss
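The gist stops right after `get_consumed_ram()`. Below is a minimal sketch of one way to wire that helper into an autouse fixture that warns when a single test grows the process RSS past a threshold. The fixture name `check_leaking` and the 10 MiB `LEAK_LIMIT` are hypothetical choices, not taken from the gist:

```python
# tests/conftest.py (sketch)
import os
import warnings

import pytest
from psutil import Process

LEAK_LIMIT = 10 * 1024 * 1024  # hypothetical threshold: 10 MiB of RSS growth per test

_proc = Process(os.getpid())


def get_consumed_ram():
    """Resident set size of the current process, in bytes."""
    return _proc.memory_info().rss


@pytest.fixture(autouse=True)
def check_leaking(request):
    """Hypothetical fixture: compare RSS before and after every test."""
    start = get_consumed_ram()
    yield  # the test itself runs here
    leaked = get_consumed_ram() - start
    if leaked > LEAK_LIMIT:
        warnings.warn(f"{request.node.nodeid} leaked {leaked / 2**20:.1f} MiB")
```

RSS can also grow because of caches or allocator behaviour, so treat a warning as a hint to investigate rather than proof of a leak.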
stas00 / pytest-skeleton-autogenerate.md
Last active March 7, 2019 22:02
quickly generate pytest test suite skeletons - useful for bug reports

quick-n-dirty pytest test suite creation

useful for bug reports and quick tests

test suite with a session-wide, autouse fixture in conftest.py whose code after `yield` runs automatically at the end of the test suite

cd /tmp
mkdir tests
echo -e "import pytest\nfrom warnings import warn\n@pytest.fixture(scope='session', autouse=True)\ndef run_check(request): yield; warn('\\\\n\\\\n*** This is global warning ***\\\\n')" > tests/conftest.py
echo -e "def test_a(): assert True\ndef test_b(): assert True\ndef test_c(): assert True" > tests/test_1.py
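The same skeleton can also be generated from Python. This avoids the nested shell and `echo -e` escaping, which is easy to get wrong and easy to corrupt when pasted into web pages. Below is a minimal sketch. The helper name `make_skeleton` is hypothetical, and the file contents are equivalent to the two `echo` commands:

```python
import textwrap
from pathlib import Path

# Equivalent to the conftest.py written by the first `echo -e` command:
# a session-scoped autouse fixture whose code after `yield` runs once, after the last test.
CONFTEST = textwrap.dedent("""\
    import pytest
    from warnings import warn

    @pytest.fixture(scope='session', autouse=True)
    def run_check(request):
        yield
        warn('\\n\\n*** This is global warning ***\\n')
    """)

# Equivalent to tests/test_1.py written by the second `echo -e` command.
TESTS = "def test_a(): assert True\ndef test_b(): assert True\ndef test_c(): assert True\n"


def make_skeleton(root: Path) -> Path:
    """Hypothetical helper: create root/tests/{conftest.py,test_1.py}; return tests dir."""
    tests_dir = root / "tests"
    tests_dir.mkdir(parents=True, exist_ok=True)
    (tests_dir / "conftest.py").write_text(CONFTEST)
    (tests_dir / "test_1.py").write_text(TESTS)
    return tests_dir
```

For example, call `make_skeleton(Path("/tmp"))` and then run `pytest /tmp/tests`. The three tests pass, and the global warning appears in the warnings summary.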
stas00 / gist:0ba25525df65497cc16fdea9fbda5bc4
Last active February 23, 2019 05:05
A plea for GitHub to fix the CLA signing issue on the user side
Here is a support letter I have just sent to GitHub [2019-02-22]:
--------------------->8---------------->8-------------------->8------------------
Hi,
I contacted you about 6 months ago, and it doesn't look like this is a
priority for you, but it is a huge priority for tens of thousands of projects that now
require PR submitters to sign a CLA before their PR can be accepted.
You probably don't realize that if I looked at the PR changes and the user then
stas00 / str2func.py
Last active February 20, 2019 06:54
Convert a string of a fully qualified function, class or module into its corresponding Python object, if one exists. See examples at the end.
import sys
def str2func(name):
    "Convert a string of a fully qualified function, class or module into its python object"
    if isinstance(name, str):
        subpaths = name.split('.')
    else:
        return None
    module = subpaths.pop(0)
    if module in sys.modules:
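The gist is truncated at the `sys.modules` lookup. That check only matches modules that have already been imported. Below is a minimal alternative sketch. It imports the longest importable dotted prefix with `importlib.import_module` and then walks the remaining names with `getattr`. The name `str2obj` is hypothetical. Unlike a pure `sys.modules` check, this version may import, and therefore execute, modules as a side effect:

```python
import importlib
from typing import Any, Optional

_MISSING = object()  # sentinel, so attributes whose value is None still resolve


def str2obj(name: Any) -> Optional[Any]:
    """Hypothetical sketch: resolve 'pkg.mod.Class.attr' to the object, or None."""
    if not isinstance(name, str):
        return None
    parts = name.split('.')
    if any(not p for p in parts):  # '', 'a..b', '.a' are not valid dotted paths
        return None
    # Try the longest prefix first: 'os.path.join' -> 'os.path.join', 'os.path', 'os'.
    for i in range(len(parts), 0, -1):
        try:
            obj = importlib.import_module('.'.join(parts[:i]))
        except ImportError:
            continue
        for attr in parts[i:]:  # walk the rest as attributes (classes, functions, ...)
            obj = getattr(obj, attr, _MISSING)
            if obj is _MISSING:
                return None
        return obj
    return None
```

For example, `str2obj("os.path.join")` returns the `os.path.join` function, and `str2obj("collections.OrderedDict")` returns the class.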