Stas Bekman stas00

stas00 / pytest-skeleton-autogenerate.md
Last active March 7, 2019 22:02
quickly generate pytest test suite skeletons - useful for bug reports

quick-n-dirty pytest test suite creation

useful for bug reports and quick tests

test suite with conftest.py session-wide fixture that runs automatically at the end of the test suite

cd /tmp
mkdir tests
echo -e "import pytest\nfrom warnings import warn\n@pytest.fixture(scope='session', autouse=True)\ndef run_check(request): yield; warn('\\\\n\\\\n*** This is global warning ***\\\\n')" > tests/conftest.py
echo -e "def test_a(): assert True\ndef test_b(): assert True\ndef test_c(): assert True" > tests/test_1.py
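Where `echo -e` is not available, the same skeleton can be generated from Python. This is only a sketch of the two files the commands above produce; the helper name `make_skeleton` is hypothetical and not part of the gist.

```python
from pathlib import Path

# Contents of tests/conftest.py: a session-scoped, autouse fixture.
# Code after the `yield` runs once, after the last test has finished.
CONFTEST = '''\
import pytest
from warnings import warn

@pytest.fixture(scope='session', autouse=True)
def run_check(request):
    yield
    warn('\\n\\n*** This is global warning ***\\n')
'''

# Contents of tests/test_1.py: three trivially passing tests.
TESTS = '''\
def test_a(): assert True
def test_b(): assert True
def test_c(): assert True
'''

def make_skeleton(root):
    """Create <root>/tests with conftest.py and test_1.py; return the tests dir."""
    tests = Path(root) / "tests"
    tests.mkdir(parents=True, exist_ok=True)
    (tests / "conftest.py").write_text(CONFTEST)
    (tests / "test_1.py").write_text(TESTS)
    return tests
```

Calling `make_skeleton("/tmp")` and then running `pytest` from `/tmp` should collect the three tests and emit the warning once at the end of the session.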
stas00 / conftest.py
Created March 14, 2019 00:27
pytest: report general memory leakage of tests (beyond threshold)
# pytest: report general memory leakage of tests (beyond threshold)
# from https://nvbn.github.io/2017/02/02/pytest-leaking/
# add the following code to tests/conftest.py in your test suite and run `pytest`
import os
from psutil import Process
_proc = Process(os.getpid())
def get_consumed_ram():
    return _proc.memory_info().rss
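The preview above stops before the reporting logic. As a rough sketch of the "beyond threshold" idea only, the function below filters per-test RSS readings (such as those `get_consumed_ram()` would return before and after each test). The names `LEAK_LIMIT` and `leaking_tests` and the 10 MiB value are assumptions for illustration, not taken from the gist.

```python
# Hypothetical threshold: report tests whose RSS grew by more than 10 MiB.
LEAK_LIMIT = 10 * 1024 * 1024

def leaking_tests(samples, limit=LEAK_LIMIT):
    """Return [(test_name, growth_in_bytes)] for tests exceeding `limit`.

    `samples` is an iterable of (test_name, rss_before, rss_after) tuples,
    with both readings in bytes.
    """
    return [
        (name, after - before)
        for name, before, after in samples
        if after - before > limit
    ]
```

In a real `conftest.py` the readings would presumably be collected in pytest's per-test setup and teardown hooks and printed at the end of the run.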
stas00 / pmi_discount
Last active December 13, 2019 03:39
vectorized fast implementation of PMI contextual discounting in pandas+numpy
def pmi_discount(df):
    """ Turney and Pantel (2010)
    From Frequency to Meaning: Vector Space Models of Semantics
    arXiv:1003.1141 [cs.CL] https://arxiv.org/abs/1003.1141
    p. 158 "contextual discounting" extension of PMI
    rc_min = min(rowsum, colsum)
    l = cell / (cell + 1) * rc_min / (rc_min + 1)
    newpmi = pmi * l
    in: pmi pandas df
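The discounting rule in the docstring can be written out as a formula. The symbols below are notation chosen here: $f_{ij}$ is the docstring's `cell`, and $r_i$ and $c_j$ are its `rowsum` and `colsum` for that cell.

```latex
\delta_{ij} = \frac{f_{ij}}{f_{ij} + 1} \cdot
              \frac{\min(r_i,\, c_j)}{\min(r_i,\, c_j) + 1},
\qquad
\mathrm{newpmi}_{ij} = \delta_{ij} \cdot \mathrm{pmi}_{ij}
```

For example, with `cell = 4`, `rowsum = 9`, `colsum = 19`, and `pmi = 2.0`: `rc_min = 9`, `l = (4/5) * (9/10) = 0.72`, and `newpmi = 1.44`. Small counts pull the discount factor towards 0, while large counts leave the PMI nearly unchanged.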
stas00 / gist:0ba5d30f0109967324f122bfcc8b52f5
Last active March 24, 2020 15:55
bert training loop w/ validation loss reporting (and more compact)
# drop-in replacement for the training loop in https://mccormickml.com/2019/07/22/BERT-fine-tuning/
# ---- cell 1 ----
import random
# This training code is based on the `run_glue.py` script here:
# https://github.com/huggingface/transformers/blob/5bfcd0485ece086ebcbed2d008813037968a9e58/examples/run_glue.py#L128
# Set the seed value all over the place to make this reproducible.
seed_val = 42
stas00 / url_unshortener.py
Last active March 11, 2020 03:17
url shortener unshortener (resolves url)
# this code handles redirects, failed requests, etc. can be tweaked to return some non-final url as well.
from urllib.parse import urlsplit
import requests
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:51.0) Gecko/20100101 Firefox/51.0'}
def resolve_url_req(url):
    """ if `url` is redirected returns the new url, otherwise None is returned """
    try:
        r = requests.head(url, headers=headers, allow_redirects=False, timeout=10)
        if r.status_code in [301, 302]:
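The preview is cut off at the redirect check. One way the redirect branch could derive the new URL, sketched without any network access: take the status code and the `Location` header of a response and return the absolute target. The helper name `redirect_target` is hypothetical, and resolving relative `Location` values with `urljoin` is an assumption about what a resolver would want, not something shown in the gist.

```python
from urllib.parse import urljoin

def redirect_target(url, status_code, location, redirect_codes=(301, 302)):
    """Return the absolute URL a redirect points to, or None if not redirected.

    `location` is the value of the response's Location header (may be None).
    Relative Location values are resolved against the requested `url`.
    """
    if status_code not in redirect_codes or not location:
        return None
    return urljoin(url, location)
```

With `requests`, the inputs would come from `r.status_code` and `r.headers.get('Location')` on a response fetched with `allow_redirects=False`.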
stas00 / bart-perf-test.ipynb
Last active August 1, 2020 02:50
wip test
stas00 / require_no_pytest_distributed.py
Last active December 30, 2020 01:43
pytest skip marker for tests that must not run under pytest-xdist's -n setting, e.g. because they require all GPUs to be untouched
# this goes into transformers/testing_utils.py
_pytest_num_workers = 1
def set_pytest_num_workers(n):
    """
    This is a helper function that sets how many pytest workers are used (if under pytest-xdist's -n option)
    """
    global _pytest_num_workers  # rebind the module-level variable instead of creating a local
    _pytest_num_workers = n