Stas Bekman (stas00)

stas00 / hf-hub-model-download-by-arch-stats.py
Last active December 10, 2021 01:47
most popular HF model downloads by architecture (thanks to @LysandreJik)
from transformers import CONFIG_MAPPING
from huggingface_hub import HfApi
api = HfApi()
keys = list(CONFIG_MAPPING.keys())
downloads = {}
for key in keys:
    models = api.list_models(filter=key)
    total_downloads = sum(model.downloads if hasattr(model, "downloads") else 0 for model in models)
    downloads[key] = total_downloads
ordered = sorted(downloads.items(), reverse=True, key=lambda t: t[1])
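To inspect the result one might then print the top of the ranking, e.g. (a usage sketch, not part of the gist):

for arch, count in ordered[:10]:
    # hypothetical pretty-print of architecture name and total download count
    print(f"{arch:30} {count:>12,}")
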
stas00 / profiler_performance_analysis.ipynb
Created July 6, 2021 05:04
this is a rough beginning of an applied torch.profiler tutorial
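The notebook itself is not reproduced here; as an illustration of the kind of measurement such a torch.profiler tutorial starts from (my sketch, model and shapes are arbitrary):

import torch
from torch.profiler import profile, ProfilerActivity

model = torch.nn.Linear(512, 512)
x = torch.randn(64, 512)
# profile a single forward pass on CPU, including memory stats
with profile(activities=[ProfilerActivity.CPU], profile_memory=True) as prof:
    model(x)
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))
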
stas00 / profiler.py
Created June 24, 2021 00:18
build_table: put name column last, make cols more narrow
# torch/autograd/profiler.py
def build_table(
        events,
        sort_by=None,
        header=None,
        row_limit=100,
        max_src_column_width=75,
        with_flops=False,
        profile_memory=False,
        top_level_events_only=False):
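build_table is the internal formatter behind the profiler's table output, so the column changes show up when rendering a profile, for example (illustrative call, not part of the gist):

import torch

with torch.autograd.profiler.profile() as prof:
    torch.randn(1000, 1000).mm(torch.randn(1000, 1000))
# .table() is rendered by build_table under the hood
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))
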
stas00 / conftest.py
Created May 11, 2021 17:15
pytest start/stop tracer - for when you need to figure out which tests didn't finish
# conftest.py
# to run:
# TRACE_START_STOP=1 pytest tests/test_trainer.py
import pytest
import os
trace = os.environ.get('TRACE_START_STOP', "")
@pytest.hookimpl(tryfirst=True, hookwrapper=True)
def pytest_runtest_makereport(item, call):
    outcome = yield
    res = outcome.get_result()
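The preview is truncated here; one way such a tracer can emit the start/stop markers is via pytest's logstart/logfinish hooks (a sketch of the approach, not the gist's actual continuation):

def pytest_runtest_logstart(nodeid, location):
    if trace:
        print(f"\nTRACE START {nodeid}", flush=True)

def pytest_runtest_logfinish(nodeid, location):
    if trace:
        print(f"\nTRACE STOP  {nodeid}", flush=True)
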
stas00 / dropout_abs_max_values.py
Created April 15, 2021 18:01
an experiment in overcoming overflow when using a bf16-trained model under fp16 mixed precision (a bf16-trained model produces huge activations)
# Samyam: I have three thoughts here:
# 1) would dropping off large activations push the network towards producing smaller activations? I don't know the answer, but it feels unlikely since the network is not getting penalized in any way for producing large activations,
# 2) dropout is meant to be used as a regularization, but by dropping out only large values it introduces a bias. It may have an unexpected impact on convergence,
# 3) if 1 does not happen, then at inference time, where there is no dropout, we have the inf again
import torch

def dropout_abs_max_values(x, p=0.2):
    """Like Dropout, but instead of random sampling this zeroes out the p fraction of the biggest absolute values"""
    topk = int(p * x.shape[-1])
    indices = torch.topk(x.abs(), topk, dim=-1, largest=True)[1]
    # the gist preview cuts off here; presumably the selected entries are then zeroed out, e.g.:
    return x.scatter(-1, indices, 0.0)
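A quick sanity check of the idea (usage sketch, assuming the zeroing step above):

x = torch.tensor([[1.0, -5.0, 2.0, 9.0, 0.5]])
# p=0.4 over 5 values -> the two largest-magnitude entries (9.0 and -5.0) get zeroed
print(dropout_abs_max_values(x, p=0.4))
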
stas00 / composable_actions.py
Created March 2, 2021 18:56 — forked from mnm364/composable_actions.py
Composable Python argparse actions
import argparse
def compose_actions(*actions):
"""Compose many argparse actions into one callable action.
Args:
*actions: The actions to compose.
Returns:
argparse.Action: Composed action.
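A hypothetical usage, assuming the composed action applies each given action in turn (the gist preview stops at the docstring, so the class names below are mine):

class UpperAction(argparse.Action):
    def __call__(self, parser, namespace, values, option_string=None):
        setattr(namespace, self.dest, values.upper())

class AnnounceAction(argparse.Action):
    def __call__(self, parser, namespace, values, option_string=None):
        print(f"--name was passed: {values}")

parser = argparse.ArgumentParser()
parser.add_argument("--name", action=compose_actions(UpperAction, AnnounceAction))
print(parser.parse_args(["--name", "stas"]))
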
# python -m torch.distributed.launch --nproc_per_node=2 all_reduce_bench.py
import torch
import torch.distributed as dist
import time
import argparse
import os
import fcntl
TRIALS = 5
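The preview stops at the constants; the heart of such a benchmark is a timed all_reduce loop along these lines (a sketch assuming dist.init_process_group has already been called and mat lives on the local GPU, not the gist's exact code):

def timed_allreduce(mat):
    # warm up once, then time TRIALS all_reduce calls
    dist.all_reduce(mat)
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(TRIALS):
        dist.all_reduce(mat)
    torch.cuda.synchronize()
    return (time.time() - start) / TRIALS
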
# same as the other script, but this time each thread allocates on a different device
# still reports correctly
import threading
import time
import torch
def print_mem_usage(prefix):
    n_gpus = torch.cuda.device_count()
    for id in range(n_gpus):
        with torch.cuda.device(id):
            # preview truncated here; presumably it reports something like:
            print(f"{prefix}: gpu {id}: {torch.cuda.memory_allocated() >> 20}MB allocated")
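A sketch of the threaded per-device allocation the comment describes (assumed shape, the preview is truncated):

allocs = {}

def alloc_on_device(id):
    with torch.cuda.device(id):
        # keep a reference so the ~4MB stays allocated on gpu `id`
        allocs[id] = torch.ones(2**20, device="cuda")

threads = [threading.Thread(target=alloc_on_device, args=(i,)) for i in range(torch.cuda.device_count())]
for t in threads:
    t.start()
for t in threads:
    t.join()
print_mem_usage("after threaded allocs")
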
stas00 / require_no_pytest_distributed.py
Last active December 30, 2020 01:43
pytest skip marker for tests that must not run under pytest-xdist's -n setting because they do something that requires, say, all gpus to be untouched
# this goes into transformers/testing_utils.py
_pytest_num_workers = 1

def set_pytest_num_workers(n):
    """
    A helper that records how many pytest workers are in use (when running under pytest-xdist's -n option)
    """
    global _pytest_num_workers
    _pytest_num_workers = n
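The marker named in the gist's title is not shown in the preview; a plausible shape for it, following the decorator style of transformers/testing_utils.py (my sketch, not the gist's code):

import unittest

def require_no_pytest_distributed(test_case):
    # skip unless pytest-xdist runs with a single worker, so the test gets all gpus to itself
    if _pytest_num_workers > 1:
        return unittest.skip("test must not run under pytest-xdist with multiple workers")(test_case)
    return test_case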