💭
looking for an internship for summer/fall 2021

Vadim Kantorov vadimkantorov

@vadimkantorov
vadimkantorov / git_private_fork.sh
Created May 16, 2025 15:26
Create a private fork of verl
# reference: https://gist.github.com/0xjac/85097472043b697ab57ba1b1c7530274
git clone --bare git@github.com:volcengine/verl.git
cd verl.git
# create a bare repo vadimkantorov/verl
git push --mirror git@github.com:vadimkantorov/verl.git
cd .. && rm -rf verl.git
git clone git@github.com:vadimkantorov/verl.git
@vadimkantorov
vadimkantorov / tqdm.py
Last active May 13, 2025 13:05
Extremely simplified single-file, 20 LOC version of https://tqdm.github.io/docs/tqdm/ for debugging tqdm bugs like https://github.com/tqdm/tqdm/issues/760 or dropping the full dependency
# Save as tqdm.py in the project dir; `from tqdm import tqdm` and `from tqdm.auto import tqdm` should then pick up this class. If the import fails, run `export PYTHONPATH=.`
# Test run: python tqdm.py
import os, sys
# huggingface_hub/hf_api.py:
# from tqdm.auto import tqdm as base_tqdm
# from tqdm.contrib.concurrent import thread_map
# https://tqdm.github.io/docs/shortcuts/#tqdmauto
sys.modules['tqdm.auto'] = sys.modules[__name__]
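The preview above stops before the class itself. As a rough illustration of what a single-file stand-in could look like (a sketch written for this note, not the gist's actual code), the class below wraps an iterable and writes a counter to a stream. The constructor arguments mirror a few of the real tqdm's keyword names; everything else is an assumption.

```python
import sys
import time


class tqdm:
    """Tiny stand-in for tqdm.tqdm: wraps an iterable and prints a counter."""

    def __init__(self, iterable=None, total=None, desc=None, file=None, **kwargs):
        # Other keyword arguments accepted by the real tqdm are ignored here.
        self.iterable = iterable
        if total is None and hasattr(iterable, "__len__"):
            total = len(iterable)
        self.total = total
        self.desc = f"{desc}: " if desc else ""
        self.file = sys.stderr if file is None else file
        self.n = 0
        self.start = time.monotonic()

    def update(self, n=1):
        # Advance the counter and redraw the single status line.
        self.n += n
        elapsed = time.monotonic() - self.start
        total = "?" if self.total is None else self.total
        self.file.write(f"\r{self.desc}{self.n}/{total} [{elapsed:.1f}s]")
        self.file.flush()

    def close(self):
        self.file.write("\n")
        self.file.flush()

    def __iter__(self):
        for item in self.iterable:
            yield item
            self.update(1)
        self.close()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()
```

Used as `for x in tqdm(range(100), desc="demo"): ...`, it prints one updating line to stderr. Registering the module under `tqdm.auto`, as the last line of the preview does, lets `from tqdm.auto import tqdm` resolve to the same class.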
@vadimkantorov
vadimkantorov / pip_install_dependencies_from_pyproject_toml.sh
Last active April 30, 2025 14:28
Install only pip dependencies from pyproject.toml (e.g. from https://github.com/augustepoiroux/LeanInteract)
# https://github.com/pypa/pip/issues/11440
# https://github.com/pypa/pip/issues/7822
# https://stackoverflow.com/a/79598932/445810
# tomllib is available starting from python --version >= 3.11
python -m pip install $(python -c 'import tomllib;print(*tomllib.load(open("pyproject.toml","rb"))["project"]["dependencies"])') # --user --break-system-packages
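The one-liner relies on shell word splitting, so a requirement string that contains spaces (for example one with an environment marker) would be split into several arguments. A possible pure-Python variant is sketched below; the helper names `read_dependencies`, `pip_install_command` and `install_dependencies` are made up for this illustration, and passing each requirement as its own argument is a design choice of the sketch rather than something the gist does.

```python
import subprocess
import sys


def read_dependencies(pyproject_path="pyproject.toml"):
    """Return the [project].dependencies list from a pyproject.toml file."""
    import tomllib  # standard library from Python 3.11 onwards

    with open(pyproject_path, "rb") as f:
        project = tomllib.load(f).get("project", {})
    return list(project.get("dependencies", []))


def pip_install_command(dependencies, extra_args=()):
    """Build the pip command; each requirement stays a single argument."""
    return [sys.executable, "-m", "pip", "install", *extra_args, *dependencies]


def install_dependencies(pyproject_path="pyproject.toml", extra_args=()):
    """Install the declared dependencies without installing the project."""
    dependencies = read_dependencies(pyproject_path)
    if dependencies:
        subprocess.check_call(pip_install_command(dependencies, extra_args))
```

Calling `install_dependencies(extra_args=["--user"])` from the project directory would correspond to the shell command above with the commented-out `--user` flag enabled.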
@vadimkantorov
vadimkantorov / sitecustomize.py
Last active April 22, 2025 19:55
Python trace urllib HTTP requests
import http.client
http.client.HTTPConnection.debuglevel = 1
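Setting the class attribute in `sitecustomize.py` turns tracing on for the whole process. When tracing is wanted only around one call, a scoped variant might look like the sketch below; the `http_trace` name is hypothetical, and the sketch only affects connections that do not set their own debug level.

```python
import contextlib
import http.client


@contextlib.contextmanager
def http_trace(level=1):
    """Temporarily raise the default debug level of http.client connections."""
    previous = http.client.HTTPConnection.debuglevel
    http.client.HTTPConnection.debuglevel = level
    try:
        yield
    finally:
        # Restore the previous class-level default even if the body raised.
        http.client.HTTPConnection.debuglevel = previous
```

Inside `with http_trace(): ...`, connections created through `http.client` print their request and response headers to stdout; `HTTPSConnection` subclasses `HTTPConnection`, so it picks up the same default.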
@vadimkantorov
vadimkantorov / ssh.sh
Last active May 12, 2025 13:24
Various ssh commands
# https://superuser.com/questions/1687960/over-ssh-can-you-use-the-same-private-key-on-the-host-side-for-other-purposes
alias sshagentssh='ssh-agent ssh -A -o AddKeysToAgent=yes'
# generate ssh key for github
# https://docs.github.com/en/authentication/connecting-to-github-with-ssh/generating-a-new-ssh-key-and-adding-it-to-the-ssh-agent
ssh-keygen -t ed25519 -C "[email protected]" -f ./id_ed25519 -N "" # -q
# https://stackoverflow.com/questions/4565700/how-to-specify-the-private-ssh-key-to-use-when-executing-shell-command-on-git
# https://github.com/settings/ssh/new
export GIT_SSH_COMMAND="ssh -o IdentitiesOnly=yes -i $PWD/id_ed25519"
@vadimkantorov
vadimkantorov / parquet2npyztsv.py
Last active April 25, 2025 13:37
Convert Parquet tables to npy (as record array) or npz (as columns) or tsv (as text columns)
# Usage: python parquet2npyztsv.py test.npy data/train-*-of-*.parquet
# Usage: python parquet2npyztsv.py test.npz data/train-*-of-*.parquet
# Usage: python parquet2npyztsv.py test.tsv data/train-*-of-*.parquet
import sys
import numpy as np
import pyarrow.parquet as pq
output_path, *input_paths = sys.argv[1:]
@vadimkantorov
vadimkantorov / git_lfs_clone_dedup.sh
Last active April 18, 2025 13:30
A simple git lfs dedup impl done with hard links to avoid duplication of data object files (suitable for readonly cloned repos like models/datasets from HuggingFace, leaves the repo in an invalid state)
# Usage: bash git_lfs_clone_dedup.sh https://huggingface.co/deepseek-ai/DeepSeek-V3-0324 ~/DeepSeek-V3-0324
# Usage: bash git_lfs_clone_dedup.sh git@hf.co:deepseek-ai/DeepSeek-V3-0324 ~/DeepSeek-V3-0324
# https://github.com/git-lfs/git-lfs/discussions/6029
GIT_LFS_SKIP_SMUDGE=1 git clone "$1" "$2"
cd "$2"
git lfs fetch
git lfs ls-files -l | while read -r SHA DASH FILEPATH; do rm "$FILEPATH" && ln ".git/lfs/objects/${SHA:0:2}/${SHA:2:2}/$SHA" "$FILEPATH"; done
#git lfs ls-files -l | while read SHA DASH FILEPATH; do mv ".git/lfs/objects/${SHA:0:2}/${SHA:2:2}/$SHA" "$FILEPATH"; done
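The loop above depends on two conventions: each line of `git lfs ls-files -l` has the form `<sha256> <marker> <path>`, and the fetched object lives under `.git/lfs/objects/` in two directory levels taken from the first four hex characters of the hash. The same relinking step can be sketched in Python as follows; the function names are invented for this illustration, and, as with the shell version, the working tree ends up hard-linked to the object store and should be treated as read-only.

```python
import os
from pathlib import Path


def lfs_object_path(repo_dir, sha):
    """Location of a fetched LFS object: objects/<sha[0:2]>/<sha[2:4]>/<sha>."""
    return Path(repo_dir) / ".git" / "lfs" / "objects" / sha[:2] / sha[2:4] / sha


def parse_ls_files_line(line):
    """Split '<sha> <marker> <path>'; the path itself may contain spaces."""
    sha, _marker, filepath = line.rstrip("\n").split(" ", 2)
    return sha, filepath


def relink(repo_dir, sha, filepath):
    """Replace the pointer file in the working tree with a hard link."""
    target = Path(repo_dir) / filepath
    target.unlink(missing_ok=True)
    os.link(lfs_object_path(repo_dir, sha), target)
```

Because a hard link shares the inode with the object file, the data is stored once on disk while remaining visible at both paths.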
@vadimkantorov
vadimkantorov / download_hf_deepseek.sh
Last active April 15, 2025 17:51
Downloads DeepSeek model weights from HF without weight file duplication in .git/lfs/objects
sudo apt-get install git-lfs
git lfs install
# git clone https://huggingface.co/deepseek-ai/DeepSeek-V3-0324
# du -sh DeepSeek-V3-0324
# # 1.3T DeepSeek-V3-0324/
# du -sh DeepSeek-V3-0324/.git/lfs
# # 642G DeepSeek-V3-0324/.git/lfs
# https://github.com/git-lfs/git-lfs/discussions/6029
@vadimkantorov
vadimkantorov / yaml_loads.js
Created March 13, 2025 11:24
JavaScript function for parsing simple YAML (supports only strings, lists, dicts)
// based on simplified version of Python snippet: https://gist.github.com/vadimkantorov/b26eda3645edb13feaa62b874a3e7f6f
function yaml_loads(frontmatter_str)
{
const procval = s => (s.length >= 2 && s[0] == '"' && s[s.length - 1] == '"') ? s.slice(1, s.length - 1) : (s.length >= 2 && s[0] == "'" && s[s.length - 1] == "'") ? s.slice(1, s.length - 1) : s;
for(const line of frontmatter_str.split('\n'))
{
const line_strip = line.trim();
const is_list_item = line_strip.startsWith('- ');
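The preview is cut off after the list-item check. To show the general approach of stripping quotes and then going line by line, here is a small Python sketch of a parser for flat front matter. It is not the linked Python snippet and not a port of the JavaScript function: it handles only top-level `key: value` pairs and `- item` lists under a key, and its structure is an assumption made for illustration.

```python
def strip_quotes(s):
    """Drop one pair of matching single or double quotes, if present."""
    if len(s) >= 2 and s[0] == s[-1] and s[0] in "\"'":
        return s[1:-1]
    return s


def yaml_loads(text):
    """Parse flat 'key: value' lines and '- item' lists into a dict."""
    result = {}
    current_key = None
    for line in text.split("\n"):
        stripped = line.strip()
        if not stripped or stripped.startswith("#") or stripped == "---":
            continue
        if stripped.startswith("- "):
            if current_key is None:
                continue  # list item without a preceding key: ignored
            if not isinstance(result[current_key], list):
                result[current_key] = []
            result[current_key].append(strip_quotes(stripped[2:].strip()))
        elif ":" in stripped:
            # Split on the first colon only, so values may contain colons.
            key, _, value = stripped.partition(":")
            current_key = key.strip()
            result[current_key] = strip_quotes(value.strip())
    return result
```

A key with an empty value followed by `- item` lines becomes a list; a key with an empty value and no items stays an empty string.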
@vadimkantorov
vadimkantorov / svgdataurify.js
Created February 22, 2025 17:06
Conversion of SVG to data-uri format with prefix data:image/svg+xml - a primer in JavaScript
// based on https://github.com/tigt/mini-svg-data-uri/issues/24
// Usage: cat myicon.svg | node svgdataurify.js
let svg = "";
process.stdin.on("data", (chunk) => { svg += chunk; });
process.stdin.on("end", async () =>
{
const reWhitespace = /\s+/g, reUrlHexPairs = /%[\dA-F]{2}/g, hexDecode = {'%20': ' ', '%3D': '=', '%3A': ':', '%2F': '/'}, specialHexDecode = match => hexDecode[match] || match.toLowerCase();
if(svg.charCodeAt(0) === 0xfeff) svg = svg.slice(1);
svg = svg.trim().replace(reWhitespace, ' ').replaceAll('"', '\'');
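The preview ends after the whitespace and quote normalization. Judging from the regular expressions and the `hexDecode` table it defines, the remaining steps are probably: percent-encode the string, turn a few harmless escapes back into literal characters, and lowercase the rest. A Python sketch of that pipeline, offered as an assumption about the missing part rather than a transcription of it, could be:

```python
import re
from urllib.parse import quote

# Escapes that are turned back into literal characters to keep the URI short.
HEX_DECODE = {"%20": " ", "%3D": "=", "%3A": ":", "%2F": "/"}


def svg_to_data_uri(svg):
    """Convert SVG markup to a compact data:image/svg+xml URI."""
    if svg.startswith("\ufeff"):
        svg = svg[1:]  # drop a leading byte order mark
    # Collapse whitespace runs and prefer single quotes, which need no escaping.
    svg = re.sub(r"\s+", " ", svg.strip()).replace('"', "'")
    # Roughly JavaScript's encodeURIComponent: these characters stay literal.
    encoded = quote(svg, safe="!~*'()")
    encoded = re.sub(
        r"%[\dA-F]{2}",
        lambda m: HEX_DECODE.get(m.group(0), m.group(0).lower()),
        encoded,
    )
    return "data:image/svg+xml," + encoded
```

Lowercasing the remaining escapes follows the idea in the mini-svg-data-uri project that lowercase hex pairs compress slightly better; the exact set of characters left unescaped is a choice of this sketch.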