Vadim Kantorov vadimkantorov

@vadimkantorov
vadimkantorov / sitecustomize.py
Last active April 22, 2025 19:55
Python trace urllib HTTP requests
import http.client
http.client.HTTPConnection.debuglevel = 1
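A minimal sketch of what the setting above does, assuming a throwaway local server so no network access is needed. `QuietHandler` and `traced_get` are hypothetical names used only for illustration. With `debuglevel` set on the class, each `HTTPConnection` prints its request and the response status and headers to stdout. The sketch talks to `http.client` directly because, as far as I know, `urllib.request` on Python versions before 3.12 sets its own debug level on the connections it opens.

```python
import contextlib
import http.client
import http.server
import io
import threading


class QuietHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical test handler: answers every GET with b"hello"."""

    def do_GET(self):
        body = b"hello"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        # Silence the server's own access log so only the client trace remains.
        pass


def traced_get(path="/"):
    """Perform one GET against a local server and return (body, debug trace)."""
    previous = http.client.HTTPConnection.debuglevel
    http.client.HTTPConnection.debuglevel = 1  # same setting as in the gist
    server = http.server.HTTPServer(("127.0.0.1", 0), QuietHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    captured = io.StringIO()
    try:
        # http.client writes its trace with print(), so stdout can be captured.
        with contextlib.redirect_stdout(captured):
            conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
            conn.request("GET", path)
            body = conn.getresponse().read()
            conn.close()
    finally:
        server.shutdown()
        server.server_close()
        http.client.HTTPConnection.debuglevel = previous
    return body, captured.getvalue()


if __name__ == "__main__":
    body, trace = traced_get()
    print(trace)
```

Placing the two lines of the gist in `sitecustomize.py` applies the same setting to every script run by that interpreter, without touching the scripts themselves.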
@vadimkantorov
vadimkantorov / ssh.sh
Last active April 22, 2025 19:55
Various ssh commands
# https://superuser.com/questions/1687960/over-ssh-can-you-use-the-same-private-key-on-the-host-side-for-other-purposes
alias sshagentssh='ssh-agent ssh -A -o AddKeysToAgent=yes'
# generate ssh key for github
# https://docs.github.com/en/authentication/connecting-to-github-with-ssh/generating-a-new-ssh-key-and-adding-it-to-the-ssh-agent
ssh-keygen -t ed25519 -C "[email protected]" -f ./id_ed25519 -N "" # -q
# https://stackoverflow.com/questions/4565700/how-to-specify-the-private-ssh-key-to-use-when-executing-shell-command-on-git
# https://github.com/settings/ssh/new
export GIT_SSH_COMMAND="ssh -o IdentitiesOnly=yes -i $PWD/id_ed25519"
@vadimkantorov
vadimkantorov / parquet2npyztsv.py
Last active April 25, 2025 13:37
Convert Parquet tables to npy (as record array) or npz (as columns) or tsv (as text columns)
# Usage: python parquet2npyztsv.py test.npy data/train-*-of-*.parquet
# Usage: python parquet2npyztsv.py test.npz data/train-*-of-*.parquet
# Usage: python parquet2npyztsv.py test.tsv data/train-*-of-*.parquet
import sys
import numpy as np
import pyarrow.parquet as pq
output_path, *input_paths = sys.argv[1:]
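The preview stops after argument parsing. One way the rest could go is sketched below; this is an assumption, not the gist's actual code. `read_parquet_columns` and `save_columns` are hypothetical helpers: the first concatenates all input tables with pyarrow, the second picks the output format from the file extension. The numpy and pyarrow imports are deferred so that the TSV path runs with the standard library alone.

```python
import csv
import os


def read_parquet_columns(input_paths):
    """Read all Parquet files and return {column name: list of values}."""
    import pyarrow as pa            # third-party, needed only for reading
    import pyarrow.parquet as pq
    table = pa.concat_tables([pq.read_table(path) for path in input_paths])
    return table.to_pydict()


def save_columns(output_path, columns):
    """Write columns as .tsv (text), .npy (record array) or .npz (one array per column)."""
    names = list(columns)
    ext = os.path.splitext(output_path)[1]
    if ext == ".tsv":
        with open(output_path, "w", newline="") as f:
            writer = csv.writer(f, delimiter="\t", lineterminator="\n")
            writer.writerow(names)                     # header row
            writer.writerows(zip(*columns.values()))   # one row per record
    elif ext == ".npy":
        import numpy as np
        arrays = [np.asarray(columns[name]) for name in names]
        np.save(output_path, np.rec.fromarrays(arrays, names=names))
    elif ext == ".npz":
        import numpy as np
        np.savez(output_path, **{name: np.asarray(columns[name]) for name in names})
    else:
        raise ValueError("unsupported output extension: " + ext)


if __name__ == "__main__":
    import sys
    output_path, *input_paths = sys.argv[1:]
    save_columns(output_path, read_parquet_columns(input_paths))
```

Loading everything through `to_pydict()` keeps the sketch short but holds the whole dataset in memory as Python objects, so it only suits tables that fit comfortably in RAM.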
@vadimkantorov
vadimkantorov / git_lfs_clone_dedup.sh
Last active April 18, 2025 13:30
A simple git lfs dedup impl done with hard links to avoid duplication of data object files (suitable for readonly cloned repos like models/datasets from HuggingFace, leaves the repo in an invalid state)
# Usage: bash git_lfs_clone_dedup.sh https://huggingface.co/deepseek-ai/DeepSeek-V3-0324 ~/DeepSeek-V3-0324
# Usage: bash git_lfs_clone_dedup.sh [email protected]:deepseek-ai/DeepSeek-V3-0324 ~/DeepSeek-V3-0324
# https://github.com/git-lfs/git-lfs/discussions/6029
GIT_LFS_SKIP_SMUDGE=1 git clone "$1" "$2"
cd "$2" || exit 1
git lfs fetch
git lfs ls-files -l | while read -r SHA DASH FILEPATH; do rm "$FILEPATH" && ln ".git/lfs/objects/${SHA:0:2}/${SHA:2:2}/$SHA" "$FILEPATH"; done
#git lfs ls-files -l | while read SHA DASH FILEPATH; do mv ".git/lfs/objects/${SHA:0:2}/${SHA:2:2}/$SHA" "$FILEPATH"; done
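The hard-link step above relies on the layout used in the script: an object with id `SHA` lives at `.git/lfs/objects/<first two hex chars>/<next two>/<SHA>`. A minimal Python sketch of the same step follows; `lfs_object_path` and `hardlink_lfs_file` are hypothetical helper names, and the sketch assumes the object has already been fetched.

```python
import os


def lfs_object_path(repo_dir, sha):
    """Path of a fetched LFS object, mirroring ${SHA:0:2}/${SHA:2:2}/$SHA."""
    return os.path.join(repo_dir, ".git", "lfs", "objects", sha[0:2], sha[2:4], sha)


def hardlink_lfs_file(repo_dir, sha, file_path):
    """Replace the pointer file in the working tree with a hard link to the object."""
    target = os.path.join(repo_dir, file_path)
    os.remove(target)                                 # drop the small pointer file
    os.link(lfs_object_path(repo_dir, sha), target)   # same inode, no second copy
```

Because both names point to the same inode, the data is stored once on disk. Hard links only work within a single filesystem, which holds here since `.git` sits inside the working tree.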
@vadimkantorov
vadimkantorov / download_hf_deepseek.sh
Last active April 15, 2025 17:51
Downloads DeepSeek model weights from HF without weight file duplication in .git/lfs/objects
sudo apt-get install git-lfs
git lfs install
# git clone https://huggingface.co/deepseek-ai/DeepSeek-V3-0324
# du -sh DeepSeek-V3-0324
# # 1.3T DeepSeek-V3-0324/
# du -sh DeepSeek-V3-0324/.git/lfs
# # 642G DeepSeek-V3-0324/.git/lfs
# https://github.com/git-lfs/git-lfs/discussions/6029
@vadimkantorov
vadimkantorov / yaml_loads.js
Created March 13, 2025 11:24
JavaScript function for parsing simple YAML (supports only strings, lists, dicts)
// based on simplified version of Python snippet: https://gist.github.com/vadimkantorov/b26eda3645edb13feaa62b874a3e7f6f
function yaml_loads(frontmatter_str)
{
const procval = s => (s.length >= 2 && s[0] == '"' && s[s.length - 1] == '"') ? s.slice(1, s.length - 1) : (s.length >= 2 && s[0] == "'" && s[s.length - 1] == "'") ? s.slice(1, s.length - 1) : s;
for(const line of frontmatter_str.split('\n'))
{
const line_strip = line.trim();
const is_list_item = line_strip.startsWith('- ');
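For comparison, the two helpers visible in the preview can be sketched in Python as below. Only the quote stripping of `procval` and the list-item test are mirrored; the rest of the parser is cut off in the preview and is not reconstructed here.

```python
def procval(s):
    """Strip one pair of matching surrounding quotes (double or single), if present."""
    if len(s) >= 2 and s[0] == s[-1] and s[0] in "\"'":
        return s[1:-1]
    return s


def is_list_item(line):
    """A YAML list item, after trimming, starts with a dash followed by a space."""
    return line.strip().startswith("- ")
```

A value such as `"a'` has mismatched quotes and is returned unchanged, which matches the JavaScript expression above.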
@vadimkantorov
vadimkantorov / svgdataurify.js
Created February 22, 2025 17:06
Conversion of SVG to data-uri format with prefix data:image/svg+xml - a primer in JavaScript
// based on https://github.com/tigt/mini-svg-data-uri/issues/24
// Usage: cat myicon.svg | node svgdataurify.js
let svg = "";
process.stdin.on("data", (chunk) => { svg += chunk; });
process.stdin.on("end", async () =>
{
const reWhitespace = /\s+/g, reUrlHexPairs = /%[\dA-F]{2}/g, hexDecode = {'%20': ' ', '%3D': '=', '%3A': ':', '%2F': '/'}, specialHexDecode = match => hexDecode[match] || match.toLowerCase();
if(svg.charCodeAt(0) === 0xfeff) svg = svg.slice(1);
svg = svg.trim().replace(reWhitespace, ' ').replaceAll('"', '\'');
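The preview ends before the encoding step. A Python sketch of the whole pipeline is given below under one assumption: that the missing part percent-encodes the string (as JavaScript's `encodeURIComponent` would) and then applies `specialHexDecode` to every `%XX` pair. `svg_to_data_uri` is a hypothetical name.

```python
import re
from urllib.parse import quote

# Escapes that are turned back into literal characters; all others are lowercased.
HEX_DECODE = {"%20": " ", "%3D": "=", "%3A": ":", "%2F": "/"}


def svg_to_data_uri(svg):
    """Convert SVG markup to a compact data:image/svg+xml URI."""
    if svg.startswith("\ufeff"):          # strip a leading byte order mark
        svg = svg[1:]
    # Collapse whitespace runs and switch to single quotes.
    svg = re.sub(r"\s+", " ", svg.strip()).replace('"', "'")
    # Assumed step: percent-encode like encodeURIComponent (it leaves !*'() as-is).
    encoded = quote(svg, safe="!*'()")
    encoded = re.sub(
        r"%[\dA-F]{2}",
        lambda m: HEX_DECODE.get(m.group(0), m.group(0).lower()),
        encoded,
    )
    return "data:image/svg+xml," + encoded
```

For example, `svg_to_data_uri('<svg xmlns="http://www.w3.org/2000/svg"></svg>')` returns `data:image/svg+xml,%3csvg xmlns='http://www.w3.org/2000/svg'%3e%3c/svg%3e`. Lowercasing the remaining escapes does not change their meaning; it is usually done because lowercase hex tends to compress better.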
@vadimkantorov
vadimkantorov / wslv1nodeinstall.sh
Last active February 13, 2025 20:00
Proper install command of node/npm on WSLv1 Ubuntu
# from https://github.com/microsoft/WSL/issues/8151#issuecomment-2276363014
curl -fsSL https://deb.nodesource.com/setup_22.x | sudo -E bash -
sudo apt-get install -y nodejs
@vadimkantorov
vadimkantorov / prependfrontmatter.sh
Created February 10, 2025 14:32
Sed script to prepend a Jekyll/Liquid front matter to a file
# prependfrontmatter ./index.html
alias prependfrontmatter="sed -i '1i---\n---'"
# https://unix.stackexchange.com/questions/99350/how-to-insert-text-before-the-first-line-of-a-file
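Where GNU sed is not available, the same effect can be sketched in Python. `prepend_front_matter` is a hypothetical helper, and the default text is the empty front matter block that the alias inserts.

```python
def prepend_front_matter(path, front_matter="---\n---\n"):
    """Rewrite the file in place with the front matter block placed before its content."""
    with open(path, encoding="utf-8") as f:
        content = f.read()
    with open(path, "w", encoding="utf-8") as f:
        f.write(front_matter + content)
```

One difference from the alias: `sed '1i...'` acts only when the file has a first line, so it leaves an empty file untouched, while this helper writes the front matter in that case too.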
@vadimkantorov
vadimkantorov / citygeocoder.py
Last active February 12, 2025 00:12
Queries WikiData / SPARQL endpoint for the GPS coordinates of world's 5000 most populated cities
# python citygeocoder.py > '~citygeocoder.json'
# https://www.wikidata.org/wiki/Wikidata:SPARQL_tutorial/en
# https://github.com/OSMNames/OSMNames, http://github.com/OSMNames/OSMNames/issues/208
# https://osmnames.org/download/
# https://stackoverflow.com/questions/74261733/how-to-fetch-gps-coordinates-of-worlds-largest-cities-from-wikidata-via-sparql
# FIXME: for some reason misses Helsinki
import sys
import json
import urllib.parse
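The query itself is not visible in the preview. The sketch below shows one way such a request could be built; it is an assumption, not the gist's query. It uses the Wikidata identifiers `Q515` (city), `P31` (instance of), `P279` (subclass of), `P1082` (population) and `P625` (coordinate location). `build_query`, `build_url`, `parse_point` and `fetch_cities` are hypothetical names, the `User-Agent` string is a placeholder, and a query this broad may hit the endpoint's time limit.

```python
import json
import re
import urllib.parse
import urllib.request

ENDPOINT = "https://query.wikidata.org/sparql"


def build_query(limit=5000):
    """SPARQL for the most populated cities with their coordinates (assumed query)."""
    return """
SELECT ?city ?cityLabel ?population ?coord WHERE {
  ?city wdt:P31/wdt:P279* wd:Q515 ;
        wdt:P1082 ?population ;
        wdt:P625 ?coord .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY DESC(?population)
LIMIT %d
""" % limit


def build_url(query):
    """URL for a GET request that asks the endpoint for JSON results."""
    return ENDPOINT + "?" + urllib.parse.urlencode({"query": query, "format": "json"})


def parse_point(wkt):
    """Convert a WKT literal 'Point(lon lat)' into a (lat, lon) tuple."""
    match = re.fullmatch(r"Point\(([-+.\deE]+) ([-+.\deE]+)\)", wkt.strip())
    if match is None:
        raise ValueError("not a WKT point: " + wkt)
    lon, lat = float(match.group(1)), float(match.group(2))
    return lat, lon


def fetch_cities(limit=5000):
    """Run the query and return a list of dicts (needs network access)."""
    request = urllib.request.Request(
        build_url(build_query(limit)),
        headers={"User-Agent": "citygeocoder-sketch/0.1 (placeholder)"},
    )
    with urllib.request.urlopen(request) as response:
        bindings = json.load(response)["results"]["bindings"]
    cities = []
    for row in bindings:
        lat, lon = parse_point(row["coord"]["value"])
        cities.append({"name": row["cityLabel"]["value"], "lat": lat, "lon": lon})
    return cities


if __name__ == "__main__":
    print(json.dumps(fetch_cities(), ensure_ascii=False, indent=2))
```

Note that Wikidata returns coordinates in longitude, latitude order, which is why `parse_point` swaps them.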