Yehor Smoliakov egorsmkv
@egorsmkv
egorsmkv / create_ds.py
Created March 21, 2025 13:32
Upload MP3 to HF
import json
from glob import glob
from os.path import basename
files_all = glob("data/*.mp3")
results = []
for idx, filename in enumerate(files_all):
    duration = 0
    results.append({'file_name': basename(filename), 'duration': duration, 'transcription': '-'})
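The loop above collects one record per MP3 file with a `file_name` key. A minimal sketch of how such records might be written out, assuming the goal is a JSON Lines metadata file stored next to the audio (the layout the Hugging Face `datasets` AudioFolder loader reads through a `file_name` column). The helper name `write_metadata` and the output path are hypothetical and not part of the original gist:

```python
import json
from pathlib import Path


def write_metadata(records, out_path):
    """Write one JSON object per line (JSON Lines) and return the path.

    `records` is a list of dicts such as
    {'file_name': 'a.mp3', 'duration': 0, 'transcription': '-'}.
    """
    out_path = Path(out_path)
    with out_path.open("w", encoding="utf-8") as f:
        for record in records:
            # ensure_ascii=False keeps non-Latin transcriptions readable.
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return out_path
```

With the variables from the snippet above, a call might look like `write_metadata(results, "data/metadata.jsonl")`.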
@egorsmkv
egorsmkv / main.rs
Last active March 14, 2025 14:00
Fast Inverse Square Root written in Rust, translated to Coq using https://github.com/formal-land/coq-of-rust
fn q_rsqrt(number: f32) -> f32 {
    let threehalfs: f32 = 1.5;
    let x2: f32 = number * 0.5;
    let mut y: f32 = number;
    let i: u32 = y.to_bits(); // safely get the bit representation of the float
    let i: u32 = 0x5f3759df - (i >> 1); // what the heck?
    y = f32::from_bits(i); // safely convert bits back to float
    y * (threehalfs - (x2 * y * y)) // 1st iteration
}
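For readers who want to try the bit trick outside Rust, here is a rough Python sketch of the same steps. It assumes positive, finite inputs and uses `struct` to reinterpret the float32 bit pattern. The `& 0xFFFFFFFF` mask is an addition of this sketch to keep the value within 32 bits; it is not in the Rust preview:

```python
import struct


def q_rsqrt(number: float) -> float:
    """Approximate 1 / sqrt(number) for positive finite inputs."""
    x2 = number * 0.5
    # Reinterpret the float32 bit pattern as an unsigned 32-bit integer.
    (i,) = struct.unpack("<I", struct.pack("<f", number))
    # Magic constant minus half the bits gives a first guess.
    # The mask is an assumption of this sketch (keeps the result in 32 bits).
    i = (0x5F3759DF - (i >> 1)) & 0xFFFFFFFF
    # Reinterpret the integer bits as a float32 again.
    (y,) = struct.unpack("<f", struct.pack("<I", i))
    # One Newton-Raphson iteration refines the guess.
    return y * (1.5 - x2 * y * y)
```

A single iteration is only an approximation, so results such as `q_rsqrt(4.0)` land close to, but not exactly on, `0.5`.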
WhisperForConditionalGeneration(
  (model): WhisperModel(
    (encoder): WhisperEncoder(
      (conv1): Conv1d(128, 1280, kernel_size=(3,), stride=(1,), padding=(1,))
      (conv2): Conv1d(1280, 1280, kernel_size=(3,), stride=(2,), padding=(1,))
      (embed_positions): Embedding(1500, 1280)
      (layers): ModuleList(
        (0-31): 32 x WhisperEncoderLayer(
          (self_attn): WhisperSdpaAttention(
import torchaudio
from speechbrain.pretrained import VAD
VAD = VAD.from_hparams(source="speechbrain/vad-crdnn-libriparty", savedir="pretrained_models/vad-crdnn-libriparty")
test_file = 'a.wav'
boundaries = VAD.get_speech_segments(test_file)
segments = VAD.get_segments(boundaries, test_file)
"""
Python implementation of Viterbi algorithm for word segmentation
A clean-up of this: http://norvig.com/ngrams/ch14.pdf
-
You also need 'unigrams.txt' and 'bigrams.txt' to run the segmentation. The n-grams
used in this implementation are from the 'count_1w.txt' and 'count_2w.txt' files provided
here: http://norvig.com/ngrams/
-
Usage:
>>> from segment import viterbi
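The preview cuts off before the algorithm itself. As an illustration of the general idea only, here is a small Viterbi-style dynamic program over a unigram model. The counts in `TOY_COUNTS`, the unknown-word penalty, and the function names are all hypothetical; the gist loads real unigram and bigram counts instead and may differ in every detail:

```python
import math

# Hypothetical toy counts; a real run would load them from a unigram file.
TOY_COUNTS = {"this": 50, "is": 80, "a": 100, "test": 20, "i": 60, "sat": 5}
TOTAL = sum(TOY_COUNTS.values())
MAX_WORD_LEN = 20


def word_logprob(word):
    """Log-probability of a word; unseen words get a length-based penalty."""
    if word in TOY_COUNTS:
        return math.log(TOY_COUNTS[word] / TOTAL)
    return math.log(1.0 / (TOTAL * 10 ** len(word)))


def segment(text):
    """Return the most probable split of `text` under the unigram model."""
    n = len(text)
    best = [0.0] + [-math.inf] * n  # best[i]: best log-prob of text[:i]
    back = [0] * (n + 1)            # back[i]: start index of the last word
    for i in range(1, n + 1):
        for j in range(max(0, i - MAX_WORD_LEN), i):
            score = best[j] + word_logprob(text[j:i])
            if score > best[i]:
                best[i], back[i] = score, j
    # Follow the back-pointers from the end to recover the words.
    words = []
    i = n
    while i > 0:
        words.append(text[back[i]:i])
        i = back[i]
    return words[::-1]
```

With the toy counts above, `segment("thisisatest")` splits the string into `this`, `is`, `a`, `test`, because any split containing an unseen word is penalised more heavily than the whole all-known split.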
@egorsmkv
egorsmkv / flashlight-coreweave.md
Last active July 9, 2022 12:59
Installing Facebook's Flashlight (formerly wav2letter++) with CUDA 10 on Ubuntu 18.04, tested with a Tesla V100 on CoreWeave GPU Cloud
  • GPU: Tesla V100
  • Ubuntu 18.04
apt update
apt install cmake gcc-7 liblzma-dev libbz2-dev

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-repo-ubuntu1804_10.0.130-1_amd64.deb
dpkg -i cuda-repo-ubuntu1804_10.0.130-1_amd64.deb
@egorsmkv
egorsmkv / mphdict_words_forms.py
Created April 28, 2021 14:39
mphdict word forms generator in Python
"""
Generator of word forms for LinguisticAndInformationSystems/mphdict
Source code: https://github.com/LinguisticAndInformationSystems/mphdict/blob/master/src/mphdict/mphDb.cs#L214
License: https://github.com/LinguisticAndInformationSystems/mphdict/blob/master/LICENSE.txt
Copyright: uSofTrod
Output is like the following:
@egorsmkv
egorsmkv / quantum_resources.md
Last active March 21, 2021 10:13
This is my list of resources on the Quantum Technologies topic. You can suggest your links in the Comments section.

Quantum Resources

Websites

  • [Full-Stack Quantum Computation][3]
  • [Quantum Computing on Stack Exchange][15]

Social Groups

  • [Quantum Computing][4]
@egorsmkv
egorsmkv / algo.py
Last active August 31, 2020 10:15
An algorithm to find the longest date intervals in a list of dates in Python (currently measures intervals in months)
from datetime import datetime
from typing import List
def solve(date_items: List[datetime]):
    """
    Get the longest date interval from the date_items list.
    :param date_items:
    :return:
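The body of `solve` is not visible in this preview. The sketch below shows one possible reading, not the gist's actual method: it assumes "longest interval" means the largest gap between consecutive dates after sorting, reported in whole calendar months. The names `months_between` and `longest_gap` are hypothetical:

```python
from datetime import datetime
from typing import List, Optional, Tuple


def months_between(start: datetime, end: datetime) -> int:
    """Whole calendar months from start to end (assumes end >= start)."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day < start.day:
        months -= 1  # the final month is not complete yet
    return months


def longest_gap(date_items: List[datetime]) -> Optional[Tuple[datetime, datetime]]:
    """Return the consecutive pair of dates that lie furthest apart."""
    if len(date_items) < 2:
        return None
    ordered = sorted(date_items)
    pairs = zip(ordered, ordered[1:])
    return max(pairs, key=lambda pair: pair[1] - pair[0])
```

Calling `months_between(*longest_gap(dates))` then gives the length of that gap in months under the assumption stated above.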
import sys
ETC_SYSCONFIG_NE = 'OPTIONS="--web.listen-address=:{port} --collector.textfile.directory ' \
                   '/var/lib/node_exporter/textfile_collector --collector.systemd --collector.processes"'
ETC_SYSTEMD_NE = '''[Unit]
Description=Node Exporter
After=network.target
[Service]