@DaniruKun
DaniruKun / whisper-transcribe.bash
Last active November 7, 2024 07:15
Transcribe (and translate) any VOD (e.g. from Youtube) using Whisper from OpenAI and embed subtitles!
#!/usr/bin/env bash
# Small shell script to automatically download and transcribe live stream VODs.
# This uses yt-dlp, ffmpeg, and the C++ port of Whisper: https://github.com/ggerganov/whisper.cpp
# Use `./transcribe-vod help` to print help info.
# MIT License
# Copyright (c) 2022 Daniils Petrovs
@kinoc
kinoc / j6b_train_hf_ds.py
Last active September 17, 2024 18:53
So now you want to finetune that GPT-J-6B on a 3090/TITAN GPU ... okay, using HF and DeepSpeed too
# So now you want to finetune that GPT-J-6B on a 3090/TITAN GPU ... okay
# More exploratory coding. It uses the Huggingface model port, deepspeed and reads all text/md files from a target directory
# It is a fragment of a larger system with remote editing, but that's another story
# This is the raw, training tester. Items to look out for:
# - uses DeepSpeed and has a DS config
# - to save space uses SGD instead of ADAM
# - uses gradient checkpointing
# - freezes 25% of the layers to fit
# Assumes you can already run https://gist.github.com/kinoc/2d636a68876cd3de7b6e9c9452b61089
@ghing
ghing / README.md
Created March 23, 2020 18:13
Identifying which PDFs are image vs. text

Identifying which PDFs are images vs. text

Someone who was in my PDF text extraction session at NICAR 2020 asked how to identify image vs. text PDFs when you have thousands of files and they're a mixture of formats with the end goal of only running OCR software on the image PDFs.

This is how I would approach the problem using command-line tools.

Assumptions

  • You’re working on a Mac or Linux machine where you have access to some common command-line utilities such as find and sed
  • This should also work under the Windows Subsystem for Linux (WSL)
@Gordin
Gordin / cd_for_windows_paths.sh
Last active September 7, 2024 04:53
If you put this in your .bashrc/.zshrc you will be able to use cd to Windows style paths. This is probably only useful for WSL users.
cd() {
# Check if no arguments to make just typing cd<Enter> work
# Also check if the first argument starts with a - and let cd handle it
if [ $# -eq 0 ] || [[ $1 == -* ]]
then
builtin cd "$@"
return
fi
# If path exists, just cd into it
# (also, using $* and not $@ makes it so you don't have to escape spaces any more)
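The path translation that a wrapper like this relies on can be sketched in Python. This is a minimal illustration, not the gist's shell implementation: it assumes WSL's default convention of mounting Windows drives under `/mnt/<drive letter>`, and the function name `windows_to_wsl_path` is hypothetical.

```python
import re


def windows_to_wsl_path(path: str) -> str:
    """Translate a Windows-style path into its assumed WSL equivalent."""
    # A leading drive letter followed by a colon marks an absolute Windows path.
    match = re.match(r"^([A-Za-z]):[\\/]*(.*)$", path)
    if not match:
        # No drive letter: only normalise backslashes to forward slashes.
        return path.replace("\\", "/")
    drive, rest = match.groups()
    rest = rest.replace("\\", "/")
    # Assumption: drives are mounted at /mnt/<lowercase drive letter>.
    mount = f"/mnt/{drive.lower()}"
    return f"{mount}/{rest}" if rest else mount


if __name__ == "__main__":
    print(windows_to_wsl_path(r"C:\Users\me\My Documents"))
```

Paths without a drive letter are passed through with only their separators changed, so ordinary Unix paths keep working unchanged.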
@JoeyBurzynski
JoeyBurzynski / 55-bytes-of-css.md
Last active April 8, 2025 14:18
58 bytes of css to look great nearly everywhere

58 bytes of CSS to look great nearly everywhere

When making this website, I wanted a simple, reasonable way to make it look good on most displays. Not counting any minification techniques, the following 58 bytes worked well for me:

main {
  max-width: 38rem;
  padding: 2rem;
  margin: auto;
}
@sloanlance
sloanlance / jq_jsonl_conversion.md
Last active April 18, 2025 14:28
jq: JSONL ↔︎ JSON conversion

Prerequisites

  • jq (https://jqlang.github.io/jq/): "like sed for JSON data"

    There are several options available for installing jq. I prefer to use Homebrew: brew install jq

  1. JSONL → JSON
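For readers who want to see what the two conversions amount to, here is a minimal Python sketch using only the standard `json` module. It is a stand-in for illustration, not the jq commands the gist describes, and the function names `jsonl_to_json` and `json_to_jsonl` are hypothetical.

```python
import json


def jsonl_to_json(jsonl_text: str) -> str:
    """Collect one JSON value per non-blank line into a single JSON array."""
    records = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
    return json.dumps(records)


def json_to_jsonl(json_text: str) -> str:
    """Write each element of a JSON array on its own line."""
    records = json.loads(json_text)
    return "\n".join(json.dumps(record) for record in records)


if __name__ == "__main__":
    array_text = jsonl_to_json('{"a": 1}\n{"a": 2}\n')
    print(array_text)
    print(json_to_jsonl(array_text))
```

This version reads the whole input into memory, which is fine for small files; very large JSONL files would be better processed line by line.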

@VirtuBox
VirtuBox / nginx-geoip-module.md
Last active January 24, 2024 08:44
How to configure GeoIP module for Nginx

Create a folder to store the databases:

mkdir -p /usr/share/GeoIP

Download Country IP database

wget http://geolite.maxmind.com/download/geoip/database/GeoLiteCountry/GeoIP.dat.gz
gunzip GeoIP.dat.gz
@claczny
claczny / fuzzymatch_titles.py
Created January 6, 2017 15:45
Python code to fuzzy match two files (A and B) of titles to find missing titles in B, i.e., multiplications in A. Not very efficient, but does the job.
from fuzzywuzzy import fuzz
from fuzzywuzzy import process
from collections import Counter
A_title_file = "/tmp/A_titles.txt"
B_title_file = "/tmp/B_titles.txt"
# Open the files and get the titles
A_titles = []
with open(A_title_file) as f:
@Nilpo
Nilpo / Using Git to Manage a Live Web Site.md
Last active April 18, 2025 15:39
Using Git to Manage a Live Web Site

Overview

As a freelancer, I build a lot of web sites. That's a lot of code changes to track. Thankfully, a Git-enabled workflow with proper branching makes short work of project tracking. I can easily see development features in branches as well as a snapshot of the sites' production code. A nice addition to that workflow is the ability to use Git to push updates to any of the various sites I work on while committing changes.

Contents

@zhujunsan
zhujunsan / Using Github Deploy Key.md
Last active April 18, 2025 16:10
Using Github Deploy Key

What / Why

A deploy key is an SSH key set on your repo to grant a client read-only (or read/write, if you want) access to that repo.

As the name says, its primary function is to replace a username/password in the deploy process, where only read access is needed. This keeps the repo safe from attack if the server is compromised.

How to

  1. Generate an SSH key