Skip to content

Instantly share code, notes, and snippets.

@akhan619
akhan619 / tokenizers.md
Last active October 31, 2023 10:22
Exploring Tokenizers from Hugging Face

Exploring Tokenizers from Hugging Face

Hugging Face (HF) has made NLP (Natural Language Processing) a breeze. In this post, we are going to take a look at tokenization using a hands on approach with the help of the Tokenizers library. We are going to load a real world dataset containing 10-K filings of public firms and see how to train a tokenizer from scratch based on the BERT tokenization scheme. In the process we will understand tokenization in detail and some gotchas to keep an eye out for.

Background on NLP (Optional)

If you already have an understanding of the NLP pipeline, you can safely skip this section.

For any NLP task, one of the first steps is pre-processing the data so that it can be fed into our NLP models. For those new to NLP, the general pipeline for any NLP task (text classification, question answering, etc.) is as follows:

@manuelmazzuola
manuelmazzuola / restclient.go
Last active March 10, 2025 17:55
how to implement a RESTClientGetter for helm action pkg
package main
import (
"k8s.io/apimachinery/pkg/api/meta"
"k8s.io/client-go/discovery"
"k8s.io/client-go/discovery/cached/memory"
"k8s.io/client-go/rest"
"k8s.io/client-go/restmapper"
"k8s.io/client-go/tools/clientcmd"
"k8s.io/client-go/tools/clientcmd/api"
@willprice
willprice / README.md
Last active April 28, 2025 16:42
Install OpenCV 4.1.2 for Raspberry Pi 3 or 4 (Raspbian Buster)

Install OpenCV 4.1.2 on Raspbian Buster

$ chmod +x *.sh
$ ./download-opencv.sh
$ ./install-deps.sh
$ ./build-opencv.sh
$ cd ~/opencv/opencv-4.1.2/build
$ sudo make install
@mozillazg
mozillazg / app.py
Created December 5, 2017 00:06
A simple demo for how to use flask-paginate.
from flask import Flask, render_template
from flask_paginate import Pagination, get_page_args
app = Flask(__name__)
app.template_folder = ''
users = list(range(100))
def get_users(offset=0, per_page=10):
@hediet
hediet / main.md
Last active April 7, 2025 18:00
Proof that TypeScript's Type System is Turing Complete
type StringBool = "true"|"false";


interface AnyNumber { prev?: any, isZero: StringBool };
interface PositiveNumber { prev: any, isZero: "false" };

type IsZero<TNumber extends AnyNumber> = TNumber["isZero"];
type Next<TNumber extends AnyNumber> = { prev: TNumber, isZero: "false" };
type Prev<TNumber extends PositiveNumber> = TNumber["prev"];
@simonw
simonw / recover_source_code.md
Last active September 28, 2024 08:10
How to recover lost Python source code if it's still resident in-memory

How to recover lost Python source code if it's still resident in-memory

I screwed up using git ("git checkout --" on the wrong file) and managed to delete the code I had just written... but it was still running in a process in a docker container. Here's how I got it back, using https://pypi.python.org/pypi/pyrasite/ and https://pypi.python.org/pypi/uncompyle6

Attach a shell to the docker container

Install GDB (needed by pyrasite)

apt-get update && apt-get install gdb
@joepie91
joepie91 / index.js
Last active December 17, 2024 10:45
Breaking CloudFlare's "I'm Under Attack" challenge
'use strict';
const parseExpression = require("./parse-expression");
function findAll(regex, target) {
let results = [], match;
while (match = regex.exec(target)) {
results.push(match);
}
@markwallsgrove
markwallsgrove / go-update.go
Created February 18, 2016 20:21
go-update tutorial, fast start
package main
// https://godoc.org/github.com/inconshreveable/go-update
// This gist documents go-install using a SHA-256 checksum,
// elliptic curve (prime 256) encrypted signature & bsdiff
// formatted patch.
// These steps took me over a hour to figure. go-install currently
// doesn't include a tutorial or quick start guide, so I created this
@shmup
shmup / torrents.md
Last active May 7, 2025 17:50
transmission blocklist guide

Transmission Blocklist

The Transmission torrent client has an option to set a Blocklist, which helps protect you from getting caught and having the DMCA send a letter/email.

It's as simple as downloading and installing the latest client:

@abhishektomar
abhishektomar / elk.sh
Last active October 12, 2022 10:02
Bash Script to Install Elastic Search, Logstash and Kibana
#!/bin/bash
# Checking whether user has enough permission to run this script
sudo -n true
if [ $? -ne 0 ]
then
echo "This script requires user to have passwordless sudo access"
exit
fi