Yong Yuan willard-yuan
luistung / tokenization.cpp
Created October 11, 2019 12:02
C++ version of the BERT tokenizer
#include <iostream>
#include <fstream>
#include <string>
#include <vector>
#include <unordered_map>
#include <boost/algorithm/string.hpp>
#include <utf8proc.h>
//https://unicode.org/reports/tr15/#Norm_Forms
//https://ssl.icu-project.org/apiref/icu4c/uchar_8h.html
madaan / safetensors_to_pytorch_ckpt.py
Last active July 8, 2025 02:30
Safetensors to pytorch checkpoint
from safetensors.torch import load_file
from glob import glob
import torch
from tqdm import tqdm
def main(base_path: str):
    """
    Convert safetensors files to PyTorch checkpoint files.