Arvindra Sehmi asehmi

@asehmi
asehmi / compute_embeddings_e5.py
Created January 23, 2024 07:19 — forked from pszemraj/compute_embeddings_e5.py
helper script using just transformers/torch to compute text embeddings (for e5 models https://huggingface.co/intfloat/e5-base-v2 )
import torch
import torch.nn.functional as F
from torch import Tensor
from transformers import AutoTokenizer, AutoModel
from pandas import DataFrame
from typing import List, Union
from tqdm.auto import tqdm, trange
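The preview above stops at the imports. As a rough sketch of the pooling step a script like this typically needs, and assuming the mean-pooling recipe shown on the intfloat/e5-base-v2 model card (not necessarily what this gist does), token embeddings can be averaged over non-padding positions and then L2-normalized. The function name `average_pool` and the toy tensors are illustrative stand-ins for real model output.

```python
import torch
import torch.nn.functional as F
from torch import Tensor


def average_pool(last_hidden_states: Tensor, attention_mask: Tensor) -> Tensor:
    """Mean of token embeddings over non-padding positions."""
    mask = attention_mask[..., None].bool()
    # Zero out padding positions so they do not contribute to the sum.
    summed = last_hidden_states.masked_fill(~mask, 0.0).sum(dim=1)
    # Number of real (non-padding) tokens per sequence.
    counts = attention_mask.sum(dim=1)[..., None]
    return summed / counts


# Toy stand-in for model output: batch of 2, 3 tokens each, hidden size 2.
hidden = torch.tensor(
    [
        [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]],  # third token is padding
        [[2.0, 0.0], [4.0, 0.0], [6.0, 0.0]],
    ]
)
mask = torch.tensor([[1, 1, 0], [1, 1, 1]])

pooled = average_pool(hidden, mask)
# Unit-length vectors make dot products equal to cosine similarity.
embeddings = F.normalize(pooled, p=2, dim=1)
```

With a real model, `hidden` would be the `last_hidden_state` returned by `AutoModel` and `mask` the tokenizer's `attention_mask`. The E5 model card also asks for inputs to be prefixed with `query: ` or `passage: ` before tokenization.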
@asehmi
asehmi / grammar_synthesis.py
Created January 23, 2024 07:17 — forked from pszemraj/grammar_synthesis.py
basic implementation of a custom wrapper class for using the grammar synthesis text2text models
"""
Class for correcting text using a pretrained grammar synthesis model.
- models are available here: https://hf.co/models?other=grammar%20synthesis
requirements for this snippet:
pip install -U transformers accelerate
NOTE: if you want to use 8-bit quantization to fit the model on a smaller GPU, you need bitsandbytes:
pip install -U transformers accelerate bitsandbytes
@asehmi
asehmi / hf_repo_download.py
Created January 23, 2024 07:15 — forked from pszemraj/hf_repo_download.py
huggingface hub - download a full snapshot of a repository without using git
"""
hf_hub_download.py
This script downloads a snapshot of a repository from the Hugging Face Hub to a local directory, without needing Git or loading the model.
Usage:
python hf_hub_download.py <repo_id> [options]
Arguments:
<repo_id> Repository ID in the format "organization/repository".
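The usage text above is cut off before its options are listed. A minimal sketch of how such a command line could be wired up, assuming the script relies on `huggingface_hub.snapshot_download`; the `--local-dir` and `--repo-type` flags and the helper names here are hypothetical, not taken from the gist.

```python
import argparse
from typing import Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Download a snapshot of a Hugging Face Hub repository."
    )
    parser.add_argument(
        "repo_id",
        help='Repository ID in the format "organization/repository".',
    )
    # Hypothetical options; the gist's real option list is not shown above.
    parser.add_argument("--local-dir", default=None, help="Target directory.")
    parser.add_argument(
        "--repo-type",
        default="model",
        choices=["model", "dataset", "space"],
    )
    return parser


def download(repo_id: str, local_dir: Optional[str] = None, repo_type: str = "model") -> str:
    # Imported lazily so argument parsing works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    # Returns the local path of the downloaded snapshot.
    return snapshot_download(repo_id=repo_id, repo_type=repo_type, local_dir=local_dir)


# Parse an explicit argument list for illustration; download() is not called
# here because it needs network access.
args = build_parser().parse_args(["intfloat/e5-base-v2", "--local-dir", "./e5"])
```

Calling `download(args.repo_id, args.local_dir, args.repo_type)` would then fetch the repository files over HTTP, with no Git checkout involved.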
@asehmi
asehmi / nougat_em.sh
Created January 23, 2024 07:12 — forked from pszemraj/nougat_em.sh
bash script to apply facebookresearch/nougat on a directory of PDFs
#!/bin/bash
# pip install nougat-ocr
# see https://github.com/facebookresearch/nougat for details and license
DEFAULT_BATCHSIZE=4
usage() {
echo "Usage: $0 <path_to_directory> [--batchsize BATCHSIZE]"
exit 1
@asehmi
asehmi / download_URLs_in_file.py
Created January 23, 2024 07:10 — forked from pszemraj/download_URLs_in_file.py
pdf downloading utils
import os
import argparse
import requests
from urllib.parse import urlparse
from tqdm import tqdm
from joblib import Parallel, delayed
from tenacity import retry, stop_after_attempt, wait_fixed
@retry(stop=stop_after_attempt(5), wait=wait_fixed(2))
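The preview ends at the decorator, so the decorated function is not shown. To illustrate what `stop_after_attempt(5)` with `wait_fixed(2)` configures, here is a standard-library sketch of the same policy: up to five attempts with a fixed pause between them. The names `retry_fixed` and `flaky` are hypothetical. Unlike tenacity, which by default raises `RetryError` once attempts run out, this sketch re-raises the last exception.

```python
import time
from functools import wraps


def retry_fixed(attempts: int = 5, wait_seconds: float = 2.0):
    """Retry the wrapped function up to `attempts` times, pausing between tries."""

    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, attempts + 1):
                try:
                    return func(*args, **kwargs)
                except Exception:
                    if attempt == attempts:
                        raise  # out of attempts: surface the last error
                    time.sleep(wait_seconds)

        return wrapper

    return decorator


calls = {"count": 0}


@retry_fixed(attempts=5, wait_seconds=0)  # zero wait keeps the demo fast
def flaky() -> str:
    # Fails on the first two calls, then succeeds.
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


result = flaky()
```

For real downloads, a fixed wait is the simplest policy. Exponential backoff is a common alternative when a server is rate limiting.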
@asehmi
asehmi / printarr
Created February 26, 2023 00:48 — forked from nmwsharp/printarr
Pretty print tables summarizing properties of tensor arrays in numpy, pytorch, jax, etc.