Arvindra Sehmi asehmi

asehmi / printarr
Created February 26, 2023 00:48 — forked from nmwsharp/printarr
Pretty print tables summarizing properties of tensor arrays in numpy, pytorch, jax, etc.
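The printarr source is not visible in this listing. As a rough illustration of the idea, here is a minimal duck-typed sketch. It assumes each array exposes `shape`, `dtype`, `min()`, `max()` and `mean()` (as NumPy arrays and PyTorch tensors do). The name `summarize_arrays` and its column set are hypothetical, not printarr's actual interface.

```python
from typing import Any


def _stat(arr: Any, method: str) -> str:
    """Call arr.<method>() and format the result; return 'N/A' if that fails."""
    try:
        value = getattr(arr, method)()
        # 0-d arrays/tensors expose .item(); unwrap them to a plain Python number.
        value = value.item() if hasattr(value, "item") else value
        return f"{value:.4g}"
    except Exception:
        return "N/A"


def summarize_arrays(**arrays: Any) -> str:
    """Build a plain-text table with one row per named array (hypothetical helper)."""
    header = ["name", "dtype", "shape", "min", "max", "mean"]
    rows = [header]
    for name, arr in arrays.items():
        rows.append([
            name,
            str(getattr(arr, "dtype", type(arr).__name__)),
            str(tuple(getattr(arr, "shape", ()))),
            _stat(arr, "min"),
            _stat(arr, "max"),
            _stat(arr, "mean"),
        ])
    # Pad every column to the width of its widest cell.
    widths = [max(len(row[i]) for row in rows) for i in range(len(header))]
    lines = [" | ".join(cell.ljust(w) for cell, w in zip(row, widths)) for row in rows]
    lines.insert(1, "-+-".join("-" * w for w in widths))
    return "\n".join(lines)
```

For example, `print(summarize_arrays(x=np.zeros((2, 3))))` would print a header, a separator and one row for `x`. Statistics that a given array type cannot compute (e.g. `mean()` on an integer tensor in PyTorch) fall back to `N/A` instead of raising.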
asehmi / download_URLs_in_file.py
Created January 23, 2024 07:10 — forked from pszemraj/download_URLs_in_file.py
pdf downloading utils
import os
import argparse
import requests
from urllib.parse import urlparse
from tqdm import tqdm
from joblib import Parallel, delayed
from tenacity import retry, stop_after_attempt, wait_fixed
@retry(stop=stop_after_attempt(5), wait=wait_fixed(2))
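The preview stops at the retry decorator, so the download function itself is not shown. The sketch below is stdlib-only and illustrates two of the ideas the imports point to. `retry_fixed` is a hand-rolled stand-in for tenacity's `stop_after_attempt` + `wait_fixed`, and `filename_from_url` derives a local filename from the URL path with `urlparse`. Both names are hypothetical.

```python
import functools
import os
import time
from urllib.parse import unquote, urlparse


def retry_fixed(attempts: int = 5, wait_seconds: float = 2.0):
    """Retry the wrapped call up to `attempts` times, sleeping a fixed
    interval between failures (stdlib stand-in for tenacity's
    stop_after_attempt + wait_fixed)."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, attempts + 1):
                try:
                    return func(*args, **kwargs)
                except Exception:
                    if attempt == attempts:
                        raise  # out of attempts: surface the last error
                    time.sleep(wait_seconds)
        return wrapper
    return decorator


def filename_from_url(url: str, default: str = "download.pdf") -> str:
    """Use the last (percent-decoded) path segment of the URL as the filename."""
    name = os.path.basename(unquote(urlparse(url).path))
    return name or default
```

In a full script, the decorated function would typically wrap a `requests.get(...)` call. The URLs could then be fanned out with joblib's `Parallel(n_jobs=...)(delayed(fn)(url) for url in urls)` pattern, which matches the imports above.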
asehmi / nougat_em.sh
Created January 23, 2024 07:12 — forked from pszemraj/nougat_em.sh
bash script to apply facebookresearch/nougat on a directory of PDFs
#!/bin/bash
# pip install nougat-ocr
# see https://github.com/facebookresearch/nougat for details and license
DEFAULT_BATCHSIZE=4
usage() {
  echo "Usage: $0 <path_to_directory> [--batchsize BATCHSIZE]"
  exit 1
}
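For comparison, here is a Python sketch of the same argument handling as the bash `usage()` above. It takes a positional directory and an optional `--batchsize` that defaults to 4, as `DEFAULT_BATCHSIZE` does, and then collects the PDFs in that directory. It stops short of running nougat itself, since the preview doesn't show how the script invokes it. `parse_args` and `find_pdfs` are hypothetical names.

```python
import argparse
from pathlib import Path
from typing import List, Optional

DEFAULT_BATCHSIZE = 4  # mirrors DEFAULT_BATCHSIZE in the bash script


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(
        description="Apply nougat to a directory of PDFs (sketch)."
    )
    parser.add_argument("path_to_directory", type=Path)
    parser.add_argument("--batchsize", type=int, default=DEFAULT_BATCHSIZE)
    return parser.parse_args(argv)


def find_pdfs(directory: Path) -> List[Path]:
    """Return the PDF files directly inside `directory`, sorted for stable order."""
    if not directory.is_dir():
        raise NotADirectoryError(f"{directory} is not a directory")
    return sorted(
        p for p in directory.iterdir() if p.is_file() and p.suffix.lower() == ".pdf"
    )
```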
asehmi / hf_repo_download.py
Created January 23, 2024 07:15 — forked from pszemraj/hf_repo_download.py
huggingface hub - download a full snapshot of a repository without using git
"""
hf_hub_download.py
This script downloads a snapshot of a repository from the Hugging Face Hub to a local directory without needing Git or loading the model.
Usage:
python hf_hub_download.py <repo_id> [options]
Arguments:
<repo_id> Repository ID in the format "organization/repository".
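The preview doesn't show which library call performs the download. The `huggingface_hub` library provides `snapshot_download(repo_id=..., local_dir=...)`, which fetches every file of a repository over HTTP without Git. The sketch below assumes that call and adds a check for the "organization/repository" format described above. `validate_repo_id` and `download_snapshot` are hypothetical names.

```python
import re
from typing import Optional

# "organization/repository": two non-empty segments of letters, digits, '_', '.', '-'
_REPO_ID_PATTERN = re.compile(r"[\w.-]+/[\w.-]+")


def validate_repo_id(repo_id: str) -> str:
    """Raise ValueError unless repo_id looks like 'organization/repository'."""
    if not _REPO_ID_PATTERN.fullmatch(repo_id):
        raise ValueError(
            f"Invalid repo_id {repo_id!r}; expected 'organization/repository'"
        )
    return repo_id


def download_snapshot(repo_id: str, local_dir: Optional[str] = None) -> str:
    """Download all files of a Hub repository and return the local folder path."""
    # Imported lazily so the validation above works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=validate_repo_id(repo_id), local_dir=local_dir)
```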
asehmi / grammar_synthesis.py
Created January 23, 2024 07:17 — forked from pszemraj/grammar_synthesis.py
basic implementation of a custom wrapper class for using the grammar synthesis text2text models
"""
Class for correcting text using a pretrained grammar synthesis model.
- models are available here: https://hf.co/models?other=grammar%20synthesis
requirements for this snippet:
pip install -U transformers accelerate
NOTE: to load the model in 8-bit so it fits on a smaller GPU, you also need bitsandbytes:
pip install -U transformers accelerate bitsandbytes
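The wrapper class itself is cut off in this preview. Below is a minimal sketch of such a wrapper, assuming it builds on transformers' `pipeline("text2text-generation", ...)`. For a list of input strings, that pipeline returns one dict per input with a `"generated_text"` key. The class name `GrammarCorrector`, the `max_length` default and the injectable `generator` argument (handy for testing without downloading a model) are assumptions, not the gist's design.

```python
from typing import Callable, List, Optional


class GrammarCorrector:
    """Thin wrapper around a text2text-generation pipeline (hypothetical sketch)."""

    def __init__(
        self,
        model_name: str,
        device: int = -1,  # -1 = CPU, 0 = first GPU (transformers convention)
        generator: Optional[Callable] = None,
        max_length: int = 128,
    ):
        if generator is None:
            # Imported lazily so the class can be exercised without transformers.
            from transformers import pipeline

            generator = pipeline("text2text-generation", model=model_name, device=device)
        self.generator = generator
        self.max_length = max_length

    def correct(self, texts: List[str]) -> List[str]:
        """Return one corrected string per input string."""
        outputs = self.generator(texts, max_length=self.max_length)
        return [out["generated_text"] for out in outputs]
```

Pass any model ID from the linked Hub listing as `model_name`. The model is loaded once in `__init__` and reused for every `correct()` call.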
asehmi / compute_embeddings_e5.py
Created January 23, 2024 07:19 — forked from pszemraj/compute_embeddings_e5.py
helper script using only transformers/torch to compute text embeddings (for e5 models: https://huggingface.co/intfloat/e5-base-v2)
import torch
import torch.nn.functional as F
from torch import Tensor
from transformers import AutoTokenizer, AutoModel
from pandas import DataFrame
from typing import List, Union
from tqdm.auto import tqdm, trange
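The imports above (`torch.nn.functional as F`, `Tensor`) fit the recipe shown on the intfloat E5 model cards. That recipe averages the last hidden states over non-padding tokens using the attention mask, then L2-normalizes the result, with inputs prefixed by `"query: "` or `"passage: "`. The script body isn't shown here, so the sketch below only illustrates that pooling math. It uses plain Python lists so it runs without torch, and the function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def average_pool(hidden_states: List[List[Vector]], attention_mask: List[List[int]]) -> List[Vector]:
    """Mean of token vectors per sequence, counting only tokens with mask == 1."""
    pooled = []
    for tokens, mask in zip(hidden_states, attention_mask):
        kept = [vec for vec, m in zip(tokens, mask) if m]
        dim = len(tokens[0])
        pooled.append([sum(vec[i] for vec in kept) / len(kept) for i in range(dim)])
    return pooled


def l2_normalize(vectors: List[Vector]) -> List[Vector]:
    """Scale each vector to unit Euclidean length (akin to F.normalize(x, p=2, dim=1))."""
    out = []
    for vec in vectors:
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0  # avoid division by zero
        out.append([x / norm for x in vec])
    return out
```

Masking matters because padded positions still produce hidden states. Without the mask, shorter sequences in a batch would have their embeddings pulled toward the padding vectors.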
asehmi / remind.md
Created January 23, 2024 07:42 — forked from pszemraj/remind.md
guess what it does

You:

I want you to act as a hypothetical python package generator and engineer. Create a python package called “remind”

ChatGPT:

As a text-based AI, I am unable to create actual Python packages or code files. However, I can help you design the structure of the package and provide you with code snippets that you can use to create the package yourself.

Package name: remind

Description: A Python package to create, manage, and schedule reminders for tasks and events.
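The transcript is cut off before any code appears. Purely as a hypothetical illustration of the core data model this description implies (create reminders, then ask which ones are due), here is a small sketch. `Reminder`, `ReminderList` and their methods are invented names, not part of any published `remind` package.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass(order=True)
class Reminder:
    due: datetime
    message: str = field(compare=False)  # ordering uses the due time only


class ReminderList:
    """Keeps reminders sorted by due time (hypothetical API)."""

    def __init__(self) -> None:
        self._items: List[Reminder] = []

    def add(self, message: str, due: datetime) -> Reminder:
        reminder = Reminder(due=due, message=message)
        self._items.append(reminder)
        self._items.sort()
        return reminder

    def due_by(self, now: datetime) -> List[Reminder]:
        """Reminders whose due time is at or before `now`, earliest first."""
        return [r for r in self._items if r.due <= now]
```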

asehmi / caption_image.py
Created January 23, 2024 07:42 — forked from pszemraj/caption_image.py
basic captioning example using the lavis library
# -*- coding: utf-8 -*-
"""
caption_image.py - basic captioning example using lavis
usage: caption_image.py [-h] -i IMAGE_PATH [-m MODEL_TYPE] [-d DEVICE] [-v VERBOSE]
# lavis
https://github.com/salesforce/LAVIS
"""
asehmi / extract_article.py
Created January 23, 2024 07:43 — forked from pszemraj/extract_article.py
Given a URL to a website, extracts the article text from the page, formatted as Markdown.
"""
define fn for extracting articles
Example usage:
url = 'https://www.businessinsider.com/hundreds-google-workers-walk-out-zurich-protest-over-layoffs-2023-3'
markdown = extract_article(url)
print(markdown)
"""
asehmi / imdir2pdf.py
Created January 23, 2024 07:43 — forked from pszemraj/imdir2pdf.py
image to pdf processing for printing art
"""
A Python script that, given a directory path as input, converts all images in the directory to PDF files. Several other argument switches are available for convenience.
"""
import argparse
import logging
import os
import pprint as pp
import sys
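The preview ends at the imports, so the conversion loop isn't shown. Here is a sketch of one way to do it with Pillow, which can write PDFs via `Image.save(path, "PDF")`. The set of accepted extensions and the names `plan_conversions` and `convert_all` are assumptions. Pillow is imported lazily so the file-planning step can run on its own.

```python
from pathlib import Path
from typing import List, Tuple

# Assumed set of input formats; adjust to whatever Pillow can read on your system.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".tif", ".tiff", ".bmp", ".webp"}


def plan_conversions(input_dir: Path, output_dir: Path) -> List[Tuple[Path, Path]]:
    """Pair each image file in input_dir with the PDF path it should be written to."""
    images = sorted(
        p for p in input_dir.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )
    return [(img, output_dir / (img.stem + ".pdf")) for img in images]


def convert_all(input_dir: Path, output_dir: Path) -> List[Path]:
    from PIL import Image  # Pillow; imported lazily so planning works without it

    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for src, dst in plan_conversions(input_dir, output_dir):
        with Image.open(src) as im:
            # Convert to RGB first: Pillow's PDF writer does not accept every image mode.
            im.convert("RGB").save(dst, "PDF")
        written.append(dst)
    return written
```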