import torch
import torch.nn.functional as F
from torch import Tensor
from transformers import AutoTokenizer, AutoModel
from pandas import DataFrame
from typing import List, Union
from tqdm.auto import tqdm, trange
"""
Class for correcting text using a pretrained grammar synthesis model.
- models are available here: https://hf.co/models?other=grammar%20synthesis
requirements for this snippet:
pip install -U transformers accelerate
NOTE: if you want to use 8-bit to fit the model on a smaller GPU, you need bitsandbytes:
pip install -U transformers accelerate bitsandbytes
"""
hf_hub_download.py
This script downloads a snapshot of a repository from the Hugging Face Hub to a local directory without needing Git or loading the model.
Usage:
    python hf_hub_download.py <repo_id> [options]
Arguments:
    <repo_id>    Repository ID in the format "organization/repository".
#!/bin/bash
# pip install nougat-ocr
# see https://github.com/facebookresearch/nougat for details and license
DEFAULT_BATCHSIZE=4
usage() {
    echo "Usage: $0 <path_to_directory> [--batchsize BATCHSIZE]"
    exit 1
import os
import argparse
import requests
from urllib.parse import urlparse
from tqdm import tqdm
from joblib import Parallel, delayed
from tenacity import retry, stop_after_attempt, wait_fixed
@retry(stop=stop_after_attempt(5), wait=wait_fixed(2))
Pretty print tables summarizing properties of tensor arrays in numpy, pytorch, jax, etc.
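The gist body is not shown here, so the following is only a minimal sketch of what such a summary table could look like, not the author's implementation. It assumes each array exposes `.shape` and `.dtype`, which NumPy, PyTorch, and JAX arrays all do. The names `summarize_arrays` and `FakeArray` are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Tuple


def summarize_arrays(arrays: Mapping[str, Any]) -> str:
    """Return a plain-text table listing the name, shape, and dtype of each array.

    Assumption: every value has `.shape` (an iterable of ints) and `.dtype`.
    """
    rows = [("name", "shape", "dtype")]
    for name, arr in arrays.items():
        # An empty shape means a 0-dimensional array; label it "scalar".
        shape = "x".join(str(dim) for dim in arr.shape) or "scalar"
        rows.append((name, shape, str(arr.dtype)))

    # Pad each column to the width of its longest cell.
    widths = [max(len(row[i]) for row in rows) for i in range(3)]
    lines = [
        "  ".join(cell.ljust(width) for cell, width in zip(row, widths)).rstrip()
        for row in rows
    ]
    # Separator line between the header and the data rows.
    lines.insert(1, "  ".join("-" * width for width in widths))
    return "\n".join(lines)


@dataclass
class FakeArray:
    """Hypothetical stand-in for a real array so the demo needs no third-party package."""

    shape: Tuple[int, ...]
    dtype: str


if __name__ == "__main__":
    demo = {
        "weights": FakeArray((3, 4), "float32"),
        "bias": FakeArray((), "int64"),
    }
    print(summarize_arrays(demo))
```

Because the function relies only on duck typing, a dict of real `numpy.ndarray` or `torch.Tensor` objects can be passed in place of the stand-ins.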