Peter pszemraj

@pszemraj
pszemraj / README.md
Last active November 9, 2022 16:45
basic huggingface evaluate-based perplexity resources. All credit to https://huggingface.co/spaces/evaluate-metric/perplexity

How to Use

The metric takes a list of texts as input, as well as the name of the model used to compute perplexity:

from evaluate import load

perplexity = load("perplexity", module_type="metric")
predictions = ["lorem ipsum", "Happy Birthday!", "Bienvenue"]  # the texts to score
results = perplexity.compute(predictions=predictions, model_id='gpt2')
@pszemraj
pszemraj / batch_whisper_USAGE.md
Last active May 8, 2023 19:00
apply whisper to a folder of video files: cuts out silence from the videos with FFmpeg, converts them to audio, then runs whisper

whisper audio transcription - batched on directory

This gist is a super basic helper script for OpenAI's Whisper model and associated CLI to transcribe a directory of video files to text.

install

For Linux/Ubuntu, you can run the helper script:

bash linux_setup.sh
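For a rough idea of the workflow, here is a minimal Python sketch of the same steps, assuming ffmpeg and the openai-whisper CLI are installed and on PATH; the silence-filter settings, model size, and directory layout are illustrative, not the gist's exact values.

import subprocess
from pathlib import Path

video_dir = Path("videos")
for video in sorted(video_dir.glob("*.mp4")):
    audio = video.with_suffix(".mp3")
    # drop the video stream and strip long silences before transcription
    subprocess.run(
        ["ffmpeg", "-y", "-i", str(video), "-vn",
         "-af", "silenceremove=start_periods=1:stop_periods=-1:stop_duration=2:stop_threshold=-40dB",
         str(audio)],
        check=True,
    )
    # transcribe the cleaned audio with the whisper CLI
    subprocess.run(["whisper", str(audio), "--model", "base", "--output_format", "txt"], check=True)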
@pszemraj
pszemraj / run_cohere_summarization.py
Last active February 27, 2023 04:00
script to test summarization with the Cohere API
"""
run_cohere_summarization.py - Summarize text files with Co.Summarize API as a python script
"""
import argparse
import json
import logging
import os
import pprint as pp
import random
import shutil
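The preview cuts off before the API call; roughly, the script wraps the (now-legacy) Co.Summarize endpoint along these lines. Parameter names follow the cohere SDK's co.summarize documentation from that period and may have changed or been deprecated since; the API key and file handling here are placeholders.

import cohere

co = cohere.Client("YOUR_API_KEY")  # placeholder; the real script presumably reads the key from env/args
text = open("document.txt", encoding="utf-8").read()
response = co.summarize(
    text=text,
    length="medium",        # short / medium / long
    format="paragraph",     # paragraph / bullets
    extractiveness="low",   # how closely the summary sticks to the source wording
)
print(response.summary)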
@pszemraj
pszemraj / enable_tf32.py
Created March 7, 2023 22:22
basic rough function to enable TF32 matmul on Ampere-class GPUs
import subprocess

import torch


def check_ampere_gpu():
    """Check if the GPU is NVIDIA Ampere or newer and, if so, enable TF32 matmul in PyTorch."""
    cmd = "nvidia-smi --query-gpu=name --format=csv,noheader"
    output = subprocess.check_output(cmd, shell=True, universal_newlines=True)
    gpu_name = output.strip()
    # rough name-based check; only covers a few common Ampere cards
    if "A100" in gpu_name or "A6000" in gpu_name or "RTX 30" in gpu_name:
        torch.backends.cuda.matmul.allow_tf32 = True
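A common alternative that avoids parsing GPU names is to check the CUDA compute capability instead (Ampere and newer report a major version of 8 or higher); a sketch, not part of the gist:

import torch

def enable_tf32_if_supported() -> None:
    if torch.cuda.is_available():
        major, _minor = torch.cuda.get_device_capability(0)
        if major >= 8:  # Ampere is 8.x, Ada is 8.9, Hopper is 9.0
            torch.backends.cuda.matmul.allow_tf32 = True
            torch.backends.cudnn.allow_tf32 = True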
@pszemraj
pszemraj / dl_gauntlet.sh
Last active March 15, 2023 22:56
download "gauntlet" for summarization (peter's version) and run summarization inference on it with the textsum package
URL="https://www.dropbox.com/sh/zu1p7rhg5238a5y/AABsJN_pCYf9plSDZY8ziKATa?dl=1"
wget -O docs.zip "$URL"
unzip -B -j docs.zip -d gauntlet && rm docs.zip
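For the inference half, a sketch of summarizing the unzipped gauntlet directory with textsum, assuming the Summarizer class and summarize_string method shown in the textsum README (check the package docs if the API differs); file extensions and output naming are illustrative.

from pathlib import Path
from textsum.summarize import Summarizer

summarizer = Summarizer()  # loads the package's default long-document summarization model
for doc in sorted(Path("gauntlet").glob("*.txt")):
    summary = summarizer.summarize_string(doc.read_text(encoding="utf-8"))
    doc.with_suffix(".summary.txt").write_text(summary, encoding="utf-8")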
@pszemraj
pszemraj / eval_summaries.py
Last active September 4, 2023 16:38
unsupervised summary eval using several metrics, including a new 'max salient similarity' score to compute faithfulness w.r.t. original document.
"""
eval_summaries.py - evaluate summary/document pairs via a variety of metrics.
Metrics include max salient similarity, topic similarity, compression factor,
readability scores, and spelling error fraction.
Details:
    python eval_summaries.py --help
This script was developed while evaluating summaries generated with the textsum package.
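Two of the simpler metrics mentioned above, sketched for illustration; the gist's actual implementations (and the max salient similarity score in particular) are more involved, and the function names here are illustrative.

import textstat  # pip install textstat

def compression_factor(document: str, summary: str) -> float:
    """Words in the document per word in the summary (higher = more compression)."""
    return len(document.split()) / max(len(summary.split()), 1)

def summary_readability(summary: str) -> float:
    """Flesch reading ease of the summary (higher = easier to read)."""
    return textstat.flesch_reading_ease(summary)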
@pszemraj
pszemraj / inference_openai.py
Last active March 10, 2025 08:10
basic openai chat completion example
"""
inference_openai.py - text generation with OpenAI API
See https://platform.openai.com/docs/quickstart for more details.
Usage:
python inference_openai.py --prompt "The quick brown fox jumps over the lazy dog." --model "gpt-3.5-turbo" --temperature 0.5 --max_tokens 256 --n 1 --stop "."
Detailed usage:
python inference_openai.py --help
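A minimal chat completion call along the lines of the usage above, using the openai Python SDK v1+ client (the gist itself may target a different client version); the key is read from the OPENAI_API_KEY environment variable.

from openai import OpenAI

client = OpenAI()  # uses OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "The quick brown fox jumps over the lazy dog."}],
    temperature=0.5,
    max_tokens=256,
    n=1,
    stop=".",
)
print(response.choices[0].message.content)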
@pszemraj
pszemraj / bot_readme.md
Last active May 22, 2023 17:45
for Gio's slugbot project

Image Classification Telegram Bot

This script runs a Telegram bot that classifies images using a pre-trained model. The bot handles /start and /help commands, as well as photo messages. When a photo message is received, the bot downloads the photo, classifies it, and sends a message with the prediction.

The original intended use case is to classify if an image contains a slug or not:

is it a slug
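A minimal sketch of that flow, not the slugbot itself: python-telegram-bot v20+ plus a transformers image-classification pipeline; the model name, replies, and token handling are placeholders.

from telegram import Update
from telegram.ext import ApplicationBuilder, CommandHandler, ContextTypes, MessageHandler, filters
from transformers import pipeline

classifier = pipeline("image-classification", model="google/vit-base-patch16-224")  # placeholder model

async def start(update: Update, context: ContextTypes.DEFAULT_TYPE) -> None:
    await update.message.reply_text("Send me a photo and I'll classify it.")

async def classify_photo(update: Update, context: ContextTypes.DEFAULT_TYPE) -> None:
    tg_file = await update.message.photo[-1].get_file()  # largest available resolution
    local_path = await tg_file.download_to_drive()       # saves the photo locally
    top = classifier(str(local_path))[0]                  # best {label, score} prediction
    await update.message.reply_text(f"{top['label']} ({top['score']:.1%})")

app = ApplicationBuilder().token("YOUR_BOT_TOKEN").build()
app.add_handler(CommandHandler(["start", "help"], start))
app.add_handler(MessageHandler(filters.PHOTO, classify_photo))
app.run_polling()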

@pszemraj
pszemraj / hf_repo_download.py
Last active January 23, 2024 07:15
huggingface hub - download a full snapshot of a repository without using git
"""
hf_hub_download.py
This script allows you to download a snapshot repository from the Hugging Face Hub to a local directory without needing Git or loading the model.
Usage:
python hf_hub_download.py <repo_id> [options]
Arguments:
<repo_id> Repository ID in the format "organization/repository".
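This kind of script is essentially a thin wrapper around huggingface_hub.snapshot_download; a minimal equivalent (the local_dir value is illustrative):

from huggingface_hub import snapshot_download

local_path = snapshot_download(
    repo_id="organization/repository",  # the <repo_id> argument above
    local_dir="./repository",           # where to place the snapshot
)
print(f"Downloaded to {local_path}")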

reference for run_summarization

reference for transformers 4.30.0.dev0

about

The options below are additional configuration parameters that can be used when training a model with Hugging Face Transformers. They control various aspects of the training process, such as the optimizer to use, data loading settings, memory management, model evaluation, checkpointing, and integration with the Hugging Face Model Hub.
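A few of these expressed as transformers training arguments, with illustrative values (argument names match transformers ~4.30; some, like evaluation_strategy, have been renamed in later releases):

from transformers import Seq2SeqTrainingArguments

args = Seq2SeqTrainingArguments(
    output_dir="out",
    optim="adafactor",             # optimizer choice
    dataloader_num_workers=4,      # data loading
    gradient_checkpointing=True,   # memory management: recompute activations to save VRAM
    evaluation_strategy="steps",   # model evaluation
    save_strategy="steps",         # checkpointing
    push_to_hub=False,             # Hugging Face Model Hub integration
)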

Here is a summary of the high-level functionalities provided by some of the options: