Peter pszemraj

@pszemraj
pszemraj / README.md
Last active November 9, 2022 16:45
basic huggingface evaluate-based perplexity resources. All credit to https://huggingface.co/spaces/evaluate-metric/perplexity

How to Use

The metric takes a list of texts as input, as well as the name of the model used to compute perplexity:

from evaluate import load

perplexity = load("perplexity", module_type="metric")
predictions = ["lorem ipsum", "Happy Birthday!", "Bienvenue"]  # the texts to score
results = perplexity.compute(predictions=predictions, model_id='gpt2')
@pszemraj
pszemraj / batch_whisper_USAGE.md
Last active May 8, 2023 19:00
apply whisper to a folder of video files: cuts out silence from the videos with FFmpeg, converts them to audio, then runs whisper

whisper audio transcription - batched on directory

This gist is a super basic helper script for OpenAI's Whisper model and associated CLI to transcribe a directory of video files to text.

install

For Linux/Ubuntu, you can run the helper script:

bash linux_setup.sh
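For a rough idea of the workflow, here is a minimal Python sketch of the same steps, assuming ffmpeg and the openai-whisper CLI are installed and on PATH; the silence-filter settings, model size, and directory layout are illustrative, not the gist's exact values.

import subprocess
from pathlib import Path

video_dir = Path("videos")
for video in sorted(video_dir.glob("*.mp4")):
    audio = video.with_suffix(".mp3")
    # drop the video stream and strip long silences before transcription
    subprocess.run(
        ["ffmpeg", "-y", "-i", str(video), "-vn",
         "-af", "silenceremove=start_periods=1:stop_periods=-1:stop_duration=2:stop_threshold=-40dB",
         str(audio)],
        check=True,
    )
    # transcribe the cleaned audio with the whisper CLI
    subprocess.run(["whisper", str(audio), "--model", "base", "--output_format", "txt"], check=True)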
@pszemraj
pszemraj / run_cohere_summarization.py
Last active February 27, 2023 04:00
script to test summarization with the Cohere API
"""
run_cohere_summarization.py - Summarize text files with Co.Summarize API as a python script
"""
import argparse
import json
import logging
import os
import pprint as pp
import random
import shutil
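The preview cuts off before the API call; roughly, the script wraps the (now-legacy) Co.Summarize endpoint along these lines. Parameter names follow the cohere SDK's co.summarize documentation from that period and may have changed or been deprecated since; the API key and file handling here are placeholders.

import cohere

co = cohere.Client("YOUR_API_KEY")  # placeholder; the real script presumably reads the key from env/args
text = open("document.txt", encoding="utf-8").read()
response = co.summarize(
    text=text,
    length="medium",        # short / medium / long
    format="paragraph",     # paragraph / bullets
    extractiveness="low",   # how closely the summary sticks to the source wording
)
print(response.summary)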
@pszemraj
pszemraj / enable_tf32.py
Created March 7, 2023 22:22
basic rough function to enable TF32 matmul on Ampere-class GPUs
import subprocess

import torch


def check_ampere_gpu():
    """Check if the GPU is NVIDIA Ampere or newer and, if so, enable TF32 matmul in PyTorch."""
    cmd = "nvidia-smi --query-gpu=name --format=csv,noheader"
    output = subprocess.check_output(cmd, shell=True, universal_newlines=True)
    gpu_name = output.strip()
    # rough name-based check; only covers a few common Ampere cards
    if "A100" in gpu_name or "A6000" in gpu_name or "RTX 30" in gpu_name:
        torch.backends.cuda.matmul.allow_tf32 = True
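A common alternative that avoids parsing GPU names is to check the CUDA compute capability instead (Ampere and newer report a major version of 8 or higher); a sketch, not part of the gist:

import torch

def enable_tf32_if_supported() -> None:
    if torch.cuda.is_available():
        major, _minor = torch.cuda.get_device_capability(0)
        if major >= 8:  # Ampere is 8.x, Ada is 8.9, Hopper is 9.0
            torch.backends.cuda.matmul.allow_tf32 = True
            torch.backends.cudnn.allow_tf32 = True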
@pszemraj
pszemraj / dl_gauntlet.sh
Last active March 15, 2023 22:56
download "gauntlet" for summarization (peter's version) and run summarization inference on it with the textsum package
URL="https://www.dropbox.com/sh/zu1p7rhg5238a5y/AABsJN_pCYf9plSDZY8ziKATa?dl=1"
wget -O docs.zip "$URL"
unzip -B -j docs.zip -d gauntlet && rm docs.zip
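For the inference half, a sketch of summarizing the unzipped gauntlet directory with textsum, assuming the Summarizer class and summarize_string method shown in the textsum README (check the package docs if the API differs); file extensions and output naming are illustrative.

from pathlib import Path
from textsum.summarize import Summarizer

summarizer = Summarizer()  # loads the package's default long-document summarization model
for doc in sorted(Path("gauntlet").glob("*.txt")):
    summary = summarizer.summarize_string(doc.read_text(encoding="utf-8"))
    doc.with_suffix(".summary.txt").write_text(summary, encoding="utf-8")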
@pszemraj
pszemraj / eval_summaries.py
Last active September 4, 2023 16:38
unsupervised summary eval using several metrics, including a new 'max salient similarity' score to compute faithfulness w.r.t. original document.
"""
eval_summaries.py - evaluate summary/document pairs via a variety of metrics.
Metrics include max salient similarity, topic similarity, compression factor,
readability scores, and spelling error fraction.
Details:
    python eval_summaries.py --help
This script was developed while evaluating summaries generated with the textsum package.
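Two of the simpler metrics mentioned above, sketched for illustration; the gist's actual implementations (and the max salient similarity score in particular) are more involved, and the function names here are illustrative.

import textstat  # pip install textstat

def compression_factor(document: str, summary: str) -> float:
    """Words in the document per word in the summary (higher = more compression)."""
    return len(document.split()) / max(len(summary.split()), 1)

def summary_readability(summary: str) -> float:
    """Flesch reading ease of the summary (higher = easier to read)."""
    return textstat.flesch_reading_ease(summary)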
@pszemraj
pszemraj / inference_openai.py
Last active March 10, 2025 08:10
basic openai chat completion example
"""
inference_openai.py - text generation with OpenAI API
See https://platform.openai.com/docs/quickstart for more details.
Usage:
python inference_openai.py --prompt "The quick brown fox jumps over the lazy dog." --model "gpt-3.5-turbo" --temperature 0.5 --max_tokens 256 --n 1 --stop "."
Detailed usage:
python inference_openai.py --help
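A minimal chat completion call along the lines of the usage above, using the openai Python SDK v1+ client (the gist itself may target a different client version); the key is read from the OPENAI_API_KEY environment variable.

from openai import OpenAI

client = OpenAI()  # uses OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "The quick brown fox jumps over the lazy dog."}],
    temperature=0.5,
    max_tokens=256,
    n=1,
    stop=".",
)
print(response.choices[0].message.content)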
@pszemraj
pszemraj / bot_readme.md
Last active May 22, 2023 17:45
for Gio's slugbot project

Image Classification Telegram Bot

This script runs a Telegram bot that classifies images using a pre-trained model. The bot handles /start and /help commands, as well as photo messages. When a photo message is received, the bot downloads the photo, classifies it, and sends a message with the prediction.

The original intended use case is to classify if an image contains a slug or not:

is it a slug
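A minimal sketch of that flow, not the slugbot itself: python-telegram-bot v20+ plus a transformers image-classification pipeline; the model name, replies, and token handling are placeholders.

from telegram import Update
from telegram.ext import ApplicationBuilder, CommandHandler, ContextTypes, MessageHandler, filters
from transformers import pipeline

classifier = pipeline("image-classification", model="google/vit-base-patch16-224")  # placeholder model

async def start(update: Update, context: ContextTypes.DEFAULT_TYPE) -> None:
    await update.message.reply_text("Send me a photo and I'll classify it.")

async def classify_photo(update: Update, context: ContextTypes.DEFAULT_TYPE) -> None:
    tg_file = await update.message.photo[-1].get_file()  # largest available resolution
    local_path = await tg_file.download_to_drive()       # saves the photo locally
    top = classifier(str(local_path))[0]                  # best {label, score} prediction
    await update.message.reply_text(f"{top['label']} ({top['score']:.1%})")

app = ApplicationBuilder().token("YOUR_BOT_TOKEN").build()
app.add_handler(CommandHandler(["start", "help"], start))
app.add_handler(MessageHandler(filters.PHOTO, classify_photo))
app.run_polling()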

@pszemraj
pszemraj / hf_repo_download.py
Last active January 23, 2024 07:15
huggingface hub - download a full snapshot of a repository without using git
"""
hf_hub_download.py
This script allows you to download a snapshot repository from the Hugging Face Hub to a local directory without needing Git or loading the model.
Usage:
python hf_hub_download.py <repo_id> [options]
Arguments:
<repo_id> Repository ID in the format "organization/repository".
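This kind of script is essentially a thin wrapper around huggingface_hub.snapshot_download; a minimal equivalent (the local_dir value is illustrative):

from huggingface_hub import snapshot_download

local_path = snapshot_download(
    repo_id="organization/repository",  # the <repo_id> argument above
    local_dir="./repository",           # where to place the snapshot
)
print(f"Downloaded to {local_path}")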

reference for run_summarization

reference for transformers 4.30.0.dev0

about

The options below are additional configuration parameters that can be used when training a model with Hugging Face Transformers. They control various aspects of the training process, such as the optimizer to use, data loading settings, memory management, model evaluation, checkpointing, and integration with the Hugging Face Model Hub.
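A few of these expressed as transformers training arguments, with illustrative values (argument names match transformers ~4.30; some, like evaluation_strategy, have been renamed in later releases):

from transformers import Seq2SeqTrainingArguments

args = Seq2SeqTrainingArguments(
    output_dir="out",
    optim="adafactor",             # optimizer choice
    dataloader_num_workers=4,      # data loading
    gradient_checkpointing=True,   # memory management: recompute activations to save VRAM
    evaluation_strategy="steps",   # model evaluation
    save_strategy="steps",         # checkpointing
    push_to_hub=False,             # Hugging Face Model Hub integration
)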

Here is a summary of the high-level functionalities provided by some of the options: