pszemraj’s gists

pszemraj / load_and_ensure_tokens.py

Last active January 17, 2024 02:36

loads a Hugging Face Transformers tokenizer, checks for essential special tokens, adds them if necessary

	from transformers import AutoTokenizer


	def load_and_ensure_tokens(model_name):
	    # Load the tokenizer
	    tokenizer = AutoTokenizer.from_pretrained(model_name)

	    # Essential special tokens with their default values
	    essential_tokens = {
	        "pad_token": "<pad>",
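
The preview above is cut off, so the following is only a minimal sketch of the "check, then add if missing" step the description mentions, not the gist's actual code. The helper name `ensure_special_tokens` and the `DEFAULT_ESSENTIAL_TOKENS` constant are hypothetical; only the `"pad_token": "<pad>"` entry comes from the preview. It assumes the tokenizer exposes attributes such as `pad_token` (set to `None` when undefined) and an `add_special_tokens` method that accepts a dict, as Hugging Face tokenizers do.

```python
from typing import Dict, Optional

# Only the "pad_token" -> "<pad>" entry is visible in the gist preview;
# any further entries would be assumptions, so none are added here.
DEFAULT_ESSENTIAL_TOKENS: Dict[str, str] = {"pad_token": "<pad>"}


def ensure_special_tokens(
    tokenizer, essential_tokens: Optional[Dict[str, str]] = None
) -> Dict[str, str]:
    """Add essential special tokens the tokenizer lacks and return those added.

    Hypothetical helper: `tokenizer` is any object with special-token
    attributes and an `add_special_tokens(dict)` method.
    """
    essential_tokens = essential_tokens or DEFAULT_ESSENTIAL_TOKENS

    # A token counts as missing when its attribute is absent or None.
    missing = {
        name: value
        for name, value in essential_tokens.items()
        if getattr(tokenizer, name, None) is None
    }

    # Only touch the tokenizer when something actually needs adding.
    if missing:
        tokenizer.add_special_tokens(missing)
    return missing
```

With a real tokenizer this would be called right after `AutoTokenizer.from_pretrained(model_name)`. If the returned dict is non-empty and a model is paired with the tokenizer, the model's embeddings usually need resizing afterwards, for example with `model.resize_token_embeddings(len(tokenizer))`.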

pszemraj / hf_repofolder_watchdog.py

Created January 16, 2024 02:53

upload a folder to Hugging Face Hub and other utils

	import argparse
	import logging
	import time
	from datetime import datetime
	from pathlib import Path
	from typing import Optional

	from huggingface_hub import upload_folder
	from watchdog.events import PatternMatchingEventHandler
	from watchdog.observers import Observer

pszemraj / textgen_inference_code.py

Created January 6, 2024 23:38

example inference script for beecoder-220M-python

	import logging
	import random
	import time
	from pathlib import Path

	import fire
	import torch
	from transformers import AutoModelForCausalLM, AutoTokenizer

	logging.basicConfig(format="%(levelname)s - %(message)s", level=logging.INFO)

pszemraj / hf_repofolder_watchdog.py

Created December 12, 2023 01:43

The script is designed to monitor a specified directory for any file system changes (like additions, deletions, or modifications of files and subdirectories) and automatically upload the changes to a specified repository on the Hugging Face Hub.

	"""
	The script is designed to monitor a specified directory for any file system changes (like additions, deletions, or modifications of files and subdirectories) and automatically upload the changes to a specified repository on the Hugging Face Hub.


	pip install huggingface-hub watchdog
	"""
	import argparse
	import logging
	import time
	from pathlib import Path
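
As a rough illustration of the watch-then-upload flow described in the docstring, here is a sketch under stated assumptions rather than the gist's implementation. The `ChangeDebouncer` class, the `watch_and_upload` function, the `quiet_seconds` parameter, and the idea of waiting for a quiet period before uploading are all hypothetical additions. The `watchdog` and `huggingface_hub` calls are imported lazily inside `watch_and_upload` so the debouncing logic can be tried without either package installed.

```python
import time
from typing import Optional


class ChangeDebouncer:
    """Remembers the latest change and reports when a quiet period has passed."""

    def __init__(self, quiet_seconds: float = 5.0):
        self.quiet_seconds = quiet_seconds
        self._last_event: Optional[float] = None

    def record(self, now: Optional[float] = None) -> None:
        # Store the time of the most recent file-system event.
        self._last_event = time.monotonic() if now is None else now

    def should_upload(self, now: Optional[float] = None) -> bool:
        # No pending change means nothing to upload.
        if self._last_event is None:
            return False
        now = time.monotonic() if now is None else now
        return now - self._last_event >= self.quiet_seconds

    def reset(self) -> None:
        self._last_event = None


def watch_and_upload(folder: str, repo_id: str, quiet_seconds: float = 5.0) -> None:
    """Watch `folder` and upload it to `repo_id` once changes settle."""
    from huggingface_hub import upload_folder
    from watchdog.events import PatternMatchingEventHandler
    from watchdog.observers import Observer

    debouncer = ChangeDebouncer(quiet_seconds)

    class Handler(PatternMatchingEventHandler):
        def on_any_event(self, event):
            # Any creation, deletion, move, or modification marks the folder dirty.
            debouncer.record()

    observer = Observer()
    observer.schedule(Handler(), folder, recursive=True)
    observer.start()
    try:
        while True:
            time.sleep(1)
            if debouncer.should_upload():
                # Reset first so events arriving during the upload are kept.
                debouncer.reset()
                upload_folder(repo_id=repo_id, folder_path=folder)
    finally:
        observer.stop()
        observer.join()
```

Waiting for a quiet period avoids starting one upload per event when many files change at once, for example while a training run writes a checkpoint.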

pszemraj / format2alpaca.py

Created December 8, 2023 23:18

quick formatting function given instruction/input/response cols -> make 'text' col

	import os
	import random

	from datasets import load_dataset


	def format_dataset(example):
	    """Formats the dataset example into a single 'text' field."""

	    # Add input only if it is longer than 2 characters
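
Since the preview stops before the formatting logic, the following is a minimal sketch of how such a function might look, not the gist's actual body. The function name `format_example` and the `### Instruction:` / `### Input:` / `### Response:` headers (the layout commonly associated with Alpaca-style prompts) are assumptions. The column names and the "longer than 2 characters" rule come from the description and comment above.

```python
def format_example(example: dict) -> dict:
    """Build a single 'text' field from instruction/input/response columns."""
    instruction = example["instruction"].strip()
    # Treat a missing or None input the same as an empty one.
    input_text = (example.get("input") or "").strip()
    response = example["response"].strip()

    parts = [f"### Instruction:\n{instruction}"]
    # Add input only if it is longer than 2 characters.
    if len(input_text) > 2:
        parts.append(f"### Input:\n{input_text}")
    parts.append(f"### Response:\n{response}")

    return {"text": "\n\n".join(parts)}
```

With the `datasets` library, a function of this shape can be applied per row with `dataset.map(format_example)`, which adds the returned `text` key as a new column.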

pszemraj / tf32_activate.py

Created December 6, 2023 04:47

sort of manual - Check if the GPU supports NVIDIA Ampere or later and enable TF32 in PyTorch if it does.

	import logging
	import subprocess

	import torch


	def check_ampere_gpu():
	    """Check if the GPU supports NVIDIA Ampere or later and enable TF32 in PyTorch if it does."""

	    # Check if CUDA is available
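
The preview ends here, and the `subprocess` import suggests the gist may query the GPU in some other way. As an alternative sketch, assuming compute capability 8.0 or higher is used as the test for "Ampere or later", the check can be done with PyTorch's own capability query. The function names `is_ampere_or_newer` and `enable_tf32_if_supported` are hypothetical.

```python
import logging
from typing import Tuple


def is_ampere_or_newer(capability: Tuple[int, int]) -> bool:
    """Return True for compute capability 8.0 or higher (Ampere and later)."""
    major, _minor = capability
    return major >= 8


def enable_tf32_if_supported() -> bool:
    """Enable TF32 for matmul and cuDNN when the current GPU supports it."""
    import torch  # imported lazily so the capability check works without torch

    if not torch.cuda.is_available():
        logging.info("CUDA is not available; TF32 left unchanged.")
        return False

    # Capability of the current CUDA device as a (major, minor) tuple.
    capability = torch.cuda.get_device_capability()
    if not is_ampere_or_newer(capability):
        logging.info("Compute capability %s is older than Ampere.", capability)
        return False

    torch.backends.cuda.matmul.allow_tf32 = True
    torch.backends.cudnn.allow_tf32 = True
    logging.info("TF32 enabled for compute capability %s.", capability)
    return True
```

Keeping the capability comparison in its own function makes it easy to check without a GPU present.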

pszemraj / test_synthsumm.py

Created December 6, 2023 03:16

test out synthsumm summarization models via the free inference api

	import os
	import time

	import requests


	class Timer:
	    """Basic timer utility."""

	    def __enter__(self):
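
The preview is truncated at `__enter__`, so the rest of the class is not shown. A minimal sketch of how a context-manager timer of this kind is typically completed follows; the `start` and `elapsed` attribute names are assumptions, not taken from the gist.

```python
import time


class Timer:
    """Basic timer utility usable as a context manager."""

    def __enter__(self):
        # perf_counter is a monotonic clock suited to measuring short durations.
        self.start = time.perf_counter()
        self.elapsed = None
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.elapsed = time.perf_counter() - self.start
        # Returning False lets any exception from the block propagate.
        return False
```

Wrapping a request in `with Timer() as t:` and reading `t.elapsed` afterwards gives the wall-clock time of the call in seconds.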

pszemraj / ubuntu_util_pkgs.md

Created November 29, 2023 22:53

some ubuntu packages helpful for CPU things related to ML

Useful misc installs

Details

Kernel and Low-Level Tools

Microcode Update: Keeping your CPU microcode up to date can improve performance and security. On AMD CPUs, install the microcode package by running:

sudo apt install amd64-microcode

pszemraj / query_wellformedness_score.py

Created November 29, 2023 18:50

inference with a model trained on query well-formedness

	"""
	inference with a model trained on query well-formedness

	https://huggingface.co/Ashishkr/query_wellformedness_score

	pip install transformers accelerate optimum -q
	"""
	import torch
	from transformers import AutoTokenizer, AutoModelForSequenceClassification

pszemraj / run_summarization_langchain.py

Created November 24, 2023 17:17

summarization with langchain + openai

	"""
	run_langchain_summarization.py - Generate summaries using langchain + LLMs

	For usage details, run `python run_langchain_summarization.py --help` and fire will print the usage details.

	Notes:
	- you need to have OPENAI_API_KEY set as an environment variable (easiest way is export OPENAI_API_KEY=memes123)
	- install the dependencies using the requirements.txt file or below

	pip install fire langchain clean-text tqdm tiktoken

Peter pszemraj
