
Peter pszemraj

@pszemraj
pszemraj / textgen_inference_code.py
Created January 6, 2024 23:38
example inference script for beecoder-220M-python
import logging
import random
import time
from pathlib import Path
import fire
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
logging.basicConfig(format="%(levelname)s - %(message)s", level=logging.INFO)
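The preview cuts off after the imports; below is a minimal sketch of how the rest of the script plausibly looks, reusing the imports shown above. The repo id BEE-spoke-data/beecoder-220M-python and the generation settings are assumptions, not taken from the gist.

def main(
    model_id: str = "BEE-spoke-data/beecoder-220M-python",  # assumed Hub repo id
    prompt: str = "def fibonacci(n):",
    max_new_tokens: int = 128,
):
    device = "cuda" if torch.cuda.is_available() else "cpu"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id).to(device)

    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    start = time.perf_counter()
    outputs = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=True,
        temperature=0.2,
        pad_token_id=tokenizer.eos_token_id,
    )
    logging.info(f"generated in {time.perf_counter() - start:.2f}s")
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))

if __name__ == "__main__":
    fire.Fire(main)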
@pszemraj
pszemraj / hf_repofolder_watchdog.py
Created December 12, 2023 01:43
The script is designed to monitor a specified directory for any file system changes (like additions, deletions, or modifications of files and subdirectories) and automatically upload the changes to a specified repository on the Hugging Face Hub.
"""
The script is designed to monitor a specified directory for any file system changes (like additions, deletions, or modifications of files and subdirectories) and automatically upload the changes to a specified repository on the Hugging Face Hub.
pip install huggingface-hub watchdog
"""
import argparse
import logging
import time
from pathlib import Path
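The preview stops at the imports; here is a condensed sketch of the watch-and-upload loop described above, using watchdog's Observer and huggingface_hub's upload_folder. Argument names and the one-second poll interval are illustrative, not taken from the gist.

from huggingface_hub import HfApi
from watchdog.events import FileSystemEventHandler
from watchdog.observers import Observer

class UploadOnChangeHandler(FileSystemEventHandler):
    """Re-uploads the watched folder whenever anything inside it changes."""

    def __init__(self, folder: Path, repo_id: str):
        self.folder = folder
        self.repo_id = repo_id
        self.api = HfApi()

    def on_any_event(self, event):
        logging.info(f"change detected: {event.src_path}")
        self.api.upload_folder(folder_path=str(self.folder), repo_id=self.repo_id)

def watch(folder: str, repo_id: str):
    observer = Observer()
    observer.schedule(UploadOnChangeHandler(Path(folder), repo_id), folder, recursive=True)
    observer.start()
    try:
        while True:
            time.sleep(1)
    except KeyboardInterrupt:
        observer.stop()
    observer.join()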
@pszemraj
pszemraj / format2alpaca.py
Created December 8, 2023 23:18
quick formatting function given instruction/input/response cols -> make 'text' col
import os
import random
from datasets import load_dataset
def format_dataset(example):
"""Formats the dataset example into a single 'text' field."""
# Add input only if it is longer than 2 characters
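The function is cut off after that comment; a sketch of how it likely continues, assuming the standard Alpaca prompt layout and columns named instruction/input/response as in the description.

def format_dataset(example):
    """Formats the dataset example into a single 'text' field."""
    # Add input only if it is longer than 2 characters
    has_input = example.get("input") is not None and len(example["input"]) > 2
    instruction_block = f"### Instruction:\n{example['instruction']}\n\n"
    input_block = f"### Input:\n{example['input']}\n\n" if has_input else ""
    response_block = f"### Response:\n{example['response']}"
    example["text"] = instruction_block + input_block + response_block
    return example

# usage: dataset = load_dataset("some/dataset")["train"].map(format_dataset)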
@pszemraj
pszemraj / tf32_activate.py
Created December 6, 2023 04:47
sort of manual - Check if the GPU supports NVIDIA Ampere or later and enable TF32 in PyTorch if it does.
import logging
import subprocess
import torch
def check_ampere_gpu():
"""Check if the GPU supports NVIDIA Ampere or later and enable FP32 in PyTorch if it does."""
# Check if CUDA is available
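The preview ends at the CUDA check; a sketch of how the rest of the function plausibly reads, flipping the real torch.backends TF32 switches. The gist also imports subprocess (presumably to shell out to nvidia-smi), which this torch-only version skips.

def check_ampere_gpu():
    """Check if the GPU supports NVIDIA Ampere or later and enable TF32 in PyTorch if it does."""
    # Check if CUDA is available
    if not torch.cuda.is_available():
        logging.info("No CUDA device found; TF32 settings left unchanged.")
        return
    major, _minor = torch.cuda.get_device_capability()
    if major >= 8:  # Ampere is compute capability 8.x
        torch.backends.cuda.matmul.allow_tf32 = True
        torch.backends.cudnn.allow_tf32 = True
        logging.info(f"{torch.cuda.get_device_name()}: Ampere or newer, TF32 enabled")
    else:
        logging.info(f"{torch.cuda.get_device_name()}: pre-Ampere, TF32 not enabled")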
@pszemraj
pszemraj / test_synthsumm.py
Created December 6, 2023 03:16
test out synthsumm summarization models via the free inference api
import os
import time
import requests
class Timer:
"""Basic timer utility."""
def __enter__(self):
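The preview only shows the opening of the Timer class; below is a sketch of the request helper the rest of the script presumably wraps, using the standard free Inference API endpoint. The model id is a placeholder for whichever synthsumm checkpoint is being tested, and HF_TOKEN is assumed to be set in the environment.

API_URL = "https://api-inference.huggingface.co/models/{model_id}"
HEADERS = {"Authorization": f"Bearer {os.environ['HF_TOKEN']}"}

def summarize(text: str, model_id: str = "your-org/synthsumm-checkpoint") -> str:
    """Call the hosted summarization model and return the summary text."""
    response = requests.post(
        API_URL.format(model_id=model_id),
        headers=HEADERS,
        json={"inputs": text, "parameters": {"max_new_tokens": 256}},
    )
    response.raise_for_status()
    return response.json()[0]["summary_text"]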
@pszemraj
pszemraj / ubuntu_util_pkgs.md
Created November 29, 2023 22:53
some ubuntu packages helpful for CPU things related to ML

Useful misc installs


Kernel and Low-Level Tools

  1. Microcode Update: Keeping your CPU microcode updated can improve performance and security. You can install the AMD microcode package by running:

sudo apt install amd64-microcode

@pszemraj
pszemraj / query_wellformedness_score.py
Created November 29, 2023 18:50
inference with a model trained on query well-formedness
"""
inference with a model trained on query well-formedness
https://huggingface.co/Ashishkr/query_wellformedness_score
pip install transformers accelerate optimum -q
"""
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
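The preview stops at the imports; a minimal sketch of the scoring loop for the linked Ashishkr/query_wellformedness_score model follows. Reading the single logit through a sigmoid as a 0-1 well-formedness score is an assumption based on the model card, not code from the gist.

model_name = "Ashishkr/query_wellformedness_score"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

queries = ["what time is it in berlin", "me want food now"]
inputs = tokenizer(queries, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
scores = torch.sigmoid(logits).squeeze(-1)  # higher = more well-formed (assumed)
for query, score in zip(queries, scores):
    print(f"{score:.3f}\t{query}")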
@pszemraj
pszemraj / run_summarization_langchain.py
Created November 24, 2023 17:17
summarization with langchain + openai
"""
run_langchain_summarization.py - Generate summaries using langchain + LLMs
For usage details, run `python run_langchain_summarization.py --help`; fire will print them.
Notes:
- you need to have OPENAI_API_KEY set as an environment variable (easiest way is export OPENAI_API_KEY=memes123)
- install the dependencies using the requirements.txt file or below
pip install fire langchain clean-text tqdm tiktoken
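The docstring preview ends before the code; below is a sketch of the core map-reduce summarization call using the langchain APIs as they existed around the time of this gist (pre-0.1 import paths). The chunk sizes, model name, and CLI wiring are illustrative assumptions.

from langchain.chains.summarize import load_summarize_chain
from langchain.chat_models import ChatOpenAI
from langchain.docstore.document import Document
from langchain.text_splitter import TokenTextSplitter

def summarize_text(text: str, model_name: str = "gpt-3.5-turbo-16k") -> str:
    """Split the text into token-sized chunks and run a map-reduce summary chain."""
    llm = ChatOpenAI(model_name=model_name, temperature=0)
    splitter = TokenTextSplitter(chunk_size=3000, chunk_overlap=200)
    docs = [Document(page_content=chunk) for chunk in splitter.split_text(text)]
    chain = load_summarize_chain(llm, chain_type="map_reduce")
    return chain.run(docs)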
@pszemraj
pszemraj / eval_101m_gqa.log
Created November 15, 2023 18:45
eval logs for smol_llama variants courtesy of @Tanmay
BEE-spoke-data/smol_llama-101M-GQA with Model Revision: .
Output dir: 101mgqa
Batch Size: 64
Device ID: cuda:2
Setting number of workers to 4
[dynet] random seed: 1234
[dynet] allocating memory: 32MB
[dynet] memory allocation done.
Token indices sequence length is longer than the specified maximum sequence length for this model (1377 > 1024). Running this sequence through the model will result in indexing errors
Selected Tasks: ['arc_easy', 'boolq', 'lambada_openai', 'openbookqa', 'piqa', 'winogrande']
@pszemraj
pszemraj / process_audio_distil_whisper.py
Created November 9, 2023 03:14
# Function to process audio using distil-whisper
from pathlib import Path
import logging
from typing import Optional, Union
import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline
# Function to process audio using distil-whisper
def process_audio_distil_whisper(
audio_path: Union[str, Path],
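The signature is cut off after the first argument; a sketch of how the function plausibly continues, following the distil-whisper model card recipe. The default model id and the chunk/batch settings are assumptions.

def process_audio_distil_whisper(
    audio_path: Union[str, Path],
    model_id: str = "distil-whisper/distil-large-v2",
    chunk_length_s: int = 15,
    batch_size: int = 16,
) -> str:
    """Transcribe an audio file with a distil-whisper checkpoint and return the text."""
    device = "cuda:0" if torch.cuda.is_available() else "cpu"
    dtype = torch.float16 if torch.cuda.is_available() else torch.float32

    model = AutoModelForSpeechSeq2Seq.from_pretrained(
        model_id, torch_dtype=dtype, low_cpu_mem_usage=True, use_safetensors=True
    ).to(device)
    processor = AutoProcessor.from_pretrained(model_id)

    asr = pipeline(
        "automatic-speech-recognition",
        model=model,
        tokenizer=processor.tokenizer,
        feature_extractor=processor.feature_extractor,
        max_new_tokens=128,
        chunk_length_s=chunk_length_s,
        batch_size=batch_size,
        torch_dtype=dtype,
        device=device,
    )
    return asr(str(audio_path))["text"]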