Peter pszemraj

@pszemraj
pszemraj / jamba900m_11k.md
Last active May 18, 2024 03:39
this took 7 mins and 2 gb vram. yep 2 gb. the generated text has not been edited in any way, just saved as .md

Introduction

At the heart of every meme generation lies the concept of what it means to be human. While we do so by providing us with powerful examples and narratives, today's consumers need to understand the nature of their behavior, how they behave, and what happens when these people are harmed. Our goal in creating mesmerizing and exciting stories for our audiences is to share our stories without distractions, and with no limits on how much effort may go into making them. To achieve this goal, we aim to build an interactive narrative of our experience. In addition, we encourage visitors to use their stories to explore ways that the future might look like and apply the lessons learned. With the introduction of Android, there has been significant growth in mobile apps with unprecedented popularity and success. For example, Apple has created a series of popular smartphones with impressive user interface features. These include the Google Play Store, Facebook Live, Twitter, YouTube, and Spotify. From browsin

@pszemraj
pszemraj / ft_flan.sh
Last active November 5, 2024 05:49
bash script for basic testing with pile-t5-large. note that this uses a sequence length of 1024 for input and 512 for output
#!/bin/bash
# Set environment variables
export WANDB_PROJECT="text2text-flan"
export WANDB_WATCH="gradients"
export WANDB_ENTITY="pszemraj"
export TOKENIZERS_PARALLELISM=true
NUM_WORKERS=$(lscpu -p | grep -Ev '^#' | sort -u -t, -k 2,4 | wc -l)
echo "Number of CPU cores: $NUM_WORKERS"
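The `NUM_WORKERS` pipeline above counts physical cores by dropping the `#` header lines from `lscpu -p` and keeping one row per unique combination of fields 2 to 4 (Core, Socket, Node). A minimal Python sketch of the same counting logic follows; the function name `count_physical_cores` and the sample text are illustrative assumptions, not part of the gist.

```python
def count_physical_cores(lscpu_output: str) -> int:
    """Count unique (core, socket, node) triples in `lscpu -p` style output."""
    seen = set()
    for line in lscpu_output.splitlines():
        line = line.strip()
        # Skip blank lines and the '#' comment header that lscpu -p prints
        if not line or line.startswith("#"):
            continue
        fields = line.split(",")
        # Fields 2-4 (1-indexed) are Core, Socket, Node
        seen.add(tuple(fields[1:4]))
    return len(seen)


# Hypothetical sample: 4 logical CPUs sharing 2 physical cores
sample = """# CPU,Core,Socket,Node
0,0,0,0
1,1,0,0
2,0,0,0
3,1,0,0
"""
print(count_physical_cores(sample))  # prints 2
```

Counting physical cores rather than logical CPUs avoids oversubscribing workers on machines with hyperthreading.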
@pszemraj
pszemraj / atlanta-overview.md
Created March 30, 2024 03:22
modern relocation research & adjustment courtesy of claude3 opus

Messages Overview - 2024-03-30 04:20:45 - Total Messages: 6

User - Msg No. 1/6

Can you give me an up to date overview of Atlanta and the different areas of the city

Assistant - Msg No. 2/6

Sure, I can provide you with an overview of Atlanta and its different areas. Atlanta is the capital and most populous city in the state of Georgia, with a diverse population and a thriving economy. Here's a breakdown of some of the main areas:

@pszemraj
pszemraj / create_archive.py
Created March 27, 2024 12:00
simple CLI for builtin-python archive creation
"""
Creates an archive of a directory
pip install fire
"""
import os
import shutil
from pathlib import Path
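The preview is cut off after the imports, so the gist's actual function is not visible here. As a rough sketch of archive creation with the standard library only, assuming a hypothetical helper named `create_archive` (the gist's real CLI, built on `fire`, may look different):

```python
import shutil
import tempfile
from pathlib import Path


def create_archive(src_dir, out_base, fmt="zip"):
    """Archive src_dir into out_base.<ext> and return the archive path."""
    src = Path(src_dir).resolve()
    if not src.is_dir():
        raise NotADirectoryError(f"{src} is not a directory")
    # make_archive appends the extension for the chosen format itself
    archive = shutil.make_archive(str(out_base), fmt, root_dir=str(src))
    return Path(archive)


# Usage on a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp) / "data"
    data.mkdir()
    (data / "hello.txt").write_text("hi")
    result = create_archive(data, Path(tmp) / "backup")
    print(result.name, result.exists())
```

Other formats that `shutil.make_archive` accepts, such as `"tar"` or `"gztar"`, can be passed through `fmt`.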
@pszemraj
pszemraj / find_deps.py
Created March 23, 2024 02:26
find local package meta dependencies
import pkg_resources


def list_dependencies(package_name, level=0, explored=None):
    # Avoid a shared mutable default: create the set on the first call
    if explored is None:
        explored = set()
    # Define indent outside of try-except to ensure it's always assigned
    indent = " " * level
    if package_name in explored:
        return
    explored.add(package_name)
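The rest of the function is not shown in this preview. The recursive walk it starts can be sketched with the standard-library `importlib.metadata` instead of `pkg_resources`; this is a swapped-in approach under stated assumptions, not the gist's code. Environment markers are not evaluated and names are only lowercased, so the result is approximate.

```python
import re
from importlib import metadata


def list_dependencies(package_name, level=0, explored=None, lines=None):
    """Return indented lines describing the installed dependency tree."""
    explored = set() if explored is None else explored
    lines = [] if lines is None else lines
    key = package_name.lower()
    if key in explored:
        return lines
    explored.add(key)
    try:
        requirements = metadata.requires(package_name) or []
    except metadata.PackageNotFoundError:
        lines.append("  " * level + f"{package_name} (not installed)")
        return lines
    lines.append("  " * level + package_name)
    for req in requirements:
        # Skip optional extras such as 'pytest; extra == "test"'
        if "extra ==" in req:
            continue
        # The distribution name is the leading run of name characters
        match = re.match(r"[A-Za-z0-9._-]+", req)
        if match:
            list_dependencies(match.group(0), level + 1, explored, lines)
    return lines


print("\n".join(list_dependencies("pip")))
```

Returning the lines instead of printing inside the recursion makes the helper easier to reuse and test.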
@pszemraj
pszemraj / dataset_from_list.py
Created March 17, 2024 02:24
hf datasets create a Dataset from a list of dicts
from datasets import Dataset
# Your initial list of dictionaries
data = [
    {"id": 1, "text": "Hello world!", "label": 0},
    {"id": 2, "text": "How are you?", "label": 1},
    # Add more dictionaries as needed
]
# Convert list of dictionaries to a dictionary of lists
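The conversion named in the last comment is cut off in the preview. A minimal pure-Python sketch of it follows; `rows_to_columns` is an illustrative name, and filling missing keys with `None` is an assumption made here so that all columns have equal length.

```python
def rows_to_columns(rows):
    """Convert a list of dicts into a dict of lists, one list per key."""
    # Collect keys in first-seen order so the column order is stable
    keys = []
    for row in rows:
        for key in row:
            if key not in keys:
                keys.append(key)
    # Missing values become None so that every column has the same length
    return {key: [row.get(key) for row in rows] for key in keys}


data = [
    {"id": 1, "text": "Hello world!", "label": 0},
    {"id": 2, "text": "How are you?", "label": 1},
]
columns = rows_to_columns(data)
print(columns)
# With the datasets library installed, the next step would be:
# from datasets import Dataset; dataset = Dataset.from_dict(columns)
```

Recent versions of `datasets` also provide `Dataset.from_list(data)`, which accepts the list of dicts directly.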
@pszemraj
pszemraj / anthropic_run_summarization.py
Last active March 14, 2024 04:24
run summarization on a directory with anthropic API + langchain
"""
anthropic_run_summarization.py - Generate summaries using langchain + LLMs
For usage details, run `python anthropic_run_summarization.py --help` and fire will print the usage details.
Notes:
- you need to have ANTHROPIC_API_KEY set as an environment variable (easiest way is export ANTHROPIC_API_KEY=memes123)
- install the dependencies using the requirements.txt file or below
pip install fire langchain langchain-community langchain-anthropic clean-text tqdm tiktoken
@pszemraj
pszemraj / fuzzy_align.py
Created March 14, 2024 02:49
fuzzy string alignment of two lists
from rapidfuzz import process, fuzz
def fuzzy_align(masterlist, list2, cutoff=70):
    # Dictionary to hold matches
    matches = {}
    # Track used indices to avoid duplicate matches in the masterlist
    used_indices = set()
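The matching loop itself is not visible in this preview. One way to complete the idea is a greedy pass that gives each item in `list2` its best-scoring masterlist entry that has not been used yet. The sketch below swaps in the standard-library `difflib.SequenceMatcher` for rapidfuzz so it runs without extra dependencies; its scores will differ from rapidfuzz's, and the greedy strategy is an assumption rather than the gist's confirmed logic.

```python
from difflib import SequenceMatcher


def fuzzy_align(masterlist, list2, cutoff=70):
    """Greedily map each item of list2 to its best unused masterlist item."""
    matches = {}
    used_indices = set()
    for item in list2:
        best_idx, best_score = None, cutoff
        for idx, candidate in enumerate(masterlist):
            if idx in used_indices:
                continue
            # Similarity on a 0-100 scale, ignoring case
            matcher = SequenceMatcher(None, item.lower(), candidate.lower())
            score = 100 * matcher.ratio()
            if score >= best_score:
                best_idx, best_score = idx, score
        if best_idx is not None:
            matches[item] = masterlist[best_idx]
            used_indices.add(best_idx)
    return matches


masterlist = ["apple pie", "banana bread", "cherry tart"]
list2 = ["Apple Pie", "bananna bread", "zzz"]
print(fuzzy_align(masterlist, list2))
```

Items with no candidate at or above `cutoff`, such as `"zzz"` here, are left out of the result.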
@pszemraj
pszemraj / parse_emails.py
Created March 13, 2024 01:53
parse directory of .eml files to a text dataframe, save to parquet
import logging
from email.parser import BytesParser
from pathlib import Path
import fire
import html2text
import pandas as pd
from tqdm import tqdm
# Setup logging
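The parsing code is truncated in this preview. As a hedged sketch of the core step, the standard-library `BytesParser` with `policy.default` can pull headers and a plain-text body out of raw `.eml` bytes. The function name `parse_eml_bytes` and the chosen fields are assumptions; the gist additionally uses `html2text` and `pandas`, which are left out here.

```python
from email import policy
from email.parser import BytesParser


def parse_eml_bytes(raw: bytes) -> dict:
    """Extract a few headers and the plain-text body from raw .eml bytes."""
    msg = BytesParser(policy=policy.default).parsebytes(raw)
    # get_body returns the text/plain part, or None if there is none
    body_part = msg.get_body(preferencelist=("plain",))
    body = body_part.get_content() if body_part is not None else ""
    return {
        "subject": str(msg["subject"] or ""),
        "from": str(msg["from"] or ""),
        "body": body.strip(),
    }


# Hypothetical sample message
raw = (
    b"From: alice@example.com\r\n"
    b"Subject: Test message\r\n"
    b"Content-Type: text/plain; charset=utf-8\r\n"
    b"\r\n"
    b"Hello from a sample email.\r\n"
)
record = parse_eml_bytes(raw)
print(record)
```

A list of such dicts, one per file, could then be passed to `pandas.DataFrame` and written out with `to_parquet`.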
@pszemraj
pszemraj / datasets_split.py
Created March 12, 2024 07:03
hf datasets train_test_split with stratify_by_column for any type (by tricking it)
import os
import numpy as np
from datasets import ClassLabel, Dataset, DatasetDict
def split_dataset(
    dataset: Dataset,
    test_size=0.025,