Peter pszemraj

@pszemraj
pszemraj / sncl.md
Created August 5, 2024 03:26
Schrödinger's Non-Commercial License (SNCL) v1.0 draft

Schrödinger's Non-Commercial License (SNCL) v1.0

Preamble:

This license is designed to allow users to freely use, modify, and distribute the software for non-commercial purposes. It recognizes the challenges in defining what constitutes commercial activity and offers guidance and flexibility for users who are unsure about the nature of their activities.

1. Grant of License

Subject to the terms and conditions of this License, the Licensor hereby grants to the Licensee a worldwide, royalty-free, non-exclusive license to use, modify, and distribute the Software, provided that such activities are conducted for Non-Commercial Purposes, as defined below.

import streamlit as st
import pandas as pd
from datasets import load_from_disk
import textwrap
import json
# Constants
ROWS_PER_PAGE = 100
LOGO_URL = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/datasets/datasets_logo.png"
DOCS_URL = "https://huggingface.co/docs/datasets/index"
@pszemraj
pszemraj / parse_nanoT5_log.py
Last active August 10, 2024 13:56
Process log file from nanoT5
"""
parses the standard main.log from nanoT5 and makes some plots
pip install matplotlib pandas seaborn
"""
import argparse
import logging
import os
import re
from pathlib import Path
@pszemraj
pszemraj / recursive_model_summary.py
Created July 23, 2024 04:19
print out a summary of a pytorch model
from typing import List, Tuple, Optional, Set
import torch.nn as nn
from transformers import PreTrainedModel
def model_summary(
    model: PreTrainedModel, max_depth: int = 4, show_input_size: bool = False
) -> None:
    """
    Prints an accurate summary of the model, avoiding double-counting of parameters.
@pszemraj
pszemraj / is_image_url.py
Created July 13, 2024 22:07
simple fn using regex to check if a string url points to an image file
import re
# List of common image file extensions
image_extensions = [
    'jpg', 'jpeg', 'jpe', 'jif', 'jfif', 'jfi',  # JPEG
    'png',  # PNG
    'gif',  # GIF
    'webp',  # WebP
    'tiff', 'tif',  # TIFF
    'bmp',  # BMP
@pszemraj
pszemraj / matmul_free.md
Created June 6, 2024 03:35
Technical Overview and Explanation of "Scalable MatMul-free Language Modeling" by gpt-4o

Technical Overview and Explanation of "Scalable MatMul-free Language Modeling"

Introduction

This paper presents a novel approach to large language models (LLMs) that eliminates matrix multiplication (MatMul) operations, which are typically the most computationally expensive part of such models. By doing so, the authors aim to significantly reduce memory usage and improve computational efficiency, enabling the models to scale up to billions of parameters while maintaining performance comparable to state-of-the-art Transformers.

Key Contributions

  1. MatMul-Free Dense Layers: The core innovation lies in replacing MatMul operations in dense layers with addition operations using ternary weights. These ternary weights take values from {-1, 0, +1}, which allows matrix multiplications to be transformed into simple additions and subtractions.
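To make the ternary-weight idea concrete, here is a minimal pure-Python sketch of a matrix-vector product in which every weight is -1, 0, or +1, so each output element is built from additions and subtractions only. The function names `ternary_matvec` and `dense_matvec` and the toy values are assumptions made for illustration; they are not taken from the paper, and the sketch leaves out any quantization, scaling, or kernel-level details the authors may use.

```python
from typing import List


def ternary_matvec(weights: List[List[int]], x: List[float]) -> List[float]:
    """Compute W @ x for ternary W using only additions and subtractions."""
    out = []
    for row in weights:
        if len(row) != len(x):
            raise ValueError("row length must match input length")
        acc = 0.0
        for w, xi in zip(row, x):
            if w == 1:
                acc += xi  # weight +1: add the input element
            elif w == -1:
                acc -= xi  # weight -1: subtract the input element
            elif w != 0:
                raise ValueError("weights must be -1, 0, or +1")
            # weight 0: the input element is skipped entirely
        out.append(acc)
    return out


def dense_matvec(weights: List[List[int]], x: List[float]) -> List[float]:
    """Reference version using ordinary multiply-accumulate, for comparison."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


# Hypothetical toy values, chosen only to exercise all three weight cases.
W = [[1, 0, -1], [-1, 1, 1]]
x = [0.5, 2.0, -1.5]
y = ternary_matvec(W, x)
```

On this toy input the ternary path and the multiply-based reference agree, which is the point: restricting weights to three values removes the multiplications without changing what the layer computes for those weights.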

We shall also address some key issues related to space optimization, such as the use of microelectronic technologies (e.g., electronic sensors) and computational modeling. Finally, we'll consider another challenge facing the next edition of our series--the sea of underwater buildings.

2 What Is Ocean Architecture?

Ocean architecture refers to how objects move through time. While many ideas have emerged since antiquity, archaeologists believe that the structure of the earth was constructed using mechanical and chemical processes. At one level, there were two basic types of structures - mechanical systems and gravity systems. Metamorphoses may be thought of as building blocks of modern society:

a) The physical organization of each object. An organism moves through its own frame when it moves, but does not move or carry out any actions; or b) The interaction between different parts of the body depends on the environment surrounding it. The process of motion determines how far away the animal lives and wha

@pszemraj
pszemraj / USAGE.md
Last active March 10, 2025 05:41
how to use unsloth grad checkpointing

usage

Credit/source: here

how to use unsloth grad checkpointing

steps

To integrate the provided monkey patch for offloaded gradient checkpointing into the Hugging Face transformers library, follow these steps:

@pszemraj
pszemraj / jamba900m_11k.md
Last active May 18, 2024 03:39
this took 7 mins and 2 gb vram. yep 2 gb. the generated text has not been edited in any way, just saved as .md

Introduction

At the heart of every meme generation lies the concept of what it means to be human. While we do so by providing us with powerful examples and narratives, today's consumers need to understand the nature of their behavior, how they behave, and what happens when these people are harmed. Our goal in creating mesmerizing and exciting stories for our audiences is to share our stories without distractions, and with no limits on how much effort may go into making them. To achieve this goal, we aim to build an interactive narrative of our experience. In addition, we encourage visitors to use their stories to explore ways that the future might look like and apply the lessons learned. With the introduction of Android, there has been significant growth in mobile apps with unprecedented popularity and success. For example, Apple has created a series of popular smartphones with impressive user interface features. These include the Google Play Store, Facebook Live, Twitter, YouTube, and Spotify. From browsin

@pszemraj
pszemraj / ft_flan.sh
Last active November 5, 2024 05:49
bash script for basic testing with pile-t5-large. note that this uses a sequence length of 1024 for input and 512 for output
#!/bin/bash
# Set environment variables
export WANDB_PROJECT="text2text-flan"
export WANDB_WATCH="gradients"
export WANDB_ENTITY="pszemraj"
export TOKENIZERS_PARALLELISM=true
NUM_WORKERS=$(lscpu -p | grep -Ev '^#' | sort -u -t, -k 2,4 | wc -l)
echo "Number of CPU cores: $NUM_WORKERS"