Skip to content

Instantly share code, notes, and snippets.

View pszemraj's full-sized avatar

Peter pszemraj

View GitHub Profile
@pszemraj
pszemraj / image_tiling_vlm.py
Created December 4, 2025 21:51
slightly optimized image tiling for VLMs based on "Jina-VLM Small Multilingual Vision Language Model"
"""
slightly optimized image tiling for vlms based on "Jina-VLM: Small Multilingual Vision Language Model"
Based on the pseudocode in Appendix A.1: https://arxiv.org/abs/2512.04032
"""
import math
from typing import List, Tuple
import torch
import torch.nn.functional as F
@pszemraj
pszemraj / colab_output_format.py
Created December 2, 2025 23:07
put this in its own cell at the start of your notebook
#@markdown add auto-Colab formatting with `IPython.display`
from IPython.display import HTML, display
# colab formatting
def set_css():
display(
HTML(
"""
<style>
pre {
white-space: pre-wrap;
@pszemraj
pszemraj / eggroll_numpy.py
Last active November 25, 2025 01:09
working eggroll impl from various LLMs & yours truly
"""
EGGROLL: Evolution Guided General Optimization via Low-rank Learning
NumPy Implementation - Direct translation of working PyTorch code
Paper: arXiv:2511.16652v1
"""
import numpy as np
from dataclasses import dataclass
from typing import Tuple, Optional
@pszemraj
pszemraj / ul2.py
Created November 15, 2025 00:39
UL2 Data Collator for PyTorch + Transformers
"""
UL2 Data Collator for PyTorch + Transformers
==============================================
Standalone implementation of UL2 (Unified Language Learner) denoising objectives
for encoder-decoder models (T5, UL2, Flan-T5, etc.).
Based on: "Unifying Language Learning Paradigms" (Tay et al., 2022)
https://arxiv.org/abs/2205.05131
@pszemraj
pszemraj / vid_dedupe_gve.py
Created November 6, 2025 01:18
GVE + SimSIMD video deduplication CLI via https://gzn00417.github.io/GVE/
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Apache-2.0
GVE + SimSIMD video deduplication CLI via https://gzn00417.github.io/GVE/
Design highlights
- Embeddings: GVE (Qwen2.5-VL based) last-token pooled + ℓ2-normalized (bf16/float16), per paper/model card.
- Test-time policy: 8 frames baseline, scale with duration (16/32/48) for long videos; ~200 visual tokens per frame.
@pszemraj
pszemraj / llama.cpp-issue.md
Created November 5, 2025 00:40
issue with llama.cpp server (multimodal) lfm2-vl

Llama.cpp Multimodal Crash (general issue write-up)

Tue 04 Nov 2025 07:37:48 PM EST, commit a5c07dc

description

  • Symptom: llama-server exits with GGML_ASSERT(!slot.prompt.tokens.has_mtmd) inside server_context::update_slots() after a few multimodal requests.
  • Repro: launch any vision-capable GGUF (e.g. -VL-) with default slot reuse (--slot-prompt-similarity 0.10), hit /v1/chat/completions twice using OpenAI-format payloads that include image_url parts (base64 data URIs). The second call often reuses a slot whose has_mtmd flag is still set, triggering the assert and a core dump.
  • Flags already tried: disabling similarity (--slot-prompt-similarity 0.0), restoring checkpoints (--ctx-checkpoints 8), toggling continuous batching. Crash still occurs on current master.
  • Logs: daemon sees "connection closed before message completed," server backtrace ends in ggml_abort server_context::update_slots.
@pszemraj
pszemraj / inference_example_lfm2vl_3b.py
Last active November 5, 2025 04:51
inference with 3b
"""
example script for inference with LFM2-VL-3B model
https://hf.co/LiquidAI/LFM2-VL-3B
"""
from transformers import AutoModelForImageTextToText, AutoProcessor
from transformers.image_utils import load_image
# Load model and processor
@pszemraj
pszemraj / encoding_visualizer.py
Last active October 1, 2025 13:21
helper scripts for tokenizer encoding viz
import argparse
import webbrowser
from pathlib import Path
from typing import Any, Callable, Optional, Union
from tokenizers import Tokenizer as RustTokenizer
from tokenizers.tools import EncodingVisualizer
from transformers import AutoTokenizer, PreTrainedTokenizerBase
SAMPLE_TEXT = '''class DyT(nn.Module):
@pszemraj
pszemraj / clipboard_helper_xclip.sh
Last active November 18, 2025 16:22
cz() two letter clipboard helper for linux/xclip
# Copy file contents or stdin to clipboard
# Usage: cz [file]
# cz file.txt - copy file to clipboard
# cmd | cz - copy stdin to clipboard
# Fails on: binary files, files >10MB, non-existent files
cz() {
if [ -z "$1" ]; then
xclip -selection clipboard
elif [ -f "$1" ]; then
# Check if it's a text file