Skip to content

Instantly share code, notes, and snippets.

View pszemraj's full-sized avatar

Peter pszemraj

View GitHub Profile
@pszemraj
pszemraj / compress_PDF_recursive.sh
Created December 11, 2025 03:14
gs-based PDF compressor, default is heavy compression for VLM input
#!/usr/bin/env bash
set -euo pipefail
# Modern PDF compressor with LLM-optimized defaults
# Requires: ghostscript (gs)
VERSION="1.0.0"
SCRIPT_NAME=$(basename "$0")
# Defaults (LLM preset)
@pszemraj
pszemraj / rnj_inference.py
Created December 10, 2025 04:43
inference with rnj-1
import torch
import time
from transformers import AutoTokenizer, AutoModelForCausalLM
# 1. Basic Timer Context Manager
class Timer:
def __init__(self, name="Task"):
self.name = name
@pszemraj
pszemraj / image_tiling_vlm.py
Created December 4, 2025 21:51
slightly optimized image tiling for VLMs based on "Jina-VLM Small Multilingual Vision Language Model"
"""
slightly optimized image tiling for vlms based on "Jina-VLM: Small Multilingual Vision Language Model"
Based on the pseudocode in Appendix A.1: https://arxiv.org/abs/2512.04032
"""
import math
from typing import List, Tuple
import torch
import torch.nn.functional as F
@pszemraj
pszemraj / colab_output_format.py
Created December 2, 2025 23:07
put this in its own cell at the start of your notebook
#@markdown add auto-Colab formatting with `IPython.display`
from IPython.display import HTML, display
# colab formatting
def set_css():
display(
HTML(
"""
<style>
pre {
white-space: pre-wrap;
@pszemraj
pszemraj / eggroll_numpy.py
Last active November 25, 2025 01:09
working eggroll impl from various LLMs & yours truly
"""
EGGROLL: Evolution Guided General Optimization via Low-rank Learning
NumPy Implementation - Direct translation of working PyTorch code
Paper: arXiv:2511.16652v1
"""
import numpy as np
from dataclasses import dataclass
from typing import Tuple, Optional
@pszemraj
pszemraj / ul2.py
Created November 15, 2025 00:39
UL2 Data Collator for PyTorch + Transformers
"""
UL2 Data Collator for PyTorch + Transformers
==============================================
Standalone implementation of UL2 (Unified Language Learner) denoising objectives
for encoder-decoder models (T5, UL2, Flan-T5, etc.).
Based on: "Unifying Language Learning Paradigms" (Tay et al., 2022)
https://arxiv.org/abs/2205.05131
@pszemraj
pszemraj / vid_dedupe_gve.py
Created November 6, 2025 01:18
GVE + SimSIMD video deduplication CLI via https://gzn00417.github.io/GVE/
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Apache-2.0
GVE + SimSIMD video deduplication CLI via https://gzn00417.github.io/GVE/
Design highlights
- Embeddings: GVE (Qwen2.5-VL based) last-token pooled + ℓ2-normalized (bf16/float16), per paper/model card.
- Test-time policy: 8 frames baseline, scale with duration (16/32/48) for long videos; ~200 visual tokens per frame.
@pszemraj
pszemraj / llama.cpp-issue.md
Created November 5, 2025 00:40
issue with llama.cpp server (multimodal) lfm2-vl

Llama.cpp Multimodal Crash (general issue write-up)

Tue 04 Nov 2025 07:37:48 PM EST, commit a5c07dc

description

  • Symptom: llama-server exits with GGML_ASSERT(!slot.prompt.tokens.has_mtmd) inside server_context::update_slots() after a few multimodal requests.
  • Repro: launch any vision-capable GGUF (e.g. -VL-) with default slot reuse (--slot-prompt-similarity 0.10), hit /v1/chat/completions twice using OpenAI-format payloads that include image_url parts (base64 data URIs). The second call often reuses a slot whose has_mtmd flag is still set, triggering the assert and a core dump.
  • Flags already tried: disabling similarity (--slot-prompt-similarity 0.0), restoring checkpoints (--ctx-checkpoints 8), toggling continuous batching. Crash still occurs on current master.
  • Logs: daemon sees "connection closed before message completed," server backtrace ends in ggml_abort server_context::update_slots.
@pszemraj
pszemraj / inference_example_lfm2vl_3b.py
Last active November 5, 2025 04:51
inference with 3b
"""
example script for inference with LFM2-VL-3B model
https://hf.co/LiquidAI/LFM2-VL-3B
"""
from transformers import AutoModelForImageTextToText, AutoProcessor
from transformers.image_utils import load_image
# Load model and processor