Fred N. Garvin FNGarvin

Here's a simple script intended to aid in tuning hyperparameters for quantization. Given two or more safetensors files, the script compares the quantization applied to each layer and attempts to infer how sensitive that layer is to quantization. This, naturally, assumes that at least some of the input quants were created by someone with deep knowledge of the model. A small amount of logic exists to discount models that bluntly quantize everything and to give extra weight to models that employ a broader range of dtypes. It's theoretically possible for a carefully tuned fp8 model that selectively preserves layers at fp32 to outperform an fp16 model that uniformly downcasts. It is hoped that this logic captures that design pattern.

Example usage: strategize_quants.py z_image_turbo_bf16.safetensors z-image-turbo_fp8_scaled_e4m3fn_KJ.safetensors z_image_turbo_nvfp4.safetensors
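
The idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the actual strategize_quants.py: the helper names (`read_dtypes`, `file_weight`, `sensitivity`), the bit-width table, and the weighting rule (a file's vote scales with the number of distinct dtypes it uses) are all hypothetical. It reads only the safetensors header (an 8-byte little-endian length followed by JSON), so no weights are loaded.

```python
import json
import struct
from collections import defaultdict

# Assumed bit widths for common safetensors dtype strings; extend as needed.
DTYPE_BITS = {"F64": 64, "F32": 32, "F16": 16, "BF16": 16,
              "F8_E4M3": 8, "F8_E5M2": 8, "I8": 8, "U8": 8}


def read_dtypes(path):
    """Return {tensor_name: dtype} from a safetensors header."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    return {name: meta["dtype"] for name, meta in header.items()
            if name != "__metadata__"}


def file_weight(dtypes):
    """Hypothetical heuristic: more distinct dtypes means a louder vote."""
    return len(set(dtypes.values()))


def sensitivity(files):
    """Weighted mean bit width per tensor; higher hints at a sensitive layer."""
    totals = defaultdict(float)
    weights = defaultdict(float)
    for dtypes in files:
        w = file_weight(dtypes)
        for name, dtype in dtypes.items():
            bits = DTYPE_BITS.get(dtype)
            if bits is None:  # unknown dtype: skip rather than guess
                continue
            totals[name] += w * bits
            weights[name] += w
    return {name: totals[name] / weights[name] for name in totals}
```

Packed formats that store sub-byte weights inside `U8` tensors would be misread as 8-bit by this sketch, so a real tool needs format-specific handling for those.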

=========================================================================

Quick example for a short music video:

yt-dlp https://www.youtube.com/watch?v=zn7-fVtT16k -t mp4 -o "Albert Einstein vs Stephen Hawking. Epic Rap Battles of History.mp4"
podman run --rm --user 0:0 -v .:/input ghcr.io/fngarvin/pyscenedetect:fng-infra-docker-ci -i "/input/Albert Einstein vs Stephen Hawking. Epic Rap Battles of History.mp4" detect-content list-scenes -o /input
./analyze_scenes.py "Albert Einstein vs Stephen Hawking. Epic Rap Battles of History-Scenes.csv"
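
For a sense of what can be done with the scene list, here is a minimal sketch that summarizes scene lengths. It is not the actual analyze_scenes.py; the function names are hypothetical, and it assumes the CSV has a header row beginning with "Scene Number" and a "Length (seconds)" column, optionally preceded by a "Timecode List:" row.

```python
import csv


def load_scenes(lines):
    """Parse a scene-list CSV (any iterable of lines) into a list of dicts."""
    rows = list(csv.reader(lines))
    for i, row in enumerate(rows):
        # Skip any preamble rows until the real header shows up.
        if row and row[0].strip() == "Scene Number":
            header = [col.strip() for col in row]
            return [dict(zip(header, r)) for r in rows[i + 1:] if r]
    raise ValueError("no 'Scene Number' header row found")


def summarize(scenes):
    """Return count plus mean, shortest, and longest scene length in seconds."""
    lengths = [float(s["Length (seconds)"]) for s in scenes]
    return {
        "count": len(lengths),
        "mean": sum(lengths) / len(lengths),
        "shortest": min(lengths),
        "longest": max(lengths),
    }
```

Usage would be along the lines of `summarize(load_scenes(open(path, newline="")))`.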

If you don't already have the required CUDA dev kit, installing it is a PITA in itself (assuming an apt-based OS and a cu13 target; adjust as necessary):

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/cuda-keyring_1.1-1_all.deb
dpkg -i cuda-keyring_1.1-1_all.deb
apt update
apt install -y cuda-toolkit-13-0
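
After the install, the compiler is not necessarily on your PATH. A small sketch for locating it before kicking off a build follows; the function name is hypothetical, and the search locations (`/usr/local/cuda/bin` and `/usr/local/cuda-*/bin`) are the usual toolkit install paths, assumed here rather than guaranteed.

```python
import glob
import os
import shutil


def find_nvcc(extra_dirs=None):
    """Return a path to nvcc, or None if it cannot be found."""
    on_path = shutil.which("nvcc")
    if on_path:
        return on_path
    # Assumed install locations; pass extra_dirs for anything unusual.
    candidates = list(extra_dirs or [])
    candidates.append("/usr/local/cuda/bin")
    candidates.extend(sorted(glob.glob("/usr/local/cuda-*/bin"), reverse=True))
    for directory in candidates:
        exe = os.path.join(directory, "nvcc")
        if os.path.isfile(exe) and os.access(exe, os.X_OK):
            return exe
    return None
```

If this returns a path outside your PATH, adding its directory to PATH before building is usually enough.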

Now clone and build Nunchaku:

I own an Nvidia GPU and prefer Podman containers, so I'm using the official stable-diffusion.cpp CUDA image. The same general idea works for running the binaries directly, though.

I'm using Flux.2 Klein 4B at Q4 for this example because it's tiny, new, and supports edit features. You probably need a ~16GB GPU to do 1MP as done here, but even running on the CPU alone should work at ~512x512 (very slowly) if you have 12GB+ of system RAM.

The directory structure for this particular example is set up like this:

 .
├──  models
│   ├──  4b.gguf
│   ├──  qwen3.gguf

I frequently use Podman (it's a Docker alternative that doesn't require a daemon and was quicker to adopt rootless setups with better host security). It's relevant because Podman and Docker both run Linux containers, which need a Linux kernel. So if you're on Windows, both typically rely on WSL2 (Windows Subsystem for Linux v2). WSL2 uses parts of the Hyper-V architecture to run a full-fledged Linux kernel alongside the Windows kernel, and it does so in a way that gets you easy, well-supported access to your hardware without complicated passthrough. Assuming you're using Nvidia, the Nvidia Container Toolkit is specifically oriented towards allowing GPU access in containers; it is extremely robust and widely used in commercial and enterprise deployments. In WSL, it basically tricks the Linux side into using your Windows system drivers.
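
If a script needs to know whether it is running under WSL, a common heuristic is to look at the kernel release string. The sketch below is an assumption-laden illustration: the function names are hypothetical, the "microsoft" substring check matches both WSL1 and WSL2 kernels, and `/dev/dxg` is assumed to be the device node WSL2 exposes for GPU access.

```python
import os
import platform


def is_wsl(release=None):
    """Heuristic: WSL kernels carry 'microsoft' in their release string."""
    release = platform.release() if release is None else release
    return "microsoft" in release.lower()


def has_wsl_gpu_device(path="/dev/dxg"):
    """Assumed check for the GPU device node that WSL2 exposes to Linux."""
    return os.path.exists(path)
```

Neither check proves that containers can see the GPU; they only tell you which environment you are in before you start debugging.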

Everything works GREAT and the performance is basically identical to using native Windows or

Torture test prompt, produced via Gemini and rendered via Nano Banana:

A hyper-realistic full-body fashion editorial shot of a woman wearing a complex, multi-layered avant-garde gown that merges 1930s Surrealism with Brutalist architecture.

The Garment: The base is a floor-length, bias-cut slip made of heavyweight 40mm charcoal silk crepe de chine that clings to the form with liquid-like drape. Over this is an external, architectural cage-crinoline constructed from matte-black structural steel ribs, creating a rigid geometric skeleton around the lower body. Attached to the steel frame are non-repeating laser-cut panels of obsidian-colored cavallino (pony hair) leather in a mathematical Voronoi pattern. The entire ensemble is shrouded in a fine layer of iridescent, translucent silk organza that catches a spectrum of oil-slick light.

The Technical Detail: Focus on the mechanical intersection where the steel ribs meet the silk; specify visible brass industrial rivets and tension-spring fasteners. The leather must sho

The EASIEST way to get going creating images RIGHT NOW is with stable-diffusion.cpp. It has no external dependencies, so you don't have to fight Python or torch. Download the CUDA binaries and exe if you have Nvidia; otherwise grab the Vulkan build, which runs on Intel, Nvidia, and AMD.

Unzip and it should look something like this:

Download some models to the same folder. Getting these three would be a good start: model, vae, llm.

Write a batch

	#!/usr/bin/env python3
	"""
	FNGarvin
	Batch run ComfyUI workflow for multiple animals.
	TODO: INSERT LICENSE
	2026
	"""

	import json
	import requests

	#!/usr/bin/env python3
	"""
	Author: FNGarvin
	License: MIT
	Usage: snoop_overlays

	Maps every folder in the overlay directory to Podman objects with
	accurate size reporting using a robust find-sum method.
	"""
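
The size-reporting half of that description can be sketched as follows. This is an assumption about what a "find-sum" approach means (walk each folder and add up file sizes), not the script itself; the function names are hypothetical, and mapping folders to Podman objects is left out. For rootless Podman the overlay directory is typically `~/.local/share/containers/storage/overlay`.

```python
import os


def tree_size(root):
    """Sum apparent file sizes under root without following symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root, followlinks=False):
        for name in filenames:
            try:
                total += os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                pass  # file vanished or is unreadable; skip it
    return total


def overlay_sizes(overlay_root):
    """Return {folder_name: bytes} for each directory directly under overlay_root."""
    return {
        entry.name: tree_size(entry.path)
        for entry in os.scandir(overlay_root)
        if entry.is_dir(follow_symlinks=False)
    }
```

This reports apparent size rather than blocks used on disk, so the numbers will differ somewhat from `du`.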

	rm -f manifest_aware.txt ; find . \( -name .git -o -name models -o -name .venv -o -name __pycache__ -o -name 'manifest*' \) -prune -o -type f -exec sha256sum {} + | sort -k 2 > manifest_dir1.txt

	rm -f manifest_aware.txt ; find . \( -name .git -o -name models -o -name .venv -o -name __pycache__ -o -name 'manifest*' \) -prune -o -type f -exec sha256sum {} + | sort -k 2 > manifest_dir2.txt

	etc, then inspect, diff, or compare checksums of the two manifests.
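
If eyeballing a diff gets tedious, the comparison can be sketched in Python. The function names are hypothetical, and the parser assumes the default `sha256sum` text-mode output of a digest, whitespace, then the path.

```python
def parse_manifest(lines):
    """Parse sha256sum output lines into {path: digest}."""
    entries = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        # Split once so paths containing spaces stay intact.
        digest, path = line.split(None, 1)
        entries[path] = digest
    return entries


def compare_manifests(a, b):
    """Report paths unique to each manifest and paths whose digests differ."""
    return {
        "only_in_a": sorted(set(a) - set(b)),
        "only_in_b": sorted(set(b) - set(a)),
        "changed": sorted(p for p in set(a) & set(b) if a[p] != b[p]),
    }
```

Usage would look like `compare_manifests(parse_manifest(open("manifest_dir1.txt")), parse_manifest(open("manifest_dir2.txt")))`.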