Ollin Boer Bohan madebyollin

jupyterlab_vim adds a truly unfortunate keybinding where, if you are in normal mode and press -, the current cell splits into two. I have never intentionally activated this feature yet have unintentionally activated it many dozens of times. It's awful.

Unfortunately, the logical route to disable the split-cell feature (modifying keybinding config files) doesn't work. I don't know why.

Therefore, I now manually locate the vim_bindings file and patch it:

#!/usr/bin/env python3
from pathlib import Path

Single-pass Superconditioning

Motivation

Guided diffusion sampling typically uses two forward passes per step:

One caption-conditional forward pass, to compute E[flow | noisy image, noise level, caption]
One unconditional forward pass, to compute E[flow | noisy image, noise level]

These results are then linearly combined to form a single guided/superconditioned flow prediction.
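As a rough sketch of that linear combination: the snippet below assumes the common classifier-free-guidance convention, `uncond + scale * (cond - uncond)`. The function name `combine_guided`, the `guidance_scale` parameter, and the toy values are hypothetical, and plain Python lists stand in for tensors.

```python
def combine_guided(cond_flow, uncond_flow, guidance_scale):
    """Linearly combine conditional and unconditional flow predictions.

    Assumed convention (not taken from the original note):
        guided = uncond + guidance_scale * (cond - uncond)
    so guidance_scale=1 recovers the conditional prediction and
    guidance_scale=0 recovers the unconditional one.
    """
    return [u + guidance_scale * (c - u) for c, u in zip(cond_flow, uncond_flow)]


# Toy per-element predictions from the two forward passes.
cond = [1.0, 2.0, -0.5]
uncond = [0.5, 2.0, 0.5]
guided = combine_guided(cond, uncond, guidance_scale=3.0)
```

With a scale above 1, the result is pushed past the conditional prediction, away from the unconditional one.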

These useful concepts show up in specific areas of NN-training literature but can be applied pretty broadly.

Non-leaky augmentations: you can add arbitrary augmentations during training, without substantially biasing in-domain performance, by adding a secondary input that tells the network which augmentations were used. This technique shows up in the Karras et al. image generation papers (e.g. https://arxiv.org/pdf/2206.00364) but it's applicable whenever you want good performance on limited data.
Batch-stratified sampling: rather than generating per-sample random numbers with e.g. torch.rand(batch_size), you can use torch.randperm(batch_size).add(torch.rand(batch_size)).div(batch_size) instead, which has the same per-sample distribution but lower variance across the batch, and therefore trains more stably. This shows up in k-diffusion (https://github.com/crowsonkb/k-diffusion/commit/a2b7b5f1ea0d3711a06661ca9e41b4e6089e5707), but it's applicable whenever you're randomizing data across the batch axis.
Replay buffers: when y
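The batch-stratified sampling idea above can be sketched without PyTorch. This is a minimal stdlib illustration, not the k-diffusion code: the name `stratified_uniform` and the `rng` parameter are hypothetical. Each sample lands in its own 1/batch_size-wide slice of [0, 1), with the slices assigned to batch positions in random order.

```python
import random


def stratified_uniform(batch_size, rng=random):
    """Draw batch_size values in [0, 1), one per equal-width stratum.

    Mirrors (randperm + rand) / batch_size: a shuffled stratum index
    plus a uniform jitter inside that stratum, scaled back to [0, 1).
    """
    strata = list(range(batch_size))
    rng.shuffle(strata)  # random assignment of strata to batch positions
    return [(k + rng.random()) / batch_size for k in strata]


samples = stratified_uniform(8)
```

Each individual value is still uniform on [0, 1), but the batch as a whole always covers the range evenly, which is where the variance reduction comes from.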

Reviewing the Claims of DC-AE

TL;DR - I think the paper is a good contribution and basically holds up, but Figure 2 seems suspicious and the released repo doesn't include the pieces (AE training code and pretrained 4096-element AEs) that would be needed to make DC-AE practically competitive with SD/SDXL VAEs.

DC-AE is an MIT / Tsinghua / NVIDIA paper about improving generative autoencoders (like the SD VAE) in the high-spatial-compression-ratio regime.

mysterious software bugs I encounter, like, daily

if I add/destroy enough video tags (i.e. in periodic output display from a long-running jupyter notebook) Safari will eventually stop rendering all video tags across all pages until I reboot the browser
if I interleave autocast and no_grad scopes in the wrong way (I think - autocast with no_grad inside followed by not-no-grad stuff?) pytorch will silently disable grad for the entire autocast scope
if I edit colab notebooks (specific ones? on a specific connection? idk), I intermittently get "this notebook has been modified" conflict-resolution dialogs despite being the only editor of the notebook
if I don't manually disable the collaborative extension on jupyterlab, it will intermittently teleport my cursor focus back to the first cell in the notebook

need to eventually find repro instructions and chase all of these down to fix

Objective

Informal (vibes-based) evaluation of the following vision-language-model captioners:

Florence-2-base-ft
CogVLM2
BLIP-2
MoonDream2
Share-Captioner
Florence-2-SD3-Captioner

	from IPython.display import HTML

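	def get_pred_original_sample(sched, model_output, timestep, sample):
	    # Find the scheduler index for this timestep, then subtract sigma * model_output.
	    step_index = (sched.timesteps == timestep).nonzero().item()
	    return sample - sched.sigmas[step_index] * model_output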

	# TODO: fix awful globals
	prev_img_str = None

	def pil_to_html(pil_img, h=IM_HEIGHT, w=IM_WIDTH):
	    global prev_img_str

	#!/usr/bin/env python3
	import gradio as gr
	import numpy as np
	import random
	import torch
	from diffusers import (
	    StableDiffusion3Pipeline,
	    SD3Transformer2DModel,
	    FlowMatchEulerDiscreteScheduler,
	    AutoencoderTiny,

	def add_profiling_markers(model):
	    """Monkey-patch profiling markers into an nn.Module.

	    Args:
	        model: an nn.Module

	    Effect:
	        all model.named_modules() forward calls get wrapped in their
	        own profiling scope, making traces easier to understand.
	    """