These useful concepts show up in specific areas of the NN-training literature but can be applied pretty broadly.
- Non-leaky augmentations: you can add arbitrary augmentations during training, without substantially biasing in-domain performance, by adding a secondary input that tells the network which augmentations were used; at inference time you condition on "no augmentation", so the augmentations don't leak into the outputs. This technique shows up in the Karras et al. image generation papers (e.g. https://arxiv.org/pdf/2206.00364), but it's applicable whenever you want good performance on limited data (see the first sketch after this list).
- Batch-stratified sampling: rather than generating per-sample random numbers with e.g. `torch.rand(batch_size)`, you can use `torch.randperm(batch_size).add_(torch.rand(batch_size)).div_(batch_size)` instead, which gives each sample the same marginal distribution but with lower variance across the batch, and therefore trains more stably. This shows up in k-diffusion https://github.com/crowsonkb/k-diffusion/commit/a2b7b5f1ea0d3711a06661ca9e41b4e6089e5707, but it's applicable whenever you're randomizing data across the batch axis (see the second sketch after this list).
- Replay buffers: when y
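Here's a minimal PyTorch sketch of the non-leaky augmentation idea from the first bullet. The names (`AugConditionedNet`, `apply_random_augs`), the tiny two-flag augmentation set, and the toy conv net are all made up for illustration (and assume square images so rotation preserves shape); they're not taken from the Karras et al. papers.

```python
# Sketch: condition the network on which augmentations were applied,
# then condition on "no augmentation" at inference time.
import torch
import torch.nn as nn

NUM_AUG_FLAGS = 2  # e.g. [horizontal_flip, 90_degree_rotation] -- illustrative only

def apply_random_augs(x, p=0.5):
    """Randomly augment a batch of (square) images; return augmented batch and per-sample aug labels."""
    n = x.shape[0]
    flip = torch.rand(n, device=x.device) < p
    rot = torch.rand(n, device=x.device) < p
    x = torch.where(flip[:, None, None, None], x.flip(-1), x)
    x = torch.where(rot[:, None, None, None], x.rot90(1, dims=(2, 3)), x)
    labels = torch.stack([flip.float(), rot.float()], dim=1)
    return x, labels

class AugConditionedNet(nn.Module):
    """Toy network with a secondary input describing the augmentations used."""
    def __init__(self, in_ch=3, width=64):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, width, 3, padding=1)
        self.aug_embed = nn.Linear(NUM_AUG_FLAGS, width)  # injects the aug labels
        self.head = nn.Conv2d(width, in_ch, 3, padding=1)

    def forward(self, x, aug_labels):
        h = self.conv(x) + self.aug_embed(aug_labels)[:, :, None, None]
        return self.head(torch.relu(h))

model = AugConditionedNet()
x = torch.randn(8, 3, 32, 32)

# Training: condition on whichever augmentations were actually applied.
x_aug, aug_labels = apply_random_augs(x)
out = model(x_aug, aug_labels)

# Inference: condition on the all-zeros "no augmentation" label.
out_clean = model(x, torch.zeros(x.shape[0], NUM_AUG_FLAGS))
```

The inference call is the important part: because the network always knows which augmentations it's looking at, conditioning on "no augmentation" at test time keeps in-domain behavior unbiased even though training saw heavily augmented data.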
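And a minimal sketch of the batch-stratified sampler from the second bullet; the helper name `stratified_uniform` is made up here, and k-diffusion's own implementation may differ in details.

```python
import torch

def stratified_uniform(batch_size, device=None):
    """Uniform samples in [0, 1) with exactly one sample per 1/batch_size stratum.

    Each element is still marginally Uniform[0, 1), but the batch covers the
    unit interval much more evenly than torch.rand(batch_size).
    """
    strata = torch.randperm(batch_size, device=device).float()
    return (strata + torch.rand(batch_size, device=device)) / batch_size

# Example: drawing per-sample noise levels / timesteps for a diffusion-style loss.
u = stratified_uniform(16)
print(sorted(u.tolist()))  # one value in each interval [k/16, (k+1)/16)
```

The `randperm` is what keeps each batch position marginally uniform; without it, position k would always land in the k-th stratum.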