
@tysam-code
tysam-code / condensed-ml-tidbits.txt
Last active December 11, 2023 04:22
TODOTODOTODOTODO # workinprogress <3 :'))))
# [IN-DEV currently]
# Maintained/Initially created by Fern. Say hi to me and feel free to ask any questions as needed! <3 :'))))
# If anything here is self-cited or has no citation, it's a conclusion I arrived at over time, or something I derived
# from the basics. There may still be prior work elaborating it in further detail (feel free to comment if there's an especially relevant link).
# Misc
- LayerNorm/RMSNorm might be acting as lateral inhibition, a paradigm explored in many ML papers from the 2000s and surrounding years (Fern, {relevant sources needed})
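A minimal sketch of the intuition (my illustration, not from any cited paper): in RMSNorm, every unit is divided by the root-mean-square of the whole vector, so when one unit's activation grows, its neighbors' normalized outputs shrink, which is a lateral-inhibition-like competition.

```python
import numpy as np

def rms_norm(x, eps=1e-8):
    # Divide each element by the RMS of the whole vector: a unit's
    # normalized output depends on the magnitudes of its neighbors.
    return x / np.sqrt(np.mean(x * x) + eps)

calm = np.array([1.0, 2.0, 2.0])
spike = np.array([1.0, 2.0, 10.0])  # one unit fires much harder

# The first unit's activation is identical in both inputs, but its
# normalized output drops when a neighbor spikes -- the "inhibition".
print(rms_norm(calm)[0], rms_norm(spike)[0])
```

(LayerNorm adds mean-subtraction and learned affine parameters on top, but the shared-denominator competition is the same.)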
- 'Soft' (pre-determined or pre-compiled) architectures in the weights of your network can greatly improve convergence speed and/or generalization.
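One hypothetical illustration of the idea (names and scales are my own, not from the note): instead of hard-wiring a residual connection into the architecture, bias a dense layer's init toward the identity, so the "skip" behavior lives softly in the weights where training can still overwrite it.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 8

# Plain random init: the layer starts as arbitrary small mixing.
w_random = rng.normal(scale=0.02, size=(dim, dim))

# 'Soft' architecture: the same layer, biased toward the identity map.
# At init it approximately passes inputs through (a residual-like prior
# baked into the weights rather than the wiring).
w_soft = np.eye(dim) + rng.normal(scale=0.02, size=(dim, dim))

x = rng.normal(size=dim)
# The soft-init layer starts much closer to the identity function:
print(np.linalg.norm(w_soft @ x - x), np.linalg.norm(w_random @ x - x))
```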
- Downcasting to a lower-bit dtype in your dot products can be a 'free' efficiency improvement in some circumstances.
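A quick sanity check of the trade-off, sketched in NumPy with float64 → float32 as the example downcast (the same logic applies to float16/bfloat16 on hardware with fast low-precision matmuls): memory traffic halves and throughput typically rises, while the accumulated rounding error in the dot products stays tiny for well-scaled inputs.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.normal(size=(256, 256))  # float64 by default
b = rng.normal(size=(256, 256))

ref = a @ b  # full-precision reference result
low = (a.astype(np.float32) @ b.astype(np.float32)).astype(np.float64)

# Worst-case relative error introduced by the downcast dot products --
# usually negligible next to the halved memory footprint.
rel_err = np.abs(ref - low).max() / np.abs(ref).max()
print(rel_err)
```

Whether it is actually 'free' depends on the hardware and the model's sensitivity, hence the hedge in the note; accumulating in higher precision is a common middle ground.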