Goals: add links that give reasonable, well-written explanations of how stuff works. No hype and, where possible, no vendor content. Practical first-hand accounts and experience are preferred (still super rare at this point).
My own notes from a few months back.
- Survey of LLMs
- Self-attention and transformer networks (a minimal attention sketch appears after this list)
- What are embeddings?
- The Illustrated Word2vec - A Gentle Intro to Word Embeddings in Machine Learning (YouTube)
- Catching up on the weird world of LLMs
- Attention Is All You Need
- Scaling Laws for Neural Language Models
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
- Language Models are Unsupervised Multitask Learners
- Training Language Models to Follow Instructions with Human Feedback
- Language Models are Few-Shot Learners
- Why host your own LLM?
- How to train your own LLMs
- Training Compute-Optimal Large Language Models
- OPT-175B Logbook
- The case for gzip classifiers, and more on nearest-neighbor algorithms
- Meta recsys: using and extending Word2Vec
- The State of GPT (YouTube)
- What Is ChatGPT Doing … and Why Does It Work?
- How is llama.cpp possible?
- On Prompt Engineering
- Transformers from Scratch
- Building LLM Applications for Production
- Challenges and Applications of Large Language Models
- All the Hard Stuff Nobody Talks About when Building Products with LLMs
- Scaling Kubernetes to run ChatGPT
- Numbers every LLM Developer should know
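Since several of the links above cover self-attention, here is a minimal sketch of scaled dot-product attention in NumPy to anchor the reading. The function name, shapes, and toy data are illustrative only, not taken from any of the linked posts:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Single-head attention sketch: weight each value by how well
    its key matches the query, with the usual sqrt(d_k) scaling."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # (n_queries, n_keys) similarity
    scores -= scores.max(axis=-1, keepdims=True)    # shift for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the keys
    return weights @ V                              # (n_queries, d_v) weighted values

# Toy example: 3 tokens with 4-dimensional queries, keys, and values.
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 4))
print(scaled_dot_product_attention(Q, K, V).shape)  # (3, 4)
```

Multi-head attention, as described in Attention Is All You Need, runs this same operation over several learned projections of Q, K, and V and concatenates the results.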
Thanks to everyone who added suggestions on Twitter, Mastodon, and Bluesky.