# Anti-hype LLM reading list

Goals: add links that give reasonable, good explanations of how stuff works. No hype and no vendor content if possible. Practical first-hand accounts and experience preferred (super rare at this point).

[My own notes from a few months back.](https://gist.github.com/veekaybee/6f8885e9906aa9c5408ebe5c7e870698)

## Background

+ [Survey of LLMs](https://arxiv.org/abs/2303.18223)
+ [Self-attention and transformer networks](https://sebastianraschka.com/blog/2021/dl-course.html#l19-self-attention-and-transformer-networks)
+ [What are embeddings](https://vickiboykis.com/what_are_embeddings/)
+ [The Illustrated Word2vec - A Gentle Intro to Word Embeddings in Machine Learning (YouTube)](https://www.youtube.com/watch?v=ISPId9Lhc1g)
+ [Catching up on the weird world of LLMs](https://simonwillison.net/2023/Aug/3/weird-world-of-llms)

## Foundational Papers

+ [Attention Is All You Need](https://arxiv.org/abs/1706.03762)
+ [Scaling Laws for Neural Language Models](https://arxiv.org/abs/2001.08361)
+ [BERT](https://arxiv.org/abs/1810.04805)
+ [Language Models are Unsupervised Multitask Learners](https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf)
+ [Training Language Models to Follow Instructions](https://arxiv.org/abs/2203.02155)
+ [Language Models are Few-Shot Learners](https://arxiv.org/abs/2005.14165)

## Training Your Own

+ [Why host your own LLM?](http://marble.onl/posts/why_host_your_own_llm.html)
+ [How to train your own LLMs](https://blog.replit.com/llm-training)
+ [Training Compute-Optimal Large Language Models](https://arxiv.org/abs/2203.15556)
+ [OPT-175B Logbook](https://github.com/facebookresearch/metaseq/blob/main/projects/OPT/chronicles/OPT175B_Logbook.pdf)

## Algos

+ [The case for GZIP classifiers](https://nlpnewsletter.substack.com/p/flashier-attention-gzip-classifiers) and [more on nearest-neighbor algorithms](https://magazine.sebastianraschka.com/p/large-language-models-and-nearest)
+ [Meta recsys: using and extending Word2Vec](https://engineering.fb.com/2023/08/09/ml-applications/scaling-instagram-explore-recommendations-system)
+ [The State of GPT (YouTube)](https://www.youtube.com/watch?v=bZQun8Y4L2A)
+ [What is ChatGPT doing and why does it work?](https://writings.stephenwolfram.com/2023/02/what-is-chatgpt-doing-and-why-does-it-work/)
+ [How is llama.cpp possible?](https://finbarr.ca/how-is-llama-cpp-possible/)
+ [On Prompt Engineering](https://lilianweng.github.io/posts/2023-03-15-prompt-engineering/)
+ [Transformers from Scratch](https://e2eml.school/transformers.html)

## Deployment

+ [Building LLM Applications for Production](https://huyenchip.com/2023/04/11/llm-engineering.html)
+ [Challenges and Applications of Large Language Models](https://arxiv.org/abs/2307.10169)
+ [All the Hard Stuff Nobody Talks About When Building Products with LLMs](https://www.honeycomb.io/blog/hard-stuff-nobody-talks-about-llm)
+ [Scaling Kubernetes to run ChatGPT](https://openai.com/research/scaling-kubernetes-to-7500-nodes)
+ [Numbers every LLM developer should know](https://github.com/ray-project/llm-numbers)

## Evaluation

+ [Interpretable Machine Learning](https://arxiv.org/abs/2103.11251)
+ [Evaluating ChatGPT](https://ehudreiter.com/2023/04/04/evaluating-chatgpt/)
+ [ChatGPT: Jack of All Trades, Master of None](https://github.com/CLARIN-PL/chatgpt-evaluation-01-2023)

## UX

+ [Generative Interfaces Beyond Chat (YouTube)](https://www.youtube.com/watch?v=rd-J3hmycQs)
+ [Why Chatbots Are Not the Future](https://wattenberger.com/thoughts/boo-chatbots)

Thanks to everyone who added suggestions on [Twitter](https://twitter.com/vboykis/status/1691530859575214081), [Mastodon](https://jawns.club/@vicki/110895263087386568), and Bluesky.