# Anti-hype LLM reading list

Goals: add links that are reasonable, good explanations of how stuff works. No hype and, if possible, no vendor content. Practical first-hand accounts and experience preferred (still super rare at this point).

[My own notes from a few months back.](https://gist.github.com/veekaybee/6f8885e9906aa9c5408ebe5c7e870698) 

## Background

+ [A Survey of Large Language Models](https://arxiv.org/abs/2303.18223)
+ [Self-attention and transformer networks](https://sebastianraschka.com/blog/2021/dl-course.html#l19-self-attention-and-transformer-networks) 
+ [What are embeddings](https://vickiboykis.com/what_are_embeddings/) (toy similarity sketch after this list)
+ [The Illustrated Word2vec - A Gentle Intro to Word Embeddings in Machine Learning (YouTube)](https://www.youtube.com/watch?v=ISPId9Lhc1g)
+ [Catching up on the weird world of LLMs](https://simonwillison.net/2023/Aug/3/weird-world-of-llms)
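
A quick toy example of what those embedding posts describe: words become vectors, and similarity is just geometry. The vectors below are made up for illustration (real word2vec-style embeddings are learned from a corpus and have hundreds of dimensions):

```python
import numpy as np

# Toy embedding table: hand-picked 4-dimensional vectors, for
# illustration only. Trained embeddings come from word2vec & co.
embeddings = {
    "king":  np.array([0.80, 0.65, 0.10, 0.05]),
    "queen": np.array([0.78, 0.70, 0.12, 0.04]),
    "apple": np.array([0.05, 0.10, 0.90, 0.70]),
}

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine of the angle between the vectors: closer to 1.0 = more similar.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity(embeddings["king"], embeddings["queen"]))  # high
print(cosine_similarity(embeddings["king"], embeddings["apple"]))  # low
```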


## Foundational Papers

+ [Attention Is All You Need](https://arxiv.org/abs/1706.03762) (attention sketch after this list)
+ [Scaling Laws for Neural Language Models](https://arxiv.org/abs/2001.08361)
+ [BERT](https://arxiv.org/abs/1810.04805)
+ [Language Models are Unsupervised Multitask Learners](https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf)
+ [Training Language Models to Follow Instructions with Human Feedback](https://arxiv.org/abs/2203.02155)
+ [Language Models are Few-Shot Learners](https://arxiv.org/abs/2005.14165) 
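
The core of "Attention Is All You Need" fits in a few lines: Attention(Q, K, V) = softmax(QKᵀ/√d_k)V. A minimal NumPy sketch of a single attention head, with no masking, batching, or learned projections:

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(Q: np.ndarray, K: np.ndarray, V: np.ndarray) -> np.ndarray:
    # softmax(Q K^T / sqrt(d_k)) V -- Eq. 1 in the paper.
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    return softmax(scores) @ V

rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 8))  # 3 query positions, d_k = 8
K = rng.normal(size=(4, 8))  # 4 key/value positions
V = rng.normal(size=(4, 8))
print(attention(Q, K, V).shape)  # (3, 8): one weighted value per query
```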


## Training Your Own

+ [Why host your own LLM?](http://marble.onl/posts/why_host_your_own_llm.html)
+ [How to train your own LLMs](https://blog.replit.com/llm-training)
+ [Training Compute-Optimal Large Language Models](https://arxiv.org/abs/2203.15556) (Chinchilla; rule-of-thumb sketch after this list)
+ [OPT-175B Logbook](https://github.com/facebookresearch/metaseq/blob/main/projects/OPT/chronicles/OPT175B_Logbook.pdf)
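
The practical takeaway from the Chinchilla paper is a rule of thumb: compute-optimal training uses roughly 20 tokens per model parameter (a common shorthand for the paper's result, not an exact law). Back of the envelope:

```python
def chinchilla_optimal_tokens(n_params: float) -> float:
    # Hoffmann et al. (2022) found parameters and training tokens should
    # scale roughly equally; ~20 tokens/parameter is the usual shorthand.
    return 20 * n_params

for params in (7e9, 70e9):
    tokens = chinchilla_optimal_tokens(params)
    print(f"{params / 1e9:.0f}B params -> ~{tokens / 1e12:.1f}T tokens")
# Chinchilla itself: 70B parameters trained on 1.4T tokens.
```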

## Algos

+ [The case for GZIP Classifiers](https://nlpnewsletter.substack.com/p/flashier-attention-gzip-classifiers) (minimal sketch after this list) and [more on nearest-neighbor algos](https://magazine.sebastianraschka.com/p/large-language-models-and-nearest)
+ [Meta's recsys: using and extending Word2Vec for Instagram Explore](https://engineering.fb.com/2023/08/09/ml-applications/scaling-instagram-explore-recommendations-system)
+ [The State of GPT (YouTube)](https://www.youtube.com/watch?v=bZQun8Y4L2A)
+ [What is ChatGPT doing and why does it work](https://writings.stephenwolfram.com/2023/02/what-is-chatgpt-doing-and-why-does-it-work/)
+ [How is LLaMa.cpp possible?](https://finbarr.ca/how-is-llama-cpp-possible/)
+ [On Prompt Engineering](https://lilianweng.github.io/posts/2023-03-15-prompt-engineering/)
+ [Transformers from Scratch](https://e2eml.school/transformers.html)
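
The gzip-classifier trick is compact enough to sketch: compute a normalized compression distance (how much better does the compressor do on two texts together than apart?) and classify by nearest neighbor. A toy 1-NN version of the paper's k-NN setup:

```python
import gzip

def ncd(a: str, b: str) -> float:
    # Normalized compression distance: lower means the compressor found
    # more shared structure between the two strings.
    ca = len(gzip.compress(a.encode()))
    cb = len(gzip.compress(b.encode()))
    cab = len(gzip.compress((a + " " + b).encode()))
    return (cab - min(ca, cb)) / max(ca, cb)

def classify(text: str, train: list[tuple[str, str]]) -> str:
    # 1-nearest-neighbor by compression distance (the paper uses k-NN
    # over a real labeled training set).
    return min(train, key=lambda pair: ncd(text, pair[0]))[1]

train = [
    ("the striker scored a late goal in the match", "sports"),
    ("the central bank raised interest rates again", "finance"),
]
print(classify("a penalty goal decided the match", train))  # sports
```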

## Deployment 

+ [Building LLM Applications for Production](https://huyenchip.com/2023/04/11/llm-engineering.html)
+ [Challenges and Applications of Large Language Models](https://arxiv.org/abs/2307.10169)
+ [All the Hard Stuff Nobody Talks About when Building Products with LLMs](https://www.honeycomb.io/blog/hard-stuff-nobody-talks-about-llm)
+ [Scaling Kubernetes to 7,500 Nodes](https://openai.com/research/scaling-kubernetes-to-7500-nodes)
+ [Numbers every LLM Developer should know](https://github.com/ray-project/llm-numbers) (back-of-envelope sketch below)
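
Two of those numbers turn directly into back-of-envelope arithmetic. Assuming the repo's rules of thumb (~2 bytes per parameter for fp16 weights, ~1.3 tokens per English word):

```python
def serving_memory_gb(n_params_billion: float) -> float:
    # fp16/bf16 weights are 2 bytes per parameter; this ignores the
    # KV cache and activations, which add more on top in practice.
    return 2 * n_params_billion

def words_to_tokens(n_words: int) -> int:
    # Rough average for English prose: ~1.3 tokens per word.
    return round(1.3 * n_words)

print(serving_memory_gb(7))   # ~14 GB just to hold 7B weights
print(words_to_tokens(1500))  # ~1950 tokens for a 1,500-word doc
```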

## Evaluation

+ [Interpretable Machine Learning](https://arxiv.org/abs/2103.11251)
+ [Evaluating ChatGPT](https://ehudreiter.com/2023/04/04/evaluating-chatgpt/)
+ [ChatGPT: Jack of All Trades, Master of None](https://github.com/CLARIN-PL/chatgpt-evaluation-01-2023)


## UX

+ [Generative Interfaces Beyond Chat (YouTube)](https://www.youtube.com/watch?v=rd-J3hmycQs)
+ [Why Chatbots are not the Future](https://wattenberger.com/thoughts/boo-chatbots)

Thanks to everyone who added suggestions on [Twitter](https://twitter.com/vboykis/status/1691530859575214081), [Mastodon](https://jawns.club/@vicki/110895263087386568), and Bluesky.