- Mastering LLM Techniques: Inference Optimization
- LLM Inference Series: 1. Introduction
- How Transformers Work: A Detailed Exploration of Transformer Architecture
- DeepSpeed Deep Dive
- Transformers Explained Visually
- The Illustrated Transformer
- All you need to know about ‘Attention’ and ‘Transformers’ — In-depth Understanding — Part 1
- All you need to know about ‘Attention’ and ‘Transformers’ — In-depth Understanding — Part 2
- Build your own Transformer from scratch using PyTorch
- A collection of resources to study Transformers in depth
- Numbers every LLM Developer should know
- LLM Inference Series: 3. KV caching explained
- The GPT-3 Architecture, on a Napkin
- Transformer Inference Arithmetic
- A guide to LLM inference and performance
- LLM Inference Sizing and Performance Guidance
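
Several of the entries above (e.g. "LLM Inference Series: 3. KV caching explained" and "Transformer Inference Arithmetic") cover KV caching, the core optimization for autoregressive decoding. As a minimal sketch of the idea — with made-up dimensions and random weight matrices standing in for learned projections — keys and values for past tokens are stored so each decode step only computes attention for the newest token:

```python
import numpy as np

d = 8  # head dimension (arbitrary for this sketch)
rng = np.random.default_rng(0)
# Stand-in "learned" projection matrices for K, V, and Q.
Wk, Wv, Wq = (rng.normal(size=(d, d)) for _ in range(3))

k_cache, v_cache = [], []  # grows by one entry per decoded token

def decode_step(x):
    """Attend the newest token x over all cached keys/values."""
    q = x @ Wq
    k_cache.append(x @ Wk)   # compute K/V once per token, then reuse
    v_cache.append(x @ Wv)
    K = np.stack(k_cache)    # (t, d): past tokens plus the current one
    V = np.stack(v_cache)
    scores = K @ q / np.sqrt(d)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V       # attention output for the new token only

outputs = [decode_step(rng.normal(size=d)) for _ in range(4)]
print(len(k_cache), outputs[-1].shape)  # one cached K/V per decoded token
```

Without the cache, step *t* would recompute keys and values for all *t* tokens, making decoding quadratic in sequence length; with it, each step does O(t) attention over cached entries and only one new projection.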
Last active: March 26, 2025 11:51
LLM resources