Llama 3.1 405B, 70B, and 8B are officially out. Llama 3.1 405B is the first openly available model that matches or beats the best closed models across many benchmarks.
The 405B model performs on par with Claude 3.5 Sonnet and beats GPT-4 on every single benchmark but one. The 70B model is arguably even more impressive for its size: it is significantly better than GPT-3.5 Turbo and beats Nemotron-4 340B on many tests.
Try 405B at meta.ai, on WhatsApp or on HuggingChat.
Notable improvements:
- 128k context length.
- Multilingual abilities.
- Function calling and tool use.
- Open/free weights and code, with a license that enables fine-tuning, distillation into other models, and deployment anywhere 🔥
- Code generation performance of the 8B and 70B models improved by up to 12%.
- FP8 quantized version available for efficient inference (Hugging Face also provides GPTQ and AWQ quants); see the inference sketch after this list.
- Llama Stack API for easy integration.
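If you want to run one of the models locally, here is a minimal inference sketch using Hugging Face transformers. It's my own illustration rather than an official snippet, and it assumes you have accepted the license on the gated meta-llama repo and have a GPU with enough memory:

```python
# Minimal local inference sketch (illustrative, not an official snippet).
# Assumes access to the gated meta-llama repo and sufficient GPU memory.
import torch
import transformers

model_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"

pipe = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Why is the sky blue?"},
]

out = pipe(messages, max_new_tokens=256)
print(out[0]["generated_text"][-1]["content"])
```

The same pipeline call works across sizes, so you can swap in the 70B or the FP8 405B checkpoint if your hardware allows.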
Important facts:
- Pre-training cut-off date of Dec 2023.
- 405B trained on 15.6T tokens and fine-tuned on 25M human and synthetic examples.
- Leveraged the 405B model to improve the post-training quality of 70B and 8B models.
- Tokenizer based on tiktoken (a quick check follows this list).
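For the curious, here is a quick way to poke at that tokenizer through transformers (again my own sketch, assuming access to the same gated 8B Instruct repo):

```python
# Quick tokenizer round-trip check (assumes access to the gated repo).
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3.1-8B-Instruct")
ids = tok.encode("Llama 3.1 is officially out!")
print(len(ids), ids)  # token count and raw ids
print(tok.decode(ids, skip_special_tokens=True))  # back to the original text
```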
The Llama 3.1 collection of large language models (LLMs) makes history with the largest and most capable open model ever released. Thank you for making AI and LLMs more accessible.
Blog post: https://ai.meta.com/blog/meta-llama-3-1/
Llama home: https://llama.meta.com
Download the weights from llama.meta.com and Hugging Face.
Cloud provider playgrounds (a programmatic call sketch follows the links):
- Groq (70B-versatile): https://console.groq.com/playground
- Together AI: https://api.together.xyz/playground
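Both providers also expose OpenAI-compatible APIs. A hedged sketch of calling the 70B model through Groq, assuming the `openai` Python client, a GROQ_API_KEY in your environment, and the `llama-3.1-70b-versatile` model id implied by the playground name:

```python
# Sketch of a call to Groq's OpenAI-compatible endpoint (assumptions:
# `openai` client installed, GROQ_API_KEY set, model id as named above).
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.groq.com/openai/v1",
    api_key=os.environ["GROQ_API_KEY"],
)

resp = client.chat.completions.create(
    model="llama-3.1-70b-versatile",
    messages=[{"role": "user", "content": "Summarize Llama 3.1 in one line."}],
)
print(resp.choices[0].message.content)
```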
Paper: https://ai.meta.com/research/publications/the-llama-3-herd-of-models/ (It's so cool to see an exhaustive and extensive technical report.)
Model card: https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/MODEL_CARD.md
GitHub Repo: https://github.com/meta-llama/llama-models/tree/main/models/llama3_1
All the details about Llama 3.1, such as VRAM requirements, are in the Hugging Face blog post; learn how to quantize, fine-tune, distill, run inference, and more there. (Overwhelmed? If you can only read one thing, let it be that post.)
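As a taste of what that post covers, here's an illustrative 4-bit loading sketch with bitsandbytes (my own example, not code from the post; assumes a CUDA GPU and the `bitsandbytes` package installed):

```python
# Illustrative 4-bit quantized loading with bitsandbytes (not from the post).
# Assumes a CUDA GPU and `pip install bitsandbytes accelerate`.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"

bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb, device_map="auto"
)
tok = AutoTokenizer.from_pretrained(model_id)

inputs = tok("The capital of France is", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=20)
print(tok.decode(out[0], skip_special_tokens=True))
```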