@cedrickchee
Last active July 24, 2024 04:54
🐐 Llama 3.1 405B matches or beats the best closed models

Llama 3.1 (405B, 70B, and 8B) is officially out. Llama 3.1 405B is the first openly available model that matches or beats the best closed models across many benchmarks.

Model evaluations

The 405B model's performance is very similar to Claude 3.5 Sonnet's. It beats GPT-4 on every benchmark but one.

The 70B model's performance is even more impressive: it is significantly better than GPT-3.5 Turbo and beats Nemotron 4 340B on many tests.

[Benchmark tables: 405B evals; 70B and 8B evals]

Try 405B at meta.ai, on WhatsApp or on HuggingChat.

Notable improvements:

  • 128k context length.
  • Multilingual abilities.
  • Function calling and tool use.
  • Open/free weights and code, with a license that enables fine-tuning, distillation into other models, and deployment anywhere 🔥
  • Code generation performance of the 8B and 70B models improved by up to 12%.
  • FP8 quantized version available for efficient inference (Hugging Face also provides GPTQ and AWQ quants); see the inference sketch after this list.
  • Llama Stack API for easy integration.
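
To make the bullets above concrete, here is a minimal inference sketch using the Hugging Face transformers library. The repo id meta-llama/Meta-Llama-3.1-8B-Instruct is an assumption (the models are gated, so you need to accept the license on Hugging Face first); swap in the FP8/GPTQ/AWQ repos for quantized inference.

```python
# Minimal sketch, not an official example. Assumes: transformers + accelerate
# installed, and access to the gated repo "meta-llama/Meta-Llama-3.1-8B-Instruct"
# (repo id assumed; the quantized FP8/GPTQ/AWQ variants live in separate repos).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # BF16 weights; use a quantized repo for a smaller footprint
    device_map="auto",
)

# The Instruct models ship a chat template; apply_chat_template builds the prompt format.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize the Llama 3.1 release in one sentence."},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```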

Important facts:

  • Pre-training cut-off date of Dec 2023.
  • 405B trained on 15.6T tokens and fine-tuned on 25M human and synthetic examples.
  • Leveraged the 405B model to improve the post-training quality of the 70B and 8B models.
  • tiktoken-based tokenizer (see the sketch after this list).
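
The tokenizer point is easy to check yourself. A small sketch, again assuming the meta-llama/Meta-Llama-3.1-8B-Instruct repo id from above: the Llama 3.1 tokenizer is a tiktoken-style BPE with a roughly 128k-token vocabulary, and it loads as a regular Hugging Face tokenizer.

```python
# Small sketch (assumed repo id; the tokenizer files are part of the gated model repo).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3.1-8B-Instruct")

print(tokenizer.vocab_size)                 # base BPE vocabulary size (~128k)
print(tokenizer.encode("Llama 3.1 405B"))   # token ids for a sample string
```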

The Llama 3.1 collection of large language models (LLMs) will make history as the largest and most capable open models ever released. Thank you for making AI and LLMs more accessible.

Blog post: https://ai.meta.com/blog/meta-llama-3-1/

Llama home: https://llama.meta.com

Download the weights from llama.meta.com and Hugging Face.

Cloud provider playgrounds:

Paper: https://ai.meta.com/research/publications/the-llama-3-herd-of-models/ (It's so cool to see an exhaustive and extensive technical report.)

Model card: https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/MODEL_CARD.md

GitHub repo: https://github.com/meta-llama/llama-models/tree/main/models/llama3_1

All the details about Llama 3.1, such as VRAM requirements, are in the Hugging Face blog post. Learn how to quantize, fine-tune, distill, run inference, and more there. (Overwhelmed? If you can only read one thing, let it be this.)
