# LLM Samplers Explained

Every time a large language model makes a prediction, all of the thousands of tokens in its vocabulary are assigned some probability, from almost 0% to almost 100%. The process of choosing a token from those predictions is known as "sampling", and there are various strategies you can use, which I will cover here.
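As a concrete illustration, here is a minimal Python sketch of the basic idea: the model's raw scores (logits) are turned into probabilities with a softmax, and one token is drawn at random according to those probabilities. The four-token vocabulary and the logit values are made up for the example; a real model works the same way over its full vocabulary.

```python
import math
import random

def softmax(logits):
    """Convert raw model scores (logits) into probabilities that sum to 1."""
    m = max(logits)                      # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy vocabulary and made-up logits; a real model has thousands of tokens.
vocab = ["the", "a", "cat", "dog"]
logits = [2.0, 1.5, 0.5, -1.0]

probs = softmax(logits)
token = random.choices(vocab, weights=probs, k=1)[0]
print(list(zip(vocab, [round(p, 3) for p in probs])), "->", token)
```

Every sampler below is, one way or another, a rule for reshaping or truncating those probabilities before the random draw.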

## OpenAI Samplers

### Temperature

- Temperature is a way to control the overall confidence of the model's scores (the logits): the logits are divided by the temperature value before the softmax is applied. With a value lower than 1.0, the relative distance between the tokens' scores becomes larger (more deterministic), and with a value higher than 1.0, the relative distance becomes smaller (less deterministic). See the sketch after this list.
- A temperature of 1.0 leaves the scores unchanged, so it reproduces the original distribution the model was trained to optimize for.
- Graph demonstration with voiceover: https://files.catbox.moe/6ht56x.mp4
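Below is a minimal sketch of temperature scaling using the same made-up logits as the earlier example: dividing the logits by T and re-running the softmax shows the distribution sharpening at T = 0.5 and flattening at T = 1.5, while T = 1.0 leaves it untouched.

```python
import math

def softmax(logits):
    m = max(logits)                      # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def apply_temperature(logits, temperature):
    """Divide every logit by the temperature before softmax.
    T < 1.0 widens the gaps between scores (sharper, more deterministic);
    T > 1.0 narrows them (flatter, more random); T == 1.0 changes nothing."""
    return [x / temperature for x in logits]

logits = [2.0, 1.5, 0.5, -1.0]           # toy scores for a 4-token vocabulary
for t in (0.5, 1.0, 1.5):
    probs = [round(p, 3) for p in softmax(apply_temperature(logits, t))]
    print(f"T={t}: {probs}")
```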