Everytime a large language model makes predictions, all of the thousands of tokens in the vocabulary are assigned some degree of probability, from almost 0%, to almost 100%. There are different ways you can decide to choose from those predictions. This process is known as "sampling", and there are various strategies you can use which I will cover here.
- Temperature is a way to control the overall confidence of the model's scores (the logits). What this means is that, if you use a lower value than 1.0, the relative distance between the tokens will become larger (more deterministic), and if you use a larger value than 1.0, the relative distance between the tokens becomes smaller (less deterministic).
- 1.0 Temperature is the original distribution that the model was trained to optimize for, since the scores remain the same.
- Graph demonstration with voiceover: https://files.catbox.moe/6ht56x.mp4