Temperature
Temperature is one of the most important output controls for an LLM. It sets how random or focused the generated text is, from strict and predictable to wild and creative, by reshaping the model's probability distribution before a token is chosen. This single number can dramatically change the feel of the output from the very same model.
💡 In one line: temperature controls randomness. Low keeps output focused and predictable; high makes it diverse and creative.
What Temperature Does
At each step, the model produces a probability distribution over possible next tokens (via softmax; see Next Token Prediction). Temperature scales the logits before that softmax, changing how peaked or flat the distribution becomes:
- A peaked distribution → the top token almost always wins → focused output.
- A flat distribution → less likely tokens get a real chance → varied output.
How It Works (the Math)
Temperature divides the logits before softmax:
probabilities = softmax(logits / T)

- T < 1 → divides by a number smaller than 1 → gaps between scores widen → sharper distribution → focused, more predictable.
- T = 1 → the model's natural distribution, unchanged.
- T > 1 → scores pulled together → flatter distribution → diverse, creative, riskier.
- T → 0 → effectively greedy: always pick the single most likely token.
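Written out per token, the same formula makes the sharpening and flattening easy to see. The symbols z_i (the logit of token i) and p_i (its probability) are notation introduced here for illustration:

```latex
p_i = \frac{\exp(z_i / T)}{\sum_j \exp(z_j / T)},
\qquad
\frac{p_i}{p_j} = \exp\!\left(\frac{z_i - z_j}{T}\right)
```

For two tokens whose logits differ by 2, the ratio of their probabilities is e^2 ≈ 7.4 at T = 1, e^4 ≈ 54.6 at T = 0.5, and e^1 ≈ 2.7 at T = 2. Lowering T widens the lead of the top token; raising T shrinks it.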
The Effect in Practice
| Temperature | Behaviour | Good for |
|---|---|---|
| 0–0.3 | Focused, precise, consistent | Code, math, extraction, factual Q&A |
| 0.4–0.7 | Balanced | General chat, explanations |
| 0.8–1.2 | Creative, varied, surprising | Brainstorming, stories, marketing |
| > 1.5 | Often incoherent | Rarely useful; can produce gibberish |
Temperature = 0 (Deterministic)
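At T = 0, decoding is greedy: the model always picks the top token, so the same input should give the same output every time. Because dividing by zero is undefined, implementations treat T = 0 as a special case and take the most likely token directly. This is ideal when you need reproducibility or consistency (e.g. structured data extraction), though some serving stacks can still show small run-to-run differences from floating-point effects.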
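The special case can be sketched as follows. This is a minimal illustration using only the Python standard library, not any particular library's implementation; the function name `sample_token` and the example logits are hypothetical.

```python
import math
import random


def sample_token(logits, temperature, rng=random):
    """Return the index of the chosen token (illustrative sketch)."""
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    if temperature == 0:
        # T = 0 cannot be used as a divisor, so fall back to greedy
        # decoding: pick the index of the largest logit.
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [z / temperature for z in logits]
    top = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scaled]
    # random.choices normalises the weights internally.
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


example_logits = [1.0, 3.0, 2.0]  # hypothetical scores for three tokens
greedy_picks = [sample_token(example_logits, 0) for _ in range(10)]
print(greedy_picks)
```

With temperature 0 every call returns the same index, since no random draw is involved.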
A Quick Example
Prompt: "The sky is ___"
- Low temperature: "blue." almost every time.
- High temperature: "blue," "vast," "a canvas of fading light," "endless"; different each run.
Same model, same prompt: temperature decides how adventurous it gets.
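A toy simulation of this example, assuming made-up logits for four candidate completions (a real model's scores would differ), shows the same effect over repeated draws:

```python
import math
import random

# Hypothetical completions and logits for "The sky is ___".
completions = ["blue", "vast", "endless", "a canvas of fading light"]
logits = [5.0, 2.0, 1.0, 0.0]


def draw(temperature, n, seed=0):
    """Sample n completions at the given temperature (T > 0)."""
    rng = random.Random(seed)
    scaled = [z / temperature for z in logits]
    top = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scaled]
    return rng.choices(completions, weights=weights, k=n)


low_draws = draw(temperature=0.1, n=200)
high_draws = draw(temperature=2.0, n=200)
print("T=0.1:", sorted(set(low_draws)))
print("T=2.0:", sorted(set(high_draws)))
```

At T = 0.1 the top completion holds essentially all of the probability mass, while at T = 2.0 the other completions appear regularly.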
Code Example
This shows the same logits producing very different probabilities as T changes.
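A minimal sketch in plain Python, assuming four hypothetical logits (the values and the function name `softmax_with_temperature` are illustrative, not taken from any library):

```python
import math


def softmax_with_temperature(logits, temperature):
    """Apply softmax(logits / T) for T > 0 and return probabilities."""
    scaled = [z / temperature for z in logits]
    top = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [4.0, 2.0, 1.0, 0.5]  # hypothetical scores for four tokens

for t in (0.2, 0.7, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(f"T={t:<4}", " ".join(f"{p:.3f}" for p in probs))
```

Each printed row is a probability distribution over the same four tokens. The first column (the top token) shrinks as T rises, and the remaining columns grow.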
Tips
- A reasonable default is about 0.7 for general-purpose chat.
- Lower it for accuracy and consistency; raise it for creativity.
- Don't crank everything at once: usually pair a sensible temperature with top-p rather than maximising both.
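To illustrate how the two settings interact, here is a rough sketch that applies temperature first and then a top-p (nucleus) cutoff. The ordering, the function name `temperature_then_top_p`, and the logits are assumptions made for illustration; real inference libraries may differ in the details.

```python
import math


def temperature_then_top_p(logits, temperature, top_p):
    """Return {token_index: probability} after temperature and top-p."""
    scaled = [z / temperature for z in logits]
    top = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the most likely tokens until their combined mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # Renormalise so the surviving probabilities sum to 1.
    kept_mass = sum(probs[i] for i in kept)
    return {i: probs[i] / kept_mass for i in kept}


logits = [4.0, 2.0, 1.0, 0.5]  # hypothetical scores for four tokens
moderate = temperature_then_top_p(logits, temperature=1.0, top_p=0.9)
hot = temperature_then_top_p(logits, temperature=2.0, top_p=0.9)
print(len(moderate), "tokens kept at T=1.0")
print(len(hot), "tokens kept at T=2.0")
```

A higher temperature flattens the distribution, so the same top-p threshold admits at least as many tokens. Raising both at once therefore compounds the randomness.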
Summary
- Temperature controls the randomness of an LLM's output.
- It works by dividing logits by T before softmax: low = peaked/focused, high = flat/creative.
- T = 0 is deterministic (greedy); T > 1.5 often becomes incoherent.
- Use low for factual/precise tasks and higher for creative ones.
- It's one of several output controls, alongside top-p, top-k, and max tokens (next).