Temperature

Temperature is one of the most important output controls for an LLM. It sets how random or focused the generated text is, from strict and predictable to wild and creative, by reshaping the model's probability distribution before a token is chosen. It's a single number that can dramatically change the feel of the output, using the very same model.

💡 In one line: Temperature controls randomness. Low keeps output focused and predictable; high makes it diverse and creative.

What Temperature Does

At each step, the model produces a probability distribution over possible next tokens (via softmax; see Next Token Prediction). Temperature scales the logits before that softmax, changing how peaked or flat the distribution becomes:

  • A peaked distribution → the top token almost always wins → focused output.
  • A flat distribution → less likely tokens get a real chance → varied output.

How It Works (the Math)

Temperature divides the logits before softmax:

probabilities = softmax(logits / T)

  • T < 1: dividing by a number below 1 magnifies the gaps between logits → sharper distribution → more focused, more predictable output.
  • T = 1: the model's natural distribution, unchanged.
  • T > 1: the gaps between logits shrink → flatter distribution → more diverse, creative, riskier output.
  • T → 0: effectively greedy, always picking the single most likely token.
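
Written out per token, the formula above can be sketched as follows. The symbols z_i (the logit of token i) and p_i(T) (its probability at temperature T) are notation introduced here for illustration. The second expression is the ratio between any two tokens' probabilities, which shows why T sharpens or flattens the distribution:

```latex
p_i(T) = \frac{\exp(z_i / T)}{\sum_j \exp(z_j / T)},
\qquad
\frac{p_i(T)}{p_j(T)} = \exp\!\left(\frac{z_i - z_j}{T}\right)
```

For example, if two logits differ by 2, the more likely token is favoured by a factor of about 7.4 at T = 1, about 54.6 at T = 0.5, and only about 2.7 at T = 2.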

The Effect in Practice

Temperature | Behaviour                    | Good for
0 – 0.3     | Focused, precise, consistent | Code, math, extraction, factual Q&A
0.4 – 0.7   | Balanced                     | General chat, explanations
0.8 – 1.2   | Creative, varied, surprising | Brainstorming, stories, marketing
> 1.5       | Often incoherent             | Rarely useful; can produce gibberish

Temperature = 0 (Deterministic)

At T = 0, decoding is greedy: the model always picks the top token, so the same input gives, in principle, the same output every time (hosted APIs can still show small run-to-run differences). This is ideal when you need reproducibility or consistency (e.g. structured data extraction).
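
Because dividing by zero is undefined, a sampler typically has to treat T = 0 as a special case rather than plugging it into the softmax. The sketch below shows one way that might look. The function name `choose_token` and the example logits are hypothetical, and this is an illustration rather than any particular library's implementation.

```python
import math
import random

def choose_token(logits, temperature, rng=random):
    """Pick a token index: argmax at T = 0, temperature sampling otherwise."""
    if temperature == 0:
        # Greedy decoding: division by zero is undefined, so take the argmax directly.
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max before exp() for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    # random.choices normalises the weights, so they need not sum to 1.
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

logits = [1.2, 3.4, 0.3]  # hypothetical scores for three tokens
picks = {choose_token(logits, temperature=0) for _ in range(100)}
print(picks)  # prints {1}: the same index on every call
```

At any positive temperature the same function samples, so repeated calls can return different indices.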

A Quick Example

Prompt: "The sky is ___"

  • Low temperature: "blue." (almost every time)
  • High temperature: "blue," "vast," "a canvas of fading light," "endless" (different each run)

Same model, same prompt: temperature decides how adventurous it gets.
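
The effect can be simulated with a toy vocabulary. The candidate completions and their logits below are made-up values chosen for illustration, not output from a real model, and `sample_completions` is a hypothetical helper:

```python
import math
import random
from collections import Counter

# Hypothetical next-token scores for "The sky is ___" (made-up values).
candidates = ["blue", "vast", "endless", "a canvas of fading light"]
logits = [5.0, 3.0, 2.5, 2.0]

def sample_completions(temperature, runs, seed=0):
    """Sample `runs` completions at the given temperature and count them."""
    rng = random.Random(seed)  # fixed seed so the demo is repeatable
    # Unnormalised softmax weights; random.choices normalises them.
    weights = [math.exp(z / temperature) for z in logits]
    return Counter(rng.choices(candidates, weights=weights, k=runs))

low = sample_completions(temperature=0.2, runs=1000)
high = sample_completions(temperature=1.5, runs=1000)
print("T=0.2:", low.most_common())
print("T=1.5:", high.most_common())
```

With these numbers, "blue" should dominate almost completely at T = 0.2, while at T = 1.5 the other completions appear regularly.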

Code Example
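
A minimal sketch using only the Python standard library. The function name `softmax_with_temperature` and the four logit values are assumptions made for illustration:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Return softmax(logits / temperature) as a list of probabilities."""
    if temperature <= 0:
        raise ValueError("temperature must be positive; treat T = 0 as greedy argmax")
    scaled = [z / temperature for z in logits]
    # Subtract the max before exponentiating for numerical stability.
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for four candidate tokens (illustrative values only).
logits = [4.0, 2.0, 1.0, 0.5]

for t in (0.2, 0.7, 1.0, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(f"T={t}: " + ", ".join(f"{p:.3f}" for p in probs))
```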


This shows the same logits producing very different probabilities as T changes.

Tips

  • Start around 0.7 for general-purpose chat (provider defaults vary).
  • Lower it for accuracy and consistency; raise it for creativity.
  • Don't crank everything at once: pair a sensible temperature with top-p rather than maximising both.
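
To make the pairing with top-p concrete, here is a rough sketch of one common ordering: scale by temperature first, then keep only the smallest set of tokens whose cumulative probability reaches top_p, and sample from that set. The function name, default values, and logits are hypothetical, and real libraries may order or implement these steps differently.

```python
import math
import random

def sample_with_temperature_and_top_p(logits, temperature=0.7, top_p=0.9, rng=random):
    """Temperature-scale the logits, keep the top-p nucleus, then sample from it.

    Assumes temperature > 0 and 0 < top_p <= 1.
    """
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Walk tokens from most to least likely until their cumulative mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    nucleus, mass = [], 0.0
    for i in order:
        nucleus.append(i)
        mass += probs[i]
        if mass >= top_p:
            break

    # Sample within the nucleus; random.choices renormalises the weights.
    return rng.choices(nucleus, weights=[probs[i] for i in nucleus], k=1)[0]

logits = [4.0, 2.0, 1.0, 0.5]  # hypothetical scores for four tokens
print(sample_with_temperature_and_top_p(logits))
```

With these example logits and the default settings, the top token alone already holds more than 90% of the probability mass, so the nucleus contains a single token. Raising the temperature flattens the distribution and lets more tokens into the nucleus.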

Summary

  • Temperature controls the randomness of an LLM's output.
  • It works by dividing logits by T before softmax: low = peaked/focused, high = flat/creative.
  • T = 0 is deterministic (greedy); T > 1.5 often becomes incoherent.
  • Use low for factual/precise tasks and higher for creative ones.
  • It's one of several output controls, alongside top-p, top-k, and max tokens (next).