Temperature

Last updated: Jun 30, 2026

Author: Vinay Adari

Temperature is one of the most important output controls for an LLM. It sets how random or focused the generated text is, from strict and predictable to wild and creative, by reshaping the model's probability distribution before a token is chosen. This single number can dramatically change the feel of the output from the very same model.

💡 In one line: Temperature controls randomness — low keeps output focused and predictable, high makes it diverse and creative.

What Temperature Does

At each step, the model produces a probability distribution over possible next tokens (via softmax — see Next Token Prediction). Temperature scales the logits before that softmax, changing how peaked or flat the distribution becomes:

A peaked distribution → the top token almost always wins → focused output.
A flat distribution → less likely tokens get a real chance → varied output.

How It Works (the Math)

Temperature divides the logits before softmax:

probabilities = softmax( logits / T )

T < 1: dividing by a number below 1 widens the gaps between scores → distribution sharper → focused, more predictable.
T = 1: the model's natural distribution, unchanged.
T > 1: gaps between scores shrink → distribution flatter → diverse, creative, riskier.
T → 0: effectively greedy, always picking the single most likely token.
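
One way to see why the cases above behave as they do is to write the formula out per token and compare two tokens directly. The notation here (z_i for the logit of token i, p_i for its probability) is an assumption introduced for illustration, not taken from the original text:

```latex
p_i = \frac{\exp(z_i / T)}{\sum_j \exp(z_j / T)},
\qquad
\frac{p_i}{p_j} = \exp\!\left(\frac{z_i - z_j}{T}\right)
```

The normalising sum cancels in the ratio, so only the logit gap and T matter. For a gap of z_i - z_j = 2, the favoured token is about 54.6 times more likely at T = 0.5 (exp(4)), about 7.4 times at T = 1 (exp(2)), and about 2.7 times at T = 2 (exp(1)).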

The Effect in Practice

Temperature	Behaviour	Good for
0 – 0.3	Focused, precise, consistent	Code, math, extraction, factual Q&A
0.4 – 0.7	Balanced	General chat, explanations
0.8 – 1.2	Creative, varied, surprising	Brainstorming, stories, marketing
> 1.5	Often incoherent	Rarely useful — can produce gibberish

Temperature = 0 (Deterministic)

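At T = 0, decoding is greedy: the model always picks the top token, so the same input should give the same output every time. This is ideal when you need reproducibility or consistency (e.g. structured data extraction). In practice, hosted models can still show small run-to-run differences at T = 0, for example from floating-point effects on parallel hardware, so treat it as near-deterministic rather than guaranteed.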
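
Since logits / T is undefined at T = 0, implementations typically special-case it and take the argmax directly. The following is a minimal sketch of that idea in plain Python; the function name `pick_token` and the logit values are hypothetical, and real inference libraries may structure this differently:

```python
import math
import random

def pick_token(logits, temperature, rng=random):
    """Return the index of the chosen token.

    temperature == 0 is special-cased to argmax, because dividing
    the logits by zero is undefined.
    """
    if temperature == 0:
        # Greedy decoding: index of the largest logit (first one wins ties).
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

logits = [2.0, 1.0, 0.5]  # hypothetical scores for three tokens
greedy_picks = {pick_token(logits, 0) for _ in range(100)}
print(greedy_picks)  # {0}
```

Every one of the 100 calls returns index 0 because no random draw is involved on the greedy path.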

A Quick Example

Prompt: "The sky is ___"

Low temperature: "blue." — almost every time.
High temperature: "blue," "vast," "a canvas of fading light," "endless" — different each run.

Same model, same prompt — temperature decides how adventurous it gets.
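
The effect can be simulated with made-up numbers. The sketch below assigns hypothetical logits to the four completions above (they are illustrative assumptions, not scores from a real model) and draws 1,000 samples at a low and a high temperature:

```python
import math
import random
from collections import Counter

# Hypothetical logits for continuations of "The sky is ___".
candidates = {
    "blue": 5.0,
    "vast": 3.0,
    "endless": 2.5,
    "a canvas of fading light": 2.0,
}

def sample_completions(temperature, n=1000, seed=0):
    """Draw n completions at the given temperature (must be > 0)."""
    tokens = list(candidates)
    scaled = [candidates[t] / temperature for t in tokens]
    peak = max(scaled)
    # Unnormalised softmax weights; random.choices normalises them.
    weights = [math.exp(s - peak) for s in scaled]
    rng = random.Random(seed)
    return Counter(rng.choices(tokens, weights=weights, k=n))

low = sample_completions(0.2)
high = sample_completions(1.5)
print("T=0.2:", low.most_common())
print("T=1.5:", high.most_common())
```

With these particular logits, "blue" takes essentially every draw at T = 0.2, while at T = 1.5 it wins only around six draws in ten and all four completions appear.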

Code Example

This shows the same logits producing very different probabilities as T changes.
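
A minimal sketch of that comparison in plain Python follows. The function name `softmax_with_temperature` and the logit values are illustrative assumptions rather than part of any particular library:

```python
import math

def softmax_with_temperature(logits, temperature):
    """softmax(logits / T) for T > 0, returned as a list of probabilities."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtracting the max avoids overflow in exp
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5, 0.1]  # hypothetical scores for four tokens

for t in (0.2, 0.7, 1.0, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(f"T={t}: " + ", ".join(f"{p:.3f}" for p in probs))
```

For these logits the top token's probability falls from roughly 0.99 at T = 0.2 to about 0.57 at T = 1.0 and about 0.46 at T = 1.5, while the ranking of the tokens never changes.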

Tips

Around 0.7 is a common starting point for general-purpose chat; actual defaults vary by provider, so check the API you use.
Lower it for accuracy and consistency; raise it for creativity.
Don't crank everything at once — usually pair a sensible temperature with top-p rather than maximising both.
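
To illustrate how the two settings interact, here is a rough sketch that applies temperature first and then a top-p (nucleus) cutoff. The function name, the order of operations, and the values are assumptions for illustration; libraries differ in how and in which order they apply these filters:

```python
import math

def temperature_then_top_p(logits, temperature, top_p):
    """Return {token_index: probability} after temperature scaling and top-p filtering."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the most likely tokens until their cumulative probability reaches top_p.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # Renormalise over the surviving tokens.
    kept_total = sum(probs[i] for i in kept)
    return {i: probs[i] / kept_total for i in kept}

logits = [2.0, 1.0, 0.5, 0.1]  # hypothetical scores for four tokens
print(temperature_then_top_p(logits, temperature=0.7, top_p=0.9))
print(temperature_then_top_p(logits, temperature=1.5, top_p=0.9))
```

With these logits, the same top_p of 0.9 keeps three tokens at T = 0.7 but all four at T = 1.5: a higher temperature flattens the distribution, so more tokens are needed to reach the cutoff. This is one reason to adjust the two settings together rather than pushing both up.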

Summary

Temperature controls the randomness of an LLM's output.
It works by dividing logits by T before softmax: low = peaked/focused, high = flat/creative.
T = 0 is deterministic (greedy); T > 1.5 often becomes incoherent.
Use low for factual/precise tasks and higher for creative ones.
It's one of several output controls — alongside top-p, top-k, and max tokens (next).