Tokens

Tokens are the basic unit an LLM reads, generates, and is measured in. Almost everything about working with LLMs is counted in tokens, not words: the context window, the price, the rate limits, and the speed. Understanding tokens helps you estimate cost, stay within limits, and write efficient prompts.

💡 In one line: A token is a chunk of text (often a word or sub-word), and tokens are the currency LLMs use to measure length, cost, and limits.

What is a Token? (Quick Recap)

A token is a chunk of text, usually a word or a sub-word piece. Common words are a single token; longer or rarer words get split:

  • "cat" → 1 token
  • "tokenization" → "token" + "ization" (2 tokens)

A tokenizer does this splitting before the model sees the text. (For how tokens then become vectors, see the Tokens & Embeddings article.)

Tokens ≠ Words

A handy rule of thumb for English:

  • 1 token ≈ ¾ of a word ≈ 4 characters.
  • So 100 tokens ≈ 75 words.

This varies with the content and language. Rough examples:

Text                 Approx. tokens
"cat"                1
"Hello, world!"      ~4
A 500-word essay     ~650–700
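The rule of thumb above can be turned into a quick estimator. This is a minimal sketch that applies only the English heuristics from this section (4 characters per token, ¾ of a word per token); the function names are illustrative, and real counts depend on the tokenizer.

```python
import math


def estimate_tokens_from_chars(text: str) -> int:
    """Rough English estimate: about 4 characters per token."""
    return math.ceil(len(text) / 4)


def estimate_tokens_from_words(word_count: int) -> int:
    """Rough English estimate: 1 token is about 3/4 of a word."""
    return math.ceil(word_count * 4 / 3)


print(estimate_tokens_from_words(75))               # 100
print(estimate_tokens_from_words(500))              # 667
print(estimate_tokens_from_chars("Hello, world!"))  # 4
```

Treat the result as a ballpark figure for planning, not as the number an API will bill.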

Why Tokens Matter

  • Context window: the input + output limit is measured in tokens (see Context Window).
  • Pricing: most LLM APIs charge per token, often with different rates for input and output.
  • Rate limits: usage caps are frequently set in tokens per minute.
  • Speed & cost: more tokens means slower responses and higher cost.

In short, tokens are the unit of account for LLMs.
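To make the accounting concrete, here is a small sketch of the arithmetic behind per-token pricing and a context-window check. The prices and the 8,192-token window are hypothetical placeholders, not any provider's real figures, and the function names are illustrative.

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price_per_million: float,
                  output_price_per_million: float) -> float:
    """Cost of one request when input and output tokens are priced separately."""
    total = (input_tokens * input_price_per_million
             + output_tokens * output_price_per_million)
    return total / 1_000_000


def fits_context(input_tokens: int, max_output_tokens: int,
                 context_window: int) -> bool:
    """True if the prompt plus the reserved output fits in the window."""
    return input_tokens + max_output_tokens <= context_window


# Hypothetical rates: 3.00 per million input tokens, 15.00 per million output tokens.
cost = estimate_cost(2_000, 500, 3.00, 15.00)
print(f"{cost:.4f}")                    # 0.0135
print(fits_context(2_000, 500, 8_192))  # True
print(fits_context(8_000, 500, 8_192))  # False
```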

What Affects Token Count

  • Common vs. rare words: common words are 1 token; rare/long ones split into several.
  • Spaces and punctuation: they count too (often attached to a word's token).
  • Numbers, code, and emojis: frequently use more tokens than you'd expect.
  • Non-English languages: often use more tokens per word, since tokenizers are usually optimised for English.
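One contributing factor can be seen without any tokenizer at all. Many tokenizers work on UTF-8 bytes, and for those the byte length is an upper bound on the token count, since every token covers at least one byte. The sketch below only compares characters with bytes; it does not count tokens, and the helper name is illustrative.

```python
def utf8_profile(text: str) -> tuple[int, int]:
    """Return (characters, UTF-8 bytes) for a piece of text."""
    return len(text), len(text.encode("utf-8"))


for sample in ["hello", "привет", "こんにちは", "💡"]:
    chars, size = utf8_profile(sample)
    print(f"{sample!r}: {chars} characters, {size} bytes")
```

Scripts and symbols that need two to four bytes per character give a byte-based tokenizer more raw material to cover, unless its vocabulary happens to contain merged tokens for them.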

Tokenization Algorithms

Tokenizers build their vocabulary of sub-words using methods like:

  • BPE (Byte-Pair Encoding): starts from single characters (or bytes) and repeatedly merges the most frequent adjacent pair into a new token.
  • WordPiece: used by BERT.
  • SentencePiece: language-agnostic, common in multilingual models.

All aim to balance a manageable vocabulary with the ability to represent any text.
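As an illustration of the merging idea behind BPE, here is a toy trainer. It is a simplified sketch, not the implementation used by any particular library: it splits on whitespace, works on characters rather than bytes, and breaks ties by first occurrence. All function names are illustrative.

```python
from collections import Counter


def count_pairs(words: dict) -> Counter:
    """Count adjacent symbol pairs, weighted by how often each word occurs."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs


def merge_pair(words: dict, pair: tuple) -> dict:
    """Rewrite every word so each occurrence of `pair` becomes one symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def learn_bpe(corpus: str, num_merges: int):
    """Learn up to `num_merges` merge rules from a whitespace-split corpus."""
    words = {tuple(w): c for w, c in Counter(corpus.split()).items()}
    merges = []
    for _ in range(num_merges):
        pairs = count_pairs(words)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent pair, first seen wins ties
        words = merge_pair(words, best)
        merges.append(best)
    return merges, words


merges, words = learn_bpe("low low low lower lowest", 2)
print(merges)  # [('l', 'o'), ('lo', 'w')]
print(words)   # {('low',): 3, ('low', 'e', 'r'): 1, ('low', 'e', 's', 't'): 1}
```

After two merges the frequent word "low" is a single symbol, while the rarer "lower" and "lowest" are still split into pieces, which mirrors the common-versus-rare behaviour described earlier.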

Counting Tokens


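The tiktoken library counts tokens locally using the encodings of OpenAI models (install with pip install tiktoken); other providers use their own tokenizers, so their counts can differ. Counting tokens lets you estimate cost and check limits before sending a request.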
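A minimal sketch of counting with tiktoken might look like the following. The choice of the cl100k_base encoding is an assumption; pick the encoding that matches your model. The fallback to the 4-characters-per-token heuristic is an addition for this example so the function still returns something when tiktoken is not installed or its encoding file cannot be loaded.

```python
import math


def count_tokens(text: str, encoding_name: str = "cl100k_base") -> int:
    """Count tokens with tiktoken, or fall back to a rough estimate."""
    try:
        import tiktoken

        encoding = tiktoken.get_encoding(encoding_name)
        return len(encoding.encode(text))
    except Exception:
        # tiktoken is missing or the encoding could not be loaded:
        # use the English rule of thumb (about 4 characters per token).
        return math.ceil(len(text) / 4)


prompt = "Tokens are the unit of account for LLMs."
print(count_tokens(prompt))
```

Running this on a prompt before sending it gives a number you can compare against a context limit or multiply by a per-token price.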

Tips to Use Fewer Tokens

  • Be concise: trim unnecessary words from prompts.
  • Summarise history: compress earlier conversation instead of resending it all.
  • Avoid pasting huge unneeded text: include only what's relevant (this is where RAG helps).
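One simple way to keep conversation history within a budget is sketched below. Note that it drops the oldest messages rather than summarising them, since summarising would need a model call; it also uses the rough 4-characters-per-token estimate instead of a real tokenizer. The function names and the budget value are illustrative.

```python
import math


def rough_tokens(text: str) -> int:
    """English rule of thumb: about 4 characters per token."""
    return math.ceil(len(text) / 4)


def trim_history(messages: list[str], budget: int) -> list[str]:
    """Keep the most recent messages whose estimated tokens fit in `budget`."""
    kept, used = [], 0
    for message in reversed(messages):  # walk from newest to oldest
        cost = rough_tokens(message)
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    return list(reversed(kept))  # restore chronological order


history = ["a" * 40, "b" * 40, "c" * 40]  # roughly 10 tokens each
print(len(trim_history(history, budget=25)))  # 2
```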

Summary

  • Tokens are the chunks of text LLMs process, usually words or sub-words.
  • Token count differs from word count (~1 token ≈ ¾ word ≈ 4 characters in English).
  • Context limits, pricing, and rate limits are all measured in tokens.
  • Numbers, code, emojis, and non-English text often use more tokens.
  • Counting tokens (e.g. with tiktoken) helps you manage cost and stay within limits.