Tokens
Tokens are the basic unit an LLM reads, generates, and is measured in. Almost everything about working with LLMs is counted in tokens, not words: the context window, the price, the rate limits, and the speed. Understanding tokens helps you estimate cost, stay within limits, and write efficient prompts.
💡 In one line: A token is a chunk of text (often a word or sub-word), and tokens are the currency LLMs use to measure length, cost, and limits.
What is a Token? (Quick Recap)
A token is a chunk of text, usually a word or a sub-word piece. Common words are a single token; longer or rarer words get split:
- "cat" → 1 token
- "tokenization" → "token" + "ization" (2 tokens)
A tokenizer does this splitting before the model sees the text. (For how tokens then become vectors, see the Tokens & Embeddings article.)
Tokens ≠ Words
A handy rule of thumb for English:
- 1 token ≈ ¾ of a word ≈ 4 characters.
- So 100 tokens ≈ 75 words.
This varies with the content and language. Rough examples:
| Text | Approx. tokens |
|---|---|
| "cat" | 1 |
| "Hello, world!" | ~4 |
| A 500-word essay | ~650–700 |
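The rule of thumb above can be turned into a quick estimator. This is a minimal sketch for English text only; the function names `tokens_from_words` and `tokens_from_chars` are illustrative, and the results are approximations rather than real tokenizer output.

```python
import math


def tokens_from_words(word_count: int) -> int:
    """Estimate tokens from a word count using 1 token ≈ ¾ of a word."""
    return round(word_count * 4 / 3)


def tokens_from_chars(text: str) -> int:
    """Estimate tokens from raw text using 1 token ≈ 4 characters."""
    return math.ceil(len(text) / 4)


print(tokens_from_words(75))               # 75 words
print(tokens_from_words(500))              # a 500-word essay
print(tokens_from_chars("Hello, world!"))  # 13 characters
```

Both helpers are only as good as the rule of thumb; for code, numbers, or non-English text, expect the real count to be higher.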
Why Tokens Matter
- Context window: the combined input + output limit is measured in tokens (see Context Window).
- Pricing: most LLM APIs charge per token, often with different rates for input and output.
- Rate limits: usage caps are frequently set in tokens per minute.
- Speed & cost: more tokens means slower responses and higher cost.
In short, tokens are the unit of account for LLMs.
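To make the accounting concrete, here is a minimal sketch of a cost and context-window check. The prices, the window size, and the function names are hypothetical placeholders, not the rates or limits of any real API.

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price_per_million: float,
                  output_price_per_million: float) -> float:
    """Cost of one request when input and output tokens are priced separately."""
    return (input_tokens * input_price_per_million
            + output_tokens * output_price_per_million) / 1_000_000


def fits_context(input_tokens: int, max_output_tokens: int,
                 context_window: int) -> bool:
    """The window must hold the prompt plus the longest reply you allow."""
    return input_tokens + max_output_tokens <= context_window


# Hypothetical numbers: $1 per million input tokens, $2 per million output tokens.
cost = estimate_cost(2_000, 500, input_price_per_million=1.0,
                     output_price_per_million=2.0)
print(f"${cost:.4f}")
print(fits_context(2_000, 500, context_window=8_000))
```

Output tokens are often priced higher than input tokens, which is why the two are kept as separate parameters.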
What Affects Token Count
- Common vs. rare words: common words are usually 1 token; rare or long ones split into several.
- Spaces and punctuation: they count too (a space is often attached to the token of the word next to it).
- Numbers, code, and emojis: frequently use more tokens than you'd expect.
- Non-English languages: often use more tokens per word, since tokenizers are usually optimised for English.
Tokenization Algorithms
Tokenizers build their vocabulary of sub-words using methods like:
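- BPE (Byte-Pair Encoding): starts from characters (or bytes) and repeatedly merges the most frequent adjacent pair into a new token.
- WordPiece: a related sub-word method, used by BERT.
- SentencePiece: works directly on raw text without language-specific word splitting, so it is language-agnostic; common in multilingual models.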
All aim to balance a manageable vocabulary with the ability to represent any text.
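The BPE idea can be sketched in a few lines. This is a toy training loop on a tiny corpus, not a production tokenizer: it splits on whitespace, works on characters rather than bytes, and breaks ties by first occurrence. The function names are illustrative.

```python
from collections import Counter


def count_pairs(words):
    """Count adjacent symbol pairs, weighted by how often each word occurs."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs


def merge_pair(words, pair):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def train_bpe(corpus, num_merges):
    """Learn `num_merges` merge rules from a whitespace-separated corpus."""
    words = Counter(tuple(word) for word in corpus.split())
    merges = []
    for _ in range(num_merges):
        pairs = count_pairs(words)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent adjacent pair
        words = merge_pair(words, best)
        merges.append(best)
    return merges, words


merges, words = train_bpe("low low low lower lowest", num_merges=2)
print(merges)  # the learned merge rules, in order
print(words)   # each word as a tuple of current symbols
```

With more merges and a larger corpus, frequent words end up as single tokens while rare words stay split into pieces, which is the behaviour described earlier.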
Counting Tokens
Counting tokens lets you estimate cost and check limits before sending a request. For OpenAI models, the tiktoken library counts tokens locally (install with pip install tiktoken).
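A minimal sketch of counting with tiktoken follows. The encoding name `cl100k_base` is an assumption; use whichever encoding matches your model. The fallback to the 4-characters rule is also an illustrative choice, so the function still returns a rough number when tiktoken or its vocabulary file is unavailable.

```python
import math

try:
    import tiktoken
except ImportError:  # tiktoken is not installed
    tiktoken = None


def count_tokens(text: str, encoding_name: str = "cl100k_base") -> int:
    """Count tokens with tiktoken if possible, else estimate ~4 characters per token."""
    if tiktoken is not None:
        try:
            encoding = tiktoken.get_encoding(encoding_name)
            return len(encoding.encode(text))
        except Exception:
            pass  # e.g. the vocabulary file could not be downloaded
    return math.ceil(len(text) / 4)


print(count_tokens("Hello, world!"))
```

Other model families use different tokenizers, so a count from one encoding is only an estimate for another model.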
Tips to Use Fewer Tokens
- Be concise: trim unnecessary words from prompts.
- Summarise history: compress earlier conversation instead of resending it all.
- Avoid pasting huge unneeded text: include only what's relevant (this is where RAG helps).
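One simple way to keep conversation history within a token budget is sketched below. Note that it drops the oldest messages rather than summarising them, which is a cruder stand-in for the summarisation tip above. `rough_count` and `trim_history` are hypothetical helpers, and the counter uses the 4-characters rule, so swap in a real tokenizer for accurate budgets.

```python
import math


def rough_count(text: str) -> int:
    """Approximate token count using 1 token ≈ 4 characters."""
    return math.ceil(len(text) / 4)


def trim_history(messages, max_tokens, count=rough_count):
    """Keep the most recent messages whose combined token count fits the budget."""
    kept = []
    total = 0
    for message in reversed(messages):  # walk from newest to oldest
        cost = count(message)
        if total + cost > max_tokens:
            break
        kept.append(message)
        total += cost
    return list(reversed(kept))  # restore chronological order


history = ["a" * 40, "b" * 40, "c" * 40]  # about 10 tokens each
print(len(trim_history(history, max_tokens=25)))
```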
Summary
- Tokens are the chunks of text LLMs process, usually words or sub-words.
- Token count differs from word count (1 token ≈ ¾ word ≈ 4 characters in English).
- Context limits, pricing, and rate limits are all measured in tokens.
- Numbers, code, emojis, and non-English text often use more tokens.
- Counting tokens (e.g. with tiktoken) helps you manage cost and stay within limits.