Tokens

Last updated: Jun 29, 2026

Author: Vinay Adari

Tokens are the basic unit an LLM reads, generates, and is measured in. Almost everything about working with LLMs is counted in tokens, not words: the context window, the price, the rate limits, and the speed. Understanding tokens helps you estimate cost, stay within limits, and write efficient prompts.

💡 In one line: A token is a chunk of text (often a word or sub-word) — and tokens are the currency LLMs use to measure length, cost, and limits.

What is a Token? (Quick Recap)

A token is a chunk of text, usually a word or a sub-word piece. Common words are often a single token; longer or rarer words get split, and the exact pieces depend on the tokenizer:

"cat" → 1 token
"tokenization" → "token" + "ization" (2 tokens)

A tokenizer does this splitting before the model sees the text. (For how tokens then become vectors, see the Tokens & Embeddings article.)
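
To make the splitting concrete, here is a minimal sketch of a greedy longest-match splitter. The vocabulary TOY_VOCAB and the function toy_tokenize are invented for illustration; real tokenizers learn far larger vocabularies from data and may split the same word differently.

```python
# Toy vocabulary, invented for this example. Real vocabularies hold tens of
# thousands of entries learned from a training corpus.
TOY_VOCAB = {"cat", "token", "ization", "s"}


def toy_tokenize(word, vocab=TOY_VOCAB):
    """Split a word into the longest known pieces, scanning left to right."""
    pieces = []
    start = 0
    while start < len(word):
        # Try the longest remaining substring first, then shrink it.
        for end in range(len(word), start, -1):
            if word[start:end] in vocab:
                pieces.append(word[start:end])
                start = end
                break
        else:
            # No known piece starts here: fall back to a single character.
            pieces.append(word[start])
            start += 1
    return pieces


print(toy_tokenize("cat"))           # ['cat']
print(toy_tokenize("tokenization"))  # ['token', 'ization']
```

The single-character fallback is what lets a splitter like this represent any input, even words it has never seen.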

Tokens ≠ Words

A handy rule of thumb for English:

1 token ≈ ¾ of a word ≈ 4 characters.
So 100 tokens ≈ 75 words.

This varies with the content and language. Rough examples:

Text	Approx. tokens
"cat"	1
"Hello, world!"	~4
A 500-word essay	~650–700
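
The rule of thumb above can be turned into a quick estimator. This is only a rough sketch for English text; the function names are illustrative, and a real tokenizer will give different counts.

```python
import math


def estimate_tokens_from_words(word_count):
    """Rule of thumb: 1 token is about 3/4 of a word, so tokens = words * 4/3."""
    return math.ceil(word_count * 4 / 3)


def estimate_tokens_from_chars(text):
    """Rule of thumb: 1 token is about 4 characters."""
    return math.ceil(len(text) / 4)


print(estimate_tokens_from_words(75))               # 100
print(estimate_tokens_from_words(500))              # 667
print(estimate_tokens_from_chars("Hello, world!"))  # 4
```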

Why Tokens Matter

Context window — the input + output limit is measured in tokens (see Context Window).
Pricing — most LLM APIs charge per token, often with different rates for input and output.
Rate limits — usage caps are frequently set in tokens per minute.
Speed & cost — more tokens means slower responses and higher cost.

In short, tokens are the unit of account for LLMs.
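
As a worked example of tokens as the unit of account, the sketch below estimates the cost of one request and checks it against a context window. The prices and the 8,000-token window are hypothetical placeholders, not any provider's real figures.

```python
def estimate_cost(input_tokens, output_tokens,
                  input_price_per_million, output_price_per_million):
    """Cost of one request, in the same currency as the per-million prices."""
    total = (input_tokens * input_price_per_million
             + output_tokens * output_price_per_million)
    return total / 1_000_000


def fits_context(input_tokens, max_output_tokens, context_window):
    """Input and output share the same window, so their sum must fit."""
    return input_tokens + max_output_tokens <= context_window


# Hypothetical prices: 3.00 per million input tokens, 15.00 per million output.
cost = estimate_cost(2_000, 500, 3.00, 15.00)
print(f"{cost:.4f}")                    # 0.0135
print(fits_context(2_000, 500, 8_000))  # True
```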

What Affects Token Count

Common vs. rare words — common words are 1 token; rare/long ones split into several.
Spaces and punctuation — they count too (often attached to a word's token).
Numbers, code, and emojis — frequently use more tokens than you'd expect.
Non-English languages — often use more tokens per word, since tokenizers are usually optimised for English.
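
One contributing factor, at least for byte-level tokenizers that start from UTF-8 bytes, is that characters outside ASCII take several bytes each. The sketch below counts bytes, not tokens, so treat it as a hint about why such text can cost more rather than as a measurement.

```python
def utf8_length(text):
    """Number of bytes the text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


for sample in ["cat", "日本語", "😀"]:
    print(f"{sample!r}: {len(sample)} characters, {utf8_length(sample)} bytes")
```

Here "cat" is 3 bytes, the three Japanese characters are 9 bytes, and the single emoji is 4 bytes.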

Tokenization Algorithms

Tokenizers build their vocabulary of sub-words using methods like:

BPE (Byte-Pair Encoding) — merges frequent character pairs into tokens.
WordPiece — used by BERT.
SentencePiece — language-agnostic, common in multilingual models.

All aim to balance a manageable vocabulary with the ability to represent any text.
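
To illustrate the merging idea behind BPE, here is a toy training loop that works on characters. It is a simplified sketch: the function names and the tiny corpus are invented, and production tokenizers add details such as byte-level input and pre-tokenization.

```python
from collections import Counter


def most_frequent_pair(words):
    """Count adjacent symbol pairs across all words, weighted by frequency."""
    pairs = Counter()
    for symbols, freq in words.items():
        for left, right in zip(symbols, symbols[1:]):
            pairs[(left, right)] += freq
    return max(pairs, key=pairs.get) if pairs else None


def merge_pair(words, pair):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def train_bpe(corpus_counts, num_merges):
    """Learn up to `num_merges` merges from a {word: frequency} mapping."""
    words = {tuple(word): freq for word, freq in corpus_counts.items()}
    merges = []
    for _ in range(num_merges):
        pair = most_frequent_pair(words)
        if pair is None:
            break
        words = merge_pair(words, pair)
        merges.append(pair)
    return merges, words


merges, words = train_bpe({"hug": 10, "pug": 5, "hugs": 5}, num_merges=2)
print(merges)  # [('u', 'g'), ('h', 'ug')]
```

After two merges, "hug" has become a single symbol while "pug" is still split into "p" and "ug", which mirrors how frequent words end up as one token and rarer ones as several.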

Counting Tokens

Counting tokens lets you estimate cost and check limits before sending a request. For OpenAI models, the tiktoken library provides the tokenizers (install with pip install tiktoken).
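
A minimal sketch of counting with tiktoken follows. It assumes the cl100k_base encoding, which may not match the model you use (tiktoken.encoding_for_model can look up the right one by model name). The fallback to the 4-characters-per-token estimate is an addition for this example so the function still works when the library or its encoding data is unavailable.

```python
import math

try:
    import tiktoken  # third-party: pip install tiktoken
    _ENCODING = tiktoken.get_encoding("cl100k_base")
except Exception:
    # Library missing or encoding data unavailable: use the rough heuristic.
    _ENCODING = None


def count_tokens(text):
    """Exact count with tiktoken when available, else about 4 chars per token."""
    if _ENCODING is not None:
        return len(_ENCODING.encode(text))
    return math.ceil(len(text) / 4)


print(count_tokens("Hello, world!"))
```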

Tips to Use Fewer Tokens

Be concise — trim unnecessary words from prompts.
Summarise history — compress earlier conversation instead of resending it all.
Avoid pasting huge unneeded text — include only what's relevant (this is where RAG helps).
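
A simpler alternative to summarising is to drop the oldest messages once a token budget is exceeded. The sketch below does that, using the rough 4-characters-per-token estimate as a stand-in counter; trim_history and its parameters are hypothetical names, and a real tokenizer count can be passed in instead.

```python
import math


def rough_token_count(text):
    """Stand-in counter based on the 4-characters-per-token rule of thumb."""
    return math.ceil(len(text) / 4)


def trim_history(messages, budget, count=rough_token_count):
    """Keep the most recent messages whose combined count fits the budget."""
    kept = []
    used = 0
    for message in reversed(messages):
        cost = count(message)
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


history = ["a" * 40, "b" * 40, "c" * 40]  # about 10 estimated tokens each
print(len(trim_history(history, budget=25)))  # 2
```

Trimming loses the dropped content entirely, whereas summarising keeps a compressed version of it, so which one fits depends on how much the earlier conversation still matters.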

Summary

Tokens are the chunks of text LLMs process — usually words or sub-words.
Token count differs from word count (in English, 1 token ≈ ¾ of a word ≈ 4 characters).
Context limits, pricing, and rate limits are all measured in tokens.
Numbers, code, emojis, and non-English text often use more tokens.
Counting tokens (e.g. with tiktoken) helps you manage cost and stay within limits.