Parameters & Model Size
When people say a model has "7 billion parameters," what does that actually mean? Parameters are the internal values a model learns during training, and model size is usually measured by how many of them there are. Parameter count is the most common shorthand for how big (and, roughly, how capable) a Generative AI model is.
💡 In one line: Parameters are the learned numbers inside a model, and model size is simply how many of them there are.
What Are Parameters?
A neural network is full of small numbers that get adjusted as it learns. These numbers are the parameters, and they come in two kinds:
- Weights: how strongly each connection between neurons matters.
- Biases: a per-neuron offset that shifts the output.
Remember the neuron formula: output = f(w₁x₁ + w₂x₂ + … + b). Every w and b in that expression is a parameter. They start as random values, and training adjusts them until the model produces good outputs. In effect, a model's parameters are where its "knowledge" is stored: everything it has learned lives in these numbers.
More parameters mean more capacity to capture complex patterns, which is why bigger models can often do more.
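To make the formula concrete, here is a minimal Python sketch of one neuron and of counting the parameters in a fully connected layer. The sigmoid activation, the example numbers, and the function names (`neuron`, `dense_layer_param_count`) are illustrative assumptions, not taken from any particular model.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic activation f(z) = 1 / (1 + e^-z); chosen here only as an example f."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs: list[float], weights: list[float], bias: float) -> float:
    """Compute f(w1*x1 + w2*x2 + ... + b) for a single neuron."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)

def dense_layer_param_count(n_inputs: int, n_outputs: int) -> int:
    """A fully connected layer has one weight per input-output pair
    plus one bias per output neuron."""
    return n_inputs * n_outputs + n_outputs

# Hypothetical values: three weights and one bias, i.e. four parameters.
weights = [0.5, -1.0, 0.25]
bias = 0.1
inputs = [1.0, 2.0, 4.0]
print(neuron(inputs, weights, bias))
print(dense_layer_param_count(3, 1))
print(dense_layer_param_count(4096, 4096))
```

A single 4096-to-4096 layer already holds about 16.8 million parameters, which is how counts climb into the billions once many such layers are stacked.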
What is Model Size?
Model size is the total number of parameters a model has. The counts get large fast, so they're written with units:
- K = thousand, M = million, B = billion, T = trillion.
So a "7B model" has 7 billion parameters. Modern Generative AI models range from a few million parameters up to hundreds of billions or more.
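A small sketch of converting between shorthand labels and raw counts, using the suffixes listed above. The helper names are hypothetical. Note that published labels are rounded: a "7B" model may have somewhat more or fewer than exactly 7,000,000,000 parameters.

```python
# Suffixes from the text: K = thousand, M = million, B = billion, T = trillion.
SUFFIXES = {"K": 10**3, "M": 10**6, "B": 10**9, "T": 10**12}

def parse_param_count(label: str) -> int:
    """Turn a label like '7B' or '1.5T' into a raw parameter count."""
    label = label.strip().upper()
    number, suffix = label[:-1], label[-1]
    if suffix not in SUFFIXES:
        return int(float(label))  # no suffix: already a plain number
    return round(float(number) * SUFFIXES[suffix])

def format_param_count(n: int) -> str:
    """Turn a raw count into a short label using the largest suffix that fits."""
    for suffix in ("T", "B", "M", "K"):
        scale = SUFFIXES[suffix]
        if n >= scale:
            return f"{n / scale:g}{suffix}"
    return str(n)

print(parse_param_count("7B"))
print(format_param_count(125_000_000))
```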
Why Model Size Matters
- Capacity: more parameters let a model represent more complex patterns and store more knowledge.
- Scaling laws: research has shown that model performance (typically measured as loss) improves in a fairly predictable way, roughly following a power law, as parameters, training data, and compute are increased together.
- Emergent abilities: very large models can show skills (like multi-step reasoning) that seem to appear suddenly with scale and that smaller models lack.
This is a big reason the field raced toward ever-larger models.
The Cost of Size
Bigger models are more capable, but size comes at a steep price:
- Memory: every parameter must be stored. At half precision (~2 bytes each), a 7-billion-parameter model needs roughly 14 GB of memory just to load its weights.
- Compute: more parameters mean more calculations, so both training and running the model are slower and more expensive.
- Energy: large models consume significant power to train and serve.
- Data: bigger models need correspondingly huge datasets to train well.
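The compute and data costs can be tied together with a common back-of-the-envelope approximation from the scaling-law literature: training takes roughly 6 × N × D floating-point operations (FLOPs) for N parameters and D training tokens, and generating one token takes roughly 2 × N FLOPs. The sketch below assumes that rule of thumb; the 7B / 1-trillion-token figures are hypothetical inputs, not a specific model.

```python
def training_flops(n_params: float, n_tokens: float) -> float:
    """Rough training compute: ~6 FLOPs per parameter per training token
    (about 2 for the forward pass and 4 for the backward pass)."""
    return 6 * n_params * n_tokens

def inference_flops_per_token(n_params: float) -> float:
    """Rough forward-pass cost: ~2 FLOPs (one multiply, one add) per parameter per token."""
    return 2 * n_params

# Hypothetical example: a 7B model trained on 1 trillion tokens.
n_params = 7e9
n_tokens = 1e12
print(f"training:  {training_flops(n_params, n_tokens):.1e} FLOPs")
print(f"inference: {inference_flops_per_token(n_params):.1e} FLOPs per generated token")
```

Both estimates grow linearly with N, so doubling the parameter count roughly doubles the cost of every training step and every generated token.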
| Aspect | Smaller Models | Larger Models |
|---|---|---|
| Capability | Good for focused tasks | Broad, more powerful |
| Speed | Fast | Slower |
| Cost | Cheap to run | Expensive |
| Hardware | Can run on a phone/laptop | Needs powerful GPUs |
| Data needed | Less | Much more |
Don't Confuse These "Sizes"
Parameter count is just one measure. Don't mix it up with:
- Training data size: how much data the model learned from (measured in tokens); this is not the same as the parameter count.
- Context window: how much text (in tokens) the model can read at once.
- File size: depends on the precision used to store each parameter (e.g. 32-bit vs 8-bit), so the same model can take up different amounts of disk space.
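Memory and file size follow directly from parameter count × bytes per parameter. A minimal sketch, assuming 1 GB = 10⁹ bytes and counting only the raw weights (it ignores file-format overhead, quantization scale factors, activations, and any runtime caches); the function name is hypothetical.

```python
# Bytes needed to store one parameter at common precisions.
BYTES_PER_PARAM = {
    "fp32": 4.0,   # 32-bit float
    "fp16": 2.0,   # 16-bit "half precision" (bf16 is also 2 bytes)
    "int8": 1.0,   # 8-bit quantized
    "int4": 0.5,   # 4-bit quantized (two parameters per byte)
}

def weight_size_gb(n_params: float, precision: str) -> float:
    """Approximate size of the raw weights in gigabytes (10**9 bytes)."""
    return n_params * BYTES_PER_PARAM[precision] / 1e9

for precision in BYTES_PER_PARAM:
    print(f"7B model at {precision}: ~{weight_size_gb(7e9, precision):.1f} GB")
```

At fp16 this reproduces the ~14 GB figure for a 7B model. In binary units, 14 × 10⁹ bytes is about 13 GiB, which is why some tools report a slightly smaller number.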
Rough Model Size Tiers
| Tier | Approx. parameters | Typical use |
|---|---|---|
| Small | under ~100M | On-device, mobile, simple tasks |
| Medium | ~100M–1B | Specialised or fine-tuned tasks |
| Large | ~1B–100B | General-purpose LLMs and chat |
| Frontier | ~100B–1T+ | Most capable, research-grade, costly |
(Approximate ranges; the boundaries shift as the field advances.)
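The tier table can be read as a simple lookup. A minimal sketch, assuming the table's approximate boundaries and assuming each lower bound is inclusive (so exactly 1B counts as "Large"); `size_tier` is a hypothetical helper, and real-world labels are fuzzier than hard cut-offs.

```python
# Upper bounds taken from the approximate tier table; rough, and they shift over time.
TIERS = [
    (100e6, "Small"),    # under ~100M
    (1e9, "Medium"),     # ~100M to 1B
    (100e9, "Large"),    # ~1B to 100B
]

def size_tier(n_params: float) -> str:
    """Map a parameter count onto the rough tiers in the table."""
    for upper_bound, name in TIERS:
        if n_params < upper_bound:
            return name
    return "Frontier"    # ~100B and up

print(size_tier(7e9))
```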
Bigger Isn't Always Better
A larger model isn't automatically the right choice. A smaller model trained well on good data can match or beat a larger one on a specific task, while being far cheaper and faster to run. Techniques to shrink models without losing much quality include:
- Quantization: store parameters at lower precision (e.g. 8-bit or 4-bit instead of 32-bit).
- Pruning: remove unimportant weights.
- Distillation: train a small "student" model to imitate a large "teacher."
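Two of these techniques fit in a few lines. The sketch below shows symmetric per-tensor int8 quantization and magnitude pruning (zeroing the smallest-magnitude weights), which are common simple variants rather than the only ones; production libraries often quantize per channel or per group. Distillation needs a full training loop, so it is left out. The weight values and function names are hypothetical.

```python
def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric per-tensor int8 quantization: map [-max|w|, max|w|] onto [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q: list[int], scale: float) -> list[float]:
    """Recover approximate float weights from the int8 codes."""
    return [v * scale for v in q]

def magnitude_prune(weights: list[float], fraction: float) -> list[float]:
    """Zero out the given fraction of weights with the smallest absolute value."""
    k = int(len(weights) * fraction)
    smallest = sorted(range(len(weights)), key=lambda i: abs(weights[i]))[:k]
    pruned = list(weights)
    for i in smallest:
        pruned[i] = 0.0
    return pruned

weights = [0.82, -0.03, 0.41, -1.27, 0.004, 0.66]

# Quantization: each weight now needs 1 byte instead of 4, at the cost of a small rounding error.
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, f"max error {max_error:.4f}")

# Pruning: half the weights become exactly zero and can be skipped or stored sparsely.
print(magnitude_prune(weights, 0.5))
```

The rounding error per weight is at most half of `scale`, which is why quantization usually costs little quality when weights are not dominated by a few extreme values.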
Right-sizing the model to the task often matters more than raw size.
Summary
- Parameters are the learned numbers (weights and biases) inside a model, where its knowledge is stored.
- Model size is the total parameter count, written in K, M, B, or T.
- More parameters generally means more capability, following scaling laws and sometimes unlocking emergent abilities.
- But size costs memory, compute, energy, and data: at half precision, a 7B model needs ~14 GB just to load.
- Bigger isn't always better: quantization, pruning, and distillation can shrink models, and right-sizing to the task often wins.