Parameters & Model Size

When people say a model has "7 billion parameters," what does that actually mean? Parameters are the internal values a model learns during training, and model size is usually measured by how many of them there are. Parameter count is the most common shorthand for how big, and roughly how capable, a Generative AI model is.

💡 In one line: Parameters are the learned numbers inside a model, and model size is simply how many of them there are.

What Are Parameters?

A neural network is full of small numbers that get adjusted as it learns. These numbers are the parameters, and they come in two kinds:

  • Weights: how strongly each connection between neurons matters.
  • Biases: a per-neuron offset that shifts the output.

Remember the neuron formula: output = f(w₁x₁ + w₂x₂ + … + b). Every w and b in that expression is a parameter. They start as random values, and training adjusts them until the model produces good outputs. In effect, a model's parameters are where its "knowledge" is stored: everything it has learned lives in these numbers.
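
The neuron formula above can be sketched in a few lines of Python. This is only an illustration: the sigmoid activation, the function names, and the weight and input values are assumptions chosen for the example, not values from any real model.

```python
import math


def sigmoid(z: float) -> float:
    """One common choice for the activation function f (an assumption here)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron_output(weights: list[float], bias: float, inputs: list[float]) -> float:
    """Compute f(w1*x1 + w2*x2 + ... + b) for a single neuron."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(weighted_sum)


# Illustrative values only; in a real model these are learned during training.
weights = [0.5, -0.25]
bias = 0.0
inputs = [1.0, 2.0]

# Weighted sum: 0.5*1.0 + (-0.25)*2.0 + 0.0 = 0.0, and sigmoid(0.0) = 0.5.
output = neuron_output(weights, bias, inputs)

# This neuron has two weights and one bias, so three parameters in total.
parameter_count = len(weights) + 1
print(output, parameter_count)
```

Training would nudge the values in `weights` and `bias`; the inputs are data, not parameters.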

More parameters means more capacity to capture complex patterns, which is why bigger models can often do more.

What is Model Size?

Model size is the total number of parameters a model has. The counts get large fast, so they're written with units:

  • K = thousand, M = million, B = billion, T = trillion.

So a "7B model" has 7 billion parameters. Modern Generative AI models range from a few million parameters up to hundreds of billions or more.
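
To make the counting concrete, here is a minimal sketch that totals the weights and biases of a small fully connected network and writes the result with the units above. The layer sizes and both function names are hypothetical, picked only for illustration; real Generative AI models use other layer types whose counts are computed differently.

```python
def count_dense_parameters(layer_sizes: list[int]) -> int:
    """Count weights and biases in a fully connected network.

    A layer mapping n_in inputs to n_out neurons has n_in * n_out weights
    and n_out biases.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out
    return total


def format_parameter_count(count: int) -> str:
    """Write a parameter count using the K / M / B / T units."""
    units = ((10**12, "T"), (10**9, "B"), (10**6, "M"), (10**3, "K"))
    for threshold, unit in units:
        if count >= threshold:
            return f"{count / threshold:.1f}{unit}"
    return str(count)


# Hypothetical network: 784 inputs, one hidden layer of 128 neurons, 10 outputs.
tiny_network = [784, 128, 10]
tiny_total = count_dense_parameters(tiny_network)

print(tiny_total, format_parameter_count(tiny_total))
print(format_parameter_count(7_000_000_000))
```

Even this toy network has about a hundred thousand parameters, which gives a sense of how quickly the counts grow.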

Why Model Size Matters

  • Capacity: more parameters let a model represent more complex patterns and store more knowledge.
  • Scaling laws: research has shown that model performance improves in a fairly predictable way as you increase parameters, training data, and compute together.
  • Emergent abilities: very large models can suddenly display skills (like multi-step reasoning) that smaller models simply don't have.

This is a big reason the field raced toward ever-larger models.
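
The "fairly predictable" improvement described under scaling laws is often modeled as a power law in model size. The sketch below shows only the shape of such a curve. Every constant in it (`scale`, `exponent`, `floor`) is made up for illustration and is not taken from any published result, and it varies parameter count alone, whereas real scaling laws also depend on data and compute.

```python
def illustrative_loss(
    num_parameters: float,
    scale: float = 10.0,    # hypothetical constant
    exponent: float = 0.1,  # hypothetical constant
    floor: float = 1.5,     # hypothetical loss that no model size removes
) -> float:
    """Hypothetical power-law curve: loss = floor + scale * N ** (-exponent)."""
    return floor + scale * num_parameters ** (-exponent)


model_sizes = [1e8, 1e9, 1e10, 1e11]
losses = [illustrative_loss(n) for n in model_sizes]

for n, loss in zip(model_sizes, losses):
    print(f"{n:.0e} parameters -> illustrative loss {loss:.3f}")

# With a power law, each 10x increase in size shrinks the part of the loss
# above the floor by the same factor (here 10 ** -0.1, about 0.79).
reducible = [loss - 1.5 for loss in losses]
ratios = [later / earlier for earlier, later in zip(reducible, reducible[1:])]
```

The constant ratio is what makes the trend predictable: under these assumptions, performance at a larger size can be extrapolated from smaller runs.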

The Cost of Size

Bigger models are more capable, but size comes at a steep price:

  • Memory: every parameter must be stored. At half precision (2 bytes each), a 7-billion-parameter model needs roughly 14 GB of memory just to load.
  • Compute: more parameters mean more calculations, so both training and running the model are slower and more expensive.
  • Energy: large models consume significant power to train and serve.
  • Data: bigger models need correspondingly huge datasets to train well.

Aspect        Smaller Models               Larger Models
Capability    Good for focused tasks       Broad, more powerful
Speed         Fast                         Slower
Cost          Cheap to run                 Expensive
Hardware      Can run on a phone/laptop    Needs powerful GPUs
Data needed   Less                         Much more
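
The 14 GB figure in the memory bullet is just parameters multiplied by bytes per parameter. A minimal sketch of that arithmetic follows, assuming 1 GB = 10^9 bytes and counting the weights only; the function name and the precision labels are illustrative, and real memory use is higher once activations and other runtime buffers are included.

```python
# Bytes needed to store one parameter at common precisions.
BYTES_PER_PARAMETER = {
    "fp32": 4.0,  # 32-bit floating point
    "fp16": 2.0,  # half precision
    "int8": 1.0,  # 8-bit
    "int4": 0.5,  # 4-bit
}


def weight_memory_gb(num_parameters: int, precision: str) -> float:
    """Estimate memory for the weights alone, in GB (1 GB = 10**9 bytes)."""
    return num_parameters * BYTES_PER_PARAMETER[precision] / 1e9


seven_b = 7_000_000_000
for precision in BYTES_PER_PARAMETER:
    print(f"{precision}: {weight_memory_gb(seven_b, precision):.1f} GB")
```

The same 7B model therefore ranges from about 28 GB at 32-bit down to about 3.5 GB at 4-bit, which is why precision matters so much for what hardware can run it.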

Don't Confuse These "Sizes"

Parameter count is just one measure. Don't mix it up with:

  • Training data size: how much data the model learned from (measured in tokens), which is not the same as the parameter count.
  • Context window: how much text the model can read at once.
  • File size: depends on the precision used to store each parameter (e.g. 32-bit vs 8-bit), so the same model can take up different amounts of disk space.

Rough Model Size Tiers

Tier       Approx. parameters   Typical use
Small      under ~100M          On-device, mobile, simple tasks
Medium     ~100M – 1B           Specialised or fine-tuned tasks
Large      ~1B – 100B           General-purpose LLMs and chat
Frontier   ~100B – 1T+          Most capable, research-grade, costly

(Approximate ranges; the boundaries shift as the field advances.)
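
As a quick illustration, the tiers above can be turned into a small lookup. The function name is hypothetical, and treating each lower bound as inclusive is an assumption, since the table gives only approximate ranges.

```python
def size_tier(num_parameters: int) -> str:
    """Map a parameter count to one of the rough tiers in the table.

    The cut-offs mirror the approximate ranges above; which side of a
    boundary an exact value falls on is an arbitrary choice made here.
    """
    if num_parameters < 100_000_000:        # under ~100M
        return "Small"
    if num_parameters < 1_000_000_000:      # ~100M to 1B
        return "Medium"
    if num_parameters < 100_000_000_000:    # ~1B to 100B
        return "Large"
    return "Frontier"                       # ~100B and above


for count in (20_000_000, 350_000_000, 7_000_000_000, 500_000_000_000):
    print(f"{count:,} parameters -> {size_tier(count)}")
```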

Bigger Isn't Always Better

A larger model isn't automatically the right choice. A smaller model that is well trained on good data can match or beat a larger one on a specific task, while being far cheaper and faster. Techniques to shrink models without losing much quality include:

  • Quantization: store parameters at lower precision (e.g. 8-bit or 4-bit instead of 32-bit).
  • Pruning: remove unimportant weights.
  • Distillation: train a small "student" model to imitate a large "teacher."
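
The first two techniques are simple enough to sketch on a toy list of weights. What follows is one basic variant of each (symmetric 8-bit quantization with a single scale factor, and magnitude pruning with a fixed threshold), not how any particular library implements them. The weight values, the threshold, and the function names are made up for illustration. Distillation involves a full training loop and is not shown.

```python
def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric 8-bit quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [round(w / scale) for w in weights]
    return quantized, scale


def dequantize(quantized: list[int], scale: float) -> list[float]:
    """Recover approximate float weights from the stored integers."""
    return [q * scale for q in quantized]


def prune_by_magnitude(weights: list[float], threshold: float) -> list[float]:
    """Magnitude pruning: zero out weights whose absolute value is small."""
    return [w if abs(w) >= threshold else 0.0 for w in weights]


# Illustrative weights only.
weights = [0.82, -0.41, 0.03, -1.27, 0.004]

quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)
# Rounding to the nearest integer bounds the error by half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))

pruned = prune_by_magnitude(weights, threshold=0.05)
print(quantized, max_error, pruned)
```

Each stored integer fits in one byte instead of four, at the cost of a small rounding error per weight; pruning instead keeps full precision but replaces the smallest weights with zeros.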

Right-sizing the model to the task often matters more than raw size.

Summary

  • Parameters are the learned numbers (weights and biases) inside a model, where its knowledge is stored.
  • Model size is the total parameter count, written in K, M, B, or T.
  • More parameters generally means more capability, following scaling laws and sometimes unlocking emergent abilities.
  • But size costs memory, compute, energy, and data: at half precision, a 7B model needs ~14 GB just to load.
  • Bigger isn't always better: quantization, pruning, and distillation can shrink models, and right-sizing to the task often wins.