Variational Autoencoders (VAE)
A Variational Autoencoder (VAE) is a generative version of the autoencoder. A standard autoencoder can only reconstruct data it has already seen, because it maps each input to a single fixed point in latent space. A VAE changes one crucial thing: it maps each input to a probability distribution instead of a single point. This makes the latent space smooth and continuous, so you can sample from it and generate brand-new, original data — which is exactly what makes a VAE a true generative model.
💡 In one line: A VAE is an autoencoder whose latent space is a probability distribution you can sample from — letting it generate new data, not just reconstruct.
From Autoencoder to VAE
In a normal autoencoder, the latent space can be irregular and full of gaps. If you pick a random point in it and decode it, you often get meaningless output, because the space wasn't organised for sampling.
A VAE fixes this by forcing the latent space to be smooth and well-structured. Instead of encoding an input as a single point, the encoder outputs the parameters of a distribution — a mean (μ) and a standard deviation (σ). Every input becomes a small "cloud" rather than a dot, and these clouds are packed together neatly, so any point you sample decodes into something sensible.
How a VAE Works
- Encoder — takes the input and outputs two vectors: a mean μ and a standard deviation σ describing a distribution in latent space.
- Sampling — a latent vector z is drawn from that distribution:
z = μ + σ·ε, where ε is random noise from a standard normal. This is called the reparameterization trick, and it's what keeps the network trainable (differentiable). - Decoder — reconstructs the output from the sampled z.
The VAE Loss Function
A VAE is trained with two loss terms working together:
Total Loss = Reconstruction Loss + KL Divergence- Reconstruction Loss — makes the output match the input (just like a normal autoencoder).
- KL Divergence — pulls each input's latent distribution toward a standard normal N(0, 1). This is the regulariser that keeps the latent space smooth and continuous.
The balance between these two is what lets a VAE both reconstruct well and generate cleanly.
Generating New Data
Once trained, generating new data is simple: throw away the encoder, sample a random z from N(0, 1), and pass it through the decoder. Out comes a new, original sample — a digit, a face, an image.
Because the latent space is smooth, you can even interpolate: move gradually from one point to another and watch the decoded output morph smoothly (e.g. a "4" slowly becoming a "9").
A Simple Code Example
📌 Key detail: the encoder outputs
z_meanandz_log_var(a distribution), not a single code — that's the whole difference from a plain autoencoder.
VAE vs. Standard Autoencoder
| Aspect | Standard Autoencoder | VAE |
|---|---|---|
| Latent space | Single point per input | Probability distribution (μ, σ) |
| Structure | Irregular, gaps | Smooth, continuous |
| Can generate new data? | No (reconstructs only) | Yes |
| Loss | Reconstruction only | Reconstruction + KL divergence |
| Main use | Compression, denoising | Generation, sampling |
Pros and Cons of VAEs
| ✅ Pros (Advantages) | ⚠️ Cons (Challenges) |
|---|---|
| Can generate genuinely new data | Outputs are often blurry |
| Smooth, continuous latent space | Less sharp than GANs or diffusion models |
| Allows meaningful interpolation | Balancing the two loss terms is tricky |
| Stable and reliable to train | Limited fine detail in generations |
| Useful latent representations | — |
Applications of VAEs
| Domain | Use |
|---|---|
| Image generation | Creating new faces, digits, designs |
| Data augmentation | Generating synthetic training data |
| Anomaly detection | Flagging inputs that reconstruct poorly |
| Drug discovery | Generating new molecular structures |
| Latent exploration | Smoothly interpolating between data points |
Summary
- A VAE is a generative autoencoder that encodes each input as a probability distribution (μ, σ) rather than a single point.
- It samples a latent z using the reparameterization trick (
z = μ + σ·ε) and decodes it. - Its loss combines reconstruction loss with KL divergence, which keeps the latent space smooth.
- Because the latent space is continuous, you can sample and interpolate to generate new data.
- VAEs are stable and great for generation, though their outputs are often blurrier than GANs or diffusion models.