In the previous article, we learned about PCA (Principal Component Analysis) and how it reduces dimensionality by preserving maximum variance.
However, PCA has a limitation:
Linear Relationships Only
Many real-world datasets contain:
Complex Non-Linear Patterns
that PCA may fail to capture.
To solve this problem, we use:
t-SNE
which is one of the most popular techniques for visualizing high-dimensional data.
What is t-SNE?
t-SNE (t-Distributed Stochastic Neighbor Embedding) is a nonlinear dimensionality reduction technique primarily used for visualizing high-dimensional data in two or three dimensions.
Its goal is:
Keep Similar Points Close
Keep Dissimilar Points Apart
while reducing dimensions.
Why Was t-SNE Created?
Imagine a dataset with:
100 Features
or
1000 Features
Humans cannot visualize such data.
t-SNE helps transform:
100 Dimensions
↓
2 Dimensions
while preserving neighborhood relationships.
Real-Life Analogy
Imagine a group of friends.
People with similar interests naturally form groups.
Example:
Sports Fans
Movie Fans
Gamers
If we place everyone on a map:
Friends should stay close together.
Different groups should remain separated.
This is exactly what t-SNE tries to do.
The Core Idea
t-SNE focuses on:
Local Relationships
rather than global structure.
It tries to preserve:
Nearest Neighbors
from the original dataset.
Example
Suppose:
Student A
is most similar to:
Student B
and
Student C
In the lower-dimensional representation:
A, B, C
should remain close together.
Why PCA Sometimes Fails
Consider a dataset shaped like:
Spiral
or
Curved Manifold
PCA attempts to fit straight directions.
Important structure may be lost.
Example
Original data:
Curved Shape
PCA:
Flattened Representation
t-SNE:
Preserves Groups
more effectively.
Understanding Similarity
t-SNE begins by measuring similarity between points.
Example:
Point A
Point B
Very close:
High Similarity
Far apart:
Low Similarity
Step 1: High-Dimensional Similarities
In the original space:
t-SNE computes probabilities representing:
How Likely
Points Are Neighbors
Example
| Pair | Similarity |
|---|---|
| A-B | 0.80 |
| A-C | 0.15 |
| A-D | 0.05 |
A and B are highly similar.
Step 2: Low-Dimensional Mapping
Points are randomly placed in:
2D Space
or
3D Space
Step 3: Compare Similarities
t-SNE checks whether neighbor relationships are preserved.
If not:
Move Points
closer or farther apart.
Step 4: Repeat
The algorithm continuously adjusts point positions.
Goal:
Match Neighborhood Structure
between high-dimensional and low-dimensional spaces.
Why "t" in t-SNE?
The "t" refers to:
Student's t-Distribution
used in the low-dimensional space.
Why Use a t-Distribution?
Early versions of SNE suffered from:
Crowding Problem
Points became compressed together.
The t-distribution helps:
Spread Clusters Apart
making visualization clearer.
What Does t-SNE Output Look Like?
Suppose we have images of handwritten digits.
Original dimensions:
784 Features
After t-SNE:
2 Features
Visualization may show:
Cluster of 0s
Cluster of 1s
Cluster of 2s
and so on.
Example: Customer Segmentation
Features:
- Age
- Income
- Spending Score
- Purchase History
t-SNE can reveal:
Natural Customer Groups
visually.
Example: Face Recognition
Thousands of pixel features become:
2D Visualization
showing similar faces grouped together.
Example: Bioinformatics
Gene expression datasets often contain:
Thousands of Dimensions
t-SNE helps visualize biological clusters.
Important Parameter: Perplexity
Perplexity controls:
Neighborhood Size
Typical values:
5
30
50
Small Perplexity
Focuses on:
Very Local Structure
Large Perplexity
Captures:
Broader Relationships
Advantages of t-SNE
Excellent Visualization
One of its biggest strengths.
Captures Nonlinear Structure
Handles complex patterns.
Preserves Local Relationships
Keeps neighbors together.
Reveals Hidden Clusters
Useful for exploratory analysis.
Limitations of t-SNE
Computationally Expensive
Slow on very large datasets.
Primarily a Visualization Tool
Not usually used as a preprocessing step for predictive models.
Results Can Vary
Different runs may produce different layouts.
Difficult to Interpret Distances
Global distances are not always meaningful.
Important Warning
Many beginners assume:
Large Gap Between Clusters
=
Large Real Difference
This is not always true.
t-SNE prioritizes local neighborhoods, not exact global distances.
PCA vs t-SNE
| Feature | PCA | t-SNE |
|---|---|---|
| Type | Linear | Nonlinear |
| Speed | Faster | Slower |
| Visualization | Good | Excellent |
| Local Structure | Moderate | Excellent |
| Large Datasets | Better | More Expensive |
| Interpretability | Higher | Lower |
Example
Dataset:
100 Features
PCA:
Captures Variance
t-SNE:
Captures Neighborhoods
Python Implementation
Import:
from sklearn.manifold import TSNE
Create Model:
tsne = TSNE(
n_components=2,
perplexity=30,
random_state=42
)
Transform Data:
X_tsne = tsne.fit_transform(X)
Visualize:
import matplotlib.pyplot as plt
plt.scatter(
X_tsne[:,0],
X_tsne[:,1]
)
plt.show()
Common Mistakes
Using t-SNE for Feature Selection
t-SNE is mainly for visualization.
Interpreting Global Distances
Far-apart clusters may not represent true distances.
Using Default Parameters Blindly
Perplexity significantly affects results.
Applying to Massive Datasets
Training can become slow.
Best Practices
- Standardize data before t-SNE
- Use PCA first for very high-dimensional data
- Experiment with perplexity values
- Focus on cluster structure rather than exact distances
- Use t-SNE mainly for visualization
t-SNE Workflow
High-Dimensional Data
↓
Compute Similarities
↓
Map to 2D/3D
↓
Preserve Neighbors
↓
Visualization
t-SNE Summary
| Concept | Meaning |
|---|---|
| t-SNE | Nonlinear Dimensionality Reduction |
| Goal | Preserve Neighborhoods |
| Output | 2D or 3D Visualization |
| Perplexity | Neighborhood Size |
| Strength | Cluster Visualization |
Why t-SNE is Important
t-SNE revolutionized the visualization of high-dimensional data by allowing complex structures to be displayed in two or three dimensions while preserving local relationships. It is widely used in machine learning, bioinformatics, image analysis, and exploratory data analysis to uncover hidden patterns and clusters.
Unlike PCA, which focuses on preserving variance, t-SNE focuses on preserving neighborhood relationships, making it one of the most effective tools for understanding complex datasets visually.