In the previous article, we learned about PCA (Principal Component Analysis) and how it reduces dimensionality by preserving maximum variance.

However, PCA has a limitation:

Linear Relationships Only

Many real-world datasets contain:

Complex Non-Linear Patterns

that PCA may fail to capture.

To solve this problem, we use:

t-SNE

which is one of the most popular techniques for visualizing high-dimensional data.

What is t-SNE?

t-SNE (t-Distributed Stochastic Neighbor Embedding) is a nonlinear dimensionality reduction technique primarily used for visualizing high-dimensional data in two or three dimensions.

Its goal is:

Keep Similar Points Close

Keep Dissimilar Points Apart

while reducing dimensions.

Why Was t-SNE Created?

Imagine a dataset with:

100 Features

or

1000 Features

Humans cannot visualize such data.

t-SNE helps transform:

100 Dimensions

2 Dimensions

while preserving neighborhood relationships.

Real-Life Analogy

Imagine a group of friends.

People with similar interests naturally form groups.

Example:

Sports Fans

Movie Fans

Gamers

If we place everyone on a map:

Friends should stay close together.

Different groups should remain separated.

This is exactly what t-SNE tries to do.

The Core Idea

t-SNE focuses on:

Local Relationships

rather than global structure.

It tries to preserve:

Nearest Neighbors

from the original dataset.

Example

Suppose:

Student A

is most similar to:

Student B

and

Student C

In the lower-dimensional representation:

A, B, C

should remain close together.

Why PCA Sometimes Fails

Consider a dataset shaped like:

Spiral

or

Curved Manifold

PCA attempts to fit straight directions.

Important structure may be lost.

Example

Original data:

Curved Shape

PCA:

Flattened Representation

t-SNE:

Preserves Groups

more effectively.

Understanding Similarity

t-SNE begins by measuring similarity between points.

Example:

Point A
Point B

Very close:

High Similarity

Far apart:

Low Similarity

Step 1: High-Dimensional Similarities

In the original space:

t-SNE computes probabilities representing:

How Likely
Points Are Neighbors

Example

PairSimilarity
A-B0.80
A-C0.15
A-D0.05

A and B are highly similar.

Step 2: Low-Dimensional Mapping

Points are randomly placed in:

2D Space

or

3D Space

Step 3: Compare Similarities

t-SNE checks whether neighbor relationships are preserved.

If not:

Move Points

closer or farther apart.

Step 4: Repeat

The algorithm continuously adjusts point positions.

Goal:

Match Neighborhood Structure

between high-dimensional and low-dimensional spaces.

Why "t" in t-SNE?

The "t" refers to:

Student's t-Distribution

used in the low-dimensional space.

Why Use a t-Distribution?

Early versions of SNE suffered from:

Crowding Problem

Points became compressed together.

The t-distribution helps:

Spread Clusters Apart

making visualization clearer.

What Does t-SNE Output Look Like?

Suppose we have images of handwritten digits.

Original dimensions:

784 Features

After t-SNE:

2 Features

Visualization may show:

Cluster of 0s

Cluster of 1s

Cluster of 2s

and so on.

Example: Customer Segmentation

Features:

  • Age
  • Income
  • Spending Score
  • Purchase History

t-SNE can reveal:

Natural Customer Groups

visually.

Example: Face Recognition

Thousands of pixel features become:

2D Visualization

showing similar faces grouped together.

Example: Bioinformatics

Gene expression datasets often contain:

Thousands of Dimensions

t-SNE helps visualize biological clusters.

Important Parameter: Perplexity

Perplexity controls:

Neighborhood Size

Typical values:

5

30

50

Small Perplexity

Focuses on:

Very Local Structure

Large Perplexity

Captures:

Broader Relationships

Advantages of t-SNE

Excellent Visualization

One of its biggest strengths.

Captures Nonlinear Structure

Handles complex patterns.

Preserves Local Relationships

Keeps neighbors together.

Reveals Hidden Clusters

Useful for exploratory analysis.

Limitations of t-SNE

Computationally Expensive

Slow on very large datasets.

Primarily a Visualization Tool

Not usually used as a preprocessing step for predictive models.

Results Can Vary

Different runs may produce different layouts.

Difficult to Interpret Distances

Global distances are not always meaningful.

Important Warning

Many beginners assume:

Large Gap Between Clusters
=
Large Real Difference

This is not always true.

t-SNE prioritizes local neighborhoods, not exact global distances.

PCA vs t-SNE

FeaturePCAt-SNE
TypeLinearNonlinear
SpeedFasterSlower
VisualizationGoodExcellent
Local StructureModerateExcellent
Large DatasetsBetterMore Expensive
InterpretabilityHigherLower

Example

Dataset:

100 Features

PCA:

Captures Variance

t-SNE:

Captures Neighborhoods

Python Implementation

Import:

from sklearn.manifold import TSNE

Create Model:

tsne = TSNE(
n_components=2,
perplexity=30,
random_state=42
)

Transform Data:

X_tsne = tsne.fit_transform(X)

Visualize:

import matplotlib.pyplot as plt

plt.scatter(
X_tsne[:,0],
X_tsne[:,1]
)
plt.show()

Common Mistakes

Using t-SNE for Feature Selection

t-SNE is mainly for visualization.

Interpreting Global Distances

Far-apart clusters may not represent true distances.

Using Default Parameters Blindly

Perplexity significantly affects results.

Applying to Massive Datasets

Training can become slow.

Best Practices

  • Standardize data before t-SNE
  • Use PCA first for very high-dimensional data
  • Experiment with perplexity values
  • Focus on cluster structure rather than exact distances
  • Use t-SNE mainly for visualization

t-SNE Workflow

High-Dimensional Data

Compute Similarities

Map to 2D/3D

Preserve Neighbors

Visualization

t-SNE Summary

ConceptMeaning
t-SNENonlinear Dimensionality Reduction
GoalPreserve Neighborhoods
Output2D or 3D Visualization
PerplexityNeighborhood Size
StrengthCluster Visualization

Why t-SNE is Important

t-SNE revolutionized the visualization of high-dimensional data by allowing complex structures to be displayed in two or three dimensions while preserving local relationships. It is widely used in machine learning, bioinformatics, image analysis, and exploratory data analysis to uncover hidden patterns and clusters.

Unlike PCA, which focuses on preserving variance, t-SNE focuses on preserving neighborhood relationships, making it one of the most effective tools for understanding complex datasets visually.