In the previous article, we learned that K-Nearest Neighbors (KNN) makes predictions by looking at nearby data points.

This naturally raises an important question:

How does KNN determine which points are nearest?

Consider the following points:

Point A = (2,3)

Point B = (4,5)

Point C = (20,25)

Intuitively, Point B appears much closer to Point A than Point C does.

But computers cannot rely on intuition.

They need a mathematical way to measure closeness.

This measurement is called a Distance Metric.
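As a quick check of that intuition, the sketch below measures how far B and C are from A. It assumes straight-line (Euclidean) distance as the measure, which is only one of several possible choices, and uses Python's standard `math.dist` for it.

```python
import math

# Points from the example above
A = (2, 3)
B = (4, 5)
C = (20, 25)

# math.dist returns the straight-line (Euclidean) distance between two points
dist_ab = math.dist(A, B)
dist_ac = math.dist(A, C)

print(f"A to B: {dist_ab:.2f}")
print(f"A to C: {dist_ac:.2f}")
print("B is closer to A than C is:", dist_ab < dist_ac)
```

The comparison on the last line is the part KNN cares about: it only needs to know which point is nearer, not why.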

Distance Metrics are among the most important concepts in Machine Learning because they are used in:

  • K-Nearest Neighbors (KNN)
  • K-Means Clustering
  • Recommendation Systems
  • Image Recognition
  • Anomaly Detection
  • Information Retrieval
  • Natural Language Processing

In this article, we will explore distance metrics from first principles, understand the most commonly used distance formulas, and learn when to use each one.

What is a Distance Metric?

A distance metric is a mathematical method used to measure how far apart two data points are.

The basic intuition is:

Small Distance → More Similar

Large Distance → Less Similar

KNN assumes:

Nearby points → Likely belong to the same class

Therefore, measuring distance accurately is crucial.

Why Distance Matters in KNN

Suppose we want to classify a new student.

Dataset:

Study Hours | Result
2 | Fail
3 | Fail
8 | Pass
9 | Pass

New Student:

Study Hours = 4

Which examples should influence the prediction?

Clearly:

2 and 3

are closer to 4 than:

8 and 9

Distance metrics help KNN determine this mathematically.
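A minimal sketch of that idea for the student example: rank the training examples by how far their study hours are from the new student's. The variable names are illustrative, not part of any library.

```python
# Toy dataset from the example above: (study hours, result)
students = [(2, "Fail"), (3, "Fail"), (8, "Pass"), (9, "Pass")]
new_student_hours = 4

# Rank training examples by absolute difference in study hours
ranked = sorted(students, key=lambda s: abs(s[0] - new_student_hours))

for hours, result in ranked:
    print(hours, result, "distance =", abs(hours - new_student_hours))

# The two closest examples are the ones that would drive a K = 2 prediction
nearest_two = ranked[:2]
print("Nearest two:", nearest_two)
```

Both of the nearest examples are "Fail", which matches the intuition that 2 and 3 should influence the prediction more than 8 and 9.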

Distance in One Dimension

Consider:

Point A = 5

Point B = 8

Distance:

|8-5| = 3

Simple subtraction works, as long as we take the absolute value so the result is never negative.

However, real-world data usually contains multiple features.

Distance in Two Dimensions

Suppose:

A = (2,3)

B = (5,7)

How do we measure distance now?

The most common answer is:

Euclidean Distance

Euclidean Distance

Euclidean Distance is the straight-line distance between two points.

It is the distance we normally think about in geometry.

Formula:

d=\sqrt{(x_2-x_1)^2+(y_2-y_1)^2}

Visualizing Euclidean Distance

(2,7)
*
|\
| \
| \
| \
| \
*-----*
(2,3) (5,3)

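The diagonal line represents the Euclidean Distance between (2,7) and (5,3). The vertical and horizontal sides have lengths 4 and 3, so by the Pythagorean theorem the diagonal has length 5.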

Example Calculation

Points:

A=(2,3), B=(5,7)

Distance:

d=\sqrt{(5-2)^2+(7-3)^2}=\sqrt{9+16}=5

Euclidean Distance for Multiple Features

For n dimensions:

d=\sqrt{\sum_{i=1}^{n}(x_i-y_i)^2}

This is the most commonly used distance metric in KNN.
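The n-dimensional formula translates almost directly into code. The sketch below is one possible plain-Python version; the function name `euclidean_distance` is an illustrative choice, not a library API.

```python
import math

def euclidean_distance(x, y):
    """Straight-line distance between two equal-length sequences of numbers."""
    if len(x) != len(y):
        raise ValueError("points must have the same number of features")
    # Sum the squared differences feature by feature, then take the square root
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

print(euclidean_distance((2, 3), (5, 7)))              # 5.0
print(euclidean_distance((1, 2, 3, 4), (1, 2, 3, 4)))  # 0.0
```

The same function works for any number of features, because it never assumes the points are two-dimensional.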

Properties of Euclidean Distance

  • Easy to understand
  • Works well for continuous numerical data
  • Most common choice in KNN
  • Sensitive to feature scaling

Why Feature Scaling Matters

Suppose:

Feature | Value
Age | 25
Salary | 500000

Salary dominates distance calculations because its values are much larger.

Therefore:

Feature scaling is often necessary before applying KNN.
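To make the effect concrete, here is a small sketch with made-up people described by age and salary. The data, the min-max scaling helper, and all names are assumptions chosen for illustration. For brevity the query is scaled together with the other rows, whereas in practice the scaling is usually fitted on training data only.

```python
import math

# Hypothetical people as (age, salary); values are illustrative only
query = (25, 500_000)
people = {
    "a": (60, 501_000),  # very different age, almost the same salary
    "b": (26, 520_000),  # almost the same age, somewhat different salary
    "c": (40, 900_000),  # widens the salary range
}

def min_max_scale(rows):
    """Rescale every column of `rows` to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    spans = [(max(col) - min(col)) or 1 for col in columns]  # avoid division by zero
    return [tuple((v - lo) / span for v, lo, span in zip(row, lows, spans))
            for row in rows]

def nearest(query_point, candidates):
    """Name of the candidate with the smallest Euclidean distance to query_point."""
    return min(candidates, key=lambda name: math.dist(query_point, candidates[name]))

raw_nearest = nearest(query, people)

scaled_rows = min_max_scale([query, *people.values()])
scaled_query = scaled_rows[0]
scaled_people = dict(zip(people, scaled_rows[1:]))
scaled_nearest = nearest(scaled_query, scaled_people)

print("Nearest on raw values:   ", raw_nearest)
print("Nearest on scaled values:", scaled_nearest)
```

On raw values the 35-year age gap of person "a" barely registers next to salary differences in the thousands, so "a" looks nearest. After scaling, both features contribute on a comparable footing and "b" becomes the nearest neighbor.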

Manhattan Distance

Imagine traveling through city streets arranged in a grid.

You cannot move diagonally.

You must travel horizontally and vertically.

This leads to:

Manhattan Distance

Formula:

d=\sum_{i=1}^{n}|x_i-y_i|

Why the Name Manhattan?

The streets of Manhattan form a grid-like structure.

Travel occurs along blocks rather than diagonally.

Visualizing Manhattan Distance

Start *
|
|
|
*----*
End

Movement occurs along edges.

Example Calculation

Points:

A=(2,3), B=(5,7)

Distance:

d=|5-2|+|7-3|=3+4=7
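The same calculation can be sketched in a few lines of plain Python. The function name `manhattan_distance` is an illustrative choice rather than a library function.

```python
def manhattan_distance(x, y):
    """Sum of absolute coordinate differences between two equal-length points."""
    if len(x) != len(y):
        raise ValueError("points must have the same number of features")
    return sum(abs(xi - yi) for xi, yi in zip(x, y))

print(manhattan_distance((2, 3), (5, 7)))  # 7
```

Because no squaring or square root is involved, the result for integer inputs stays an integer.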

Euclidean vs Manhattan

For the same points:

Metric | Distance
Euclidean | 5
Manhattan | 7

Different metrics produce different results.

When Manhattan Distance Works Well

Useful when:

  • Features represent independent dimensions
  • Grid-based movement exists
  • Outlier influence should be reduced (differences are not squared, so one large gap counts for less than it does in Euclidean Distance)

Minkowski Distance

Euclidean and Manhattan Distance are actually special cases of a more general metric called:

Minkowski Distance

Formula:

d=\left(\sum_{i=1}^{n}|x_i-y_i|^p\right)^{1/p}

Where:

p

controls the distance type.

Special Cases

If:

p=1

Manhattan Distance.

If:

p=2

Euclidean Distance.

Thus:

Minkowski generalizes both metrics.
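A short sketch can confirm the two special cases numerically. The function name `minkowski_distance` is illustrative, and the check that p is at least 1 is an added assumption, reflecting the usual requirement for the formula to behave as a true distance metric.

```python
def minkowski_distance(x, y, p):
    """Minkowski distance of order p between two equal-length points."""
    if p < 1:
        raise ValueError("p must be at least 1")
    if len(x) != len(y):
        raise ValueError("points must have the same number of features")
    return sum(abs(xi - yi) ** p for xi, yi in zip(x, y)) ** (1 / p)

A, B = (2, 3), (5, 7)

print("p=1:", minkowski_distance(A, B, p=1))  # 7.0, same as Manhattan
print("p=2:", minkowski_distance(A, B, p=2))  # 5.0, same as Euclidean
print("p=3:", round(minkowski_distance(A, B, p=3), 3))
```

As p grows, the largest single coordinate difference increasingly dominates the result.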

Hamming Distance

Used for categorical or binary data.

Measures:

Number of positions where values differ.

Example

Strings:

101110

100100

Differences:

Position 3
Position 5

Hamming Distance:

2
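Counting differing positions is simple to sketch in plain Python. The function name `hamming_distance` is illustrative, and the version below returns a raw count, which is the definition used in this section.

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(ca != cb for ca, cb in zip(a, b))

print(hamming_distance("101110", "100100"))  # 2
```

The same function works on lists of category labels as well as on strings, since it only compares items for equality.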

Applications

  • DNA Analysis
  • Binary Data
  • Error Detection
  • Text Processing

Cosine Similarity

Sometimes direction matters more than distance.

Example:

Text Documents.

Two documents may have very different lengths but discuss the same topic.

In such cases, Cosine Similarity is preferred.

Formula:

\cos(\theta)=\frac{A\cdot B}{||A||\,||B||}

Intuition

Measures:

Angle Between Vectors

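rather than physical distance. Because it is a similarity rather than a distance, higher values mean more alike; a common way to turn it into a distance is 1 − cosine similarity.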

Interpretation

Cosine Similarity | Meaning
1 | Identical Direction
0 | Unrelated
-1 | Opposite Direction
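The three reference values in the table can be reproduced with a small plain-Python sketch. The function below is an illustrative implementation of the formula, not a library call, and the word-count vectors are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(ai * bi for ai, bi in zip(a, b))
    norm_a = math.sqrt(sum(ai * ai for ai in a))
    norm_b = math.sqrt(sum(bi * bi for bi in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Hypothetical word counts: the long document repeats the short one three times
short_doc = [1, 2, 0]
long_doc = [3, 6, 0]

print(cosine_similarity(short_doc, long_doc))  # very close to 1: same direction
print(cosine_similarity([1, 0], [0, 1]))       # 0.0: perpendicular vectors
print(cosine_similarity([1, 0], [-1, 0]))      # -1.0: opposite direction
```

The first pair shows why this measure suits text: the two documents differ a lot in length, yet their similarity is essentially 1 because the proportions of the words are the same.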

Applications

  • Search Engines
  • NLP
  • Recommendation Systems
  • Text Similarity

Choosing the Right Distance Metric

There is no universally best metric.

Choice depends on:

  • Data type
  • Problem domain
  • Feature characteristics

Numerical Data

Preferred:

Euclidean Distance

Grid-Based Problems

Preferred:

Manhattan Distance

Binary Data

Preferred:

Hamming Distance

Text Data

Preferred:

Cosine Similarity

Example: KNN Classification

Suppose:

Training Points: A, A, B, B

New Point: ?

KNN:

  1. Calculates the distance from the new point to every training point.
  2. Sorts the training points by distance.
  3. Selects the K nearest neighbors.
  4. Performs a majority vote among them.

The distance metric directly determines which neighbors are selected.
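The four steps can be sketched as a small function. This is a minimal illustration, not a production implementation: the function name `knn_predict`, the coordinates of the training points, and the choice of `math.dist` as the default metric are all assumptions.

```python
import math
from collections import Counter

def knn_predict(training, query, k, distance=math.dist):
    """Predict a label for `query` from (point, label) pairs in `training`."""
    # 1. Calculate the distance from the query to every training point
    distances = [(distance(point, query), label) for point, label in training]
    # 2. Sort by distance, smallest first
    distances.sort(key=lambda pair: pair[0])
    # 3. Select the k nearest neighbors
    nearest = distances[:k]
    # 4. Majority vote among their labels
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical coordinates for two "A" points and two "B" points
training = [((1, 1), "A"), ((2, 1), "A"), ((8, 9), "B"), ((9, 8), "B")]

print(knn_predict(training, query=(2, 2), k=3))  # A
```

Passing a different function as `distance` changes step 1 only, but because every later step depends on those distances, it can change the final vote.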

Effect of Different Metrics

The same dataset can produce different predictions depending on the chosen metric.

Example:

Euclidean → Class A

Manhattan → Class B

This is why metric selection is important.
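Here is one small constructed case where this happens. The two training points and the query are hypothetical, chosen so that the nearest neighbor (K = 1) differs between the two metrics.

```python
import math

def manhattan(x, y):
    """Sum of absolute coordinate differences."""
    return sum(abs(xi - yi) for xi, yi in zip(x, y))

# Hypothetical points chosen to make the two metrics disagree
query = (0, 0)
training = [((3, 3), "A"), ((0, 5), "B")]

def nearest_label(distance):
    """Label of the single training point closest to the query."""
    point, label = min(training, key=lambda item: distance(item[0], query))
    return label

print("Euclidean ->", nearest_label(math.dist))  # A
print("Manhattan ->", nearest_label(manhattan))  # B
```

The point (3, 3) is about 4.24 away in a straight line but 6 blocks away on a grid, while (0, 5) is 5 away under both metrics, so the ranking flips.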

Python Example: Euclidean Distance

from scipy.spatial.distance import euclidean

# Straight-line distance between A = (2, 3) and B = (5, 7)
distance = euclidean([2, 3], [5, 7])

print(distance)  # 5.0

Python Example: Manhattan Distance

from scipy.spatial.distance import cityblock

# Manhattan (city block) distance between A = (2, 3) and B = (5, 7): |5-2| + |7-3|
distance = cityblock([2, 3], [5, 7])

print(distance)

Python Example: Hamming Distance

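from scipy.spatial.distance import hamming

# SciPy's hamming returns the fraction of positions that differ, not the count
a = [1, 0, 1, 1]
b = [1, 0, 0, 1]
distance = hamming(a, b)

print(distance)           # 0.25 (1 of 4 positions differs)
print(distance * len(a))  # 1.0, the number of differing positions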

Python Example: Cosine Similarity

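from sklearn.metrics.pairwise import cosine_similarity

# Inputs must be 2-D (one row per vector); the result is a 1x1 array
print(cosine_similarity([[2, 3]], [[5, 7]]))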

Real-World Applications

Recommendation Systems

Finding users with similar preferences.

Face Recognition

Comparing facial feature vectors.

Search Engines

Finding relevant documents.

Fraud Detection

Identifying transactions similar to known frauds.

Medical Diagnosis

Finding patients with similar characteristics.

Common Mistakes

Ignoring Feature Scaling

Different feature scales distort distance calculations.

Using Euclidean Distance Everywhere

Other metrics may be more suitable.

Mixing Numerical and Categorical Data

Distance calculations become unreliable.

Best Practices

  • Scale numerical features
  • Understand data characteristics
  • Experiment with multiple metrics
  • Use cross-validation
  • Consider domain knowledge
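The advice to experiment with multiple metrics and use cross-validation can be combined in one loop. The sketch below uses leave-one-out cross-validation, one simple form of cross-validation, on a made-up and already-scaled toy dataset. All names and data are assumptions for illustration, and with real data a library routine would usually be preferable.

```python
import math
from collections import Counter

def manhattan(x, y):
    """Sum of absolute coordinate differences."""
    return sum(abs(xi - yi) for xi, yi in zip(x, y))

def knn_predict(training, query, k, distance):
    """Majority label among the k training points nearest to the query."""
    nearest = sorted(training, key=lambda item: distance(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def leave_one_out_accuracy(data, k, distance):
    """Predict each point from all the others; return the fraction correct."""
    correct = 0
    for i, (point, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]  # hold out the i-th example
        correct += knn_predict(rest, point, k, distance) == label
    return correct / len(data)

# Hypothetical, already-scaled toy data: (features, label)
data = [
    ((0.1, 0.2), "A"), ((0.2, 0.1), "A"), ((0.3, 0.3), "A"),
    ((0.8, 0.9), "B"), ((0.9, 0.8), "B"), ((0.7, 0.7), "B"),
]

metrics = {"euclidean": math.dist, "manhattan": manhattan}
scores = {name: leave_one_out_accuracy(data, k=3, distance=metric)
          for name, metric in metrics.items()}

for name, score in scores.items():
    print(f"{name}: {score:.2f}")
```

On a dataset this clean the metrics are unlikely to differ; the point of the loop is the procedure, which stays the same when the data is larger and messier.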

Distance Metrics Summary

Metric | Best For
Euclidean | Continuous Numerical Data
Manhattan | Grid-Based Problems
Minkowski | Generalized Distance
Hamming | Binary/Categorical Data
Cosine Similarity | Text and Vector Similarity

Why Distance Metrics are Important

Distance metrics form the foundation of similarity-based Machine Learning algorithms. KNN, clustering methods, recommendation systems, and many modern AI applications depend heavily on how similarity is measured.

Choosing the right distance metric can dramatically affect model performance because it determines which observations are considered neighbors. Understanding these metrics helps practitioners select appropriate algorithms, preprocess data correctly, and build more reliable Machine Learning systems.

In the next article, we will study Choosing K in K-Nearest Neighbors, one of the most important decisions in KNN that directly affects model accuracy, overfitting, and underfitting.