Distance Metrics in Machine Learning

Last updated: Jun 13, 2026

Author: Christy Harshitha Dakarapu

In the previous article, we learned that K-Nearest Neighbors (KNN) makes predictions by looking at nearby data points.

This naturally raises an important question:

How does KNN determine which points are nearest?

Consider the following points:


Point A = (2,3)

Point B = (4,5)

Point C = (20,25)

Intuitively, Point B appears much closer to Point A than Point C does.

But computers cannot rely on intuition.

They need a mathematical way to measure closeness.

This measurement is called a Distance Metric.

Distance Metrics are among the most important concepts in Machine Learning because they are used in:

K-Nearest Neighbors (KNN)
K-Means Clustering
Recommendation Systems
Image Recognition
Anomaly Detection
Information Retrieval
Natural Language Processing

In this article, we will explore distance metrics from first principles, understand the most commonly used distance formulas, and learn when to use each one.

What is a Distance Metric?

A distance metric is a mathematical method used to measure how far apart two data points are.

The basic intuition is:


Small Distance
      ↓
More Similar

Large Distance
      ↓
Less Similar

KNN assumes:


Nearby points
      ↓
Likely belong to the same class

Therefore, measuring distance accurately is crucial.

Why Distance Matters in KNN

Suppose we want to classify a new student.

Dataset:

Study Hours	Result
2	Fail
3	Fail
8	Pass
9	Pass

New Student:


Study Hours = 4

Which examples should influence the prediction?

Clearly:


2 and 3

are closer to 4 than:


8 and 9

Distance metrics help KNN determine this mathematically.
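As a minimal sketch of this idea, assuming a single feature (Study Hours) and using the absolute difference as the distance, we can rank the training examples by how close they are to the new student. The variable names here are illustrative only.

```python
# Training data from the table above: (study hours, result)
training = [(2, "Fail"), (3, "Fail"), (8, "Pass"), (9, "Pass")]
new_student = 4

# In one dimension, the distance is the absolute difference
distances = [(abs(hours - new_student), hours, result) for hours, result in training]
distances.sort()  # smallest distance first

for dist, hours, result in distances:
    print(f"hours={hours} distance={dist} result={result}")

# Label of the single closest example
nearest_result = distances[0][2]
print("Nearest neighbor says:", nearest_result)
```

The examples with 3 and 2 study hours come first in the ranking, so they are the ones that would influence the prediction.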

Distance in One Dimension

Consider:


Point A = 5

Point B = 8

Distance:

|8-5| = 3

Simple subtraction works.

However, real-world data usually contains multiple features.

Distance in Two Dimensions

Suppose:


A = (2,3)

B = (5,7)

How do we measure distance now?

The most common answer is:

Euclidean Distance

Euclidean Distance

Euclidean Distance is the straight-line distance between two points.

It is the distance we normally think about in geometry.

Formula:

$d=\sqrt{(x_2-x_1)^2+(y_2-y_1)^2}$

Visualizing Euclidean Distance


      (5,7)
        *
       /|
      / |
     /  |
    /   |
   /    |
  *-----*
(2,3) (5,3)

The diagonal line represents Euclidean Distance.

Example Calculation

Points:

A=(2,3)

B=(5,7)

Distance:

$d=\sqrt{(5-2)^2+(7-3)^2}=\sqrt{9+16}=\sqrt{25}=5$

Euclidean Distance for Multiple Features

For n dimensions:

$d=\sqrt{\sum_{i=1}^{n}(x_i-y_i)^2}$

This is the most commonly used distance metric in KNN.
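The n-dimensional formula translates almost directly into code. The following is a small from-scratch sketch using only the Python standard library; the function name euclidean_distance is our own choice, not a library API.

```python
import math

def euclidean_distance(x, y):
    """Straight-line distance between two equal-length numeric sequences."""
    if len(x) != len(y):
        raise ValueError("points must have the same number of features")
    # Sum the squared differences feature by feature, then take the square root
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

print(euclidean_distance([2, 3], [5, 7]))        # two features
print(euclidean_distance([1, 2, 3], [4, 6, 3]))  # three features
```

The same function works for any number of features, as long as both points have the same length.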

Properties of Euclidean Distance

Easy to understand
Works well for continuous numerical data
Most common choice in KNN
Sensitive to feature scaling

Why Feature Scaling Matters

Suppose:

Feature	Value
Age	25
Salary	500000

Salary dominates distance calculations because its values are much larger.

Therefore:

Feature scaling is often necessary before applying KNN.
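To see the effect, here is a small sketch that compares nearest neighbors before and after scaling. It assumes min-max scaling (one of several possible scaling methods) and a hypothetical set of four people; only the values for person A come from the table above.

```python
import math

# Hypothetical people as (age, salary); only A matches the table above
people = {
    "A": (25, 500_000),
    "B": (60, 505_000),
    "C": (26, 510_000),
    "D": (40, 600_000),
}

def euclidean(x, y):
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def min_max_scale(rows):
    """Rescale every feature (column) to the range [0, 1].

    Assumes each feature has at least two distinct values.
    """
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        tuple((v - lo) / (hi - lo) for v, lo, hi in zip(row, lows, highs))
        for row in rows
    ]

names = list(people)
scaled = dict(zip(names, min_max_scale([people[n] for n in names])))

def nearest_to_a(points):
    """Name of the person closest to A under Euclidean distance."""
    others = [n for n in names if n != "A"]
    return min(others, key=lambda n: euclidean(points["A"], points[n]))

print("Nearest to A, raw features:   ", nearest_to_a(people))
print("Nearest to A, scaled features:", nearest_to_a(scaled))
```

On the raw values, the 35-year age gap between A and B is almost invisible next to the salary differences, so B looks nearest. After scaling, both features contribute on a comparable range and C, who is similar in both age and salary, becomes the nearest neighbor.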

Manhattan Distance

Imagine traveling through city streets arranged in a grid.

You cannot move diagonally.

You must travel horizontally and vertically.

This leads to:

Manhattan Distance

Formula:

$d=\sum_{i=1}^{n}|x_i-y_i|$

Why the Name Manhattan?

The streets of Manhattan form a grid-like structure.

Travel occurs along blocks rather than diagonally.

Visualizing Manhattan Distance


Start *
      |
      |
      |
      *----*
           End

Movement occurs along edges.

Example Calculation

Points:

A=(2,3)

B=(5,7)

Distance:

$|5-2|+|7-3| = 3+4 = 7$

Euclidean vs Manhattan

For the same points:

Metric	Distance
Euclidean	5
Manhattan	7

Different metrics produce different results.

When Manhattan Distance Works Well

Useful when:

Features represent independent dimensions
Grid-based movement exists
A large difference in a single feature should carry less weight than under Euclidean distance, which squares each difference

Minkowski Distance

Euclidean and Manhattan Distance are actually special cases of a more general metric called:

Minkowski Distance

Formula:

$d=\left(\sum_{i=1}^{n}|x_i-y_i|^p\right)^{1/p}$

Where:

p

controls the distance type.

Special Cases

If:

p=1

Manhattan Distance.

If:

p=2

Euclidean Distance.

Thus:

Minkowski generalizes both metrics.
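A short sketch makes the two special cases concrete. The helper below is our own illustrative function (standard library only) and assumes p is at least 1.

```python
def minkowski_distance(x, y, p):
    """Minkowski distance of order p (p >= 1) between two equal-length sequences."""
    if p < 1:
        raise ValueError("p must be at least 1")
    return sum(abs(xi - yi) ** p for xi, yi in zip(x, y)) ** (1 / p)

a, b = (2, 3), (5, 7)
print(minkowski_distance(a, b, 1))  # p=1: same value as Manhattan Distance
print(minkowski_distance(a, b, 2))  # p=2: same value as Euclidean Distance
```

For the points A=(2,3) and B=(5,7), the two calls reproduce the Manhattan and Euclidean results computed earlier.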

Hamming Distance

Used for categorical or binary data.

Measures:

Number of positions where two equal-length sequences differ.

Example

Strings:


101110

100100

Differences:


Position 3
Position 5

Hamming Distance:

2
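The count above can be checked with a few lines of Python. This is a minimal sketch; hamming_distance is our own helper name, and it assumes both sequences have the same length.

```python
def hamming_distance(s, t):
    """Count the positions at which two equal-length sequences differ."""
    if len(s) != len(t):
        raise ValueError("sequences must have the same length")
    return sum(1 for a, b in zip(s, t) if a != b)

first, second = "101110", "100100"

# 1-based positions where the two strings disagree
differing = [i + 1 for i, (a, b) in enumerate(zip(first, second)) if a != b]

print("Differing positions:", differing)
print("Hamming distance:", hamming_distance(first, second))
```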

Applications

DNA Analysis
Binary Data
Error Detection
Text Processing

Cosine Similarity

Sometimes direction matters more than magnitude.

Example:

Text Documents.

Two documents may have very different lengths but discuss the same topic.

In such cases, Cosine Similarity is preferred.

Formula:

$\cos(\theta)=\frac{A\cdot B}{||A||||B||}$

Intuition

Measures:


Angle Between Vectors

rather than physical distance.

Interpretation

Cosine Similarity	Meaning
1	Identical Direction
0	Orthogonal (Unrelated)
-1	Opposite Direction
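The three rows of the table can be reproduced with a small from-scratch sketch. The function name cosine_sim and the example vectors are our own, chosen only for illustration.

```python
import math

def cosine_sim(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(ai * bi for ai, bi in zip(a, b))
    norm_a = math.sqrt(sum(ai * ai for ai in a))
    norm_b = math.sqrt(sum(bi * bi for bi in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

print(cosine_sim([1, 2], [10, 20]))   # same direction, very different lengths
print(cosine_sim([1, 0], [0, 1]))     # perpendicular vectors
print(cosine_sim([1, 2], [-1, -2]))   # opposite directions
```

Note that cosine similarity is a similarity, not a distance: larger values mean more similar. When a distance is needed, a common convention is to use 1 minus the cosine similarity.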

Applications

Search Engines
NLP
Recommendation Systems
Text Similarity

Choosing the Right Distance Metric

There is no universally best metric.

Choice depends on:

Data type
Problem domain
Feature characteristics

Numerical Data

Preferred:


Euclidean Distance

Grid-Based Problems

Preferred:


Manhattan Distance

Binary Data

Preferred:


Hamming Distance

Text Data

Preferred:


Cosine Similarity

Example: KNN Classification

Suppose:

Training Points:


A
A
B
B

New Point:

KNN:

Calculates distance to all points.
Sorts distances.
Selects nearest neighbors.
Performs voting.

The distance metric directly determines which neighbors are selected.
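The four steps can be sketched as follows. This is a simplified illustration rather than a production implementation: the coordinates of the training points and the new point are hypothetical, and the distance function is passed in as a parameter so that any metric can be plugged in.

```python
import math
from collections import Counter

def knn_predict(training, query, k, distance):
    """training: list of (point, label) pairs; distance: function of two points."""
    # 1. Calculate the distance from the query to every training point
    scored = [(distance(point, query), label) for point, label in training]
    # 2. Sort by distance, nearest first
    scored.sort(key=lambda pair: pair[0])
    # 3. Select the k nearest neighbors
    neighbors = [label for _, label in scored[:k]]
    # 4. Majority vote (ties go to the label seen first among the neighbors)
    return Counter(neighbors).most_common(1)[0][0]

def euclidean(x, y):
    return math.dist(x, y)  # available in Python 3.8+

# Hypothetical coordinates; the labels follow the example above
training = [((1, 1), "A"), ((2, 1), "A"), ((6, 5), "B"), ((7, 6), "B")]
new_point = (2, 2)

prediction = knn_predict(training, new_point, k=3, distance=euclidean)
print("Predicted class:", prediction)
```

Here the three nearest neighbors of the new point are two A points and one B point, so the vote goes to class A. Swapping in a different distance function changes step 1, and therefore possibly every step after it.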

Effect of Different Metrics

The same dataset can produce different predictions depending on the chosen metric.

Example:


Euclidean → Class A

Manhattan → Class B

This is why metric selection is important.
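A tiny constructed example shows how this can happen. The two training points and the query below are hypothetical, picked so that the metrics disagree, and the sketch uses a single nearest neighbor (k=1) to keep it short.

```python
import math

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def manhattan(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))

# Hypothetical training points chosen so the two metrics disagree
training = [((3, 3), "A"), ((0, 5), "B")]
query = (0, 0)

def nearest_label(distance):
    """Label of the single training point closest to the query."""
    point, label = min(training, key=lambda item: distance(item[0], query))
    return label

print("Euclidean ->", nearest_label(euclidean))
print("Manhattan ->", nearest_label(manhattan))
```

The point (3,3) is about 4.24 away from the query in Euclidean terms but 6 away in Manhattan terms, while (0,5) is 5 away under both. The nearest neighbor, and with it the predicted class, therefore depends on the metric.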

Python Example: Euclidean Distance


from scipy.spatial.distance import euclidean

distance = euclidean(
    [2,3],
    [5,7]
)

print(distance)

Python Example: Manhattan Distance


from scipy.spatial.distance import cityblock

# cityblock is SciPy's name for Manhattan distance
distance = cityblock(
    [2,3],
    [5,7]
)

print(distance)

Python Example: Hamming Distance


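from scipy.spatial.distance import hamming

# SciPy returns the fraction of positions that differ, not the count
distance = hamming(
    [1,0,1,1],
    [1,0,0,1]
)

print(distance)      # 0.25 (1 of 4 positions differs)
print(distance * 4)  # multiply by the length to get the count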

Python Example: Cosine Similarity


from sklearn.metrics.pairwise import cosine_similarity
print(cosine_similarity([[1, 2, 3]], [[2, 4, 6]]))  # inputs are 2D: one row per vector

Real-World Applications

Recommendation Systems

Finding users with similar preferences.

Face Recognition

Comparing facial feature vectors.

Search Engines

Finding relevant documents.

Fraud Detection

Identifying transactions similar to known frauds.

Medical Diagnosis

Finding patients with similar characteristics.

Common Mistakes

Ignoring Feature Scaling

Different feature scales distort distance calculations.

Using Euclidean Distance Everywhere

Other metrics may be more suitable.

Mixing Numerical and Categorical Data

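Formulas such as Euclidean distance assume numeric features, so applying them to arbitrarily encoded categories makes the distances unreliable.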

Best Practices

Scale numerical features
Understand data characteristics
Experiment with multiple metrics
Use cross-validation
Consider domain knowledge
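One way to combine "experiment with multiple metrics" and "use cross-validation" is sketched below. It is a from-scratch illustration using leave-one-out cross-validation on a hypothetical, already-scaled toy dataset; in practice a Machine Learning library would normally handle this, and all names here are our own.

```python
from collections import Counter

def minkowski(x, y, p):
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p)

def knn_predict(training, query, k, p):
    """Majority vote among the k nearest training points under Minkowski order p."""
    nearest = sorted(training, key=lambda item: minkowski(item[0], query, p))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def leave_one_out_accuracy(data, k, p):
    """Hold out each point in turn, predict it from the rest, and average."""
    correct = 0
    for i, (point, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        if knn_predict(rest, point, k, p) == label:
            correct += 1
    return correct / len(data)

# Hypothetical toy dataset with features already scaled to [0, 1]
data = [
    ((0.10, 0.20), "A"), ((0.20, 0.10), "A"), ((0.15, 0.30), "A"),
    ((0.80, 0.90), "B"), ((0.90, 0.80), "B"), ((0.85, 0.70), "B"),
]

for name, p in [("Manhattan", 1), ("Euclidean", 2)]:
    print(name, leave_one_out_accuracy(data, k=3, p=p))
```

On such a cleanly separated toy dataset both metrics score the same. On real data the scores usually differ, and the cross-validated comparison is what tells us which metric suits the problem.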

Distance Metrics Summary

Metric	Best For
Euclidean	Continuous Numerical Data
Manhattan	Grid-Based Problems
Minkowski	Generalized Distance
Hamming	Binary/Categorical Data
Cosine Similarity	Text and Vector Similarity

Why Distance Metrics are Important

Distance metrics form the foundation of similarity-based Machine Learning algorithms. KNN, clustering methods, recommendation systems, and many modern AI applications depend heavily on how similarity is measured.

Choosing the right distance metric can dramatically affect model performance because it determines which observations are considered neighbors. Understanding these metrics helps practitioners select appropriate algorithms, preprocess data correctly, and build more reliable Machine Learning systems.

In the next article, we will study Choosing K in K-Nearest Neighbors, one of the most important decisions in KNN that directly affects model accuracy, overfitting, and underfitting.