Introduction

One of the primary goals of Machine Learning is to discover hidden patterns and structures within data. While supervised learning relies on labeled datasets, many real-world datasets do not contain predefined labels. In such situations, Unsupervised Learning techniques are used to uncover meaningful relationships within the data.

One of the most important and widely used unsupervised learning techniques is Clustering.

Clustering refers to the process of grouping similar data points together so that observations within the same group are more similar to each other than to observations in other groups.

Among the many clustering algorithms available, K-Means Clustering is one of the simplest, fastest, and most popular methods. It is widely used in customer segmentation, image compression, recommendation systems, document clustering, fraud detection, and market analysis.

The central idea behind K-Means is to divide a dataset into K distinct clusters by assigning each data point to the nearest cluster center.

In this article, we will explore K-Means Clustering in detail, understand its intuition, learn how it works, examine its mathematical foundation, discuss the Elbow Method, and explore its applications and limitations.

What is Clustering?

Clustering is an unsupervised learning technique used to group similar observations together.

The objective is:

  • High similarity within clusters
  • Low similarity between clusters

Unlike supervised learning, clustering does not require labeled data.

Instead, the algorithm discovers patterns directly from the dataset.

For example, a retail company may have customer information but no predefined customer categories. Clustering can automatically identify groups of customers with similar characteristics.

What is K-Means Clustering?

K-Means Clustering is a centroid-based clustering algorithm that partitions a dataset into K distinct clusters.

The algorithm attempts to minimize within-cluster variation while keeping clusters as distinct as possible.

The letter K represents the number of clusters that the user wants to create.

For example:

K = 3

means the dataset will be divided into three clusters.

Each cluster is represented by a central point called a centroid.

Why is it Called K-Means?

The name consists of two parts.

K

Represents the number of clusters.

Example:

K = 2

creates two clusters.

K = 5

creates five clusters.

Means

Refers to the arithmetic mean of the points belonging to a cluster.

The cluster center is calculated as the average position of the cluster's points.

This average becomes the centroid.

Thus, the algorithm is called:

K-Means

because it creates K clusters using mean values as cluster centers.

Understanding the Intuition Behind K-Means

Imagine a shopping mall that wants to segment customers based on:

  • Annual Income
  • Spending Score

If we plot customers on a graph, natural groups may emerge.

For example:

Budget Customers

Regular Customers

Premium Customers

Instead of manually identifying these groups, K-Means automatically discovers them.

The algorithm repeatedly adjusts cluster centers until similar customers are grouped together.

What is a Centroid?

A centroid is the center of a cluster.

It represents the average position of all points belonging to that cluster.

Suppose a cluster contains the following values:

Customer    Spending Score
A           40
B           50
C           60

The centroid is:

(40 + 50 + 60) / 3

which equals:

50

In higher dimensions, the centroid is computed one feature at a time: each coordinate of the centroid is the mean of that feature over all points in the cluster.
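As a minimal sketch of the feature-by-feature calculation, the snippet below averages each column separately. The `centroid` helper and the two-feature customer values are made up for illustration.

```python
from statistics import fmean

def centroid(points):
    """Return the feature-wise mean of a list of equal-length points."""
    # zip(*points) regroups the data by feature (column) instead of by point.
    return [fmean(feature) for feature in zip(*points)]

# One feature: the spending scores from the table above.
print(centroid([[40], [50], [60]]))  # [50.0]

# Two features per customer (annual income, spending score); illustrative values.
print(centroid([[30, 40], [50, 50], [70, 60]]))  # [50.0, 50.0]
```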

How K-Means Works

The K-Means algorithm follows an iterative process.

Step 1: Choose K

The user selects the number of clusters.

Example:

K = 3

The algorithm will create three clusters.

Step 2: Initialize Centroids

K points are selected as initial cluster centers.

These may be chosen:

  • Randomly
  • Using advanced initialization methods such as K-Means++

Example:

Centroid 1

Centroid 2

Centroid 3

Step 3: Assign Data Points

Each observation is assigned to the nearest centroid.

Distance is usually measured using:

Euclidean Distance

The closest centroid determines cluster membership.

Step 4: Update Centroids

After assignments are made, new centroids are calculated.

Each centroid becomes the mean of all points assigned to its cluster.

Step 5: Repeat

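The assignment and update steps are repeated until either:

  • The centroids stop changing, or
  • The maximum number of iterations is reached

When the centroids stop changing, the algorithm has converged. If the iteration limit is reached first, the algorithm simply stops with the current clusters.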
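The five steps can be sketched in plain Python as follows. This is a minimal illustration rather than a production implementation: the function name `k_means`, the `seed` parameter, the choice of random data points as initial centroids, and the rule of keeping an old centroid when its cluster becomes empty are all assumptions made for this example.

```python
import math
import random

def k_means(points, k, max_iters=100, seed=0):
    """Cluster `points` (a list of equal-length tuples) into k groups."""
    rng = random.Random(seed)
    # Step 2: pick k distinct data points as the initial centroids.
    centroids = rng.sample(points, k)
    labels = []
    for _ in range(max_iters):
        # Step 3: assign each point to its nearest centroid (Euclidean distance).
        labels = [
            min(range(k), key=lambda i: math.dist(p, centroids[i]))
            for p in points
        ]
        # Step 4: move each centroid to the mean of its assigned points.
        new_centroids = []
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                # Assumption: keep the old centroid if a cluster ends up empty.
                new_centroids.append(centroids[i])
        # Step 5: stop once the centroids no longer change.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels

# Two visibly separate groups of illustrative 2-D points.
data = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = k_means(data, k=2)
print(sorted(centroids))
print(labels)
```

The first three points should end up sharing one label and the last three the other; which group receives label 0 depends on the random initialization.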

Example of K-Means Clustering

Suppose we have six data points:

Point    X Coordinate
A        2
B        3
C        4
D        10
E        11
F        12

Assume:

K = 2

Initial centroids:

3

11

The algorithm assigns:

Cluster 1:

2

3

4

Cluster 2:

10

11

12

New centroids become:

3

11

Since centroids no longer change, the algorithm converges.
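The example can be checked with a single assignment and update pass. The sketch below hard-codes the six points and the two initial centroids from the example; the helper name `one_iteration` is hypothetical.

```python
def one_iteration(points, centroids):
    """Run one assignment step and one update step on 1-D data."""
    # Assignment step: index of the nearest centroid for each point.
    labels = [
        min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
        for p in points
    ]
    # Update step: each centroid becomes the mean of its assigned points.
    new_centroids = []
    for i in range(len(centroids)):
        members = [p for p, label in zip(points, labels) if label == i]
        new_centroids.append(sum(members) / len(members))
    return labels, new_centroids

labels, new_centroids = one_iteration([2, 3, 4, 10, 11, 12], [3, 11])
print(labels)         # [0, 0, 0, 1, 1, 1]
print(new_centroids)  # [3.0, 11.0]
```

Because the updated centroids equal the initial ones, another pass would produce the same assignments.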

Distance Measurement in K-Means

K-Means primarily uses Euclidean Distance.

The Euclidean distance between two points $x$ and $y$ with $n$ features is:

$$d(x, y) = \sqrt{\sum_{j=1}^{n} (x_j - y_j)^2}$$

This formula measures the straight-line distance between observations.

The nearest centroid determines cluster membership.
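As a rough sketch, Euclidean distance and the nearest-centroid rule can be written in a few lines. The helper names `euclidean_distance` and `nearest_centroid` are made up for this example; Python's standard library also provides `math.dist`, which computes the same distance.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two points with the same number of features."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_centroid(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(
        range(len(centroids)),
        key=lambda i: euclidean_distance(point, centroids[i]),
    )

print(euclidean_distance((0, 0), (3, 4)))          # 5.0
print(nearest_centroid((2, 2), [(0, 0), (3, 4)]))  # 1
```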

Objective Function of K-Means

K-Means attempts to minimize the Within-Cluster Sum of Squares (WCSS).

WCSS measures how compact the clusters are.

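The objective function is:

$$\text{WCSS} = \sum_{i=1}^{K} \sum_{x \in C_i} \lVert x - \mu_i \rVert^2$$

Where:

  • $K$ is the number of clusters
  • $C_i$ is the $i$-th cluster
  • $\mu_i$ is the centroid of cluster $C_i$
  • $x$ is a data point in $C_i$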

The algorithm seeks centroids that minimize this value.
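WCSS can be computed directly as the sum of squared distances from each point to its assigned centroid. The sketch below applies this to the six-point example from earlier in the article; the function name `wcss` and its argument layout are assumptions for illustration.

```python
def wcss(points, labels, centroids):
    """Sum of squared Euclidean distances from each point to its assigned centroid."""
    return sum(
        sum((x - c) ** 2 for x, c in zip(p, centroids[label]))
        for p, label in zip(points, labels)
    )

points = [(2,), (3,), (4,), (10,), (11,), (12,)]
labels = [0, 0, 0, 1, 1, 1]
centroids = [(3,), (11,)]
print(wcss(points, labels, centroids))  # 4
```

Each cluster contributes 1 + 0 + 1 = 2, giving a total of 4.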

Convergence of K-Means

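During each iteration:

  • Each point moves to its nearest centroid, so assignments improve or stay the same.
  • WCSS decreases or stays the same; it never increases.
  • Centroids become more stable.

Eventually, no significant changes occur.

At this point, the algorithm has converged.

Although convergence is guaranteed, the algorithm may settle in a local minimum of WCSS rather than the global one, so the final clusters depend on the initial centroids.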
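A small, contrived case shows how the starting centroids can change the result. In the sketch below, the four points, both sets of initial centroids, and the helper name `lloyd` are invented for illustration; the helper returns the final centroids and the resulting WCSS.

```python
import math

def lloyd(points, centroids, max_iters=100):
    """Run assignment/update steps from the given starting centroids."""
    for _ in range(max_iters):
        labels = [
            min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            for p in points
        ]
        updated = []
        for i, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == i]
            updated.append(
                tuple(sum(col) / len(members) for col in zip(*members))
                if members else old
            )
        if updated == centroids:
            break
        centroids = updated
    total = sum(math.dist(p, centroids[label]) ** 2 for p, label in zip(points, labels))
    return centroids, total

# Four corners of a wide rectangle.
corners = [(0, 0), (0, 1), (4, 0), (4, 1)]

# Starting midway between the left and right sides splits the data into rows.
_, poor = lloyd(corners, [(2, 0), (2, 1)])
# Starting near the left and right sides finds the two compact pairs.
_, good = lloyd(corners, [(0, 0.5), (4, 0.5)])

print(poor, good)  # 16.0 1.0
```

Both runs converge, since neither changes after its first pass, yet the first is stuck at a much higher WCSS. This is why implementations commonly run K-Means from several initializations and keep the best result.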

Choosing the Optimal K

One of the biggest challenges in K-Means is selecting the appropriate number of clusters.

Several methods are used.

Elbow Method

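The Elbow Method is the most common approach.

It plots:

K
vs
WCSS

WCSS generally keeps falling as K grows (it reaches zero when every point is its own cluster), so the lowest value is not the goal. Instead, the point where the curve bends and begins to flatten, the "elbow", is chosen as K, because adding more clusters beyond it yields little further reduction.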
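The elbow is usually read off the plot by eye, but a simple heuristic can approximate it. The sketch below picks the K where the drop in WCSS slows down the most (the largest second difference). This rule and the helper name `elbow_k` are assumptions for illustration, not a standard definition. The WCSS values are those of the six-point example (2, 3, 4, 10, 11, 12), worked out by hand for K = 1 to 4.

```python
def elbow_k(wcss_by_k):
    """Pick the K at which the WCSS curve bends most sharply.

    `wcss_by_k` maps consecutive K values to their WCSS. The bend at K is
    measured as (drop before K) - (drop after K).
    """
    ks = sorted(wcss_by_k)
    best_k, best_bend = None, float("-inf")
    for prev_k, k, next_k in zip(ks, ks[1:], ks[2:]):
        bend = (wcss_by_k[prev_k] - wcss_by_k[k]) - (wcss_by_k[k] - wcss_by_k[next_k])
        if bend > best_bend:
            best_k, best_bend = k, bend
    return best_k

wcss_by_k = {1: 100, 2: 4, 3: 2.5, 4: 1.0}
print(elbow_k(wcss_by_k))  # 2
```

Going from one cluster to two removes almost all of the WCSS, while further clusters add little, so K = 2 is selected.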

Silhouette Score

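The Silhouette Score measures how close each point is to its own cluster compared with the nearest other cluster.

It ranges from -1 to 1, and higher values indicate better-separated clusters.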
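As a minimal sketch, the score for one point is (b - a) / max(a, b), where a is its mean distance to the other points in its own cluster and b is its mean distance to the points of the nearest other cluster. The helper below averages this over all points. It assumes at least two clusters and no exact duplicate points, and it scores a point that is alone in its cluster as 0, a common convention.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points."""
    scores = []
    for i, (p, label) in enumerate(zip(points, labels)):
        # Collect distances from p to every other point, grouped by cluster.
        dists = {}
        for j, (q, other) in enumerate(zip(points, labels)):
            if i != j:
                dists.setdefault(other, []).append(math.dist(p, q))
        own = dists.pop(label, None)
        if not own:
            # A point alone in its cluster scores 0 by convention.
            scores.append(0.0)
            continue
        a = sum(own) / len(own)                           # cohesion
        b = min(sum(d) / len(d) for d in dists.values())  # separation
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

points = [(2,), (3,), (4,), (10,), (11,), (12,)]
labels = [0, 0, 0, 1, 1, 1]
print(round(silhouette_score(points, labels), 3))  # 0.831
```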

Gap Statistic

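The Gap Statistic compares the within-cluster dispersion of the clustering with the dispersion expected for reference data that has no cluster structure, such as points drawn uniformly at random.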

These methods help determine an appropriate number of clusters.

K-Means++

Random centroid initialization may lead to poor solutions.

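K-Means++ improves initialization by spreading the initial centroids apart: the first centroid is a randomly chosen data point, and each subsequent centroid is chosen with probability proportional to its squared distance from the nearest centroid already selected.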

Advantages include:

  • Faster convergence
  • Better clustering quality
  • Reduced sensitivity to random initialization

Many modern implementations, such as scikit-learn, use K-Means++ by default.
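The K-Means++ seeding rule can be sketched as follows: the first centroid is a random data point, and each later one is drawn with probability proportional to its squared distance from the nearest centroid chosen so far. The function name `kmeans_pp_init` and the `seed` parameter are assumptions, and the sketch assumes the data contains at least k distinct points.

```python
import math
import random

def kmeans_pp_init(points, k, seed=0):
    """Choose k initial centroids with the K-Means++ seeding rule."""
    rng = random.Random(seed)
    # The first centroid is a uniformly random data point.
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        # Squared distance from each point to its nearest chosen centroid.
        weights = [min(math.dist(p, c) ** 2 for c in centroids) for p in points]
        # Far-away points are favoured; already-chosen points have weight 0.
        centroids.append(rng.choices(points, weights=weights, k=1)[0])
    return centroids

points = [(2,), (3,), (4,), (10,), (11,), (12,)]
print(kmeans_pp_init(points, k=2))
```

The returned centroids would then replace the random choice in Step 2 of the algorithm; the rest of the procedure is unchanged.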

Advantages of K-Means Clustering

Simple to Understand

Easy to learn and implement.

Computationally Efficient

Works well on large datasets.

Fast Convergence

Usually requires only a few iterations.

Scalable

Can handle thousands or millions of observations.

Widely Supported

Available in most machine learning libraries.

Limitations of K-Means Clustering

Requires K in Advance

The number of clusters must be specified beforehand.

Sensitive to Initialization

Different initial centroids may produce different results.

Assumes Spherical Clusters

Performs best when clusters are roughly spherical (circular in two dimensions) and similar in spread.

Sensitive to Outliers

Extreme values can distort centroid positions.
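A quick calculation illustrates the effect. It reuses the spending scores 40, 50, and 60 from the centroid example; the extra value of 500 is a made-up outlier.

```python
from statistics import fmean

scores = [40, 50, 60]
with_outlier = scores + [500]  # hypothetical extreme value

print(fmean(scores))        # 50.0
print(fmean(with_outlier))  # 162.5
```

A single extreme point pulls the centroid far away from the three typical customers, which in turn can change which points are assigned to the cluster.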

Equal Cluster Size Assumption

May struggle when clusters vary significantly in size.

K-Means vs Hierarchical Clustering

Feature                                  K-Means           Hierarchical Clustering
Number of Clusters Required Initially    Yes               No
Scalability                              High              Lower
Speed                                    Faster            Slower
Output                                   Cluster Labels    Dendrogram
Suitable for Large Datasets              Yes               Limited
Interpretability                         Moderate          High

K-Means is generally preferred for large datasets, while Hierarchical Clustering provides richer structural insights.

Applications of K-Means Clustering

K-Means is used extensively across industries.

Customer Segmentation

Grouping customers with similar purchasing behavior.

Recommendation Systems

Identifying similar users and products.

Image Compression

Reducing image colors by clustering pixels.

Document Clustering

Grouping documents by topic.

Fraud Detection

Identifying unusual transaction groups.

Healthcare

Patient segmentation and disease analysis.

Social Network Analysis

Discovering communities within networks.

Real-World Example: Customer Segmentation

Suppose a retail company collects:

  • Age
  • Income
  • Spending Score

Applying K-Means may reveal groups such as:

Budget Customers

Lower spending behavior.

Regular Customers

Moderate spending patterns.

Premium Customers

High spending behavior.

These clusters can help businesses design targeted marketing campaigns and improve customer engagement.

K-Means Workflow

The complete workflow can be summarized as:

Dataset

Choose K

Initialize Centroids

Assign Points To Nearest Centroid

Update Centroids

Repeat Until Convergence

Final Clusters

This iterative process enables K-Means to discover hidden structures within data.