In the previous article, we learned how KNN uses Distance Metrics to find the nearest neighbors of a data point.

However, another important question remains:

How many neighbors should KNN consider?

This number is represented by:

K

and it is one of the most important hyperparameters in the KNN algorithm.

A poor choice of K can significantly reduce model performance.

If K is too small, the model becomes sensitive to noise.

If K is too large, the model may ignore important local patterns.

Choosing the right value of K is therefore essential for building an accurate KNN model.

In this article, we will understand the impact of K, explore the tradeoff between small and large values, learn practical selection methods, and see how K affects overfitting and underfitting.

What is K in KNN?

K represents the number of nearest neighbors considered when making a prediction.

Example:

K = 3

The algorithm:

  1. Finds the 3 nearest neighbors.
  2. Looks at their labels.
  3. Makes a prediction based on those labels.

Example

Suppose the nearest neighbors are:

Pass
Pass
Fail

Votes:

Pass = 2

Fail = 1

Prediction:

Pass
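
The vote above can be sketched in a few lines of Python. The helper name majority_vote is only illustrative and does not come from any library.

```python
from collections import Counter

def majority_vote(labels):
    """Return the most common label among the neighbors."""
    counts = Counter(labels)
    label, _ = counts.most_common(1)[0]
    return label

neighbor_labels = ["Pass", "Pass", "Fail"]  # labels of the 3 nearest neighbors
prediction = majority_vote(neighbor_labels)
print(prediction)  # Pass
```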

Why K Matters

Consider the same data point.

Using:

K = 1

Prediction depends on only one neighbor.

Using:

K = 25

Prediction depends on twenty-five neighbors.

Clearly:

The prediction can change dramatically.

Understanding Small K Values

Suppose:

K = 1

The algorithm simply copies the label of the nearest point.

Example

Nearest Neighbor:

Fraud

Prediction:

Fraud

No other points are considered.

Advantages of Small K

  • Captures local patterns
  • Very flexible decision boundaries
  • Low bias

Disadvantages of Small K

  • Sensitive to noise
  • Sensitive to outliers
  • High variance
  • Overfitting risk

Visualizing K = 1

A A A

A ? B

B B B

The prediction depends entirely on the closest point.

A single noisy observation can change the outcome.
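
To make this concrete, here is a minimal sketch of a plain-Python KNN on made-up 2-D points. The data, the query point, and the function name knn_predict are all assumptions chosen for illustration: one mislabeled "B" point sits inside a cluster of "A" points.

```python
from collections import Counter
import math

def knn_predict(train, query, k):
    """Classify `query` by majority vote among its k nearest training points."""
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    labels = [label for _, label in by_distance[:k]]
    return Counter(labels).most_common(1)[0][0]

# Hypothetical toy data: a cluster of "A" points plus one noisy "B" point
train = [
    ((1.0, 1.0), "A"),
    ((1.2, 0.9), "A"),
    ((0.8, 1.1), "A"),
    ((1.1, 1.3), "A"),
    ((1.05, 1.05), "B"),  # noisy observation inside the "A" cluster
]
query = (1.06, 1.06)

print(knn_predict(train, query, k=1))  # B
print(knn_predict(train, query, k=5))  # A
```

With K = 1 the noisy point decides the outcome on its own. With K = 5 it is outvoted by the four surrounding "A" points.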

Understanding Large K Values

Suppose:

K = 25

Now the algorithm considers many neighbors.

Prediction becomes more stable.

Advantages of Large K

  • Less sensitive to noise
  • More stable predictions
  • Lower variance

Disadvantages of Large K

  • May ignore local structure
  • High bias
  • Underfitting risk

Example

Nearest Neighbors:

A A A A A
A A A A B
B B B B B

Majority:

A

Even if nearby points suggest B, the large neighborhood may dominate.

Small K vs Large K

Small K            | Large K
High Variance      | Low Variance
Low Bias           | High Bias
Overfitting Risk   | Underfitting Risk
Sensitive to Noise | Stable
Complex Boundary   | Smooth Boundary

K and Overfitting

Consider:

K = 1

The model memorizes the training data, because each training point is its own nearest neighbor.

Every training point creates its own decision region.

Result:

  • Very low training error
  • High test error

This is overfitting.

Visualizing Overfitting

/\/\/\/\/\/\/\

Highly irregular decision boundaries.
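
The gap between training and test error can be checked with a small experiment. This is a sketch on synthetic data: the make_moons dataset, the noise level, and the split sizes are assumptions, and the exact test accuracy will vary with them.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic, noisy two-class data (an assumption for illustration)
X, y = make_moons(n_samples=400, noise=0.35, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train)
train_acc = model.score(X_train, y_train)
test_acc = model.score(X_test, y_test)

# Each training point is its own nearest neighbor, so training accuracy is perfect
print(f"train accuracy: {train_acc:.2f}")
print(f"test accuracy:  {test_acc:.2f}")
```

A perfect training score combined with a noticeably lower test score is the typical signature of overfitting.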

K and Underfitting

Consider:

K = 100

The model averages over a very large region.

Local patterns disappear.

Result:

  • High training error
  • High test error

This is underfitting.

Visualizing Underfitting

-------------

Overly simple decision boundary.
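
The extreme case is easy to reproduce. In the sketch below, K is set to the number of training samples, so every query sees the same neighbors and receives the same prediction. The synthetic dataset is an assumption for illustration.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.neighbors import KNeighborsClassifier

# Synthetic two-class data; an odd sample count avoids a perfectly even class split
X, y = make_moons(n_samples=201, noise=0.35, random_state=0)

# K equals the training set size: every point votes on every prediction
model = KNeighborsClassifier(n_neighbors=len(X)).fit(X, y)
predictions = model.predict(X)
train_acc = model.score(X, y)

print(np.unique(predictions))  # a single class for all inputs
print(f"train accuracy: {train_acc:.2f}")
```

The model has collapsed into "always predict the majority class", which is why even the training accuracy is poor.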

Finding the Sweet Spot

The goal is:

Not Too Small

Not Too Large

Choose a K that balances:

  • Bias
  • Variance

This often produces the best generalization.

Example Dataset

Suppose:

K = 1
Accuracy = 82%

K = 3
Accuracy = 88%

K = 5
Accuracy = 92%

K = 7
Accuracy = 91%

K = 15
Accuracy = 86%

Best K:

K = 5

because it gives the highest validation accuracy.
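
In code, picking the best K from a set of results like these is a one-liner. The dictionary below simply restates the numbers from the example.

```python
# Validation accuracies from the example above, keyed by K
validation_accuracy = {1: 0.82, 3: 0.88, 5: 0.92, 7: 0.91, 15: 0.86}

best_k = max(validation_accuracy, key=validation_accuracy.get)
print(best_k)  # 5
```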

Odd vs Even Values of K

For binary classification, odd values of K are often preferred because they rule out tied votes.

Example:

K = 4

Votes:

Pass
Pass
Fail
Fail

Tie:

2 vs 2

No clear winner.

Using Odd Values

Example:

K = 5

Votes:

Pass
Pass
Pass
Fail
Fail

Winner:

Pass

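With two classes and an odd K, a tie cannot occur. With three or more classes, ties remain possible even for odd K.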
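
A small check for tied votes, as a sketch. The function name vote_is_tied and the three-class labels in the last line are hypothetical; the last case is included to show that an odd K only guarantees a winner when there are two classes.

```python
from collections import Counter

def vote_is_tied(labels):
    """Return True when the top two labels receive the same number of votes."""
    counts = Counter(labels).most_common(2)
    return len(counts) == 2 and counts[0][1] == counts[1][1]

print(vote_is_tied(["Pass", "Pass", "Fail", "Fail"]))          # True  (K = 4)
print(vote_is_tied(["Pass", "Pass", "Pass", "Fail", "Fail"]))  # False (K = 5)
# With three classes, an odd K can still produce a tie
print(vote_is_tied(["A", "A", "B", "B", "C"]))                 # True  (K = 5)
```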

Rule of Thumb

A commonly used heuristic:

K \approx \sqrt{N}

Where:

N

is the number of training samples.

Example

Training Samples:

N = 100

Then:

K = \sqrt{100} = 10

This serves as a starting point.
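
The heuristic is simple to compute. In the sketch below, the helper name sqrt_rule_k and the optional force_odd flag are assumptions: bumping an even result to the next odd number is one possible convention for binary classification, not part of the rule itself.

```python
import math

def sqrt_rule_k(n_samples, force_odd=False):
    """Starting value for K from the square-root heuristic."""
    k = max(1, round(math.sqrt(n_samples)))
    if force_odd and k % 2 == 0:
        k += 1  # assumption: move to the next odd value to avoid two-class ties
    return k

print(sqrt_rule_k(100))                  # 10
print(sqrt_rule_k(100, force_odd=True))  # 11
```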

Important Note

The square-root rule is only a guideline.

The optimal K must be determined experimentally.

Using Validation Data

The most common approach:

  1. Train on the training data
  2. Try different K values
  3. Evaluate each on the validation set
  4. Choose the best K

Example

Test:

K = 1
K = 3
K = 5
K = 7
K = 9
K = 11

Select the value with the highest validation performance.
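
A minimal sketch of this procedure with scikit-learn follows. The synthetic dataset, the 75/25 split, and the random seeds are assumptions; substitute your own training and validation data.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic data standing in for a real dataset (an assumption)
X, y = make_moons(n_samples=400, noise=0.35, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

val_scores = {}
for k in [1, 3, 5, 7, 9, 11]:
    model = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    val_scores[k] = model.score(X_val, y_val)  # accuracy on the validation set

best_k = max(val_scores, key=val_scores.get)
print(val_scores)
print("best K:", best_k)
```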

Cross Validation for Choosing K

Cross Validation provides a more reliable estimate.

Workflow:

K = 1 → Evaluate

K = 3 → Evaluate

K = 5 → Evaluate

K = 7 → Evaluate

Choose Best

This reduces dependence on a single train-test split.
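
The same loop can use cross-validation in place of a single split. The sketch below relies on scikit-learn's cross_val_score with 5 folds; the synthetic dataset and the fold count are assumptions.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic data standing in for a real dataset (an assumption)
X, y = make_moons(n_samples=400, noise=0.35, random_state=0)

cv_scores = {}
for k in [1, 3, 5, 7]:
    # One accuracy score per fold; the mean is the estimate for this K
    fold_scores = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5)
    cv_scores[k] = fold_scores.mean()

best_k = max(cv_scores, key=cv_scores.get)
print(cv_scores)
print("best K:", best_k)
```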

Visualizing K Selection

[Plot: accuracy (vertical axis) against K (horizontal axis, K = 1 to 11). Accuracy rises from K = 1 to a peak.]

Peak accuracy indicates the best K.

K in Regression Problems

For KNN Regression:

Prediction is based on averaging.

Example:

Neighbor House Prices:

45
50
55

Average:

50

The value of K still affects:

  • Smoothness
  • Stability
  • Prediction accuracy
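
The averaging step can be reproduced with scikit-learn's KNeighborsRegressor. The prices are the ones from the example; the single input feature (think of it as house size) and its values are hypothetical.

```python
from sklearn.neighbors import KNeighborsRegressor

# Hypothetical 1-D feature paired with the prices from the example
X_train = [[1.0], [2.0], [3.0]]
y_train = [45, 50, 55]

# K = 3: the prediction is the mean of all three neighbor prices
smooth = KNeighborsRegressor(n_neighbors=3).fit(X_train, y_train)
print(smooth.predict([[2.0]]))  # [50.]

# K = 1: the prediction copies the single closest neighbor
sharp = KNeighborsRegressor(n_neighbors=1).fit(X_train, y_train)
print(sharp.predict([[2.9]]))   # [55.]
```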

Weighted KNN

Standard KNN treats all neighbors equally.

Example:

Neighbor 1
Neighbor 2
Neighbor 3

Each receives one vote.

Weighted KNN gives more importance to closer neighbors.

This can reduce sensitivity to K.
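
One common weighting scheme is inverse distance, sketched below in plain Python. The function name weighted_vote and the neighbor distances are assumptions for illustration. In scikit-learn, the same idea is available through the weights="distance" option of KNeighborsClassifier.

```python
from collections import defaultdict

def weighted_vote(neighbors):
    """Pick the label with the largest total inverse-distance weight.

    `neighbors` is a list of (distance, label) pairs.
    """
    totals = defaultdict(float)
    for distance, label in neighbors:
        totals[label] += 1.0 / (distance + 1e-9)  # small constant avoids division by zero
    return max(totals, key=totals.get)

# Hypothetical neighbors: one very close "Fail", two distant "Pass"
neighbors = [(0.5, "Fail"), (2.0, "Pass"), (2.5, "Pass")]
print(weighted_vote(neighbors))  # Fail
```

An unweighted vote over the same three neighbors would return "Pass" (2 votes to 1). Here the close "Fail" neighbor carries a weight of about 2.0 against a combined 0.9 for the two distant "Pass" neighbors.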

Effect of Dataset Size

Small Dataset

Smaller K values often work well.

Large Dataset

Larger K values may become more appropriate.

More training samples create denser neighborhoods, so even a larger K still draws on points close to the query.

Effect of Noise

Noisy datasets benefit from:

Larger K

because averaging reduces noise impact.

Effect of Class Imbalance

Suppose:

90% Class A

10% Class B

Large K values may favor the majority class.

This can hurt minority-class detection.
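
A toy illustration of this effect, using made-up 1-D data with the same 90/10 split: the query sits in the middle of the minority cluster, yet the prediction flips once K exceeds twice the number of minority points. The positions and the query value are assumptions.

```python
from collections import Counter

# Hypothetical 1-D data: 10 minority "B" points near 0, 90 majority "A" points from 1.0 upward
train = [(0.01 * i, "B") for i in range(10)] + [(1.0 + 0.01 * i, "A") for i in range(90)]
query = 0.045  # sits in the middle of the "B" cluster

def predict(k):
    """Majority vote among the k training points closest to `query`."""
    nearest = sorted(train, key=lambda point: abs(point[0] - query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

for k in (5, 15, 21, 51):
    print(k, predict(k))
# 5 B
# 15 B
# 21 A
# 51 A
```

With only 10 minority samples, any K of 21 or more makes a "B" prediction impossible, no matter where the query lies.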

Python Example

Basic KNN:

from sklearn.neighbors import KNeighborsClassifier

model = KNeighborsClassifier(
    n_neighbors=5
)

Testing Multiple K Values

for k in range(1, 21):
    # Build one classifier per candidate K, then fit and score it on your data
    model = KNeighborsClassifier(
        n_neighbors=k
    )

Using Grid Search

from sklearn.model_selection import GridSearchCV

Grid Search evaluates every candidate K with cross-validation and reports the value with the best average score.
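
A minimal sketch of a grid search over K follows. The synthetic dataset, the candidate range 1 to 20, and the 5-fold setting are assumptions; the best K it reports depends on the data.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Synthetic data standing in for a real dataset (an assumption)
X, y = make_moons(n_samples=400, noise=0.35, random_state=0)

search = GridSearchCV(
    estimator=KNeighborsClassifier(),
    param_grid={"n_neighbors": list(range(1, 21))},  # candidate K values
    cv=5,                                            # 5-fold cross-validation
    scoring="accuracy",
)
search.fit(X, y)

print(search.best_params_)            # the K with the best mean CV accuracy
print(f"{search.best_score_:.3f}")    # that mean CV accuracy
```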

Real-World Example: Disease Prediction

Suppose:

K = 1

One unusual patient may influence predictions heavily.

Using:

K = 7

creates more reliable predictions.

Real-World Example: Recommendation Systems

Movie recommendations often use:

Moderate K values.

Too few neighbors:

Recommendations become unstable.

Too many neighbors:

Recommendations become generic.

Common Mistakes

Always Using K = 5

There is no universal best value.

Using Very Large K

Important local patterns may disappear.

Using Very Small K

Model becomes sensitive to noise.

Ignoring Cross Validation

Validation is essential for selecting K.

Best Practices

  • Start with the square-root rule
  • Use odd values for binary classification
  • Perform cross-validation
  • Compare multiple K values
  • Monitor overfitting and underfitting
  • Scale features before training

K Selection Workflow

  1. Prepare data
  2. Scale features
  3. Try multiple K values
  4. Evaluate performance
  5. Select optimal K
  6. Retrain model
  7. Test on unseen data
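
The seven steps can be sketched end to end with a scikit-learn pipeline. The dataset (load_breast_cancer), the 80/20 split, and the range of odd K values are assumptions chosen for illustration. Putting the scaler inside the pipeline is a design choice: it keeps the scaling statistics from leaking out of each cross-validation training fold.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# 1. Prepare data, holding out a test set that is never used for tuning
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# 2. Scale features inside the pipeline so each CV fold is scaled on its own training part
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("knn", KNeighborsClassifier()),
])

# 3-5. Try odd K values, evaluate with 5-fold CV, select the best
search = GridSearchCV(pipeline, {"knn__n_neighbors": list(range(1, 22, 2))}, cv=5)

# 6. fit() also retrains the best pipeline on all training data (refit=True by default)
search.fit(X_train, y_train)

# 7. Test once on unseen data
test_accuracy = search.score(X_test, y_test)
print(search.best_params_)
print(f"test accuracy: {test_accuracy:.3f}")
```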

Choosing K Summary

K Value    | Behavior
Very Small | Overfitting Risk
Moderate   | Balanced Performance
Very Large | Underfitting Risk

Why Choosing K is Important

The value of K directly controls how KNN learns from its neighbors. Small values make the model highly flexible but sensitive to noise, while large values create stable predictions at the risk of oversimplification.

Selecting the right K is one of the most important steps in building a successful KNN model because it determines the balance between overfitting and underfitting. Proper tuning of K often leads to significant improvements in predictive performance and model reliability.

In the next article, we will explore the Curse of Dimensionality, a fundamental challenge faced by KNN and many distance-based algorithms when working with high-dimensional data.