In the previous articles, we learned that Linear Regression fits a straight line through data.

This works well when the relationship between features and the target variable is approximately linear.

However, many real-world relationships are not straight lines.

Consider the following examples:

  • House prices may increase rapidly in premium locations.
  • Product sales may accelerate after a certain advertising budget.
  • Student performance may improve quickly initially and then plateau.
  • Population growth often follows curved patterns.

In such situations, a straight line is unable to capture the true relationship.

This is where Polynomial Regression becomes useful.

Polynomial Regression extends Linear Regression by introducing polynomial terms that allow the model to fit curves instead of straight lines.

Despite its name, Polynomial Regression is still a type of Linear Regression because the model remains linear in its parameters.

In this article, we will understand Polynomial Regression from first principles, learn how it works, explore its advantages and limitations, and implement it using Python.

Why Linear Regression Sometimes Fails

Consider the dataset:

Study Hours    Marks
1              20
2              35
3              55
4              70
5              80

If we plot these points:

[Figure: scatter plot of Marks (vertical axis) against Study Hours (horizontal axis)]

The relationship appears curved rather than perfectly linear: marks rise by 15, 20, 15, and then 10 for each additional hour, so the gains taper off.

A straight line cannot follow that tapering.

Understanding Non-Linear Relationships

A non-linear relationship means:

A change in the input does not produce a constant change in the output.

Example:

Experience    Salary
1             3
2             5
3             9
4             15

Salary grows faster as experience increases.

This pattern is difficult to represent using a straight line.
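One quick way to check for this kind of pattern is to look at the differences between consecutive outputs. The sketch below uses the Experience/Salary values from the table above; the variable names are illustrative only.

```python
import numpy as np

# Experience/Salary values from the table above
experience = np.array([1, 2, 3, 4])
salary = np.array([3, 5, 9, 15])

# Change in salary for each extra unit of experience
first_diff = np.diff(salary)
# Change of the change; constant values hint at a quadratic trend
second_diff = np.diff(salary, n=2)

print(first_diff)   # [2 4 6]
print(second_diff)  # [2 2]
```

For a straight-line relationship the first differences would all be equal. Here they grow, while the second differences are constant, which suggests a squared term.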

What is Polynomial Regression?

Polynomial Regression extends Linear Regression by adding higher-order powers of features.

Instead of:

$$y = mx + b$$

where $m$ is the slope and $b$ is the intercept,

we use:

$$y = \beta_0 + \beta_1 x + \beta_2 x^2$$

or

$$y = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \cdots$$

These additional terms allow the model to fit curves.
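To make the equation concrete, here is a minimal sketch of how such a model turns an input into a prediction. The helper `predict_poly` and the coefficient values are hypothetical; the coefficients were picked by hand because they happen to reproduce the Experience/Salary table shown earlier.

```python
import numpy as np

def predict_poly(x, betas):
    """Evaluate beta_0 + beta_1*x + beta_2*x^2 + ... for each value in x."""
    x = np.asarray(x, dtype=float)
    powers = np.arange(len(betas))  # exponents 0, 1, 2, ...
    # Each row of (x[:, None] ** powers) is [1, x, x^2, ...]
    return (x[:, None] ** powers) @ np.asarray(betas, dtype=float)

# Illustrative coefficients (beta_0, beta_1, beta_2), chosen by hand
betas = [3.0, -1.0, 1.0]
predictions = predict_poly([1, 2, 3, 4], betas)
print(predictions)  # the values 3, 5, 9, 15
```

In practice the coefficients are not chosen by hand; they are estimated from data by least squares.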

Polynomial Features

Polynomial Regression creates new features by raising the original feature to higher powers.

Original Feature:

x
2

Polynomial Features:

x    x²    x³
2    4     8

The model uses these transformed features to learn non-linear relationships.
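The transformation itself is simple enough to write by hand. The following sketch builds the powers with NumPy for two sample inputs; the second input (3) is added here only for illustration.

```python
import numpy as np

x = np.array([2.0, 3.0])

# Stack x, x^2 and x^3 side by side: one row per sample
features = np.column_stack([x, x**2, x**3])
print(features)
```

The first row contains 2, 4, 8, matching the example above, and the second row contains 3, 9, 27.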

Understanding Polynomial Degree

The degree determines the highest power used.

Degree 1

Equation:

$$y = \beta_0 + \beta_1 x$$

Equivalent to Linear Regression.

Degree 2 (Quadratic)

Equation:

$$y = \beta_0 + \beta_1 x + \beta_2 x^2$$

Produces a curved relationship.

Degree 3 (Cubic)

Equation:

$$y = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3$$

Can model more complex curves.
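As a rough illustration of what the degree changes, the sketch below fits degrees 1, 2, and 3 to the Study Hours data from earlier in the article and reports the training error for each. It uses `np.polyfit` as a stand-in for the scikit-learn workflow shown later; the `rmse_by_degree` name is an assumption made for this example.

```python
import numpy as np

hours = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
marks = np.array([20.0, 35.0, 55.0, 70.0, 80.0])

rmse_by_degree = {}
for degree in (1, 2, 3):
    # Least-squares polynomial fit of the given degree
    coeffs = np.polyfit(hours, marks, deg=degree)
    fitted = np.polyval(coeffs, hours)
    rmse_by_degree[degree] = float(np.sqrt(np.mean((marks - fitted) ** 2)))

for degree, err in rmse_by_degree.items():
    print(f"degree {degree}: training RMSE = {err:.3f}")
```

The training error shrinks as the degree grows. That is expected, because each higher degree contains the lower ones as a special case, so on its own it says nothing about how well the model will do on new data.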

Visualizing Different Degrees

Degree 1: a straight line.

Degree 2: a parabolic curve.

Degree 3: a more flexible curve.

Why Polynomial Regression Works

Suppose the true relationship is:

$$y = x^2$$

A straight line cannot represent this.

Polynomial Regression introduces:

$$x^2$$

allowing the model to capture the pattern accurately.
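A small experiment makes the point. The sketch below generates noise-free data from the relationship above and compares a model that sees only x with one that also sees the squared term. The input range of -3 to 3 is an arbitrary choice for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

x = np.linspace(-3, 3, 13)
y = x ** 2

X_line = x.reshape(-1, 1)              # single feature: x
X_quad = np.column_stack([x, x ** 2])  # two features: x and x^2

# score() returns R^2 on the data passed in
r2_line = LinearRegression().fit(X_line, y).score(X_line, y)
r2_quad = LinearRegression().fit(X_quad, y).score(X_quad, y)

print(f"R^2 with x only:    {r2_line:.3f}")
print(f"R^2 with x and x^2: {r2_quad:.3f}")
```

On this symmetric range the straight line explains essentially none of the variation, while adding the squared feature recovers the relationship almost exactly.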

Important Clarification

Many beginners think Polynomial Regression is a non-linear model.

Technically:

The relationship between input and output becomes non-linear.

However:

The model remains linear in the coefficients.

Example:

$$y = \beta_0 + \beta_1 x + \beta_2 x^2$$

The coefficients:

$$\beta_0, \beta_1, \beta_2$$

still appear linearly.

Therefore, it belongs to the Linear Regression family.
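Because the coefficients enter linearly, the same least-squares machinery used for a straight line can estimate them. The sketch below is one way to see this, using plain NumPy on the Experience/Salary values from earlier; the design matrix `A` is built by hand for illustration.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 9.0, 15.0])

# Design matrix with columns 1, x, x^2.
# The model is y = A @ betas, which is linear in betas.
A = np.column_stack([np.ones_like(x), x, x ** 2])

# Ordinary least squares, exactly as for a straight-line fit
betas, *_ = np.linalg.lstsq(A, y, rcond=None)
print(np.round(betas, 6))
```

The solver never needs to know that the second and third columns are powers of the first. It only sees three columns and three coefficients, which come out as approximately 3, -1, and 1 for this data.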

Example Dataset

Suppose:

Area    Price
1000    50
1500    70
2000    110
2500    170

Price increases faster as area grows.

A straight line may underestimate larger houses.

Polynomial Regression can capture this accelerating growth.
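The difference can be checked numerically. This sketch fits a line and a quadratic to the Area/Price values above with `np.polyfit` (used here for brevity instead of scikit-learn) and compares the fitted values.

```python
import numpy as np

area = np.array([1000.0, 1500.0, 2000.0, 2500.0])
price = np.array([50.0, 70.0, 110.0, 170.0])

line = np.polyfit(area, price, deg=1)   # straight line
curve = np.polyfit(area, price, deg=2)  # quadratic

line_pred = np.polyval(line, area)
curve_pred = np.polyval(curve, area)

print("actual:   ", price)
print("linear:   ", np.round(line_pred, 2))
print("quadratic:", np.round(curve_pred, 2))
```

The line predicts about 160 for the largest house, below the actual 170, while the quadratic reproduces all four prices. With only four points this shows how the curve follows the data, not how well it would generalize.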

Creating Polynomial Features

Python:

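from sklearn.preprocessing import PolynomialFeatures

# X must be 2-D: one row per sample, one column per feature
poly = PolynomialFeatures(degree=2)

# For a single feature x, each row becomes [1, x, x^2]
X_poly = poly.fit_transform(X)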

What Happens Internally?

Original Feature:

X = [2]

Transformed Features:

[1, 2, 4]

Representing:

$$1, x, x^2$$
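The transformation can be reproduced directly. The sketch below runs `PolynomialFeatures` on two sample inputs (the value 3 is added for illustration) and also shows the `include_bias=False` option, which drops the constant column.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Two samples, one feature each
X = np.array([[2.0], [3.0]])

# Default: constant column, x, and x^2
with_bias = PolynomialFeatures(degree=2).fit_transform(X)

# Without the constant column: x and x^2 only
no_bias = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X)

print(with_bias)
print(no_bias)
```

Dropping the constant column is a common choice when the downstream model fits its own intercept, as `LinearRegression` does by default.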

Training Polynomial Regression

Polynomial Regression still uses Linear Regression.

from sklearn.linear_model import LinearRegression

# Ordinary least squares on the expanded features
model = LinearRegression()
model.fit(X_poly, y)

Making Predictions

# New inputs must first go through poly.transform (not fit_transform)
predictions = model.predict(X_poly)
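When predicting for new inputs, the same feature expansion has to be applied before calling the model. One way to avoid forgetting that step is to chain both stages in a scikit-learn pipeline. The sketch below assumes the Area/Price values from the example dataset; the query area of 1750 is hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

X = np.array([[1000.0], [1500.0], [2000.0], [2500.0]])
y = np.array([50.0, 70.0, 110.0, 170.0])

# The pipeline expands features and fits the regression in one object
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(X, y)

# Hypothetical new house; the pipeline applies the expansion for us
X_new = np.array([[1750.0]])
prediction = model.predict(X_new)
print(prediction)
```

The prediction comes out close to 87.5, which lies on the quadratic through the four training points. The query stays inside the range of the training data on purpose.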

Visualizing Polynomial Regression

import matplotlib.pyplot as plt

# Training points
plt.scatter(X, y)

# Fitted curve; X should be sorted so the line does not zigzag
plt.plot(X, model.predict(X_poly))

plt.show()

The resulting curve often fits the data much better than a straight line.

Comparing Linear and Polynomial Regression

Suppose the true relationship is a curve.

Linear Regression: a straight line through the points. Poor fit.

Polynomial Regression: a curve that follows the points. Better fit.

Feature Expansion Example

Degree 2:

Original: $x$

Transformed: $x, x^2$

Degree 3: $x, x^2, x^3$

Degree 4: $x, x^2, x^3, x^4$

More flexibility is added as degree increases.

The Danger of High Degrees

Higher-degree polynomials can fit training data extremely well.

Example:

Degree 15

The model may pass through every training point. In general, a polynomial of degree n − 1 can pass exactly through n points with distinct x values.

However:

This often leads to:

Overfitting
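The effect is easy to reproduce on synthetic data. The sketch below is an illustration under stated assumptions: the true relationship is taken to be a simple quadratic, the noise level and random seed are arbitrary, and degree 7 is used instead of 15 because 8 training points are enough to show a curve that passes through every one of them.

```python
import numpy as np

rng = np.random.default_rng(0)

def truth(x):
    """Assumed true relationship for this demo."""
    return x ** 2

# Eight noisy training points
x_train = np.linspace(-1, 1, 8)
y_train = truth(x_train) + rng.normal(scale=0.1, size=x_train.size)

# Held-out points halfway between the training points, without noise
x_test = (x_train[:-1] + x_train[1:]) / 2
y_test = truth(x_test)

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

results = {}
for degree in (2, 7):
    coeffs = np.polyfit(x_train, y_train, deg=degree)
    train_err = rmse(y_train, np.polyval(coeffs, x_train))
    test_err = rmse(y_test, np.polyval(coeffs, x_test))
    results[degree] = (train_err, test_err)
    print(f"degree {degree}: train RMSE {train_err:.4f}, test RMSE {test_err:.4f}")
```

The degree-7 curve drives the training error to essentially zero because it reproduces the noise as well as the signal, so its error on the held-out points is larger than its training error. A near-zero training error is therefore a warning sign, not a success.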

Underfitting vs Good Fit vs Overfitting

Low Degree: a straight line that misses the curve. Underfitting.

Optimal Degree: a smooth curve that follows the overall trend. Good fit.

Very High Degree: a wiggly curve that chases every point. Overfitting.

Choosing the Right Degree

There is no universal answer.

Common practice is to try low degrees first:

  • Degree 2
  • Degree 3
  • Degree 4

Evaluate performance using:

  • Cross Validation
  • RMSE
  • R² Score
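A degree search with cross validation might look like the following sketch. The data is synthetic (a quadratic plus noise, with an arbitrary seed), and the candidate degrees, fold count, and the `cv_rmse` dictionary are choices made for this example rather than recommendations.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data: quadratic trend plus noise
rng = np.random.default_rng(0)
X = np.linspace(-3, 3, 40).reshape(-1, 1)
y = X.ravel() ** 2 + rng.normal(scale=0.5, size=40)

# Shuffle because X is sorted; otherwise each fold would be an extrapolation
cv = KFold(n_splits=5, shuffle=True, random_state=0)

cv_rmse = {}
for degree in (1, 2, 3, 4):
    model = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression())
    scores = cross_val_score(
        model, X, y, cv=cv, scoring="neg_root_mean_squared_error"
    )
    cv_rmse[degree] = -scores.mean()  # scores are negated RMSE values

best_degree = min(cv_rmse, key=cv_rmse.get)
for degree, err in cv_rmse.items():
    print(f"degree {degree}: CV RMSE = {err:.3f}")
print("lowest CV RMSE at degree", best_degree)
```

On data like this, degree 1 scores clearly worse than the others. When several degrees score about the same, preferring the lowest of them is a reasonable tie-breaker.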

Advantages of Polynomial Regression

  • Captures non-linear relationships
  • Easy to implement
  • More flexible than Linear Regression
  • Works well for moderately complex patterns

Limitations of Polynomial Regression

  • Sensitive to outliers
  • Risk of overfitting
  • Poor extrapolation outside training range
  • Complexity increases rapidly with degree
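The last point is easiest to see with more than one input feature, where a full polynomial expansion also includes cross terms such as the product of two features. The sketch below counts the generated columns for a hypothetical dataset with 10 features; the helper `n_poly_features` is illustrative and uses the standard combinatorial count.

```python
from math import comb

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

def n_poly_features(n_features, degree):
    """Columns (including the constant) in a full polynomial expansion."""
    return comb(n_features + degree, degree)

for degree in (2, 3, 4):
    print(degree, n_poly_features(10, degree))  # 66, 286, 1001

# Cross-check one case against scikit-learn
X = np.zeros((1, 10))
poly = PolynomialFeatures(degree=3).fit(X)
print(poly.n_output_features_)
```

With a single feature the count is only degree + 1, so this growth matters mainly when several features are expanded together.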

Real-World Applications

House Price Prediction

Property prices may increase non-linearly.

Population Growth

Population often follows curved growth patterns.

Sales Forecasting

Advertising and sales relationships are rarely perfectly linear.

Manufacturing

Machine performance may change non-linearly with temperature or pressure.

Polynomial Regression and Feature Engineering

Polynomial Regression automatically performs a form of feature engineering.

Original Feature:

$$x$$

Generated Features:

$$x^2, x^3, x^4$$

This allows the model to discover more complex patterns.

Evaluating Polynomial Regression

Common metrics include:

  • MAE
  • RMSE
  • R² Score
  • Adjusted R²

These help determine whether the chosen degree improves performance.
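For reference, the four metrics can be computed by hand as in the sketch below. The predictions are those of the least-squares straight line (4x - 2) on the Experience/Salary values from earlier; `n_predictors` counts the features excluding the intercept.

```python
import numpy as np

y_true = np.array([3.0, 5.0, 9.0, 15.0])
y_pred = np.array([2.0, 6.0, 10.0, 14.0])  # straight-line fit 4x - 2
n_predictors = 1  # features in the model, excluding the intercept

residuals = y_true - y_pred
mae = np.mean(np.abs(residuals))
rmse = np.sqrt(np.mean(residuals ** 2))

ss_res = np.sum(residuals ** 2)                 # unexplained variation
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variation
r2 = 1 - ss_res / ss_tot

# Adjusted R^2 penalizes extra predictors
n = y_true.size
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - n_predictors - 1)

print(f"MAE={mae:.3f} RMSE={rmse:.3f} R2={r2:.4f} adjusted R2={adj_r2:.4f}")
```

Adjusted R² is useful when comparing degrees because plain R² on the training data never decreases when more polynomial terms are added.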

Example Workflow

1. Collect Data
2. Visualize Relationship
3. Detect Non-Linearity
4. Generate Polynomial Features
5. Train Linear Regression
6. Evaluate Performance
7. Choose Best Degree

Common Mistakes

Using Very High Degrees

Higher degree does not always mean better performance.

Ignoring Overfitting

Always evaluate on unseen test data.

Skipping Visualization

Plots often reveal whether polynomial relationships exist.

Extrapolating Too Far

Polynomial curves can behave unpredictably outside the training range.
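As a small illustration, the sketch below fits a cubic to the Study Hours data from the start of the article and then asks for a prediction at 10 hours, twice the largest value seen in training. The query points are hypothetical.

```python
import numpy as np

hours = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
marks = np.array([20.0, 35.0, 55.0, 70.0, 80.0])

# Cubic least-squares fit on the observed range of 1 to 5 hours
cubic = np.polyfit(hours, marks, deg=3)

inside = float(np.polyval(cubic, 4.5))    # within the training range
outside = float(np.polyval(cubic, 10.0))  # far outside the training range

print(f"prediction at 4.5 hours: {inside:.1f}")
print(f"prediction at 10 hours:  {outside:.1f}")
```

Inside the range the prediction sits between the neighbouring observations. At 10 hours the cubic term dominates and the predicted marks become negative, which is clearly meaningless.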

Best Practices

  • Start with Linear Regression first
  • Visualize the data
  • Increase polynomial degree gradually
  • Use cross validation
  • Monitor overfitting carefully
  • Compare with simpler models

Polynomial Regression vs Linear Regression

Linear Regression            Polynomial Regression
Straight Line                Curved Relationship
Simpler                      More Flexible
Lower Risk of Overfitting    Higher Risk of Overfitting
Easier Interpretation        More Complex Interpretation

Why Polynomial Regression is Important

Many real-world relationships are not perfectly linear. While Linear Regression provides a simple baseline, it often struggles to model curved patterns present in real data.

Polynomial Regression bridges the gap between simple linear models and more advanced machine learning algorithms by introducing controlled non-linearity. It allows models to capture richer relationships while remaining mathematically intuitive and relatively easy to implement.

In the next article, we will study Regularization, a powerful technique used to prevent overfitting and improve the generalization ability of regression models.