In the previous articles, we learned that Linear Regression fits a straight line through data.
This works well when the relationship between features and the target variable is approximately linear.
However, many real-world relationships are not straight lines.
Consider the following examples:
- House prices may increase rapidly in premium locations.
- Product sales may accelerate after a certain advertising budget.
- Student performance may improve quickly initially and then plateau.
- Population growth often follows curved patterns.
In such situations, a straight line is unable to capture the true relationship.
This is where Polynomial Regression becomes useful.
Polynomial Regression extends Linear Regression by introducing polynomial terms that allow the model to fit curves instead of straight lines.
Despite its name, Polynomial Regression is still a type of Linear Regression because the model remains linear in its parameters.
In this article, we will understand Polynomial Regression from first principles, learn how it works, explore its advantages and limitations, and implement it using Python.
Why Linear Regression Sometimes Fails
Consider the dataset:
| Study Hours | Marks |
|---|---|
| 1 | 20 |
| 2 | 35 |
| 3 | 55 |
| 4 | 70 |
| 5 | 80 |
If we plot these points:
[Plot: Marks (vertical axis) versus Study Hours (horizontal axis)]
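The relationship appears curved rather than perfectly linear: marks rise by 15, 20, 15, and then 10 for each additional hour, so the gain per hour is not constant.
A straight line can only approximate this pattern.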
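To make the mismatch concrete, the following minimal sketch fits a least-squares line to the table above using NumPy's `polyfit` (NumPy is an assumption here; the article's own code uses scikit-learn) and prints the residuals.

```python
import numpy as np

# Study-hours data from the table above.
hours = np.array([1, 2, 3, 4, 5], dtype=float)
marks = np.array([20, 35, 55, 70, 80], dtype=float)

# Least-squares straight line: marks ~ slope * hours + intercept.
slope, intercept = np.polyfit(hours, marks, deg=1)
fitted = slope * hours + intercept
residuals = marks - fitted  # actual minus predicted

print(f"slope={slope:.2f}, intercept={intercept:.2f}")
print("residuals:", np.round(residuals, 2))
```

The residuals are negative at both ends and positive in the middle. A systematic pattern like this, rather than random scatter, is a typical sign that the line is missing some curvature.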
Understanding Non-Linear Relationships
A non-linear relationship means:
A change in the input does not produce a constant change in the output.
Example:
| Experience | Salary |
|---|---|
| 1 | 3 |
| 2 | 5 |
| 3 | 9 |
| 4 | 15 |
Salary grows faster as experience increases.
This pattern is difficult to represent using a straight line.
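One quick check for this kind of pattern is to look at successive differences. The sketch below uses NumPy's `diff` on the table above; it assumes the inputs are evenly spaced, as they are here.

```python
import numpy as np

experience = np.array([1, 2, 3, 4])
salary = np.array([3, 5, 9, 15])

first_diff = np.diff(salary)        # change in salary per extra year
second_diff = np.diff(salary, n=2)  # change of that change

print("first differences: ", first_diff)
print("second differences:", second_diff)
```

A straight line would give constant first differences. Here the first differences grow (2, 4, 6) while the second differences are constant (2, 2), which is what a quadratic relationship produces on evenly spaced inputs.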
What is Polynomial Regression?
Polynomial Regression extends Linear Regression by adding higher-order powers of features.
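Instead of:
$$y = \beta_0 + \beta_1 x$$
we use:
$$y = \beta_0 + \beta_1 x + \beta_2 x^2$$
or
$$y = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3$$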
These additional terms allow the model to fit curves.
Polynomial Features
Polynomial Regression creates new features automatically.
Original Feature:
| x |
|---|
| 2 |
Polynomial Features:
| x | x² | x³ |
|---|---|---|
| 2 | 4 | 8 |
The model uses these transformed features to learn non-linear relationships.
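The transformation itself is simple enough to write by hand. The helper below, `polynomial_features`, is a hypothetical name used only for this sketch; it builds the power columns for a single input feature with NumPy.

```python
import numpy as np

def polynomial_features(x, degree):
    """Return columns x, x^2, ..., x^degree for a 1D input (illustrative helper)."""
    x = np.asarray(x, dtype=float)
    return np.column_stack([x ** d for d in range(1, degree + 1)])

features = polynomial_features([2, 3], degree=3)
print(features)
```

The first row reproduces the table above (2, 4, 8); the second row shows the same expansion for x = 3 (3, 9, 27).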
Understanding Polynomial Degree
The degree determines the highest power used.
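Degree 1
Equation:
$$y = \beta_0 + \beta_1 x$$
Equivalent to Linear Regression.
Degree 2 (Quadratic)
Equation:
$$y = \beta_0 + \beta_1 x + \beta_2 x^2$$
Produces a curved relationship.
Degree 3 (Cubic)
Equation:
$$y = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3$$
Can model more complex curves.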
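As a rough illustration of how the degree changes the fit, the sketch below fits degrees 1 to 3 to the study-hours data from earlier in the article and reports the training error. It uses NumPy's `polyfit` and `polyval` as a convenient stand-in for the scikit-learn workflow.

```python
import numpy as np

hours = np.array([1, 2, 3, 4, 5], dtype=float)
marks = np.array([20, 35, 55, 70, 80], dtype=float)

train_rmse = {}
for degree in (1, 2, 3):
    coeffs = np.polyfit(hours, marks, deg=degree)  # highest power first
    fitted = np.polyval(coeffs, hours)
    train_rmse[degree] = float(np.sqrt(np.mean((marks - fitted) ** 2)))

for degree, rmse in train_rmse.items():
    print(f"degree {degree}: training RMSE = {rmse:.3f}")
```

Training error can only go down as the degree rises, because each higher-degree model contains the lower-degree ones as special cases. That alone does not show that a higher degree generalizes better.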
Visualizing Different Degrees
Degree 1:
[Plot: straight line]
Straight line.
Degree 2:
[Plot: parabolic curve]
Parabolic curve.
Degree 3:
[Plot: curve with more bends]
More flexible curve.
Why Polynomial Regression Works
Suppose the true relationship is quadratic, for example:
$$y = x^2$$
A straight line cannot represent this.
Polynomial Regression introduces the squared term $x^2$ as an additional feature,
allowing the model to capture the pattern accurately.
Important Clarification
Many beginners think Polynomial Regression is a non-linear model.
Technically:
The relationship between input and output becomes non-linear.
However:
The model remains linear in the coefficients.
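Example:
$$y = \beta_0 + \beta_1 x + \beta_2 x^2$$
The coefficients $\beta_0$, $\beta_1$, and $\beta_2$ each multiply a fixed quantity ($1$, $x$, or $x^2$) and are never squared or multiplied together, so they
still appear linearly.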
Therefore, it belongs to the Linear Regression family.
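A practical consequence is that the same least-squares solver used for a straight line can fit the curve. The sketch below builds the columns [1, x, x²] by hand for the experience and salary table from earlier and solves for the coefficients with NumPy's `lstsq`; the variable names are illustrative.

```python
import numpy as np

experience = np.array([1, 2, 3, 4], dtype=float)
salary = np.array([3, 5, 9, 15], dtype=float)

# Design matrix with columns [1, x, x^2]. It is non-linear in x, but every
# entry is just a fixed number once x is known.
design = np.column_stack([np.ones_like(experience), experience, experience ** 2])

# The prediction is design @ beta, a linear function of beta, so ordinary
# least squares applies unchanged.
beta, *_ = np.linalg.lstsq(design, salary, rcond=None)
print(np.round(beta, 6))
```

For this table the solution is approximately beta = [3, -1, 1], that is, salary = 3 - x + x², which reproduces all four rows exactly.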
Example Dataset
Suppose:
| Area | Price |
|---|---|
| 1000 | 50 |
| 1500 | 70 |
| 2000 | 110 |
| 2500 | 170 |
Price increases faster as area grows.
A straight line may underestimate larger houses.
Polynomial Regression can capture this accelerating growth.
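The claim can be checked directly on the four rows above. This sketch compares a degree-1 and a degree-2 fit using NumPy; dividing the area by 1000 is an assumption made here purely to keep the squared term numerically well behaved, and the source does not state units for either column.

```python
import numpy as np

area = np.array([1000, 1500, 2000, 2500], dtype=float)
price = np.array([50, 70, 110, 170], dtype=float)

# Rescale area to thousands so the squared term stays well conditioned.
area_k = area / 1000.0

line = np.polyfit(area_k, price, deg=1)
curve = np.polyfit(area_k, price, deg=2)

line_pred = np.polyval(line, area_k)
curve_pred = np.polyval(curve, area_k)

print("actual:   ", price)
print("linear:   ", np.round(line_pred, 1))
print("quadratic:", np.round(curve_pred, 1))
```

The straight line predicts about 160 for the largest house against an actual price of 170, while the quadratic fit matches all four points.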
Creating Polynomial Features
Python:
from sklearn.preprocessing import PolynomialFeatures
# X must be a 2D array of shape (n_samples, n_features)
poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(X)
What Happens Internally?
Original Feature:
X = [2]
Transformed Features:
[1, 2, 4]
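Representing $[1, x, x^2]$. The leading 1 is the bias (intercept) column, which PolynomialFeatures includes by default.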
Training Polynomial Regression
Polynomial Regression still uses Linear Regression.
from sklearn.linear_model import LinearRegression
# Ordinary least squares on the expanded features
model = LinearRegression()
model.fit(X_poly, y)
Making Predictions
# For new inputs, apply poly.transform(X_new) before calling predict
predictions = model.predict(X_poly)
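When predicting for new inputs, the same fitted transformation has to be applied first. One way to avoid forgetting that step is to chain the two stages in a scikit-learn pipeline. The sketch below reuses the area and price example, with area expressed in thousands (an assumption made for this illustration) and `X_new` as a hypothetical unseen input.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

X_train = np.array([[1.0], [1.5], [2.0], [2.5]])  # area in thousands
y_train = np.array([50.0, 70.0, 110.0, 170.0])

# The pipeline expands features and fits the regression in one object.
model = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),
    LinearRegression(),
)
model.fit(X_train, y_train)

# predict() applies the polynomial expansion to new data automatically.
X_new = np.array([[1.75]])
print(model.predict(X_new))
```

The prediction for 1.75 falls between the prices of the neighbouring training points (70 and 110), as expected for an interpolated value.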
Visualizing Polynomial Regression
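import matplotlib.pyplot as plt
# Scatter the raw observations, then overlay the fitted curve.
# plt.plot joins points in the order given, so X should be sorted.
plt.scatter(X, y)
plt.plot(X, model.predict(X_poly))
plt.show()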
The resulting curve often fits the data much better than a straight line.
Comparing Linear and Polynomial Regression
Suppose the true relationship is curved:
[Plot: data points following a curve]
Linear Regression:
[Plot: straight line through the same points]
Poor fit.
Polynomial Regression:
[Plot: curve following the points]
Better fit.
Feature Expansion Example
Degree 2:
Original: $x$
Transformed: $1, x, x^2$
Degree 3: $1, x, x^2, x^3$
Degree 4: $1, x, x^2, x^3, x^4$
More flexibility is added as degree increases.
The Danger of High Degrees
Higher-degree polynomials can fit training data extremely well.
For example, a degree-15 polynomial may pass through every training point.
However, this often leads to overfitting: the model fits the noise in the training data and generalizes poorly to new data.
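The effect is easy to reproduce on a small scale. The sketch below is an illustration under stated assumptions, not a benchmark: the data are synthetic (y = x² plus a deterministic ±0.1 wobble standing in for noise), and it uses degree 9 on ten points instead of degree 15 so that the high-degree fit interpolates the training data exactly while staying numerically stable. `Polynomial.fit` from NumPy is used because it rescales the inputs internally.

```python
import numpy as np
from numpy.polynomial import Polynomial

def true_curve(x):
    return x ** 2

x_train = np.linspace(0.0, 1.0, 10)
# Deterministic +/-0.1 wobble standing in for measurement noise.
noise = 0.1 * (-1.0) ** np.arange(10)
y_train = true_curve(x_train) + noise

# Held-out points halfway between the training inputs, without noise.
x_test = (x_train[:-1] + x_train[1:]) / 2
y_test = true_curve(x_test)

def rmse(model, x, y):
    return float(np.sqrt(np.mean((model(x) - y) ** 2)))

results = {}
for degree in (2, 9):
    model = Polynomial.fit(x_train, y_train, deg=degree)
    results[degree] = (rmse(model, x_train, y_train), rmse(model, x_test, y_test))
    print(f"degree {degree}: train RMSE={results[degree][0]:.4f}, "
          f"test RMSE={results[degree][1]:.4f}")
```

The degree-9 model has essentially zero training error because it passes through every training point, yet its error on the held-out points is far larger than that of the degree-2 model.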
Underfitting vs Good Fit vs Overfitting
Low Degree:
[Plot: straight line]
Underfitting.
Optimal Degree:
[Plot: smooth curve]
Good fit.
Very High Degree:
[Plot: curve oscillating sharply]
Overfitting.
Choosing the Right Degree
There is no universal answer.
Common practice is to try low degrees first:
- Degree 2
- Degree 3
- Degree 4
Evaluate performance using:
- Cross Validation
- RMSE
- R² Score
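A minimal sketch of this selection procedure follows, assuming scikit-learn 0.22 or later (for the `neg_root_mean_squared_error` scorer). The dataset is synthetic and generated only for illustration; the degrees tried and the fold settings are likewise assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data (an assumption for illustration): quadratic trend plus noise.
rng = np.random.default_rng(0)
X = np.linspace(-3, 3, 40).reshape(-1, 1)
y = 0.5 * X.ravel() ** 2 + X.ravel() + 2 + rng.normal(0, 0.3, size=40)

# Shuffle the folds because X is sorted; otherwise each fold tests extrapolation.
cv = KFold(n_splits=5, shuffle=True, random_state=0)

cv_rmse = {}
for degree in (1, 2, 3, 4):
    model = make_pipeline(
        PolynomialFeatures(degree=degree, include_bias=False),
        LinearRegression(),
    )
    scores = cross_val_score(model, X, y, cv=cv,
                             scoring="neg_root_mean_squared_error")
    cv_rmse[degree] = float(-scores.mean())  # scorer returns negated RMSE

best_degree = min(cv_rmse, key=cv_rmse.get)
for degree, score in cv_rmse.items():
    print(f"degree {degree}: CV RMSE = {score:.3f}")
print("lowest CV RMSE at degree", best_degree)
```

When several degrees score almost the same, preferring the lowest of them is a reasonable tie-breaker, since the simpler model is less likely to overfit.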
Advantages of Polynomial Regression
- Captures non-linear relationships
- Easy to implement
- More flexible than Linear Regression
- Works well for moderately complex patterns
Limitations of Polynomial Regression
- Sensitive to outliers
- Risk of overfitting
- Poor extrapolation outside training range
- Complexity increases rapidly with degree
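The last point becomes clearer with more than one input feature, where a full polynomial expansion also includes interaction terms such as x₁x₂. The number of terms of total degree at most d in n features, counting the constant term, is the binomial coefficient C(n + d, d). The helper name `n_polynomial_terms` below is hypothetical and used only for this sketch.

```python
from math import comb

def n_polynomial_terms(n_features, degree):
    """Number of monomials of total degree <= degree, including the constant."""
    return comb(n_features + degree, degree)

for n_features in (1, 5, 10):
    counts = [n_polynomial_terms(n_features, d) for d in (1, 2, 3, 4)]
    print(f"{n_features} feature(s), degrees 1-4: {counts}")
```

With a single feature the count grows by one per degree, but with 10 features a degree-4 expansion already has 1001 columns.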
Real-World Applications
House Price Prediction
Property prices may increase non-linearly.
Population Growth
Population often follows curved growth patterns.
Sales Forecasting
Advertising and sales relationships are rarely perfectly linear.
Manufacturing
Machine performance may change non-linearly with temperature or pressure.
Polynomial Regression and Feature Engineering
Polynomial Regression automatically performs a form of feature engineering.
Original Feature: $x$
Generated Features: $x^2, x^3, \dots$
This allows the model to discover more complex patterns.
Evaluating Polynomial Regression
Common metrics include:
- MAE
- RMSE
- R² Score
- Adjusted R²
These help determine whether the chosen degree improves performance.
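The four metrics can be computed in a few lines. The sketch below uses a hypothetical helper, `regression_metrics`, applied to degree-1 and degree-2 fits of the study-hours data from earlier in the article. Adjusted R² is computed as 1 - (1 - R²)(n - 1)/(n - p - 1), where p is the number of predictors excluding the intercept. These are training-set values on five points, so they only illustrate the formulas.

```python
import numpy as np

def regression_metrics(y_true, y_pred, n_predictors):
    """MAE, RMSE, R^2 and adjusted R^2 (illustrative helper)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    n = y_true.size
    errors = y_true - y_pred
    ss_res = float(np.sum(errors ** 2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    r2 = 1.0 - ss_res / ss_tot
    # Penalize R^2 for the number of predictors used.
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)
    return {
        "MAE": float(np.mean(np.abs(errors))),
        "RMSE": float(np.sqrt(ss_res / n)),
        "R2": r2,
        "Adjusted R2": adj_r2,
    }

hours = np.array([1, 2, 3, 4, 5], dtype=float)
marks = np.array([20, 35, 55, 70, 80], dtype=float)

metrics = {}
for degree in (1, 2):
    fitted = np.polyval(np.polyfit(hours, marks, deg=degree), hours)
    metrics[degree] = regression_metrics(marks, fitted, n_predictors=degree)
    print(degree, {k: round(v, 4) for k, v in metrics[degree].items()})
```

Plain R² never decreases when a term is added, which is why Adjusted R² is listed separately: it rises only if the extra term improves the fit by more than the penalty for the added predictor.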
Example Workflow
Collect Data
↓
Visualize Relationship
↓
Detect Non-Linearity
↓
Generate Polynomial Features
↓
Train Linear Regression
↓
Evaluate Performance
↓
Choose Best Degree
Common Mistakes
Using Very High Degrees
Higher degree does not always mean better performance.
Ignoring Overfitting
Always evaluate on unseen test data.
Skipping Visualization
Plots often reveal whether polynomial relationships exist.
Extrapolating Too Far
Polynomial curves can behave unpredictably outside the training range.
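A small experiment with the study-hours data from earlier shows how this can go wrong. The sketch fits a quadratic with NumPy and then asks for a prediction far outside the observed range of 1 to 5 hours; the value of 20 hours is chosen arbitrarily for illustration.

```python
import numpy as np

hours = np.array([1, 2, 3, 4, 5], dtype=float)
marks = np.array([20, 35, 55, 70, 80], dtype=float)

quadratic = np.polyfit(hours, marks, deg=2)

inside = np.polyval(quadratic, 5.0)    # edge of the training range
outside = np.polyval(quadratic, 20.0)  # far beyond the training range

print(f"predicted marks at 5 hours:  {inside:.1f}")
print(f"predicted marks at 20 hours: {outside:.1f}")
```

The fitted parabola opens downward, so beyond its peak it predicts that more study lowers marks: about 8 marks at 20 hours compared with about 81 at 5 hours. Nothing in the data supports that conclusion; it is purely an artifact of the curve's shape outside the training range.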
Best Practices
- Start with Linear Regression first
- Visualize the data
- Increase polynomial degree gradually
- Use cross validation
- Monitor overfitting carefully
- Compare with simpler models
Polynomial Regression vs Linear Regression
| Linear Regression | Polynomial Regression |
|---|---|
| Straight Line | Curved Relationship |
| Simpler | More Flexible |
| Lower Risk of Overfitting | Higher Risk of Overfitting |
| Easier Interpretation | More Complex Interpretation |
Why Polynomial Regression is Important
Many real-world relationships are not perfectly linear. While Linear Regression provides a simple baseline, it often struggles to model curved patterns present in real data.
Polynomial Regression bridges the gap between simple linear models and more advanced machine learning algorithms by introducing controlled non-linearity. It allows models to capture richer relationships while remaining mathematically intuitive and relatively easy to implement.
In the next article, we will study Regularization, a powerful technique used to prevent overfitting and improve the generalization ability of regression models.