What Is Correlation in Pandas?
Correlation measures the strength and direction of the relationship between two numerical variables, that is, how one variable tends to change when the other changes. In Pandas, correlation helps identify patterns, trends, and dependencies in data, which is useful for data analysis and feature selection.
Correlation values range from -1 to +1:
- +1 → Perfect positive correlation
- 0 → No correlation
- -1 → Perfect negative correlation
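To make these three reference points concrete, here is a minimal sketch using small made-up series (the values are illustrative only and are not taken from any real dataset):

```python
import pandas as pd

x = pd.Series([1, 2, 3, 4, 5])

rising = 2 * x                          # moves exactly in step with x
falling = -x                            # moves exactly opposite to x
symmetric = pd.Series([1, 2, 3, 2, 1])  # rises, then falls: no linear trend

r_pos = x.corr(rising)      # approximately +1
r_neg = x.corr(falling)     # approximately -1
r_zero = x.corr(symmetric)  # approximately 0

print(r_pos, r_neg, r_zero)
```

Note that the last series clearly depends on x, yet its coefficient is close to zero: the default coefficient only captures linear association.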
Why Correlation Analysis Is Important
Correlation analysis helps to:
- Understand relationships between variables
- Detect multicollinearity in datasets
- Select important features for machine learning
- Identify trends and patterns in data
Load Sample Data

    import pandas as pd

    data = {
        "Height": [150, 160, 170, 180, 190],
        "Weight": [50, 60, 70, 80, 90],
        "Age": [20, 25, 30, 35, 40]
    }
    df = pd.DataFrame(data)
    print(df)

1. Calculate Correlation Using corr()
The corr() method computes pairwise correlation between numerical columns.

    print(df.corr())
    # Output:
    #         Height  Weight  Age
    # Height     1.0     1.0  1.0
    # Weight     1.0     1.0  1.0
    # Age        1.0     1.0  1.0

By default, Pandas uses Pearson correlation.
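To show what the Pearson coefficient actually computes, the following sketch reproduces it by hand with NumPy and compares it with the Pandas result. It rebuilds two columns of the sample data so that it runs on its own; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Height": [150, 160, 170, 180, 190],
    "Weight": [50, 60, 70, 80, 90],
})

x = df["Height"].to_numpy(dtype=float)
y = df["Weight"].to_numpy(dtype=float)

# Deviations of each value from its column mean.
dx = x - x.mean()
dy = y - y.mean()

# Pearson r: sum of products of deviations, divided by the square root
# of the product of the two sums of squared deviations.
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

r_pandas = df["Height"].corr(df["Weight"])
print(r_manual, r_pandas)
```

Both values should agree up to floating-point rounding.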
2. Correlation Between Two Columns

    print(df["Height"].corr(df["Weight"]))
    # Output:
    # 1.0

A value of 1.0 indicates a perfect positive linear relationship between height and weight in this sample.
3. Types of Correlation Methods in Pandas
Pandas supports multiple correlation methods using the method parameter.
a. Pearson Correlation (Default)
Measures the linear relationship between variables.

    print(df.corr(method="pearson"))

b. Spearman Correlation
Measures monotonic relationships and works well with ranked data.
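A minimal sketch of the difference between Pearson and Spearman, using hypothetical data in which y always increases with x but along a curve rather than a straight line:

```python
import pandas as pd

# Hypothetical data: y grows with x, but not linearly.
curve = pd.DataFrame({"x": [1, 2, 3, 4, 5]})
curve["y"] = curve["x"] ** 3

pearson_r = curve.corr(method="pearson").loc["x", "y"]
spearman_r = curve.corr(method="spearman").loc["x", "y"]

print(pearson_r)   # below 1, because the relationship is not linear
print(spearman_r)  # approximately 1, because the rank orders agree perfectly
```

Spearman works on ranks, so any strictly increasing relationship scores as a perfect correlation.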
c. Kendall Correlation
Measures ordinal association between variables.
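A small sketch of Kendall correlation on hypothetical rankings from two judges. Of the 10 possible item pairs, 8 are ordered the same way by both judges and 2 are ordered differently, so the coefficient should come out at (8 - 2) / 10 = 0.6. The `try`/`except` is a precaution based on the assumption that some Pandas versions rely on SciPy for this method.

```python
import pandas as pd

# Hypothetical rankings of five items by two judges.
ranks = pd.DataFrame({
    "judge_a": [1, 2, 3, 4, 5],
    "judge_b": [1, 3, 2, 5, 4],
})

try:
    tau = ranks.corr(method="kendall").loc["judge_a", "judge_b"]
except ImportError:
    # Assumption: Kendall needs SciPy in this environment and it is missing.
    tau = None

print(tau)
```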
4. Handling Missing Values
Pandas excludes missing values (NaN) when calculating correlation: each pair of columns is compared using only the rows where both values are present.
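The sketch below illustrates this behaviour on a separate copy of the sample data (named `df_nan` here, a hypothetical name) and contrasts it with two alternatives: dropping incomplete rows first, and requiring a minimum number of valid observations through the `min_periods` parameter of `corr()`.

```python
import numpy as np
import pandas as pd

# Same sample values as the sample data, with one Weight entry missing.
df_nan = pd.DataFrame({
    "Height": [150, 160, 170, 180, 190],
    "Weight": [50, 60, np.nan, 80, 90],
    "Age": [20, 25, 30, 35, 40],
})

# Default: each pair of columns uses only the rows where both values exist.
pairwise = df_nan.corr()

# Alternative: drop every row containing a NaN before computing.
listwise = df_nan.dropna().corr()

# min_periods sets the minimum number of valid pairs; fewer yields NaN.
strict = df_nan.corr(min_periods=5)

print(pairwise)
print(strict)
```

With `min_periods=5`, the Height/Weight entry becomes NaN because only four complete pairs are available, while Height/Age is still computed from all five rows.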

    df.loc[2, "Weight"] = None  # introduce a missing value
    print(df.corr())
    # Output:
    # Correlation calculated ignoring NaN

5. Interpreting Correlation Values
| Correlation Value | Meaning |
|---|---|
| 0.7 to 1.0 | Strong positive correlation |
| 0.3 to 0.7 | Moderate positive correlation |
| -0.3 to 0.3 | Weak or no correlation |
| -0.7 to -0.3 | Moderate negative correlation |
| -1.0 to -0.7 | Strong negative correlation |
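These thresholds can be turned into a small helper. The function below is a hypothetical illustration, not part of Pandas; it assumes that anything between -0.3 and 0.3 counts as weak and that a boundary value (0.3 or 0.7 in absolute terms) takes the stronger label.

```python
def describe_correlation(r: float) -> str:
    """Map a coefficient to the rough labels used in the table above."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("correlation must lie between -1 and 1")
    strength = abs(r)
    if strength < 0.3:
        return "weak correlation"
    direction = "positive" if r > 0 else "negative"
    level = "strong" if strength >= 0.7 else "moderate"
    return f"{level} {direction} correlation"


print(describe_correlation(0.85))  # strong positive correlation
print(describe_correlation(-0.5))  # moderate negative correlation
print(describe_correlation(0.1))   # weak correlation
```

The cut-offs are rules of thumb; what counts as a strong correlation varies by field.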
6. Correlation on Selected Columns
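A minimal sketch of restricting the calculation to chosen columns, assuming the sample data from earlier (rebuilt here so the snippet runs on its own):

```python
import pandas as pd

df = pd.DataFrame({
    "Height": [150, 160, 170, 180, 190],
    "Weight": [50, 60, 70, 80, 90],
    "Age": [20, 25, 30, 35, 40],
})

# Correlation matrix for a chosen subset of columns.
subset_corr = df[["Height", "Weight"]].corr()
print(subset_corr)

# Correlation of every column with a single column of interest.
with_age = df.corr()["Age"]
print(with_age)
```

Selecting columns before calling corr() keeps the output small and avoids computing pairs that are not needed.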
7. Correlation Use Cases
- Feature selection in machine learning
- Detecting redundant variables
- Financial data analysis
- Scientific and statistical research
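As one illustration of detecting redundant variables, the sketch below lists column pairs whose absolute correlation meets a threshold. The helper name `highly_correlated_pairs`, the 0.9 default, and the feature data are all assumptions made for this example.

```python
import pandas as pd


def highly_correlated_pairs(frame: pd.DataFrame, threshold: float = 0.9):
    """Return (col_a, col_b, r) for column pairs whose |r| meets the threshold."""
    corr = frame.corr()
    cols = corr.columns
    pairs = []
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):  # upper triangle only, skip the diagonal
            r = corr.iloc[i, j]
            if abs(r) >= threshold:
                pairs.append((cols[i], cols[j], float(r)))
    return pairs


# Hypothetical features: size_cm is size_m in other units, so it is redundant.
features = pd.DataFrame({
    "size_m": [1.0, 1.5, 2.0, 2.5, 3.0],
    "size_cm": [100.0, 150.0, 200.0, 250.0, 300.0],
    "noise": [3.0, 1.0, 2.0, 1.0, 3.0],
})

redundant = highly_correlated_pairs(features)
print(redundant)
```

Only the size_m/size_cm pair is reported, which suggests that one of those two columns could be dropped before modelling.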
Best Practices
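- Correlation does not imply causation
- Check for outliers before correlation analysis, since a few extreme values can distort Pearson correlation
- Use Spearman or Kendall when the relationship is monotonic but not linear, or when the data is ranked
- Visualize correlation when possible, for example with a scatter plot or heatmap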