What Is Correlation in Pandas?

Correlation measures the strength and direction of the relationship between two numerical variables: how one variable tends to change as the other changes. In Pandas, correlation helps identify patterns, trends, and dependencies in data, which is useful for data analysis and feature selection.

Correlation values range from -1 to +1:

  • +1 → Perfect positive correlation

  • 0 → No linear correlation

  • -1 → Perfect negative correlation
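To make the three reference points concrete, here is a minimal sketch using a small made-up Series (the values are illustrative assumptions, not real measurements). The last case is worth noting: a coefficient near 0 only rules out a linear relationship, not every kind of dependence.

```python
import pandas as pd

x = pd.Series([1, 2, 3, 4, 5])

# y rises by a fixed amount whenever x rises: perfect positive correlation
positive = x.corr(2 * x + 1)

# y falls by a fixed amount whenever x rises: perfect negative correlation
negative = x.corr(10 - 3 * x)

# y depends on x, but symmetrically around x = 3, so the linear
# (Pearson) correlation cancels out to roughly zero
symmetric = x.corr((x - 3) ** 2)

print(round(positive, 6), round(negative, 6), round(symmetric, 6))
```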

Why Correlation Analysis Is Important

Correlation analysis helps to:

  • Understand relationships between variables

  • Detect multicollinearity in datasets

  • Select important features for machine learning

  • Identify trends and patterns in data
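As an illustration of the multicollinearity point, the following sketch lists column pairs whose absolute correlation exceeds a cutoff. The helper name highly_correlated_pairs, the 0.9 threshold, and the sample columns a, b, c are assumptions made for this example, not part of Pandas.

```python
import numpy as np
import pandas as pd

def highly_correlated_pairs(frame, threshold=0.9):
    """Return column pairs whose absolute correlation exceeds threshold."""
    corr = frame.corr().abs()
    # Keep only the upper triangle so each pair appears once
    # and the diagonal (a column with itself) is skipped
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(mask).stack()
    return pairs[pairs > threshold]

features = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 10],   # exact multiple of "a", so redundant
    "c": [5, 1, 4, 2, 3],
})

redundant = highly_correlated_pairs(features)
print(redundant)
```

A pair that shows up here is a candidate for dropping one of its two columns; the right threshold depends on the dataset and the model.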

Load Sample Data

Python
import pandas as pd

data = {
    "Height": [150, 160, 170, 180, 190],
    "Weight": [50, 60, 70, 80, 90],
    "Age": [20, 25, 30, 35, 40],
}
df = pd.DataFrame(data)
print(df)

1. Calculate Correlation Using corr()

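The corr() method computes the pairwise correlation between numerical columns and returns the result as a correlation matrix (a DataFrame). If the DataFrame also contains non-numeric columns, recent Pandas versions (2.0 and later) expect numeric_only=True or an explicit column selection.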

Python
print(df.corr())

# Output:
#         Height  Weight  Age
# Height     1.0     1.0  1.0
# Weight     1.0     1.0  1.0
# Age        1.0     1.0  1.0

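By default, Pandas uses Pearson correlation. Every value is 1.0 here because each sample column is an exact linear function of the others; real data rarely looks like this.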
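To show what the Pearson coefficient actually computes, here is a minimal sketch that works it out by hand and compares it with Series.corr(). The age values are made up for this example so that the result is not exactly 1.0.

```python
import numpy as np
import pandas as pd

height = pd.Series([150, 160, 170, 180, 190], dtype=float)
age = pd.Series([20, 27, 29, 36, 38], dtype=float)  # illustrative values

# Deviations of each observation from its column mean
dh = height - height.mean()
da = age - age.mean()

# Pearson r = sum of co-deviations / sqrt(product of summed squared deviations)
manual_r = (dh * da).sum() / np.sqrt((dh ** 2).sum() * (da ** 2).sum())

pandas_r = height.corr(age)
print(manual_r, pandas_r)
```

The two numbers should agree up to floating-point rounding.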

2. Correlation Between Two Columns

Python
print(df["Height"].corr(df["Weight"]))

# Output:
# 1.0

A value of 1.0 indicates a perfect positive linear relationship between height and weight in this sample data.

3. Types of Correlation Methods in Pandas

Pandas supports multiple correlation methods using the method parameter.

a. Pearson Correlation (Default)

Measures the strength of the linear relationship between two variables.

Python
print(df.corr(method="pearson"))

b. Spearman Correlation

Measures monotonic relationships and works well with ranked data.

Python
print(df.corr(method="spearman"))

c. Kendall Correlation

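Measures ordinal association between variables, based on how many pairs of observations are ordered the same way in both columns. In Pandas, this method relies on SciPy being installed.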

Python
print(df.corr(method="kendall"))
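The difference between the methods is easiest to see on data that is monotonic but not linear. The sketch below uses a made-up cubic relationship (an assumption for illustration) and also checks that Spearman behaves like Pearson applied to ranks. Kendall is left out because it needs SciPy, which this example does not assume is available.

```python
import pandas as pd

df_curve = pd.DataFrame({"x": range(1, 11)})
df_curve["y"] = df_curve["x"] ** 3  # always increasing, but far from a straight line

pearson = df_curve.corr(method="pearson").loc["x", "y"]
spearman = df_curve.corr(method="spearman").loc["x", "y"]

# Spearman is Pearson computed on the ranks of the values
pearson_on_ranks = df_curve.rank().corr(method="pearson").loc["x", "y"]

print(pearson, spearman, pearson_on_ranks)
```

Pearson comes out below 1 because the points do not lie on a line, while Spearman reports a perfect monotonic relationship.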

4. Handling Missing Values

Pandas excludes missing values (NaN) pairwise when calculating correlation: for each pair of columns, only the rows where both values are present are used.

Python
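# Cast to float first so the column can hold NaN without implicit upcasting
df["Weight"] = df["Weight"].astype(float)
df.loc[2, "Weight"] = None

print(df.corr())

# Output:
# Correlation matrix computed from the rows where both columns are present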
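Because NaN handling is pairwise, different cells of the matrix can be based on different numbers of rows. The following sketch, using made-up values, counts the usable rows per pair and shows how the min_periods parameter of corr() turns under-supported pairs into NaN.

```python
import numpy as np
import pandas as pd

df_gaps = pd.DataFrame({
    "Height": [150.0, 160.0, 170.0, 180.0, 190.0],
    "Weight": [52.0, 58.0, np.nan, 83.0, 88.0],
    "Age": [20.0, 27.0, 29.0, 36.0, 38.0],
})

# Number of rows where both columns of a pair are present
present = df_gaps.notna().astype(int)
pair_counts = present.T @ present

# Default: each pair uses whatever complete rows it has
default_corr = df_gaps.corr()

# min_periods=5: pairs with fewer than 5 complete rows come back as NaN
strict_corr = df_gaps.corr(min_periods=5)

print(pair_counts)
print(strict_corr)
```

Checking the pair counts is a cheap way to spot coefficients that rest on very few observations.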

5. Interpreting Correlation Values

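Correlation Value    Meaning
0.7 to 1.0           Strong positive correlation
0.3 to 0.7           Moderate positive correlation
-0.3 to 0.3          Weak or no correlation
-0.7 to -0.3         Moderate negative correlation
-1.0 to -0.7         Strong negative correlation

These ranges are common rules of thumb, not strict cutoffs; what counts as "strong" depends on the field and the data.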
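The table above can be turned into a small helper for labelling results. This is a sketch only: the function name describe_strength is hypothetical, and treating every value with an absolute size below 0.3 as "weak" is an assumption of this example.

```python
def describe_strength(r):
    """Map a correlation coefficient to a rough rule-of-thumb label."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("correlation must be between -1 and 1")
    size = abs(r)
    if size >= 0.7:
        strength = "strong"
    elif size >= 0.3:
        strength = "moderate"
    else:
        # Direction is not very meaningful for values this close to zero
        return "weak"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction}"

for value in (0.85, -0.5, 0.1):
    print(value, "->", describe_strength(value))
```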

6. Correlation on Selected Columns

Python
print(df[["Height", "Weight"]].corr())

# Output:
# 2x2 correlation matrix for the Height and Weight columns
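Column selection matters most when a DataFrame mixes text and numbers. The sketch below, with a made-up people table, shows two ways to restrict the calculation: naming the columns, or passing numeric_only=True (assuming a reasonably recent Pandas version that supports this parameter).

```python
import pandas as pd

people = pd.DataFrame({
    "Name": ["Ana", "Ben", "Cy", "Dee"],   # text column, not usable for correlation
    "Height": [150, 165, 172, 181],
    "Weight": [51, 64, 70, 85],
})

# Option 1: pick the columns explicitly
by_name = people[["Height", "Weight"]].corr()

# Option 2: let Pandas keep only the numeric columns
by_dtype = people.corr(numeric_only=True)

print(by_dtype)
```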

7. Correlation Use Cases

  • Feature selection in machine learning

  • Detecting redundant variables

  • Financial data analysis

  • Scientific and statistical research
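For the feature-selection use case, one simple approach is to rank features by the absolute value of their correlation with the target. The column names f1, f2, f3, and target and all values below are made up for this sketch; it only captures linear association and is not a complete feature-selection method.

```python
import pandas as pd

samples = pd.DataFrame({
    "f1": [2, 4, 6, 8, 10, 12],
    "f2": [1, 3, 2, 5, 4, 6],
    "f3": [3, 1, 4, 1, 5, 2],
    "target": [1, 2, 3, 4, 5, 6],
})

# Correlation of every feature with the target column
target_corr = samples.corr()["target"].drop("target")

# Rank by absolute value: strong negative correlations are informative too
ranking = target_corr.abs().sort_values(ascending=False)
print(ranking)
```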

Best Practices

  • Correlation does not imply causation

  • Check for outliers before correlation analysis; Pearson correlation is especially sensitive to them

  • Use Spearman or Kendall when the relationship is monotonic but not linear, or when the data is ordinal

  • Visualize correlations when possible, for example with a scatter plot or a heatmap
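The outlier advice can be illustrated with a small made-up dataset in which a single extreme point dominates the Pearson coefficient. This is a sketch under assumed values; on real data the effect may be milder.

```python
import pandas as pd

points = pd.DataFrame({
    "x": [1, 2, 3, 4, 5, 50],
    "y": [3, 1, 2, 3, 1, 50],   # the last row is an extreme outlier
})

pearson_all = points.corr(method="pearson").loc["x", "y"]
spearman_all = points.corr(method="spearman").loc["x", "y"]

# Same calculation with the outlier row left out
pearson_trimmed = points.iloc[:-1].corr(method="pearson").loc["x", "y"]

print(pearson_all, spearman_all, pearson_trimmed)
```

With the outlier included, Pearson is close to 1 even though the other five points show no positive trend; the rank-based Spearman value is much less affected. A scatter plot of x against y would reveal the same thing at a glance.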