Pandas Analyzing Data

Last updated: Jan 25, 2026

Author: Lingeshk K V

What Does Analyzing Data Mean in Pandas?

Analyzing data in Pandas means exploring, inspecting, and understanding a dataset before deeper processing or modeling. This step helps you answer questions like:

What does the data contain?
Are there missing or duplicate values?
What are the data types?
What patterns or distributions exist?

Pandas provides simple yet powerful functions to perform Exploratory Data Analysis (EDA).

Load Sample Data

Python

import pandas as pd

df = pd.read_csv("data.csv")
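If no data.csv is at hand, a small in-memory table can stand in for it. The sketch below is only an assumption about what such a file might hold: the columns Age, Salary, and City match the ones used later in this tutorial, while the Name column and every value are invented for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV content; names and values are invented for illustration.
csv_text = """Name,Age,Salary,City
Asha,28,52000,Chennai
Ravi,35,61000,Mumbai
Meena,42,75000,Chennai
John,,48000,Delhi
Ravi,35,61000,Mumbai
"""

# read_csv accepts a file path or any file-like object such as StringIO.
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)   # (rows, columns)
print(df.dtypes)  # one data type per column
```

The sample deliberately includes one missing Age and one repeated row so that the missing-value and duplicate checks have something to find.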

1. View the Data

First and Last Rows

Python

print(df.head())
# Output:
# Shows the first 5 rows

Python

print(df.tail())
# Output:
# Shows the last 5 rows

Random Sample

Python

print(df.sample(3))
# Output:
# Displays 3 random rows

2. Understand Dataset Structure

Dataset Information

Python

df.info()
# Output:
# Column names, non-null counts, and data types
# (info() prints its report directly and returns None, so it does not need print())

This helps identify:

Number of rows and columns
Data types
Missing values
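Since info() only prints its report, it can help to collect the same facts as ordinary values that can be filtered or saved. The following is a minimal sketch on a made-up DataFrame; the column values and the `summary` name are assumptions for illustration.

```python
import pandas as pd

# Hypothetical data; None marks a missing value.
df = pd.DataFrame({
    "Age": [28, 35, None, 42],
    "City": ["Chennai", "Mumbai", "Delhi", None],
})

# Build a per-column overview similar to what info() prints.
summary = pd.DataFrame({
    "dtype": df.dtypes.astype(str),  # data type of each column
    "non_null": df.count(),          # number of non-missing values
    "missing": df.isna().sum(),      # number of missing values
})

print(summary)
```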

Shape of Dataset

Python

print(df.shape)
# Output:
# (rows, columns)

Column Names

Python

print(df.columns)
# Output:
# Index of column names

3. Summary Statistics

Describe Numerical Data

Python

print(df.describe())
# Output:
# count, mean, std, min, 25%, 50%, 75%, max

Describe All Columns

Python

print(df.describe(include="all"))
# Output:
# Summary for numeric and non-numeric columns
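The result of describe() is itself a DataFrame, so individual statistics can be looked up by row label and column name. Here is a small sketch with invented values; for text columns, include="all" adds the rows unique, top (most frequent value), and freq (how often it occurs).

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({
    "Age": [20, 30, 40, 50],
    "City": ["Chennai", "Mumbai", "Chennai", "Chennai"],
})

stats = df.describe()                # numeric columns only
mean_age = stats.loc["mean", "Age"]  # pick one statistic by label

all_stats = df.describe(include="all")     # numeric and text columns
top_city = all_stats.loc["top", "City"]    # most frequent city
top_count = all_stats.loc["freq", "City"]  # how many times it appears

print(mean_age, top_city, top_count)
```

Statistics that do not apply to a column, such as the mean of City, appear as NaN in the combined table.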

4. Analyze Individual Columns

Python

print(df["Age"].mean())
print(df["Salary"].max())
# Output:
# Mean of Age and maximum of Salary

Value Counts

Python

print(df["City"].value_counts())
# Output:
# Frequency of each category
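Two options of value_counts() are often useful during exploration: normalize=True returns proportions instead of counts, and dropna=False also counts missing entries, which are skipped by default. This is a minimal sketch on made-up city values.

```python
import pandas as pd

# Hypothetical column with one missing entry.
cities = pd.Series(["Chennai", "Mumbai", "Chennai", "Delhi", None], name="City")

counts = cities.value_counts()                    # missing values are excluded
shares = cities.value_counts(normalize=True)      # proportions of non-missing values
with_missing = cities.value_counts(dropna=False)  # missing values counted too

print(counts)
print(shares)
print(with_missing)
```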

5. Check Missing Values

Python

print(df.isnull())
# Output:
# Boolean DataFrame (True where a value is missing)

Python

print(df.isnull().sum())
# Output:
# Count of missing values per column
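Raw counts are hard to judge without the table size, so it can help to express missing values as a percentage and to pull out the affected rows. The sketch below uses invented data; isna() is the same method as isnull() under another name.

```python
import pandas as pd

# Hypothetical data; None marks a missing value.
df = pd.DataFrame({
    "Age": [28, None, 42, 31],
    "City": ["Chennai", "Mumbai", None, None],
})

# The mean of a True/False column is the fraction of True values.
missing_pct = df.isna().mean() * 100

# Keep only the rows that have at least one missing value.
rows_with_missing = df[df.isna().any(axis=1)]

print(missing_pct)
print(rows_with_missing)
```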

6. Check for Duplicates

Python

print(df.duplicated())
# Output:
# Boolean Series (True for each row that repeats an earlier row)

Python

print(df.duplicated().sum())
# Output:
# Number of duplicate rows
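By default duplicated() compares whole rows and leaves the first occurrence unmarked. As a rough sketch on made-up data, the keep and subset parameters change what counts as a duplicate, and drop_duplicates() returns a copy without the repeats.

```python
import pandas as pd

# Hypothetical data: the third row repeats the second.
df = pd.DataFrame({
    "Name": ["Asha", "Ravi", "Ravi", "Meena"],
    "City": ["Chennai", "Mumbai", "Mumbai", "Chennai"],
})

dup_mask = df.duplicated()                 # marks only the later copies
all_copies = df[df.duplicated(keep=False)] # every row that has a twin
city_dups = df.duplicated(subset=["City"]).sum()  # compare on City only
deduped = df.drop_duplicates()             # new DataFrame, original unchanged

print(all_copies)
print(city_dups)
print(deduped)
```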

7. Correlation Analysis

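Correlation measures the strength and direction of the linear relationship between numerical columns. By default, corr() computes the Pearson correlation coefficient.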

Python

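print(df.corr(numeric_only=True))
# Output:
# Correlation matrix of the numeric columns
# (numeric_only=True skips text columns such as City, which otherwise raise an error in pandas 2.0 and later)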

Values range from -1 to +1:

+1 → perfect positive correlation
0 → no linear correlation
-1 → perfect negative correlation
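To make the two ends of the scale concrete, here is a small sketch with invented numbers. Salary rises in exact step with Age and the hypothetical Absences column falls in exact step, so the coefficients land at the extremes. Passing numeric_only=True leaves the text column City out of the calculation.

```python
import pandas as pd

# Hypothetical data chosen so the relationships are perfectly linear.
df = pd.DataFrame({
    "Age": [25, 30, 35, 40],
    "Salary": [40000, 50000, 60000, 70000],  # rises with Age
    "Absences": [8, 6, 4, 2],                # falls as Age rises
    "City": ["Chennai", "Mumbai", "Delhi", "Chennai"],
})

corr = df.corr(numeric_only=True)  # Pearson correlation, numeric columns only
print(corr)

# A single pair of columns can be compared directly as well.
age_salary = df["Age"].corr(df["Salary"])
print(age_salary)
```

Real data rarely reaches the extremes; values near 0 only rule out a linear relationship, not every kind of relationship.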

8. Sorting Data

Python

print(df.sort_values(by="Age"))
# Output:
# Rows sorted by Age, ascending

print(df.sort_values(by="Salary", ascending=False))
# Output:
# Rows sorted by Salary, descending
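Sorting also works on several columns at once, with a separate direction for each. The sketch below uses invented values; note that sort_values() returns a new DataFrame and leaves the original order untouched. When only the top few rows matter, nlargest() is a shorter route.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({
    "City": ["Mumbai", "Chennai", "Mumbai", "Chennai"],
    "Salary": [61000, 52000, 48000, 75000],
})

# City A to Z, and within each city the highest salary first.
ordered = df.sort_values(by=["City", "Salary"], ascending=[True, False])

# The two rows with the highest salaries.
top2 = df.nlargest(2, "Salary")

print(ordered)
print(top2)
```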

9. Filtering Data

Python

print(df[df["Age"] > 30])
# Output:
# Rows where Age > 30
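Filters can be combined. In the sketch below, which uses made-up values, each condition sits in its own parentheses and they are joined with & (and) or | (or) rather than the Python keywords. isin() tests membership in a list, and query() expresses the same kind of filter as a string.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({
    "Age": [28, 35, 42, 31],
    "City": ["Chennai", "Mumbai", "Chennai", "Delhi"],
    "Salary": [52000, 61000, 75000, 58000],
})

# Both conditions must hold; the parentheses are required.
over_30_chennai = df[(df["Age"] > 30) & (df["City"] == "Chennai")]

# Rows whose City is any of the listed values.
in_cities = df[df["City"].isin(["Mumbai", "Delhi"])]

# The same style of filter written as a query string.
by_query = df.query("Age > 30 and Salary >= 60000")

print(over_30_chennai)
print(in_cities)
print(by_query)
```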

10. Detect Outliers

Python

print(df["Salary"].describe())
# Output:
# Summary statistics for Salary; a min or max far from the 25% and 75% values suggests extreme values
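Reading describe() by eye is a first pass. One common way to make the check explicit is the interquartile range (IQR) rule, which flags values more than 1.5 times the IQR below the first quartile or above the third. This is a swapped-in technique rather than something built into describe(), and the salary figures below are invented for illustration.

```python
import pandas as pd

# Hypothetical salaries with one obviously extreme value.
salary = pd.Series([48000, 52000, 55000, 58000, 61000, 250000], name="Salary")

q1 = salary.quantile(0.25)  # first quartile (the 25% row of describe())
q3 = salary.quantile(0.75)  # third quartile (the 75% row of describe())
iqr = q3 - q1               # spread of the middle half of the data

lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

# Values outside the fences are flagged as potential outliers.
outliers = salary[(salary < lower) | (salary > upper)]
print(outliers)
```

A flagged value is a candidate for a closer look, not automatically an error; the 1.5 multiplier is a convention and can be adjusted.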

Key Points to Remember

Always analyze data before cleaning or modeling
Use head(), info(), and describe() first
Check missing values and duplicates early
Understand column distributions and relationships
Pandas makes EDA fast and simple