What Does Analyzing Data Mean in Pandas?

Analyzing data in Pandas means exploring, inspecting, and understanding a dataset before deeper processing or modeling. This step helps you answer questions like:

  • What does the data contain?

  • Are there missing or duplicate values?

  • What are the data types?

  • What patterns or distributions exist?

Pandas provides simple yet powerful functions to perform Exploratory Data Analysis (EDA).

Load Sample Data

Python
import pandas as pd

df = pd.read_csv("data.csv")
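If no `data.csv` is at hand, a small in-memory table is enough to try the calls in this guide. The sketch below is only an illustration: the `Name`, `Age`, `Salary`, and `City` columns and all of their values are made up, and `io.StringIO` stands in for a file on disk.

```python
import io

import pandas as pd

# Hypothetical CSV content; names and values are invented for illustration.
csv_text = """Name,Age,Salary,City
Asha,28,52000,Pune
Ben,35,61000,Delhi
Chen,41,,Pune
Dana,35,61000,Delhi
"""

# read_csv accepts any file-like object, so StringIO can replace a file path.
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)  # (4, 4)
```

The empty `Salary` field in the third row is read as a missing value (NaN), which is useful later when checking for missing data.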

1. View the Data

First and Last Rows

Python
print(df.head())  # first 5 rows by default
Python
print(df.tail())  # last 5 rows by default

Random Sample

Python
print(df.sample(3))  # 3 randomly chosen rows
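Because `sample()` draws different rows on every call, it can help to fix the seed when results need to be reproducible. A minimal sketch, assuming a small made-up table; `random_state` and `frac` are standard `sample()` parameters, while the data is hypothetical.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({
    "Age": [28, 35, 41, 35, 52],
    "City": ["Pune", "Delhi", "Pune", "Delhi", "Agra"],
})

# The same seed gives the same rows on every run.
first = df.sample(3, random_state=0)
second = df.sample(3, random_state=0)
print(first.equals(second))  # True

# frac draws a fraction of the rows instead of a fixed number.
subset = df.sample(frac=0.4, random_state=0)
print(len(subset))  # 2
```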

2. Understand Dataset Structure

Dataset Information

Python
df.info()  # prints column names, non-null counts, and dtypes (returns None, so no print() is needed)

This helps identify:

  • Number of rows and columns

  • Data types

  • Missing values

Shape of Dataset

Python
print(df.shape)  # tuple of (rows, columns)

Column Names

Python
print(df.columns)  # Index object holding the column names
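The structure checks above can also be captured in variables rather than only printed, which is handy in scripts. A short sketch on made-up data; the column names follow the ones used elsewhere in this guide, and the values are hypothetical.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({
    "Age": [28, 35, 41],
    "Salary": [52000.0, 61000.0, None],
    "City": ["Pune", "Delhi", "Pune"],
})

n_rows, n_cols = df.shape             # shape is a plain tuple
column_names = df.columns.tolist()    # Index converted to a Python list

print(n_rows, n_cols)   # 3 3
print(column_names)     # ['Age', 'Salary', 'City']
print(df.dtypes)        # one dtype per column
```

`df.dtypes` is a quick alternative to `info()` when only the data types are of interest.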

3. Summary Statistics

Describe Numerical Data

Python
print(df.describe())  # count, mean, std, min, 25%, 50%, 75%, max for numeric columns

Describe All Columns

Python
print(df.describe(include="all"))  # summary for both numeric and non-numeric columns
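The result of `describe()` is itself a DataFrame, so individual statistics can be looked up with `.loc`. A minimal sketch, assuming a tiny made-up table with one numeric and one text column.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({
    "Age": [20, 30, 40, 50],
    "City": ["Pune", "Delhi", "Pune", "Agra"],
})

# Default: numeric columns only.
stats = df.describe()
print(stats.loc["mean", "Age"])  # 35.0

# include="all" adds unique, top, and freq rows for non-numeric columns.
all_stats = df.describe(include="all")
print(all_stats.loc["top", "City"])   # Pune
print(all_stats.loc["freq", "City"])  # 2
```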

4. Analyze Individual Columns

Python
print(df["Age"].mean())    # mean of the Age column
print(df["Salary"].max())  # largest value in the Salary column

Value Counts

Python
print(df["City"].value_counts())  # frequency of each category, most common first
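Two optional parameters of `value_counts()` are often useful during exploration: `normalize=True` returns proportions instead of counts, and `dropna=False` keeps missing values in the tally. A small sketch on a made-up `City` column.

```python
import pandas as pd

# Hypothetical data for illustration only; one entry is missing.
cities = pd.Series(["Pune", "Delhi", "Pune", None, "Pune"], name="City")

counts = cities.value_counts()                    # missing values are skipped by default
shares = cities.value_counts(normalize=True)      # proportions of the non-missing values
with_missing = cities.value_counts(dropna=False)  # missing values counted as well

print(counts["Pune"])      # 3
print(shares["Pune"])      # 0.75
print(with_missing.sum())  # 5
```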

5. Check Missing Values

Python
print(df.isnull())  # Boolean DataFrame: True where a value is missing
Python
print(df.isnull().sum())  # number of missing values per column
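On larger datasets, the share of missing values per column is often easier to read than the raw count. Since `True` counts as 1, taking the mean of the Boolean mask gives that share directly. A minimal sketch on made-up data.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({
    "Age": [28, None, 41, 35],
    "Salary": [52000, 61000, None, None],
})

missing_count = df.isnull().sum()   # missing values per column
missing_share = df.isnull().mean()  # fraction of rows missing per column

print(missing_count["Salary"])  # 2
print(missing_share["Salary"])  # 0.5
```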

6. Handle Duplicates

Python
print(df.duplicated())  # Boolean Series: True for rows that repeat an earlier row
Python
print(df.duplicated().sum())  # number of duplicate rows
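Once duplicates are found, they can be inspected and then removed. The sketch below uses made-up data: `keep=False` marks every copy of a repeated row rather than only the later ones, and `drop_duplicates()` returns a new DataFrame with the first occurrence kept.

```python
import pandas as pd

# Hypothetical data for illustration only; the two "Ben" rows are identical.
df = pd.DataFrame({
    "Name": ["Asha", "Ben", "Ben", "Chen"],
    "Age": [28, 35, 35, 41],
})

dup_mask = df.duplicated()                  # True only for the second "Ben" row
all_copies = df[df.duplicated(keep=False)]  # both "Ben" rows, for inspection
deduped = df.drop_duplicates()              # original df is left unchanged

print(dup_mask.tolist())  # [False, False, True, False]
print(len(deduped))       # 3
```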

7. Correlation Analysis

Correlation measures the strength and direction of the linear relationship between pairs of numerical columns.

Python
print(df.corr(numeric_only=True))  # correlation matrix; numeric_only=True skips text columns, which raise an error in pandas 2.0+

Values range from -1 to +1:

  • +1 → perfect positive correlation

  • 0 → no linear correlation

  • -1 → perfect negative correlation
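The two extremes of the scale can be reproduced with a tiny made-up table. In this sketch, `Salary` rises in a straight line with `Age` and a hypothetical `Score` column falls in a straight line, so the coefficients come out at +1 and -1 (up to floating-point rounding).

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({
    "Age": [25, 30, 35, 40],
    "Salary": [40000, 50000, 60000, 70000],  # rises linearly with Age
    "Score": [4, 3, 2, 1],                   # falls linearly with Age
    "City": ["Pune", "Delhi", "Pune", "Agra"],
})

# numeric_only=True leaves the text column out of the matrix.
corr = df.corr(numeric_only=True)

print(round(corr.loc["Age", "Salary"], 6))  # 1.0
print(round(corr.loc["Age", "Score"], 6))   # -1.0
```

Real data rarely reaches these extremes; values near 0 only say there is no linear relationship, not that the columns are unrelated.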

8. Sorting Data

Python
print(df.sort_values(by="Age"))  # ascending by Age
print(df.sort_values(by="Salary", ascending=False))  # descending by Salary
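Sorting by more than one column works by passing lists to `by` and `ascending`. A minimal sketch on made-up data that orders rows by `Age` ascending and, within the same age, by `Salary` descending.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({
    "Name": ["Asha", "Ben", "Chen", "Dana"],
    "Age": [35, 28, 35, 28],
    "Salary": [61000, 52000, 70000, 48000],
})

# sort_values returns a new DataFrame; reset_index renumbers the rows.
ranked = (
    df.sort_values(by=["Age", "Salary"], ascending=[True, False])
      .reset_index(drop=True)
)

print(ranked["Name"].tolist())  # ['Ben', 'Dana', 'Chen', 'Asha']
```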

9. Filtering Data

Python
print(df[df["Age"] > 30])  # rows where Age is greater than 30
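Filters can be combined. In pandas, conditions are joined with `&` (and) or `|` (or), and each condition needs its own parentheses; `isin()` checks membership in a list. A short sketch on made-up data.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({
    "Age": [28, 35, 41, 52],
    "City": ["Pune", "Delhi", "Pune", "Agra"],
})

# Both conditions must hold; the parentheses are required.
older_in_pune = df[(df["Age"] > 30) & (df["City"] == "Pune")]

# Rows whose City is any of the listed values.
in_two_cities = df[df["City"].isin(["Delhi", "Agra"])]

print(older_in_pune["Age"].tolist())  # [41]
print(len(in_two_cities))             # 2
```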

10. Detect Outliers

Python
print(df["Salary"].describe())  # a min or max far from the quartiles hints at extreme values
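`describe()` only hints at extreme values. One common rule of thumb, which is an addition here and not the only option, flags values more than 1.5 times the interquartile range (IQR) below the first quartile or above the third. A minimal sketch on a made-up `Salary` column.

```python
import pandas as pd

# Hypothetical data for illustration only; the last value is deliberately extreme.
salary = pd.Series([48000, 50000, 52000, 54000, 56000, 250000], name="Salary")

q1 = salary.quantile(0.25)
q3 = salary.quantile(0.75)
iqr = q3 - q1

# 1.5 * IQR is a conventional cutoff, not a fixed law.
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

outliers = salary[(salary < lower) | (salary > upper)]
print(outliers.tolist())  # [250000]
```

A flagged value is a candidate for review, not automatically an error; whether to keep it depends on the data.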

Key Points to Remember

  • Always analyze data before cleaning or modeling

  • Use head(), info(), and describe() first

  • Check missing values and duplicates early

  • Understand column distributions and relationships

  • Pandas makes EDA fast and simple