What Is a Pandas DataFrame?

A Pandas DataFrame is a two-dimensional, labeled data structure used to store data in rows and columns, similar to a table in a database or a spreadsheet like Excel. It is the most commonly used data structure in Pandas for data analysis and manipulation.

Each column in a DataFrame is a Pandas Series. All values in one column share a data type, but different columns can have different data types.
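As a small sketch of the point above, the snippet below selects one column and checks that it is a Series, then inspects the per-column data types. The "Score" column and its values are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "Score": [88.5, 92.0, 79.5],   # hypothetical column, for illustration only
})

# Selecting a single column returns a Series
age = df["Age"]
print(type(age).__name__)   # Series

# Each column keeps its own dtype (integers, floats, text)
print(df.dtypes)
```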

Why Use Pandas DataFrames?

Pandas DataFrames are popular because they:

  • Handle structured/tabular data efficiently

  • Support row and column labels

  • Allow easy filtering, sorting, and aggregation

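  • Represent missing values as NaN and provide methods such as isna(), fillna(), and dropna() to handle them

  • Read and write CSV, Excel, JSON, and SQL data through built-in functions such as read_csv() and to_csv()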
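The features listed above can be sketched in a few lines. This is only an illustration under assumed data: the names, ages, and cities are invented, and the CSV round trip goes through an in-memory string rather than a real file.

```python
import io
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Age": [25, 30, 35, None],          # None becomes NaN in a numeric column
    "City": ["Delhi", "Mumbai", "Delhi", "Pune"],
})

# Filtering: keep rows where Age is greater than 26 (NaN never matches)
older = df[df["Age"] > 26]

# Sorting: highest age first; rows with NaN are placed last by default
by_age = df.sort_values("Age", ascending=False)

# Aggregation: mean() skips NaN by default
mean_age = df["Age"].mean()
per_city = df.groupby("City")["Age"].mean()

# Missing data: count the gaps, then fill them with the mean
missing = df["Age"].isna().sum()
filled = df["Age"].fillna(mean_age)

# CSV: write to a string and read it back
csv_text = df.to_csv(index=False)
round_trip = pd.read_csv(io.StringIO(csv_text))

print(older)
print(by_age)
print(mean_age)   # 30.0
```

None of these calls modify df in place; each returns a new object, which is why the results are assigned to new names.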

Creating a Pandas DataFrame

1. Create a DataFrame from a Dictionary

Python
import pandas as pd

# Each key becomes a column label; each list holds that column's values
data = {
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["Delhi", "Mumbai", "Chennai"]
}

df = pd.DataFrame(data)
print(df)

# Output:
#       Name  Age     City
# 0    Alice   25    Delhi
# 1      Bob   30   Mumbai
# 2  Charlie   35  Chennai

2. Create a DataFrame from a List of Lists

Python
import pandas as pd

# Each inner list is one row; column labels are supplied separately
data = [
    ["Alice", 25, "Delhi"],
    ["Bob", 30, "Mumbai"],
    ["Charlie", 35, "Chennai"]
]

df = pd.DataFrame(data, columns=["Name", "Age", "City"])
print(df)

# Output:
#       Name  Age     City
# 0    Alice   25    Delhi
# 1      Bob   30   Mumbai
# 2  Charlie   35  Chennai
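A list of lists carries no labels of its own, so it can help to see what happens with and without them. The sketch below is illustrative; the row labels "a" and "b" are arbitrary choices.

```python
import pandas as pd

rows = [
    ["Alice", 25, "Delhi"],
    ["Bob", 30, "Mumbai"],
]

# Without columns=, pandas labels the columns 0, 1, 2
unlabeled = pd.DataFrame(rows)
print(list(unlabeled.columns))   # [0, 1, 2]

# columns= names the columns; the optional index= names the rows
labeled = pd.DataFrame(rows, columns=["Name", "Age", "City"], index=["a", "b"])
print(list(labeled.index))       # ['a', 'b']
print(labeled.loc["b", "City"])  # Mumbai
```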

3. Create a DataFrame from a List of Dictionaries

Python
import pandas as pd

# Each dictionary is one row; its keys become column labels
data = [
    {"Name": "Alice", "Age": 25},
    {"Name": "Bob", "Age": 30},
    {"Name": "Charlie", "Age": 35}
]

df = pd.DataFrame(data)
print(df)

# Output:
#       Name  Age
# 0    Alice   25
# 1      Bob   30
# 2  Charlie   35
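One practical consequence of building from dictionaries is that the records do not all need the same keys. In the sketch below (the records are invented for illustration), a key that a record lacks shows up as a missing value in that row.

```python
import pandas as pd

records = [
    {"Name": "Alice", "Age": 25},
    {"Name": "Bob"},                                    # no "Age" key
    {"Name": "Charlie", "Age": 35, "City": "Chennai"},  # extra "City" key
]

df = pd.DataFrame(records)

# Columns are the union of all keys, in order of first appearance
print(list(df.columns))          # ['Name', 'Age', 'City']

# Bob's missing age is stored as NaN
print(df["Age"].isna().sum())    # 1
```

Because NaN is a floating-point value, the "Age" column is stored as floats here rather than integers.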

Accessing Data in a DataFrame

Access a Single Column

Python
import pandas as pd

data = [
    {"Name": "Alice", "Age": 25},
    {"Name": "Bob", "Age": 30},
    {"Name": "Charlie", "Age": 35}
]

df = pd.DataFrame(data)

# A single column label returns a Series
print(df["Name"])

# Output:
# 0      Alice
# 1        Bob
# 2    Charlie
# Name: Name, dtype: object
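The bracket style determines what comes back, which is an easy thing to trip over. A minimal sketch of the difference:

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob", "Charlie"], "Age": [25, 30, 35]})

as_series = df["Name"]     # single brackets -> Series
as_frame = df[["Name"]]    # double brackets -> one-column DataFrame

print(type(as_series).__name__)  # Series
print(type(as_frame).__name__)   # DataFrame
print(as_series.shape)           # (3,)
print(as_frame.shape)            # (3, 1)
```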

Access Multiple Columns

Python
import pandas as pd

data = [
    {"Name": "Alice", "Age": 25},
    {"Name": "Bob", "Age": 30},
    {"Name": "Charlie", "Age": 35}
]

df = pd.DataFrame(data)

# A list of column labels returns a DataFrame
print(df[["Name", "Age"]])

# Output:
#       Name  Age
# 0    Alice   25
# 1      Bob   30
# 2  Charlie   35

Access Rows by Index Label

Python
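import pandas as pd

data = [
    {"Name": "Alice", "Age": 25, "City": "Delhi"},
    {"Name": "Bob", "Age": 30, "City": "Mumbai"},
    {"Name": "Charlie", "Age": 35, "City": "Chennai"}
]

df = pd.DataFrame(data)

# loc selects by index label; here the labels are the default 0, 1, 2
print(df.loc[1])

# Output:
# Name       Bob
# Age         30
# City    Mumbai
# Name: 1, dtype: object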

Access Rows by Position

Python
import pandas as pd

data = [
    {"Name": "Alice", "Age": 25, "City": "Delhi"},
    {"Name": "Bob", "Age": 30, "City": "Mumbai"},
    {"Name": "Charlie", "Age": 35, "City": "Chennai"}
]

df = pd.DataFrame(data)

# iloc selects by integer position; 0 is the first row
print(df.iloc[0])

# Output:
# Name    Alice
# Age        25
# City    Delhi
# Name: 0, dtype: object
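With the default index, loc and iloc happen to take the same numbers, which hides the difference between them. The sketch below uses a made-up index of 10, 20, 30 so that labels and positions no longer coincide.

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Alice", "Bob", "Charlie"], "Age": [25, 30, 35]},
    index=[10, 20, 30],   # hypothetical non-default row labels
)

print(df.loc[20, "Name"])   # Bob   (row whose label is 20)
print(df.iloc[0, 0])        # Alice (first row, first column by position)

# Label slices include the end label; position slices exclude the end position
print(len(df.loc[10:20]))   # 2
print(len(df.iloc[0:1]))    # 1
```

Here df.loc[0] would raise a KeyError, because no row carries the label 0.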

Basic DataFrame Operations

Add a New Column

Python
import pandas as pd

data = [
    {"Name": "Alice", "Age": 25, "City": "Delhi"},
    {"Name": "Bob", "Age": 30, "City": "Mumbai"},
    {"Name": "Charlie", "Age": 35, "City": "Chennai"}
]

df = pd.DataFrame(data)

# The list must contain one value per existing row
df["Salary"] = [50000, 60000, 70000]
print(df)

# Output:
#       Name  Age     City  Salary
# 0    Alice   25    Delhi   50000
# 1      Bob   30   Mumbai   60000
# 2  Charlie   35  Chennai   70000
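A new column does not have to come from a literal list. As a sketch, it can also be computed from an existing column or filled with a single repeated value; the "Bonus" and "Country" columns here are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Salary": [50000, 60000, 70000],
})

# Computed column: 10% of each salary, calculated for all rows at once
df["Bonus"] = df["Salary"] / 10

# A single value is repeated for every row
df["Country"] = "India"

print(df)
```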

Add a New Row

Python
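import pandas as pd

data = [
    {"Name": "Alice", "Age": 25, "City": "Delhi", "Salary": 50000},
    {"Name": "Bob", "Age": 30, "City": "Mumbai", "Salary": 60000},
    {"Name": "Charlie", "Age": 35, "City": "Chennai", "Salary": 70000}
]

df = pd.DataFrame(data)

# Assigning to a new label adds a row; supply one value per column
df.loc[3] = ["David", 28, "Pune", 55000]
print(df)

# Output:
#       Name  Age     City  Salary
# 0    Alice   25    Delhi   50000
# 1      Bob   30   Mumbai   60000
# 2  Charlie   35  Chennai   70000
# 3    David   28     Pune   55000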
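Assigning through loc works well for a single row. For several rows at once, one common alternative is pd.concat, sketched below with invented data. (DataFrame.append, which older tutorials use for this, was removed in pandas 2.0.)

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob"],
    "Age": [25, 30],
})

# The new rows are built as their own DataFrame with matching column labels
new_rows = pd.DataFrame([
    {"Name": "David", "Age": 28},
    {"Name": "Esha", "Age": 32},   # hypothetical extra record
])

# ignore_index=True renumbers the combined rows 0..n-1
df = pd.concat([df, new_rows], ignore_index=True)

print(len(df))          # 4
print(list(df.index))   # [0, 1, 2, 3]
```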

Remove a Column

Python
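import pandas as pd

data = [
    {"Name": "Alice", "Age": 25, "City": "Delhi", "Salary": 50000},
    {"Name": "Bob", "Age": 30, "City": "Mumbai", "Salary": 60000},
    {"Name": "Charlie", "Age": 35, "City": "Chennai", "Salary": 70000},
    {"Name": "David", "Age": 28, "City": "Pune", "Salary": 55000}
]

df = pd.DataFrame(data)

# axis=1 means "drop a column"; drop returns a new DataFrame
df = df.drop("Salary", axis=1)
print(df)

# Output:
#       Name  Age     City
# 0    Alice   25    Delhi
# 1      Bob   30   Mumbai
# 2  Charlie   35  Chennai
# 3    David   28     Pune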
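The drop method also accepts the columns= and index= keywords, which some readers find clearer than axis=. The sketch below uses them and shows that the original DataFrame is left untouched unless the result is assigned back.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "Salary": [50000, 60000, 70000],
})

no_salary = df.drop(columns=["Salary"])   # same effect as drop("Salary", axis=1)
no_bob = df.drop(index=[1])               # drop the row labeled 1

print(list(df.columns))          # ['Name', 'Age', 'Salary']  (original unchanged)
print(list(no_salary.columns))   # ['Name', 'Age']
print(list(no_bob.index))        # [0, 2]
```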

DataFrame Attributes

Attribute      Description
df.shape       Tuple of (number of rows, number of columns)
df.columns     Column labels
df.index       Row labels
df.dtypes      Data type of each column
df.info()      Method (not an attribute) that prints a summary of the DataFrame
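To see these in action, here is a short sketch on the three-row example used earlier in this article:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["Delhi", "Mumbai", "Chennai"],
})

print(df.shape)           # (3, 3)
print(list(df.columns))   # ['Name', 'Age', 'City']
print(list(df.index))     # [0, 1, 2]
print(df.dtypes["Age"])   # int64

# info() prints its summary directly and returns None
result = df.info()
```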

Key Points to Remember

  • A DataFrame is a 2D labeled structure

  • Each column is a Pandas Series

  • Different columns can hold different data types

  • Essential for data analysis tasks

  • Highly flexible and powerful