What Is a Pandas DataFrame?
A Pandas DataFrame is a two-dimensional, labeled data structure used to store data in rows and columns, similar to a table in a database or a spreadsheet like Excel. It is the most commonly used data structure in Pandas for data analysis and manipulation.
Each column in a DataFrame is a Pandas Series, and each column can have a different data type.
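To see both points in practice, here is a minimal sketch on a small sample table. The Height column is made up for illustration. Selecting one column returns a Series, and df.dtypes reports a separate data type for each column.

```python
import pandas as pd

# Sample table; the Height column is made up for illustration
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "Height": [1.65, 1.80, 1.75],
})

# Selecting a single column returns a Series
ages = df["Age"]
print(type(ages))  # <class 'pandas.core.series.Series'>

# Each column keeps its own dtype (integer, float, text)
print(df.dtypes)
```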
Why Use Pandas DataFrames?
Pandas DataFrames are popular because they:
- Handle structured/tabular data efficiently
- Support row and column labels, so data can be selected by name rather than only by position
- Allow easy filtering, sorting, and aggregation
- Handle missing data (NaN)
- Read from and write to CSV, Excel, JSON, and SQL data sources
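A short sketch of the filtering, sorting, aggregation, and missing-data points above, using a small made-up table. The CSV round trip goes through an in-memory io.StringIO buffer so the example needs no file on disk; Excel, JSON, and SQL sources use the analogous read_*/to_* functions, some of which may require extra packages to be installed.

```python
import io

import pandas as pd

# Made-up sample data; None in the numeric "Age" column is stored as NaN
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "City": ["Delhi", "Mumbai", "Delhi", "Pune"],
    "Age": [25, 30, None, 28],
})

# Filtering: a boolean mask keeps rows where Age > 26 (NaN compares as False)
older = df[df["Age"] > 26]

# Sorting: highest Age first; NaN rows are placed last by default
by_age = df.sort_values("Age", ascending=False)

# Aggregation: mean Age per City; mean() skips NaN by default
mean_age = df.groupby("City")["Age"].mean()

# Missing data: count the NaNs, then fill them with the column mean
missing = df["Age"].isna().sum()
filled = df.fillna({"Age": df["Age"].mean()})

# CSV round trip through an in-memory buffer instead of a file on disk
csv_text = df.to_csv(index=False)
round_trip = pd.read_csv(io.StringIO(csv_text))

print(older)
print(by_age)
print(mean_age)
print(missing)
print(filled)
print(round_trip)
```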
Creating a Pandas DataFrame
1. Create DataFrame from a Dictionary
```python
import pandas as pd

data = {
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["Delhi", "Mumbai", "Chennai"],
}

# Each dictionary key becomes a column label; each list holds that column's values
df = pd.DataFrame(data)
print(df)
# Output:
#       Name  Age     City
# 0    Alice   25    Delhi
# 1      Bob   30   Mumbai
# 2  Charlie   35  Chennai
```
2. Create DataFrame from a List of Lists
```python
import pandas as pd

# Each inner list is one row; column labels are passed separately
data = [
    ["Alice", 25, "Delhi"],
    ["Bob", 30, "Mumbai"],
    ["Charlie", 35, "Chennai"],
]
df = pd.DataFrame(data, columns=["Name", "Age", "City"])
print(df)
# Output:
#       Name  Age     City
# 0    Alice   25    Delhi
# 1      Bob   30   Mumbai
# 2  Charlie   35  Chennai
```
3. Create DataFrame from a List of Dictionaries
```python
import pandas as pd

# Each dictionary is one row; its keys become the column labels
data = [
    {"Name": "Alice", "Age": 25},
    {"Name": "Bob", "Age": 30},
    {"Name": "Charlie", "Age": 35},
]
df = pd.DataFrame(data)
print(df)
# Output:
#       Name  Age
# 0    Alice   25
# 1      Bob   30
# 2  Charlie   35
```
Accessing Data in a DataFrame
Access Columns
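```python
import pandas as pd

data = {
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["Delhi", "Mumbai", "Chennai"],
}
df = pd.DataFrame(data)

# Single brackets return one column as a Series (no column header is printed)
print(df["Name"])
# Output (newer pandas versions may show dtype: str instead of object):
# 0      Alice
# 1        Bob
# 2    Charlie
# Name: Name, dtype: object
```
Access Multiple Columns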
```python
import pandas as pd

data = {
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["Delhi", "Mumbai", "Chennai"],
}
df = pd.DataFrame(data)

# Double brackets (a list of labels) return a DataFrame with just those columns
print(df[["Name", "Age"]])
# Output:
#       Name  Age
# 0    Alice   25
# 1      Bob   30
# 2  Charlie   35
```
Access Rows by Index Label
```python
import pandas as pd

data = {
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["Delhi", "Mumbai", "Chennai"],
}
df = pd.DataFrame(data)

# .loc selects by index label; with the default index, label 1 is the second row
print(df.loc[1])
# Output:
# Name       Bob
# Age         30
# City    Mumbai
# Name: 1, dtype: object
```
Access Rows by Position
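.iloc selects rows by integer position, whereas .loc (used in the previous example) selects by index label. With the default index 0, 1, 2 the two look identical, so this minimal sketch uses a hypothetical string index ("a", "b", "c") to make the difference visible.

```python
import pandas as pd

# Hypothetical string labels replace the default 0, 1, 2 index
df = pd.DataFrame(
    {"Name": ["Alice", "Bob", "Charlie"], "Age": [25, 30, 35]},
    index=["a", "b", "c"],
)

by_label = df.loc["b"]     # the row labeled "b" (Bob)
by_position = df.iloc[0]   # the first row, whatever its label (Alice)

# Both accept a second argument that selects columns
age_of_b = df.loc["b", "Age"]   # 30
last_name = df.iloc[-1, 0]      # last row, first column ("Charlie")

# Slicing differs: .loc includes the end label, .iloc excludes the end position
print(df.loc["a":"b"])   # rows a and b
print(df.iloc[0:1])      # row a only
```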
```python
import pandas as pd

data = {
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["Delhi", "Mumbai", "Chennai"],
}
df = pd.DataFrame(data)

# .iloc selects by integer position; 0 is always the first row
print(df.iloc[0])
# Output:
# Name    Alice
# Age        25
# City    Delhi
# Name: 0, dtype: object
```
Basic DataFrame Operations
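The operations in this section behave differently with respect to the original DataFrame: assigning a column, or a row with a new .loc label, changes df in place, while drop returns a new DataFrame and leaves the original untouched, which is why a drop result is usually reassigned to df. A minimal sketch on a small two-row table; drop(columns="Salary") is equivalent to drop("Salary", axis=1).

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [25, 30]})

# Column assignment modifies df itself
df["Salary"] = [50000, 60000]

# Row assignment with a new .loc label also modifies df itself
df.loc[2] = ["Charlie", 35, 70000]

# drop() returns a new DataFrame; df still has its Salary column
trimmed = df.drop(columns="Salary")

print(list(df.columns))       # ['Name', 'Age', 'Salary']
print(list(trimmed.columns))  # ['Name', 'Age']
print(len(df))                # 3
```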
Add a New Column
```python
import pandas as pd

data = {
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["Delhi", "Mumbai", "Chennai"],
}
df = pd.DataFrame(data)

# Assign a list with one value per row to create a new column
df["Salary"] = [50000, 60000, 70000]
print(df)
# Output:
#       Name  Age     City  Salary
# 0    Alice   25    Delhi   50000
# 1      Bob   30   Mumbai   60000
# 2  Charlie   35  Chennai   70000
```
Add a New Row
```python
import pandas as pd

data = {
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["Delhi", "Mumbai", "Chennai"],
}
df = pd.DataFrame(data)
df["Salary"] = [50000, 60000, 70000]

# Assigning to a new label with .loc appends a row;
# the list must supply one value per column, in column order
df.loc[3] = ["David", 28, "Pune", 55000]
print(df)
# Output:
#       Name  Age     City  Salary
# 0    Alice   25    Delhi   50000
# 1      Bob   30   Mumbai   60000
# 2  Charlie   35  Chennai   70000
# 3    David   28     Pune   55000
```
Remove a Column
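```python
import pandas as pd

data = {
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["Delhi", "Mumbai", "Chennai"],
}
df = pd.DataFrame(data)
df["Salary"] = [50000, 60000, 70000]
df.loc[3] = ["David", 28, "Pune", 55000]

# axis=1 tells drop() to look for "Salary" among the column labels;
# drop() returns a new DataFrame, so the result is reassigned to df
df = df.drop("Salary", axis=1)
print(df)
# Output:
#       Name  Age     City
# 0    Alice   25    Delhi
# 1      Bob   30   Mumbai
# 2  Charlie   35  Chennai
# 3    David   28     Pune
```
DataFrame Attributes and Methods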
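| Attribute / Method | Description |
|---|---|
| df.shape | Tuple of (number of rows, number of columns) |
| df.columns | Column labels |
| df.index | Row labels |
| df.dtypes | Data type of each column |
| df.info() | Method that prints a concise summary: index range, column names, non-null counts, data types, and memory usage |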
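A minimal sketch of the attributes and method in the table, applied to the sample data used earlier. Note that the attributes are accessed without parentheses, while info() is called and prints its summary directly (it returns None).

```python
import pandas as pd

data = {
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["Delhi", "Mumbai", "Chennai"],
}
df = pd.DataFrame(data)

print(df.shape)          # (3, 3)
print(list(df.columns))  # ['Name', 'Age', 'City']
print(df.index)          # RangeIndex(start=0, stop=3, step=1)
print(df.dtypes)         # one dtype per column

# info() is a method: it prints its summary and returns None
df.info()
```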
Key Points to Remember
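- A DataFrame is a 2D labeled data structure
- Each column is a Pandas Series
- Different columns can hold different data types
- It is the core structure for most data analysis tasks in Pandas
- It supports a wide range of operations, from selecting and filtering data to adding, removing, and aggregating rows and columns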