What Is read_csv() in Pandas?
read_csv() is a Pandas function used to read CSV (Comma-Separated Values) files and load them into a DataFrame. CSV files are one of the most common formats for storing tabular data, and Pandas provides powerful options to handle them efficiently.
Why Use read_csv()?
Using read_csv() allows you to:
- Load large datasets easily
- Automatically create a DataFrame
- Handle headers, separators, and encodings
- Manage missing values
- Select specific columns or rows
- Preprocess data while reading
Basic Syntax
In its simplest form, read_csv() takes the path to a CSV file, reads it, and returns the data as a DataFrame, conventionally stored in a variable called df.
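A minimal sketch of that basic call. The file name data.csv and its contents are placeholders chosen for illustration; the snippet writes the file first so that it runs on its own.

```python
import os
import tempfile

import pandas as pd

# Create a small throwaway CSV so the example is self-contained.
# "data.csv" and its contents are placeholders, not from the article.
tmp_dir = tempfile.mkdtemp()
csv_path = os.path.join(tmp_dir, "data.csv")
with open(csv_path, "w", encoding="utf-8") as f:
    f.write("a,b\n1,2\n3,4\n")

# Basic call: pass the path, get a DataFrame back.
df = pd.read_csv(csv_path)
print(df)
```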
Simple Example
Assume a file students.csv:
    Name,Age,City
    Alice,25,Delhi
    Bob,30,Mumbai
    Charlie,35,Chennai

Load it with read_csv():

    import pandas as pd

    df = pd.read_csv("students.csv")
    print(df)

    # Output:
    #       Name  Age     City
    # 0    Alice   25    Delhi
    # 1      Bob   30   Mumbai
    # 2  Charlie   35  Chennai

Reading CSV Without Header
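If the CSV file does not contain column headers, pass header=None so that Pandas numbers the columns instead of using the first row as names. Because students.csv does have a header row, that row appears as ordinary data in row 0 of this example:

    import pandas as pd

    df = pd.read_csv("students.csv", header=None)
    print(df)

    # Output:
    #          0    1        2
    # 0     Name  Age     City
    # 1    Alice   25    Delhi
    # 2      Bob   30   Mumbai
    # 3  Charlie   35  Chennai

You can assign column names manually: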
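One way to assign names is the names parameter, sketched here on headerless data. The data is inlined through io.StringIO (an assumption made so the snippet runs without a file on disk); with a real file you would pass its path instead.

```python
import io

import pandas as pd

# Headerless data, inlined so the sketch runs without a file on disk.
raw = "Alice,25,Delhi\nBob,30,Mumbai\nCharlie,35,Chennai\n"

# header=None says "there is no header row"; names supplies the labels.
df = pd.read_csv(io.StringIO(raw), header=None, names=["Name", "Age", "City"])
print(df)
```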
Custom Separator
CSV files may use separators other than commas (such as ; or |). Pass the separator through the sep parameter.
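A short sketch of reading semicolon-separated data with sep. The sample data is made up for illustration and inlined through io.StringIO; a pipe-separated file would use sep="|" in the same way.

```python
import io

import pandas as pd

# Semicolon-separated sample data (illustrative, not a real file).
raw = "Name;Age;City\nAlice;25;Delhi\nBob;30;Mumbai\n"

# sep tells the parser which character splits the columns.
df = pd.read_csv(io.StringIO(raw), sep=";")
print(df)
```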
Reading Specific Columns
    import pandas as pd

    df = pd.read_csv("students.csv", usecols=["Name", "Age"])
    print(df)

    # Output:
    #       Name  Age
    # 0    Alice   25
    # 1      Bob   30
    # 2  Charlie   35

Handling Missing Values
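Pandas automatically converts empty fields and common missing-value markers (such as NA, N/A, and null) to NaN. Additional markers can be declared with the na_values parameter.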
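The sketch below shows both behaviours on made-up data: an empty field and the string NA become NaN by default, while the custom marker "missing" (a hypothetical placeholder chosen for this example) is converted only because it is listed in na_values.

```python
import io

import pandas as pd

# Illustrative data: an empty Age, a custom "missing" marker, and "NA".
raw = "Name,Age,City\nAlice,25,Delhi\nBob,,Mumbai\nCharlie,missing,NA\n"

# "missing" is treated as NaN in addition to the default markers.
df = pd.read_csv(io.StringIO(raw), na_values=["missing"])

# Count the missing values in each column.
print(df.isna().sum())
```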
Skipping Rows
The skiprows parameter is useful when files contain metadata or comments at the top.
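As a sketch, assume a file that starts with two lines of free-text metadata before the header row. The metadata lines here are invented for illustration; skiprows=2 drops them before parsing begins.

```python
import io

import pandas as pd

# Two metadata lines (invented for this example) precede the real header.
raw = (
    "Student list\n"
    "Exported by a hypothetical tool\n"
    "Name,Age,City\n"
    "Alice,25,Delhi\n"
    "Bob,30,Mumbai\n"
)

# Skip the first two lines; the third line becomes the header.
df = pd.read_csv(io.StringIO(raw), skiprows=2)
print(df)
```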
Limiting Rows
Passing nrows=5 reads only the first 5 data rows.
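A small sketch of nrows on generated data. The ten-row table is built in memory purely for illustration; the header line does not count toward the limit.

```python
import io

import pandas as pd

# Build a ten-row CSV in memory (illustrative data only).
raw = "id,value\n" + "".join(f"{i},{i * i}\n" for i in range(10))

# Stop after the first 5 data rows.
df = pd.read_csv(io.StringIO(raw), nrows=5)
print(df)
```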
Encoding Issues
Some CSV files are not stored as UTF-8 and require specifying the encoding explicitly through the encoding parameter.
Common encodings:
- utf-8
- latin1
- ISO-8859-1
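The following sketch writes a small file in latin1 and reads it back with a matching encoding argument. The file name cities.csv and its contents are assumptions made for the example; the accented characters are written as escapes only to keep the snippet independent of the editor's own encoding.

```python
import os
import tempfile

import pandas as pd

# Write a throwaway latin1-encoded file ("cities.csv" is a placeholder name).
tmp_dir = tempfile.mkdtemp()
csv_path = os.path.join(tmp_dir, "cities.csv")
with open(csv_path, "w", encoding="latin1") as f:
    # \u00e9 is "e" with an acute accent, \u00e1 is "a" with an acute accent.
    f.write("Name,City\nJos\u00e9,M\u00e1laga\n")

# Tell read_csv which encoding the file uses.
df = pd.read_csv(csv_path, encoding="latin1")
print(df)
```

Reading the same file without the encoding argument would fall back to UTF-8 and typically fail with a UnicodeDecodeError.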
Checking the Loaded Data
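After loading, it is worth confirming that the DataFrame looks as expected. The sketch below uses a few standard inspection tools (head(), shape, dtypes, info()) on the students data from earlier, inlined through io.StringIO so it runs without a file; which checks matter most depends on the dataset.

```python
import io

import pandas as pd

# The students data from the earlier example, inlined for a runnable sketch.
raw = "Name,Age,City\nAlice,25,Delhi\nBob,30,Mumbai\nCharlie,35,Chennai\n"
df = pd.read_csv(io.StringIO(raw))

print(df.head())   # first rows (up to 5 by default)
print(df.shape)    # (number of rows, number of columns)
print(df.dtypes)   # inferred type of each column
df.info()          # column names, non-null counts, memory usage
```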
Important read_csv() Parameters
| Parameter | Description |
|---|---|
| filepath_or_buffer | Path to CSV file |
| sep | Column separator |
| header | Row number for column names |
| names | Custom column names |
| usecols | Select specific columns |
| skiprows | Skip rows |
| nrows | Limit number of rows |
| encoding | File encoding |
| na_values | Define missing values |
Key Points to Remember
- read_csv() loads CSV files into DataFrames
- Highly customizable through parameters
- Handles missing values automatically
- Supports large datasets
- Most commonly used Pandas I/O function