File Handling in Python for Machine Learning

Last updated: May 21, 2026

Author: Christy Harshitha Dakarapu

File Handling is one of the most important concepts in Python programming and Machine Learning because almost every real-world AI system works with external data files.

Machine Learning projects constantly interact with:

datasets,
CSV files,
JSON files,
text files,
images,
model files,
logs,
and configuration files.

Before training Machine Learning models, developers usually need to:

load datasets,
preprocess data,
save results,
store models,
and manage files efficiently.

Python provides powerful built-in tools and libraries for performing file operations easily and efficiently.

Companies such as Google, Amazon, Netflix, Meta, OpenAI, and Tesla heavily rely on file-based workflows for:

data pipelines,
AI training,
logging systems,
distributed processing,
and model deployment.

In this article, we will explore File Handling in Python in detail, understand reading and writing operations, learn file modes, work with CSV and JSON files, manage directories, and implement practical examples for Machine Learning projects.

What is File Handling?

File Handling refers to performing operations on files such as:

reading,
writing,
updating,
deleting,
and managing data stored in files.

Files store information persistently on disk.

Unlike variables held in memory, data written to a file remains available after the program ends.

Why File Handling is Important in Machine Learning

Machine Learning projects depend heavily on external data.

Examples:

CSV datasets
JSON APIs
image datasets
model checkpoints
logs
configuration files

File handling enables Machine Learning systems to:

load datasets,
save trained models,
store predictions,
and maintain logs.

Types of Files Commonly Used in Machine Learning

File Type	Usage
TXT	Text data
CSV	Tabular datasets
JSON	Structured data
Excel	Business datasets
Images	Computer Vision
Pickle Files	Saved ML models

Opening Files in Python

Python uses the built-in open() function to work with files. It returns a file object.

Syntax:

file = open("example.txt", "r")

Parameters:

file name (a path to the file)
mode (optional; defaults to "r")

File Modes in Python

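Mode	Description
r	Read (default); the file must already exist
w	Write; creates the file or truncates an existing one
a	Append; creates the file if needed and writes at the end
x	Exclusive creation; fails if the file already exists
b	Binary mode; combined with another mode, e.g. rb or wb
t	Text mode (default); combined with another mode, e.g. rt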
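As a minimal sketch of how x mode differs from w mode, the example below creates a file once and then tries to create it again. The file name new_file.txt and the variable second_attempt are hypothetical, and a temporary directory is used so nothing is left behind.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "new_file.txt")

    # First call succeeds because the file does not exist yet.
    with open(path, "x", encoding="utf-8") as file:
        file.write("created once\n")

    # Second call with "x" raises FileExistsError instead of overwriting.
    try:
        with open(path, "x", encoding="utf-8") as file:
            file.write("never written\n")
        second_attempt = "succeeded"
    except FileExistsError:
        second_attempt = "refused"

print(second_attempt)
```

This makes x mode a reasonable choice when overwriting an existing results file would be a mistake.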

Reading Files

Reading Entire File
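A minimal sketch of reading a whole file with read(). The file name sample.txt and its contents are made up for illustration; the example writes the file into a temporary directory first so that it runs on its own.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # Create a small sample file (name and contents are illustrative).
    path = os.path.join(tmp_dir, "sample.txt")
    with open(path, "w", encoding="utf-8") as file:
        file.write("first line\nsecond line\n")

    # read() with no argument returns the entire file as one string.
    with open(path, "r", encoding="utf-8") as file:
        content = file.read()

print(content)
```

Reading everything at once is convenient for small files, but the whole content is held in memory.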

Reading Line by Line
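One common way to read line by line is to iterate over the file object itself. The sketch below assumes a hypothetical sample.txt with three short lines, created in a temporary directory.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample.txt")
    with open(path, "w", encoding="utf-8") as file:
        file.write("alpha\nbeta\ngamma\n")

    lines = []
    # Iterating over the file yields one line at a time,
    # so the whole file never has to fit in memory at once.
    with open(path, "r", encoding="utf-8") as file:
        for line in file:
            lines.append(line.rstrip("\n"))  # drop the trailing newline

print(lines)
```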

Reading Specific Number of Characters
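Passing a number to read() limits how many characters are returned. The following sketch uses a hypothetical sample.txt; the variable names first_part and rest are illustrative.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample.txt")
    with open(path, "w", encoding="utf-8") as file:
        file.write("Machine Learning")

    with open(path, "r", encoding="utf-8") as file:
        first_part = file.read(7)  # the first 7 characters
        rest = file.read()         # everything after the pointer

print(first_part)
print(rest)
```

Each call continues from where the previous one stopped, because the file pointer advances as data is read.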

Writing Files

Write mode creates or overwrites files.
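The overwrite behavior can be sketched as follows. The file name results.txt and the accuracy values are invented for the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "results.txt")

    # "w" creates the file because it does not exist yet.
    with open(path, "w", encoding="utf-8") as file:
        file.write("accuracy: 0.91\n")

    # Opening in "w" again truncates the file before writing.
    with open(path, "w", encoding="utf-8") as file:
        file.write("accuracy: 0.93\n")

    with open(path, "r", encoding="utf-8") as file:
        final_content = file.read()

print(final_content)
```

Only the second line survives, which is why write mode should be used with care on files that hold earlier results.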

Appending Data

Append mode adds new content without deleting existing data.
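A short sketch of append mode, assuming a hypothetical predictions.txt that receives one line per run:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "predictions.txt")

    # "a" creates the file on first use and keeps existing content afterwards.
    with open(path, "a", encoding="utf-8") as file:
        file.write("run 1: cat\n")
    with open(path, "a", encoding="utf-8") as file:
        file.write("run 2: dog\n")

    with open(path, "r", encoding="utf-8") as file:
        appended = file.read().splitlines()

print(appended)
```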

Closing Files

Files should always be closed after use.

file.close()

Closing files:

flushes buffered writes to disk,
releases the operating system's file handle,
prevents data loss from unwritten data.

Using with Statement

The with statement closes the file automatically when the block ends, even if an exception is raised inside it.

with open("data.txt", "r") as file:
    content = file.read()

print(content)

This is the recommended approach.

File Pointer Functions

Python allows moving within files.

tell()

Returns the current position of the file pointer.

with open("data.txt", "r") as file:
    print(file.tell())  # 0 immediately after opening for reading

seek()

Moves the file pointer to a specific position, measured from the start of the file.

with open("data.txt", "r") as file:
    file.seek(5)         # move to offset 5
    print(file.read())   # read from that offset to the end
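The two functions can be combined, as in the sketch below. It uses binary mode so that offsets are plain byte counts; the file name digits.bin and its contents are hypothetical.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "digits.bin")
    with open(path, "wb") as file:
        file.write(b"0123456789")

    with open(path, "rb") as file:
        start = file.tell()       # position right after opening
        file.seek(5)              # jump to byte offset 5
        chunk = file.read(3)      # read three bytes from there
        after_read = file.tell()  # the pointer has advanced by three

print(start, chunk, after_read)
```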

Working with CSV Files

CSV files are widely used in Machine Learning.

CSV stands for Comma-Separated Values.

Example:

name,score
Alice,90
Bob,85

Reading CSV Files Using csv Module

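import csv

# newline="" lets the csv module handle line endings inside quoted fields
with open("data.csv", "r", newline="") as file:
    reader = csv.reader(file)

    # Each row is returned as a list of strings
    for row in reader:
        print(row)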

Writing CSV Files

import csv

with open("output.csv", "w", newline="") as file:

    writer = csv.writer(file)

    writer.writerow(["Name", "Score"])
    writer.writerow(["Alice", 90])

Using Pandas for CSV Files

Pandas simplifies CSV handling significantly.

Reading CSV Using Pandas

import pandas as pd

df = pd.read_csv("data.csv")

print(df.head())

Writing CSV Using Pandas

df.to_csv("output.csv", index=False)

Working with JSON Files

JSON is widely used for APIs and structured data.

Example JSON:

{
    "name": "Alice",
    "score": 90
}

Reading JSON Files

import json

with open("data.json", "r") as file:

    data = json.load(file)

print(data)

Writing JSON Files

import json

data = {
    "name": "Alice",
    "score": 90
}

with open("output.json", "w") as file:

    json.dump(data, file)
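JSON is also a convenient format for the configuration files mentioned earlier. The sketch below writes a training configuration and reads it back; the keys learning_rate, epochs, and features are hypothetical, and indent=2 is used only to make the file easier to read.

```python
import json
import os
import tempfile

# Hypothetical training configuration.
config = {"learning_rate": 0.01, "epochs": 10, "features": ["age", "income"]}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "config.json")

    # indent=2 pretty-prints the JSON for human readers.
    with open(path, "w", encoding="utf-8") as file:
        json.dump(config, file, indent=2)

    with open(path, "r", encoding="utf-8") as file:
        loaded = json.load(file)

print(loaded)
```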

Working with Binary Files

Binary files are used for:

images,
audio,
videos,
Machine Learning models.

Reading Binary Files

with open("image.jpg", "rb") as file:

    data = file.read()

Writing Binary Files

with open("copy.jpg", "wb") as file:

    file.write(data)
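For large binary files, reading everything with a single read() call may use too much memory. One possible approach is to copy in fixed-size chunks, as sketched here; CHUNK_SIZE and the file names are assumptions chosen for the example.

```python
import os
import tempfile

CHUNK_SIZE = 1024  # bytes per read; an arbitrary illustrative value

with tempfile.TemporaryDirectory() as tmp_dir:
    source_path = os.path.join(tmp_dir, "source.bin")
    copy_path = os.path.join(tmp_dir, "copy.bin")

    # Create 2560 bytes of sample binary data.
    original = bytes(range(256)) * 10
    with open(source_path, "wb") as file:
        file.write(original)

    # Copy chunk by chunk until read() returns an empty bytes object.
    with open(source_path, "rb") as source, open(copy_path, "wb") as target:
        while True:
            chunk = source.read(CHUNK_SIZE)
            if not chunk:
                break
            target.write(chunk)

    with open(copy_path, "rb") as file:
        copied = file.read()

print(len(copied))
```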

File Handling with Directories

Python’s os module helps manage directories.

Current Working Directory

import os

print(os.getcwd())

Creating Directories

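import os

# os.mkdir raises FileExistsError if the directory already exists;
# os.makedirs with exist_ok=True does not
os.makedirs("datasets", exist_ok=True)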

Listing Files

import os

print(os.listdir())

Checking File Existence

import os

print(os.path.exists("data.csv"))

Deleting Files

import os

# os.remove raises FileNotFoundError if the file is missing
if os.path.exists("data.txt"):
    os.remove("data.txt")
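The directory operations above can be combined into one small workflow. This sketch runs inside a temporary directory; the folder names datasets and raw and the file data.txt are illustrative.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    dataset_dir = os.path.join(tmp_dir, "datasets", "raw")

    # makedirs creates intermediate folders; exist_ok avoids an error on reruns.
    os.makedirs(dataset_dir, exist_ok=True)
    os.makedirs(dataset_dir, exist_ok=True)

    file_path = os.path.join(dataset_dir, "data.txt")
    with open(file_path, "w", encoding="utf-8") as file:
        file.write("sample\n")

    listing = os.listdir(dataset_dir)
    existed_before = os.path.exists(file_path)

    os.remove(file_path)
    exists_after = os.path.exists(file_path)

print(listing, existed_before, exists_after)
```

Building paths with os.path.join instead of string concatenation keeps the code portable across operating systems.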

File Handling in Machine Learning

Machine Learning projects commonly perform:

dataset loading,
model saving,
logging,
prediction storage.

Loading Datasets

import pandas as pd

df = pd.read_csv("dataset.csv")

print(df.head())

Saving Machine Learning Models

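Python’s pickle module is commonly used to serialize Python objects, including trained models. Only load pickle files from sources you trust, because unpickling can execute arbitrary code.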

Saving Models

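import pickle

# Placeholder dictionary standing in for a trained model object
model_data = {"accuracy": 95}

# Pickle files must be opened in binary mode ("wb")
with open("model.pkl", "wb") as file:
    pickle.dump(model_data, file)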

Loading Models

import pickle

with open("model.pkl", "rb") as file:

    model = pickle.load(file)

print(model)

Logging in Machine Learning Projects

Logs help track:

training progress,
errors,
predictions.

Example:

with open("log.txt", "a") as file:

    file.write("Model training started\n")
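As an alternative to writing log lines by hand, Python's standard logging module can add timestamps and severity levels. The sketch below is one possible setup, not a required one; the logger name training_example, the file name training.log, and the loss value are hypothetical.

```python
import logging
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    log_path = os.path.join(tmp_dir, "training.log")

    # Dedicated logger that writes to a file in append mode.
    logger = logging.getLogger("training_example")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep messages out of the root logger

    handler = logging.FileHandler(log_path, mode="a", encoding="utf-8")
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    )
    logger.addHandler(handler)

    logger.info("Model training started")
    logger.info("Epoch 1 finished, loss=%.3f", 0.412)

    # Close the handler so the file is flushed and released.
    handler.close()
    logger.removeHandler(handler)

    with open(log_path, "r", encoding="utf-8") as file:
        log_lines = file.read().splitlines()

for line in log_lines:
    print(line)
```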

Advantages of File Handling

Permanent data storage
Dataset management
Easy data sharing
Model persistence
Logging and tracking

Common Challenges in File Handling

Large file sizes
Memory limitations
Corrupted files
Incorrect file paths
Encoding problems
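Two of these challenges, memory limits and encoding problems, can often be reduced by streaming a file row by row and stating the encoding explicitly. The sketch below computes an average without loading the whole file; the file scores.csv reuses the sample rows from the CSV section, and UTF-8 is an assumption about how the file was saved.

```python
import csv
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "scores.csv")

    # Write a small sample dataset with an explicit encoding.
    with open(path, "w", newline="", encoding="utf-8") as file:
        writer = csv.writer(file)
        writer.writerow(["name", "score"])
        writer.writerow(["Alice", 90])
        writer.writerow(["Bob", 85])

    total = 0.0
    count = 0
    # DictReader yields one row at a time, so memory use stays small
    # even when the file is large.
    with open(path, "r", newline="", encoding="utf-8") as file:
        for row in csv.DictReader(file):
            total += float(row["score"])
            count += 1

mean_score = total / count
print(mean_score)
```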

Best Practices for File Handling

Always close files
Use the with statement
Handle exceptions such as FileNotFoundError and PermissionError
Check that a file exists before reading it
Avoid hardcoded paths
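Several of these practices can be combined in a small helper. The function read_text_or_default below is a hypothetical name, not part of any library; it uses pathlib to build paths and handles a missing file instead of crashing.

```python
import tempfile
from pathlib import Path


def read_text_or_default(path, default=""):
    """Return the file's text, or `default` if the file does not exist."""
    try:
        return Path(path).read_text(encoding="utf-8")
    except FileNotFoundError:
        return default


with tempfile.TemporaryDirectory() as tmp_dir:
    # Build paths from a base directory instead of hardcoding full paths.
    base_dir = Path(tmp_dir)
    config_path = base_dir / "config.txt"
    config_path.write_text("batch_size=32\n", encoding="utf-8")

    found = read_text_or_default(config_path)
    missing = read_text_or_default(base_dir / "missing.txt", default="<no file>")

print(found)
print(missing)
```

Catching the exception, rather than only checking existence first, also covers the case where a file disappears between the check and the read.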

File Handling and Big Data

Large-scale AI systems often work with:

terabytes of data,
distributed file systems,
cloud storage.

Technologies include:

Hadoop
Spark
AWS S3
Google Cloud Storage

Real-World Applications of File Handling

Industry	Usage
Healthcare	Medical record storage
Finance	Transaction processing
AI Research	Dataset handling
E-Commerce	Customer data storage
Cybersecurity	Log analysis

File Handling vs Database Systems

File Handling	Databases
Simpler	More scalable
Good for small projects	Better for large systems
Easier setup	Advanced querying
File-based storage	Structured storage

Future of File Management in AI

Modern AI systems increasingly rely on:

cloud storage,
distributed file systems,
real-time data pipelines,
large-scale dataset management.

Understanding File Handling is essential for anyone learning Machine Learning because data loading, storage, preprocessing, and model persistence are fundamental parts of every AI workflow.