What Is “Wrong Format” in Pandas?
Wrong format refers to data stored in an incorrect data type or structure. Common examples include:
- Numbers stored as strings
- Dates stored as plain text
- Mixed values in a single column
- Extra symbols (₹, %, commas) in numeric fields
- Inconsistent text casing or spacing
Cleaning wrong formats is essential before analysis or modeling.
Why Fixing Wrong Formats Is Important
Wrong formats can:
- Break calculations and comparisons
- Cause errors in sorting and filtering
- Produce incorrect statistics
- Prevent proper date/time operations
Correct formats ensure accurate analysis and reliable results.
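To make these effects concrete, the following minimal sketch uses a small hypothetical Series (the values are invented for illustration). It shows that numbers stored as strings sort character by character and that "adding" them concatenates text.

```python
import pandas as pd

# Hypothetical data: numbers that were read in as strings
prices = pd.Series(["10", "9", "100"])

# Strings sort character by character, so "10" and "100" come before "9"
sorted_as_text = prices.sort_values().tolist()

# Adding two string values concatenates them
text_sum = prices.iloc[0] + prices.iloc[1]

# After conversion, sorting and arithmetic behave numerically
numeric = pd.to_numeric(prices)
sorted_as_numbers = numeric.sort_values().tolist()
numeric_sum = int(numeric.iloc[0] + numeric.iloc[1])

print(sorted_as_text)     # ['10', '100', '9']
print(text_sum)           # 109
print(sorted_as_numbers)  # [9, 10, 100]
print(numeric_sum)        # 19
```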
Load Sample Data
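Because data.csv itself is not shown, the sketch below reads a small in-memory CSV instead. The column names mirror the ones used in this tutorial, but the rows are invented for illustration. It shows how badly formatted columns fail a simple numeric check after loading.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for data.csv
raw = io.StringIO(
    'Name,Age,Salary,JoinDate\n'
    ' Asha ,29,"₹50,000",2021-03-15\n'
    'Ravi,unknown,"₹42,500",2020-11-01\n'
)
sample = pd.read_csv(raw)

# None of these columns is numeric yet: Age contains text, Salary contains
# symbols, and read_csv does not parse dates unless asked to
is_numeric = {
    col: pd.api.types.is_numeric_dtype(sample[col]) for col in sample.columns
}
print(sample.dtypes)
print(is_numeric)
```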
    import pandas as pd

    df = pd.read_csv("data.csv")
    df.info()  # info() prints its report directly and returns None
    # Output:
    # Shows columns with incorrect data types
1. Fix Numeric Columns Stored as Strings
Convert String to Integer or Float
    # astype() raises an error if any value is missing or not a valid number
    df["Age"] = df["Age"].astype(int)
    df["Salary"] = df["Salary"].astype(float)
    print(df.dtypes)
    # Output:
    # Correct numeric data types
Handle Invalid Numeric Values
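A plain astype(int) stops with a ValueError as soon as one value cannot be parsed. The sketch below, on a hypothetical Marks column, shows that failure and then counts how many values errors="coerce" turns into NaN, which is worth checking before trusting the result.

```python
import pandas as pd

marks = pd.Series(["85", "90", "absent", "72"])  # hypothetical values

# astype(int) fails because "absent" cannot be parsed as an integer
try:
    marks.astype(int)
    strict_failed = False
except ValueError:
    strict_failed = True

# errors="coerce" replaces unparseable values with NaN instead of raising
converted = pd.to_numeric(marks, errors="coerce")
n_invalid = int(converted.isna().sum())

print(strict_failed)  # True
print(n_invalid)      # 1
```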
If a column contains non-numeric text:
    df["Marks"] = pd.to_numeric(df["Marks"], errors="coerce")
    print(df)
    # Output:
    # Invalid values converted to NaN
2. Remove Symbols from Numeric Data
Columns may include currency symbols, commas, or percentages.
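The replacements can also be done in one pass with a regular expression. This is a sketch on hypothetical values; the character class and the decision to store percentages as fractions are assumptions to adapt to your data.

```python
import pandas as pd

salary = pd.Series(["₹50,000", "₹42,500.50", " 1,200 "])  # hypothetical
growth = pd.Series(["12%", "7.5%", "0%"])                  # hypothetical

# Drop currency symbols, thousands separators, and whitespace in one pass
salary_clean = pd.to_numeric(
    salary.str.replace(r"[₹,\s]", "", regex=True), errors="coerce"
)

# Strip the percent sign, then store the value as a fraction
growth_clean = pd.to_numeric(growth.str.rstrip("%"), errors="coerce") / 100

print(salary_clean.tolist())  # [50000.0, 42500.5, 1200.0]
print(growth_clean.tolist())
```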
    # regex=False treats "," and "₹" as literal text, not as patterns
    df["Salary"] = df["Salary"].str.replace(",", "", regex=False)
    df["Salary"] = df["Salary"].str.replace("₹", "", regex=False)
    df["Salary"] = df["Salary"].astype(float)
    print(df["Salary"])
    # Output:
    # Clean numeric salary values
3. Fix Date Columns
Convert String to Date
    df["JoinDate"] = pd.to_datetime(df["JoinDate"])
    print(df["JoinDate"].dtype)
    # Output:
    # datetime64[ns]
Handle Invalid Dates
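When the expected layout is known, passing format alongside errors="coerce" makes parsing stricter and more predictable. The sketch uses hypothetical DOB strings and assumes a year-month-day layout.

```python
import pandas as pd

dob = pd.Series(["1995-04-12", "not a date", "1990-02-30"])  # hypothetical

# Values that do not match the format, or are impossible dates such as
# 30 February, become NaT instead of raising an error
parsed = pd.to_datetime(dob, format="%Y-%m-%d", errors="coerce")

print(parsed.isna().tolist())  # [False, True, True]
print(parsed.iloc[0].year)     # 1995
```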
    df["DOB"] = pd.to_datetime(df["DOB"], errors="coerce")
    print(df)
    # Output:
    # Invalid dates converted to NaT
4. Standardize Text Format
Convert to Lowercase / Uppercase
    df["City"] = df["City"].str.lower()
    print(df["City"])
    # Output:
    # All values in lowercase
Remove Extra Spaces
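Note that str.strip() only removes leading and trailing whitespace. If names may also contain repeated spaces or tabs inside the text, a regular expression can collapse them. A sketch on hypothetical names, with title casing chosen as an assumption:

```python
import pandas as pd

names = pd.Series(["  Asha  Rao ", "RAVI kumar", "meena\tIyer"])  # hypothetical

cleaned = (
    names.str.strip()                           # trim both ends
         .str.replace(r"\s+", " ", regex=True)  # collapse inner whitespace runs
         .str.title()                           # consistent casing
)
print(cleaned.tolist())  # ['Asha Rao', 'Ravi Kumar', 'Meena Iyer']
```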
    df["Name"] = df["Name"].str.strip()
    print(df["Name"])
    # Output:
    # Leading and trailing spaces removed
5. Fix Boolean Columns
Boolean values may appear as "Yes", "No", "Y", "N".
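One way to handle these variants is to normalize case and spacing first, then map the labels to booleans. The sketch below is a minimal illustration; the mapping and the sample values are assumptions. Anything not listed in the mapping ends up missing rather than silently becoming True or False.

```python
import pandas as pd

active = pd.Series(["Yes", "no", " Y", "N", "maybe"])  # hypothetical

# Hypothetical mapping; extend it with whatever labels your data uses
to_bool = {"yes": True, "y": True, "no": False, "n": False}

normalized = active.str.strip().str.lower()
# The nullable "boolean" dtype keeps unmapped entries as <NA>
flags = normalized.map(to_bool).astype("boolean")

print(flags.tolist())
```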
    # map() turns any value not listed in the dictionary into NaN
    df["Active"] = df["Active"].map({"Yes": True, "No": False, "Y": True, "N": False})
    print(df["Active"])
    # Output:
    # Boolean values
6. Fix Mixed Data Types in a Column
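Before coercing, it can help to see which Python types a column actually holds. A sketch with a hypothetical Score column:

```python
import pandas as pd

score = pd.Series([88, "92", "N/A", 75.5, None], dtype=object)  # hypothetical

# Count the Python type of each entry to reveal the mix
type_counts = score.map(lambda v: type(v).__name__).value_counts().to_dict()

# Coerce everything to numbers; unparseable or missing entries become NaN
numeric = pd.to_numeric(score, errors="coerce")

print(type_counts)
print(numeric.tolist())  # [88.0, 92.0, nan, 75.5, nan]
```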
    df["Score"] = pd.to_numeric(df["Score"], errors="coerce")
    print(df["Score"])
    # Output:
    # Non-numeric values converted to NaN
7. Rename Columns for Consistency
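Column names exported from spreadsheets often carry stray spaces or punctuation as well. The sketch below goes slightly further than replacing spaces; the sample headers and the regular expression are assumptions.

```python
import pandas as pd

# Hypothetical headers with stray spaces and punctuation
frame = pd.DataFrame(columns=[" Join Date", "Salary (INR)", "City "])

frame.columns = (
    frame.columns.str.strip()    # trim both ends
         .str.lower()
         .str.replace(r"[^a-z0-9]+", "_", regex=True)  # other characters -> "_"
         .str.strip("_")         # drop underscores left at the edges
)
print(list(frame.columns))  # ['join_date', 'salary_inr', 'city']
```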
    df.columns = df.columns.str.lower().str.replace(" ", "_")
    print(df.columns)
    # Output:
    # Clean, consistent column names
8. Verify Cleaned Formats
Always recheck after cleaning:
    df.info()  # prints the report directly
    print(df.head())
    # Output:
    # Updated data types and values
Best Practices for Fixing Wrong Formats
- Always inspect data using info() and head()
- Fix formats before analysis
- Use errors="coerce" for safe conversions
- Standardize text early
- Validate results after cleaning
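
The practices above can be combined into one small cleaning pass followed by explicit checks. This is a sketch under assumed column names and invented rows, not a general-purpose cleaner.

```python
import io
import pandas as pd

# Hypothetical CSV content; the second row contains deliberately bad values
raw = io.StringIO(
    'Name,Age,Salary,JoinDate\n'
    ' Asha ,29,"₹50,000",2021-03-15\n'
    'Ravi,unknown,"₹42,500",not a date\n'
)
df = pd.read_csv(raw)

# Standardize text, then convert numbers and dates with errors="coerce"
df["Name"] = df["Name"].str.strip()
df["Age"] = pd.to_numeric(df["Age"], errors="coerce")
df["Salary"] = pd.to_numeric(
    df["Salary"].str.replace(r"[₹,]", "", regex=True), errors="coerce"
)
df["JoinDate"] = pd.to_datetime(df["JoinDate"], format="%Y-%m-%d", errors="coerce")

# Validate: check dtypes and count how many values were coerced to missing
assert pd.api.types.is_numeric_dtype(df["Age"])
assert pd.api.types.is_datetime64_any_dtype(df["JoinDate"])
missing = df.isna().sum().to_dict()
print(missing)
```

Counting missing values per column after coercion shows how much data the safe conversions discarded, so a large count is a cue to inspect the raw values before continuing.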