Python CSV: Read and Write CSV Files With the csv Module

How to read and write CSV files in Python — the csv module, DictReader and DictWriter, handling headers, quoting, and when to reach for pandas instead.

CSV Is Simpler Than It Looks (And Trickier Than You'd Expect)

A CSV file is just text: rows separated by newlines, fields separated by commas. That's the idea. In practice, you hit quoted strings that contain commas, fields with embedded newlines, different regional conventions (comma vs semicolon), files saved from Excel that include a BOM — enough edge cases that writing your own line.split(",") is almost always a mistake.

Python's built-in csv module handles all of it. You'll rarely need anything else for small to medium files.

Reading a CSV With csv.reader

csv.reader yields each row as a list of strings:

import csv

with open("people.csv", newline="") as f:
    reader = csv.reader(f)
    for row in reader:
        print(row)

A few non-obvious details:

  • Always pass newline="" to open. The csv module handles line endings itself; without this, you get extra blank rows on Windows.
  • Every value is a string. "42" stays a string until you call int(...) on it. CSV has no types.
  • The header row is just another row. If your file has headers, either skip the first row manually or switch to DictReader.

Skipping the header row

import csv

with open("people.csv", newline="") as f:
    reader = csv.reader(f)
    headers = next(reader)         # pulls the first row out
    for row in reader:
        print(row)

next(reader) advances the iterator by one and returns that row.
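If you want dict-style access without switching to DictReader, you can pair the saved header row with each data row yourself. A minimal sketch, using io.StringIO as an in-memory stand-in for the file (real code would use open("people.csv", newline="")):

```python
import csv
import io

# In-memory stand-in for people.csv; real code would open the file instead.
data = io.StringIO("name,email\nRosa,rosa@example.com\nAda,ada@example.com\n")

reader = csv.reader(data)
headers = next(reader)                                  # ["name", "email"]
records = [dict(zip(headers, row)) for row in reader]   # pair values with column names
print(records[0]["name"], records[0]["email"])
```

This is essentially what DictReader does for you, which is why it's usually the better choice.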

Reading as Dicts With DictReader

csv.DictReader treats the first row as headers and gives you each subsequent row as a dict:

import csv

with open("people.csv", newline="") as f:
    reader = csv.DictReader(f)
    for row in reader:
        print(row["name"], row["email"])

This is almost always what you want. Column names are self-documenting, and reordering columns in the source file doesn't break your code.

If the file has no headers, pass them explicitly with fieldnames=["name", "email", ...].
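For example, reading a headerless file might look like this (the column names and inline data here are illustrative; real code would use open("people.csv", newline="")):

```python
import csv
import io

# Headerless data, stood in by io.StringIO for a self-contained example.
data = io.StringIO("Rosa,rosa@example.com\nAda,ada@example.com\n")

# With fieldnames= given, DictReader treats every row as data, not headers.
reader = csv.DictReader(data, fieldnames=["name", "email"])
rows = list(reader)
print(rows[0]["name"], rows[0]["email"])
```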

Writing a CSV With csv.writer

csv.writer turns rows (lists) into CSV lines:

import csv

rows = [
    ["name", "age", "city"],
    ["Rosa", 30, "Lisbon"],
    ["Ada", 36, "London"],
]

with open("out.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerows(rows)

writerow(row) writes a single row; writerows(rows) writes a whole iterable at once. Both quote fields automatically when they contain commas, quotes, or newlines — you don't have to think about it.
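You can see the automatic quoting by writing to an in-memory buffer (io.StringIO stands in for a real file here):

```python
import csv
import io

buf = io.StringIO()
writer = csv.writer(buf)
# One field per hazard: a comma, embedded quotes, an embedded newline.
writer.writerow(["plain", "has,comma", 'has "quotes"', "two\nlines"])
print(buf.getvalue())
```

Fields containing the delimiter, quotes, or newlines come back quoted, and embedded quotes are doubled per CSV convention.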

Writing Dicts With DictWriter

When your data is already in dict form, DictWriter skips the "convert to list" step:

import csv

people = [
    {"name": "Rosa", "age": 30, "city": "Lisbon"},
    {"name": "Ada", "age": 36, "city": "London"},
]

with open("out.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["name", "age", "city"])
    writer.writeheader()
    writer.writerows(people)

The fieldnames argument controls both the header and the column order. By default, a key in your dicts that isn't in fieldnames raises a ValueError (extrasaction="raise"); pass extrasaction="ignore" to drop extra keys silently. Missing keys are filled with restval, which defaults to the empty string.
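A small demonstration of both behaviors, using io.StringIO instead of a real file:

```python
import csv
import io

people = [{"name": "Rosa", "age": 30, "nickname": "Ro"}]  # "nickname" is not a fieldname

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["name", "age"], extrasaction="ignore")
writer.writeheader()
writer.writerows(people)           # the extra "nickname" key is silently dropped
print(buf.getvalue())

strict = csv.DictWriter(io.StringIO(), fieldnames=["name", "age"])  # default: "raise"
try:
    strict.writerow(people[0])
except ValueError as e:
    print("rejected:", e)          # extra key -> ValueError
```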

Different Delimiters and Quoting

Not every "CSV" uses commas. European locales often use ;, tab-separated files use \t, some systems use |. Pass a delimiter= argument:

import csv

with open("data.tsv", newline="") as f:
    reader = csv.reader(f, delimiter="\t")
    for row in reader:
        print(row)

For files with unusual quoting rules, csv.register_dialect(...) lets you configure once and reuse. For most files, the defaults plus delimiter= are enough.
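A sketch of registering and reusing a dialect (the dialect name "semicolon" is made up for this example; io.StringIO stands in for real files):

```python
import csv
import io

# A custom dialect for semicolon-separated files that quote every field.
csv.register_dialect("semicolon", delimiter=";", quoting=csv.QUOTE_ALL)

buf = io.StringIO()
writer = csv.writer(buf, dialect="semicolon")
writer.writerow(["Rosa", "30", "Lisbon"])
print(buf.getvalue())              # "Rosa";"30";"Lisbon"

# Reading back uses the same registered name.
reader = csv.reader(io.StringIO(buf.getvalue()), dialect="semicolon")
print(next(reader))
```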

Encoding

CSV files are text — they have encodings. UTF-8 is the modern default; Excel-originated files on Windows sometimes use cp1252 or include a UTF-8 BOM. Be explicit:

with open("data.csv", newline="", encoding="utf-8") as f:
    reader = csv.DictReader(f)
    ...

If you see UnicodeDecodeError, the file isn't in the encoding you guessed. Try utf-8-sig (which handles the Excel BOM), cp1252, or latin-1 as the common suspects.
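One pragmatic pattern is to try those encodings in order. This is a sketch, not a real encoding detector, and note that latin-1 decodes any byte sequence, so it always "succeeds" as the last resort:

```python
import csv
import os
import tempfile

def read_rows(path):
    """Try common encodings in order; latin-1 always succeeds, so it goes last."""
    for encoding in ("utf-8-sig", "cp1252", "latin-1"):
        try:
            with open(path, newline="", encoding=encoding) as f:
                return list(csv.DictReader(f))
        except UnicodeDecodeError:
            continue

# Demo with a cp1252-encoded file (common for Excel exports on Windows).
path = os.path.join(tempfile.mkdtemp(), "data.csv")
with open(path, "w", encoding="cp1252", newline="") as f:
    f.write("name,city\nJosé,Málaga\n")

print(read_rows(path))
```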

Turning CSV Rows Into Useful Types

Since every value arrives as a string, parsing is on you:

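A minimal sketch of that conversion step, using io.StringIO so it runs without an actual file (in real code you'd open the path; the column names and values here are illustrative):

```python
import csv
import io
from datetime import date

# io.StringIO stands in for open("people.csv", newline="").
data = io.StringIO("name,age,signup_date\nRosa,30,2024-01-15\nAda,36,2023-11-02\n")

people = []
for row in csv.DictReader(data):
    people.append({
        "name": row["name"],
        "age": int(row["age"]),                             # "30" -> 30
        "signup_date": date.fromisoformat(row["signup_date"]),
    })

print(people[0])
```

Converting right at the reading step means everything downstream works with real ints and dates, not strings.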

For CSVs with tricky types (dates, nullable numbers, true/false in ten different spellings), consider pandas — it has conventions for most of them baked in.

When to Reach for pandas

pandas.read_csv(path) returns a DataFrame, which is the right structure the moment you want to:

  • Filter rows: df[df["active"]]
  • Aggregate: df.groupby("city")["age"].mean()
  • Join with another table
  • Write back out with simple formatting

import pandas as pd

df = pd.read_csv("people.csv")
adults = df[df["age"] >= 18]
adults.to_csv("adults.csv", index=False)

Pandas is overkill for small, linear reads — and it's a heavy dependency (pip-install it in a virtual environment). But for anything data-shaped, it's the tool most Python analysts reach for.

Streaming Very Large Files

csv.reader is already lazy — it reads one row at a time. Keep it that way by iterating (not calling list(reader) up front), and your memory stays flat regardless of file size:

import csv

with open("huge.csv", newline="") as f:
    reader = csv.DictReader(f)
    error_count = 0
    for row in reader:
        if row["status"] == "error":
            error_count += 1

print(f"Found {error_count} errors.")

That handles a 10 GB file just as happily as a 10 KB one, as long as you don't accumulate the rows into a list.
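The same pattern works when you need to write while streaming: read one row, maybe write one row, never hold the whole file. This sketch builds a small stand-in for huge.csv (with the same status column as above) so it runs as-is; the pattern is identical at any size:

```python
import csv
import os
import tempfile

workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "huge.csv")
dst_path = os.path.join(workdir, "errors.csv")

# A small stand-in for a huge file, so the example is self-contained.
with open(src_path, "w", newline="") as f:
    csv.writer(f).writerows([
        ["id", "status"],
        ["1", "ok"], ["2", "error"], ["3", "ok"], ["4", "error"],
    ])

# Stream-filter: rows flow from reader to writer one at a time.
with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:                     # one row in memory at a time
        if row["status"] == "error":
            writer.writerow(row)
```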

A Few Habits

  • Always pass newline="" to open when reading or writing CSVs.
  • Use DictReader/DictWriter whenever the file has headers — more readable than integer indexes.
  • Be explicit about encoding, especially with files from Excel or non-English sources.
  • Convert types right at the reading step so downstream code doesn't have to.
  • Reach for pandas once you want to analyze the data, not just move it.

Up Next

You can now read JSON and CSV. The last real-world skill we'll cover is fetching data over the network — that's the next doc, on HTTP requests with the requests library.

Frequently Asked Questions

How do I read a CSV file in Python?

Use the built-in csv module. csv.reader gives you each row as a list of strings; csv.DictReader uses the first row as headers and yields each row as a dict. Open the file with newline='' so Python doesn't mangle line endings: with open('data.csv', newline='') as f:.

How do I write a CSV file in Python?

Pair csv.writer with a file opened in write mode and newline=''. Call writer.writerow([...]) for each row or writer.writerows([[...], [...]]) for a batch. For dict-based data, use csv.DictWriter — it handles headers automatically.

Should I use the csv module or pandas?

Use csv for quick reads and writes, files you process row-at-a-time, and when you don't want another dependency. Use pandas when you need to filter, group, or join — or when the file is big enough that vectorised operations matter. They handle the same files; the choice is about what you do with the data after loading it.

Why does my CSV have blank lines between rows on Windows?

You opened the file without newline=''. The csv module writes its own line terminators; without that argument, Python adds extra ones on Windows. Always open CSV files with open(path, newline='') for both reading and writing.
