Python Regex Tutorial: re Module, Patterns, Groups, and Replace

A hands-on introduction to Python's re module — searching, matching, capturing groups, replacement, and the patterns you'll reach for most.

When to Reach for Regex

Regular expressions are a small language for describing text patterns. They're powerful and they're easy to over-use.

Before reaching for re, ask whether plain string methods would do. .split(), .replace(), .startswith(), "target" in text — these are faster, more readable, and less error-prone than the equivalent regex. Save regex for when the pattern you care about is genuinely structured but too flexible for fixed string operations: email addresses, phone numbers, log lines with known shapes, HTML or Markdown snippets, any "find this kind of thing" search.

The Basic Vocabulary

A regex pattern is a string that describes what you're looking for. A few building blocks:

  • . — any single character
  • \d — any digit (0–9)
  • \w — any "word" character (letter, digit, underscore)
  • \s — any whitespace character
  • ^ — start of the string
  • $ — end of the string
  • [abc] — any one of a, b, or c
  • [^abc] — any character except a, b, c
  • a|b — a or b
  • * — zero or more of the previous thing
  • + — one or more
  • ? — zero or one
  • {3} — exactly 3
  • {2,5} — between 2 and 5
  • ( ... ) — group (captures the match inside)

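That's enough for most real-world regexes. One caveat: on str patterns, \d, \w, and \s are Unicode-aware by default, so \d also matches decimal digits from other scripts; pass re.ASCII to restrict them to ASCII.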
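
As a rough illustration of how these building blocks combine, here is a minimal sketch. The sample strings are made up for the example, and the variable names are only labels.

```python
import re

# Each line exercises one or two of the building blocks listed above.
three_digits = re.search(r"^\d{3}$", "123")            # ^, \d, {3}, $
too_long = re.search(r"^\d{3}$", "1234")               # four digits, so no match
vowel = re.search(r"[aeiou]", "rhythm and blues")      # first character from the set
colour = re.search(r"colou?r", "color")                # ? makes the u optional
cat_or_dog = re.search(r"cat|dog", "hot dog")          # alternation
word_space_word = re.search(r"\w+\s\w+", "hello world")

print(three_digits.group())     # 123
print(too_long)                 # None
print(vowel.group())            # a
print(colour.group())           # color
print(cat_or_dog.group())       # dog
print(word_space_word.group())  # hello world
```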

The Key Functions

re.search(pattern, text) finds the first match anywhere in the string. It returns a match object, or None if nothing matches:
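
A minimal sketch of that behaviour, using a made-up sample string:

```python
import re

text = "Order 66 shipped on day 12"  # hypothetical sample text

m = re.search(r"\d+", text)
if m:                   # always check for None before using the match
    print(m.group())    # 66  (only the first match is returned)
    print(m.span())     # (6, 8)

no_match = re.search(r"\d+", "no digits here")
print(no_match)         # None
```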

Always use a raw string (r"...") for regex patterns. Otherwise Python tries to interpret the backslashes as string escapes and you get a different pattern than you typed.
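
One way to see the difference is with \b, which means "word boundary" to the regex engine but "backspace" inside an ordinary string literal. The sample text here is made up:

```python
import re

plain_len = len("\b")    # 1: Python already turned \b into a backspace character
raw_len = len(r"\b")     # 2: a backslash followed by the letter b

# Without the r prefix the pattern looks for literal backspace characters.
plain = re.search("\bcat\b", "a cat sat")
# With the r prefix the regex engine sees \b and treats it as a word boundary.
raw = re.search(r"\bcat\b", "a cat sat")

print(plain_len, raw_len)  # 1 2
print(plain)               # None
print(raw.group())         # cat
```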

re.match(pattern, text) is like search but only matches at the start:
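
A short sketch of the contrast, assuming a made-up message string:

```python
import re

text = "error: disk full"  # hypothetical sample text

at_start = re.match(r"error", text)       # pattern is at position 0, so it matches
not_at_start = re.match(r"disk", text)    # "disk" is not at the start, so None
found_anywhere = re.search(r"disk", text) # search scans the whole string

print(at_start.group())        # error
print(not_at_start)            # None
print(found_anywhere.group())  # disk
```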

Most of the time you want search.

re.findall(pattern, text) returns all non-overlapping matches as a list:
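
For instance, with a made-up shopping list:

```python
import re

text = "3 apples, 12 pears, 7 plums"  # hypothetical sample text

numbers = re.findall(r"\d+", text)
print(numbers)  # ['3', '12', '7']

# No matches gives an empty list, not None.
print(re.findall(r"\d+", "nothing to count"))  # []
```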

Note that findall returns strings, not match objects. If you have a pattern with one group, you get the group contents back. With multiple groups, you get a list of tuples.
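
The effect of groups on the return value can be sketched like this, with a made-up key=value string:

```python
import re

text = "alice=30, bob=25"  # hypothetical sample text

whole = re.findall(r"\w+=\d+", text)      # no groups: the full matches
names = re.findall(r"(\w+)=\d+", text)    # one group: just the group contents
pairs = re.findall(r"(\w+)=(\d+)", text)  # two groups: a list of tuples

print(whole)  # ['alice=30', 'bob=25']
print(names)  # ['alice', 'bob']
print(pairs)  # [('alice', '30'), ('bob', '25')]
```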

Capturing Groups

Parentheses in a pattern capture whatever matched inside them:
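
A minimal sketch, assuming a date embedded in a made-up sentence:

```python
import re

m = re.search(r"(\d{4})-(\d{2})-(\d{2})", "Released on 2024-03-15.")

print(m.group(0))  # 2024-03-15  (group 0 is the whole match)
print(m.group(1))  # 2024        (groups are numbered from 1, left to right)
print(m.group(3))  # 15
print(m.groups())  # ('2024', '03', '15')
```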

Named groups are usually clearer:
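
Here is a sketch using the (?P<name>...) syntax on a date pattern. The group names year, month, and day are chosen for this example:

```python
import re

m = re.search(
    r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})",
    "Released on 2024-03-15.",
)

print(m.group("year"))  # 2024
print(m["month"])       # 03  (match objects also support [] lookup)
print(m.groupdict())    # {'year': '2024', 'month': '03', 'day': '15'}
```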

Once the regex has more than one group, naming them makes the extraction code much easier to follow.

Replacing With re.sub

re.sub(pattern, replacement, text) replaces every match:
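
For example, \D matches any character that is not a digit, so replacing it with an empty string leaves only the digits. The phone number is made up:

```python
import re

phone = "(555) 123-4567"  # hypothetical sample input

digits_only = re.sub(r"\D", "", phone)
print(digits_only)  # 5551234567
```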

That strips everything that isn't a digit. The replacement can reference captured groups with \1, \2, and so on; the \g<1> form (or \g<name> for a named group) is clearer, and it stays unambiguous when a literal digit follows the reference:
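
A sketch of group references in the replacement string, with made-up inputs:

```python
import re

# Swap "Last, First" into "First Last" using numbered references.
swapped = re.sub(r"(\w+), (\w+)", r"\g<2> \g<1>", "Lovelace, Ada")
print(swapped)  # Ada Lovelace

# Named groups can be referenced by name in the replacement.
iso = re.sub(
    r"(?P<day>\d{2})/(?P<month>\d{2})/(?P<year>\d{4})",
    r"\g<year>-\g<month>-\g<day>",
    "Due 15/03/2024",
)
print(iso)  # Due 2024-03-15

# \g<1>0 means "group 1, then a literal 0"; \10 would mean group 10.
padded = re.sub(r"(\d)", r"\g<1>0", "a1b2")
print(padded)  # a10b20
```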

You can also pass a function as the replacement. That's useful for transformations that aren't a simple swap:
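
The function receives each match object and returns the replacement string. A minimal sketch, where double is a hypothetical helper written for this example:

```python
import re

def double(match):
    """Return the matched number, doubled, as a string."""
    return str(int(match.group()) * 2)

result = re.sub(r"\d+", double, "3 apples and 10 pears")
print(result)  # 6 apples and 20 pears
```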

Compiling Patterns for Reuse

If you're using the same pattern many times — especially in a loop — compile it once:
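
A sketch of the idea, assuming a small made-up list of lines to scan:

```python
import re

# Compile once, outside the loop.
date_re = re.compile(r"\d{4}-\d{2}-\d{2}")

lines = ["start 2024-01-05", "no date here", "end 2024-02-10"]  # sample data

found = []
for line in lines:
    m = date_re.search(line)  # same as re.search(pattern, line)
    if m:
        found.append(m.group())

print(found)  # ['2024-01-05', '2024-02-10']
```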

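Compiled patterns expose the same methods (search, match, findall, sub) without the pattern as the first argument. The speed gain is usually small, because the re module already caches recently compiled patterns, but naming a pattern once often makes the code more readable.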

Flags

Common modifiers, passed as the flags= argument and combined with | when you need more than one:

  • re.IGNORECASE (re.I) — case-insensitive matching.
  • re.MULTILINE (re.M) — ^ and $ match at every line, not just the string boundaries.
  • re.DOTALL (re.S) — . matches newlines too (by default it doesn't).
  • re.VERBOSE (re.X) — allows whitespace and # comments in the pattern, for readability.
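
A short sketch of the first three flags in action, using made-up sample text:

```python
import re

# IGNORECASE: one pattern matches every capitalisation.
any_case = re.findall(r"python", "Python, PYTHON, python", flags=re.IGNORECASE)
print(any_case)  # ['Python', 'PYTHON', 'python']

text = "first line\nsecond line"

# MULTILINE: ^ matches at the start of each line, not only the string.
default_starts = re.findall(r"^\w+", text)
line_starts = re.findall(r"^\w+", text, flags=re.MULTILINE)
print(default_starts)  # ['first']
print(line_starts)     # ['first', 'second']

# DOTALL: . is allowed to cross the newline.
without_dotall = re.search(r"first.*second", text)
with_dotall = re.search(r"first.*second", text, flags=re.DOTALL)
print(without_dotall)       # None
print(with_dotall.group())  # the text from "first" through "second", newline included

# Flags combine with |.
combined = re.findall(r"^second", "First\nSECOND", flags=re.I | re.M)
print(combined)  # ['SECOND']
```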

re.VERBOSE is especially useful for complex patterns:
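
As an illustration, here is a sketch of a commented pattern for one hypothetical US-style phone format. The format and the name phone_re are assumptions for this example, not a general phone-number validator:

```python
import re

phone_re = re.compile(
    r"""
    \(?(\d{3})\)?   # area code, with optional parentheses
    [\s.-]?         # optional separator
    (\d{3})         # exchange
    [\s.-]?         # optional separator
    (\d{4})         # line number
    """,
    re.VERBOSE,
)

m = phone_re.search("Call (555) 123-4567 today")
print(m.groups())  # ('555', '123', '4567')
```

In verbose mode, whitespace in the pattern is ignored, so a literal space has to be written as \s, as [ ], or escaped with a backslash.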

Multi-line, commented regexes are vastly easier to maintain than opaque one-liners.

Greedy vs Lazy

Quantifiers (*, +, ?, {n,}) are greedy by default — they match as much as they can. Add a ? to make them lazy:
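
A minimal sketch of the difference, using a made-up HTML fragment:

```python
import re

html = "<b>bold</b> and <i>italic</i>"  # hypothetical sample text

greedy = re.search(r"<.+>", html).group()   # .+ runs to the last > it can reach
lazy = re.search(r"<.+?>", html).group()    # .+? stops at the first >
all_tags = re.findall(r"<.+?>", html)

print(greedy)    # <b>bold</b> and <i>italic</i>
print(lazy)      # <b>
print(all_tags)  # ['<b>', '</b>', '<i>', '</i>']
```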

On a string such as <b>bold</b> and <i>italic</i>, a greedy pattern like <.+> captures the whole substring from <b> to </i>, which is probably not what you wanted. The lazy <.+?> stops at the first >.

(Side note: don't actually parse HTML with regex in production. Use html.parser or BeautifulSoup. The example above is just to illustrate greediness.)

A Realistic Example

Parse lines from a simple log format:
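
Here is one way this could look. The log format (date, time, level in square brackets, message) and the sample lines are assumptions made for this sketch:

```python
import re

# Hypothetical format: "YYYY-MM-DD HH:MM:SS [LEVEL] message"
LOG_RE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}) "
    r"\[(?P<level>[A-Z]+)\] "
    r"(?P<message>.*)$"
)

log = """2024-03-15 10:02:11 [INFO] Server started
2024-03-15 10:02:15 [ERROR] Disk almost full
this line is malformed"""

entries = []
for line in log.splitlines():
    m = LOG_RE.match(line)
    if m:  # skip lines that do not fit the format
        entries.append(m.groupdict())

errors = [e["message"] for e in entries if e["level"] == "ERROR"]

print(len(entries))  # 2
print(errors)        # ['Disk almost full']
```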

That's a lot of power in not many lines.

A Few Habits

  • Always use raw strings for patterns.
  • Start with the simplest pattern that works; tighten later.
  • Use named groups once you have more than one group.
  • Compile patterns you'll reuse.
  • Regex is slower than plain string methods. If .split() will do, use .split().

Next: Errors and Debugging

That closes out the real-data tour — files, JSON, CSV, HTTP, dates, and regex. Anything in a real program sooner or later hits an error, though, and reading Python tracebacks well is the single biggest debugging skill you can pick up. The last chapter covers exceptions and the specific errors you'll meet most often.

Frequently Asked Questions

What is regex in Python?

A regular expression (regex) is a small language for describing patterns in text. Python's re module lets you search for patterns, extract pieces, and replace matches. It's overkill for simple string operations — use string methods like .split() or .replace() first — but unbeatable for structured pattern matching.

What's the difference between re.match and re.search?

re.match only matches at the start of the string. re.search scans the whole string and finds the first match anywhere. When in doubt, use search — it matches human intuition better.

Should I always use raw strings for regex patterns?

Yes. Regex patterns often contain backslashes (\d, \s, \b), which ordinary Python string literals may interpret as escape sequences; \b, for example, becomes a backspace character. Prefixing the literal with r, as in r'\d+', tells Python to keep the backslashes as written, which keeps the regex itself readable.
