A Subquery Is a SELECT Inside a SELECT
A subquery is exactly what it sounds like: a SELECT statement tucked inside another statement, wrapped in parentheses. SQLite runs the inner query, takes its result, and feeds it to the outer one.
Set up a small example we can reuse:
Five orders, four customers, two of whom haven't ordered anything. We'll use this throughout.
Subquery in WHERE: Filtering by a List
The most common shape: pull a list of ids in an inner query, then filter the outer query against it.
The inner query produces every customer_id that appears in orders. The outer query keeps only the customers whose id is in that list. Cleo and Boris and Ada show up; Dmitri (no orders) doesn't.
IN (SELECT ...) is the workhorse pattern for "rows in A that have a match in B." Mentally, read it as "where this column's value is one of the values returned by the inner query."
NOT IN: Mind the NULLs
The opposite question — "which customers haven't ordered?" — is one line away:
That works here. But NOT IN has a sharp edge: if the subquery ever returns a NULL, the whole NOT IN becomes NULL (which is not TRUE), and you get zero rows back. Surprising and silent.
The safe habit when using NOT IN against a column that might contain NULL:
Or use NOT EXISTS, which doesn't have this problem at all. We'll get to that.
Scalar Subqueries: One Row, One Column
A scalar subquery returns a single value — one row, one column — and you can use it anywhere a value is expected.
The inner SELECT MAX(total) FROM orders returns 200. The outer query then filters for orders matching that value. Useful any time you need to compare against an aggregate.
You can also use a scalar subquery in the SELECT list to attach a computed value to every row:
Each row of customers runs the inner query once, with customers.id plugged in. That's a correlated subquery — more on that below. For "one number per row" cases like this, a LEFT JOIN with GROUP BY usually performs better, but the scalar form reads beautifully.
EXISTS: Just Check Whether Anything Matches
EXISTS is the quieter cousin of IN. It doesn't care about values — it only checks whether the subquery returns any row. You typically write SELECT 1 inside, because the column doesn't matter.
This finds customers who have placed at least one order over 100. The inner query references c.id from the outer query — that's what makes it correlated. SQLite stops scanning the inner table the moment it finds a match, which is why EXISTS often outperforms IN for "does this row have a related row?" questions.
The negation, NOT EXISTS, is the NULL-safe way to ask "no related row":
Subquery in FROM: A Derived Table
A subquery can stand in wherever a table can — including the FROM clause. The inner query becomes a temporary, named "derived table" you can join, filter, or aggregate over.
The inner query computes a total per customer. The outer query averages those totals per country. Two-stage aggregations like this are exactly what derived tables are for — when you can't do everything in one GROUP BY.
The AS per_customer alias is required: every derived table needs a name.
Correlated Subqueries: Run Per Outer Row
A subquery is correlated when it references a column from the outer query. SQLite has to re-evaluate the inner query for every outer row, which is flexible but can get expensive.
For each customer, find their largest order. The inner query depends on customers.id, so it runs once per customer. Customers with no orders get NULL — which is what you'd want.
Correlated subqueries are the natural fit for "for each row in A, compute something from B." If the table's small or the lookup is indexed, they're fine. On big tables without supporting indexes, profile before shipping — a JOIN with GROUP BY is often faster.
Subquery vs JOIN: Which One?
These two queries answer the same question:
Both return the same rows. SQLite's optimiser frequently rewrites one form into the other internally. Pick based on readability:
- Use a subquery when you only need to filter, and you don't want columns from the inner table polluting the result.
- Use a JOIN when the result needs columns from both tables.
- Use EXISTS when you're asking "does at least one related row exist?" — it's clearer and avoids the
NULLtraps ofIN/NOT IN.
When in doubt, write the version that explains itself when read aloud.
A Common Pitfall: Subqueries Returning Multiple Rows
A subquery used with = must return at most one row. If it returns more, SQLite picks one (effectively at random) and you get silently wrong results — no error.
Use IN when the inner query could return multiple rows:
If you expect exactly one row and want to enforce it, add LIMIT 1 and an ORDER BY so the choice is at least deterministic. Better: write the query so a single row is guaranteed by the data (filter on a unique column).
Next: Common Table Expressions
Subqueries in FROM get unwieldy fast — especially when you need the same derived table twice, or when the nesting goes three levels deep. Common Table Expressions (WITH ... AS (...)) let you name a subquery up front and reference it by name in the rest of the statement. That's the next page.
Frequently Asked Questions
What is a subquery in SQLite?
A subquery is a SELECT statement nested inside another statement, wrapped in parentheses. SQLite runs the inner query and feeds its result to the outer one. Subqueries can appear in WHERE, FROM, SELECT, and several other clauses.
What's the difference between IN and EXISTS in SQLite?
IN (SELECT ...) checks whether a value matches any row the subquery returns. EXISTS (SELECT ...) just checks whether the subquery produces any row at all — it doesn't care about the values. EXISTS is usually the better choice when the inner query references the outer row (a correlated subquery).
Should I use a subquery or a JOIN in SQLite?
Use a JOIN when you need columns from both tables in the result. Use a subquery when you only need to filter or compute a single value. SQLite's optimiser often rewrites one form into the other anyway, so pick whichever reads more clearly.
What is a correlated subquery in SQLite?
A correlated subquery references a column from the outer query, so it has to be re-evaluated for every outer row. They're flexible but can be slow on large tables. If a correlated subquery shows up as a hot spot, rewriting it as a JOIN or a CTE often helps.