Runnable SQLite Docs: Subqueries

Q: What is a subquery in SQLite?

A subquery is a SELECT statement nested inside another statement, wrapped in parentheses. SQLite runs the inner query and feeds its result to the outer one. Subqueries can appear in WHERE, FROM, SELECT, and several other clauses.

Q: What's the difference between IN and EXISTS in SQLite?

IN (SELECT ...) checks whether a value matches any row the subquery returns. EXISTS (SELECT ...) just checks whether the subquery produces any row at all - it doesn't care about the values. EXISTS is usually the better choice when the inner query references the outer row (a correlated subquery).

Q: Should I use a subquery or a JOIN in SQLite?

Use a JOIN when you need columns from both tables in the result. Use a subquery when you only need to filter or compute a single value. SQLite's optimiser often rewrites one form into the other anyway, so pick whichever reads more clearly.

A Subquery Is a SELECT Inside a SELECT

A subquery is exactly what it sounds like: a SELECT statement tucked inside another statement, wrapped in parentheses. SQLite runs the inner query, takes its result, and feeds it to the outer one.

Set up a small example we can reuse:

CREATE TABLE customers (
    id INTEGER PRIMARY KEY,
    name TEXT,
    country TEXT
);

CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(id),
    total REAL
);

INSERT INTO customers (id, name, country) VALUES
    (1, 'Ada',   'UK'),
    (2, 'Boris', 'FR'),
    (3, 'Cleo',  'UK'),
    (4, 'Dmitri','DE');

INSERT INTO orders (customer_id, total) VALUES
    (1, 120), (1, 80), (2, 50), (3, 200), (3, 30);

SELECT * FROM orders;

Five orders, four customers, two of whom haven't ordered anything. We'll use this throughout.

Subquery in WHERE: Filtering by a List

The most common shape: pull a list of ids in an inner query, then filter the outer query against it.

CREATE TABLE customers (
    id INTEGER PRIMARY KEY,
    name TEXT,
    country TEXT
);

CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(id),
    total REAL
);

INSERT INTO customers (id, name, country) VALUES
    (1, 'Ada',   'UK'),
    (2, 'Boris', 'FR'),
    (3, 'Cleo',  'UK'),
    (4, 'Dmitri','DE');

INSERT INTO orders (customer_id, total) VALUES
    (1, 120), (1, 80), (2, 50), (3, 200), (3, 30);

SELECT name
FROM customers
WHERE id IN (SELECT customer_id FROM orders);

The inner query produces every customer_id that appears in orders. The outer query keeps only the customers whose id is in that list. Cleo and Boris and Ada show up; Dmitri (no orders) doesn't.

IN (SELECT ...) is the workhorse pattern for "rows in A that have a match in B." Mentally, read it as "where this column's value is one of the values returned by the inner query."

NOT IN: Mind the NULLs

The opposite question - "which customers haven't ordered?" - is one line away:

CREATE TABLE customers (
    id INTEGER PRIMARY KEY,
    name TEXT,
    country TEXT
);

CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(id),
    total REAL
);

INSERT INTO customers (id, name, country) VALUES
    (1, 'Ada',   'UK'),
    (2, 'Boris', 'FR'),
    (3, 'Cleo',  'UK'),
    (4, 'Dmitri','DE');

INSERT INTO orders (customer_id, total) VALUES
    (1, 120), (1, 80), (2, 50), (3, 200), (3, 30);

SELECT name
FROM customers
WHERE id NOT IN (SELECT customer_id FROM orders);

That works here. But NOT IN has a sharp edge: if the subquery ever returns a NULL, the whole NOT IN becomes NULL (which is not TRUE), and you get zero rows back. Surprising and silent.

The safe habit when using NOT IN against a column that might contain NULL:

CREATE TABLE customers (
    id INTEGER PRIMARY KEY,
    name TEXT,
    country TEXT
);

CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(id),
    total REAL
);

INSERT INTO customers (id, name, country) VALUES
    (1, 'Ada',   'UK'),
    (2, 'Boris', 'FR'),
    (3, 'Cleo',  'UK'),
    (4, 'Dmitri','DE');

INSERT INTO orders (customer_id, total) VALUES
    (1, 120), (1, 80), (2, 50), (3, 200), (3, 30);

SELECT name
FROM customers
WHERE id NOT IN (
    SELECT customer_id FROM orders WHERE customer_id IS NOT NULL
);

Or use NOT EXISTS, which doesn't have this problem at all. We'll get to that.

Scalar Subqueries: One Row, One Column

A scalar subquery returns a single value - one row, one column - and you can use it anywhere a value is expected.

CREATE TABLE customers (
    id INTEGER PRIMARY KEY,
    name TEXT,
    country TEXT
);

CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(id),
    total REAL
);

INSERT INTO customers (id, name, country) VALUES
    (1, 'Ada',   'UK'),
    (2, 'Boris', 'FR'),
    (3, 'Cleo',  'UK'),
    (4, 'Dmitri','DE');

INSERT INTO orders (customer_id, total) VALUES
    (1, 120), (1, 80), (2, 50), (3, 200), (3, 30);

SELECT name, total
FROM orders
JOIN customers ON customers.id = orders.customer_id
WHERE total = (SELECT MAX(total) FROM orders);

The inner SELECT MAX(total) FROM orders returns 200. The outer query then filters for orders matching that value. Useful any time you need to compare against an aggregate.

You can also use a scalar subquery in the SELECT list to attach a computed value to every row:

CREATE TABLE customers (
    id INTEGER PRIMARY KEY,
    name TEXT,
    country TEXT
);

CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(id),
    total REAL
);

INSERT INTO customers (id, name, country) VALUES
    (1, 'Ada',   'UK'),
    (2, 'Boris', 'FR'),
    (3, 'Cleo',  'UK'),
    (4, 'Dmitri','DE');

INSERT INTO orders (customer_id, total) VALUES
    (1, 120), (1, 80), (2, 50), (3, 200), (3, 30);

SELECT
    name,
    (SELECT COUNT(*) FROM orders WHERE customer_id = customers.id) AS order_count
FROM customers;

Each row of customers runs the inner query once, with customers.id plugged in. That's a correlated subquery - more on that below. For "one number per row" cases like this, a LEFT JOIN with GROUP BY usually performs better, but the scalar form reads beautifully.

EXISTS: Just Check Whether Anything Matches

EXISTS is the quieter cousin of IN. It doesn't care about values - it only checks whether the subquery returns any row. You typically write SELECT 1 inside, because the column doesn't matter.

CREATE TABLE customers (
    id INTEGER PRIMARY KEY,
    name TEXT,
    country TEXT
);

CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(id),
    total REAL
);

INSERT INTO customers (id, name, country) VALUES
    (1, 'Ada',   'UK'),
    (2, 'Boris', 'FR'),
    (3, 'Cleo',  'UK'),
    (4, 'Dmitri','DE');

INSERT INTO orders (customer_id, total) VALUES
    (1, 120), (1, 80), (2, 50), (3, 200), (3, 30);

SELECT name
FROM customers c
WHERE EXISTS (
    SELECT 1
    FROM orders o
    WHERE o.customer_id = c.id
      AND o.total > 100
);

This finds customers who have placed at least one order over 100. The inner query references c.id from the outer query - that's what makes it correlated. SQLite stops scanning the inner table the moment it finds a match, which is why EXISTS often outperforms IN for "does this row have a related row?" questions.

The negation, NOT EXISTS, is the NULL-safe way to ask "no related row":

CREATE TABLE customers (
    id INTEGER PRIMARY KEY,
    name TEXT,
    country TEXT
);

CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(id),
    total REAL
);

INSERT INTO customers (id, name, country) VALUES
    (1, 'Ada',   'UK'),
    (2, 'Boris', 'FR'),
    (3, 'Cleo',  'UK'),
    (4, 'Dmitri','DE');

INSERT INTO orders (customer_id, total) VALUES
    (1, 120), (1, 80), (2, 50), (3, 200), (3, 30);

SELECT name
FROM customers c
WHERE NOT EXISTS (
    SELECT 1 FROM orders o WHERE o.customer_id = c.id
);

Subquery in FROM: A Derived Table

A subquery can stand in wherever a table can - including the FROM clause. The inner query becomes a temporary, named "derived table" you can join, filter, or aggregate over.

CREATE TABLE customers (
    id INTEGER PRIMARY KEY,
    name TEXT,
    country TEXT
);

CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(id),
    total REAL
);

INSERT INTO customers (id, name, country) VALUES
    (1, 'Ada',   'UK'),
    (2, 'Boris', 'FR'),
    (3, 'Cleo',  'UK'),
    (4, 'Dmitri','DE');

INSERT INTO orders (customer_id, total) VALUES
    (1, 120), (1, 80), (2, 50), (3, 200), (3, 30);

SELECT country, AVG(customer_total) AS avg_per_customer
FROM (
    SELECT c.country, c.id, SUM(o.total) AS customer_total
    FROM customers c
    JOIN orders o ON o.customer_id = c.id
    GROUP BY c.id
) AS per_customer
GROUP BY country;

The inner query computes a total per customer. The outer query averages those totals per country. Two-stage aggregations like this are exactly what derived tables are for - when you can't do everything in one GROUP BY.

The AS per_customer alias is required: every derived table needs a name.

Correlated Subqueries: Run Per Outer Row

A subquery is correlated when it references a column from the outer query. SQLite has to re-evaluate the inner query for every outer row, which is flexible but can get expensive.

CREATE TABLE customers (
    id INTEGER PRIMARY KEY,
    name TEXT,
    country TEXT
);

CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(id),
    total REAL
);

INSERT INTO customers (id, name, country) VALUES
    (1, 'Ada',   'UK'),
    (2, 'Boris', 'FR'),
    (3, 'Cleo',  'UK'),
    (4, 'Dmitri','DE');

INSERT INTO orders (customer_id, total) VALUES
    (1, 120), (1, 80), (2, 50), (3, 200), (3, 30);

SELECT name,
       (SELECT MAX(total)
        FROM orders
        WHERE customer_id = customers.id) AS biggest_order
FROM customers;

For each customer, find their largest order. The inner query depends on customers.id, so it runs once per customer. Customers with no orders get NULL - which is what you'd want.

Correlated subqueries are the natural fit for "for each row in A, compute something from B." If the table's small or the lookup is indexed, they're fine. On big tables without supporting indexes, profile before shipping - a JOIN with GROUP BY is often faster.

Subquery vs JOIN: Which One?

These two queries answer the same question:

CREATE TABLE customers (
    id INTEGER PRIMARY KEY,
    name TEXT,
    country TEXT
);

CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(id),
    total REAL
);

INSERT INTO customers (id, name, country) VALUES
    (1, 'Ada',   'UK'),
    (2, 'Boris', 'FR'),
    (3, 'Cleo',  'UK'),
    (4, 'Dmitri','DE');

INSERT INTO orders (customer_id, total) VALUES
    (1, 120), (1, 80), (2, 50), (3, 200), (3, 30);

-- Subquery
SELECT name
FROM customers
WHERE id IN (SELECT customer_id FROM orders WHERE total > 100);

-- JOIN
SELECT DISTINCT c.name
FROM customers c
JOIN orders o ON o.customer_id = c.id
WHERE o.total > 100;

Both return the same rows. SQLite's optimiser frequently rewrites one form into the other internally. Pick based on readability:

Use a subquery when you only need to filter, and you don't want columns from the inner table polluting the result.
Use a JOIN when the result needs columns from both tables.
Use EXISTS when you're asking "does at least one related row exist?" - it's clearer and avoids the NULL traps of IN/NOT IN.

When in doubt, write the version that explains itself when read aloud.

A Common Pitfall: Subqueries Returning Multiple Rows

A subquery used with = must return at most one row. If it returns more, SQLite picks one (effectively at random) and you get silently wrong results - no error.

CREATE TABLE customers (
    id INTEGER PRIMARY KEY,
    name TEXT,
    country TEXT
);

CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(id),
    total REAL
);

INSERT INTO customers (id, name, country) VALUES
    (1, 'Ada',   'UK'),
    (2, 'Boris', 'FR'),
    (3, 'Cleo',  'UK'),
    (4, 'Dmitri','DE');

INSERT INTO orders (customer_id, total) VALUES
    (1, 120), (1, 80), (2, 50), (3, 200), (3, 30);

-- Risky: what if more than one customer is from the UK?
SELECT * FROM orders
WHERE customer_id = (SELECT id FROM customers WHERE country = 'UK');

Use IN when the inner query could return multiple rows:

CREATE TABLE customers (
    id INTEGER PRIMARY KEY,
    name TEXT,
    country TEXT
);

CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(id),
    total REAL
);

INSERT INTO customers (id, name, country) VALUES
    (1, 'Ada',   'UK'),
    (2, 'Boris', 'FR'),
    (3, 'Cleo',  'UK'),
    (4, 'Dmitri','DE');

INSERT INTO orders (customer_id, total) VALUES
    (1, 120), (1, 80), (2, 50), (3, 200), (3, 30);

SELECT * FROM orders
WHERE customer_id IN (SELECT id FROM customers WHERE country = 'UK');

If you expect exactly one row and want to enforce it, add LIMIT 1 and an ORDER BY so the choice is at least deterministic. Better: write the query so a single row is guaranteed by the data (filter on a unique column).

Next: Common Table Expressions

Subqueries in FROM get unwieldy fast - especially when you need the same derived table twice, or when the nesting goes three levels deep. Common Table Expressions (WITH ... AS (...)) let you name a subquery up front and reference it by name in the rest of the statement. That's the next page.

Frequently Asked Questions

What is a subquery in SQLite?

A subquery is a SELECT statement nested inside another statement, wrapped in parentheses. SQLite runs the inner query and feeds its result to the outer one. Subqueries can appear in WHERE, FROM, SELECT, and several other clauses.

What's the difference between IN and EXISTS in SQLite?

IN (SELECT ...) checks whether a value matches any row the subquery returns. EXISTS (SELECT ...) just checks whether the subquery produces any row at all - it doesn't care about the values. EXISTS is usually the better choice when the inner query references the outer row (a correlated subquery).

Should I use a subquery or a JOIN in SQLite?

Use a JOIN when you need columns from both tables in the result. Use a subquery when you only need to filter or compute a single value. SQLite's optimiser often rewrites one form into the other anyway, so pick whichever reads more clearly.

What is a correlated subquery in SQLite?

A correlated subquery references a column from the outer query, so it has to be re-evaluated for every outer row. They're flexible but can be slow on large tables. If a correlated subquery shows up as a hot spot, rewriting it as a JOIN or a CTE often helps.

Related concepts

SQLite Subqueries: Nested SELECTs in WHERE, FROM, and SELECT