SQL

SQL Q&A (Beginner to Advanced)

Fundamentals

Q1. What is SQL?

SQL (Structured Query Language) is the standard language for managing and querying data in relational database management systems (RDBMS). It is declarative — you describe what you want, not how to fetch it.

Q2. What are the main categories of SQL statements?

  • DDL (Data Definition Language): CREATE, ALTER, DROP, TRUNCATE
  • DML (Data Manipulation Language): SELECT, INSERT, UPDATE, DELETE
  • DCL (Data Control Language): GRANT, REVOKE
  • TCL (Transaction Control Language): COMMIT, ROLLBACK, SAVEPOINT

Q3. What is a relational database?

A database that organizes data into tables (relations) made of rows and columns, where relationships between tables are expressed via keys.

Q4. What is a table, row, and column?

A table is a collection of related data. A row (tuple/record) is a single entry. A column (attribute/field) defines a data type and meaning for a value in each row.

Q5. Write a basic SELECT statement.

SELECT first_name, last_name FROM employees;

Q6. How do you select all columns from a table?

SELECT * FROM employees;

Avoid SELECT * in production code — it hurts performance and breaks when the schema changes.

Q7. What is the difference between WHERE and HAVING?

WHERE filters rows before grouping/aggregation. HAVING filters after GROUP BY, allowing conditions on aggregate functions.

Q8. What is the logical order of execution of a SELECT query?

FROMJOINWHEREGROUP BYHAVINGSELECTDISTINCTORDER BYLIMIT/OFFSET. (Written order differs from execution order.)

Q9. What is NULL in SQL?

NULL represents a missing or unknown value. It is not equal to 0 or an empty string, and any comparison with NULL using = yields UNKNOWN.

Q10. How do you test for NULL?

SELECT * FROM employees WHERE manager_id IS NULL;
SELECT * FROM employees WHERE manager_id IS NOT NULL;

Q11. What is the difference between DELETE, TRUNCATE, and DROP?

  • DELETE removes rows (DML, can have WHERE, logged, can rollback).
  • TRUNCATE removes all rows fast (DDL, resets identity, usually can't filter, minimal logging).
  • DROP removes the entire table/object structure.

Q12. What is the difference between CHAR and VARCHAR?

CHAR(n) is fixed length, padded with spaces. VARCHAR(n) is variable length, storing only the actual characters plus length info.

Q13. How do you insert a row?

INSERT INTO employees (first_name, last_name, salary)
VALUES ('Ada', 'Lovelace', 90000);

Q14. How do you insert multiple rows at once?

INSERT INTO employees (first_name, salary) VALUES
  ('Ada', 90000),
  ('Alan', 85000),
  ('Grace', 95000);

Q15. How do you update existing data?

UPDATE employees SET salary = salary * 1.10 WHERE department_id = 3;

Always include a WHERE clause unless you intend to update every row.

Q16. How do you delete rows?

DELETE FROM employees WHERE termination_date IS NOT NULL;

Q17. What is a primary key?

A column or set of columns that uniquely identifies each row. It cannot contain NULL and must be unique. A table has at most one primary key.

Q18. What is a foreign key?

A column (or columns) that references the primary key of another table, enforcing referential integrity between the two tables.

Q19. What is a composite key?

A primary or unique key made up of more than one column.

Q20. What is a candidate key vs. a surrogate key?

A candidate key is any column set that could uniquely identify a row (natural key). A surrogate key is an artificial identifier (e.g., auto-increment id) with no business meaning.

Q21. What is the difference between DISTINCT and GROUP BY?

DISTINCT removes duplicate rows from the result set. GROUP BY groups rows to apply aggregate functions. GROUP BY without aggregates behaves similarly to DISTINCT but is more flexible.

Q22. How do you rename a column in the output?

SELECT salary AS monthly_salary, salary * 12 AS annual_salary FROM employees;

Q23. How do you concatenate strings?

  • ANSI/PostgreSQL/Oracle: first_name || ' ' || last_name
  • SQL Server: first_name + ' ' + last_name or CONCAT(...)
  • MySQL: CONCAT(first_name, ' ', last_name)

Q24. What is the difference between = and LIKE?

= matches exact values. LIKE matches patterns using wildcards % (any sequence) and _ (single character).

Q25. How do you filter with a pattern?

SELECT * FROM employees WHERE last_name LIKE 'S%';   -- starts with S
SELECT * FROM employees WHERE email LIKE '%@gmail.com';

Q26. What is the IN operator?

A shorthand for multiple OR conditions:

SELECT * FROM employees WHERE department_id IN (1, 2, 5);

Q27. What is the BETWEEN operator?

Inclusive range filter:

SELECT * FROM employees WHERE salary BETWEEN 50000 AND 80000;

Q28. How do you sort results?

SELECT * FROM employees ORDER BY salary DESC, last_name ASC;

Q29. How do you limit the number of rows returned?

  • PostgreSQL/MySQL: LIMIT 10
  • SQL Server: SELECT TOP 10 ... or OFFSET 0 ROWS FETCH NEXT 10 ROWS ONLY
  • Oracle 12c+: FETCH FIRST 10 ROWS ONLY

Q30. What is a comment in SQL?

-- single line comment

/* multi
   line comment */

Filtering, Sorting & Functions

Q31. What are the common comparison operators?

=, <> (or !=), <, >, <=, >=.

Q32. What is the difference between AND and OR precedence?

AND binds tighter than OR. Use parentheses to make intent explicit:

WHERE department_id = 1 AND (salary > 50000 OR bonus > 5000)

Q33. How do you handle NULLs in calculations?

Use COALESCE to substitute a default:

SELECT COALESCE(bonus, 0) + salary AS total_comp FROM employees;

Q34. What is the difference between COALESCE and ISNULL/NVL?

COALESCE is ANSI standard and accepts many arguments, returning the first non-null. ISNULL (SQL Server) and NVL (Oracle) take exactly two arguments.

Q35. What does NULLIF do?

Returns NULL if the two arguments are equal; otherwise returns the first. Useful to avoid divide-by-zero:

SELECT total / NULLIF(count, 0) FROM stats;

Q36. What is a CASE expression?

Conditional logic in queries:

SELECT name,
  CASE WHEN salary > 100000 THEN 'High'
       WHEN salary > 50000  THEN 'Mid'
       ELSE 'Low' END AS band
FROM employees;

Q37. What's the difference between simple and searched CASE?

Simple CASE expr WHEN value THEN ... compares one expression to values. Searched CASE WHEN condition THEN ... evaluates arbitrary boolean conditions.

Q38. Common string functions?

UPPER, LOWER, LENGTH=/=LEN, SUBSTRING, TRIM, LTRIM, RTRIM, REPLACE, LEFT, RIGHT, POSITION=/=CHARINDEX.

Q39. How do you extract part of a string?

SELECT SUBSTRING(phone FROM 1 FOR 3) FROM contacts;  -- ANSI
SELECT SUBSTRING(phone, 1, 3) FROM contacts;          -- common dialect form

Q40. Common numeric functions?

ABS, ROUND, CEIL=/=CEILING, FLOOR, MOD, POWER, SQRT, TRUNC.

Q41. How do you round to two decimals?

SELECT ROUND(price, 2) FROM products;

Q42. How do you get the current date/time?

  • PostgreSQL: CURRENT_DATE, CURRENT_TIMESTAMP, NOW()
  • MySQL: CURDATE(), NOW()
  • SQL Server: GETDATE(), SYSDATETIME()
  • Oracle: SYSDATE, CURRENT_TIMESTAMP

Q43. How do you compute the difference between two dates?

  • PostgreSQL: end_date - start_date (returns days/interval)
  • MySQL: DATEDIFF(end_date, start_date)
  • SQL Server: DATEDIFF(DAY, start_date, end_date)

Q44. How do you extract the year from a date?

SELECT EXTRACT(YEAR FROM hire_date) FROM employees;  -- ANSI/PostgreSQL
SELECT YEAR(hire_date) FROM employees;                -- MySQL/SQL Server

Q45. How do you add days/months to a date?

  • PostgreSQL: hire_date + INTERVAL '30 days'
  • MySQL: DATE_ADD(hire_date, INTERVAL 30 DAY)
  • SQL Server: DATEADD(DAY, 30, hire_date)
  • Oracle: hire_date + 30 or ADD_MONTHS(hire_date, 1)

Q46. How do you cast/convert data types?

SELECT CAST(salary AS VARCHAR(20)) FROM employees;       -- ANSI
SELECT CAST('2024-01-01' AS DATE);
SELECT salary::TEXT FROM employees;                       -- PostgreSQL shorthand

Q47. What is the difference between ORDER BY column and ORDER BY position?

You can order by output column position number (ORDER BY 2), but it's fragile — prefer column names or aliases for readability.

Q48. How do NULLs sort?

By default, PostgreSQL/Oracle sort NULLs last in ASC (configurable with NULLS FIRST/LAST); MySQL/SQL Server sort NULLs first in ASC. Use explicit NULLS FIRST=/=NULLS LAST where supported.

Q49. How do you implement pagination?

SELECT * FROM products ORDER BY id LIMIT 20 OFFSET 40;  -- page 3, 20 per page

For large offsets, prefer keyset pagination (WHERE id > :last_id).

Q50. What is the EXISTS operator?

Returns true if a subquery returns any rows. Often faster than IN for correlated checks:

SELECT * FROM customers c
WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id);

Q51. How do you filter case-insensitively?

SELECT * FROM employees WHERE LOWER(last_name) = 'smith';
SELECT * FROM employees WHERE last_name ILIKE 'smith';  -- PostgreSQL

Q52. What does the ANY/SOME and ALL keyword do?

Compare a value against a set:

SELECT * FROM products WHERE price > ALL (SELECT price FROM clearance);
SELECT * FROM products WHERE price > ANY (SELECT price FROM clearance);

Q53. How do you escape a wildcard in LIKE?

SELECT * FROM files WHERE name LIKE '100\%' ESCAPE '\';

Q54. What is the difference between COUNT(*), COUNT(column), and COUNT(DISTINCT column)?

COUNT(*) counts all rows. COUNT(column) counts non-null values. COUNT(DISTINCT column) counts unique non-null values.

Q55. How do you concatenate multiple rows into one string?

  • PostgreSQL: STRING_AGG(name, ', ')
  • MySQL: GROUP_CONCAT(name SEPARATOR ', ')
  • SQL Server: STRING_AGG(name, ', ')
  • Oracle: LISTAGG(name, ', ') WITHIN GROUP (ORDER BY name)

Joins

Q56. What is a JOIN?

A JOIN combines rows from two or more tables based on a related column.

Q57. What is an INNER JOIN?

Returns only rows where the join condition matches in both tables:

SELECT e.name, d.name
FROM employees e
INNER JOIN departments d ON e.department_id = d.id;

Q58. What is a LEFT (OUTER) JOIN?

Returns all rows from the left table and matching rows from the right; unmatched right-side columns are NULL.

Q59. What is a RIGHT JOIN?

The mirror of LEFT JOIN — all rows from the right table, matched or not.

Q60. What is a FULL OUTER JOIN?

Returns all rows from both tables, with NULL where there's no match on either side. (MySQL lacks it natively; emulate with LEFT JOIN UNION RIGHT JOIN.)

Q61. What is a CROSS JOIN?

Produces the Cartesian product — every row of A combined with every row of B.

Q62. What is a self join?

A table joined to itself, typically to compare rows or model hierarchies:

SELECT e.name AS employee, m.name AS manager
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.id;

Q63. What is the difference between an equi-join and a non-equi join?

An equi-join uses = in the condition. A non-equi join uses other operators (<, >, BETWEEN), e.g., joining to a salary-band range table.

Q64. How do you join more than two tables?

Chain JOIN clauses:

SELECT o.id, c.name, p.title
FROM orders o
JOIN customers c ON o.customer_id = c.id
JOIN order_items oi ON oi.order_id = o.id
JOIN products p ON oi.product_id = p.id;

Q65. What's the difference between ON and WHERE in an outer join?

A condition in ON of a LEFT JOIN filters which right rows match (preserving all left rows). The same condition in WHERE filters the final result and can turn the outer join into an inner join.

Q66. What is USING in a join?

A shorthand when the join columns share the same name:

SELECT * FROM employees JOIN departments USING (department_id);

Q67. What is a natural join?

Automatically joins on all columns with the same name in both tables. Avoid it — it's implicit and fragile to schema changes.

Q68. How do you find rows in table A with no match in table B?

Anti-join via LEFT JOIN + IS NULL:

SELECT c.* FROM customers c
LEFT JOIN orders o ON o.customer_id = c.id
WHERE o.id IS NULL;

Q69. Why might a JOIN return more rows than expected?

A one-to-many or many-to-many relationship multiplies rows. Duplicate keys on the joined side fan out the result. Use DISTINCT or aggregate appropriately.

Q70. What is the difference between explicit and implicit join syntax?

Explicit uses JOIN ... ON. Implicit lists tables in FROM with conditions in WHERE (FROM a, b WHERE a.id = b.a_id). Prefer explicit — it's clearer and avoids accidental cross joins.

Q71. How do you join on multiple columns?

JOIN ON a.region = b.region AND a.year = b.year

Q72. Can you join a table to a subquery?

Yes — a derived table:

SELECT e.name, t.total
FROM employees e
JOIN (SELECT employee_id, SUM(amount) total FROM sales GROUP BY employee_id) t
  ON t.employee_id = e.id;

Q73. What is a lateral join?

A LATERAL (PostgreSQL) / CROSS APPLY / OUTER APPLY (SQL Server) lets a subquery reference columns from preceding tables in the FROM clause — useful for top-N-per-group.

Q74. How do you get the top N rows per group using a join?

Use LATERAL=/=APPLY or a window function (see Q132). Example with LATERAL:

SELECT d.name, top.name, top.salary
FROM departments d
CROSS JOIN LATERAL (
  SELECT name, salary FROM employees e
  WHERE e.department_id = d.id ORDER BY salary DESC LIMIT 3
) top;

Q75. What performance considerations apply to joins?

Ensure join columns are indexed, join on the smallest necessary result set, prefer inner joins when semantics allow, and watch for unintended row multiplication. The optimizer chooses join algorithms (nested loop, hash, merge) based on statistics.

Aggregation & Grouping

Q76. What are aggregate functions?

Functions that compute a single value over a set of rows: COUNT, SUM, AVG, MIN, MAX.

Q77. How does GROUP BY work?

It collapses rows sharing the same grouping-column values into one row per group, over which aggregates are computed:

SELECT department_id, AVG(salary)
FROM employees GROUP BY department_id;

Q78. Can you select a non-aggregated column not in GROUP BY?

No (in standard SQL and most engines) — every selected column must be aggregated or appear in GROUP BY. MySQL historically allowed it with surprising results (ONLY_FULL_GROUP_BY now blocks it).

Q79. How do you filter groups?

With HAVING:

SELECT department_id, COUNT(*) AS cnt
FROM employees GROUP BY department_id
HAVING COUNT(*) > 5;

Q80. How do AVG/SUM treat NULLs?

They ignore NULLs. AVG divides by the count of non-null values, which may differ from total row count.

Q81. What is GROUP BY with multiple columns?

Creates a group for each unique combination of those column values:

SELECT department_id, job_title, COUNT(*)
FROM employees GROUP BY department_id, job_title;

Q82. What are GROUPING SETS, ROLLUP, and CUBE?

Extensions to compute multiple grouping levels in one query. ROLLUP produces subtotals + grand total along a hierarchy; CUBE produces all combinations; GROUPING SETS lets you list exactly which groupings you want.

SELECT region, product, SUM(sales)
FROM orders GROUP BY ROLLUP (region, product);

Q83. What does the GROUPING() function do?

Returns 1 if the column is aggregated (a subtotal/grand-total row from ROLLUP/CUBE) and 0 otherwise — useful for labeling subtotal rows.

Q84. How do you count rows per group and the percentage of total?

SELECT department_id,
       COUNT(*) AS cnt,
       COUNT(*) * 100.0 / SUM(COUNT(*)) OVER () AS pct
FROM employees GROUP BY department_id;

Q85. What is the difference between WHERE and HAVING performance-wise?

WHERE reduces rows before aggregation (cheaper). Push conditions into WHERE whenever they don't depend on aggregates.

Q86. How do you find duplicates?

SELECT email, COUNT(*) FROM users
GROUP BY email HAVING COUNT(*) > 1;

Q87. How do you compute conditional aggregates?

Use CASE inside the aggregate:

SELECT
  SUM(CASE WHEN status = 'paid' THEN amount ELSE 0 END) AS paid_total,
  COUNT(CASE WHEN status = 'open' THEN 1 END) AS open_count
FROM invoices;

PostgreSQL supports COUNT(*) FILTER (WHERE status = 'open').

Q88. How do you compute a median?

SELECT PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY salary) FROM employees;

Q89. What is the difference between MIN/MAX and ORDER BY … LIMIT 1?

MIN=/=MAX return a single aggregate value; ORDER BY ... LIMIT 1 returns the full row. To get the whole row of the max value, use ORDER BY ... LIMIT 1 or a window function.

Q90. Can you nest aggregate functions?

Not directly (e.g., MAX(AVG(x)) requires a subquery or window). You compute the inner aggregate in a grouped subquery, then aggregate the result.

Subqueries & Set Operations

Q91. What is a subquery?

A query nested inside another query, used in SELECT, FROM, WHERE, or HAVING.

Q92. What is the difference between a correlated and non-correlated subquery?

A non-correlated subquery runs once independently. A correlated subquery references the outer query and runs (logically) per outer row.

Q93. Give an example of a scalar subquery.

Returns a single value:

SELECT name, salary,
  (SELECT AVG(salary) FROM employees) AS avg_salary
FROM employees;

Q94. How do you use a subquery in WHERE?

SELECT * FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);

Q95. What is a derived table?

A subquery in the FROM clause treated as a temporary table; it must have an alias.

Q96. What is the difference between IN and EXISTS performance?

EXISTS can short-circuit on the first match and often performs better for correlated checks and large inner sets. IN may be better for small, static lists. Modern optimizers frequently treat them similarly. Beware NOT IN with NULLs (see Q97).

Q97. What's the danger of NOT IN with NULLs?

If the subquery returns any NULL, NOT IN yields no rows because the comparison becomes UNKNOWN. Prefer NOT EXISTS or filter out NULLs.

Q98. What is UNION vs UNION ALL?

UNION combines result sets and removes duplicates (sorts/hashes — more expensive). UNION ALL keeps all rows including duplicates and is faster. Use UNION ALL unless you need dedup.

Q99. What are the rules for set operations?

Each query must have the same number of columns with compatible data types in the same order. Column names come from the first query.

Q100. What is INTERSECT?

Returns rows present in both result sets. (MySQL added it in 8.0.31; older versions emulate via joins.)

Q101. What is EXCEPT / MINUS?

Returns rows in the first set not in the second. EXCEPT (ANSI/PostgreSQL/SQL Server), MINUS (Oracle).

Q102. How do you replace a value with a default using a subquery?

SELECT name, COALESCE((SELECT MAX(amount) FROM orders o WHERE o.cust_id = c.id), 0) FROM customers c;

Q103. Can a subquery return multiple columns?

Yes, in row-value comparisons or in the FROM clause. In a scalar context it must return one column and at most one row.

Q104. How do you find the second-highest salary?

SELECT MAX(salary) FROM employees
WHERE salary < (SELECT MAX(salary) FROM employees);

Or with window functions / OFFSET 1 LIMIT 1.

Q105. What is a semi-join and anti-join?

A semi-join returns rows from the left table that have a match on the right (logically EXISTS=/=IN). An anti-join returns left rows with no match (NOT EXISTS=/=NOT IN).

DDL, Constraints & Schema Design

Q106. How do you create a table?

CREATE TABLE employees (
  id          INT PRIMARY KEY,
  first_name  VARCHAR(50) NOT NULL,
  salary      NUMERIC(10,2) DEFAULT 0,
  dept_id     INT REFERENCES departments(id)
);

Q107. What are the main constraint types?

PRIMARY KEY, FOREIGN KEY, UNIQUE, NOT NULL, CHECK, DEFAULT.

Q108. What does a CHECK constraint do?

Enforces a boolean condition on column values:

salary NUMERIC CHECK (salary >= 0)

Q109. What is a UNIQUE constraint, and how does it differ from a primary key?

UNIQUE ensures values are distinct but allows one (or, by ANSI rules, multiple) NULLs and you can have several per table. A primary key disallows NULLs and there's only one per table.

Q110. How do you add a column to an existing table?

ALTER TABLE employees ADD COLUMN email VARCHAR(255);

Q111. How do you drop a column?

ALTER TABLE employees DROP COLUMN email;

Q112. How do you add a foreign key after creation?

ALTER TABLE employees
ADD CONSTRAINT fk_dept FOREIGN KEY (dept_id) REFERENCES departments(id);

Q113. What are referential actions ON DELETE / ON UPDATE?

CASCADE, SET NULL, SET DEFAULT, RESTRICT, NO ACTION — define what happens to child rows when a referenced parent row is deleted/updated.

Q114. What is an auto-increment / identity column?

A column that auto-generates sequential values: SERIAL=/=GENERATED ... AS IDENTITY (PostgreSQL), AUTO_INCREMENT (MySQL), IDENTITY (SQL Server), sequences (Oracle).

Q115. What is normalization?

The process of organizing data to reduce redundancy and improve integrity by decomposing tables and defining relationships.

Q116. Explain 1NF, 2NF, 3NF.

  • 1NF: atomic values, no repeating groups.
  • 2NF: 1NF + no partial dependency (non-key columns depend on the whole composite key).
  • 3NF: 2NF + no transitive dependency (non-key columns depend only on the key).

Q117. What is BCNF?

Boyce-Codd Normal Form — a stricter 3NF where every determinant is a candidate key.

Q118. What is denormalization and when is it used?

Intentionally introducing redundancy to improve read performance (e.g., reporting/analytics), at the cost of update complexity and potential anomalies.

Q119. What is a view?

A stored named query that behaves like a virtual table:

CREATE VIEW active_employees AS
SELECT * FROM employees WHERE status = 'active';

Q120. What is a materialized view?

A view whose results are physically stored and periodically refreshed — faster reads, but data can be stale. Supported in PostgreSQL/Oracle; SQL Server uses indexed views.

Q121. Can you update through a view?

Sometimes — simple views over a single table without aggregation/DISTINCT/joins are often updatable. Complex views are read-only unless you define INSTEAD OF triggers.

Q122. What is a temporary table?

A table that exists only for the session/transaction, useful for staging intermediate results: CREATE TEMP TABLE ... (PostgreSQL), #temp (SQL Server).

Q123. What is a sequence?

A database object that generates a sequence of unique numbers, often used for surrogate keys:

CREATE SEQUENCE order_seq START 1 INCREMENT 1;
SELECT NEXTVAL('order_seq');

Q124. What data types should you use for money?

Use exact numeric types (NUMERIC=/=DECIMAL), never floating point (FLOAT=/=REAL), to avoid rounding errors.

Q125. What is a partitioned table?

A logically single table split into physical partitions (by range, list, or hash) to improve manageability and query performance on very large datasets.

Window Functions & CTEs

Q126. What is a window function?

A function that computes a value across a set of rows ("window") related to the current row, without collapsing rows like GROUP BY does.

Q127. What is the basic syntax of a window function?

SELECT name, department_id,
  AVG(salary) OVER (PARTITION BY department_id) AS dept_avg
FROM employees;

Q128. What is the difference between PARTITION BY and GROUP BY?

GROUP BY reduces rows to one per group. PARTITION BY keeps all rows but resets the window's calculation per partition.

Q129. What are ROWNUMBER, RANK, and DENSERANK?

  • ROW_NUMBER(): unique sequential number, no ties handling.
  • RANK(): ties get the same rank, leaving gaps.
  • DENSE_RANK(): ties get the same rank, no gaps.
SELECT name, salary,
  ROW_NUMBER() OVER (ORDER BY salary DESC) AS rn,
  RANK()       OVER (ORDER BY salary DESC) AS rnk,
  DENSE_RANK() OVER (ORDER BY salary DESC) AS drnk
FROM employees;

Q130. What are LAG and LEAD?

Access a prior or following row's value without a self-join:

SELECT month, revenue,
  LAG(revenue) OVER (ORDER BY month) AS prev_month,
  revenue - LAG(revenue) OVER (ORDER BY month) AS delta
FROM monthly_sales;

Q131. What are FIRSTVALUE, LASTVALUE, and NTHVALUE?

Return the first/last/nth value in the window frame. Watch the frame clause — LAST_VALUE often needs ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING to behave intuitively.

Q132. How do you select the top N rows per group?

SELECT * FROM (
  SELECT *, ROW_NUMBER() OVER (PARTITION BY dept_id ORDER BY salary DESC) AS rn
  FROM employees
) t WHERE rn <= 3;

Q133. What is a window frame (ROWS vs RANGE)?

The frame defines which rows around the current row are included. ROWS counts physical rows; RANGE includes rows with equal ordering values (logical range). E.g., ROWS BETWEEN 2 PRECEDING AND CURRENT ROW for a 3-row moving average.

Q134. How do you compute a running total?

SELECT order_date, amount,
  SUM(amount) OVER (ORDER BY order_date
                    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total
FROM orders;

Q135. How do you compute a moving average?

SELECT day, value,
  AVG(value) OVER (ORDER BY day ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS ma7
FROM metrics;

Q136. What is NTILE?

Divides rows into N roughly equal buckets — useful for quartiles/percentiles:

SELECT name, NTILE(4) OVER (ORDER BY salary) AS quartile FROM employees;

Q137. Can you use window functions in WHERE?

No — windows are computed after WHERE. Wrap the query in a subquery/CTE and filter on the window column in the outer query.

Q138. What is a CTE (Common Table Expression)?

A named temporary result set defined with WITH, improving readability and enabling recursion:

WITH high_earners AS (
  SELECT * FROM employees WHERE salary > 100000
)
SELECT department_id, COUNT(*) FROM high_earners GROUP BY department_id;

Q139. What is the difference between a CTE and a subquery?

Functionally similar; CTEs are named, can be referenced multiple times, and support recursion. They're often clearer for complex queries. (Materialization behavior varies by engine.)

Q140. What is a recursive CTE?

A CTE that references itself, used for hierarchies and graph traversal:

WITH RECURSIVE org AS (
  SELECT id, name, manager_id, 1 AS lvl FROM employees WHERE manager_id IS NULL
  UNION ALL
  SELECT e.id, e.name, e.manager_id, o.lvl + 1
  FROM employees e JOIN org o ON e.manager_id = o.id
)
SELECT * FROM org;

Q141. What can go wrong with recursive CTEs?

Infinite loops if the data has cycles. Many engines provide cycle detection (CYCLE clause in PostgreSQL) or you can track visited nodes; always ensure the recursion terminates.

Q142. Can you define multiple CTEs in one query?

Yes, comma-separated:

WITH a AS (...), b AS (...) SELECT * FROM a JOIN b ON ...;

Q143. Are CTEs always materialized?

Not necessarily. PostgreSQL pre-12 always materialized; 12+ may inline non-recursive CTEs. You can force with MATERIALIZED=/=NOT MATERIALIZED.

Q144. How do you deduplicate keeping the latest row per key?

WITH ranked AS (
  SELECT *, ROW_NUMBER() OVER (PARTITION BY email ORDER BY updated_at DESC) rn FROM users
  )
DELETE FROM users WHERE id IN (SELECT id FROM ranked WHERE rn > 1);

Q145. What is PERCENTRANK and CUMEDIST?

PERCENT_RANK() gives relative rank between 0 and 1. CUME_DIST() gives cumulative distribution — the fraction of rows ≤ the current row's value.

Transactions & Concurrency

Q146. What is a transaction?

A unit of work that is executed atomically — all operations succeed (COMMIT) or none take effect (ROLLBACK).

Q147. What are the ACID properties?

  • Atomicity: all-or-nothing.
  • Consistency: transitions leave the DB in a valid state.
  • Isolation: concurrent transactions don't interfere visibly.
  • Durability: committed changes survive crashes.

Q148. How do you write a transaction?

BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;  -- or ROLLBACK on error

Q149. What is a savepoint?

A marker within a transaction you can roll back to without aborting the whole transaction:

SAVEPOINT sp1;
-- ...
ROLLBACK TO sp1;

Q150. What are the SQL isolation levels?

From weakest to strongest: READ UNCOMMITTED, READ COMMITTED, REPEATABLE READ, SERIALIZABLE.

Q151. What concurrency anomalies do isolation levels prevent?

  • Dirty read: reading uncommitted data — prevented at READ COMMITTED+.
  • Non-repeatable read: a row changes between reads — prevented at REPEATABLE READ+.
  • Phantom read: new rows appear in a range between reads — prevented at SERIALIZABLE.

Q152. What is the default isolation level?

PostgreSQL/Oracle/SQL Server default to READ COMMITTED; MySQL (InnoDB) defaults to REPEATABLE READ.

Q153. What is optimistic vs pessimistic locking?

Pessimistic locks rows up front (SELECT ... FOR UPDATE). Optimistic assumes no conflict and checks a version/timestamp column at write time, retrying on mismatch.

Q154. What is SELECT … FOR UPDATE?

Locks the selected rows so other transactions can't modify them until the current transaction ends — used to safely read-then-update.

Q155. What is a deadlock?

Two transactions each hold a lock the other needs, blocking forever. The DB detects it and aborts one (the "victim"). Mitigate by acquiring locks in a consistent order and keeping transactions short.

Q156. What is MVCC?

Multi-Version Concurrency Control — readers see a consistent snapshot without blocking writers (and vice versa) by keeping multiple row versions. Used by PostgreSQL, Oracle, MySQL InnoDB.

Q157. What is the difference between a shared and exclusive lock?

A shared (read) lock allows other shared locks but blocks writes. An exclusive (write) lock blocks all other locks.

Q158. What happens if you don't commit?

Changes remain in the transaction's scope and are invisible to others; an open transaction can hold locks and bloat undo/WAL. On disconnect, most engines roll back uncommitted work.

Q159. What is autocommit?

A mode where each statement is its own transaction, committed immediately. Most clients enable it by default; disable it to batch statements into one transaction.

Q160. How do you handle a serialization failure?

Under =SERIALIZABLE=/optimistic schemes, conflicting transactions get a serialization error; the application should catch it and retry the transaction.

Indexing & Performance

Q161. What is an index?

A data structure (usually a B-tree) that speeds up row lookup at the cost of extra storage and slower writes.

Q162. What is the difference between a clustered and non-clustered index?

A clustered index determines the physical row order (the table is the index; one per table). A non-clustered index is a separate structure pointing to rows. (SQL Server terminology; PostgreSQL heap tables differ.)

Q163. When does an index help vs. hurt?

Helps selective lookups, joins, ORDER BY, and range scans. Hurts INSERT/UPDATE/DELETE performance and consumes space. Over-indexing degrades writes.

Q164. What is a composite index, and does column order matter?

An index on multiple columns. Order matters: an index on (a, b) supports filtering on a or a, b but not b alone (left-most prefix rule).

Q165. What is a covering index?

An index that contains all columns a query needs, so the engine answers from the index without touching the table ("index-only scan"). SQL Server uses INCLUDE columns.

Q166. Why might an index not be used?

Low selectivity, functions wrapping the column (WHERE UPPER(name)=...), implicit type conversions, leading wildcard LIKE '%x', stale statistics, or the optimizer estimating a scan is cheaper.

Q167. What is a query execution plan?

The optimizer's chosen strategy to run a query (scans, joins, order). View it with EXPLAIN=/=EXPLAIN ANALYZE (PostgreSQL/MySQL) or SET SHOWPLAN execution plan (SQL Server).

Q168. What's the difference between an index seek and an index scan?

A seek navigates directly to qualifying rows (efficient, selective). A scan reads the whole index/table. Seeks are generally preferable for selective queries.

Q169. What is a partial/filtered index?

An index over a subset of rows defined by a predicate, smaller and faster for targeted queries:

CREATE INDEX idx_active ON orders (created_at) WHERE status = 'active';

Q170. What is a functional/expression index?

An index on an expression, enabling index use for that expression:

CREATE INDEX idx_lower_email ON users (LOWER(email));

Q171. How do statistics affect query performance?

The optimizer uses table/column statistics (row counts, value distribution) to estimate costs. Stale stats lead to bad plans; refresh with ANALYZE=/=UPDATE STATISTICS.

Q172. What are common causes of slow queries?

Missing indexes, SELECT *, non-sargable predicates, large unfiltered scans, N+1 query patterns, bad join order, parameter sniffing, lock contention, and insufficient memory causing spills to disk.

Q173. What is "sargable"?

Search ARGument ABLE — a predicate that can use an index (e.g., col = :v or col > :v). Wrapping the column in a function or doing arithmetic on it usually makes it non-sargable.

Q174. How do you optimize pagination on large tables?

Use keyset (seek) pagination instead of large OFFSET:

SELECT * FROM orders WHERE id > :last_seen_id ORDER BY id LIMIT 20;

Q175. What is index fragmentation and how do you address it?

Over time, page splits and deletes leave indexes fragmented, hurting scans. Rebuild or reorganize indexes (REINDEX, ALTER INDEX ... REBUILD) and tune fill factor.

Advanced & Real-World

Q176. What is an UPSERT and how do you do it?

Insert-or-update on conflict:

-- PostgreSQL
INSERT INTO inventory (sku, qty) VALUES ('A1', 10)
ON CONFLICT (sku) DO UPDATE SET qty = inventory.qty + EXCLUDED.qty;

-- MySQL
INSERT INTO inventory (sku, qty) VALUES ('A1', 10)
ON DUPLICATE KEY UPDATE qty = qty + VALUES(qty);

Q177. What is the MERGE statement?

A single statement to insert/update/delete a target based on a source — standard SQL, supported by SQL Server/Oracle/PostgreSQL 15+:

MERGE INTO target t USING source s ON t.id = s.id
WHEN MATCHED THEN UPDATE SET t.val = s.val
WHEN NOT MATCHED THEN INSERT (id, val) VALUES (s.id, s.val);

Q178. How do you pivot rows into columns?

Conditional aggregation or PIVOT (SQL Server/Oracle):

SELECT
  SUM(CASE WHEN quarter='Q1' THEN sales END) AS q1,
  SUM(CASE WHEN quarter='Q2' THEN sales END) AS q2
FROM revenue GROUP BY region;

Q179. How do you unpivot columns into rows?

UNPIVOT (SQL Server/Oracle) or UNION ALL / CROSS JOIN LATERAL (VALUES ...) in PostgreSQL.

Q180. What is a stored procedure?

A precompiled set of SQL statements stored in the database, callable by name, supporting parameters and control flow — useful for encapsulating business logic and reducing round-trips.

Q181. What is the difference between a function and a stored procedure?

Functions return a value and can be used in queries (usually no side effects); procedures perform actions (DML, transactions) and are invoked with CALL=/=EXEC, often without returning a usable value in a SELECT.

Q182. What is a trigger?

Code that automatically executes in response to INSERT=/=UPDATE=/=DELETE events on a table — used for auditing, validation, or maintaining derived data. Use sparingly; they hide logic and can hurt performance.

Q183. How do you store and query JSON?

Modern engines have JSON types: PostgreSQL jsonb, MySQL JSON, SQL Server JSON functions.

SELECT data->>'name' FROM events WHERE data->>'type' = 'click';  -- PostgreSQL

Index jsonb with GIN indexes for containment queries.

Q184. How do you handle hierarchical data?

Options: adjacency list (parent_id + recursive CTE), nested sets, materialized path, or closure tables — each trades read vs. write complexity.

Q185. What is a window frame "RANGE BETWEEN INTERVAL" used for?

Time-based windows, e.g., a 7-day rolling sum based on dates rather than row counts:

SUM(amount) OVER (ORDER BY ts RANGE BETWEEN INTERVAL '7 days' PRECEDING AND CURRENT ROW)

Q186. How do you detect and remove gaps/islands in sequences?

The "gaps and islands" pattern uses ROW_NUMBER() differences to group consecutive runs:

SELECT MIN(d), MAX(d) FROM (
  SELECT d, d - (ROW_NUMBER() OVER (ORDER BY d)) * INTERVAL '1 day' AS grp
  FROM activity
) t GROUP BY grp;

Q187. What is the difference between logical and physical data independence?

Logical independence: schema (logical) changes don't break applications (via views). Physical independence: storage/index changes don't affect the logical schema.

Q188. How do you safely run a large UPDATE/DELETE?

Batch it (e.g., DELETE ... WHERE id IN (SELECT id ... LIMIT 10000) in a loop) to limit lock duration and transaction log growth, commit between batches, and run during low-traffic windows.

Q189. What are common SQL anti-patterns?

SELECT *, implicit joins, N+1 queries, scalar functions in WHERE on indexed columns, storing comma-separated lists instead of relations (EAV/jaywalking), missing foreign keys, and overusing triggers/cursors.

Q190. How do you prevent SQL injection?

Use parameterized queries / prepared statements — never concatenate user input into SQL. Apply least-privilege DB accounts, validate input, and use ORMs/query builders that parameterize by default.