Recursive Common Table Expressions (CTEs) for Hierarchical Data Queries
Introduction
Hierarchical data queries are common in various domains, including organizational charts, family trees, and genealogy. In these scenarios, it’s essential to retrieve not only the immediate children but also the nested children of every parent. This problem can be solved using Recursive Common Table Expressions (CTEs) in SQL.
Problem Statement
Given a table with a parent-child relationship, we want to query all users at each level of nesting, including their parent and child IDs.
Example Data
The following example demonstrates the data structure:
| UserID | ParentID |
|---|---|
| 1 | NULL |
| 2 | 1 |
| 3 | 1 |
| 4 | 2 |
| 5 | 2 |
| 6 | 5 |
| 7 | 6 |
| 8 | 6 |
| 9 | NULL |
Solution
We can solve this problem using a recursive CTE. The approach involves two main steps:
- Identify the root nodes: Select all rows with
ParentIDasNULL, which represent the top-level parents. - Iteratively expand the hierarchy: Use the
union alloperator to combine the results of the previous step with the nested children.
Step-by-Step Solution
Identify the Root Nodes
First, we select all rows where the ParentID is NULL, which represent the root nodes:
WITH cte AS (
SELECT UserID rootid, UserID, ParentID FROM users
UNION ALL
SELECT cte.rootid rootid, users.UserID, users.ParentID FROM users
INNER JOIN cte ON users.ParentID = cte.UserID
)
SELECT rootid parentid, UserID from cte
ORDER BY rootid, UserID;
Explaination
- We start by creating a CTE named
cte. - Inside the CTE, we define two recursive queries:
- The first query selects all rows from the
userstable whereParentIDisNULL, and assigns it as the root node (rootid). It also includes theUserIDandParentIDcolumns for later use. - The second query joins the
cteCTE with theuserstable on the condition that theParentIDin theuserstable matches theUserIDin thecte.
- The first query selects all rows from the
- We then select all rows from the
cteCTE, ordering them by both therootidandUserIDcolumns to produce the desired hierarchical structure.
Option: Limiting Recursion
To prevent an infinite recursion loop, we can set a limit on the maximum number of recursions using the OPTION (MAXRECURSION 0) clause. This ensures that the query does not go beyond a certain depth in the hierarchy:
WITH cte AS (
SELECT UserID rootid, UserID, ParentID FROM users
UNION ALL
SELECT cte.rootid rootid, users.UserID, users.ParentID FROM users
INNER JOIN cte ON users.ParentID = cte.UserID
)
SELECT rootid parentid, UserID from cte
ORDER BY rootid, UserID;
OPTION (MAXRECURSION 0);
Example Output
The final output will contain all users at each level of nesting, including their parent and child IDs:
| ParentID | UserID |
|---|---|
| 1 | 1 |
| 1 | 2 |
| 1 | 3 |
| 1 | 4 |
| 1 | 5 |
| 1 | 6 |
| 1 | 7 |
| 1 | 8 |
| 2 | 2 |
| 2 | 4 |
| 2 | 5 |
| 2 | 6 |
| 2 | 7 |
| 2 | 8 |
| 3 | 3 |
| 4 | 4 |
| 5 | 5 |
| 5 | 6 |
| 5 | 7 |
| 5 | 8 |
| 6 | 6 |
| 6 | 7 |
| 6 | 8 |
| 7 | 7 |
| 8 | 8 |
| 9 | 9 |
Conclusion
Recursive CTEs are a powerful tool for solving hierarchical data queries. By identifying the root nodes and iteratively expanding the hierarchy, we can retrieve all users at each level of nesting, including their parent and child IDs. This approach allows us to efficiently process complex data structures and provides a flexible solution for various use cases.
Further Reading
Last modified on 2023-10-29