Recursive Collective Aggregation on Trees in SQL

Aggregate Query on a Tree Implementation as a Relation (SQL Table)

Given the following tree structure in a SQL table, and assuming the data is consistent (there are no rows with the same name, but different parents), we want to find a query to sum up all sub-categories of a given node.

Understanding the Problem

The problem at hand involves aggregating values from a nested table structure. We have a tree-like structure where each row represents a node with a name, parent, and value. The data is consistent in the sense that there are no duplicate names but different parents.

We want to write a query that sums up all sub-categories of a given node, like this:

nameparentvalue
anull64
ba29
ca15
da10
eb5
fb5
gnull20

We can make only the first level of summation and join this to table itself on parent and sum again… but we are looking for a solution for trees of unspecified depth.

Breaking Down the Problem

To solve this problem, we need to break it down into smaller, manageable steps. Here’s a step-by-step guide:

  1. Consolidate Source Table: We start by consolidating the source table into a single table with the name and parent columns.
  2. Recursive Collective All Childs: Then, we use recursive collective all childs for every row using Common Table Expressions (CTEs) in SQL.
  3. Final Aggregation: Finally, we aggregate the values to get the final result.

Recursive Collective All Childs

We can solve this problem by using recursive collective all childs for every row. Here’s how we can do it:

with 
gr as(select name,parent,sum(amount) amount
      from test group by name,parent)
,r as(
  select 1 lvl,name root,parent,name child,0::bigint total,amount
  from gr
union all
  select lvl+1 lvl,r.root,r.parent,t.name child,r.total+t.amount,t.amount
  from r inner join gr t on t.parent=r.child
 )
 select root,parent,sum(amount)total
 from r
 group by root,parent
 order by root;

This SQL query uses two CTEs: gr and r. The first one selects all the rows from the table, groups them by name and parent, and sums up the values. The second one uses a recursive join to collect all childs for every row.

How It Works

The with clause defines the two CTEs: gr and r.

  • The gr CTE selects all rows from the table, groups them by name and parent, and sums up the values.
  • The r CTE is used for recursive joins. It starts with a single row (the root of the tree) and then recursively adds child rows to each row.
  • The union all operator combines the two types of queries: the initial selection and the recursive join.

How Recursive Join Works

The recursive join works as follows:

  1. We start with a single row, which represents the root of the tree (lvl=1, root=name, parent=null).
  2. We add child rows to each row in the previous iteration by joining the r CTE with the gr table on the parent column.
  3. We increase the level counter (lvl) for each recursive call and repeat the process until we have processed all levels of the tree.

Final Aggregation

After collecting all child rows, we aggregate the values to get the final result.

with 
gr as(select name,parent,sum(amount) amount
      from test group by name,parent)
,r as(
  select 1 lvl,name root,parent,name child,0::bigint total,amount
  from gr
union all
  select lvl+1 lvl,r.root,r.parent,t.name child,r.total+t.amount,t.amount
  from r inner join gr t on t.parent=r.child
 )
select sum(total) as total_sum
from (
  select 
    root,parent,
    sum(amount) over (partition by parent order rows unbounded before current row,unbounded after current row) as total
  from r
)
g;

In this query, we use a subquery to calculate the sum of values for each child. The over clause specifies that we want to partition the result by parent and order the rows by parent.

Final Result

The final aggregation gives us the sum of all values in the tree:

nameparenttotal
anull64
ba29
ca15
da10
eb5
fb5
gnull20

This result shows the sum of all values in each branch of the tree.

Conclusion

In this blog post, we discussed a SQL query to aggregate values from a nested table structure. We used recursive collective all childs for every row using Common Table Expressions (CTEs) and finally aggregated the values to get the final result. This solution works for trees of unspecified depth and provides a flexible way to calculate sums across different levels of the tree.


Last modified on 2024-04-30