Of sql trees, table keys and parent_ids
XoreMy current job has me often sifting through someone else’s source code and making small changes as required by the client. This can occasionally be rewarding, since it gives me a chance to grow accustomed to many coding styles and algorithms. And any humorous comments left behind in the code (infrequent at best)
There’s always a certain amount of irritation that comes across when it’s obvious that the person who wrote the original code just didn’t have a clue what they were doing. Core concepts like very basic SQL optimization are just not there. I ran across this one little gem today, and it made me cringe.

This was just a small part of an overall poorly implemented mailbox tree algorithm. The sql table is equally poorly implemented, storing nothing but a parent id for each node in the tree. There’s a few ways to get a full tree out such a table: none of them are good.
- query the entire table, and manually reconstruct the tree in code
- recursively query the tree for parent_ids, which will take k consecutive calls for a tree k layers deep
- JOIN the table to itself k-1 times (assuming you know the value of k).
- Perhaps a leap-frog like method that combines the above two “solutions”
I don’t particularly like any of these solutions, because there are obvious boundary cases where any particular method will prove to have exceptionally better performance than the others. Are there good heuristics for determining this?
Back when i was still taking courses in university, the courses i took where tree algoritms were mentioned exclusively dealt with situations where trees were used as indexes into larger datasets, but that the tree structure itself was unimportant: You see algorithms that rearrange trees arbitrarily to optimize search performance and storage space, but there is little or no interest in optimizations for storing trees with fixed hierarchies.
There are a few solutions out on the web that i’ve seen. Some of them involve parents storing lists of children, some involving children storing lists of ancestors. None of these work particularly well unless your DBMS has good support for list processing within single table cell. The better alternative is to store parent/ancestor-child relationships in a secondary table. This enables one to query full subtrees and parent paths in a single query (w/ JOIN), which is nice, but of course requires the overhead of a second table to store this data. Additinally, the process of moving a node from one position in a tree to another involves deleting and inserting multiple entries for each of it’s children, and recalculating depth information (if you choose to store this)
An innovative solution i’ve seen (found in phpBB 3.0 alpha code but likely exists elsewhere) is storing left and right ids of a given node. This acts as an “umbrella” of sorts, that aims to guarantee that any entities with left and right ids inside it’s own range are children, and any entities with left and right ids that bracket it’s own range are ancestors.

This doesn’t require the overhead of a second table but it does require that you’re very careful about rearranging items in the tree, that left and right ids get updated properly for all entries with each move. The downside is that usually this means renumbering O(n) left and right ids (ie, sometimes the entire table) but the upside is that the code is trivial: you can do this with a single SQL query because the shift is identical across the board. Another bonus is that moving entire subtrees is equally easy as moving any single item.
For smallish trees where the overhead of rolling a right/left id update over the entire tree isn’t a big deal, this is so far the best solution i’ve come across. With a little tweaking, it could additionally support depth information.
But what about situtations where you’ve got a *lot* of data and you’re adding/deleting/moving an entry/subtree and the overhead of updating every single entry is just too much?
Are there any known good solutions for this problem?
May 21st, 2006 at 1:23 pm
Are you sure that’s not my code you’re talking about?