Left-leaning red-black trees are hard to implement
Back in 2002, I needed balanced trees for a project I was working on, so I used the description and pseudo-code in Introduction to Algorithms to implement red-black trees. I vaguely recall spending perhaps two days on implementation and testing. That implementation uses C preprocessor macros in order to make it possible to link data structures into one or more red-black trees without requiring container objects.
About the same time, Niels Provos added a similar implementation to OpenBSD, which was imported into FreeBSD, so when I imported jemalloc into FreeBSD, I switched from my own red-black tree implementation to the standard one. Unfortunately, both implementations use nodes that include four pieces of information: parent, left child, right child, and color (red or black). That typically adds up to 16 or 32 bytes on 32- and 64-bit systems, respectively. A few months ago I fixed some scalability issues Stuart Parmenter found in jemalloc by replacing linear searches with tree searches, but that meant adding more tree links. These trees now take up ~2% of all mapped memory, so I have been contemplating ways to reduce the overhead.
A couple of weeks ago, I came across some slides for a talk that Robert Sedgewick recently gave on left-leaning red-black trees. His slides pointedly disparage the use of parent pointers, and they also make left-leaning red-black trees look simple to implement. Left-leaning red-black trees maintain a logical 1:1 correspondence with 2-3-4 B-trees, which is a huge help in understanding seemingly complex tree transformations.
Last Monday, I started implementing left-leaning red-black trees, expecting to spend perhaps 15 hours on the project. I'm here more than 60 hours of work later to tell you that left-leaning red-black trees are hard to implement, and contrary to Sedgewick's claims, their implementation appears to require approximately the same amount of code and complexity as standard red-black trees. Part of the catch is that although standard red-black trees have additional cases to deal with due to 3-nodes that can lean left or right, left-leaning red-black trees have a universal asymmetry between the left and right versions of the algorithms.
If memory overhead weren't my primary concern for this project, I would have dropped red-black trees in favor of treaps. Unfortunately, treaps require either recursive implementation or parent pointers, and they also require an extra "priority" field, whereas red-black trees can be implemented without recursion or parent pointers, and it is possible to stuff the red-black bit in the least significant bit of one of the left/right pointers.
For the curious or those in need of such a beast, here is my left-leaning red-black tree implementation.
[26 April 2008] I did some further experimentation to understand the performance disparity between implementations. The benchmarks mentioned above were flawed, in that they always searched for the most recently inserted item. Since top-down insertion/deletion is more disruptive than lazy fixup, the searches significantly favored the old implementation. I fixed the benchmarks to compute the times for random searches, random insertions/deletions, and in-order tree traversal.
The old rb.h and sys/tree.h perform essentially the same for all operations. The new rb.h takes almost twice as long for insertion/deletion, is the same speed for searches, and is slightly faster for iteration. Red/black bit twiddling overhead accounts for ~6% of insertion/deletion time, and <3% of search time.
I am actually quite pleased with these benchmark results, because they show that for random inputs, left-leaning red-black trees do not noticeably suffer from the fact that tree height is O(3h) rather than O(2h), where h is the height of an equivalent fully balanced tree.
