tl;dr: Weak AVL trees are replacements for AVL trees and red-black trees.
The 2014 paper Rank-Balanced Trees (Haeupler, Sen, Tarjan) presents a framework using ranks and rank differences to define binary search trees.
- Each node has a non-negative integer rank
r(x). Null nodes have rank -1. - The rank difference of a node
xwith parentp(x)isr(p(x)) − r(x). - A node is
i,jif its children have rank differencesiandj(unordered), e.g., a 1,2 node has children with rank differences 1 and 2. - A node is called 1-node if its rank difference is 1.
Several balanced trees fit this framework:
- AVL tree: Ranks are defined as heights. Every node is 1,1 or 1,2 (rank differences of children)
- Red-Black tree: All rank differences are 0 or 1, and no parent of a 0-child is a 0-child. (red: 0-child; black: 1-child; null nodes are black)
- Weak AVL tree (new tree described by this paper): All rank
differences are 1 or 2, and every leaf has rank 0.
- A weak AVL tree without 2,2 nodes is an AVL tree.
1 | AVL trees ⫋ weak AVL trees ⫋ red-black trees |
Weak AVL Tree
Weak AVL trees are replacements for AVL trees and red-black trees. A single insertion or deletion operation requires at most two rotations (forming a double rotation when two are needed). In contrast, AVL deletion requires O(log n) rotations, and red-black deletion requires up to three.
Without deletions, a weak AVL tree is exactly an AVL tree. With deletions, its height remains at most that of an AVL tree with the same number of insertions but no deletions.
The rank rules imply:
- Null nodes have rank -1, leaves have rank 0, unary nodes have rank 1.
Insertion
The new node x has a rank of 0, changed from the null
node of rank -1. There are three cases.
- If the tree was previously empty, the new node becomes the root.
- If the parent of the new node was previously a unary node (1,2 node), it is now a 1,1 binary node.
- If the parent of the new node was previously a leaf (1,1 node), it is now a 0,1 binary node, leading to a rank violation.
When the tree was previously non-empty, x has a parent
node. We call the following subroutine with x indicating
the new node to handle the second and third cases.
The following subroutine handles the rank increase of x.
We call break if there is no more rank violation, i.e. we
are done.
The 2014 paper isn't very clear about the conditions.
1 | // Assume that x's rank has just increased by 1 and rank_diff(x) has been updated. |
Insertion never introduces a 2,2 node, so insertion-only sequences produce AVL trees.
Deletion
TODO: Describe deletion
Implementation
In 2020, FreeBSD has changed its sys/tree.h to use weak
AVL trees instead of red-black trees. https://reviews.freebsd.org/D25480 The rb_
prefix remains as it can also indicate Rank-Balanced:)
Here is a C++ implementation with the following operations
insert: insert a noderemove: remove a noderank: count elements less than a keyselect: find the k-th smallest element (0-indexed)prev: find the largest element less than a keynext: find the smallest element greater than a key
The insertion and deletion operations are inspired by the FreeBSD implementation, with the insertion further optimized.
We encode rank differences instead of the absolute rank values. Since
valid rank differences can only be 1 or 2, a single bit suffices to
encode each. We take 2 bits from the parent pointer to
store the two child rank differences.
1 |
|
Node structure:
ch[2]: left and right child pointers.par_and_flg: packs the parent pointer with 2 flag bits in the low bits. Bit 0 indicates whether the left child has rank difference 2; bit 1 indicates whether the right child has rank difference 2. A cleared bit means rank difference 1.i: the key value.sum,size: augmented data maintained bymconcatfor order statistics operations.
Helper methods:
rd2(d): returns true if childdhas rank difference 2.flip(d): toggles the rank difference of childdbetween 1 and 2.clr_flags(): sets both children to rank difference 1 (used after rotations to reset a node to 1,1).
Invariants:
- Leaves always have
flags() == 0, meaning both null children are 1-children (null nodes have rank -1, leaf has rank 0). - After each insertion or deletion,
mconcatis called along the path to the root to update augmented data.
Rotations:
The rotate(x, d) function rotates node x in
direction d. It lifts x->ch[d] to replace
x, and updates the augmented data for x. The
caller is responsible for updating rank differences.
The paper suggests that each node can use one bit to encode its own rank difference. However, this representation is not ideal. A null node can be either a 1-child (parent is binary) or a 2-child (parent is unary). Retrieving the child rank difference requires probing the sibling node:
int rank_diff(Node *p, int d) { return p->ch[d] ? p->ch[d]->par_and_flg & 1 : p->ch[!d] ? 2 : 1; }
Misc
Visualization: https://tjkendev.github.io/bst-visualization/avl-tree/bu-weak.html