The Ubiquitous B-Tree
Recently, I was working on a collision detection routine for a small graphics library. My idea was to index objects in the scene, and then iterate over nearby pairs of objects and check for collisions. In order to do this, the index would need to be updated frequently, and it would need to be able to be scanned in order. While this was ultimately the wrong approach[1], it got me thinking about the most common data structure with this property, a variation of a B-tree called a B+tree.
The B-tree is an old data structure. It was invented in 1971 by Rudolf Bayer and Edward McCraight in their paper Organization and Maintenance of Large Ordered Indices, one year before C was implemented for the PDP-11[2][3]. Bayer and McCraight were looking to figure out a way to create a index of objects where adding, deleting, or modifying objects could be done quickly. In the original paper the objects were files in the filesystem, but their design is general enough to index any set of key-value pairs.
Using their design, setting or deleting a key could be done in O(log N) time where N is the number of objects indexed by the tree. This is a huge improvement over the worst case performance of a hash table which had been in use since the 1950s and which grows linearly when an extremely large amount of data needs to be indexed.
The B-tree itself is essentially a generalization of a binary search tree. The difference between them is that while a binary search tree can be unbalanced and therefore have poor worst case performance, a B-tree is always as balanced as possible. The other main difference between a binary search tree and a B-tree is that the branching factor for a B-tree can be greater than two. This means that a B-tree can be shallower than a binary search tree, since each layer can hold much more data, which is nice because it takes fewer reads from memory to find nodes on the tree.
Not long after the introduction of the B-tree did variations of the B+tree get introduced. By at least 1973, there was literature about some kind of B+tree for IBM’s VSAM file type. However, the canonical description of the B+tree comes from Douglas Cromer’s 1979 paper The Ubiquitous B-Tree. In his paper, he describes both B-trees and B+trees, highlighting their use in designing data storage applications.
The difference between a B-tree and a B+tree is that a B+tree only stores data in its leaves, and those leaves each have a link to the next leaf. The B+tree is then an index over a linked list.
Now, lets implement a B+tree. This not an ideal implementation since it is not thread safe. Thread safe B+trees are crazy and best practice dictates using a skip list for a concurrent key-value store. It is in C++17 because the actual algorithms used are more clear when you are managing the memory yourself. All pointers are shared pointers so that the implementation is safe-by default. The linked list structure is left off the following description, but adding it to a B+tree is trivial and is included in the source.
First let’s look at a the structure of a B+tree at rest.
To find a key k
within a B+tree, start at the root node. For each key stored in the root, find the pointer at index i such that key i - 1 if less than k
and key i and is greater than k
. Follow that pointer to the next node, and repeat the process until arriving at a leaf. Within the leaves, the key-value pairs can be stored in an array. Iterate over those pairs and find the key. If it is not there, then k
is not in the leaf. For updating the value for an already present key, the update can be done in place.
However, for inserting a new element or deleting an old element, one will need to make sure that the tree remains balanced. This is done by splitting and merging nodes when they are too full or too empty. In particular, for a B+tree of order b, each inner node can only hold between b and 2b pointers to children. When a node splits its parent node must add a pointer to accommodate both new nodes. If that node becomes to full, it splits, and so on. For merges, if a node shrinks in below length b, then the node merges with an adjacent node and notifies its parent to remove one of the pointers. If the parent becomes too short, then it merges with an adjacent node and so on. The root node will split if it gets too full and a new root node will need to be created. However the root node can have fewer than b pointers since it does not have any neighbors. Here I assume that each leaf can only hold b key-value pairs as well, but in practice the leaves may hold many more pairs than the inner nodes.
As for an implementation, we can see the structure of an internal node is different from a leaf node, but it would be nice if we could could use the same interface for merging and splitting nodes. With that in mind we will write an abstract class called BPlusTreeNode
. We then write two child classes BPlusTreeLeaf
and BPlusTreeInnerNode
. The nodes will need to implement get
, set
, remove
, split
, and merge
. Further, we want to be able to store different key and value types so we will template this class. Lets try:
template <typename K, typename V> class BPlusTreeNode {
public:
virtual ErrorOr<V> get(K key) {
ErrorOr<V> ret;
ret.set_error(Error("Not implemented"));
return ret;
}
virtual std::shared_ptr<BPlusTreeNode<K, V>> set(K key, V value) {
return nullptr;
}
virtual Status remove(K key) { return Status("Not implemented", false); }
virtual void merge(std::shared_ptr<BPlusTreeNode<K, V>> node) {}
virtual std::shared_ptr<BPlusTreeNode<K, V>> split() { return nullptr; }
};
Even though this is an abstract class, and we will override every function, we need to have some dummy implementation to please the compiler. (The classes ErrorOr
and Status
are used for returning and handling errors without throwing exceptions. You can see their implementation in the linked code, but the interface should be obvious from context.)
Now, let’s implement BPlusTreeInnerNode
. Because the each node does not need to know the type of its children, the implementation does not need to know if its children are leaves or more inner nodes.
template <typename K, typename V>
class BPlusTreeInnerNode : public BPlusTreeNode<K, V> {
public:
ErrorOr<V> get(K key) {
ErrorOr<V> ret;
if (children_.size() == 0) {
// Check node is not empty in case it is a root node.
ret.set_error(Error("BPlusTreeInnerNode::get: Node has no children."));
return ret;
}
// `childIndex(...)` calculated which child in `children_`
// could contain a given key. Here, we pass the `get` call
// to the correct child and return the result.
int index = childIndex(key);
return children_.at(index)->get(key);
}
std::shared_ptr<BPlusTreeNode<K, V>> set(K key, V value) {
// The set operation works by looking for child that would
// have a key and telling it to set the value. This process
// terminates when the set operation hits a leaf. A set
// operation should never fail.
// `node->lowest_key()` return the lowest key stored by
// the node. This is implemented for leaves as well.
if (children_.size() == 0) {
// This is an empty inner node. This happens when the
// B+tree is just initialized. Thus we make a new
// leaf and add the key-value pair.
std::tuple<K, V> pair(key, value);
std::vector<std::tuple<K, V>> pairs;
pairs.push_back(pair);
std::shared_ptr<BPlusTreeLeaf<K, V>> leaf =
std::make_shared<BPlusTreeLeaf<K, V>>(order_, pairs);
children_.push_back(leaf);
return nullptr; // this did not split
}
// If we have not returned, then we still need to set the
// key-value pair.
int index = childIndex(key);
auto node = children_.at(index)->set(key, value);
if (node != nullptr) {
// The child split and we need to update this instance's
// set of children.
keys_.insert(keys_.begin() + index, node->lowest_key().expect());
children_.insert(children_.begin() + index + 1, node);
}
if (children_.size() > static_cast<std::size_t>(2 * order_)) {
// This instance is too big and needs to split.
std::shared_ptr<BPlusTreeNode<K, V>> ret = split();
return ret; // this did split
}
return nullptr; // this did not split
}
Status remove(K key) {
// This function removes a key. It returns an error if
// the key could not be found.
int index = childIndex(key);
if (index == -1) {
// Check if node is empty.
return Status("Key not found", false); // error
}
// Pass the remove call to the correct child and return
// the result of there is no error.
Status status = children_.at(index)->remove(key);
if (!status.is_ok()) {
return status; // error
}
if (children_.at(index)->length() < order_ && children_.size() > 1) {
// Merge if needed.
int low_index = index == 0? index : index - 1;
int high_index = index == 0? index + 1 : index;
children_.at(low_index)->merge(children_.at(high_index));
keys_.erase(keys_.begin() + low_index);
children_.erase(children_.begin() + high_index);
if (children_.at(low_index)->length() > 2 * order_) {
std::shared_ptr<BPlusTreeNode<K, V>> node = children_.at(low_index)->split();
keys_.insert(keys_.begin() + low_index, node->lowest_key().expect());
children_.insert(children_.begin() + high_index, node);
}
}
return Status(); // okay
}
void merge(std::shared_ptr<BPlusTreeNode<K, V>> node) {
// Merging is easy for inner nodes. We take all of the
// children and keys and merge the two arrays. We are
// assured this will work because the node with which
// this instance is merging will always be the next
// highest adjacent node.
// Here we use functions not shown. `node->children()`
// returns `node.children_` if node is an inner node.
// If it is a leaf it returns `nullptr`, but this function
// is only ever called when `node` is an inner node.
for (const auto& child : node->children()) {
K lowest_key = child->lowest_key().expect();
keys_.push_back(lowest_key);
children_.push_back(child);
}
}
std::shared_ptr<BPlusTreeNode<K, V>> split() {
// To implement this function we simply iterate over
// the pointers and keys and move them from `this`
// instance to a new one.
std::vector<std::shared_ptr<BPlusTreeNode<K, V>>> new_children;
std::vector<K> new_keys;
std::size_t start = children_.size() / 2;
keys_.erase(keys_.begin() + start - 1);
while (start < children_.size()) {
if (start < children_.size() - 1) {
new_keys.push_back(keys_.at(start - 1));
keys_.erase(keys_.begin() + start - 1);
}
new_children.push_back(children_.at(start));
children_.erase(children_.begin() + start);
}
return std::make_shared<BPlusTreeInnerNode>(new_children, new_keys, order_);
}
.
.
.
private:
// The order of this tree
int order_;
// The used to sperate child trees
std::vector<K> keys_;
// The this node's child keys
std::vector<std::shared_ptr<BPlusTreeNode<K, V>>> children_;
.
.
.
};
Here I have left out the constructors and the definition of functions that should be obvious. You can see the full implementation in the linked code.
Now for the leaf nodes:
template <typename K, typename V>
class BPlusTreeLeaf : public BPlusTreeNode<K, V> {
public:
ErrorOr<V> get(K key) {
// Try to return the value for a key if it is present by
// iterating over the key-value pairs in `pairs_`.
ErrorOr<V> ret;
for (auto const &pair : pairs_) {
if (key == std::get<0>(pair)) {
ret.set_value(std::get<1>(pair));
return ret; // ok, found the value
}
}
ret.set_error(Error("BPlusTreeLeaf::get: Key was not found in leaf."));
return ret; // error, not found
}
std::shared_ptr<BPlusTreeNode<K, V>> set(K key, V value) {
// Add a key-value pair, or set the value for a given key
// if it is already present.
std::tuple<K, V> tuple = std::make_tuple(key, value);
if (pairs_.size() == 0) {
// if `pairs_` is empty, trivially add the pair
pairs_.push_back(tuple);
return nullptr;
} else {
// try to replace the key-value pair if the key is
// present.
for (std::size_t k = 0; k < pairs_.size(); k++) {
if (key == std::get<0>(pairs_.at(k))) {
pairs_.erase(pairs_.begin() + k);
pairs_.insert(pairs_.begin() + k, tuple);
return nullptr;
}
}
// Try to insert the key-value pair.
bool is_set = false;
for (std::size_t k = 0; k < pairs_.size() && !is_set; k++) {
if (std::get<0>(pairs_.at(k)) > key) {
pairs_.insert(pairs_.begin() + k, tuple);
is_set = true;
}
}
// If not inserted by now, then the key must not be
// between any other keys in `pairs_`
if (!is_set) {
pairs_.push_back(tuple);
}
if (pairs_.size() > (std::size_t) (2 * order_)) {
// split if the node is too big
return split();
}
return nullptr;
}
}
Status remove(K key) {
// Iterate over key value pairs and delete the key if it
// is found.
for (std::size_t k = 0; k < pairs_.size(); k++) {
if (key == std::get<0>(pairs_.at(k))) {
pairs_.erase(pairs_.begin() + k);
return Status(); // ok
}
}
return Status("Key not present", false); // error, key not found
}
void merge(std::shared_ptr<BPlusTreeNode<K, V>> node) {
// Merge leaves by merging each instance's `pairs_`.
for(const auto& pair : node->pairs()) {
pairs_.push_back(pair);
}
}
std::shared_ptr<BPlusTreeNode<K, V>> split() {
// Split leaves by splitting `pairs_`
std::vector<std::tuple<K, V>> new_pairs;
std::size_t start = pairs_.size() / 2;
while (start < pairs_.size()) {
new_pairs.push_back(pairs_.at(start));
pairs_.erase(pairs_.begin() + start);
}
return std::make_shared<BPlusTreeLeaf<K, V>>(order_, new_pairs);
}
private:
// The order of the B+tree
int order_;
// The key-value pairs present in this leaf
std::vector<std::tuple<K, V>> pairs_;
.
.
.
};
Finally, we need to wrap the root node in another class to handle root splitting, but the implementation is trivial and, I’d refer you to the full implementation on Gitlab.
[1] The actual blessed method is indexing with a BSP-tree.
[2] The paper was published in Acta Information, the same journal in which the LSM-tree would first appear years later.
[3] No one know what the ‘B’ in B-tree originally referred to, but I think it is obvious from reading the paper that b was just a variable name to be replaced by a value. For example, a B-tree of order 5 would be called a 5-tree.