Sunday, May 08, 2011

Binary Trees with Repeated elements

In school/college, we all learnt about Binary Trees and while we were taught how to implement them, the implicit assumption was that elements would not be repeated. However, in the real-world, we need to deal with repeated elements all the time. Most standard library implementations of popular languages (read Java) provide a TreeMap that is reduced to something very powerless because it doesn't support repeated elements!! The C++ std::multimap class however supports repetitions and is IMHO much more powerful than any other Balanced Binary Tree implementation that I have come across.

Today, I shall make an attempt to de-mistify the process of creating a Binary Tree with repeated elements. It's really simple and I think that the process can be summarized in one sentence: "Instead of the left subtree containing elements strictly less than the element at the root and the right subtree containing elements strictly greater than the element at the root, the left subtree shall contain elements less than or equal to the element at the root and the right subtree shall contain elements greater than or equal to the element at the root". Simple, isn't it!!

Now, let's see why it works.

The simple case of asking whether an element exists or not can be easily implemented the same way as before. If the element is present, we shall hit it while traversing down the tree.

Finding the first instance of an element (if it exists) is called locating the lower_bound of that element. This can be accomplished by running down the left of the node till you have an element at the node that is greater than or equal to the element to be found. We keep a track of the node every time we take a left turn. Othwerise (the element at the node is less than the element we are interested in), we go down the right. Since the lower_bound of an element is never less than the element itself, we don't keep track of nodes when we take a right turn.

Finding one-past the last instance of an elemnt (if it exists) is called locating the upper_bound of that element. This can be accomplished by running down the left of the node till you have an element at the node that is greater than the element to be found. We keep a track of the node every time we take a left turn. Othwerise (the element at the node is less than or equal to the element we are interested in), we go down the right. Since the upper_bound of an element is never less than or equal to the element itself, we don't keep track of nodes when we take a right turn.

On the following pages, can find very high quality implementations of a Red-Black Tree and an AVL Tree respectively that both support repeated elements as well as the lower_bound and upper_bound functions.

No special handling is required during re-balancing the tree during an insertion or a deletion.

Here is a sample instance of the AVL Tree after the following elements have been inserted into it in this same order (the numbers after the - sign are just to disambiguate nodes with the same key while plotting the tree):

4, 9, 2, 5, 4, 2, 1, 2, 3, 2, 1, 7, 3, 2



No comments: