8 Binary Search Trees
8.1 Trees
Up to this point we have dealt with linear data structures. In some sense, the items were always stored one after the other. There was always some kind of linear arrangement to the way that the data was stored and there was the concept of the next item in the structure. From here onwards, we will focus on data structures that are non-linear. We will consider various types of trees and the algorithms associated with them.
Trees allow us to model hierarchical relationships between data where there is a parent and a child. For example, a filesystem has directories that can contain other subdirectories and files. A directory is the parent of all the items inside it and the items inside the directory are the children.
Another reason to use a tree is when we have large amounts of data and the access and search times associated with lists become too expensive. By organising data in a tree, we can achieve search times of O(logn) in comparison to O(n) that we would usually get when searching through a vector or linked list. This change might not seem substantial, but note that a running time of O(logn) is exponentially faster than O(n).25 If there are 1 million items in a vector, then a linear search would perform 1 million comparisons in the worst case. With a balanced binary tree, we could search the entire list for a number under 20 comparisons. In fact, this is how database systems work – they build an index using some kind of balanced binary tree (usually a B-Tree) which allows for exceptionally fast lookups even when there is a very large amount of data to search through.
We represent a tree as a set of vertices and edges. Each vertex represents a value that is stored in the tree and each edge represents the connection between a parent and a child. Every vertex has exactly 1 parent, except for the root which does not have a parent.
“A tree is a collection of nodes. The collection can be empty; otherwise, a tree consists of a distinguished node, r, called the root, and zero or more nonempty (sub)trees T1,T2,…,Tk each of whose roots are connected by a directed edge from r.”
A tree then has n−1 edges. A path from node nj to nk is a sequence of nodes such that there is an edge between any two consecutive nodes in the sequence. The length of the path is the number of edges that would be traversed moving from nj to nk.
A node can have any number of children. Nodes that share the same parent are called siblings. A node is called an internal node if it has one or more children. A node that has no children is called a leaf. The depth of a node is the length of the path from the root to that node. The height of the tree is the length of the longest path from the root to any leaf.
If we speak of a subtree that is rooted at nk, this means we are speaking of the node nk and all of the edges and vertices that are descendants of that node.
8.2 Binary Trees
A tree where every node has a maximum of two children is called a Binary Tree and in this case, we can speak of the left and right subtrees. We can speak of the level of a node, the root has level 1, its children are on level 2 etc. A child’s level is the parent’s level +1. A binary tree of height h will have h+1 levels. A binary tree with 2k nodes in level k is called a perfect binary tree. A binary tree where each node has zero or two children is called full or proper. A tree that is perfect
up to level h and then fills up from the left in level h+1 is called a complete binary tree.
Let n denote the number of nodes, nl denote the number of leaves, ni denote the number of internal nodes and h is the height of the tree. Then the following relationships hold:
- log2(n+1)−1≤h≤n−1
- h+1≤n≤2h+1−1
- nl=ni+1
- n=2ni+1
- ni=(n−1)/2
- nl=(n+1)/2
- n=2nl−1
- ni=nl−1
The first relationship is significant and gives us bounds on several algorithms that run from the root down a path until it reaches a leaf.
8.3 Binary Search Tree
An ordered binary tree is known as a Binary Search Tree (BST). A BST is a binary tree with the added constraint that all values stored in the left subtree of node nk must be less than the value stored in nk and all the values stored in the right subtree must be greater than or equal to the value stored in nk. The order property must hold for every node in the entire tree.
8.3.1 Insertion
Inserting a value into a BST requires that we start at the root and compare the value to be inserted with the value at the current node. If the value to be inserted is less than the current node’s value, then we should recursively insert the value into the left subtree, otherwise, we should recursively insert the value into the right subtree. When we find an empty space in the tree (e.g. we need to insert into the left subtree and there is no left child) then we create a node with the new value and insert that node into the space.
In the worst case, our algorithm needs to traverse the tree along the longest path from the root to a leaf before inserting. In this case, we traverse h nodes before we can perform the insert. So in the worst case, the amount of work done is proportional to the height of the tree.
From the relationships in Section 8.2 above, this means that the amount of work we need to do will depend on the structure of the tree. In the best case, the height of the tree will be log2(n+1)−1 and in the worst case, the height of the tree will be n−1. In Big-O notation, we are interested in the shape of these functions and so we say that the height is O(logn) or O(n) respectively.
When the height of the tree is O(n) we say that the tree is degenerate, and it has basically become a linked list. In later chapters we will consider algorithms that help make sure that the tree is always balanced and therefore has a height that is O(logn).
8.3.2 Searching
If we would like to search for a value, x, we follow a similar approach. Again we start at the root and compare x with the value of the current node. If it is equal then we have found the value, if x is less than the current node’s value, then we should recursively search the left subtree, otherwise, we should recursively search the right subtree. If the subtree being searched is ever empty, then it means that x is not present in the BST.
In the worst case, the number is not in the tree and the path we follow corresponds to the longest path from the root to a leaf. In this case, we traverse h times, so the amount of work done is proportional to the height of the tree. Just as in the case of insertion, this means that we will do either O(logn) or O(n) depending on the structure of the tree. In the best case, the search key (x) is the value stored in the root and we do O(1) amount of work.
8.3.3 Min & Max
The minimum or maximum values in a BST can be found by repeatedly traversing to the left or right.
To find the minimum value, we start at the root and repeatedly traverse left (curr = curr->left
) until we encounter a node that does not have a left child. This is then the minimum value in the tree. Similarly, by repeatedly traversing to the right, we will find the maximum value.
8.3.4 Delete
To remove a value from the tree we need to consider what happens to that node’s children. There are three different cases to consider.
When the node has no children, then we can simply delete the node, and set the parent’s child pointer to nullptr
. This process resembles deleting the last link in a linked list.
When the node has one child, then we need to set the parent’s child pointer to point to the current node’s child. After that is done then we can delete the current node.
Finally, when the node has two children, then we need to replace the value inside the current node with the successor. This means we need to find the next largest value in the tree and move it to the current node. We can then delete that value from the relevant subtree. The next largest value in the BST will be the minimum value in the right subtree. So we traverse to the right subtree and then traverse as far left as possible. This value can simply replace the value that was being deleted, and then we recursively delete that new value from the right subtree. We are guaranteed that in that case, then it will have zero or one child and the deletion can take place using the strategies above.
Using the proof techniques from DCS, how do you think you could prove that the successor does not have a left child?
8.4 Tree Traversals
8.4.1 Depth First Traversals
Sometimes we would like to traverse over the entire tree, visiting each node in a specified order. We can do this by performing a Depth First Search. At each step, we can process/print the current value, process all the values in the left subtree, and process all the values in the right subtree. The order in which we do this creates three different traversals:
- Pre-Order Traversal
- Value, Left, Right
- In-Order Traversal
- Left, Value, Right
- Post-Order Traversal
- Left, Right, Value
Each of these traversals achieves a different task. The in-order traversal will give us all of the values in order. This means that we could sort a list by constructing a BST and then performing an in-order traversal. The pre-order traversal can be used to serialise the tree. If we perform a pre-order traversal, we will get a list of numbers that, if used with the standard insertion algorithm, will recreate the tree. If we perform a post-order traversal, then we can use the tree to construct postfix expressions as seen in Section 5.5.1.
All of these traversals have elegant recursive implementations:
8.5 Lab: Binary Search Trees
For this lab, you need to implement your own Binary Search Tree. The code will be graded using IO marking on Moodle. As such, you can structure your code in any way you see fit. However, I have provided a skeleton that you can adapt if you find that easier.
You should read from stdin. There will be one number per line and you should incrementally insert them into your binary search tree. You should keep inserting numbers until you read a -1
.
8.5.1 Submission 1
Construct a binary search tree based on the numbers given on stdin. Then print out the pre-order, in-order, and post-order traversals each on their own lines. You should print out the numbers separated by spaces and it is acceptable to have a space at the end of each line.
Sample Input:
50
30
35
61
24
58
62
32
-1
Sample Output:
50 30 24 35 32 61 58 62
24 30 32 35 50 58 61 62
24 32 35 30 58 62 61 50
8.5.2 Submission 2
Construct a binary search tree based on the numbers given on stdin. Then print out the minimum number and maximum numbers each on their own line.
Sample Input:
50
30
35
61
24
58
62
32
-1
Sample Output:
24
62
8.5.3 Submission 3
Construct a binary search tree based on the numbers given on stdin. Then read another number from stdin and search for it in the tree. You should print out true
or false
depending on whether you can find the number or not. Your program should exit when it encounters another -1
.
Sample Input:
50
30
35
61
24
58
62
32
-1
-5
50
64
59
66
99
24
-1
Sample Output:
false
true
false
false
false
false
true
8.5.4 Submission 4
Finally, you should construct a binary search tree based on the numbers given on stdin. You should then continue to read numbers from stdin until you encounter a -1
again. After reading each number, you should delete it from the tree, and then give the pre-order, in-order and post-order traversals (just as in Submission 1) followed by a blank line. For each number, you delete you should print all three traversals.
Sample Input 1:
50
30
35
61
24
58
62
32
-1
30
24
32
-1
Sample Output 1:
50 32 24 35 61 58 62
24 32 35 50 58 61 62
24 35 32 58 62 61 50
50 32 35 61 58 62
32 35 50 58 61 62
35 32 58 62 61 50
50 35 61 58 62
35 50 58 61 62
35 58 62 61 50
Sample Input 2:
50
30
35
61
24
58
62
32
33
-1
30
24
32
-1
Sample Output 2:
50 32 24 35 33 61 58 62
24 32 33 35 50 58 61 62
24 33 35 32 58 62 61 50
50 32 35 33 61 58 62
32 33 35 50 58 61 62
33 35 32 58 62 61 50
50 35 33 61 58 62
33 35 50 58 61 62
33 35 58 62 61 50
8.5.5 Recommend Structure
Here is a recommended structure for your code:
class TreeNode{
public:
// Pointer to the left child
TreeNode* left = nullptr;
// Pointer to the right child
TreeNode* right = nullptr;
// Value in the node
int value;
// Constructor, sets the value
TreeNode(int v) : value(v) {}
// Destructor
~TreeNode() {}
};
class Tree{
private:
TreeNode* root = nullptr;
// These functions do the actual work
void insert(int v, TreeNode* &subtree){
if(subtree == nullptr){
subtree = new TreeNode(v);
}else if(v < subtree->value){
insert(v, subtree->left);
}else{
insert(v, subtree->right);
}
}
void preOrderTraversal(TreeNode* subtree) const {/* TODO */}
void inOrderTraversal(TreeNode* subtree) const {/* TODO */}
void postOrderTraversal(TreeNode* subtree) const {/* TODO */}
int min(TreeNode* subtree) const {/* TODO */}
int max(TreeNode* subtree) const {/* TODO */}
bool contains(int value, TreeNode* subtree) const {/* TODO */}
void remove(int value, TreeNode* &subtree) {/* TODO */}
public:
// Public interface to the recursive functions above
void insert(int value){
insert(value, root);
}
// Submission 1
void preOrderTraversal(){
preOrderTraversal(root);
cout << endl;
}
void inOrderTraversal(){
inOrderTraversal(root);
cout << endl;
}
void postOrderTraversal(){
postOrderTraversal(root);
cout << endl;
}
// Submission 2
int min(){
return min(root);
}
int max(){
return max(root);
}
// Submission 3
bool contains(int value){
return contains(value, root);
}
// Submission 4
void remove(int value){
remove(value, root);
}
~Tree(){
// Here you should recursively delete all the nodes in the tree
// This is not tested by the Moodle marker, but see if you can
// figure out how to implement this.
}
};
8.6 Additional Reading
- Great BST Visualisation: https://www.cs.usfca.edu/~galles/visualization/BST.html
- Geeks for Geeks - Binary Trees: https://www.geeksforgeeks.org/binary-tree-data-structure
- Geeks for Geeks - Binary Search Trees: https://www.geeksforgeeks.org/binary-search-tree-data-structure
- Chapter 4 – Weiss (2014)
- Chapter 7 – Goodrich, Tamassia, and Mount (2011)
- Chapter 8 – Dale, Weems, and Richards (2018)
- MIT OpenCourseware, Binary Search Trees: https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-006-introduction-to-algorithms-fall-2011/lecture-videos/lecture-5-binary-search-trees-bst-sort/
References
Literally, exponential speedup in the mathematical sense: 2log2n=n↩︎