1 Data Structures & Algorithms
1.1 Analysis of Algorithms
In your previous algorithms course, the focus was on programming and problem solving only. We cared about whether we could get the right answer but did not consider the quality of the algorithm. Suppose now that you have two algorithms that both return the correct answer. How do you compare them? How do you decide which algorithm is better and when?
This is the idea of Algorithm Analysis, which aims to understand the resource requirements of an algorithm, measured in both time (algorithm runtime) and space (memory requirements). Computational Complexity theory then extends these ideas beyond a specific solution and focuses on all possible algorithms/solutions that could exist. For example, we know that to sort a list of \(n\) numbers by comparing their values, an algorithm will need to make at least on the order of \(n\log(n)\) comparisons on average, and in Honours, you will be able to prove this! This is one of the reasons you need all the different proof techniques from DCS - to prove an algorithm is correct, to prove its complexity, and to prove that it is or is not optimal.
When considering the computational complexity of an algorithm, we think about two things:
- How much time it takes to complete.
- How much memory it needs to complete.
Let’s say that the size of the input is \(n\), then we need to count the number of operations that the algorithm performs and try to come up with a function in terms of \(n\)… for example \(f(n) = 43n^2 + 3n + 42\). In a case like this, we would say that the runtime is quadratic because the function that describes it is quadratic in the size of the input. So the time complexity is quadratic.
In the second case, we would try to do the same thing, but rather than modelling the runtime, we would need to model the amount of RAM that is needed for the algorithm to finish. For example, if an algorithm requires \(g(n) = 42n + 3\) bytes of memory, then we would say that the memory requirements are linear because as the size of the input grows, the amount of memory we need grows linearly. The space complexity is linear.
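As a concrete sketch of counting operations (the function below is made up for illustration, it is not from the notes), consider a function that compares every pair of items in a list:

```python
def count_equal_pairs(values):
    # The == comparison runs exactly n * n times, so the number of
    # operations is a quadratic function of n (quadratic time complexity).
    # Only a few extra variables are stored no matter how big n is,
    # so the extra space used is constant.
    n = len(values)
    pairs = 0
    for i in range(n):
        for j in range(n):
            if values[i] == values[j]:
                pairs += 1
    return pairs
```

Counting the comparisons gives exactly \(n^2\), so the time complexity is quadratic, while the extra memory (two loop counters and one total) does not grow with \(n\) at all.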
1.2 Best, Worst or Average?
In the examples from above, the code always took the same amount of time to run for a specific sized input. But we know that sometimes algorithms will perform differently in different cases – even if the size of the input is the same. For example, how many times do we run the `==` operation in the code below?
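The original listing is not reproduced here, but a minimal sketch of the kind of linear search being described might look like this (the exact body and return value are assumptions):

```python
def search(values, target):
    # Compare the target against each item, starting at the front.
    for value in values:
        if value == target:   # this is the == comparison we are counting
            return True
    return False
```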
Well… it depends on the values given to the algorithm. If the input list is \([1,2,3,4,5,6,7,8,9]\) and the search value is \(42\), then we will do \(n\) comparisons (where \(n\) is the length of the list). This is called the worst-case because it causes us to do the most work for inputs of size \(n\). On the other hand, if the search value was \(1\), then we would only do a single comparison and then the function would return. This is called the best-case because it causes us to do the least work for inputs of size \(n\).
We can also consider something called the average case. Here, we use the statistical definition of expectation and we calculate the total amount of work we expect to do by weighting it by the probability of us actually having to do that amount of work. For example, a QuickSort might make \(an^2+bn+c\) comparisons to sort a list in the worst case, but if we look really carefully, this can only actually happen in one or two of the \(n!\) permutations of the list. So even though the worst case is quadratic, it is so rare that the average case is still really really good! We will look at QuickSort in detail later on in this course.
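Written with the statistical definition of expectation, the average-case running time weights the work done on each input of size \(n\) by the probability of that input occurring:

\[\mathbb{E}[T(n)] = \sum_{I \in \mathcal{I}_n} \Pr(I)\, T(I),\]

where \(\mathcal{I}_n\) is the set of all inputs of size \(n\) and \(T(I)\) is the work done on input \(I\).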
We leave average case analysis to second and third years, and we will only focus on the best and worst-case analysis of the algorithms considered in this course.
Note: The best and worst cases have NOTHING to do with the size of the input, only the configuration. An input of length 0 is NOT the best case!
1.3 When do we actually care?
- Sort the following list of numbers: [4, 6, 0, 5]
- Sort the following list of numbers:
[79, 206, 4, 344, 323, 216, 278, 341, 55, 220, 320, 205, 36, 399, 160, 364, 303, 223, 325, 5, 199, 284, 256, 66, 65, 27, 203, 16, 301, 336, 138, 315, 76, 68, 103, 282, 202, 300, 167, 39, 46, 124, 197, 34, 141, 97, 270, 236, 342, 299, 84, 15, 77, 170, 334, 275, 250, 207, 135, 338, 390, 244, 252, 215, 266, 93, 384, 251, 296, 355, 335, 81, 31, 332, 373, 64, 98, 305, 107, 110, 23, 243, 351, 234, 378, 41, 184, 118, 277, 286, 306, 294, 186, 7, 123, 173, 192, 152, 11, 183, 87, 86, 137, 357, 26, 74, 175, 231, 20, 317, 18, 263, 392, 122, 185, 78, 106, 228, 80, 67, 24, 318, 394, 181, 254, 398, 279, 151, 159, 248, 326, 145, 247, 174, 91, 101, 13, 130, 125, 385, 90, 131, 289, 71, 308, 89, 349, 237, 72, 163, 136, 85, 375, 272, 182, 35, 290, 30, 212, 246, 257, 348, 293, 134, 213, 28, 253, 316, 387, 264, 133, 63, 120, 230, 142, 393, 32, 302, 214, 119, 258, 171, 319, 369, 95, 157, 57, 298, 400, 132, 102, 139, 140, 367, 259, 113, 117, 224, 304, 333, 208, 331, 368, 17, 380, 164, 324, 379, 179, 195, 14, 51, 33, 121, 274, 180, 108, 235, 295, 191, 200, 150, 340, 269, 345, 376, 116, 313, 262, 187, 61, 114, 267, 83, 194, 126, 177, 155, 281, 9, 388, 147, 261, 99, 82, 395, 209, 172, 53, 94, 366, 42, 374, 347, 178, 40, 280, 193, 44, 245, 372, 165, 88, 268, 54, 363, 352, 310, 115, 211, 127, 226, 128, 389, 287, 391, 276, 377, 383, 59, 129, 217, 353, 249, 314, 38, 166, 96, 358, 196, 312, 149, 168, 29, 241, 307, 229, 397, 8, 354, 176, 148, 239, 260, 111, 232, 297, 153, 190, 356, 221, 328, 58, 218, 198, 162, 285, 3, 109, 21, 337, 343, 381, 265, 386, 25, 283, 371, 233, 227, 238, 242, 146, 19, 2, 321, 240, 327, 69, 6, 359, 311, 204, 330, 73, 56, 169, 396, 271, 189, 361, 309, 210, 37, 105, 365, 201, 52, 10, 49, 350, 329, 219, 346, 112, 43, 154, 360, 161, 62, 339, 362, 382, 22, 322, 291, 370, 75, 48, 1, 100, 70, 92, 104, 60, 45, 273, 222, 255, 288, 158, 144, 292, 143, 156, 188, 50, 12, 225, 47]
No one cares how long it takes to sort four numbers, even a bad algorithm should sort that quickly, but as the input size (\(n\)) grows we start to care very quickly. So we prefer to only think about what happens when \(n\) is very large, or even as \(n\rightarrow\infty\). In this case, we are talking about the Asymptotic Complexity of the algorithm. When we talk about complexity in this way, we are comparing the runtime or space requirements of different algorithms when \(n\) is large enough to matter.
On top of this, these values only matter if they are general. No one cares if my T-1000 computer is crazy fast and blows all the other computers away¹. If your computer really is that fast, then we’d want to be sorting bigger, more complicated lists. So mathematically, \(n\) tends to infinity, and practically, \(n\) is big enough for it to matter on your computer.
This also means that we shouldn’t get too bogged down in counting every single thing. Consider the code below:
Suppose that an assignment takes \(a\) time units, an integer comparison takes \(b\) time units, a floating point multiplication takes \(c\) time units, and an integer addition takes \(d\) time units. Then we end up with the following analysis:
def pow(x, n):
    out = 1.0           # a
    i = 0               # a
    while i < n:        # b * (n+1)
        out = out * x   # (a+c) * n
        i = i + 1       # (a+d) * n
    return out
So in the first two lines we assign values to `out` and `i`, and after that our loop runs. We do \(n\) comparisons where we enter the loop and on comparison \(n+1\), \(i\nless n\) and the loop stops. Inside the loop, we perform the calculation and then assign it to the `out` variable. After that, we add \(1\) to `i` and then store the value back into `i`. This happens each of the \(n\) times that we enter the loop. So in total:
\[\begin{eqnarray}
\text{time} & = & 2a+b(n+1)+(a+c)n+(a+d)n\\
& = & 2a+bn+b+an+cn+an+dn\\
& = & (2a+b+c+d)n + (2a+b)\\
& = & en + f,
\end{eqnarray}\]
where \(e\) and \(f\) are some constants that are very specific to my hardware and operating system and are not useful to anyone else. Ever. The important thing here is that this is a linear function and that it will be linear on every machine, everywhere, forever. So we try to pick a dominant operation and count how many times it is performed. The trick is to pick an operation that is representative of the total number of operations the algorithm will make. We’ll see that this is usually a comparison, assignment or pointer operation.
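As a sketch of this idea (the instrumented version below is illustrative, not part of the original notes), we can count only the loop-guard comparison in `pow` and check that the count is a linear function of \(n\):

```python
def pow_count_comparisons(x, n):
    # Count only the dominant operation: the comparison i < n.
    comparisons = 0
    out = 1.0
    i = 0
    while True:
        comparisons += 1        # one execution of the comparison i < n
        if not (i < n):
            break
        out = out * x
        i = i + 1
    return out, comparisons     # comparisons == n + 1, linear in n

print(pow_count_comparisons(2.0, 10))   # (1024.0, 11)
```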
So when we do this type of analysis, we want to compare the shape of the graphs and ignore the constants that are specific to my machine. We are simply asking: as \(n\) becomes very very large, how do the runtime and memory requirements of the algorithm grow? We like to think of the complexity belonging to a class such as constant, linear, quadratic, cubic, logarithmic, exponential, linearithmic, factorial etc.
We write these as:
Class | Notation |
---|---|
Constant | \(O(1)\) |
Linear | \(O(n)\) |
Quadratic | \(O(n^2)\) |
Cubic | \(O(n^3)\) |
Logarithmic | \(O(\log(n))\) |
Exponential | \(O(2^n)\) |
Linearithmic | \(O(n\log(n))\) |
Factorial | \(O(n!)\) |
This is called Big-Oh notation. If \(f(n) \in O(g(n))\) this means that \(g\) multiplied by a constant is an upper bound of \(f\), when \(n\) is big enough. Don’t panic. We will look at the maths formally later on once you have done a bit more work in DCS. For now just remember: if we say that an algorithm takes \(O(n)\) time, then the runtime is bounded above by a linear function.
We also have Big-\(\Omega\) (Big Omega), which means it is bounded below: if we say the algorithm takes \(\Omega(n)\) time, then the runtime is bounded below by a linear function. This means we do at least a linear amount of work. Finally, we have Big-\(\Theta\) (Big Theta), which means that it is bounded above and below. So \(\Theta(n)\) is exactly a linear function.
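For the curious, the formal versions of these statements look roughly like this (we will only need them later):

\[\begin{eqnarray}
f(n) \in O(g(n)) & \iff & \exists\, c > 0,\ n_0 : f(n) \le c\,g(n) \text{ for all } n \ge n_0,\\
f(n) \in \Omega(g(n)) & \iff & \exists\, c > 0,\ n_0 : f(n) \ge c\,g(n) \text{ for all } n \ge n_0,\\
f(n) \in \Theta(g(n)) & \iff & f(n) \in O(g(n)) \text{ and } f(n) \in \Omega(g(n)).
\end{eqnarray}\]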
Don’t panic about these definitions yet. We’ll look at them carefully later on and we’ll talk about the maths when we need it. For now, we will only think about Big-Oh.
In the `pow` function above, we saw that the amount of work that we did (number of operations) is a linear function of the size of the input, \(n\). We can write this in Big-Oh notation as \(O(n)\).
In the `search` function above, we had a best and a worst-case. In the best case, the number we’re looking for is at the front. So in the best case, we only ever do \(1\) comparison. This means that regardless of how big \(n\) is, we always do a constant amount of work. So in the best case, `search` takes \(O(1)\) time. In the worst case, when the number wasn’t there, we had to compare the search value to every number in the list. This means that we need to do \(n\) comparisons because there were \(n\) numbers. So the amount of work is linear and we can write \(O(n)\).
Have a look at the graphs on the Big-Oh Cheat Sheet. See how the different classes grow at different speeds as \(n\) increases. Using the graphs, try to order the classes in the table above from best to worst.
1.4 What are we ignoring?
When we say that an algorithm takes \(O(1)\) time, it means that it takes the same amount of time regardless of how big or small the input is. When we say that an algorithm takes \(O(n)\) time, it means that it takes a linear amount of time - as the input size grows, the runtime increases linearly. When we say that an algorithm takes \(O(n^2)\) time, it means that as the input size grows, the runtime grows quadratically.
These shapes and growth rates are very important when \(n\) is very large because the shape is the most important thing in telling us how well our algorithm scales to larger input. If I double the input size on a \(O(n)\) algorithm, the runtime doubles, but if I double the input size on a \(O(n^2)\) algorithm, the runtime quadruples. So we now know which algorithm is likely to work well when \(n\) gets bigger.
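We can read this straight off the growth functions: for some constants \(c_1\) and \(c_2\),

\[T_1(n) = c_1 n \Rightarrow T_1(2n) = 2\,T_1(n), \qquad T_2(n) = c_2 n^2 \Rightarrow T_2(2n) = 4\,T_2(n).\]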
BUT… we need to be careful claiming that one algorithm is better than another. When an algorithm is \(O(1)\) it takes a constant amount of time, but that constant could be \(4\) nanoseconds or it could be \(4\) hours. When an algorithm is \(O(n)\) it takes a linear amount of time, but the linear function could be \(42n+6\) or \(42000000n+6000000\). They are both the same in terms of asymptotic complexity - but they are definitely not the same in terms of efficiency.
We are also tempted to think that an \(O(1)\) algorithm is always better than an \(O(n)\) one. This is certainly the case when \(n\) is large enough, but the constants that are hidden inside the Big-Oh (the \(a,b,c,d,e,f...\) from the previous examples) might matter in practice. A constant time of \(4\) million seconds is almost certainly worse than \(4n\) seconds - unless we know that \(n\) is likely to be larger than \(1\) million.
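To see where the crossover sits, set the two costs equal:

\[4\,000\,000 = 4n \iff n = 1\,000\,000,\]

so the constant-time algorithm only wins once the input has more than a million items.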
Asymptotic complexity is an important tool to analyse algorithms and understand how well they scale with the size of the input, but it is not the only thing to consider and we should bear in mind that sometimes the constants do actually matter.
1.5 Abstract Data Types
A core aspect of Computer Science is abstraction. We can’t deal with all the details all of the time, and so sometimes we need to think at a high level to solve a full problem, but at a low level to solve specific aspects of the problem. When reading in a collection of numbers in Python, you probably said something like `myList.append(42)`. You asked Python to add a number to the list, but you did not tell Python how to do so. Python gave us an interface that says what we can do with the list, and we could just use the interface without ever having to think about what was happening underneath. This is a high level of abstraction, and the more details that are hidden, the higher the abstraction. Likewise, in C++ when you used vectors, you didn’t think about how the numbers were being represented in memory. You just had a list of numbers and you could add and remove at the back. Again, the language gave us an interface and we could just tell C++ what we wanted it to do.
In both of these cases, having the Python list/C++ vector allowed us as the programmer to think about how to solve our problem using a list of things, rather than us having to think about how to build our own list. We know that for a list we should be able to add items, remove items, get the number of items stored in it, and fetch each item in the list. This is a sensible set of operations that we might want to perform on a list, whether it is a list of students, a list of Tweets, or a list of my favourite chocolates. We add things, we remove things, we fetch things.
When we only declare an interface for a structure to store or manage data we can think of this as an Abstract Data Type (ADT) - for example, a list. We care about how we can use the object through its interface, but we don’t care about how that interface was actually implemented.
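As a sketch of what this separation could look like in Python (the class and method names here are made up for illustration), an abstract base class can declare the interface of a list without saying anything about how the items are stored:

```python
from abc import ABC, abstractmethod

class ListADT(ABC):
    """Interface only: what we can do with a list, not how it is stored."""

    @abstractmethod
    def add(self, item):
        """Add an item to the list."""

    @abstractmethod
    def remove(self, index):
        """Remove and return the item at the given position."""

    @abstractmethod
    def get(self, index):
        """Fetch the item at the given position."""

    @abstractmethod
    def size(self):
        """Return the number of items currently stored."""
```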
Think of a car. We only use the steering wheel, the brake pedal, the accelerator pedal, the clutch, the indicator, and the gear lever. Using only those \(6\) things, we can drive from one point to another. Those \(6\) things make up the interface to the car and we don’t need to know anything about the implementation (the engine, the gearbox, the… other things that make a car go…). This hides details from us and makes things easier to code.
One potential issue is that when we don’t know anything about the implementation, we might use it very inefficiently. We don’t know how Python implements lists, and so if we are constantly adding and removing items, it might slow our algorithm down substantially - it might even change the asymptotic complexity if we think that `myList.append(42)` is constant, but it actually takes \(O(n)\) time. If we call the function \(n\) times, we do \(O(n^2)\) amount of work instead of \(O(n)\) - this could be the difference between \(1000\) operations and \(1000000\) operations.
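As a concrete illustration of this trap (not taken from the notes): in CPython, appending at the back of a list is effectively constant time, but inserting at the front shifts every existing element, so repeating it \(n\) times quietly turns a linear-looking loop into quadratic work:

```python
def build_by_append(n):
    # Each append is (amortised) constant time, so the loop does about O(n) work.
    out = []
    for i in range(n):
        out.append(i)
    return out

def build_by_front_insert(n):
    # Each insert at index 0 shifts the whole list, so the loop does about O(n^2) work.
    out = []
    for i in range(n):
        out.insert(0, i)
    return out
```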
By understanding how we can represent data in memory efficiently, we can improve the efficiency of our algorithms substantially. To do this, we need to think of the abstract data type which declares the interface (what we can do with it) and then based on which operations we perform often, we then think about how we might implement or define the data type so that these functions are efficient.
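Continuing the sketch from above, one possible implementation backs the same interface with a built-in Python list; a different backing structure could make some of these operations cheaper or more expensive without changing the interface at all:

```python
class ArrayList(ListADT):
    """One concrete implementation of the list interface sketched above."""

    def __init__(self):
        self._items = []

    def add(self, item):
        self._items.append(item)        # cheap: adds at the back

    def remove(self, index):
        return self._items.pop(index)   # may shift items, so cost depends on index

    def get(self, index):
        return self._items[index]

    def size(self):
        return len(self._items)

numbers = ArrayList()
numbers.add(42)
print(numbers.get(0), numbers.size())   # 42 1
```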
1.6 Conclusion
By studying and analysing various structures, and the algorithms that manage them, we can learn to write code that is not just correct, but also efficient and scalable. This is a core aspect of Computer Science and you will study Algorithmic Analysis and Complexity Theory in the first, second and third years, as well as Honours. In IDSA we will study, analyse and implement the data structures. We’ll consider time and space complexity and we’ll see how using these structures can change the efficiency of the programs that we implement.
In DCS you will learn a number of techniques from discrete mathematics, and you will learn a number of different proof techniques. Next year, in Analysis of Algorithms (AA), you will put the two together and you will learn to prove mathematically that algorithms are correct and that they are efficient. In third year you will start to look at Advanced Analysis of Algorithms where you learn about different problem classes themselves and you will learn to prove that an algorithm is optimal by finding lower bounds on the amount of work that needs to be done. In Honours, you’ll learn more advanced algorithms and analysis techniques and you’ll be able to analyse recursive and stochastic algorithms.
This is a core aspect of what it means to be a computer scientist and I suggest that you invest in IDSA and DCS as much as possible so that you have a strong foundation to build on in future years.
1.7 TODO
- Make sure that you are registered in the Moodle course for IDSA.
- Download the “C++ Revision Notes” pdf from Moodle and ensure that you’re comfortable with the concepts discussed there. If you’re unsure about anything, please speak to a tutor during the lab session.
- We will have the first lab on Thursday.
¹ Ok, they probably do care, but that’s purely about street cred.