Big O Notation

The question we will try to answer in this blog post is as follows: How can we measure the effectiveness/performance of an algorithm?

First let’s consider this quote from Bill Gates (Founder of Microsoft):

“Measuring programming progress by lines of code is like measuring aircraft building progress by weight.”

So, according to Bill Gates the length of a program (in lines of code) is not a criteria to consider when evaluating its effectiveness to solve a problem or its performance.

• A long program does not necessarly mean that the program has been coded the most effectively.
• And vice-versa, a shorter program does not necessarly perform better than a longer piece of code.

Big O Notation

The Big O notation is used in Computer Science to describe the performance (e.g. execution time or space used) of an algorithm.

The Big O notation can be used to compare the performance of different search algorithms (e.g. linear search vs. binary search), sorting algorithms (insertion sort, bubble sort, merge sort etc.), backtracking and heuristic algorithms, etc. It is especially useful to compare algorithms which will require a large number of steps and/or manipulate a large volume of data (e.g. Big Data algorithms).

Worst Case Scenario

The Big O notation specifically describes the worst-case scenario of an algorithm. For instance, let’s consider a linear search (e.g. finding a user by its username in a list of 100 users). In the best case scenario, the username being searched would be the first username of the list. In this case the algorithm would complete the search very effectively, in one iteration. However, the worst case scenario would be that the username being searched is the last of the list. In this case the algorithm would require 100 iterations to find it.

Considering that the Big O Notation is based on the worst case scenario, we can deduct that a linear search amongst N records could take N iterations. There is a linear correlation between the number of records in the data set being searched and the number of iterations of the worst case scenario.

In this case a linear search is a linear algorithm: Big O Notation: O(N)

So let’s review the different types of algorithm that can be classified using the Big O Notation:

O(1)O(N)O(N2)O(2N)O(log(N)) Constant Notation: O(1)

The constant notation describes an algorithm that will always execute in the same execution time regardless of the size of the data set.

For instance, an algorithm to retrieve the first value of a data set, will always be completed in one step, regardless of the number of values in the data set.

A hashing algorithm is an O(1) algorithm that can be used to very effectively locate/search a value/key when the data is stored using a hash table. It’s a more effective way than using a linear search O(N) or binary search O(Log(N)) algorithm. (See example blog post on hashing algorithm for memory addressing) Linear Notation: O(N)

A linear algorithm is used when the execution time of an algorithm grows in direct proportion to the size of the data set it is processing.

Algorithms, such as the linear search, which are based on a single loop to iterate through each value of the data set are more likely to have a linear notation O(N) though this is not always the case (e.g. binary search). Polynomial Notation: O(N2), O(N3), etc.

Polynomial algorithms include quadratic algorithms O(N2), cubic algorithms O(N3) and so on:

• O(N2) represents an algorithm whose performance is directly proportional to the square of the size of the data set.
• O(N3) represents an algorithm whose performance is directly proportional to the cube of the size of the data set.
• etc.

Algorithms which are based on nested loops are more likely to have a quadratic O(N2), or cubic (N3), etc. depending on the level of nesting.

Typically, O(N2) algorithms can be found when manipulating 2-dimensional arrays, O(N3) algorithms can be found when manipulating 3-dimensional arrays and so on.

Most sorting algorithms such as Bubble Sort, Insertion Sort, Quick Sort algorithms are O(N2) algorithms. Exponential Notation: O(2N)

The exponential notation O(2N) describes an algorithm whose growth doubles with each addition to the data set.

Backtracking algorithms which test every possible “pathway” to solve a problem can be based on this notation. Such algorithms become very slow as the data set increases.

Example of exponential algorithm: An algorithm to list all the possible binary permutations depending on the number of digits (bits). Logarithmic Notation: O(log(N))

A logarithmic algorithm O(log(N)) is an algorithm whose growth decreases when the data set increase following a logarithmic curve. Logarithmic algorithms are hence quite efficient especially when processing large sets of data.

A binary search is a typical example of logarithmic algorithm. In a binary search, half of the data set is discarded after each iteration. Which means that an algorithm which searches through 2,000,000 values will just need one more iteration than if the data set only contained 1,000,000 values. Use a logarithmic algorithm (based on a binary search) to play the game Guess the Number.