Duplicate Detection Algorithms & Big O Notation

One of the most powerful ways to understand Big O Notation is to compare two different ways of solving the same problem.

Duplicate Detection Challenge?

The aim of a duplicate detection algorithm is to determine whether a list of values contains any duplicates.
This sounds simple. But the way we solve it makes a huge difference to performance, especially when the list given list contain a large number (n) of values.

The Problem

Imagine we have a list of student ID numbers:

list = [1023, 1087, 1045, 1067, 1036, 1099, 1043, 1022, 1063, 1045, 1039]

In this list of IDs we have one duplicate value: 1045 appears twice.

So let’s investigate an algorithm to detect whether a list contains duplicate values. We will compare two approaches:

  1. A solution using nested for loops (Brute force approach)
  2. A more efficient solution using a set (Hash Table)

Method 1: Using Nested Loops (Brute Force)

With this approach we will compare every element of the list with every other elements.
If any two values match (at different positions), we have found a duplicate.

Algorithm (Pseudocode)
FOR i FROM 0 TO length-1
   FOR j FROM i+1 TO length-1
      IF list[i] == list[j]
         RETURN True
RETURN False
Python Implementation
def has_duplicates_bruteforce(data):
   for i in range(len(data)):
      for j in range(i + 1, len(data)):
         if data[i] == data[j]:
            return True
   return False
Big O Notation?

If the list has n items, the outer loop runs n times.
For each of those, the inner loop runs roughly n times.
Total comparisons are approximately:
n × n = n²

  • n = 100 → 10,000 comparisons
  • n = 1,000 → 1,000,000 comparisons
  • n = 10,000 → 100,000,000 comparisons

That growth is quadratic.
We describe this as O(n²) — Polynomial time complexity.

Method 2: Using a Set (Hash Table)

A set in Python:

  • Stores only unique values
  • Allows very fast lookups using a hashing algorithm (average O(1) – Constant Big O Notation to access data)

With this method we access each value of the list one by one and store them in a hash table to we keep track of values we have already seen.
If we ever see a value that is already in the set we have found a duplicate.

Algorithm (Pseudocode)
CREATE empty set called values

FOR each item in list
   IF item is in values
      RETURN True
   ADD item to values

RETURN False

Python Implementation

def has_duplicates_set(data):
   values = set()
   for item in data:
      if item in values:
         return True
      values.add(item)
   return False
Big O Notation?

With this approach, we loop through the list once (n iterations): O(n).
Set lookups and insertions are (on average) constant time: O(1).

  • n = 100 → 100 checks
  • n = 1,000 → 1,000 checks
  • n = 10,000 → 10,000 checks

We describe this as O(n) — linear time complexity.

Comparing Both Methods

n Brute Force (O(n²)) Set Method (O(n))
100 10,000 checks 100 checks
1,000 1,000,000 checks 1,000 checks
10,000 100,000,000 checks 10,000 checks

The difference becomes enormous very quickly.

We can clearly see that as n increases, the second method based on a linear time complexity – O(n), will perform much better than the first approach based on a polynomial time complexity – O(n²)


Python Code

We have implemented both functions in the Python IDE below. Using the time library we are also measuring how long the computer takes to cehck for duplaicates in our randomly generated lists of 5,000 values (n=5,000) using each approach.

Try increasing the size of the list to:

  • 10,000 values
  • 20,000 values
  • 50,000 values

What happens? You should notice the brute-force version slows dramatically.

Key Big O Takeaway

  • Nested loops over the same data usually lead to a O(n²) time complexity
  • Single loop with constant-time operations usually lead to a O(n) time complexity
  • Smarter data structures such as Sets (Hash tables) can dramatically reduce the time complexity of an algorithm

Did you like this challenge?

Click on a star to rate it!

Average rating 5 / 5. Vote count: 1

No votes so far! Be the first to rate this post.

As you found this challenge interesting...

Follow us on social media!

Tagged with: