Duplicate Detection Algorithms & Big O Notation

Posted on February 24, 2026 by Administrator Posted in A Level Concepts, Computer Science, Computing Concepts, Python - Advanced, Python Challenges

One of the most powerful ways to understand Big O Notation is to compare two different ways of solving the same problem.

Duplicate Detection Challenge?

The aim of a duplicate detection algorithm is to determine whether a list of values contains any duplicates.
This sounds simple. But the way we solve it makes a huge difference to performance, especially when the given list contains a large number (n) of values.

The Problem

Imagine we have a list of student ID numbers:

list = [1023, 1087, 1045, 1067, 1036, 1099, 1043, 1022, 1063, 1045, 1039]

In this list of IDs we have one duplicate value: 1045 appears twice.

So let’s investigate an algorithm to detect whether a list contains duplicate values. We will compare two approaches:

A solution using nested for loops (Brute force approach)
A more efficient solution using a set (Hash Table)

Method 1: Using Nested Loops (Brute Force)

With this approach we will compare every element of the list with every other elements.
If any two values match (at different positions), we have found a duplicate.

Algorithm (Pseudocode)

FOR i FROM 0 TO length-1
   FOR j FROM i+1 TO length-1
      IF list[i] == list[j]
         RETURN True
RETURN False

Python Implementation

def has_duplicates_bruteforce(data):
   for i in range(len(data)):
      for j in range(i + 1, len(data)):
         if data[i] == data[j]:
            return True
   return False

Big O Notation?

If the list has n items, the outer loop runs n times.
For each of those, the inner loop runs roughly n times.
Total comparisons are approximately:
n × n = n²

n = 100 → 10,000 comparisons
n = 1,000 → 1,000,000 comparisons
n = 10,000 → 100,000,000 comparisons

That growth is quadratic.
We describe this as O(n²) — Polynomial time complexity.

Method 2: Using a Set (Hash Table)

A set in Python:

Stores only unique values
Allows very fast lookups using a hashing algorithm (average O(1) – Constant Big O Notation to access data)

With this method we access each value of the list one by one and store them in a hash table to we keep track of values we have already seen.
If we ever see a value that is already in the set we have found a duplicate.

Algorithm (Pseudocode)

CREATE empty set called values

FOR each item in list
   IF item is in values
      RETURN True
   ADD item to values

RETURN False

Python Implementation

def has_duplicates_set(data):
   values = set()
   for item in data:
      if item in values:
         return True
      values.add(item)
   return False

Big O Notation?

With this approach, we loop through the list once (n iterations): O(n).
Set lookups and insertions are (on average) constant time: O(1).

n = 100 → 100 checks
n = 1,000 → 1,000 checks
n = 10,000 → 10,000 checks

We describe this as O(n) — linear time complexity.

Comparing Both Methods

n	Brute Force (O(n²))	Set Method (O(n))
100	10,000 checks	100 checks
1,000	1,000,000 checks	1,000 checks
10,000	100,000,000 checks	10,000 checks

The difference becomes enormous very quickly.

We can clearly see that as n increases, the second method based on a linear time complexity – O(n), will perform much better than the first approach based on a polynomial time complexity – O(n²)

Python Code

We have implemented both functions in the Python IDE below. Using the time library we are also measuring how long the computer takes to cehck for duplaicates in our randomly generated lists of 5,000 values (n=5,000) using each approach.

Try increasing the size of the list to:

10,000 values
20,000 values
50,000 values

What happens? You should notice the brute-force version slows dramatically.

Key Big O Takeaway

Nested loops over the same data usually lead to a O(n²) time complexity
Single loop with constant-time operations usually lead to a O(n) time complexity
Smarter data structures such as Sets (Hash tables) can dramatically reduce the time complexity of an algorithm

Did you like this challenge?

Click on a star to rate it!

Average rating 3.7 / 5. Vote count: 16

No votes so far! Be the first to rate this post.

Other challenges you may enjoy...

Tagged with: Big O Notation

Learning Zone ✕

Theory Concepts

Interactive Quizzes

GCSE Computer Science (UK)

A-Level Computer Science (UK)

Coding Challenges ✕

Python Challenges

HTML / CSS & JavaScript

LMC

SQL

Block Programming

Interactive Coding Puzzles

Online Interactive Tools ✕

Coding Tools

Design Tools

Logic Gates Circuits

Cryptography Corner

Member's Area ✕

Register/Sign Up

Get In Touch

Books Corner

Duplicate Detection Algorithms & Big O Notation

Duplicate Detection Challenge?

The Problem

Method 1: Using Nested Loops (Brute Force)

Algorithm (Pseudocode)

Python Implementation

Big O Notation?

Method 2: Using a Set (Hash Table)

Algorithm (Pseudocode)

Python Implementation

Big O Notation?

Comparing Both Methods

Python Code

Key Big O Takeaway

Other challenges you may enjoy...

Recent Posts

Our Latest Book