Hashing Algorithms for Integrity Validation

Posted on October 18, 2018 by Administrator Posted in A Level Concepts, Algorithms, Computer Science, Computing Concepts, Cryptography

The Enigma machine was used during World War II to encrypt secret messages.

Imagine working for the British Secret Services during World War II or the Cold War. As part of your role, you would be expected to exchange secret messages with your allies.

Your messages would most likely be encrypted using various encryption techniques.

The issue is that, occasionally, you may receive a message without being 100% sure that it is genuine. Your enemies could be sending messages while pretending to be one of your allies, using the same encryption techniques to lure you. Your enemies may also have intercepted some of your messages and altered them to confuse you.

This is why secret services had to come up with a way to validate the integrity of the messages they sent and received, so that when a message is received, the recipient can be 100% confident that:

the message has been issued by the right person,
the message has not been tampered with before reaching its destination.

The first hashing algorithms used to validate the integrity of a message were introduced in the 1950s. They applied a complex mathematical calculation to the content of a message (the key) to generate a hash, called a checksum, which was appended at the end of the message before it was sent.

Let’s consider the following secret message:

Now let’s consider a very basic hashing algorithm that takes this message as its input and returns, as its hash, the number of characters in the message.

checksum = hash(message) = LENGTH(message)
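The pseudocode above translates almost directly into Python. This is a minimal sketch: the function name `length_checksum` is hypothetical, and the example message is made up, since the article’s own secret message is not reproduced here.

```python
def length_checksum(message: str) -> int:
    """Toy hashing algorithm: the checksum is simply the number of characters."""
    return len(message)


# Hypothetical example message (not the one used in the article).
print(length_checksum("MEET AT DAWN"))  # prints 12
```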

When applying our hashing algorithm to our secret message we get a checksum of 23.

We will now append this checksum at the end of the message before sending it. Our new message is now:

The recipient of this message will run the same hashing algorithm on the content of the message. They will then compare the resulting checksum with the checksum they received. If the two checksums are equal, they can be fairly confident that the message is genuine.
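Both sides of this exchange can be sketched in a few lines of Python. The article does not say how the checksum is attached to the message, so the `#` separator, the function names and the sample messages below are all hypothetical.

```python
def length_checksum(message: str) -> int:
    """Toy hashing algorithm from the article: the number of characters."""
    return len(message)


def attach_checksum(message: str, sep: str = "#") -> str:
    """Sender side: append the checksum after a (hypothetical) separator."""
    return f"{message}{sep}{length_checksum(message)}"


def verify(received: str, sep: str = "#") -> bool:
    """Recipient side: recompute the checksum and compare it with the one received."""
    # Split on the LAST separator so the message itself may contain '#'.
    message, found, checksum_text = received.rpartition(sep)
    if not found or not checksum_text.isdecimal():
        return False  # no usable checksum was appended
    return length_checksum(message) == int(checksum_text)


sent = attach_checksum("MEET AT DAWN")
print(sent)                           # prints MEET AT DAWN#12
print(verify(sent))                   # prints True
print(verify("MEET AT DUSK NOW#12"))  # prints False (altered message)
```

The recipient never needs the original message: recomputing the checksum from what arrived and comparing it with the appended value is enough to spot the mismatch.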

If an enemy intercepts the message and tries to alter it, they will not know how the checksum was calculated. Hence they may change the content of the message, but they will not be able to recalculate the correct checksum. (This assumes that your hashing algorithm is not as obvious as counting the number of characters in the message, which would be very easy to guess and recreate.)

A message which has been tampered with by an enemy may look like this:

A message which has been tampered with.

When the recipient receives this message, they will apply the same hashing algorithm and get a checksum of 24. When comparing this with the checksum appended to the received message, they will realise that the two checksums do not match, and hence that the message is invalid: it has either been produced by someone who does not know the hashing algorithm in use, or it has been tampered with before reaching its recipient. In both cases the recipient will have to discard the message, as it is not reliable.
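The caveat that the hashing algorithm must not be as obvious as counting characters is easy to demonstrate. In this sketch, which uses the toy length checksum and a hypothetical `#` separator (the function names and messages are also made up), an enemy who has guessed the algorithm simply recomputes the checksum for their altered text, and the recipient’s check passes.

```python
def length_checksum(message: str) -> int:
    """Toy hashing algorithm: the number of characters in the message."""
    return len(message)


def verify(received: str, sep: str = "#") -> bool:
    """Recompute the checksum and compare it with the one appended after `sep`."""
    message, found, checksum_text = received.rpartition(sep)
    if not found or not checksum_text.isdecimal():
        return False
    return length_checksum(message) == int(checksum_text)


def forge(new_text: str, sep: str = "#") -> str:
    """An enemy who KNOWS the algorithm can attach a matching checksum to any text."""
    return f"{new_text}{sep}{length_checksum(new_text)}"


forged = forge("RETREAT AT ONCE")
print(forged)          # prints RETREAT AT ONCE#15
print(verify(forged))  # prints True: the forgery goes undetected
```

The check only protects the message for as long as the enemy cannot compute a valid checksum themselves.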

Hashing Algorithms for Integrity Validation

To summarise what we have learned so far, a hashing algorithm used for integrity validation provides some assurance that a transferred message or file has arrived intact, i.e. that it has not been altered on its way to the recipient.

Note that alterations can be caused intentionally by a third party (e.g. a hacker) or can be the consequence of an unintentional “glitch” in the communication (e.g. a poor-quality communication link such as Wi-Fi interference, collisions of data packets on a network, or human or sensor error when typing in a message or scanning a barcode).
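Accidental glitches of this kind are what simple network checksums are designed to catch. As an illustration, here is a sketch of the 16-bit ones’-complement “Internet checksum” described in RFC 1071, the algorithm used by the IPv4 header, TCP and UDP. It runs over an arbitrary byte string; real protocol stacks compute it over specific header fields (plus a pseudo-header for TCP and UDP), which this sketch ignores. The sample packet is made up.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones'-complement checksum in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length data with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]     # add 16-bit big-endian words
        total = (total & 0xFFFF) + (total >> 16)  # fold any carry back in
    return ~total & 0xFFFF


packet = b"HELLO WORLD!"
checksum = internet_checksum(packet)

# Simulate an accidental glitch: flip one bit of the first byte in transit.
corrupted = bytes([packet[0] ^ 0x01]) + packet[1:]
print(internet_checksum(packet) == checksum)     # prints True
print(internet_checksum(corrupted) == checksum)  # prints False
```

A checksum like this reliably catches a single flipped bit, but it offers no protection against a deliberate attacker, who can simply recompute it for the altered data.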

Nowadays, hashing algorithms for integrity validation are used in a wide range of contexts, such as:

Barcodes and ISBN book numbers use a similar approach called a check digit.
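The CVV (card verification value) printed on a credit card is a related safeguard: a short code calculated by the card issuer from the card details using a secret key, which helps check that a card is genuine.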
TCP/IP, the protocol suite that carries HTTP, FTP, SMTP and other traffic, adds checksums to the data packets sent over the Internet (for example, every TCP segment carries a checksum), so that the recipient can validate the integrity of the packets being received.
Digital certificates (which are signed using a hash of their content) and software licence keys (which often include a checksum) are protected in a similar way, to minimise the risk of fraudulent certificates and licence keys.
Executable (.exe) files of popular software are often distributed with a checksum, for example a SHA-256 hash published on the vendor’s website. This helps prevent malicious websites from tricking you into downloading software whose .exe file has been altered to add a virus or a trojan horse.
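For downloaded files, the check is often done with a cryptographic hash such as SHA-256 rather than a simple checksum. Here is a sketch using Python’s standard `hashlib` module. It assumes the vendor publishes the expected hash as a hexadecimal string; the function names, file name and expected value in the usage line are placeholders.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hash of a file as a lowercase hexadecimal string."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large installers do not need to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def download_is_intact(path: str, published_hash: str) -> bool:
    """Compare the file's hash with the value published by the software vendor."""
    return sha256_of_file(path) == published_hash.strip().lower()


# Hypothetical usage: both arguments are placeholders.
# download_is_intact("setup.exe", "<SHA-256 value from the vendor's website>")
```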

The hash produced by a hashing algorithm used for integrity validation is often called a checksum, and it is appended at the end of the data to be transferred.

When the hash consists of a single digit, it is called a check digit. This is the case for barcodes, ISBN numbers and credit card numbers, where the last digit of the code is a check digit: the result of a calculation using all the other digits of the code.
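As an illustration, this sketch computes the check digit used by 13-digit EAN barcodes and ISBN-13 numbers: the first 12 digits are multiplied alternately by 1 and 3, and the check digit is whatever must be added to the total to reach a multiple of 10. Older 10-digit ISBNs use a different, modulo-11 scheme that is not shown here. The function names are hypothetical.

```python
def ean13_check_digit(first_12_digits: str) -> int:
    """Check digit for a 13-digit EAN barcode or ISBN-13."""
    if len(first_12_digits) != 12 or not first_12_digits.isdecimal():
        raise ValueError("expected exactly 12 decimal digits")
    # Weights alternate 1, 3, 1, 3, ... starting from the leftmost digit.
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(first_12_digits))
    return (10 - total % 10) % 10


def is_valid_ean13(code: str) -> bool:
    """A 13-digit code is valid if its last digit equals the computed check digit."""
    digits = code.replace("-", "")
    return (
        len(digits) == 13
        and digits.isdecimal()
        and ean13_check_digit(digits[:12]) == int(digits[12])
    )


print(ean13_check_digit("978030640615"))     # prints 7
print(is_valid_ean13("978-0-306-40615-7"))  # prints True
print(is_valid_ean13("978-0-306-40615-8"))  # prints False
```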

You can check the following links to investigate the use of check digits on a barcode or on a credit card (using the Luhn Algorithm to validate a credit card number).
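The Luhn algorithm is short enough to sketch in full. This version only checks that the last digit is consistent with the others; it cannot tell whether a card actually exists. The function name is hypothetical, and the sample numbers are commonly used test values, not real cards.

```python
def luhn_is_valid(card_number: str) -> bool:
    """Validate a card number with the Luhn (mod 10) algorithm."""
    digits = card_number.replace(" ", "")
    if not digits.isdecimal() or len(digits) < 2:
        return False
    total = 0
    # Walk from the rightmost digit (the check digit) towards the left.
    for position, ch in enumerate(reversed(digits)):
        d = int(ch)
        if position % 2 == 1:  # double every second digit
            d *= 2
            if d > 9:
                d -= 9  # same as adding the two digits of d
        total += d
    return total % 10 == 0


print(luhn_is_valid("7992 7398 713"))        # prints True
print(luhn_is_valid("4111 1111 1111 1111"))  # prints True
print(luhn_is_valid("4111 1111 1111 1112"))  # prints False
```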