More results...

Generic selectors
Exact matches only
Search in title
Search in content
Post Type Selectors
post
page
Python IDE Dashboard

Relational Databases – Crossword

crosswordAre you confident with your knowledge of key Relational Databases concepts?

You can find our more about Relational Databases on the following page.


Relational Databases ConceptsOpen Crossword Puzzle in a New Window

unlock-access

Solution...

The solution for this challenge is available to full members!
Find out how to become a member:
➤ Members' Area
Tagged with:

Object-Oriented Programming – Crossword

crosswordAre you confident with your knowledge of key Object-Oriented Programming concepts: Classes & Objects, Inheritance, Encapsulation, Polymorphism and Abstract Classes?

You can find our more about OOP concepts on the following page.


Object-Oriented Programming ConceptsOpen Crossword Puzzle in a New Window

unlock-access

Solution...

The solution for this challenge is available to full members!
Find out how to become a member:
➤ Members' Area
Tagged with: ,

Object-Oriented Programming – Terminology

Click on the picture below to check your knowledge of key Object-Oriented Programming concepts: Classes & Objects, Inheritance, Encapsulation, Polymorphism and Abstract Classes.

OOP-Dominoes

Object-Oriented Programming ConceptsOpen Domino Activity
 
Tagged with: ,

Object-Oriented Programming – Quiz

Before taking this quiz, make sure you are confident with your understanding of key Object-Oriented Programming concepts.

Take the Quiz! (open full screen)


List of Keywords


If you are stuck and need a clue, you can reveal below the list of keywords (in random order) to be used:
List of keywords
Method, Inheritance, Class, Attribute, Getter, Polymorphism (x2), Sub, Encapsulation, Private, Setter, Master, Public, Oriented, Constructor, Multiple, Abstract, Object

Tagged with: ,

Object-Oriented Programming Concepts

OOP-ProgrammingObject-Oriented Programming (OOP) is a programming approach based on objects and classes. The object-oriented paradigm allows us to organise software as a collection of objects that consist of both data/attributes and behaviours.

This programming concept appeared in the 1980s and most modern high level programming languages have object-oriented features. Python, Java, C++, Ruby, PHP, Javascript are all OOP programming languages. Sometimes OOP features have been retro-fitted to an existing procedural language: This is the case for C++ which is a fully OOP language based on the procedural language C. Nowadays most advanced pieces of software or video games are built using object-oriented programming concepts.

Object-Oriented Programming makes it easier to design and structure code because:

  • It is based on an approach where objects represent real life objects or concepts. Similarly to real life, an OOP computer model would be structured around different objects that have their own properties and behaviours and that interact with each other. For instance in a car racing game, all the cars, the buildings even the road signs would be objects with their own attributes (Physical dimensions, positions, colours, speed, etc…) and behaviours (e.g. cars can accelerate or slow down, turn, crash, etc.)
  • It facilitates the decomposition of a large system into smaller modules and help structure your code more effectively.
  • It facilitates the re-usability of code.

Key Concepts


OOP Programming is based around the following key concepts:

  • Classes & Objects
  • Inheritance
  • Encapsulation
  • Polymorphism
  • Abstract Classes

Classes & ObjectsInheritanceEncapsulationPolymorphismAbstract Classes

Classes & Objects


A class is a template or blueprint from which objects will be instantiated. It’s like the mould from which objects can be created/instantiated from. A class represents a real life concept such as a person, an animal, a car, a house or a football.

A class consists of a collection of states (a.k.a. attributes or properties) and behaviours (a.k.a. methods).

An object is an instance of a class. The attributes of an object are stored using variables for storing data and the behaviour/methods of an object are implemented as functions or procedures and can be used to perform operations on the data.

Constructor


Most classes have one method called a constructor. It is a function that is automatically called when an object is created/instantiated from a class. It is mainly used to initialise the default values for each attribute/variable of the object.

Superhero Class


Superhero Class

Superhero Class

Let’s consider a class class called Superhero.
The attributes of a superhero are:

  • name (String)
  • superpower (String)
  • strength (Integer)

Some of the methods of a superhero are:

  • init() – This is the constructor which may take some parameters used to initialise the default values of the superhero to be created.
  • walk()
  • run()
  • jump()

The diagram on the right represents the Superhero Class with its attributes and its methods. Note that we do not need to represent the constructor on this diagram as it is assumed that all classes have a constructor. (In Python the constructor of a class is the __init__() function.)

Instantiating Objects from the Superhero Class


Our video game will have 3 superheroes called Spiderman, Superman and Batman. We will hence instantiate 3 objects from our Superhero Class.

spiderman = new Superhero("Spiderman","Wrist web-shooters to shoot spider web material",10)
batman = new Superhero("Batman","Night vision",10)
superman = new Superhero("Superman","Can fly",20)

Naming Convention


When coding, the identifier of a class starts with an uppercase letter, e.g. Superhero.
The identifier of an object starts with a lowercase letter, e.g. spiderman.

Inheritance


It is possible to organise classes into master classes and sub classes by creating a relationship between classes called inheritance. This provides a way of categorising classes and provides an efficient way to organise the code of a complex program more effectively.

For instance, let’s consider an online shop that sells different types of products (items) such as DVDs, Books, and Mp3s.

Item Class

Item Class

The code of such an online shop could be based around the use of a class called Item. This class would contain all the attributes and methods that each item will need:

  • Attributes: name, description, price
  • Methods: viewFullDescription(), addToShoppingBasket(), removeFromShoppingBasket()

Sub-classes can then be defined with additional attributes and methods which are more specific. For instance:

    MP3 Class:

    • Attributes: artist, duration
    • Methods: play(), download()

    DVD Class:

    • Attributes: certificate, duration, actors
    • Methods: viewTrailer()

    Book Class:

    • Attributes: author, genre, numberOfPages
    • Methods: previewContent()

The MP3, DVD and Book sub-classes will inherit the properties and methods of the parent class (Item) as represented in green below.

    MP3 Class:

    • Attributes: name, description, price, artist, duration
    • Methods: viewFullDescription(), addToShoppingBasket(), removeFromShoppingBasket(), play(), download()

    DVD Class:

    • Attributes: name, description, price, certificate, duration, actors
    • Methods: viewFullDescription(), addToShoppingBasket(), removeFromShoppingBasket(), viewTrailer()

    Book Class:

    • Attributes: name, description, priceg, author, genre, numberOfPages
    • Methods: viewFullDescription(), addToShoppingBasket(), removeFromShoppingBasket(), previewContent()

Class Diagram


The inherited properties and methods of a sub class do not need to be duplicated in a class diagram, as it is understood that they are inherited from the parent class:
Inheritance-Class-Diagram

Terminology


The parent class is also called master class or base class.
The child class is also called sub-class or derived class.

Multiple Inheritance


It is possible for a class to inherit from two or more parent classes. In this case it will inherit the properties and methods of all the parent classes.

For example we could have a class representing hybrid cars, that would inherit the properties and method of both electric cars and petrol cars.
mutiple-inheritance-class-diagram

Encapsulation


Private vs. Public


When defining attributes and methods of a class, these can be defined as being either private or public.

A private attribute or a private method can only be accessed from methods of the class. They cannot be directly accessed from outside the class.

A public attribute or a public method can be accessed from anywhere (other methods of the class or from outside the class).

For instance, let’s consider a Car class with the following attributes and methods:
car-class

In our car racing game, we would instantiate an object called playerCar from this Car class.

playerCar = new Car("Mini","Cooper")

If all methods and properties are declared as public, then we would be able to access them from anywhere in the code. e.g.

playerCar.speed = 70
playerCar.fuelLevel = 45
playerCar.accelerate(5) # increase speed by 5mph
playerCar.drive(10) # drive 10 miles
PRINT("Your current fuel level is:"
PRINT playerCar.fuelLevel

However, it is considered good practice to “hide” some properties or methods from the “outside world” by making these private. In this case these private properties and methods cannot be accessed from outside the class.

For instance, in the example above, we may decide that both attributes speed and fuelLevel should be set at private. In this case none of the following three lines would compile and hence these would have to be removed:

playerCar.speed = 70
playerCar.fuelLevel = 45
PRINT playerCar.fuelLevel

We could still alter the content of the fuelLevel and the speed of the car in the code of the fillUp(), accelerate(), slowDown() and drive() methods of the car.

playerCar.fillUp(45) # Add 45 litres of petrol in the tank
playerCar.accelerate(50) # increase speed by 50mph
playerCar.drive(10) # drive 10 miles

Getters and Setters


Very often, properties of a class are defined as being private properties, hence cannot be accessed directly from outside the class. Additional public methods called getters and setters are defined to access/read (getters) the content of a private property and to overwrite (setters) the content of a private property.

For instance, to retrieve the fuelLevel of the player car we could have a new public method called getFuelLevel() and use it to inform the user of how much petrol they have left in their tank.

PRINT "Your current fuel level is:"
PRINT player.getFuelLevel()

We may also define a setter for this property in case we want to give an option for the player to reset their tank to a specific value or to empty it. e.g.

player.setFuelLevel(0) # Emptying the fuel tank

Note that using setters is also useful to prevent direct access to a property and implement validation checks within the setter methods.

For instance if the fuelLevel was a public property then the following code would be accepted even though a car cannot have 999 litres of petrol (999 > tankCapacity)!

playerCar.fuelLevel = 999

By making the fuelLevel property private and forcing the use of a setFuelLevel() public method (setter), we can implement in the code of the setFuelLevel() method a validatiom check that cannot be bypassed to ensure that the parameter given is a positive number, lower or equal to the tankCapacity. e.g.

FUNCTION setFuelLevel(quantity)
    IF quantity<0 THEN
        fuelLevel = 0
    ELSE IF quantity > tankCapacity THEN
        fuelLevel = tankCapacity
    ELSE
        fuelLevel = quantity
    END IF
END FUNCTION

The setter could then be used in our code as follows:

playerCar.setFuelLevel(45) # reset fuel level to 45 litres

Polymorphism


Polymorphism means “multiple forms”. In OOP these multiple forms refer to multiple forms of the same method, where the exact same method name can be used in different classes, or the same method name can be used in the same class with slightly different paramaters. There are two forms of polymorphism, over-riding and over-loading.

Over-Riding Polymorphism


Let’s consider the following class diagram, where both the ElectricCar class and the PetrolCar class inherits from the Car class.
Polymorphism-Over-Riding-Class-Diagram
In this class, both the PetrolCar and the ElectricCar classes inherit the drive() method from the parent Car class. However this method will have to be implemented differently as you do not dive an electric car the same way you drive a petrol car. The code used in both methods will be different and overwrite the code of the parent class. This is what over-ride polymorphism means: when a method from a subclass has to be overwritten to override the method inherited form the parent class.

On occasions, as this is the case in our example, the drive() method of the Car class is only defined but not implemented at all (an empty function, without code) in the Car class. It will be overwritten and properly implemented in the sub classes instead.

Over-Loading Polymorphism


Overloading Polymorphism is used when a method can be implemented more than once to accept different parameters.

For instance, let’s consider our Superhero class to which we would like to add a method to set the Date of Birth (DoB) of a superhero. We could write this same method several times to accept different parameters such as:

The method could accept a date parameter as a string in the “DD/MM/YYYY” format.

PROCEDURE setDoB(string:date) # date in DD/MM/YYYY format

e.g.

spiderman.setDob("15/08/1962")

The same method can be implemented a second time, taking three parameters as follows:

PROCEDURE setDoB(integer:day, integer:month, integer:year)

e.g.

spiderman.setDob(15,08,1962)

Having multiple implementations of the same method to accept different parameters (different number of parameters or parameter of different data types) is called over-loading polymorphism.

Abstract Classes


An abstract class is a class that will never be used to instantiate objects directly. The intention is to create sub-classes that will inherit from this abstract class and from which we will be able to instantiate objects from.

For instance, let’s revisit the online shopping system where we are selling three types of items: MP3s, DVDs and Books. The MP3 class, DVD class and Book class inherit from the parent Item class. Objects will be instantiated from the MP3 class, DVD class or Book class. We will never instantiate an object directly from the Item class. This is why the Item class is called an abstract class.

So to sum up, an abstract class:

  • Cannot be instantiated directly.
  • Can only be used through inheritance.

Object Oriented Analysis & Design


When designing a program based on procedural programming, we can use four main tools:

  • Top Down Modular Design (Decomposition)
  • Flowcharts
  • Data Dictionary
  • Test Plans

These tools are still very relevant when designing OOP programs. However additional tools can be used to identify the various classes/objects needed, their attributes and behaviours, the relationships between classes (inheritance) and how the various objects will interact with each other. One of the most widely used tool is UML, a standard used to draw a range of diagrams to visualise and design a system based on an OOP approach.

Click on the following banner to learn more about UML:
UML-Diagrams

Tagged with: ,

Symmetric vs. Asymmetric Encryption

Cryptography is the art of encoding and decoding secret messages. Cryptographic techniques have been used for thousands of years, well before the introduction of computers, and the techniques have evolved since. (e.g. See how the Caesar Cipher was used by the roman empire 2000 years ago).

More recently, with the introduction of electronics and later on computer science, it has been possible to implement more advanced encryption techniques based on complex mathematical calculations. (e.g. See Alan Turing’s work on breaking codes encrypted by the Germans using the Enigma Machine during World War 2).

Symmetric Encryption


At this stage, encryption techniques were based on symmetric encryption algorithms. With such algorithms a single secret key is needed to both encrypt and decrypt a message. The secret key is possessed by both parties involved in the communication, the sender and the receiver.

Symmetric Encryption: The same cryptographic key is used both to encrypt and decrypt messages.

Symmetric Encryption: The same cryptographic key is used both to encrypt and decrypt messages.

The following algorithms use Symmetric Encryption: RC4, AES, DES, 3DES, QUA.

Symmetric keys are usually 128 or 256 bits long. The larger the key size, the harder the key is to crack. For example, a 128-bit key has around 340,000,000,000,000,000,000,000,000,000,000,000,000 encryption code possibilities. This means that a brute force attack (trying every possible key until you find the right one) is no longer a realistic approach to crack such a key.

Asymmetric Encryption


In today’s digital world, there has been a need to develop a different approach to encryption, called asymmetric encryption. With this approach a pair of linked keys is used and consists of a public key, used to encrypt data and a private key used to decrypt data. Both keys are different (but related). The public key, is available to everyone who wishes to send a message. On the other hand, the private key is kept at a secure place by the owner of the public key.

Asymmetric Encryption: A public key is used to encrypt plaintext into ciphertext whereas a private key is used to decrypt a ciphertext.

Asymmetric Encryption: A public key is used to encrypt plaintext into ciphertext whereas a private key is used to decrypt a ciphertext.

As they involve a pair of keys, asymmetric algorithms tend to be more complex to implement (and slightly slower to execute) than symmetric algorithms. The following algorithms use Asymmetric Encryption: RSA, Diffie-Hellman, ECC, El Gamal, DSA.

Asymmetric keys are typically 1024 or 2048 bits long which leads to 21024 or 22048 encryption codes. (We did not even try to write these numbers down as they would contain several hundreds digits!)

HTTPS and the SSL Handshake


On the internet, a lot of websites are now using the HTTPS protocol. This means that the communication between the web browser and the web server is encrypted in both directions (data being uploaded, such as your credit card number when you pay online, as well as data that is downloaded: the HTML code, the images and video clips, etc. that appear on a webpage).

The HTPPS protocol uses SSL (Secure Sockets Layer), a standard security technology to create an encrypted link between a server and a client.

The following steps describe the process of creating a secure connection between a web browser and a web server. It is called the SSL handshake and uses both symmetric encryption and asymmetric encryption:
SSL-Handshake

To recap, the 5 steps of a SSL handshake are:

  1. Browser sends an https://www… request.
  2. Web Server sends a digital certificate with its asymmetric Public Key.
  3. Browser generates a symmetric session key, encrypts it using the public key and sends it to the server.
  4. Server decrypts the encrypted session key using its asymmetric private key to get the symmetric session key.
  5. Server and Browser now encrypt and decrypt all transmitted data using the symmetric session key. The communication channel is secure as only the web browser and the server know the symmetric session key. The session key is only used for that session. If the browser was to connect to the same server at another time, a new session key would be created following the 5 steps of the SSL handshake.

Symmetric vs. Asymmetric Encryption


Check our online symmetric and asymmetric encryption tools:
Symmetric EncryptionAsymmetric Encryption
Tagged with: ,

Epoch/Unix Timestamp Converter

date-timeThe Unix epoch (or Unix time or POSIX time or Unix timestamp) is the number of seconds that have elapsed since January 1, 1970 (midnight GMT).

This explains how date & time values are actually stored on computers: using an integer value representing the number of seconds since 01/01/1970 00:00:00 GMT. Note that a negative value would represent a date prior to 01/01/1970.

For instance, the current date & time is:

Human Readable Date to Epoch Date Converter


Your first challenge is to write a script, using a programming language of your choice, to convert a date entered in a human readable format e.g. DD/MM/YYYY HH:MM:SS as an input and output the date using the Epoch/Unix Timestamp.

To do so you will need to calculate the number of seconds elapsed between the date given and 01/01/1970 00:00:00.

To simplify your calculations you can make the following assumptions:

  • There are 365.25 days in a year.
  • There are 30.44 days in a month (on average).
  • There are 24 hours in a day.
  • There are 60 minutes in an hour.
  • There are 60 seconds in a minute.

You can test you script with today’s date and compare the output of your program with the Epoch date that appears at the top of this blog post.

Epoch Date to Human Readable Date Converter


Your second challenge is to work out and output an actual date in the DD/MM/YYYY HH:MM:SS format from an Epoch timestamp.

Testing


Here are some key dates for you to test your scripts:

Epoch/Unix Timestamp Date Event
410270400 1 January 1983 ARPANET adopted TCP/IP on January 1, 1983, and from there researchers began to assemble the “network of networks” that became the modern Internet. The online world then took on a more recognizable form in 1990, when computer scientist Tim Berners-Lee invented the World Wide Web.
479736000 15 March 1985 The first domain name registered was Symbolics.com. It was registered March 15, 1985, to Symbolics Inc., a computer systems company in Cambridge, Massachussetts.
773409600 05 July 1994 Amazon was founded in the garage of Bezos’ rented home in Bellevue, Washington. In July 1995, the company began service as an online bookstore.
904910400 4 September 1998 The search engine Google was invented by computer scientists Larry Page and Sergey Brin. It was named after a googol (the name for the number 1 followed by 100 zeros) found in the book “Mathematics and the Imagination” by Edward Kasner and James Newman.
1075896000 4 February 2004 The social network Facebook was launched on February 4, 2004, by Mark Zuckerberg, along with fellow Harvard College students and roommates Eduardo Saverin, Andrew McCollum, Dustin Moskovitz and Chris Hughes

Using the time library


Note that, in Python, the time library already includes a few functions to manipulate Unix Epoch timestamps. You can for example use this library to convert a Unix Epoch Timestamp into a human readable format, using the following code:

unlock-access

Solution...

The solution for this challenge is available to full members!
Find out how to become a member:
➤ Members' Area

Hashing Algorithms for Integrity Validation

The enigma machine was used during World War II to encrypt secret messages.

The enigma machine was used during World War II to encrypt secret messages.

Imagine working for the British Secret Services during World War 2 or during the cold war. As part of your role, you would be expected to exchange secret messages with your allies.

Your messages would most likely be encrypted using various encryption techniques.

The issue is that, occasionally, you may receive messages but may not be 100% sure that these messages are genuine messages or not. It could be that your enemies are sending messages pretending to be one of your allies. They may use the same encryption techniques to lure you. Also, your enemies may have intercepted some of your messages and altered them to confuse you.

This is the reason why secret services had to come with a solution when sending and receiving messages to validate the integrity of a message: So, when a message is received, the recipient should be 100% confident that:

  • the message has been issued by the right person,
  • the message has not been tampered with before reaching its destination.

In the 1950’s the first hashing algorithms used to validate the integrity of a message were introduced. They were using the idea of using a complex mathematical calculations on the content of a message (the key) to generate a hash called a checksum that would be appended at the end of the message to be sent.

hashing-algorithm-checksum

Let’s consider the following secret message:

secret-message-sunday

Let’s consider a very basic hashing algorithm that takes this message as an input and returns the hash as being the number of characters of this message.

checksum = hash(message) = LENGTH(message)

When applying our hashing algorithm to our secret message we get a checksum of 23.

hashing-algorithm-sunday

We will now append this checksum at the end of the message before sending it. Our new message is now:

hashing-algorithm-sunday-23

The recipient of this message will run the same hashing algorithm with the content of the message. They will then compare the resulting checksum with the checksum that has been received. If the two checksums are equal, they can be fairly confident that the message is genuine.

If an enemy intercepts the message and tries to alter it, they will not know how the checksum was calculated. Hence they may change the content of the message but will not be able to recalculate the right checksum. (Provided that your hashing algorithm is not as obvious as calculating the number of characters in the message: This would be a very easy algorithm to guess and recreate).

A message which has been tampered with by an enemy may look like this:

A message which has been tampered with.

A message which has been tampered with.

hashing-algorithm-23-24When the recipient receives this message, they will apply the same hashing algorithm and get a checksum of 24. When comparing this with the checksum of the received message they will realise that both checksums do not match and hence will be able to identify that the message is invalid: It has either been produced by someone who does not know the hashing algorithm in use, or it has been tampered with before reaching its recipient. In both case the recipient will have to discard this message as it is not reliable.

Hashing Algorithms for Integrity Validation


To summarise what we have learned so far, the idea of a hashing algorithm used for integrity validation is to provide some assurance that a transferred message or file has arrived intact, that it has not been altered on its way to the recipient.

Note that alterations can be caused intentionally by a third party (e.g. a hacker) or can be the consequence of an unintentional “glitch” in the communication. (e.g. poor quality of communication link such as wifi interference, collisons of data packets on a TCP/IP network, human or sensor error when inputting a message or scanning a barcode, etc.)

Nowadays hashing algorithms used for integrity validation are widely used in a range of contexts such as:

  • Barcodes and ISBN book numbers use a similar approach called check digit.
  • The CSV number of a credit card is also a form of checksum used to validate credit cards.
  • The TCP/IP protocols (HTTP, FTP, SMTP, etc.) all use a checksum on all data packets being sent over the Internet, to ensure that the recipient can validate the integrity of the data packets being received.
  • Digital certificates and software licences also use a checksum to minimise the risk of fraudulent digital certificates and software license keys.
  • exe files of popular software include a checksum. This is to prevent malicious websites trying to get you to download software where the content of the exe file has been altered to add a virus or a trojan horse.

The hash of a hashing algorithm used for integrity validation is often called a checksum and is appended at the end of the data to be transferred.

Sometimes the hash is called a check digit if it only consists of one digit. This is the case for barcodes, ISBN numbers and credit card numbers where the last digit of the code is a check digit, the result of a complex calculation using all the other digits of the code.

You can check the following links to investigate the use of check digits on a barcode or on a credit card (using the Luhn Algorithm to validate a credit card number).

Hashing Algorithms for Storing Sensitive Data

passwordsMore and more online systems such as e-commerce websites, online banking apps, or social networks need to access some sensitive data about you including your password, your credit card details and more recently, some biometric data (e.g. for fingerprint authentication process).

The servers that store your personal data are at risk of being hacked and though the organisations who run these online services invest a lot of money in ensuring their servers are secure (firewalls, encryption of data, etc.) there is always a risk of a hacker accessing the data stored on their servers. This is why organisations prefer not to store very sensitive data directly (passwords, credit card numbers and biometric data). Instead these organisation use complex hashing algorithms to store hash values of your most sensitive data.

Hashing Algorithm


A hashing algorithm is a complex mathematical calculation that takes an input called the key (e.g. your credit card number or your password) to generate a hash value. The hash value often consists of a string of characters of a fixed size (e.g. 32 alpha-numeric characters).
hashing-algorithm
The hashing algorithm will always produce the same hash for a given key. The hash value will be fairly unique (different for each key, though on rare occasions two different keys can produce the same hash, this is called a collision).

When a user enter their password, for instance to login, the same hashing algorithm is used to recreate the hash value. This hash value can then be compared to the hash stored on the server for this user. If they are the same we can assume that the password entered by the user is correct.
hashing-algorithm-password

This works exactly the same with credit card numbers. When asked to pay online, the user enters their credit card number. The hashing algorithm is applied to the key (credit card number) to produce a hash value which can be compared with the hash value stored on the system for that user. If the two hash values are identical, we can assume that the credit card number entered is correct.

one-way-road-sign

A one-way process


Hash functions are one-way functions which means that with the key you can calculate the hash value, however with a hash value you cannot determine the key.

This is an essential characteristic of hashing algorithms which ensures that even if a hacker manages to access the organisation’s database, they will not be able to easily work out your password or credit card numbers from the hash values stored on the database.

Popular Hashing Algorithms


If you want to find out more about some of the most popular hashing algorithms you can investigate the following hash functions:

Hashing Algorithms for Memory Addressing

In this blog post, we will investigate the use of hashing algorithms to quickly locate a record in a large database.

Let’s consider the database of members of a social network such as Instagram, Twitter or Facebook.

Every time the user uses the website or their smartphone app to access the social network, the server retrieves the user’s login name and password. It then has to find the record matching the given username to access their data and verify their password.

login-process

Large social networks have millions of members. Locating a single record in a table that contains millions of records can be time consuming especially if you perform a linear search to locate the record. (Linear Big O Notation). Alternative algorithms such as a binary search can speed up the process (Logarithmic Big O Notation), however a binary search only works when the data is sorted in alphabetical order. This is unlikely to be the case for a social network members table as members will have joined the network at different dates, hence the data will not be sorted in alphabetical order of their usernames. The assumption is that the records are stored in chronological order (based on the date they first signed in).

Using a Hashing Algorithm


To solve this problem, social networks use hashing algorithms when storing and accessing data in a large database.

A hashing algorithm is a complex mathematical calculation that takes an input (a.k.a. the key) (in this case the username of the member) and returns a value called a hash value or hash. When used for memory addressing the hash value generated is the memory location of where the record is stored.
hashing-algorithm-for-memory-addressing

Let’s consider a very basic hashing algorithm used to identify the memory location of a new member who has just signed up.

Our hashing algorithm will add up all the ASCII values of each character of their username. We will assume our database will contain around 50 memory locations. So our hashing algorithm will use the remainder of dividing the total ASCII value by 50 to get a unique number between 0 and 50.

For instance, for a user called James Bond whose username is “Bond”, the resulting hash value would be:

hashKey("Bond") =  (ASCII("B") + ASCII("o") + ASCII ("n") + ASCII("d")) MOD 50
                =  (  66       +     111    +     110     +     100   ) MOD 50
                =  387 MOD 50
                =  37

The details can be stored on location 37 of the members table (hash table) provided below.
hashing-algorithm-for-memory-addressing-example

Every time the user James Bond accesses the social network, he will provide his username. As part of the login process, the server will use the hashing algorithm on this username to quickly calculate the memory location of where James Bond’s record is stored. This is a very efficient approach to access the data. (Constant Big O notation).

collision

Collisions


The hash values generated by a hashing algorithm should be fairly unique. However there will be occasions where two input values return the same hash value. For instance, let’s consider what would happen if a new user, Austin Powers signs in and their username is aPowers:

hashkey("aPowers") =  (ASCII("a") + ASCII("P") + ASCII ("o") + ASCII("w") + ASCII("e") + ASCII("r") + ASCII("s")) MOD 50
                   =  (  97       +     80     +     111     +     119    +    101     +     114    +   115     ) MOD 50
                   =  737 MOD 50
                   =  37

This hash is the same as the hash generated for James Bond’s username. This create a collision. In this case, instead of overwritting the content of memory location 37 (already used by James Bond) the algorithm will jump to location 37 and carry on a linear search from there to find the next empty memory location. This could be memory location 38 if it’s empty, 39 if not, and so on till an empty location is found.

This will slightly slow down the process when trying to locate Austin Powers’ record but would still be far more efficient than without using a hashing algorithm.

Complex hashing algorithms are designed the minimize the risk of collisions: The effectiveness of a hashing algorithm is based on the total number of unique values the algorithm will generate to minimize the risks of collisions.

Hashing Algorithm in Action!


We have implemented our basic hashing algorithm as described above.

Your task is to use it to generate the hash values for these new members of our social network.
Using these hash values, you will then be able to add their information on the members hash table (see “Members Table” tab below). You will notice that a few members have already been stored in this table.

Warning: On occasion the generated hash may generate a collision with an existing member. In this case you you will have to use the next available location to store the members detail in the members table.

The new members to add are:

  • Jason Bourne, username: jBourne
  • Johny English, username: English
  • Lara Croft, username: lCroft

Also, a few users are logging in. Can you use the hashing algorithm to quickly locate their records in the members hash table and access their details.

The users are:

  • jBauer
  • eHunt
  • Holmes
Hashing AlgorithmMembers Table / Hash Table

Memory Location Username Firstname Lastname
0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50