Lets say you just started a restaurant business and want to keep a record of your customers and their order history. Not having any knowledge about database modeling, you decide to… wing it. Lets see how that goes.

Act 1 - Wing It and Pray

Your first customer, Peppa, walks in to your restaurant and orders beans and eggs. Lets add that to our table

Order Number customer name Contact item 1 item2 Total
001 Peppa 123-456 beans eggs $15

A few minutes later, as you are busy looking at this awesome table you just created, you hear someone knocking at your desk. It’s Peppa again - she forgot the milk. “No worries”, you say. You give her milk and add another entry in you table.

Order Number customer name Contact item 1 item2 Total
001 Peppa 123-456 beans eggs $15
002 Peppa 123-456 milk    

Wait… why is the total box empty? You forgot to take the payment. No worries because you have her number. You can just call her, and that’s what you do, only to find out that you made a mistake while writing the phone number. Your net worth just decreased by 2 dollars. So you wait and pray she comes back. Luckily, she does. You take your 2 dollars, and this time, you make sure you write her phone number correctly. You will have to make corrections at two places now because she has two orders. Also, you are short on paper and don’t want to waste any space by repeating phone number again and again.

Problem # 1: Redundancy repeating groups of data waste space and making changes to those groups becomes a headache

One solution is to make a separate tables for customers and orders.

customer name Contact
Peppa 123-456
Order Number customer name item 1 item2 Total
001 Peppa beans eggs $15
002 Peppa milk   $2

Separate tables avoid redundancy We can create a separate table for groups of data that are repeated (name , contact)

Great. But wait! George just walked in and ordered 3 items. We will need to add another column to our table. We can not know in advance how many items any customer will order. So how many columns should we add to our table?

  • If we have too many columns, most of the cells will be empty and a lot of space will be wasted.
  • if we have too few, then we might not have any space to write all the items in a order and we will loose information.

We definitely don’t want to put a limit on the maximum number of items a customer can order. We want to solve the problem, not change the problem to fit into our old solution.

Problem # 2: How many columns to add if something is related to more than one items of a single column?

An obvious solutions is to just create one column for items and fit all the items inside it.

Order Number customer name items Total
001 Peppa Beans, Eggs $15
002 Peppa Milk $2
003 George Beans, Eggs, Milk $17

This seems fine on paper, but it’ll cause some problems later when we computerize our data. I’ll explain that when we get to the implementation part. So, if that’s not an option, what if we just keep one column and add a separate entry for each item in the order?

Order Number customer name item Total
001 Peppa beans $15
001 Peppa eggs $15
002 Peppa milk $2
003 George beans $17

Now we are repeating ( 001 | Peppa | $15 ) ! Remember what we discovered earlier about redundancy? Lets create a third table for order items

Order Number item
001 beans
001 eggs
002 milk
003 beans
003 eggs
003 milk
Order Number customer name Total
001 Peppa $15
002 Peppa $2
003 George $17

When an item is related to more than one items (an order can have multiple items ), creating a separate table for it avoids redundancy (if we can’t put multiple values in one box)

At this point we have 3 Tables: Customers, Orders, Order-Items. And a brain that is exploding.

Act 2 - Mathematics to the rescue

Wouldn’t it be great if we could formalize what we learnt above into a list of rules to follow that will ensure that we don’t encounter any of the above problems. Turns out there are some rules called Normal Forms that capture all the lessons we learnt and more. Applying those rules is called normalization. The normalization process aims to minimize data duplications, avoid errors during data modifications, and simplify data queries from the database. I’ll write about that in part 2.