Winging Relational Database
Lets say you just started a restaurant business and want to keep a record of your customers and their order history. Not having any knowledge about database modeling, you decide to… wing it. Lets see how that goes.
Act 1 - Wing It and Pray
Your first customer, Peppa, walks in to your restaurant and orders beans and eggs. Lets add that to our table
Order Number | customer name | Contact | item 1 | item2 | Total |
---|---|---|---|---|---|
001 | Peppa | 123-456 | beans | eggs | $15 |
A few minutes later, as you are busy looking at this awesome table you just created, you hear someone knocking at your desk. It’s Peppa again - she forgot the milk. “No worries”, you say. You give her milk and add another entry in you table.
Order Number | customer name | Contact | item 1 | item2 | Total |
---|---|---|---|---|---|
001 | Peppa | 123-456 | beans | eggs | $15 |
002 | Peppa | 123-456 | milk |
Wait… why is the total box empty? You forgot to take the payment. No worries because you have her number. You can just call her, and that’s what you do, only to find out that you made a mistake while writing the phone number. Your net worth just decreased by 2 dollars. So you wait and pray she comes back. Luckily, she does. You take your 2 dollars, and this time, you make sure you write her phone number correctly. You will have to make corrections at two places now because she has two orders. Also, you are short on paper and don’t want to waste any space by repeating phone number again and again.
Problem # 1: Redundancy repeating groups of data waste space and making changes to those groups becomes a headache
One solution is to make a separate tables for customers and orders.
customer name | Contact |
---|---|
Peppa | 123-456 |
Order Number | customer name | item 1 | item2 | Total |
---|---|---|---|---|
001 | Peppa | beans | eggs | $15 |
002 | Peppa | milk | $2 |
Separate tables avoid redundancy We can create a separate table for groups of data that are repeated (name , contact)
Great. But wait! George just walked in and ordered 3 items. We will need to add another column to our table. We can not know in advance how many items any customer will order. So how many columns should we add to our table?
- If we have too many columns, most of the cells will be empty and a lot of space will be wasted.
- if we have too few, then we might not have any space to write all the items in a order and we will loose information.
We definitely don’t want to put a limit on the maximum number of items a customer can order. We want to solve the problem, not change the problem to fit into our old solution.
Problem # 2: How many columns to add if something is related to more than one items of a single column?
An obvious solutions is to just create one column for items and fit all the items inside it.
Order Number | customer name | items | Total |
---|---|---|---|
001 | Peppa | Beans, Eggs | $15 |
002 | Peppa | Milk | $2 |
003 | George | Beans, Eggs, Milk | $17 |
This seems fine on paper, but it’ll cause some problems later when we computerize our data. I’ll explain that when we get to the implementation part. So, if that’s not an option, what if we just keep one column and add a separate entry for each item in the order?
Order Number | customer name | item | Total |
---|---|---|---|
001 | Peppa | beans | $15 |
001 | Peppa | eggs | $15 |
002 | Peppa | milk | $2 |
003 | George | beans | $17 |
… | … | … | … |
Now we are repeating ( 001 | Peppa | $15 ) ! Remember what we discovered earlier about redundancy? Lets create a third table for order items
Order Number | item |
---|---|
001 | beans |
001 | eggs |
002 | milk |
003 | beans |
003 | eggs |
003 | milk |
Order Number | customer name | Total |
---|---|---|
001 | Peppa | $15 |
002 | Peppa | $2 |
003 | George | $17 |
When an item is related to more than one items (an order can have multiple items ), creating a separate table for it avoids redundancy (if we can’t put multiple values in one box)
At this point we have 3 Tables: Customers, Orders, Order-Items. And a brain that is exploding.
Act 2 - Mathematics to the rescue
Wouldn’t it be great if we could formalize what we learnt above into a list of rules to follow that will ensure that we don’t encounter any of the above problems. Turns out there are some rules called Normal Forms that capture all the lessons we learnt and more. Applying those rules is called normalization. The normalization process aims to minimize data duplications, avoid errors during data modifications, and simplify data queries from the database. I’ll write about that in part 2.