Winging Relational Database

Lets say you just started a restaurant business and want to keep a record of your customers and their order history. Not having any knowledge about database modeling, you decide to… wing it. Lets see how that goes.

Act 1 - Wing It and Pray

Your first customer, Peppa, walks in to your restaurant and orders beans and eggs. Lets add that to our table

Order Number	customer name	Contact	item 1	item2	Total
001	Peppa	123-456	beans	eggs	$15

A few minutes later, as you are busy looking at this awesome table you just created, you hear someone knocking at your desk. It’s Peppa again - she forgot the milk. “No worries”, you say. You give her milk and add another entry in you table.

Order Number	customer name	Contact	item 1	item2	Total
001	Peppa	123-456	beans	eggs	$15
002	Peppa	123-456	milk

Wait… why is the total box empty? You forgot to take the payment. No worries because you have her number. You can just call her, and that’s what you do, only to find out that you made a mistake while writing the phone number. Your net worth just decreased by 2 dollars. So you wait and pray she comes back. Luckily, she does. You take your 2 dollars, and this time, you make sure you write her phone number correctly. You will have to make corrections at two places now because she has two orders. Also, you are short on paper and don’t want to waste any space by repeating phone number again and again.

Problem # 1: Redundancy repeating groups of data waste space and making changes to those groups becomes a headache

One solution is to make a separate tables for customers and orders.

customer name	Contact
Peppa	123-456

Order Number	customer name	item 1	item2	Total
001	Peppa	beans	eggs	$15
002	Peppa	milk		$2

Separate tables avoid redundancy We can create a separate table for groups of data that are repeated (name , contact)

Great. But wait! George just walked in and ordered 3 items. We will need to add another column to our table. We can not know in advance how many items any customer will order. So how many columns should we add to our table?

If we have too many columns, most of the cells will be empty and a lot of space will be wasted.
if we have too few, then we might not have any space to write all the items in a order and we will loose information.

We definitely don’t want to put a limit on the maximum number of items a customer can order. We want to solve the problem, not change the problem to fit into our old solution.

Problem # 2: How many columns to add if something is related to more than one items of a single column?

An obvious solutions is to just create one column for items and fit all the items inside it.

Order Number	customer name	items	Total
001	Peppa	Beans, Eggs	$15
002	Peppa	Milk	$2
003	George	Beans, Eggs, Milk	$17

This seems fine on paper, but it’ll cause some problems later when we computerize our data. I’ll explain that when we get to the implementation part. So, if that’s not an option, what if we just keep one column and add a separate entry for each item in the order?

Order Number	customer name	item	Total
001	Peppa	beans	$15
001	Peppa	eggs	$15
002	Peppa	milk	$2
003	George	beans	$17
…	…	…	…

Now we are repeating ( 001 | Peppa | $15 ) ! Remember what we discovered earlier about redundancy? Lets create a third table for order items

Order Number	item
001	beans
001	eggs
002	milk
003	beans
003	eggs
003	milk

Order Number	customer name	Total
001	Peppa	$15
002	Peppa	$2
003	George	$17

When an item is related to more than one items (an order can have multiple items ), creating a separate table for it avoids redundancy (if we can’t put multiple values in one box)

At this point we have 3 Tables: Customers, Orders, Order-Items. And a brain that is exploding.

Act 2 - Mathematics to the rescue

Wouldn’t it be great if we could formalize what we learnt above into a list of rules to follow that will ensure that we don’t encounter any of the above problems. Turns out there are some rules called Normal Forms that capture all the lessons we learnt and more. Applying those rules is called normalization. The normalization process aims to minimize data duplications, avoid errors during data modifications, and simplify data queries from the database. I’ll write about that in part 2.