knowledge-kitchen

Database Design - Normalization of Relational Databases

Database Design

  1. Overview
  2. First Normal Form
  3. Second Normal Form
  4. Third Normal Form
  5. Fourth Normal Form
  6. Conclusions

Overview

Introduction

What are the normal forms in relational database design theory?

Redundancy

One of the goals of normalization is to avoid data redundancy - the repetition of facts in multiple places within the data.

An example table to hold students’ grades’, with lots of redundancy.

id name email assessment_title grade
1 John I. Rivero JohnIRivero@jourrapide.com Quiz #1 95
2 John I. Rivero JohnIRivero@jourrapide.com Quiz #2 78
3 John I. Rivero JohnIRivero@jourrapide.com Midterm Exam 82
4 Susan C. Anthony AnthonySusan@bmoney.com Quiz #2 64

Anomalies

Another goal is to avoid anomalies, which come in three types.

Insertion anomalies

Insertion anomalies occur when we are not able to insert certain attributes in the database without the presence of other attributes.

id name email assessment_title grade
1 John I. Rivero JohnIRivero@jourrapide.com Quiz #1 95
2 John I. Rivero JohnIRivero@jourrapide.com Quiz #2 78
3 John I. Rivero JohnIRivero@jourrapide.com Midterm Exam 82
4 Susan C. Anthony AnthonySusan@bmoney.com Quiz #2 64

Update anomalies

Update anomalies occur when a correct update of a record requires other actions, such as addition, deletion or both, in order to retain data integrity.

id name email assessment_title grade
1 John I. Rivero JohnIRivero@jourrapide.com Quiz #1 95
2 John I. Rivero JohnIRivero@jourrapide.com Quiz #2 78
3 John I. Rivero JohnIRivero@jourrapide.com Midterm Exam 82
4 Susan C. Anthony AnthonySusan@bmoney.com Quiz #2 64

Deletion anomalies

Deletion anomalies occur when you delete a record, but because of the design of the tables, you accidentally delete information you shouldn’t have.

id name email assessment_title grade
4 Susan C. Anthony AnthonySusan@bmoney.com Quiz #2 64
5 Mary G. Dickinson MaryGDickinson@jourrapide.com Quiz #1 95
6 JSandra B. Kile SandraBKile@teleworm.us Quiz #2 78
7 Helen C. Gonzalez HelenCGonzalez@teleworm.us Quiz #1 82

First Normal Form

Introduction

First normal form deals with the shape of a record type.

Second Normal Form

Introduction

Second and third normal forms both deal with the relationship between non-key and key fields.

Applicability

Second normal form only applies to tables whose primary key is composed of two or more fields.

CREATE TABLE (
    part TEXT NOT NULL
    warehouse TEXT NOT NULL
    quantity INTEGER
    warehouse-address TEXT
    PRIMARY KEY (part, warehouse)
)

Example

Take, for example, the following table showing parts inventories in various warehouses:

part warehouse quantity warehouse-address
Baby Bed Crib Screws Hardware Replacement Kit, cSeao 25-Set Avenel, NJ 2441 275 Omar Ave, Avenel, NJ 07001
Prime-Line N 7534 Bi-Fold Door Hardware Repair Kit Florence, NJ 1121 309 Cedar Ln, Florence, NJ 08518
HIMIKI Tailgate Hardware Rebuild Kit w/Handle Bezel Latch Cable Avenel, NJ 3567 275 Omar Ave, Avenel, NJ 07001

The composite primary key is composed of part and warehouse, meaning that the combination of those two fields is guaranteed to be unique for each record.

Problems

This example does not meet the requirements of second normal form.

Solutions

In order to normalize these records, we could easily split the data into two tables.

One table for parts:

part warehouse_id quantity
Baby Bed Crib Screws Hardware Replacement Kit, cSeao 25-Set 1 2441
Prime-Line N 7534 Bi-Fold Door Hardware Repair Kit 2 1121
HIMIKI Tailgate Hardware Rebuild Kit w/Handle Bezel Latch Cable 1 3567

And another for warehouses:

id address
1 275 Omar Ave, Avenel, NJ 07001
2 309 Cedar Ln, Florence, NJ 08518

Third Normal Form

Introduction

As with second normal form, third normal forms deals with the relationship between non-key and key fields.

Example

Take, for example, a table about employees at a company.

id employee department location
1 Henry K. Brinkman Accounting Fort Myers, FL
2 Darlene R. Gonzalez Marketing Jackson, MS
3 Abigail W. Wagner Sales Pleasantville, NJ
4 Norella T. Walker Accounting Fort Myers, FL

Problems

This example does not meet the requirements of third normal form.

Solution

As with violations of second normal form, the solution to a violation of third normal form is typically to split the data into multiple tables.

One table for employees:

id employee department_id
1 Henry K. Brinkman 1
2 Darlene R. Gonzalez 2
3 Abigail W. Wagner 3
4 Norella T. Walker 1

And another for departments:

id department location
1 Accounting Fort Myers, FL
2 Marketing Jackson, MS
3 Sales Pleasantville, NJ

Fourth Normal Form

Introduction

Fourth normal form is concerned with multi-valued facts, which we’ll show by example. In order for a record to meet fourth normal form, it must:

Example

For example, consider a situation where we intend to store employee’s skills and foreign language abilities.

A single employee who has multiple skills and/or multiple languages (two independent multi-valued facts about them) might [erroneously] be represented with two or more independent multi-valued fact fields.

id employee skill language
1 Henry K. Brinkman cook  
2 Henry K. Brinkman type  
3 Henry K. Brinkman   French
4 Henry K. Brinkman   German
5 Henry K. Brinkman   Greek
6 Norella T. Walker type  

Another Possibility

That same data, with two or more multi-value fact fields, might be represented a few different ways, including:

id employee skill language
1 Henry K. Brinkman cook French
2 Henry K. Brinkman type German
3 Henry K. Brinkman type Greek
4 Norella T. Walker type None

Solution

The solution to remove redundancy, ambiguity, and anomalies, as with previous normal forms, is to split the data up into multiple tables.

One for employees:

id employee
1 Henry K. Brinkman
2 Darlene R. Gonzalez
3 Abigail W. Wagner
4 Norella T. Walker

Solution (continued)

The solution, as with previous normal forms, is to split the data up into multiple tables.

Another for employees_skills:

employee_id skill
1 cook
1 type
2 type
4 type

Solution (continued again)

The solution, as with previous normal forms, is to split the data up into multiple tables.

And a third for employees_languages:

employee_id language
1 French
1 German
1 Greek

Conclusions

Thank you. Bye.