Normalization
First normal form (1NF)
A relation(or relation schema) in a given database is in first normal form
, if the domain of all attributes
of that relation is atomic. A domain is atomic if all the elements of that domain are considered to indivisible units. Suppose a relation employee
, has attribute name
, then the relation is not in first normal form
, because the elements of domain of attribute name
, can be divided into first name
and last name
.
In a nutshell, if a relation has composite attributes, then it is not in first normal form. Suppose we have the following relation:
EmpId | first name | last name | salary | position |
---|---|---|---|---|
Deptx-101 | John | smith | 12000 | intermediate |
Depty-201 | Carolyne | Williams | 18900 | manager |
The EmpId
of first row can be broken into : Deptx
(which is used to identiy department) and 101
, is a unique number assigned within the organization. Clearly, the domain of attribute EmpId
is not atomic and hence our relation is not in first normal form
.
Disadvantes faced:
- When such employee id’s are used, the department of an employee can be found by writing code that breaks up the structure of
EmpId
intoDeptx
and101
, which requires extra programming. Also information gets encoded in program rather than in database. - Suppose a particular employee has to change department, then the attribute
EmpId
, would have to be updated everywhere it is used.
We can make our relation satisfy first normal form
, by splitting it into following two relations:
Relation 1
EmpId | first name | last name | department |
---|---|---|---|
101 | John | smith | Deptx |
201 | Carolyne | Williams | Depty |
Relation 2
EmpId | salary | position |
---|---|---|
101 | 12000 | intermediate |
201 | 18900 | manager |
Now, if we have to change the department, we have to do it only once in the relation 1, also determining department is easier now.
Second Normal Form (2NF)
To normalize the database in the second form, there must not be any partial dependency of any column on primary key.
Let’s consider the following example:
id | name | dob | subject |
---|---|---|---|
1 | Mark | 1-1-1981 | Physics |
2 | Jack | 2-2-1982 | Math |
2 | Jack | 2-2-1982 | Biology |
3 | John | 3-3-1983 | Math |
This table is considered to have a composite primary key (id and subject), but the *name and dob columns only depends on the id, not the subject, so they have partial dependency on the primary key. As a result, we can see the redundancy of information in the table. To normalize the database in the second form, we must split this table into two tables like this:
Students table
id | name | dob |
---|---|---|
1 | Mark | 1-1-1981 |
2 | Jack | 2-2-1982 |
3 | John | 3-3-1983 |
Subjects table
student_id | subject |
---|---|
1 | Physics |
2 | Math |
2 | Biology |
3 | Math |
The student_id column of the Subjects table is a foreign-key that references the primary key id of the Students table.