What is Normalization?
Normalization is the process of eliminating redundancy in a database by organizing the data into several smaller tables. The large tables are normalized (converted) into smaller tables by using normalization rules, so that a database administrator can easily handle them. It also eliminates insertion, deletion, and update anomalies.
Importance of normalization in DBMS
Data redundancy, update anomaly, insertion anomaly, and deletion anomaly occur when a database is not organized properly. Eventually, it gets difficult to manage the database. The database also consumes a lot of memory. To overcome this challenge, the database is normalized.
Normalization rules in DBMS
The following normal forms are applied to eliminate redundancy in the database tables:
- First normal form (1NF)
- Second normal form (2NF)
- Third normal form (3NF)
- Boyce & Codd normal form (BCNF)
Studies regarding normalization methods in SQL are still going on, so further normal rules are being developed. However, in real-world applications, most of the results are achieved after applying 3NF.
DBMS normal forms with examples
Below is a detailed explanation for all the normalization forms with an example.
First normal form
The first normal form is considered the first step of the data normalization process. The goal is to achieve a table that can be extended easily. Retrieving data from such tables is also easier.
For a table to be in the first normal form, it should satisfy the following rules:
- The table should have atomic (single) valued attributes (columns).
- The domain of the data/ value stored in a column should not be changed.
- The columns in the database table should have unique names.
- The order of the data stored in the table does not matter.
Example: Consider the course details table below:
Course Name | Course Contents |
Programming | Java, C++, Python |
Management | Finance, Production |
Here, the course content column contains multiple values and therefore, it is not in 1NF. Below is the normalized version of the table that satisfies the 1NF rule and has no one-to-many relationship.
Course Name | Course Content |
Programming | Java |
Programming | C++ |
Programming | Python |
Management | Finance |
Management | Production |
Second normal form
For a table to be in 2NF, it has to satisfy the following rules:
- The database table must be in the first normal form.
- There should not be any partial dependency.
A table that is in 2NF has no partial dependency.
Partial dependency: When a non-prime attribute is functionally dependent on a part of the candidate key and not the entire key, it is called partial dependency.
Partial dependency can be removed by removing the attribute that causes partial dependency and transferring it to another table.
Example: Assume the database of a school that stores the data of teachers and their subjects. As you know, teachers can teach more than one subject at a school.
Teacher_ID | Subject | Teacher_Age |
15 | Chemistry | 32 |
15 | Biology | 32 |
47 | English | 35 |
55 | Math | 33 |
55 | Computer | 33 |
Here, the non-prime attribute Teacher_Age is dependent on Teacher_ID. Further, Teacher_ID is the proper subset of a candidate key. This indicates that the table is not in 2NF. To convert it into 2NF, decompose it into two tables:
Table 1: Teacher_Detail table
Teacher_ID | Teacher_Age |
15 | 32 |
47 | 35 |
55 | 33 |
Table 2: Teacher_Subject table
Teacher_ID | Subject |
15 | Chemistry |
15 | Biology |
47 | English |
55 | Math |
55 | Computer |
Third normal form
The third normal form reduces data duplication and also helps to achieve data integrity. A table is in 3NF if it satisfies the following conditions:
- The table is in 2NF.
- The table does not contain any transitive dependency.
For every non-trivial functional dependency A → B to be in 3NF, it should fulfill any of these conditions.
- A is a super key.
- B is a prime attribute.
If there are two functional dependencies A →B and B → C, then A →C is known as a transitive dependency.
Example: Assume that a company wants to create a database to store the employees' addresses. So, they create a table called Employee_info, as shown below.
Emp_ID | Emp_Name | Emp_Zipcode | Emp_State | Emp_City |
111 | John | 25025 | UP | New York |
112 | George | 12542 | MP | Barcelona |
113 | Mary | 26555 | MP | Norway |
114 | Lisa | 26555 | AP | Illinois |
115 | Adam | 58622 | JK | Ampton |
Here the super key is: {Emp_ID}, {Emp_ID, Emp_Name}, {Emp_ID, Emp_Name, Emp_Zipcode}, {Emp_ID, Emp_Name, Emp_Zipcode, Emp_State}, etc.
Candidate key- {Emp_ID)
All the attributes excluding {Emp_ID} are non-prime attributes.
Emp_State and Emp_City are dependent on Emp_Zipcode, which is dependent on Emp_ID. The non-prime attributes (Emp_State, Emp_City) are transitively dependent on Emp_ID (super key). Thus, it is clear that the table is not in 3NF.
To convert the table in 3NF, move the Emp_City and Emp_State to a new table and make Emp_Zipcode the primary key of the table. Now, the Employee_info table is normalized.
Table 1- Employee table
Emp_ID | Emp_Name | Emp_Zipcode |
111 | John | 25025 |
112 | George | 12542 |
113 | Mary | 26555 |
114 | Lisa | 26555 |
115 | Adam | 58622 |
Table 2- Employee_Zipcode table
Emp_Zipcode | Emp_State | Emp_City |
25025 | UP | New York |
12542 | MP | Barcelona |
26555 | MP | Norway |
26555 | AP | Illinois |
58622 | JK | Ampton |
Boyce-Codd normal form
Boyce-Codd normal form is an advanced version of the third normal form. For a table to be in BCNF, it should fulfill the following conditions:
- The table should be in 3NF.
- For every functional dependency A → B, A has to be a super key.
Example- Assume that a company wants to create a database to store the details of employees working in more than one department. The table to store the data of the employees will be as follows:
Emp_ID | Emp_Nation | Emp_Dept | Dept_Type | Emp_Dept_No. |
111 | Paris | Account | R251 | 452 |
111 | Paris | Design | R234 | 236 |
222 | Russia | Sales | R438 | 565 |
222 | Russia | Manufacturing | R721 | 412 |
Here, functional dependencies are:
Emp_ID → Emp_Nation
Emp_Dept → {Dept_Type, Emp_Dept_No.}
Candidate Key- {Emp_ID, Emp_Dept}
Here, Emp_Dept, and EMP_ID are not the only keys. So, the table is not in BCNF. To convert the table into BCNF, decompose it as shown below.
Table 1- Employee_Nation table
Emp_ID | Emp_Nation |
111 | Paris |
222 | Russia |
Table 2- Employee_Dept table
Emp_Dept | Dept_Type | Emp_Dept_No. |
Account | R251 | 452 |
Design | R234 | 236 |
Sales | R438 | 565 |
Manufacturing | R721 | 412 |
Table 3- Employee_Dept_Mapping table
Emp_ID | Emp_Dept |
111 | Account |
111 | Design |
222 | Sales |
222 | Manufacturing |
Functional Dependencies:
Emp_ID → Emp_Nation
Emp_Dept → {Dept_Type, Emp_Dept_No.}
Candidate Keys:
For table 1: Emp_ID
For table 2: Emp_Dept
For table 3: {Emp_ID, Emp_Dept}
Here, the left-hand side of both functional dependencies is a key. Hence, the table is in BCNF.
Context and Applications
Database administrators apply normalization methods in database management to organize complex tables and to make it easier to retrieve data from the database. Further, competitive exams ask questions related to all of the normalization forms.
- Bachelor of Computer Science
- Masters in Data Science
- Master of Science in Data Analytics
Practice Problems
Q1. Select the correct answer from the following statements.
- Every relation in 3 normal form is also in BCNF.
- Every relation in BCNF is also in 3 normal form.
- A relation is in 3NF if each non-prime attribute of the relation is fully dependent on every key of the relation and has a z-score.
- A relation cannot be in BCNF and 3NF together.
Correct Option- b
Explanation: BCNF is a higher and stricter version of 3NF. This indicates that for a relationship to be in BCNF, it needs to be in 3NF.
Q2. The relational database of a Student Performance (Name, CourseNo, EnrollmentNo, Grade) has the functional dependencies mentioned below:
Name, Course No. → Grade
Enrolment No., Course No. → Grade
Name → Enrolment No.
Enrolment No. → Name
What will be the highest normalization form in this database?
- 2NF
- 3NF
- BCNF
- 1NF
Correct Option- b
Explanation: Assume that the attributes (Name, Course No., Enrolment No., Grade) are (A, B, C, D). According to the information, the functional dependencies in the relational model are:
AB → D
CB → D
A → C
C → A
So, we have two candidate keys AB and CB.
Both AB → D and CB → D satisfy BCNF, since the left side is a super key.
However, A → C and C → A are not in BCNF. Therefore, check if they satisfy 3NF.
Since C and A attributes, on RHS of both functional dependencies are prime attributes, they are in 3NF.
Thus, the highest normal form of the whole relation is 3NF.
Q3. When is a table said to be in 3NF?
- It is in 2NF and has differential expression.
- It is in 2NF and has no transitive dependencies.
- It is in BCNF and has differential expression technique.
- It is in 2 normalization forms and has a spike-in structure.
Correct option- b
Explanation: For a table to be in 3NF, it has to be in 2NF and must not be transitively dependent on a prime key attribute.
Q4. Which normal form converts composite attributes into individual attributes?
- First normal form
- Second normal form
- Third normal form
- BCNF
Correct Option- a
Explanation: In 1NF, the table is normalized to remove the duplicate information (redundant data).
Q5. Which is true for tables in 2NF?
- Removes the possibility of insertion anomalies.
- Eliminate all hidden dependencies.
- Contain a composite key and multi-valued attributes.
- Contain non-key fields dependent on primary keys.
Correct Option- b
Explanation: For a table to be in 2NF, it has to be in 1NF and should not have any partial dependency.
Common Mistakes
The Boyce-Codd normal form is often considered the fourth normal form. However, BCNF is neither the third normal form nor the fourth normal form. BCNF is stricter than 3NF. It can be considered as 3.5NF.
Related Concepts
- Functional dependency
- Relational decomposition
- Anomalies in DBMS
- Types of database key
Want more help with your computer science homework?
*Response times may vary by subject and question complexity. Median response time is 34 minutes for paid subscribers and may be longer for promotional offers.
Search. Solve. Succeed!
Study smarter access to millions of step-by step textbook solutions, our Q&A library, and AI powered Math Solver. Plus, you get 30 questions to ask an expert each month.
Normalization Homework Questions from Fellow Students
Browse our recently answered Normalization homework questions.
Search. Solve. Succeed!
Study smarter access to millions of step-by step textbook solutions, our Q&A library, and AI powered Math Solver. Plus, you get 30 questions to ask an expert each month.