First, Second, and Third Normal Form


Functional Dependence

Column A depends on column B means each B value is related to at most one A value.

  • Columns A and B may be simple or composite
  • denoted as ‘A B’

Functional dependence is when one column depends on another.

  • reflects business rules
  • e.g., each student receives one letter grade in a course
    • indicates the Grade column depends on the composite column (StudentID, CourseCode)

Multivalued dependence and join dependence entail dependencies between three or more columns.

  • are complex and uncommon

Normal Forms

Redundancy is the repetition of related values in a table.

  • causes database management problems
    • when related values are updated, all copies must be changed
      • makes queries slow and complex
    • if copies are not updated uniformly,
      • copies become inconsistent and the correct version is uncertain

Normal forms are rules for designing tables with less redundancy.

  • are numbered first through fifth
  • Boyce-Codd is an improved version of third normal form
  • six normal forms comprise a sequence
    • with each successive normal form allowing less redundancy

Normal forms

  • redundancy occurs when a dependence is on a column that is not unique
    • Boyce-Codd normal form eliminates all dependencies on non-unique columns
      • is the most important normal form
  • fourth normal form eliminates multivalued dependencies
  • fifth normal form eliminate join dependencies

First Normal Form

  • Every cell of a table contains exactly one value

A table is in first normal form (1NF) when the table has a primary key and every cell contains exactly one value.

  • two corollaries:
    • every non-key column depends on the primary key
      • includes partial dependence
      • each PK value is unique
      • each non-key cell contains exactly one value (atomic)
      • so each PK value is related to exactly one non-key value
    • has no duplicate rows
      • every row contains a different PK value and therefore every row is different
  • in practice, databases allow tables with duplicate rows and no PK
    • these are usually temporary tables
    • when data is moved to a permanent table, duplicate rows are removed and a PK is created

Alternative definitions of first normal form

  • first normal form is commonly defined in several ways:
    • the table has a primary key
    • every non-key column depends on the primary key
    • the table cannot have duplicate rows
    • every cell contains exactly one value
  • first 3 are equivalent
  • last definition is true of any relational table
    • allows for duplicate rows and no primary key

Second Normal Form

A table is in second normal form (2NF) when all non-key columns depend on the whole primary key.

  • non-key column cannot depend on part of a composite primary key
  • table with a simple primary key is automatically in second normal form
  • table in second normal form is also in first normal form by definition

Third Normal Form

  • Redundancy can occur in a second normal form table when a non-key column depends on another non-key column

A table is in third normal form when all non-key columns depend on the key, the whole key, and nothing but the key.

  • informal definition
    • accurate when the primary key is the only unique column
  • is also in first and second normal forms, by definition
  • no transitive dependencies

A candidate key is a simple or composite column that is unique and minimal.

  • minimal means all columns are necessary for uniqueness
  • can contain NULL values
  • table may have several candidate keys
  • designer designates one candidate key as the primary key

A super key is an attribute or set of attributes used to uniquely identify all attributes in a relation.

  • does not have to be minimal
    • can contain extra columns/attributes
    • can include redundant attributes/columns
  • can contain NULL values
  • is not necessarily a candidate key
  • candidate key is a minimal super key

non-key column is a column that is not contained in a candidate key.

A table is in third normal form if, whenever a non-key column A depends on column B, then B is unique.

  • columns A and B may be simple or composite
  • although B is unique, B is not necessarily minimal
    • and therefore is not necessarily a candidate key