Data Classification


Data classification is the process of applying confidentiality and privacy labels to information based on the adverse effect of unauthorized disclosure.

  • typing schemas tag data assets so that they can be managed through the information lifecycle
  • determines data governance and retention processes:
    • data governance
      • is a collection of processes detailing how data is collected and accessed during the data’s life cycle
    • data retention
      • is a collection of processes detailing how data is stored for a specified amount of time
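
As a rough illustration of how a classification label can drive retention, the Python sketch below maps labels to retention periods and checks whether a record is due for disposal. The label names, the retention periods, and the names RETENTION_DAYS and is_due_for_disposal are all hypothetical, chosen only for this example; real periods come from policy and regulation.

```python
from datetime import date, timedelta

# Hypothetical retention periods per label, in days (illustrative values only).
RETENTION_DAYS = {
    "public": 365,
    "confidential": 365 * 3,
    "private": 365 * 7,
}


def is_due_for_disposal(label: str, created: date, today: date) -> bool:
    """Return True once the record has outlived its label's retention period."""
    if label not in RETENTION_DAYS:
        # Data without a known label cannot be managed through the lifecycle.
        raise ValueError(f"unknown classification label: {label!r}")
    return today > created + timedelta(days=RETENTION_DAYS[label])
```

Raising an error for an unknown label, rather than picking a default period, is one possible design choice: it forces every asset to be tagged before retention rules are applied.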

Data Classification Schema

A data classification schema is a decision tree for applying one or more tags or labels to each data asset.

  • multiple kinds of classification schemas
    • based on the degree of confidentiality required:
      • Public (unclassified)
        • no restrictions on viewing the data
        • only presents a risk when availability or integrity is compromised
        • may require authorization before release
      • Confidential
        • information is sensitive but can be declassified
        • suitable for viewing only by personnel within the organization and possibly by trusted third parties under conditions such as NDAs
        • does not necessarily include information requiring protection at the national security level
      • Secret
        • information that, if disclosed, could cause serious damage to national security
        • restricted to individuals with a need to know
      • Top Secret
        • highest level of classification
        • information whose unauthorized disclosure could cause exceptionally grave damage to national security
        • extremely restricted and monitored
    • based on the kind of information asset:
      • Proprietary
        • aka intellectual property (IP)
        • is information created and owned by the company, typically about the products or services it makes or performs
      • Private/personal data
        • information that relates to an individual's identity
      • Sensitive
        • label is usually used in the context of personal data: privacy-sensitive information about a subject could harm them if made public, and could prejudice decisions made about them if referred to by internal procedures
        • as defined by the EU’s GDPR
          • includes racial or ethnic origin, political opinions, religious or philosophical beliefs, trade union membership, genetic data, biometric data used for identification, health data, and data about sex life or sexual orientation
      • Restricted
        • refers to sensitive information that requires stringent controls and limited access due to its highly confidential nature
        • includes data that, if disclosed or accessed by unauthorized individuals, could cause significant harm to individuals, organizations, or national security
    • based on impact level (levels defined in FIPS 199; NIST SP 800-53B selects control baselines by these levels):
      • low impact level
        • is a data classification level indicating unauthorized disclosure causes a limited adverse effect
      • moderate impact level
        • is a data classification level indicating unauthorized disclosure causes a serious adverse effect
      • high impact level
        • is a data classification level indicating unauthorized disclosure causes a severe or catastrophic adverse effect
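
The idea of a schema as a decision tree can be sketched in Python. The example below is a minimal illustration, not a standard algorithm: it assigns one of the confidentiality levels listed above plus zero or more asset-kind tags. The Asset fields, the order of the questions, and the names classify and may_view are assumptions made for this sketch.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Set, Tuple


class Confidentiality(IntEnum):
    """Confidentiality levels, ordered from least to most restrictive."""
    PUBLIC = 0
    CONFIDENTIAL = 1
    SECRET = 2
    TOP_SECRET = 3


@dataclass
class Asset:
    # Hypothetical attributes an assessor might record for a data asset.
    name: str
    cleared_for_release: bool = False
    national_security_damage: str = "none"  # "none", "serious", or "grave"
    identifies_individual: bool = False
    company_owned_ip: bool = False


def classify(asset: Asset) -> Tuple[Confidentiality, Set[str]]:
    """Walk a small decision tree; return a level plus asset-kind tags."""
    if asset.national_security_damage == "grave":
        level = Confidentiality.TOP_SECRET
    elif asset.national_security_damage == "serious":
        level = Confidentiality.SECRET
    elif asset.cleared_for_release:
        level = Confidentiality.PUBLIC
    else:
        # Assumption: anything not cleared for release stays internal.
        level = Confidentiality.CONFIDENTIAL

    tags: Set[str] = set()
    if asset.identifies_individual:
        tags.add("private/personal")
    if asset.company_owned_ip:
        tags.add("proprietary")
    return level, tags


def may_view(clearance: Confidentiality, level: Confidentiality) -> bool:
    """Compare levels only; a real check would also enforce need to know."""
    return clearance >= level
```

For example, classify(Asset("hr record", identifies_individual=True)) yields the Confidential level with the tag "private/personal", which shows how one asset can carry labels from more than one schema at once.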