Data Classification Framework
Classify data; apply controls per class.
Classes
Data classification is the discipline of deciding which data needs which protection. Without it, the team applies the same controls to everything, which means either over-protecting public data (wasting effort) or under-protecting sensitive data (creating risk). The classification is the input that makes all subsequent control decisions tractable.
The standard four-class framework:
- Public: Data the company actively wants to be public. Marketing materials, blog posts, public API responses, the company website. The control is "this is meant to be public; do not accidentally treat it as confidential." Counterintuitively, public data needs protection too: from accidental modification, from defacement, from being mistaken for internal-only material.
- Internal: Data that should not be public but is not particularly sensitive. Internal documentation, employee org charts, project plans, business plans (in early stages). Loss of internal data is embarrassing but not catastrophic. The control is "internal employees can see this; the public cannot."
- Confidential: Data whose loss would meaningfully harm the company. Customer data, financial records, source code, security configurations, sensitive business plans. Loss is a real incident with real consequences. The control is "only specific roles need this; everyone else should not have access."
- Restricted: Data whose loss would be catastrophic. Cryptographic keys, root credentials, certain customer PII, regulated data (PHI, payment data). Loss triggers regulatory notification and serious customer harm. The control is "very few people have access; access is heavily audited; loss triggers immediate response."
- Standard framework: The four-class framework is widespread because it covers most real-world classification needs without being too granular to apply consistently. Some organizations use three classes (collapse internal and confidential); some use five (split restricted into "highly restricted" and "regulated"). The exact count is less important than having a framework.
Classification is the foundation. Every other data-protection control is calibrated against the class assigned to the data.
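One way to make the four classes usable in tooling is to model them as an ordered type, so that "higher class" has a concrete meaning. The sketch below is illustrative only: the names `DataClass` and `strictest` are hypothetical, and the rule that a store inherits the highest class of anything it holds is an assumption, not something the framework mandates.

```python
from enum import IntEnum


class DataClass(IntEnum):
    """The four classes, ordered from least to most sensitive."""
    PUBLIC = 1
    INTERNAL = 2
    CONFIDENTIAL = 3
    RESTRICTED = 4


def strictest(classes):
    """Return the class a store should carry when it holds mixed data.

    Assumption: a store inherits the highest class of anything in it,
    and a store with no known contents is treated as RESTRICTED.
    """
    return max(classes, default=DataClass.RESTRICTED)
```

Because `IntEnum` members compare as integers, later policy checks can be written as simple comparisons such as `data_class >= DataClass.CONFIDENTIAL`.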
Controls
Each class has a different set of required controls. The mapping is documented and enforced; it makes the classification operational rather than aspirational.
- Higher class equals stricter controls: Public data has minimal controls (mostly availability and integrity). Internal data adds access control (authentication required) and encryption at rest. Confidential data adds encryption in transit, managed keys, per-request access logging, and least-privilege access. Restricted data adds MFA requirements, HSM-backed keys, deeper audit logging, regular access reviews, and often physical or technical separation.
- Encryption requirements per class: Public data does not need to be encrypted at rest (the cost is unnecessary). Internal data is encrypted at rest. Confidential data is encrypted at rest and in transit, with managed keys. Restricted data is encrypted with HSM-backed keys, with split-knowledge key management, and often with field-level encryption.
- Access control per class: Public is open. Internal requires authentication. Confidential requires authentication and role membership, and access is logged. Restricted requires authentication, MFA, and role membership, and access is heavily logged. Each step up adds friction in proportion to the data's sensitivity.
- Audit per class: Public access is not specifically audited. Internal access is logged in aggregate. Confidential access is logged per request. Restricted access is logged per request, alerted on anomalies, and reviewed periodically. The audit cost matches the sensitivity.
- Mapped per class, documented, enforced: The control matrix is documented in the security policy. The enforcement is technical (access controls in databases, encryption in storage tiers, audit logging in observability). The class drives the controls automatically; humans do not have to remember which controls apply to which data.
The control mapping is what turns the classification from a label into an operational practice. Without the mapping, the classification is documentation; with it, it is policy.
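The control matrix can be sketched as plain data that enforcement code looks up by class. The example below is a minimal illustration, not a definitive policy: the `Controls` fields, the audit level names, and the `controls_for` helper are all hypothetical, and the values simply transcribe the per-class bullets above. Falling back to the restricted profile for an unknown label is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Controls:
    authentication: bool
    role_membership: bool
    mfa: bool
    encrypt_at_rest: bool
    encrypt_in_transit: bool
    hsm_backed_keys: bool
    audit: str  # hypothetical levels: none, aggregate, per_request, per_request_alerting


# One row per class, transcribed from the per-class requirements.
CONTROL_MATRIX = {
    "public": Controls(
        authentication=False, role_membership=False, mfa=False,
        encrypt_at_rest=False, encrypt_in_transit=False,
        hsm_backed_keys=False, audit="none",
    ),
    "internal": Controls(
        authentication=True, role_membership=False, mfa=False,
        encrypt_at_rest=True, encrypt_in_transit=False,
        hsm_backed_keys=False, audit="aggregate",
    ),
    "confidential": Controls(
        authentication=True, role_membership=True, mfa=False,
        encrypt_at_rest=True, encrypt_in_transit=True,
        hsm_backed_keys=False, audit="per_request",
    ),
    "restricted": Controls(
        authentication=True, role_membership=True, mfa=True,
        encrypt_at_rest=True, encrypt_in_transit=True,
        hsm_backed_keys=True, audit="per_request_alerting",
    ),
}


def controls_for(label):
    """Look up the controls for a class label.

    Assumption: a missing or unrecognized label gets the restricted profile,
    so a tagging mistake fails safe rather than open.
    """
    key = (label or "").strip().lower()
    return CONTROL_MATRIX.get(key, CONTROL_MATRIX["restricted"])
```

Keeping the matrix as data rather than scattered conditionals means the documented policy and the enforced policy can be reviewed side by side.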
Apply
The third step is application: actually tagging data stores with their class and enforcing the corresponding controls. This is the operational work that requires sustained attention.
- Tag every data store with class: S3 buckets, RDS databases, NoSQL collections, file shares, search indices, data warehouses, log streams. Each gets a class tag. The tag is enforced via cloud-platform policy: untagged stores cannot be created, or are created with the most restrictive defaults.
- Enforces controls automatically: The class tag drives downstream policy. A bucket tagged "restricted" automatically gets encryption with the right key, the right access policy, and the right logging configuration. The tag is the entry point; the controls follow.
- Discovery for legacy data: Existing data stores need to be classified. The team runs a discovery exercise: list every store, identify its data, assign a class. Tools like Amazon Macie, Google Cloud Sensitive Data Protection (formerly Cloud DLP), or Privacera can help by scanning content and suggesting classifications. The exercise is one-time; the maintenance is incremental.
- Compliance trail: The classification documentation and the enforcement evidence are the audit trail for compliance frameworks. SOC 2, HIPAA, PCI DSS, and ISO 27001 all expect sensitive data to be identified and protected according to its sensitivity; the artifacts produced by this practice serve as direct evidence for those requirements.
- Reclassify when use changes: Sometimes data changes class. A data store that started as internal becomes confidential when it starts holding customer data. The reclassification is a deliberate event with documented reasoning. The controls update to match.
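The application steps above can be sketched against an in-memory inventory. This is a simplified illustration under stated assumptions: `DataStore`, `effective_class`, `unclassified`, `coverage`, and `reclassify` are hypothetical names, and a real deployment would read tags from the cloud platform instead of a Python list.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

CLASSES = ("public", "internal", "confidential", "restricted")


@dataclass
class DataStore:
    name: str
    data_class: Optional[str] = None          # None means the store is untagged
    history: list = field(default_factory=list)


def effective_class(store):
    """Untagged or unrecognized stores get the most restrictive class."""
    return store.data_class if store.data_class in CLASSES else "restricted"


def unclassified(stores):
    """Names of stores the discovery exercise still has to classify."""
    return [s.name for s in stores if s.data_class not in CLASSES]


def coverage(stores):
    """Fraction of stores carrying a valid class tag (1.0 for an empty inventory)."""
    if not stores:
        return 1.0
    tagged = sum(1 for s in stores if s.data_class in CLASSES)
    return tagged / len(stores)


def reclassify(store, new_class, reason):
    """Change a store's class as a deliberate, documented event."""
    if new_class not in CLASSES:
        raise ValueError(f"unknown class: {new_class!r}")
    if not reason.strip():
        raise ValueError("reclassification requires documented reasoning")
    store.history.append({
        "from": store.data_class,
        "to": new_class,
        "reason": reason,
        "at": datetime.now(timezone.utc).isoformat(),
    })
    store.data_class = new_class
```

For example, calling `reclassify(store, "confidential", "now holds customer data")` on an internal store updates the tag and appends a history entry, which is the kind of record a compliance reviewer would ask for.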
Data classification is one of those security disciplines that produce compounding returns: each correctly classified data store is one fewer unknown for the security team to investigate during incidents. Nova AI Ops integrates with cloud-platform tagging, surfaces unclassified data stores, audits the controls applied per class, and tracks the classification coverage across the data inventory.