Concepts and Strategies to Protect Data
Data types, classifications, the three data states, sovereignty, and the toolbox — encryption, hashing, masking, tokenization, DLP — that keeps sensitive information from walking out the door.
Data protection is not one control — it is a stack of controls matched to where the data lives and who is allowed to see it. Every Domain 3.3 question tests whether you can pick the right tool for the right state: at rest, in transit, or in use.
Three mental hooks carry most questions: (1) Classify before you protect: you cannot defend what you have not labeled (public, sensitive, confidential, restricted, private, critical). (2) State drives the control: TLS for transit, disk or field-level encryption for rest, enclaves or tokenization for in-use. (3) Sovereignty is a legal layer on top of technical protection: encryption does not exempt you from GDPR or data-residency law. The exam repeatedly distinguishes hashing from encryption, masking from obfuscation, and tokenization from encryption; learn those three pairs cold.
Data types. The first job is knowing what kind of data you are holding, because different types trigger different legal and business obligations. Regulated data is governed by law — GDPR for EU personal data, HIPAA for US health information, PCI DSS for card data, SOX for financial reporting. Trade secrets derive value from being kept secret (formulas, proprietary algorithms). Intellectual property (IP) covers patents, copyrights, and trademarks. Legal information includes privileged communications and litigation holds. Financial information covers accounts, transactions, and M&A data. Data may be human-readable (documents, emails) or non-human-readable (binary protocol payloads, machine logs) — both can be sensitive.
Data classifications. Classification is the label that drives handling. The Security+ objectives call out: sensitive (non-public, protection required), confidential (internal; disclosure would harm the organization), public (approved for external release), restricted (tightly controlled, need-to-know, often regulated or top secret), private (personal/individual-linked data — PII), and critical (essential to business operations). Labels should be simple enough that staff apply them correctly; over-classification is as bad as under-classification.
Data states. Every piece of data is in one of three states at any moment. Data at rest sits on disk, tape, or object storage; protect it with full-disk encryption (BitLocker, LUKS), Transparent Data Encryption (TDE) for databases, field-level encryption, and access control lists. Data in transit moves across a network; protect it with TLS 1.2+ for application traffic and IPsec for VPN tunnels. Data in use is in RAM or CPU registers while being processed, which makes it the hardest state to protect. Protections include memory encryption, confidential computing, and secure enclaves. Intel SGX (application-level enclaves) and AMD SEV (encrypted virtual-machine memory) are the usual examples; both isolate sensitive computation from the host OS and even the hypervisor.
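For the in-transit state, the "TLS 1.2+" rule can be enforced in code rather than left to defaults. The following is a minimal sketch using Python's standard `ssl` module; the helper name `make_client_context` is an illustrative assumption, not something the objectives prescribe.

```python
import ssl


def make_client_context() -> ssl.SSLContext:
    """Build a client-side TLS context that refuses anything older than TLS 1.2."""
    # create_default_context() turns on certificate and hostname verification.
    ctx = ssl.create_default_context()
    # Reject SSLv3, TLS 1.0 and TLS 1.1 handshakes outright.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


context = make_client_context()
```

A context built this way can be passed to standard-library clients that accept an `SSLContext`, so the version floor travels with every connection the application opens.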
Data sovereignty and geolocation. Sovereignty is the legal principle that data is subject to the laws of the country where it physically resides. The EU, Russia, China, and many other jurisdictions have enacted laws that restrict cross-border transfer or require local storage. Geolocation is the physical location of the data; it drives sovereignty obligations as well as latency and disaster-recovery design. Cloud providers offer region selection precisely so customers can meet these constraints; use resource tagging and policy to enforce placement.
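The "tagging and policy" idea can be sketched as a simple allow-list check that runs before any write. Everything named here (`RESIDENCY_POLICY`, `WriteRequest`, the tag names, and the region identifiers) is hypothetical and chosen only for illustration; real cloud platforms express this through their own policy engines.

```python
from dataclasses import dataclass

# Hypothetical policy: residency tag -> regions where tagged data may be written.
RESIDENCY_POLICY = {
    "eu-only": {"eu-central-1", "eu-west-1"},
    "unrestricted": None,  # None means any region is acceptable
}


@dataclass(frozen=True)
class WriteRequest:
    dataset: str
    residency_tag: str
    target_region: str


def is_write_allowed(req: WriteRequest) -> bool:
    """Return True only if the target region satisfies the dataset's residency tag."""
    if req.residency_tag not in RESIDENCY_POLICY:
        return False  # unknown tag: fail closed
    allowed = RESIDENCY_POLICY[req.residency_tag]
    return allowed is None or req.target_region in allowed
```

Failing closed on an unknown tag mirrors the "classify before you protect" hook: unlabeled data is not allowed to move until someone labels it.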
Methods to secure data.
- Geographic restrictions: cloud region locking and policy to prevent out-of-region writes.
- Encryption: symmetric (AES) or asymmetric (RSA, ECC) transformation that requires a key to reverse. It protects confidentiality and can be applied in all three states.
- Hashing: one-way transformation producing a fixed-length digest; used for integrity (file verification) and password storage (always with a salt and a key-stretching function such as bcrypt, scrypt, or Argon2).
- Masking: display-time partial obfuscation (show last 4 of SSN); the underlying value is unchanged.
- Tokenization: replace the sensitive value with a random token and keep the original in a tightly controlled vault; PCI DSS scope is dramatically reduced because the token has no mathematical relationship to the PAN.
- Obfuscation: make data harder to read without any cryptographic guarantee (base64, hex encoding). It is a speed bump, not a protection.
- Segmentation: isolate data stores by sensitivity (separate databases, VLANs, or tenants).
- Permission restrictions: least privilege enforced through ACLs, RBAC, or ABAC. This is the everyday control that most incidents bypass because it was misconfigured.
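The password-storage rule (salt plus a key-stretching function) can be illustrated with scrypt, which is used here only because it ships in Python's standard `hashlib`; bcrypt and Argon2 need third-party packages. The cost parameters and the function names `hash_password` and `verify_password` are illustrative assumptions, not recommended production values.

```python
import hashlib
import hmac
import os

# Illustrative scrypt cost parameters (assumption); tune them for real hardware.
SCRYPT_PARAMS = {"n": 2**14, "r": 8, "p": 1, "dklen": 32}


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest). A fresh random salt is generated per password."""
    salt = os.urandom(16)
    digest = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    candidate = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    return hmac.compare_digest(candidate, expected)
```

Note that verification never "decrypts" anything: it repeats the one-way computation and compares digests, which is exactly the hashing-versus-encryption distinction the exam probes.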
Data Loss Prevention (DLP). Content inspection that searches traffic, endpoints, or cloud storage for sensitive patterns — credit card numbers, SSNs, intellectual-property markers. Three deployment points: endpoint DLP (agent on the device, can stop a USB copy), network DLP (inline at egress or email gateway, inspects outbound traffic), and cloud DLP (typically via a Cloud Access Security Broker, scans SaaS stores). Common actions: block, quarantine, alert, redact. Tuning is the hard part — too strict breaks the business, too loose misses the breach.
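Content inspection for card numbers usually pairs a pattern match with a checksum to cut false positives. The sketch below assumes a simplified pattern (13 to 19 digits with optional single spaces or hyphens) plus the Luhn check, and implements only the "redact" action; the names `CANDIDATE_PAN`, `luhn_valid`, and `redact_pans` are hypothetical, and commercial DLP engines are far more elaborate.

```python
import re

# Candidate PANs: 13 to 19 digits, optionally separated by single spaces or hyphens.
CANDIDATE_PAN = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(digits: str) -> bool:
    """Luhn checksum: double every second digit from the right, sum, test mod 10."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def redact_pans(text: str) -> tuple[str, int]:
    """Replace Luhn-valid candidates with a marker; return (new_text, hit_count)."""
    hits = 0

    def _replace(match: re.Match) -> str:
        nonlocal hits
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            hits += 1
            return "[REDACTED PAN]"
        return match.group()  # fails the checksum: leave it alone

    return CANDIDATE_PAN.sub(_replace, text), hits
```

The checksum step is a small example of tuning: without it, any 16-digit order number would trigger the rule and the control would quickly be switched off.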
| State | Where it lives | Primary controls | Exam cue |
|---|---|---|---|
| At rest | Disk, tape, object storage, backups | BitLocker/LUKS, TDE, field encryption, ACLs | “Laptop stolen”, “Backup tape lost” |
| In transit | Network, API calls, replication | TLS 1.2+, IPsec, SSH | “User uploads to S3”, “App calls database” |
| In use | RAM, CPU registers, cache | Enclaves (SGX/SEV), confidential computing, tokenization | “Processing PAN in memory”, “Protect key during compute” |
| Technique | Reversible? | When to use | Common confusion |
|---|---|---|---|
| Encryption | Yes, with key | Confidentiality for any state | Not integrity by itself |
| Hashing | No (one-way) | Integrity, password storage (+ salt + bcrypt/scrypt/Argon2) | Not encryption — no key |
| Masking | Partial (display only) | Showing last-4 of SSN/PAN to call-center reps | Underlying value is unchanged |
| Tokenization | Only via vault lookup | PCI PAN storage, removing systems from scope | Token has no math link to original |
| Obfuscation | Trivially | Deter casual observation (base64, hex) | Not a security control by itself |
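
The "Reversible?" column can be checked directly for two of the rows. This is a minimal sketch using only the standard library; the sample value is the widely published test card number, not a real PAN.

```python
import base64
import hashlib

secret = b"4111111111111111"  # well-known test card number, not a real PAN

# Obfuscation: base64 is an encoding. Anyone can reverse it without a key.
encoded = base64.b64encode(secret)
decoded = base64.b64decode(encoded)

# Hashing: SHA-256 yields a fixed-length digest and offers no decode operation.
digest = hashlib.sha256(secret).hexdigest()
digest_of_longer_input = hashlib.sha256(secret * 100).hexdigest()
```

Here `decoded` equals `secret` again, while both digests are 64 hexadecimal characters regardless of input length and cannot be turned back into the input.
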
| Classification | Meaning | Typical example |
|---|---|---|
| Public | Approved for external release | Marketing site, published annual report |
| Private | Individual-linked data (PII) | Employee home address, customer email |
| Sensitive | Non-public; protection required | Internal roadmap, non-PII HR data |
| Confidential | Disclosure would harm the org | M&A plans, source code |
| Restricted | Need-to-know; often regulated | Cardholder data (PCI), PHI (HIPAA), classified |
| Critical | Essential to business operations | Customer database, production encryption keys |
A mid-size retailer is preparing for its PCI DSS assessment. The e-commerce platform stores the full primary account number (PAN) in a central orders database to support refunds and recurring billing. The QSA has flagged the orders database, the analytics warehouse, the customer-service tooling, and the reporting BI system as “in scope” — four systems now subject to the full PCI controls regime. The CISO is meeting the lead DBA to decide how to fix it.
CISO ↔ Lead DBA
PCI Scope Reduction
Your customer-service agents verify callers by confirming the last four digits of a card on file. The current console displays the full PAN from the database. You must change it, but the reps still need something to verify callers. You can modify the console, the database, or both. Budget is small; the change ships in two sprints.
Option A: Data masking at the console
Leave the database as-is; change the console to render only the last four digits of the PAN. Reps can verify callers but cannot see or export the full number.
Option B: Tokenize the PAN at the database
Replace every PAN in the database with a token; store the real PAN in a vault. The console only ever shows the token’s last-four metadata.
Option A is the right call for this scope.
The requirement is display-level: reps need to confirm the last four, nothing more. Masking at the console satisfies that in one change, in one sprint, on one codebase. It keeps the underlying data available for legitimate business operations (refunds, chargebacks) without forcing a full tokenization project.
Why Option B is the wrong fit here: tokenization is the right answer when the goal is to reduce PCI scope across multiple downstream systems. For a single console change, it is a hammer for a thumbtack — the vault, migration, and API redesign cost months, not two sprints. Save tokenization for the scope-reduction project and use masking for the display-only fix.
Both options are valid controls; the exam tests whether you know which control fits which problem. Masking hides part of the display; tokenization replaces the stored value. Problem = display → masking. Problem = scope reduction → tokenization.
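The console-side masking fix can be sketched in a few lines. The function name `mask_pan` and its defaults are assumptions for illustration; the point is that the stored value is untouched and only the rendered string changes.

```python
def mask_pan(pan: str, visible: int = 4, mask_char: str = "*") -> str:
    """Return a display-only PAN with all but the last `visible` digits masked."""
    digits = [c for c in pan if c.isdigit()]  # ignore spaces and hyphens
    if len(digits) <= visible:
        raise ValueError("PAN too short to mask safely")
    return mask_char * (len(digits) - visible) + "".join(digits[-visible:])


stored_pan = "4111 1111 1111 1111"   # well-known test number, still in the database
shown_to_rep = mask_pan(stored_pan)  # what the console renders
```

Because `stored_pan` is unchanged, the database stays in PCI scope, which is why masking solves the display problem but not the scope problem.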
- A Full-disk encryption on the application server
- B TLS 1.3 between the app and the card network
- C A secure enclave (Intel SGX or AMD SEV) to isolate the processing from the host OS
- D Daily tape backups of the transaction log
Correct: C. The data is in use — actively in RAM. Enclaves (SGX/SEV) and confidential computing are the Security+-cited controls for that state, because they protect the computation from the host OS and even the hypervisor.
A wrong: full-disk encryption protects data at rest. B wrong: TLS protects data in transit — not in memory. D wrong: backups are a recovery control, not a data-in-use protection.
Source: CompTIA SY0-701 Objectives v5.0 — 3.3 Data Protection
- A Encrypt the PAN column with TDE
- B Replace the PAN with a vault-backed token; share only the token with downstream systems
- C Mask the PAN to show only the last four digits in reports
- D Hash the PAN using SHA-256 before writing to the warehouse
Correct: B. Tokenization replaces the PAN with a random value that has no mathematical relationship to the original. Systems that never see the real PAN fall out of PCI scope, which is the specific business goal stated.
A wrong: TDE encrypts the data files on disk but returns plaintext PANs to any authorized query, so downstream systems remain in scope. C wrong: masking is a display-layer control and does not change the stored data, so scope is unchanged. D wrong: hashing is one-way, so the PAN can never be recovered for operations that need it, and a plain unsalted SHA-256 of a PAN can be brute-forced because the space of valid card numbers is small; PCI DSS still treats truncated or hashed PANs carefully.
Source: CompTIA SY0-701 Objectives v5.0 — 3.3 Data Protection
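Vault-backed tokenization, as described in the tokenization answer, can be sketched with an in-memory stand-in. The class `TokenVault`, its method names, and the `tok_` prefix are hypothetical; a real vault is a hardened, access-controlled service, not a Python dictionary.

```python
import secrets


class TokenVault:
    """In-memory stand-in for a tokenization vault (illustration only)."""

    def __init__(self) -> None:
        self._token_to_pan: dict[str, str] = {}
        self._pan_to_token: dict[str, str] = {}

    def tokenize(self, pan: str) -> str:
        """Return the existing token for this PAN, or mint a new random one."""
        if pan in self._pan_to_token:
            return self._pan_to_token[pan]
        # The token comes from a CSPRNG, so it has no mathematical link to the PAN.
        token = "tok_" + secrets.token_urlsafe(16)
        self._token_to_pan[token] = pan
        self._pan_to_token[pan] = token
        return token

    def detokenize(self, token: str) -> str:
        """Vault lookup: the only way back to the PAN."""
        return self._token_to_pan[token]
```

Returning the same token for a repeated PAN is a design choice made in this sketch so that downstream systems can still group transactions by card without ever holding the card number.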
- A Storing data in a Frankfurt region guarantees it is governed only by EU law
- B Locality is the physical storage location; sovereignty is the legal regime that applies, and the two can require separate controls and agreements
- C Sovereignty only applies to government data; commercial data uses locality rules
- D Both terms refer to the same concept and can be used interchangeably
Correct: B. The Security+ objectives list geolocation (where the data physically is, called locality in this question) and data sovereignty (what law applies) as related but distinct considerations. Parent-company jurisdiction, cross-border transfer mechanisms, and customer nationality can all add obligations beyond the physical region.
A wrong: region placement is necessary but not sufficient — cross-border access by staff or parent-company law can still trigger other regimes. C wrong: sovereignty applies broadly, including to commercial data. D wrong: the exam explicitly distinguishes the terms.
Source: CompTIA SY0-701 Objectives v5.0 — 3.3 Data Protection