Using Data Sources to Support an Investigation
Firewall, application, endpoint, OS, IDS/IPS, network, and metadata logs — pick the right source for the question you are trying to answer.
Every investigation starts with a question: Who authenticated at 02:14? Which external IP did the compromised host reach? What payload was actually exfiltrated? Security+ objective 4.9 asks you to match the question to the data source. Different logs answer different questions, and picking the wrong source wastes time while evidence ages.
The mental model: firewall logs tell you what was allowed or denied at the perimeter, endpoint logs (Sysmon, EDR) tell you what ran on the host, OS security logs (Windows 4624 and related events, Linux auth.log) tell you who authenticated, NetFlow tells you who talked to whom, packet capture tells you exactly what was said, and metadata (email headers, EXIF, MAC times) helps with attribution and context. Match the question to a source before you start pulling data.
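The mental model can be written down as a simple lookup table. The sketch below is only an illustration: the category keys and the `sources_for` helper are hypothetical names chosen for this example, not exam terminology.

```python
# Hypothetical reflex map: question category -> first-stop data sources.
# Category names are illustrative labels invented for this sketch.
SOURCE_MAP = {
    "authentication": ["OS security log (Windows 4624/4625, Linux auth.log)"],
    "perimeter_traffic": ["Firewall logs"],
    "host_activity": ["Endpoint logs / Sysmon / EDR"],
    "who_talked_to_whom": ["NetFlow / IPFIX"],
    "payload_content": ["Packet capture"],
    "attribution": ["Metadata (email headers, EXIF, MAC times)"],
}


def sources_for(category: str) -> list[str]:
    """Return the first-stop sources for a question category."""
    try:
        return SOURCE_MAP[category]
    except KeyError:
        raise ValueError(f"unknown question category: {category!r}") from None


print(sources_for("payload_content"))  # prints ['Packet capture']
```

Raising on an unknown category, rather than returning an empty list, makes it obvious when a question has not yet been classified.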
Log data. The primary raw material.
- Firewall logs — allow/deny decisions, source and destination IP/port, timestamps, interface, rule id. Answers: “did this traffic cross the perimeter and which rule matched?”
- Application logs — business transactions (order placed, transfer initiated), error stacks, authentication events at the app layer. Answers: “what did the user attempt within the app?” Structured JSON logs are much easier to query and correlate than free-form strings.
- Endpoint logs — process creation, file writes, registry changes, network connections, command line. Sysmon on Windows is the gold-standard augmentation. EDR telemetry is enriched endpoint log. Answers: “what ran on this host and what did it touch?”
- OS-specific security logs — Windows Event Log (key IDs: 4624 successful logon, 4625 failed logon, 4672 special privileges assigned, 4648 explicit credential use, 4688 new process). Linux: /var/log/auth.log, /var/log/secure, auditd. Answers: “who authenticated, when, and from where?”
- IDS/IPS logs — alerts with signature/rule id, packet payload snippet, classification. Answers: “did we match a known attack pattern?”
- Network logs — NetFlow / IPFIX / sFlow (metadata), DNS queries (who resolved what), DHCP leases (who had which IP when), proxy logs. Answers: “which hosts talked to which external services, how much, when?”
- Metadata — file MAC times (modified, accessed, and changed or created, depending on the file system), email headers (received-from, authentication results), image EXIF (camera, GPS, time). Answers: “where did this artifact come from and when did it move?”
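The DHCP point above ("who had which IP when") is easy to get wrong when an address is reused. A minimal sketch of the lookup, assuming lease records have already been parsed into a hypothetical `Lease` structure with UTC start and end times (the field names and sample values are invented for illustration):

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Lease:
    # Hypothetical, simplified lease record; real DHCP logs vary by vendor.
    ip: str
    mac: str
    hostname: str
    start: datetime  # lease granted (UTC)
    end: datetime    # lease expired or released (UTC)


def host_for_ip(leases: list[Lease], ip: str, when: datetime) -> Optional[Lease]:
    """Return the lease that held `ip` at time `when`, or None."""
    for lease in leases:
        if lease.ip == ip and lease.start <= when < lease.end:
            return lease
    return None


def utc(y, mo, d, h, mi):
    """Small helper to build timezone-aware UTC datetimes."""
    return datetime(y, mo, d, h, mi, tzinfo=timezone.utc)


leases = [
    Lease("10.0.5.23", "aa:bb:cc:00:00:01", "laptop-a",
          utc(2024, 1, 10, 8, 0), utc(2024, 1, 10, 20, 0)),
    Lease("10.0.5.23", "aa:bb:cc:00:00:02", "laptop-b",
          utc(2024, 1, 10, 22, 0), utc(2024, 1, 11, 10, 0)),
]

hit = host_for_ip(leases, "10.0.5.23", utc(2024, 1, 11, 2, 14))
print(hit.hostname if hit else "no lease on record")  # prints laptop-b
```

The same IP maps to two different machines within a day, which is why a firewall or NetFlow hit on an internal address should be resolved against lease history before anyone is named.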
Data sources (beyond raw logs).
- Vulnerability scans — “what known weaknesses are present?” Useful to correlate a finding to a confirmed exploit.
- Automated reports — daily/weekly posture summaries; context for what “normal” looked like before the incident.
- Dashboards — the visualization layer on top of logs; quick anomaly detection by eye.
- Packet captures — full fidelity. Wireshark / tcpdump / Zeek. Expensive to store, essential for payload reconstruction and deep protocol analysis.
Windows Event IDs to know cold.
- 4624 — An account was successfully logged on. Includes logon type (2 interactive, 3 network, 10 remote interactive).
- 4625 — An account failed to log on. Useful for detecting brute-force and password-spraying attempts.
- 4672 — Special privileges assigned to new logon. Fires for admin-equivalent logons.
- 4648 — A logon was attempted using explicit credentials (RunAs patterns).
- 4688 — A new process has been created (the command line is included only if command-line auditing for process creation is enabled). Key for detecting LOLBin abuse.
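To show how 4625 events support brute-force detection, here is a rough sketch that flags accounts with many failed logons inside a short sliding window. It assumes the events have already been exported into dictionaries with hypothetical keys `event_id`, `account`, and `time`; the threshold of 5 failures in 5 minutes is an arbitrary illustration, not a recommended value.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def brute_force_suspects(events, threshold=5, window=timedelta(minutes=5)):
    """Return accounts with at least `threshold` 4625 events inside any
    time span of length `window`.

    `events` is an iterable of dicts with hypothetical keys
    'event_id' (int), 'account' (str), and 'time' (datetime).
    """
    failures = defaultdict(list)
    for ev in events:
        if ev["event_id"] == 4625:  # failed logon
            failures[ev["account"]].append(ev["time"])

    suspects = set()
    for account, times in failures.items():
        times.sort()
        start = 0
        for end, t in enumerate(times):
            # Advance the left edge until the span fits inside `window`.
            while t - times[start] > window:
                start += 1
            if end - start + 1 >= threshold:
                suspects.add(account)
                break
    return suspects
```

A successful 4624 shortly after a burst of 4625 events for the same account is worth checking next, since it may mean the guessing worked.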
Log integrity — centralization matters. An attacker with admin rights on a compromised host can clear or edit local logs. Centralized log shipping (to a SIEM or durable store) before an incident means you still have the record even if the host is wiped. Centralization is a control, not just a convenience.
| Investigation Question | Best Log / Data Source | Backup Source |
|---|---|---|
| Who authenticated at 02:14 and from where? | Windows Event 4624 / Linux auth.log / IdP logs | VPN / SSO logs |
| Was there a brute-force attempt on this account? | Windows 4625 / auth.log failed entries / IdP failed login events | WAF logs if web-facing |
| Which external IPs did the compromised host contact? | Firewall logs + NetFlow + DNS queries | Proxy logs |
| What exact payload was exfiltrated? | Packet capture (PCAP) | Proxy logs if HTTP(S)-decrypted |
| What process ran at the time of the alert? | Endpoint logs / Sysmon / EDR (process creation, 4688) | Parent-child process tree in EDR |
| Did a known-signature attack fire? | IDS/IPS alert log | EDR detection events |
| Was a file modified/accessed at a specific time? | Endpoint FIM + MAC times + Sysmon file events | Backup timestamps for corroboration |
| Where did this email really come from? | Email headers (metadata) | Secure email gateway transit logs |
| Which DNS queries preceded the breach? | DNS logs / DNS filter logs / passive DNS | NetFlow for destination inference |
| What volumes of traffic flowed between subnets? | NetFlow / IPFIX | Firewall counters |
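For the email-origin row in the table above, the relevant metadata can be read with Python's standard `email` package. The message below is fabricated for illustration (documentation-range IPs and example domains), and the helper names `received_path` and `auth_results` are hypothetical.

```python
from email import message_from_string

# Fabricated sample message; hosts, IPs, and IDs are illustrative only.
RAW = """\
Received: from mx.example.net (mx.example.net [203.0.113.7])
\tby mail.corp.example with ESMTP id abc123; Tue, 09 Jan 2024 02:14:07 +0000
Received: from sender-host (unknown [198.51.100.22])
\tby mx.example.net with SMTP id xyz789; Tue, 09 Jan 2024 02:14:01 +0000
Authentication-Results: mail.corp.example; spf=fail smtp.mailfrom=example.org; dkim=none
From: "Billing" <billing@example.org>
Subject: Invoice

body
"""


def received_path(raw_message: str) -> list[str]:
    """Return the Received headers ordered from earliest hop to latest."""
    msg = message_from_string(raw_message)
    hops = msg.get_all("Received") or []
    # Each relay prepends its own header, so reverse to read origin-first.
    # split()/join collapses the folded whitespace inside each header.
    return [" ".join(h.split()) for h in reversed(hops)]


def auth_results(raw_message: str) -> str:
    """Return the Authentication-Results header as one line ('' if absent)."""
    msg = message_from_string(raw_message)
    return " ".join((msg.get("Authentication-Results") or "").split())


for hop in received_path(RAW):
    print(hop)
print(auth_results(RAW))
```

Headers added by relays outside your control can be forged by the sender, so the hops written by your own mail infrastructure are the ones to trust first.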
| Windows Event ID | Meaning | Investigative Value |
|---|---|---|
| 4624 | Successful logon | Who authenticated, when, how (logon type) |
| 4625 | Failed logon | Brute-force detection, typo signal |
| 4672 | Special privileges assigned | Admin-equivalent logon occurred |
| 4648 | Logon with explicit credentials | RunAs / credential theft signal |
| 4688 | New process created (with cmdline) | LOLBin detection, lateral movement |
Question drives source. Auth questions → OS security log / IdP. External communication → firewall, DNS, NetFlow. Payload detail → PCAP. Host activity → endpoint/Sysmon. Attribution → metadata. Centralize early so compromised hosts cannot erase the evidence.
HR has referred a departing employee for investigation based on a tip: the person may have downloaded customer lists before giving notice. The IR team needs to reconstruct what happened over the last 30 days. Multiple data sources are available; the question is which to pull and in what order.
Suspected insider data exfil — picking the right logs
HR referral · 30-day lookback · customer list suspected
Start with the most specific log, then widen to corroborate. DLP and endpoint tell you what was copied; proxy and email tell you where it went; file-server audit tells you what was touched. Triangulate across sources and NTP-synchronized timestamps. Preserve everything under legal hold before normal rotation deletes it.
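Triangulating across sources usually means normalizing every timestamp to UTC and merging the records into one timeline. A minimal sketch, assuming each source has been reduced to (ISO 8601 timestamp with offset, description) pairs; the source names and sample events are hypothetical:

```python
from datetime import datetime, timezone


def to_utc(ts: str) -> datetime:
    """Parse an ISO 8601 timestamp that carries an offset; convert to UTC."""
    dt = datetime.fromisoformat(ts)
    if dt.tzinfo is None:
        # Refuse to guess the zone: an unlabeled local time can shift the
        # apparent order of events by hours.
        raise ValueError(f"timestamp lacks a UTC offset: {ts!r}")
    return dt.astimezone(timezone.utc)


def build_timeline(sources: dict[str, list[tuple[str, str]]]):
    """Merge per-source (timestamp, description) records into one list of
    (utc_time, source, description) tuples sorted by time."""
    merged = [
        (to_utc(ts), name, desc)
        for name, records in sources.items()
        for ts, desc in records
    ]
    merged.sort(key=lambda row: row[0])
    return merged


# Hypothetical sample records for the insider scenario.
sources = {
    "proxy": [("2024-01-10T02:05:30+00:00", "upload to file-sharing site")],
    "dlp": [("2024-01-10T02:03:00+00:00", "USB copy of customers.xlsx")],
    "file_server": [("2024-01-09T21:02:10-05:00", "read customers.xlsx")],
}

for when, source, desc in build_timeline(sources):
    print(when.isoformat(), source, desc)
```

The file-server record is logged in a local zone and looks like the previous day, yet it sorts first once converted, which is the kind of ordering mistake that consistent time sync and UTC normalization are meant to prevent.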
Investigations reward logging discipline done years earlier. The organization that lit up DLP, Sysmon, object-level file auditing, and proxy logging before any incident is the one that can answer HR’s question in hours. The one that did not will spend weeks guessing. Investing in logging is investing in future investigations.
On the exam: “what payload was exfiltrated?” → packet capture. “who authenticated?” → OS security log. “which domains did the host resolve?” → DNS logs.
An investigator needs to reconstruct the exact HTTP request body of a suspected data exfiltration from a compromised internal server to an external IP. NetFlow and PCAP are both available; PCAP captures cover the relevant time window for critical links. Which is the right primary source for THIS question?
Option A: NetFlow (faster to query, lower storage)
Shows src/dst/port/bytes/flags over the time range. Efficient for volume questions.
Option B: Packet capture (full content for payload reconstruction)
Reconstructs the exact HTTP request body, headers, and sequence of bytes sent.
Answer: Option B fits better; payload reconstruction requires PCAP.
Why Option B: NetFlow contains metadata only; it tells you that a connection happened, not what was said. To see the actual HTTP request body (or any application-layer payload), you need the packet capture. For this specific question (what was exfiltrated), PCAP is the right primary source, provided the traffic is unencrypted or can be decrypted.
Option A’s kernel of truth: For volume or pattern questions (“how much, how often, to which destinations”), NetFlow is the right first stop. Payload questions require PCAP.
On the exam: “reconstruct payload” / “exact content” / “exfil details” → PCAP. “volume” / “who talked to whom” → NetFlow.
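The difference is visible in the shape of the data. The sketch below uses a hypothetical, simplified flow record (real NetFlow/IPFIX exports carry more fields) to answer a volume question. Note that the record has no payload field at all, so no query over it can recover a request body.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowRecord:
    # Hypothetical, simplified flow record: metadata only, no payload.
    src: str
    dst: str
    dst_port: int
    bytes_sent: int


def bytes_by_destination(flows, src):
    """Total bytes sent from `src` to each destination, largest first."""
    totals = Counter()
    for f in flows:
        if f.src == src:
            totals[f.dst] += f.bytes_sent
    return totals.most_common()


# Fabricated sample flows using documentation-range external addresses.
flows = [
    FlowRecord("10.0.5.23", "203.0.113.50", 443, 1000),
    FlowRecord("10.0.5.23", "203.0.113.50", 443, 4000),
    FlowRecord("10.0.5.23", "198.51.100.9", 53, 200),
    FlowRecord("10.0.7.7", "203.0.113.50", 443, 9999),
]

print(bytes_by_destination(flows, "10.0.5.23"))
# prints [('203.0.113.50', 5000), ('198.51.100.9', 200)]
```

A result like this tells you where to point the packet capture query: the destination and time window come from flow data, the content comes from PCAP.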
4.9 is question-to-source matching. Build a reflex map: “who authenticated” → OS security log / IdP; “which external IPs” → firewall + NetFlow + DNS; “what payload” → packet capture; “what ran on host” → endpoint / Sysmon / EDR / 4688; “where did email come from” → email headers. Centralize logs so admin-level compromise cannot erase them.
- A Firewall logs
- B Windows Event Log 4624 (successful logon) on the target server or the domain controller
- C EDR process-creation events
- D NetFlow records
Correct: B. 4624 is the canonical Windows “who authenticated” event and includes the logon type field. Firewall logs show traffic, EDR shows processes, NetFlow shows volumes.
Source: CompTIA SY0-701 Objectives v5.0 — 4.9
- A NetFlow metadata
- B Full packet capture (or decrypted proxy logs) for the relevant link and time window
- C DHCP lease file
- D Windows 4625 events
Correct: B. Payload reconstruction requires full content — PCAP or a decrypted proxy log. NetFlow has no payload; DHCP and 4625 are unrelated to HTTP payload.
- A Trust the local cleared logs
- B Centralized log forwarding (to a SIEM or write-once store) configured before the incident
- C Run chkdsk
- D Reimage and hope
Correct: B. Centralized forwarding means the off-host copy survives local tampering. Forwarding has to be configured before the incident; it cannot recover events that were cleared before they were shipped.