Module 14 — Alert Triage & Incident Response

Type 6 · Reconstruct: triage a real alert through a structured process and run a lab incident end to end (NIST lifecycle), driving the case through a minimal triage harness; you commit a documented verdict that reconstructs what happened and why. (Secondary: Decision/ADR; the verdict memo is the defend-your-call discipline.)

Last reviewed: 2026-06

Defensive Operations: an alert is a question; triage and IR are how you answer it without panic.

Difficulty: Intermediate  ·  Estimated time: ~5–7 hrs (study + lab)  ·  Prerequisites: Foundations

In 60 seconds

An alert is a question, not a verdict: "is this real, and how bad?" Triage is the fast filter; incident response is the disciplined sequence you run when triage says "real," so you don't improvise under pressure. The backbone is the NIST lifecycle (prep → detection/analysis → containment/eradication/recovery → post-incident). The order isn't bureaucracy — it makes you contain before you eradicate and capture the lesson after. The most undervalued phase is the post-incident review, where an incident becomes a new detection.

Why this matters

Detections fire; now what? In the 2013 Target breach, the FireEye malware-detection system did its job: it fired urgent alerts as the attackers installed exfiltration malware. Target's security team neither acted on the alarms nor let the tool auto-delete the malware, and 40 million payment cards and 70 million customer records walked out the door anyway (US Senate Commerce "Kill Chain" staff report, 2014). A detection is worthless if no one triages it. A SOC lives or dies on a repeatable process: triage the alert (real or noise?), and when it's real, run a disciplined incident response (contain, eradicate, recover, learn) without missing steps under pressure. The NIST lifecycle is the backbone. The lab runs an incident end to end through a small Python triage harness, and TheHive is the free, open-source case-management platform you'd run the same workflow in for real (the Learn path walks through it).

Objective

Triage a real alert through a structured process, and run a lab incident end to end (NIST lifecycle) on a triage harness — the same workflow a case-management platform externalises — to a documented verdict.

The core idea

An alert is a question, not a verdict: "is this real, and if so, how bad?" Triage is the fast filter — true/false positive, severity, scope — and incident response is what you run when triage says "real": a disciplined sequence so you don't improvise under pressure and skip a step that matters.
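The three triage questions (real or not, how bad, how wide) can be sketched as a small function. This is an illustration only, not the lab harness: the names `triage`, `TriageResult`, `Verdict`, and `Severity` are hypothetical, and the severity rule is an assumed placeholder that a real SOC would replace with its own criteria.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Verdict(Enum):
    FALSE_POSITIVE = "false_positive"
    TRUE_POSITIVE = "true_positive"


class Severity(Enum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass(frozen=True)
class TriageResult:
    verdict: Verdict
    severity: Optional[Severity]      # None when the alert is a false positive
    affected_hosts: tuple[str, ...]   # scope: which hosts are involved
    escalate: bool                    # True means "open an incident"


def triage(is_malicious: bool, affected_hosts: list[str],
           touches_sensitive_data: bool) -> TriageResult:
    """Answer the triage questions in order: real? how bad? how wide?"""
    if not is_malicious:
        return TriageResult(Verdict.FALSE_POSITIVE, None, (), escalate=False)
    # Assumed severity rule (hypothetical): sensitive data outranks spread.
    if touches_sensitive_data:
        severity = Severity.HIGH
    elif len(affected_hosts) > 1:
        severity = Severity.MEDIUM
    else:
        severity = Severity.LOW
    return TriageResult(Verdict.TRUE_POSITIVE, severity,
                        tuple(affected_hosts), escalate=True)


result = triage(True, ["pos-01", "pos-02"], touches_sensitive_data=False)
print(result.verdict.value, result.severity.name, result.escalate)
```

The point of returning one structured result is that the verdict, severity, and scope are recorded together, so the decision to escalate can be reviewed later rather than remembered.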

The mental model

The backbone is the NIST lifecycle (prep → detection/analysis → containment/eradication/recovery → post-incident), the same shape as SANS's PICERL mnemonic. The order isn't bureaucracy — it exists so that mid-crisis you contain before you eradicate, and you actually capture the lesson after. A case-management platform like TheHive externalises the discipline — observables, timeline, and verdict in one place, so nothing critical lives only in one analyst's head.

flowchart LR
    P["Preparation"] --> D["Detection<br/>& analysis"]
    D --> C["Containment"]
    C --> E["Eradication"]
    E --> R["Recovery"]
    R --> PI["Post-incident<br/>review"]
    PI -.->|feeds new detections| P

The order is load-bearing: contain before you eradicate (or the attacker walks back in), and the loop closes at the post-incident review — the phase most often skipped, and the only one that turns an incident into a new detection.
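One way to make the ordering concrete is a case record that refuses to skip phases. The sketch below is a minimal assumption-laden model, not TheHive's data model or the lab harness: `Case`, `Phase`, `log`, and `advance` are hypothetical names, and real incidents often loop back to earlier phases, which this strict version deliberately does not allow.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Phase(IntEnum):
    PREPARATION = 0
    DETECTION_ANALYSIS = 1
    CONTAINMENT = 2
    ERADICATION = 3
    RECOVERY = 4
    POST_INCIDENT_REVIEW = 5


@dataclass
class Case:
    title: str
    phase: Phase = Phase.DETECTION_ANALYSIS
    observables: list[str] = field(default_factory=list)           # IPs, hashes, accounts
    timeline: list[tuple[Phase, str]] = field(default_factory=list)

    def log(self, note: str) -> None:
        """Record a note against the phase the case is currently in."""
        self.timeline.append((self.phase, note))

    def advance(self, target: Phase) -> None:
        """Move to the next phase only; skipping ahead raises ValueError."""
        if target != self.phase + 1:
            raise ValueError(
                f"cannot go from {self.phase.name} to {target.name}: "
                "phases must be completed in order"
            )
        self.phase = target
        self.log(f"entered {target.name}")


case = Case("suspicious outbound transfer")
case.observables.append("203.0.113.7")   # documentation-range IP, illustrative only
try:
    case.advance(Phase.ERADICATION)      # skipping containment is refused
except ValueError as err:
    print(err)
case.advance(Phase.CONTAINMENT)
case.advance(Phase.ERADICATION)
```

Keeping the observables and timeline on the same object as the phase is the "nothing lives only in one analyst's head" idea in miniature.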

The gotcha

The classic failures here are emotional, not technical: eradicating before you understand scope (so the attacker simply walks back in), or containing so abruptly that you destroy the evidence you needed to answer "how did they get in?" The process exists precisely so it holds when you're stressed and the clock is running.
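Both failure modes above can be expressed as pre-flight checks that run before an analyst acts. This is a hedged sketch: `Readiness` and `blockers` are hypothetical names, and the two rules are assumptions drawn from this section, not a complete checklist. In a real incident, urgent containment may justifiably proceed with only partial evidence capture.

```python
from dataclasses import dataclass


@dataclass
class Readiness:
    evidence_captured: bool = False   # e.g. volatile data and relevant logs preserved
    scope_confirmed: bool = False     # affected hosts and accounts identified


def blockers(action: str, state: Readiness) -> list[str]:
    """Return reasons the action should wait; an empty list means proceed."""
    reasons = []
    if action in ("contain", "eradicate") and not state.evidence_captured:
        reasons.append("capture evidence first, or 'how did they get in?' "
                       "may become unanswerable")
    if action == "eradicate" and not state.scope_confirmed:
        reasons.append("confirm scope first, or the attacker can return "
                       "through a foothold you missed")
    return reasons


for reason in blockers("eradicate", Readiness(evidence_captured=True)):
    print("WAIT:", reason)
```

Returning reasons rather than a bare True/False keeps the check useful under stress: the analyst sees what to do next, not just that they were stopped.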

Go deeper: the post-incident review is where a SOC improves

The most undervalued phase is the post-incident review — it's where an incident turns into a new detection, a hardening change, or a hunt, which is the only way a SOC gets better instead of just busier. Skip it and you re-fight the same incident; the 2013 Target breach (detection fired and was ignored) is the canonical reminder that the missing piece is process, not telemetry.
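A review "closes the loop" only if it produces follow-up work that someone owns. The sketch below illustrates that idea under stated assumptions: `FollowUp`, `ActionItem`, and `can_close` are hypothetical names, and the rule "at least one owned action item" is an assumed minimum, not a standard.

```python
from dataclasses import dataclass
from enum import Enum


class FollowUp(Enum):
    DETECTION = "new detection"
    HARDENING = "hardening change"
    HUNT = "hunt"


@dataclass(frozen=True)
class ActionItem:
    kind: FollowUp
    description: str
    owner: str


def can_close(action_items: list[ActionItem]) -> bool:
    """A review with no owned follow-up has not turned the incident into anything."""
    return any(item.owner.strip() for item in action_items)


items = [ActionItem(FollowUp.DETECTION,
                    "alert when a malware alarm stays unacknowledged",
                    owner="detection-engineering")]
print(can_close(items))
```

Requiring an owner is the design choice that matters here: an action item nobody owns is the review-phase equivalent of an alert nobody triages.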

AI caveat

A model drafts timelines and summaries from your notes fast (a genuine time-saver under pressure), but it will state a conclusion your evidence doesn't support, and in IR a wrong verdict has consequences. AI drafts the narrative; you verify every step against the evidence and own the call.
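"Verify every step against the evidence" can be partly mechanised: flag any drafted statement that cites no evidence, or cites evidence the case does not hold. This is a minimal sketch with an assumed data shape (each draft entry is a dict with `claim` and `evidence` keys) and a hypothetical function name, `unsupported_claims`. It only checks that a citation exists; whether the evidence actually supports the claim is still the analyst's call.

```python
def unsupported_claims(draft: list[dict], evidence_ids: set[str]) -> list[str]:
    """Return draft statements that cite nothing, or cite evidence not in the case."""
    flagged = []
    for entry in draft:
        cited = set(entry.get("evidence", []))
        # Flag when nothing is cited, or when any cited ID is unknown.
        if not cited or not cited <= evidence_ids:
            flagged.append(entry["claim"])
    return flagged


# Illustrative data only; the IDs and claims are made up for this example.
evidence_ids = {"E1", "E2"}
draft = [
    {"claim": "Malware was installed on a point-of-sale host", "evidence": ["E1"]},
    {"claim": "Data was exfiltrated to an external server", "evidence": []},
    {"claim": "Attacker used stolen vendor credentials", "evidence": ["E9"]},
]
for claim in unsupported_claims(draft, evidence_ids):
    print("UNVERIFIED:", claim)
```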

Learn (~4 hrs)

The process & platform

  • How to: TheHive, a free, open-source incident response platform (video): case management for IR.
  • TheHive documentation: alerts, cases, observables, and the analyst workflow.

The method

  • NIST SP 800-61r2, Computer Security Incident Handling Guide: the IR lifecycle; read the phases (prep, detection/analysis, containment/eradication/recovery, post-incident).

The cautionary tale

  • A "Kill Chain" Analysis of the 2013 Target Data Breach (US Senate Commerce staff report, PDF): ~15 pp; read the Executive Summary and Section A. Detection fired and was ignored, which makes it the canonical example of why triage discipline matters as much as detection coverage.

Key concepts

  • Triage: alert → true/false positive → severity
  • The NIST IR lifecycle (and SANS PICERL)
  • Cases, observables, and timelines
  • Containment vs eradication vs recovery
  • The post-incident review (where the learning is)
  • The Target 2013 lesson: an un-triaged alert is the same as no alert

AI acceleration

A model drafts incident timelines and summaries from your notes fast — a real time-saver under pressure. But it'll also state a conclusion your evidence doesn't support; in IR a wrong verdict has consequences. AI drafts the narrative; you verify every step against the evidence and own the call.

Check yourself

  • Why does the NIST order put containment before eradication — what goes wrong if you reverse them?
  • The Target detection system fired correctly and the breach still happened — what does that tell you about where SOCs actually fail?
  • Which IR phase is most often skipped, and why is skipping it the reason a SOC stays busy without getting better?
