
Lab 12 — Malware Artifacts in IR: CAPA and YARA


Setup

This is a reference lab — it ships a one-command environment in the companion plaintext-labs repo:

git clone https://github.com/plaintext-security/plaintext-labs
cd plaintext-labs/forensics/12-malware-artifacts-ir
make up         # builds container with capa and yara; ships benign PE samples
make fetch-data # OPTIONAL: pull a REAL Latrodectus sample from MalwareBazaar (needs Auth-Key)
make demo       # runs capa over samples/ and fires yara rules/ against them
make shell      # interactive shell for investigation
make down       # stop when done

The lab ships data/samples/ — three benign PE binaries compiled from simple C programs. No malware is committed; the lab practices the tooling and output interpretation against safe samples. The YARA rules targeting the loader family are in rules/latrodectus_loader.yar.

Optional real target (advanced). make fetch-data pulls a genuine Latrodectus loader sample from MalwareBazaar (abuse.ch) — the same loader family from the DFIR Report "Lunar Spider" case this track is anchored to — so you can tune the YARA rule against the real thing. The sample is live malware: make fetch-data downloads the password-protected zip (password infected) but does not unzip it. Only detonate or even unpack it inside an isolated analysis VM you own — never on your host. A free abuse.ch Auth-Key is required (see PROVENANCE.md).

Do not commit real malware to this repository, and do not run YARA/CAPA against binaries on systems you don't own or aren't authorised to analyse. The committed samples directory contains only benign binaries. Any real malware (the MalwareBazaar sample) is handled exclusively in an isolated analysis environment.

Scenario

The IR team has recovered the loader binary dropped on BEACHHEAD-WS01 — in the anchored DFIR Report "Lunar Spider" case, this is the Latrodectus loader delivered via Form_W-9*.jsupdate.msi. The binary has been safely uploaded to an isolated analysis environment (not this lab). Your task is to practice the triage workflow: understand CAPA's capability output format, write a YARA rule that matches the loader based on its known characteristics (C2 string, network imports), and confirm the benign samples in the lab do not match your rule (false-positive check). For the advanced path, make fetch-data pulls a real Latrodectus sample so you can validate the rule against the genuine family.

Do not test YARA rules or CAPA against binaries on systems you don't own or aren't authorised to analyse. In production, all sample analysis is conducted in an isolated environment.

Do

Part 1 — CAPA capability profiling

  1. [ ] Run make demo and read the CAPA output for each sample. For each binary, note:
     • The top-level capabilities listed (e.g., "link function at runtime," "create process").
     • The ATT&CK technique mapped to each capability (shown in the ATT&CK table of the output).
     • How many capabilities the most "feature-rich" benign binary has.

  2. [ ] Run CAPA manually on the largest sample inside make shell:

    capa /data/samples/<largest_binary>
    
    Find a capability that maps to ATT&CK T1059 or T1547. Does the sample actually use that capability maliciously? What additional evidence would you need to conclude "yes"?

  3. [ ] Interpret a hypothetical CAPA output. The data/hypothetical_capa_output.txt file shows what CAPA would output for a loader with network C2, persistence via scheduled task, and process injection capability. Read it and write three sentences: what this binary can do, what the IR team should check next to confirm it did do those things, and which other modules' artifacts would provide that confirmation.
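
For the capability counts in Part 1, one option is to have CAPA emit JSON (its -j flag) and summarise it with a few lines of Python. The sketch below is a rough illustration, not the lab's own tooling: it assumes the JSON result has a top-level "rules" mapping whose entries carry "meta.attack" (a list with "id" and "technique" keys) and a "meta.lib" flag for helper rules. Check that layout against the CAPA version in the container. The function name summarize_capa and the sample document are invented for illustration.

```python
import json
from collections import Counter


def summarize_capa(doc):
    """Return (capability_count, Counter of ATT&CK labels) for a parsed capa JSON result.

    Assumed layout: doc["rules"][rule_name]["meta"] holds "lib" and "attack".
    """
    techniques = Counter()
    capability_count = 0
    for rule in doc.get("rules", {}).values():
        meta = rule.get("meta", {})
        if meta.get("lib", False):
            continue  # assumed flag marking helper rules that are not capabilities
        capability_count += 1
        for entry in meta.get("attack", []):
            label = f'{entry.get("id", "?")} {entry.get("technique", "?")}'
            techniques[label] += 1
    return capability_count, techniques


# Invented example document (NOT real capa output), shaped like the assumed layout.
sample_doc = {
    "rules": {
        "accept command line arguments": {
            "meta": {
                "lib": False,
                "attack": [
                    {"id": "T1059", "technique": "Command and Scripting Interpreter"}
                ],
            }
        },
        "create process": {"meta": {"lib": False, "attack": []}},
        "contain loop": {"meta": {"lib": True, "attack": []}},
    }
}

count, techniques = summarize_capa(sample_doc)
print(count, dict(techniques))
```

With a real run, the input would come from something like capa -j /data/samples/<binary> > out.json followed by summarize_capa(json.load(open("out.json"))). A capability count only says what the code could do. It does not show that the binary did it.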

Part 2 — YARA rule writing

  1. [ ] Examine rules/latrodectus_loader.yar — the pre-written YARA rules for the Latrodectus loader. Read every condition. Why is each rule scoped to PE files? What would happen if you removed the uint16(0) == 0x5A4D check?

  2. [ ] Run the rules against the benign samples:

    yara -r /rules/latrodectus_loader.yar /data/samples/
    
    Do any benign samples match? If yes, identify which condition caused the match and explain how you'd tighten the rule.

  3. [ ] Write your own YARA rule. Open a new rule file (rules/custom.yar) and write a rule that matches any PE file containing both the string workspacin.cloud (the real Latrodectus C2 from the case) and an import of WSAConnect or connect. Test it against the benign samples to confirm no false positives.

  4. [ ] Add a meta section to your rule with: author, description, date, reference (cite a MITRE ATT&CK technique), and hash (the SHA-256 of the sample). If you ran make fetch-data, use the real Latrodectus sample's SHA-256; otherwise use the placeholder in data/hypothetical_capa_output.txt. This is production rule hygiene.

  5. [ ] (Advanced) Validate against the real sample. If you fetched the Latrodectus sample with make fetch-data, unzip it (password infected) inside an isolated analysis VM only, and run rules/latrodectus_loader.yar and your custom.yar against it. Does the curated rule fire? Record the sample's SHA-256 in your meta and note which strings matched a real binary versus a benign one.
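
To make the uint16(0) == 0x5A4D question in step 1 concrete, the logic of the Part 2 rule can be approximated in plain Python. YARA's uint16 reads a little-endian 16-bit integer, so the check is equivalent to the file starting with the bytes "MZ". This is only a sketch under stated assumptions: the substring search for import names is a crude stand-in for a real import-table check (YARA's pe module parses the table properly), and the helper names are hypothetical.

```python
import struct

C2_STRING = b"workspacin.cloud"
IMPORT_NAMES = (b"WSAConnect", b"connect")


def has_mz_header(data: bytes) -> bool:
    """Mirror `uint16(0) == 0x5A4D`: little-endian 16-bit value at offset 0."""
    if len(data) < 2:
        return False
    return struct.unpack_from("<H", data, 0)[0] == 0x5A4D


def looks_like_match(data: bytes) -> bool:
    """Approximate: PE header AND C2 string AND at least one import name present."""
    return (
        has_mz_header(data)
        and C2_STRING in data
        and any(name in data for name in IMPORT_NAMES)
    )


# Synthetic byte strings for illustration only; neither is a real binary.
pe_like = b"MZ" + b"\x00" * 16 + b"WSAConnect\x00workspacin.cloud\x00"
ioc_note = b"IOC list: workspacin.cloud, calls WSAConnect"

print(looks_like_match(pe_like))   # header, string, and import name all present
print(looks_like_match(ioc_note))  # same strings, but no MZ header
```

The second input shows what the header check buys you. Without it, a text file that merely mentions the indicators (an IOC list, a report, this lab page) would satisfy the string conditions.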

Success criteria — you're done when

  • [ ] You have documented the capability profile of each benign sample (number of capabilities and the most security-relevant ATT&CK technique each maps to; ATT&CK itself assigns no severity score, so justify your choice).
  • [ ] You have written three sentences interpreting the hypothetical loader's CAPA output.
  • [ ] rules/latrodectus_loader.yar runs cleanly against the benign sample directory with zero false positives.
  • [ ] rules/custom.yar — your rule — also produces zero false positives on the benign set.
  • [ ] Both rules have complete meta sections.

Deliverables

Commit to your portfolio repo:

  • rules/custom.yar: your YARA rule with complete meta section.
  • capability-analysis.md: the three-sentence loader interpretation plus the false-positive analysis.

Do not commit PE binaries — reference sample filenames in your analysis.
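
To reference samples by filename and hash instead of committing them, a small helper along these lines could generate the list for capability-analysis.md. It is a minimal sketch using only the standard library, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Stream a file through SHA-256 so large samples are not loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def sample_table(directory):
    """Return sorted (filename, sha256) pairs for the regular files in a directory."""
    rows = []
    for path in sorted(Path(directory).iterdir()):
        if path.is_file():
            rows.append((path.name, sha256_of(path)))
    return rows
```

For example, calling sample_table("/data/samples") inside the lab shell and printing each pair gives lines you can paste into the analysis. The same digest can go in the rule's meta hash field.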

Automate & own it

Required. Write a Python script triage_samples.py that:

  1. Accepts a directory of binary files.
  2. Runs yara (via subprocess) against each file using all .yar files in a specified rules directory.
  3. For each match, prints: filename, matched rule name, matched strings.
  4. Outputs a summary: total files scanned, total matches, list of matched files.

Have a model draft the script; test it against the lab's benign sample set and confirm the output is correct before using it on anything else. Commit triage_samples.py.
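
As a point of comparison for a model's draft, here is one possible shape for the script. It is a sketch rather than a reference solution. It assumes the yara CLI is on PATH and that yara -s prints one "rule_name target_path" line per match, followed by string lines that start with a hex offset such as 0x40:$c2: .... Confirm that format against the yara version in the container before relying on the parser.

```python
import subprocess
import sys
from pathlib import Path


def parse_yara_output(text):
    """Parse assumed `yara -s` output into [(rule, target, [string_lines]), ...]."""
    matches = []
    for line in text.splitlines():
        line = line.rstrip()
        if not line:
            continue
        if line.startswith("0x") and matches:
            matches[-1][2].append(line)  # matched-string line for the last rule
        else:
            rule, _, target = line.partition(" ")
            matches.append((rule, target, []))
    return matches


def scan_file(rule_file, target):
    """Run one rules file against one target; return parsed matches."""
    result = subprocess.run(
        ["yara", "-s", str(rule_file), str(target)],
        capture_output=True, text=True, errors="replace", check=False,
    )
    if result.returncode != 0:
        print(f"yara failed on {target}: {result.stderr.strip()}", file=sys.stderr)
        return []
    return parse_yara_output(result.stdout)


def triage(samples_dir, rules_dir):
    """Scan every file with every .yar file, print matches and a summary."""
    rule_files = sorted(Path(rules_dir).glob("*.yar"))
    targets = sorted(p for p in Path(samples_dir).iterdir() if p.is_file())
    matched_files = set()
    total_matches = 0
    for target in targets:
        for rule_file in rule_files:
            for rule, _, strings in scan_file(rule_file, target):
                total_matches += 1
                matched_files.add(target.name)
                print(f"{target.name}: {rule}")
                for string_line in strings:
                    print(f"    {string_line}")
    print(f"files scanned: {len(targets)}")
    print(f"total matches: {total_matches}")
    print(f"matched files: {sorted(matched_files)}")
    return len(targets), total_matches, sorted(matched_files)


if __name__ == "__main__" and len(sys.argv) == 3:
    triage(sys.argv[1], sys.argv[2])
```

Invoked as python triage_samples.py /data/samples /rules, this runs one yara process per rules file per sample. That is slow on large sets but keeps the parsing simple. Keeping parse_yara_output separate from the subprocess call means the parser can be unit-tested without yara installed.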

AI acceleration

Describe the Latrodectus loader's characteristics to a model ("PE file, imports WSAConnect and CreateRemoteThread, contains string 'workspacin.cloud', uses UPX packing with section name .upx0") and ask it to draft a YARA rule. Compare the model's draft to rules/latrodectus_loader.yar in the lab. Where does the model's rule differ? Is it more or less specific? Run both against the benign samples — does the model's version produce false positives that the curated rule avoids? The comparison teaches rule quality faster than reading about it.
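
The comparison of the two rules comes down to set arithmetic over the files each rule matched. A minimal sketch, with a hypothetical helper name and invented filenames, assuming you have already collected the matched filenames for each rule:

```python
def compare_hits(curated_hits, model_hits, benign_files):
    """Split benign-set matches into those unique to each rule and those shared."""
    curated = set(curated_hits) & set(benign_files)
    model = set(model_hits) & set(benign_files)
    return {
        "model_only_false_positives": sorted(model - curated),
        "curated_only_false_positives": sorted(curated - model),
        "shared_false_positives": sorted(curated & model),
    }


# Invented filenames for illustration; they are not the lab's actual samples.
benign = ["hello.exe", "netclient.exe", "calc.exe"]
report = compare_hits(curated_hits=[], model_hits=["netclient.exe"], benign_files=benign)
print(report)
```

A non-empty model_only_false_positives list points at the files to inspect when working out which condition in the model's draft is too loose.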

Connects forward

The YARA rule you write becomes a retroactive hunt artifact: in a real engagement, it goes to the threat intel team and to the EDR for a fleet-wide search. Module 13 (IR Process) documents this handoff as part of the NIST "Containment, Eradication, and Recovery" phase. Track 04 (Malware Analysis) covers the deep reverse-engineering that follows triage.

Marketable proof

"I triage suspicious binaries using CAPA capability profiling, write YARA rules from sample characteristics, validate them against benign baselines, and hand off operationalisable IOCs to threat intel — without running the malware."

Stretch

  • Add a CAPA rule to rules/ (in CAPA's YAML format) that matches a binary which resolves a network function at runtime (e.g., via GetProcAddress). Test it against the benign samples. Observe how the CAPA rule grammar differs from YARA's.
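  • Use YARA's pe module to write a rule that matches any PE binary whose import hash (imphash) matches a specific value. The imphash can tie together samples built from the same code base and toolchain across campaigns. Document why imphash-based rules degrade over time (hint: the imphash is computed from the imported libraries and functions in import-table order, so a different compiler or linker, an added or removed API call, or packing changes it even when the core logic is unchanged).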
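
To see why an imphash is fragile, it helps to look at how it is built. The sketch below follows the commonly described construction: lowercase "library.function" pairs, with the dll/ocx/sys extension stripped, joined by commas in import-table order, then hashed with MD5. It takes an already-extracted import list and skips ordinal imports, so treat it as an illustration of the idea, not a drop-in replacement for a PE parser. The function name is hypothetical.

```python
import hashlib

STRIPPED_EXTENSIONS = ("dll", "ocx", "sys")


def imphash_from_imports(imports):
    """Compute an imphash-style MD5 from ordered (library, function) name pairs."""
    parts = []
    for library, function in imports:
        library = library.lower()
        stem, dot, ext = library.rpartition(".")
        if dot and ext in STRIPPED_EXTENSIONS:
            library = stem  # "ws2_32.dll" -> "ws2_32"
        parts.append(f"{library}.{function.lower()}")
    return hashlib.md5(",".join(parts).encode()).hexdigest()


# Illustrative import lists: same functions, different import-table order.
build_a = [("WS2_32.dll", "WSAConnect"), ("KERNEL32.dll", "CreateRemoteThread")]
build_b = [("KERNEL32.dll", "CreateRemoteThread"), ("WS2_32.dll", "WSAConnect")]

print(imphash_from_imports(build_a) == imphash_from_imports(list(build_a)))  # same input
print(imphash_from_imports(build_a) == imphash_from_imports(build_b))        # reordered
```

Because order and membership both feed the hash, anything that reshuffles or alters the import table yields a new value, even when the program's behaviour is the same.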
