Lab 09 — Unpacking & Obfuscation

Setup

git clone https://github.com/plaintext-security/plaintext-labs
cd plaintext-labs/malware/09-unpacking-obfuscation
make up
make fetch-sample      # pulls a real, genuinely UPX-packed sample from MalwareBazaar into the isolated container
make demo

⚠ This lab unpacks a live malware sample. Handle it accordingly, and never execute it.

  • Static only. This module unpacks and reads; it never runs the sample. The tools are entropy, section names, strings, upx -d, and YARA. Unpacking is a static transform; do not detonate the binary or the payload it carries.
  • Isolation. All work stays inside the isolated container; never copy the sample to your host.
  • Hygiene. The sample is fetched at lab time (password-protected zip, password infected) and is never committed; .gitignore covers samples/. make fetch-sample needs a free abuse.ch Auth-Key (set MB_AUTH_KEY). The upx tag is the reliable way to guarantee a genuinely UPX-packed binary.
  • Offline fallback. No key, or MalwareBazaar unreachable? Skip make fetch-sample; make demo falls back to the bundled synthetic target (compiled from benign data/target.c, packed in-container with UPX) plus data/encoded_strings.py for the string-deobfuscation half.

Scenario

A triage queue drops a Windows executable that an EDR flagged as "high entropy, likely packed." Run strings on it and you get almost nothing: a short stub, the import table mostly gone, and a tell-tale UPX0 / UPX1 section pair. That is the point of packing: the real code is compressed and only decompresses in memory at runtime, so static tooling sees a near-empty shell. This is MITRE ATT&CK T1027.002 (Software Packing), and UPX is the canonical packer. Because it ships a standard unpack stub, upx -d reverses it back to the original payload without ever running it.

Your job is to defeat two layers of obfuscation, statically. First, confirm the binary is packed (entropy above 7.0, the UPX0/UPX1 section names, the UPX banner), unpack it with upx -d, and re-run strings and imports to reveal everything packing hid. Second, decode a loader script's XOR-obfuscated strings to recover the IOCs hidden from naive string extraction. The real packed sample (make fetch-sample) is the primary artifact; the bundled synthetic target and encoded_strings.py are the offline fallback, which the demo and the steps below exercise identically.

Throughout, "the sample" means the real UPX-packed PE that make fetch-sample drops in samples/ (the target prints the path). In offline mode it means the bundled synthetic target, compiled and packed in-container by make demo.
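The entropy signal used in the scenario can be sketched in a few lines of Python. This is a minimal illustration of byte-level Shannon entropy on the 0 to 8 bits-per-byte scale, not the lab's entropy_check.py; the function name and the three in-memory inputs are assumptions made up for the example.

```python
import math
import os
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of data in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    # Sum p * log2(1/p) over every byte value that occurs.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


# Illustrative in-memory inputs only; none of this is lab data.
repetitive = b"A" * 4096            # a single symbol carries no information
uniform = bytes(range(256)) * 16    # every byte value equally often
random_like = os.urandom(65536)     # stands in for compressed or encrypted data

for label, blob in [
    ("repetitive", repetitive),
    ("uniform", uniform),
    ("random-like", random_like),
]:
    print(f"{label:12s} {shannon_entropy(blob):.3f} bits/byte")
```

Compressed data looks close to random, which is why a packed section scores near the top of the scale while ordinary code and text score noticeably lower.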

Do

The steps are written against the sample: the real UPX-packed PE in /lab/samples/ after make fetch-sample (use the path it printed). If you skipped the fetch, follow the "Offline fallback" notes where a step has one: you pack a synthetic target yourself, which exercises the identical mechanism.

  1. [ ] Confirm the sample is packed: entropy plus section names. Run python3 /lab/data/entropy_check.py /lab/samples/<sha256> and record the entropy. A packed binary runs hot: sections (or the whole file) score above 7.0. Then look for the UPX markers (strings /lab/samples/<sha256> | grep -iE 'UPX|This file is packed', or objdump -h for the PE section table) and find the UPX0 / UPX1 pair and the UPX banner. Those two signals together, high entropy and UPX section names, are your "this is UPX-packed" verdict.

Hint: entropy below ~6.5 is typical for unpacked code; a UPX-compressed section sits above 7.0. This is T1027.002.

Offline fallback: start from the clean synthetic binary. sha256sum /lab/data/target and python3 /lab/data/entropy_check.py /lab/data/target (records a low-entropy baseline), then cp /lab/data/target /tmp/target_packed && upx --best /tmp/target_packed to create your own packed copy. Re-run entropy_check.py on /tmp/target_packed and watch entropy cross 7.0 — that delta is the teachable signal.

  2. [ ] Snapshot what packing hid: strings and imports before unpacking. Run strings and (for the PE) list imports (objdump -p, or pefile if you brought it) on the still-packed sample. You should see almost nothing of substance: a short stub and a near-empty import table. Save this "before" view; the contrast in step 4 is the lesson.

  3. [ ] Unpack with the UPX stub. Run upx -d /lab/samples/<sha256> -o /tmp/unpacked. The -o flag writes the result to a new file so you work on a copy; without it, upx -d rewrites the input in place. UPX decompresses cleanly because it ships a standard unpack stub, so no execution is required. Re-run entropy_check.py on /tmp/unpacked: entropy should drop back toward normal, confirming the compressed layer is gone.

Hint: if upx -d errors, the sample may be a UPX-lookalike or have had its header tampered with — note it and try another sample from the tag.

Offline fallback: upx -d /tmp/target_packed -o /tmp/target_unpacked, then sha256sum /tmp/target_unpacked and compare against the clean-binary hash from step 1 — they must match, proving upx -d is loss-free. (For the real sample you have no clean reference hash, so the drop in entropy and the restored strings/imports are your proof instead.)

  4. [ ] Re-analyse the unpacked payload: strings and imports after. Re-run strings and the import dump on /tmp/unpacked. Compare against your step-2 "before" snapshot: the import table is now populated and real strings (URLs, paths, error messages) appear. This is why you unpack: the indicators packing concealed are now visible for triage.

  5. [ ] Decode the obfuscated loader strings. Packing hides a binary's strings; in-script obfuscation hides them inside source. Examine data/encoded_strings.py, which holds an XOR-encoded byte array and a decode loop. Identify the key byte and the encoded blobs, then run python3 /lab/data/decode.py to recover the IOCs (a C2 URL, a registry key, a mutex). Record them.

  6. [ ] Map to ATT&CK. The packing maps to T1027.002 (Software Packing). The string encoding maps to T1027.013 (Encrypted/Encoded File). Record both IDs in your notes alongside the evidence (the entropy before/after, the UPX section names, the restored imports, and the recovered IOCs).

  7. [ ] Author a YARA rule on the UPX section names and prove it (the build half). Detecting the packer is only half the job; turn the indicator into a detection. Write upx-packed.yar that keys on the UPX strings you confirmed in step 1 (UPX0, UPX1, and the "$Info: This file is packed with the UPX" banner), requiring at least two of them in the condition so a stray substring can't trip it. Then prove the result from both sides. Run the rule against the packed sample (the real /lab/samples/<sha256>, or /tmp/target_packed in the fallback): it must match. Run it against the unpacked twin (/tmp/unpacked, or /tmp/target_unpacked): it must not match. Run it against /bin/ls (a benign, unpacked control): it must stay quiet too. The no-match on the unpacked twin is the whole point: the rule detects packing, not the payload, so unpacking the sample removes the match. If the rule fires on the unpacked binary, one of your strings also occurs outside the packed layout; narrow the rule to the section names only. Defeating the obfuscation and authoring the detection rule that proves you recognised the packer are equal halves of the job.

Hint: yara /path/to/rule /path/to/file. No pe module is needed; these are plain strings, so a string-count condition such as 2 of them is enough.
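The step that decodes the obfuscated loader strings relies on single-byte XOR, which is its own inverse: applying the same key twice returns the original bytes. The sketch below shows that round trip and a known-plaintext ("crib") way to recover the key. The key 0x5A, the placeholder indicators, and all function names are hypothetical; they are not the contents of data/encoded_strings.py or decode.py.

```python
from typing import Optional


def xor_bytes(blob: bytes, key: int) -> bytes:
    """XOR every byte of blob with a single-byte key."""
    return bytes(b ^ key for b in blob)


def xor_decode(blob: bytes, key: int) -> str:
    """Decode a single-byte-XOR blob to text; non-ASCII bytes are replaced."""
    return xor_bytes(blob, key).decode("ascii", errors="replace")


def find_key(blob: bytes, crib: bytes) -> Optional[int]:
    """Try all 256 keys and return the first whose output contains the crib."""
    for key in range(256):
        if crib in xor_bytes(blob, key):
            return key
    return None


# Hypothetical key and placeholder indicators, invented for this example.
KEY = 0x5A
plaintexts = [
    "http://c2.example.invalid/gate",
    r"Software\Example\Run",
    r"Global\ExampleMutex",
]
encoded = [xor_bytes(p.encode("ascii"), KEY) for p in plaintexts]

# Recover the key from the blob expected to hold a URL, then decode all blobs.
recovered = find_key(encoded[0], b"http://")
print(f"recovered key: {recovered:#04x}")
for blob in encoded:
    print(blob.hex(), "->", xor_decode(blob, recovered))
```

The crib search works here because a URL prefix is a safe guess for a C2 string. When no plaintext can be guessed, reading the key straight out of the script's decode loop, as the step asks, is the more reliable route.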

Success criteria — you're done when

  • [ ] The packed sample is confirmed packed: entropy > 7.0 and the UPX0/UPX1 section names (or UPX banner) are present.
  • [ ] upx -d succeeds and entropy drops back toward normal on the unpacked output (offline fallback: pre-pack and post-unpack SHA-256 hashes match).
  • [ ] The before/after strings (and import) comparison shows the indicators packing had hidden.
  • [ ] decode.py output shows the plaintext IOC strings.
  • [ ] Your notes contain both ATT&CK technique IDs with evidence.
  • [ ] upx-packed.yar matches the packed sample and does not match the unpacked twin (or /bin/ls) — the build half, proven two-sided.
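The two-sided check in the last criterion can be reasoned about without YARA. The sketch below mimics a "2 of them" string condition by counting how many distinct UPX markers occur in a byte string. It illustrates the logic only and does not replace upx-packed.yar; the function names and the synthetic inputs are assumptions, while the marker strings are the ones named in the steps above.

```python
# Markers named in the lab text: two section names and the banner prefix.
UPX_MARKERS = (
    b"UPX0",
    b"UPX1",
    b"$Info: This file is packed with the UPX",
)


def marker_hits(data: bytes) -> int:
    """Count how many distinct markers occur at least once in data."""
    return sum(1 for marker in UPX_MARKERS if marker in data)


def looks_upx_packed(data: bytes, minimum: int = 2) -> bool:
    """Mimic a '2 of them' condition: require at least `minimum` markers."""
    return marker_hits(data) >= minimum


# Synthetic byte strings standing in for real files; not lab data.
packed_like = b"MZ\x00" + b"UPX0\x00\x00\x00\x00" + b"UPX1\x00\x00\x00\x00" + b"\x90" * 64
unpacked_like = b"MZ\x00" + b".text\x00\x00\x00" + b".data\x00\x00\x00" + b"\x90" * 64
stray_mention = b"release notes mention UPX0 once"

for label, blob in [
    ("packed-like", packed_like),
    ("unpacked-like", unpacked_like),
    ("stray mention", stray_mention),
]:
    verdict = "match" if looks_upx_packed(blob) else "no match"
    print(f"{label:14s} hits={marker_hits(blob)} -> {verdict}")
```

Requiring two markers is what keeps the stray mention quiet: a single substring hit is not enough to call a file packed.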

Deliverables

Commit to your portfolio repo:

  • packing-notes.txt: entropy before/after, the confirmed UPX section names, the before/after strings/import comparison (offline fallback: pre-pack and post-unpack hashes with equivalence confirmed), recovered IOC strings, and ATT&CK IDs.
  • upx-packed.yar: the authored packer-detection rule, with the match (packed sample) / no-match (unpacked twin, /bin/ls) proof recorded in packing-notes.txt.
  • Do not commit the fetched sample, any unpacked payloads, compiled binaries, or entropy reports containing sample hashes.

Automate & own it

Required. Write triage_packed.sh, a shell script that accepts a binary path as its argument and:

  1. Computes and prints the SHA-256 hash.
  2. Runs strings and checks for UPX signatures, printing [PACKED] UPX detected or [CLEAN] No UPX signature.
  3. If UPX is detected, attempts to unpack with upx -d, then re-hashes the result and prints both hashes with a match/mismatch verdict.

Draft the script with AI, then test it against both the packed and clean binary to confirm both code paths work. Commit triage_packed.sh.

AI acceleration

Paste data/encoded_strings.py into a model and prompt: "This Python script XOR-encodes strings. Identify the key, decode all encoded byte arrays, and list the plaintext strings." Verify each decoded string manually by running decode.py and comparing. Note any discrepancy.

Connects forward

Module 10 covers the case where static unpacking fails — the packer is custom and only unpacks at runtime. You will use strace and LD_PRELOAD tricks to extract the payload from a running process without a debugger.

Marketable proof

"I can identify a packed binary by entropy signature, apply a static unpacker, verify payload integrity by hash, and decode XOR-obfuscated strings from loader scripts — producing ATT&CK-tagged evidence for an IR case file."

Stretch

  • Modify the UPX section names with upx --force plus a hex-editor patch (rename UPX0/UPX1 to arbitrary 4-byte names, a trick real loaders use to defeat naive packer rules), then confirm your step-7 rule now misses the packed binary. Add a second condition to upx-packed.yar that keys on a signal independent of the section names instead, such as the high .text-section entropy or the UPX decompressor-stub byte pattern, and show it re-catches the renamed sample. This is the arms race in miniature: string rules are brittle, and structural ones are harder to evade.
  • Extend triage_packed.sh to run your upx-packed.yar as part of its detection step, so the script both classifies (rule fires) and unpacks in one pass.
