Lab 07: Disassembly Basics
Setup
git clone https://github.com/plaintext-security/plaintext-labs
cd plaintext-labs/malware/07-disassembly-basics
make up
make fetch-sample # pulls a real GuLoader sample from MalwareBazaar into the isolated container
make demo
⚠ This lab disassembles a live malware sample. Handle it accordingly.
- Static only. Nothing in this lab executes the sample; analysis is objdump, radare2, and (optionally) Ghidra disassembly. Do not run it.
- Isolation. All work stays inside the isolated container; never copy the sample to your host.
- Hygiene. The sample is fetched at lab time (password-protected zip, password: infected) and is never committed: .gitignore covers samples/. Running make fetch-sample requires a free abuse.ch Auth-Key (set MB_AUTH_KEY).
- Offline fallback. No key, or MalwareBazaar unreachable? Skip the make fetch-sample step; make demo then falls back to the bundled synthetic crackme, which uses the same XOR-decode mechanism, deliberately legible.
Scenario
A triage queue drops a sample flagged as GuLoader (a.k.a. CloudEyE) — the shellcode-based downloader dissected by CrowdStrike and Unit 42 (MITRE S0561). GuLoader stores its strings — and most of its own shellcode — encrypted, unpacked at runtime by a small decode stub: a handful of executable bytes at the front that XOR-decode the encrypted tail before control jumps into it. strings finds nothing useful; the readable form only exists after the stub runs. Your job is to recover it the analyst's way — disassemble the stub, identify the XOR loop, extract the key, and reconstruct the plaintext without ever executing the binary.
The same mechanism, made legible, lives in the bundled crackme (compiled from data/crackme.c, which you read only after the analysis): one XOR-decode loop over a static encrypted buffer. The synthetic binary is the offline fallback — start there to learn the loop shape, then point the same workflow (objdump/radare2/Ghidra) at the real GuLoader PE in /lab/samples/. Treat the disassembly as your only source of truth.
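As a rough mental model of the loop shape to look for, here is a minimal Python sketch of a single-byte XOR decode over a static buffer. The key 0x5A and the buffer contents are made-up stand-ins, not values from crackme or the GuLoader sample; the real ones must come from your own disassembly.

```python
# Hypothetical values for illustration only; not taken from the lab binaries.
KEY = 0x5A
ENCRYPTED = bytes(b ^ KEY for b in b"hello, lab")  # stand-in for the static buffer


def xor_decode(buf: bytes, key: int) -> bytes:
    """Model of a single-byte XOR decode loop: load, xor, store, increment, compare."""
    out = bytearray(len(buf))
    i = 0                        # loop counter, usually held in a register
    while i != len(buf):         # cmp + conditional jump (for example jne)
        out[i] = buf[i] ^ key    # load one byte, xor with the key, store it
        i += 1                   # counter increment
    return bytes(out)


print(xor_decode(ENCRYPTED, KEY))
```

Because XOR is its own inverse, the same routine both encodes and decodes. That is why recovering the key and the encrypted bytes is enough to reconstruct the plaintext without running the sample.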
Do
- [ ] Generate a flat disassembly with objdump, saving the output to /tmp/crackme.asm. Open that file and find the decode function. (Hint: search for <decode> in the objdump output.)
- [ ] Identify the loop structure. In the decode function, locate the instruction sequence that forms the loop: the counter increment, the cmp or test instruction, and the conditional jump. Write down the loop bounds: how many iterations does it run?
- [ ] Extract the XOR key. Find the xor instruction inside the loop. What register or immediate value is the key? Is it a single constant byte, or is it derived from the loop counter?
- [ ] Read the encrypted buffer from the disassembly. The encrypted bytes are stored as a static array in the binary. Locate the .rodata section in the objdump output, or follow the data-segment reference in the decode function, and read the raw bytes. (objdump -d disassembles only code sections, so a section dump such as objdump -s -j .rodata may be needed to see the data.)
- [ ] Recover the plaintext string manually. Apply the XOR key to each byte and write down the result. If the recovered string is readable ASCII, you found the key. If it is not, re-examine the key width and any endianness considerations. (Hint: python3 -c "print(bytes(b ^ KEY for b in ENCRYPTED_BYTES))", with KEY and ENCRYPTED_BYTES replaced by your own values.)
- [ ] Confirm with radare2. Disassemble decode in radare2 and compare the result to objdump's. Note any differences in how each tool presents the same instructions.
- [ ] Run make demo, which performs all of the above automatically and shows the recovered string. Verify that your manual recovery matches its output.
- [ ] Turn the same workflow on the real GuLoader sample (opt-in; needs make fetch-sample). The synthetic crackme taught you the loop shape; now find it in the wild. Disassemble the real PE in /lab/samples/ (objdump -D -M intel, or radare2 -q -c 'aa; pd 80 @ entry0', or Ghidra headless for a graph view) and locate the decode stub the CrowdStrike/Unit 42 writeups describe: a short run of executable bytes that XOR-decodes the encrypted region before jumping into it. You will not get a clean named decode symbol; that is the point. Identify the XOR instruction, note the key, and write down where the encrypted region begins. (You do not need to fully unpack it: recognising the stub and its key is the deliverable. Static only; never execute.)
- [ ] Author a YARA hex-pattern rule on the recovered stub and prove it (the build half). Recovering the string is only half the job: the byte sequence of the decode loop is a signature. Capture the exact opcode bytes of the XOR loop from your objdump output (the movzx/xor/mov/inc/cmp/jne sequence), replacing register-encoding or address bytes that vary with ?? wildcards so the pattern is stable. Write disasm-xor.yar with that hex string { ... } defined in the strings section and referenced in the condition; optionally also include the encrypted blob bytes from .rodata. Then prove the two-sided result: yara disasm-xor.yar /lab/crackme must match, and yara disasm-xor.yar /bin/ls (a benign control with its own, different code) must not match. If the rule fires on /bin/ls, your pattern is too generic: it caught a common compiler idiom, not this stub. Lengthen the captured run or add the blob bytes until only crackme matches. Recovering the plaintext and authoring the byte-pattern detection that proves you isolated the right bytes are equal halves of the exercise. (Hint: get the bytes from objdump -d; each instruction's machine code is in the second column. Use ?? for the operand bytes you expect to differ across builds and keep the opcodes fixed.)
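The masking step in the YARA task can be sketched as a small helper that turns per-instruction byte strings into a hex pattern with ?? wildcards. The byte values below are hand-assembled x86-64 encodings chosen for illustration, not the bytes in crackme. The rule name disasm_xor_stub, the helper yara_hex, and the choice of which bytes to mask are likewise assumptions; substitute what your own objdump output shows.

```python
def yara_hex(instructions):
    """Build a YARA hex string from (hex_bytes, fixed_count) pairs.

    The first fixed_count bytes of each instruction are kept verbatim;
    the remaining bytes (operands expected to vary) become ?? wildcards.
    """
    tokens = []
    for hex_bytes, fixed_count in instructions:
        for index, byte in enumerate(hex_bytes.split()):
            tokens.append(byte.upper() if index < fixed_count else "??")
    return "{ " + " ".join(tokens) + " }"


# Hypothetical loop bytes (illustrative encodings, not from the lab binary).
LOOP = [
    ("0f b6 04 0f", 2),  # movzx eax, byte [rdi+rcx]: keep opcode, mask ModRM/SIB
    ("34 5a", 2),        # xor al, 0x5a: keep opcode and the key byte
    ("88 04 0f", 1),     # mov byte [rdi+rcx], al: keep opcode, mask ModRM/SIB
    ("48 ff c1", 2),     # inc rcx: keep REX.W + opcode, mask ModRM
    ("48 39 f1", 2),     # cmp rcx, rsi: keep REX.W + opcode, mask ModRM
    ("75 f0", 1),        # jne: keep opcode, mask the relative offset
]

RULE = f"""rule disasm_xor_stub
{{
    strings:
        $loop = {yara_hex(LOOP)}
    condition:
        $loop
}}
"""
print(RULE)
```

The hex string sits in the strings section and the condition only references it. How much to mask is a judgment call: every extra ?? makes the rule more tolerant of rebuilds but also more likely to fire on the benign control.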
Success criteria: you're done when
- [ ] The XOR key is correctly identified (exact byte value).
- [ ] The plaintext string is recovered manually and matches the demo output.
- [ ] The loop structure (bounds, increment, condition) is documented in your notes.
- [ ] Both objdump and radare2 disassembly outputs are reviewed.
- [ ] disasm-xor.yar matches /lab/crackme and does not match /bin/ls (the build half, proven two-sided).
Deliverables
Three files: disassembly-notes.md, your annotated analysis (the loop structure in pseudocode, the XOR key, the encrypted bytes, and the recovered string); annotate_asm.py (see Automate & own it); and disasm-xor.yar (with the match/no-match proof recorded in the notes). Commit all three.
Automate & own it
Required. Write annotate_asm.py: a Python script that reads the objdump output file and identifies all xor instructions, printing each one with the surrounding 3 lines of context and a label: [LOOP-XOR] if the xor uses a register operand (variable key), [CONST-XOR] if the xor uses an immediate (constant key). AI can draft the regex parsing; you write the operand classification logic by hand (this is the learning — classifying xor rax, rax as a zero-idiom, not a key, requires understanding the instruction).
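For the regex-parsing half only, a starting point might look like the sketch below. It assumes the tab-separated layout that objdump -d -M intel typically prints (address, raw bytes, then mnemonic and operands). The sample lines are invented for illustration, and the names LINE_RE, Insn and parse_line are hypothetical. The operand classification is deliberately left out, since that is the part to write by hand.

```python
import re
from typing import NamedTuple, Optional

# Assumed line layout: "  401136:<TAB>0f b6 04 0f    <TAB>movzx  eax,BYTE PTR [...]"
LINE_RE = re.compile(
    r"^\s*(?P<addr>[0-9a-f]+):\t"
    r"(?P<raw>(?:[0-9a-f]{2} )+) *\t"
    r"(?P<mnem>\S+)\s*(?P<ops>.*)$"
)


class Insn(NamedTuple):
    addr: int
    raw: bytes
    mnemonic: str
    operands: str


def parse_line(line: str) -> Optional[Insn]:
    """Parse one objdump instruction line; return None for headers and labels."""
    m = LINE_RE.match(line.rstrip("\n"))
    if m is None:
        return None
    return Insn(int(m["addr"], 16), bytes.fromhex(m["raw"]),
                m["mnem"], m["ops"].strip())


# Invented sample lines, not output from the lab binary.
SAMPLE = (
    "  401136:\t0f b6 04 0f          \tmovzx  eax,BYTE PTR [rdi+rcx*1]\n"
    "  40113a:\t34 5a                \txor    al,0x5a\n"
    "  40113c:\t31 c0                \txor    eax,eax\n"
)

xors = [insn for insn in map(parse_line, SAMPLE.splitlines())
        if insn is not None and insn.mnemonic == "xor"]
for insn in xors:
    print(hex(insn.addr), insn.raw.hex(), insn.operands)
```

Both sample xor lines are returned unlabelled. Deciding which one carries a key and which is a zeroing idiom is the classification logic the exercise asks you to write yourself.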
AI acceleration
Paste the decode function disassembly into an AI and ask "what does this function do and what is the XOR key?" Then verify the key independently by tracing through the instructions manually. Check whether the AI's key is a byte or a wider value — this is the most common error.
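One way to cross-check a proposed key independently of both the AI and your manual trace is to try every single-byte key against the encrypted bytes and keep those that yield printable ASCII. This is a sketch under assumptions: the buffer below is fabricated with a made-up key, and candidate_keys is a hypothetical helper name.

```python
import string

PRINTABLE = set(string.printable.encode())


def candidate_keys(buf: bytes):
    """Return (key, plaintext) for every single-byte key giving printable ASCII."""
    hits = []
    for key in range(256):
        plain = bytes(b ^ key for b in buf)
        if all(c in PRINTABLE for c in plain):
            hits.append((key, plain))
    return hits


# Fabricated buffer for illustration; use the bytes you read from the binary.
ENCRYPTED = bytes(b ^ 0x5A for b in b"flag{demo}")

for key, plain in candidate_keys(ENCRYPTED):
    print(f"key=0x{key:02x} -> {plain!r}")
```

Several keys may survive the printable filter, so read the candidates rather than trusting the first hit. An empty result on real data suggests the key is wider than one byte or varies with the loop counter, which is the error the paragraph above warns about.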
Connects forward
The XOR decryption skill you practice here is the foundation for manual unpacking (reversing a packer's decryption stub) and for writing YARA rules that match on byte patterns in encrypted or obfuscated data. In Track 05 (Cloud Security) and Track 06 (Active Directory), you will see similar obfuscation patterns in PowerShell payloads.
Marketable proof
"I disassemble compiled binaries, identify XOR obfuscation loops, extract keys manually, and recover hidden strings without executing the sample."
Stretch
- Find any function in the binary that uses a call instruction. Trace the arguments: which registers are set before the call, and what does the System V ABI say each register holds? Verify against the C source after the exercise.
- Modify crackme.c to use a two-byte XOR key (XOR each byte with key[i % 2]). Recompile and re-run your analysis. How does the assembly change?
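For the two-byte-key stretch, a small Python model of key[i % 2] can produce the ciphertext you expect to find in the recompiled binary. The key bytes and plaintext here are arbitrary placeholders, and xor_repeating is a hypothetical helper name.

```python
def xor_repeating(buf: bytes, key: bytes) -> bytes:
    """XOR each byte with key[i % len(key)]; applying it twice restores the input."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(buf))


KEY = bytes([0x5A, 0xC3])     # placeholder two-byte key
PLAINTEXT = b"two-byte key"   # placeholder string

encrypted = xor_repeating(PLAINTEXT, KEY)
print(encrypted.hex())
print(xor_repeating(encrypted, KEY))
```

Comparing these bytes with the static array in your rebuilt crackme confirms you read the buffer correctly before you start reasoning about how the loop body changed.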