Lab 08: Decompilation & Code Analysis

Hands-on lab

Setup

git clone https://github.com/plaintext-security/plaintext-labs
cd plaintext-labs/malware/08-decompilation-code-analysis
make up
make fetch-sample      # pulls a real Agent Tesla sample from MalwareBazaar into the isolated container
make demo

⚠ This lab decompiles a live malware sample. Handle it accordingly.
- Static only. Nothing in this lab executes the sample; analysis is decompilation (dnSpy/ILSpy on the .NET assembly, or retdec/Ghidra). Do not run it.
- Isolation. All work stays inside the isolated container; never copy the sample to your host.
- Hygiene. The sample is fetched at lab time (password-protected zip, password infected) and is never committed; .gitignore covers samples/. make fetch-sample needs a free abuse.ch Auth-Key (set MB_AUTH_KEY).
- Offline fallback. No key, or MalwareBazaar unreachable? Skip make fetch-sample; make demo falls back to the bundled synthetic target.c, which uses the same string-decode-from-a-buffer mechanism, deliberately legible.

Scenario

A triage queue drops a sample flagged as Agent Tesla, the .NET infostealer dissected in the FortiGuard analysis (MITRE S0331). Agent Tesla doesn't leave its C2 host, SMTP credentials, or panel paths sitting in plaintext: per the FortiGuard analysis, "all the constant strings in the .NET program are encoded and saved within a large buffer, and every string is assigned an index". At use, an index is passed to a decode function that returns the plaintext, so running strings on the assembly yields the encoded blob, not the config. Your job is to recover it the analyst's way: decompile the assembly, find the string-decode routine, identify the algorithm, and reconstruct the plaintext config, all without executing the binary.
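To make the "encoded buffer plus index" idea concrete, here is a minimal sketch of an indexed string table. Everything in it is an assumption made for illustration: the single-byte XOR transform, the `(offset, length)` table layout, and the names `pack_strings` and `decode_string` are hypothetical and are not taken from the FortiGuard analysis or from any sample.

```python
# Hypothetical illustration of an indexed, encoded string table.
# The XOR transform and the (offset, length) layout are assumptions for
# this sketch, not the scheme used by any particular sample.

KEY = 0x5A  # made-up single-byte key


def pack_strings(strings, key=KEY):
    """Encode each string into one shared blob; return (blob, index table)."""
    blob = bytearray()
    table = []  # table[i] = (offset, length) of string i inside the blob
    for s in strings:
        raw = s.encode("utf-8")
        table.append((len(blob), len(raw)))
        blob.extend(b ^ key for b in raw)
    return bytes(blob), table


def decode_string(blob, table, index, key=KEY):
    """Return the plaintext for one index, as a decode routine would at use."""
    offset, length = table[index]
    encoded = blob[offset:offset + length]
    return bytes(b ^ key for b in encoded).decode("utf-8")


if __name__ == "__main__":
    blob, table = pack_strings(["smtp.example.test", "user@example.test"])
    print(blob)                           # what a strings-style scan would see
    print(decode_string(blob, table, 0))  # plaintext recovered by index
```

The point for analysis is the shape: one large constant buffer, many call sites that pass only an integer, and a single routine that turns that integer into text.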

The same mechanism, made legible, lives in the bundled data/target.c (read it only after the analysis): a small C program implementing the classic RC4 key-scheduling + keystream that underpins this style of string protection. The container compiles it stripped (-O2 -s) and runs the OSS retdec decompiler to reconstruct pseudo-C — your warm-up for reading a decode routine with no symbols. Start there to learn the loop's fingerprint, then turn dnSpy/ILSpy (or Ghidra/retdec) on the real Agent Tesla assembly in /lab/samples/.
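For reference while reading the pseudo-C, the textbook RC4 key schedule and keystream can be sketched in Python as follows. This is the standard published algorithm, not a transcription of data/target.c; the function names and the demo key are assumptions chosen for this sketch.

```python
def rc4_ksa(key):
    """Key-scheduling: identity-initialise 256 bytes, then permute by the key."""
    s = list(range(256))  # state[i] = i
    j = 0                 # running index mixed with state and key bytes
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]  # swap
    return s


def rc4_crypt(key, data):
    """XOR data with the RC4 keystream (same call encrypts and decrypts)."""
    s = rc4_ksa(key)
    i = j = 0
    out = bytearray()
    for byte in data:
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) % 256])
    return bytes(out)


if __name__ == "__main__":
    demo_key = b"demo-key"  # made-up key for illustration only
    ciphertext = rc4_crypt(demo_key, b"c2.example.test")
    print(ciphertext.hex())
    print(rc4_crypt(demo_key, ciphertext))  # round-trips to the plaintext
```

In decompiled output the `% 256` steps may not appear literally; a compiler can express them as byte-sized arithmetic or a mask, so treat the identity fill and the swap as the more reliable landmarks.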

Do

[ ] Compile the target binary. Run make demo first to see the full automated flow. Then make shell to work interactively. Inside the container: gcc -O2 -s -o /tmp/target /lab/data/target.c — the -s flag strips symbols, simulating a real stripped binary.

Hint: check that the output binary exists with file /tmp/target; confirm it reports "stripped" in the output.

[ ] Decompile with retdec. Run retdec-decompiler /tmp/target -o /tmp/target.c, then examine /tmp/target.c, the pseudo-C output. Focus on the function retdec labels main or, where no name could be recovered, an address-based placeholder such as function_<address>.

Hint: cat /tmp/target.c or open it in less. Look for a for loop with a modulo (%) operation.

[ ] Trace the key-scheduling loop. Find the loop that initialises a 256-element array. It should have the pattern: arr[i] = i followed by a second loop that swaps elements using a key. Map each variable to its semantic role: which one is the index? Which one is the running sum? Which one is the key byte?

Hint: a 256-byte state array initialised to the identity, then permuted by a running index that mixes in % key_length of a key, is the fingerprint of a well-known stream cipher's key schedule — identify which one in step 6.

[ ] Rename in your notes. You cannot rename in-place in the retdec output, but you can annotate. Copy the relevant function to a file analysis-notes.txt in /lab/ and add C-style comments naming each variable (/* key_index */, /* swap_temp */, etc.).
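[ ] Cross-check against strings and disassembly. Run strings /tmp/target and objdump -d /tmp/target to see the disassembly alongside the decompiled version. Because the binary is stripped, objdump prints no <main> label, so grep -A 20 '<main>' returns nothing; locate the loop by its constants instead, for example objdump -d /tmp/target | grep -n '0x100' for the 256-iteration bound (the exact instructions depend on what the compiler emitted at -O2). Confirm the loop boundaries match.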
[ ] Write the algorithm summary. In analysis-notes.txt, add a two-sentence summary: the algorithm family, the key length used in the demo, and one distinguishing characteristic visible in the decompiled output.
[ ] Turn the same skill on the real Agent Tesla sample (opt-in; needs make fetch-sample). The synthetic target.c taught you to read a no-symbol decode routine; now find one in the wild. Open the assembly in /lab/samples/ with a .NET decompiler (dnSpy or ILSpy, run only inside an isolated analysis environment in keeping with the isolation rule in Setup; or extract IL with monodis in-container) and locate the string-decode method the FortiGuard writeup describes: the one that takes an integer index and returns a plaintext string from the encoded buffer. Trace it: where is the encoded buffer? What transform (XOR/RC4/AES, base64 wrapper) does the decoder apply? Annotate one decoded call site in analysis-notes.txt. (Heavily obfuscated or packed samples may show only a stub. Note that too; recognising the protection layer is a valid finding. Static only; never execute.)
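If you work from an IL text dump rather than a GUI decompiler, a small filter can shortlist methods with the signature the last step describes: one integer in, one string out. The sketch below is a rough aid under stated assumptions: it assumes the dump uses ILAsm-style `.method ... string Name(int32 ...)` declarations, the helper name `find_decoder_candidates` is hypothetical, and the exact text your tool emits may differ, so adjust the pattern to what you actually see.

```python
import re
import sys

# Assumed ILAsm-style declaration: ".method <flags> string <Name>(int32 <arg>)".
# [^{]*? keeps the match inside one declaration (it stops at the body's "{"),
# and [^,)]* rejects methods that take more than one parameter.
SIGNATURE = re.compile(
    r"\.method\b[^{]*?\bstring\s+(\S+?)\s*\(\s*int32\b[^,)]*\)"
)


def find_decoder_candidates(il_text):
    """Return names of methods shaped like string f(int32)."""
    return [match.group(1) for match in SIGNATURE.finditer(il_text)]


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python find_candidates.py dump.il   (file name is an example)
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        for name in find_decoder_candidates(handle.read()):
            print(f"[CANDIDATE] {name}")
```

A hit is only a lead. Obfuscators often emit many small methods, so confirm each candidate by checking that it reads from a large static buffer and that many call sites pass it constant integers.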

Success criteria: you're done when

[ ] retdec-decompiler output exists at /tmp/target.c and is non-empty.
[ ] Your analysis-notes.txt has renamed variables for every loop variable in the KSA function.
[ ] You can state in two sentences what algorithm the function implements and how you know.
[ ] The disassembly (objdump) loop boundaries match the decompiler's loop boundaries.

Deliverables

Commit to your portfolio repo:
- analysis-notes.txt: annotated pseudo-C with renamed variables and a two-sentence algorithm summary.
- Do not commit the compiled binary, the retdec raw output, or any sample captures.

Automate & own it

Required. Write a Python script annotate_decomp.py that:
1. Takes a retdec pseudo-C file as input (sys.argv[1]).
2. Detects the presence of a modulo-256 loop (regex on % 256 or i < 256).
3. Prints [MATCH] Possible key-scheduling loop at line N for each hit.
4. Exits with code 0 if a match is found, 1 if not.

Draft it with AI assistance, then read every line and confirm the regex is not over-broad (test it against a benign file with no loops). Commit annotate_decomp.py alongside your notes.
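One possible starting point for that draft is sketched below. It is a minimal take on the four requirements, not a reference solution: the regex is deliberately simple, the helper names `find_hits` and `main` are arbitrary, and the usage message for a missing argument is an addition the requirements do not mention.

```python
import re
import sys

# Matches "% 256" or "< 256" with optional spaces. \b stops 2560 from matching.
PATTERN = re.compile(r"%\s*256\b|<\s*256\b")


def find_hits(lines):
    """Return 1-based line numbers whose text matches the pattern."""
    return [n for n, line in enumerate(lines, start=1) if PATTERN.search(line)]


def main(path):
    """Print one [MATCH] line per hit; return 0 if any hit, else 1."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        hits = find_hits(handle)
    for n in hits:
        print(f"[MATCH] Possible key-scheduling loop at line {n}")
    return 0 if hits else 1


if __name__ == "__main__":
    if len(sys.argv) == 2:
        sys.exit(main(sys.argv[1]))
    # No input file given: the lab does not specify behaviour for this case.
    print("usage: annotate_decomp.py <pseudo-c-file>", file=sys.stderr)
```

When you review it, check what your own retdec output actually contains. If the bound shows up as 0x100 or the modulo as a mask such as & 255, this pattern will miss it, and any unrelated comparison against 256 will be flagged.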

AI acceleration

Paste the decompiled function into a model. Prompt: "This is retdec pseudo-C output from a stripped binary. Identify the algorithm, name each variable by its role, and explain the loop in plain English." Use the response as a hypothesis — verify each rename against the data flow before accepting it. Note which renames you accepted and which you changed.

Connects forward

Module 09 picks up where this module leaves off: once you can read decompiled logic, you need to handle the case where the code is packed and the decompiler sees a stub rather than the real algorithm. Unpacking is a prerequisite to decompilation on most real samples.

Marketable proof

"I can open a stripped binary in a free decompiler, identify a custom cipher implementation from the loop structure, rename variables to match the algorithm's semantics, and document my findings in a format suitable for an IR case file."

Stretch

Modify data/target.c to add a second function that implements a simple Caesar cipher. Recompile, re-decompile, and confirm retdec produces two recognisable functions. Extend annotate_decomp.py to detect Caesar-style add/modulo loops as well as the swap-based key-scheduling loops it already flags.
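A hedged sketch of how the detection side of this stretch might look: one pattern per loop family, with each line labelled by the families it matches. The label strings, the `classify` helper, and the choice of `% 26` as the Caesar fingerprint are assumptions for illustration. At -O2 a compiler may replace a modulo by a constant with multiply-and-shift arithmetic, so the literal `% 26` is not guaranteed to survive into the pseudo-C.

```python
import re

# One regex per loop family. Both are intentionally narrow starting points.
KSA_HINT = re.compile(r"%\s*256\b|<\s*256\b")  # 256-entry state loops
CAESAR_HINT = re.compile(r"%\s*26\b")          # alphabet wrap-around


def classify(line):
    """Return the list of family labels whose pattern matches this line."""
    labels = []
    if KSA_HINT.search(line):
        labels.append("ksa")
    if CAESAR_HINT.search(line):
        labels.append("caesar")
    return labels


if __name__ == "__main__":
    for text in ("j = (j + s[i]) % 256;", "c = (c - 97 + shift) % 26 + 97;"):
        print(classify(text), text)
```

Keeping the patterns separate makes it easier to test each one against a benign file and to see which family is producing false positives.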
