Module 03 — Static Analysis — Strings & PE

Type 1 · Concept Autopsy: read a PE's strings, import address table, per-section entropy, and COFF timestamp to reason out intent (a suspicious-API combination is a capability declaration). The deliverable is a JSON metadata report plus a corpus-verified YARA rule that fires on the malicious PE and stays quiet on a benign control. (Secondary: Tool-Build, since the metadata extractor is reusable.)

Last reviewed: 2026-06

Malware Analysis — before you execute a single instruction, the binary tells you what it intends to do — if you know where to look.

Difficulty: Intermediate · Estimated time: ~5–7 hrs (study + lab) · Prerequisites: Foundations

In 60 seconds

Before a binary runs a single instruction, it declares what it intends to do — if you know where to look. The Import Address Table is the highest-signal static data: CreateRemoteThread + VirtualAllocEx + WriteProcessMemory together are process injection, not a guess. Strings expose C2 domains, registry keys, and mutexes; the COFF timestamp catches lazy forgers. You turn those findings into a YARA rule — and prove it fires on the malicious PE but stays quiet on a benign control. Static analysis is the safe first move: the sample never runs.

Why this matters

Agent Tesla has been one of the most prevalent commodity infostealers in the world for years: a .NET keylogger and credential stealer sold as malware-as-a-service and delivered by the millions through phishing attachments. It is exactly the sample a triage analyst meets on a normal Tuesday, and it is the one you analyze in this lab. Static analysis is the safest place to start because the sample never runs. For many triage and classification tasks, static analysis alone is enough to determine the family, identify its capabilities, and write a detection rule without ever touching a sandbox. An Agent Tesla binary declares itself statically: its imported API names reveal the keylogging hooks, its strings expose the credential-store paths and the SMTP server it exfiltrates to, and its section layout tells you whether it's packed. The cost of getting static analysis wrong is low (you miss something); the cost of skipping it is that you begin dynamic analysis blind. (FortiGuard's dissection of an Agent Tesla variant, covering keylogging, browser/email/FTP credential theft, and SMTPS exfil on port 587, is the worked example behind this module.)

Objective

Extract all human-readable strings and the full import address table from a PE binary, calculate per-section entropy, parse the COFF timestamp, and produce a JSON metadata report — then author a YARA rule from those findings (the suspicious-API combination plus a distinctive string) and prove it matches the malicious PE but stays quiet on a benign control. Reasoning about which imports and strings suggest malicious intent and turning that judgement into a corpus-verified detection are equal halves.

The core idea

The mental model

A compiled binary is not an opaque blob: it is a structured container (the PE format) that declares its intentions. The Import Address Table (IAT) is the highest-signal static data, listing every Windows API the binary asked the loader to resolve. CreateRemoteThread + VirtualAllocEx + WriteProcessMemory together are the process-injection primitive; WSAStartup / connect / send / recv is a socket client. You're not guessing. (Agent Tesla's keylogger surfaces as SetWindowsHookEx / GetAsyncKeyState, declared statically, before a single instruction runs. In a .NET build those names typically sit in the managed P/Invoke metadata rather than the native IAT, which lists little beyond mscoree.dll, so look for them in the strings output too.)
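To make "a combination is a capability declaration" concrete, here is a minimal sketch that checks a list of imported API names against a small rule table. The table, the labels, and the function names are hypothetical illustrations, not a complete or authoritative mapping; confirm any hit against MalAPI.io.

```python
# Hypothetical rule table: capability label -> APIs that must ALL be present.
CAPABILITY_RULES = {
    "process_injection": {"VirtualAllocEx", "WriteProcessMemory", "CreateRemoteThread"},
    "socket_client": {"WSAStartup", "connect", "send", "recv"},
}
# A single hit here is only a lead worth noting, not a verdict.
KEYLOGGING_HINTS = {"SetWindowsHookEx", "GetAsyncKeyState"}


def normalize(api_name):
    """Drop a trailing ANSI/Unicode marker so SetWindowsHookExW matches SetWindowsHookEx.

    Heuristic (an assumption): strip a final 'A' or 'W' only after a lowercase letter.
    """
    if len(api_name) > 1 and api_name[-1] in "AW" and api_name[-2].islower():
        return api_name[:-1]
    return api_name


def declared_capabilities(imports):
    """Return the sorted capability labels supported by a list of imported API names."""
    names = {normalize(name) for name in imports}
    found = [label for label, required in CAPABILITY_RULES.items() if required <= names]
    if names & KEYLOGGING_HINTS:
        found.append("keylogging_hint")
    return sorted(found)


print(declared_capabilities(
    ["VirtualAllocEx", "WriteProcessMemory", "CreateRemoteThread", "SetWindowsHookExW"]
))
```

Requiring the full set for injection, but only one hit for the keylogging hint, mirrors the reasoning above: the three-API combination is the primitive, while a lone hook API is a lead to follow up.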

Strings extraction is the complement. strings (or its Unicode-aware equivalent) reads every run of printable characters at or above a minimum length out of the binary's raw bytes. In an unpacked binary you will find URLs (C2 domains), file paths the malware touches, registry keys it reads or writes, error messages (often the clearest indicator of purpose), mutex names (used by malware to detect whether a previous instance is already running), and sometimes Base64-encoded payloads or embedded shell commands. A mutex name like Global\\{a37f78d2-3b1c-4e5a} is a more durable indicator than a file hash: a hash identifies one file and changes with every rebuild, while a mutex name tends to persist across samples and is searchable across them.
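A rough sketch of what a strings pass does, covering both ASCII and UTF-16LE ("wide") runs, followed by a first-pass triage into indicator buckets. The bucket names and regular expressions are illustrative assumptions and will miss plenty; FLOSS goes much further.

```python
import re


def extract_strings(data, min_len=4):
    """Return printable ASCII and UTF-16LE strings of at least min_len characters."""
    ascii_re = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    # Wide strings: each printable byte is followed by a NUL byte.
    wide_re = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)
    found = [m.group().decode("ascii") for m in ascii_re.finditer(data)]
    found += [m.group().decode("utf-16-le") for m in wide_re.finditer(data)]
    return found


# Hypothetical triage patterns: label -> regex that suggests the category.
INDICATOR_PATTERNS = {
    "url": re.compile(r"https?://[^\s\"']+", re.I),
    "registry_key": re.compile(r"\b(?:HKEY_[A-Z_]+|HKLM|HKCU)\\", re.I),
    "mutex_like": re.compile(r"^(?:Global|Local)\\"),
}


def triage(strings):
    """Group extracted strings by the indicator patterns they match."""
    hits = {label: [] for label in INDICATOR_PATTERNS}
    for s in strings:
        for label, pattern in INDICATOR_PATTERNS.items():
            if pattern.search(s):
                hits[label].append(s)
    return hits
```

Usage would look like `triage(extract_strings(open(path, "rb").read()))`, where `path` points at the sample on your isolated analysis machine. Reading bytes never executes the file.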

Go deeper: the COFF timestamp

The COFF timestamp in the PE header records when the linker ran. Attackers frequently zero it, set a plausible-but-fake date (timestamp manipulation, T1027.005), or leave the real compile time. A 2015 timestamp on a file that arrived in 2024 is suspicious; a future timestamp is a near-certain forgery. It won't catch a careful attacker who sets a convincing date, but it catches lazy ones at zero cost.
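The checks above can be sketched with the standard library alone. The field lives in the COFF file header as a 32-bit count of seconds since the Unix epoch, eight bytes past the PE signature. The labels and the five-year staleness threshold below are assumptions for illustration, not a standard.

```python
import struct
from datetime import datetime, timezone

STALE_AFTER_DAYS = 5 * 365  # hypothetical threshold; tune to your intake


def coff_timestamp(data):
    """Return the raw TimeDateStamp from the COFF header of a PE image."""
    if data[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    # e_lfanew at offset 0x3C holds the file offset of the PE signature.
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("PE signature not found")
    # After the 4-byte signature: Machine (2), NumberOfSections (2), TimeDateStamp (4).
    (stamp,) = struct.unpack_from("<I", data, e_lfanew + 8)
    return stamp


def assess_timestamp(stamp, first_seen):
    """Classify a compile timestamp relative to when the file was first seen (UTC)."""
    if stamp == 0:
        return "zeroed"
    compiled = datetime.fromtimestamp(stamp, tz=timezone.utc)
    if compiled > first_seen:
        return "future (likely forged)"
    if (first_seen - compiled).days > STALE_AFTER_DAYS:
        return "stale (suspicious)"
    return "plausible"
```

A "plausible" result is not an all-clear: as noted above, a careful attacker can set a convincing date.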

The gotcha

Static analysis only sees what's unpacked. The entropy story from Module 02 continues here: strings on a packed PE returns almost nothing useful from the code section — you get the unpacker stub's strings, not the payload's. Don't mistake a quiet IAT/strings dump for a benign binary; it may just mean "packed — unpack first." (The stub's strings are still informative: they often identify the packer, which maps to a family cluster.)
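Before trusting a quiet strings dump, check whether the section is packed. A minimal sketch of byte-level Shannon entropy follows; the 7.0 bits-per-byte cutoff is a common rule of thumb assumed here, not a hard boundary.

```python
import math
from collections import Counter

PACKED_THRESHOLD = 7.0  # assumed rule of thumb, in bits per byte (maximum is 8.0)


def shannon_entropy(data):
    """Return the Shannon entropy of a byte string in bits per byte."""
    if not data:
        return 0.0
    total = len(data)
    return sum(-(n / total) * math.log2(n / total) for n in Counter(data).values())


def looks_packed(section_bytes, threshold=PACKED_THRESHOLD):
    """Flag a section whose entropy suggests compressed or encrypted content."""
    return shannon_entropy(section_bytes) >= threshold
```

Run it per section, not over the whole file: a packed payload can hide behind a low-entropy header and stub when the values are averaged together.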

Learn (~4 hrs)

The real sample: what you'll triage (~25 min)
- FortiGuard, "Analysis of a New Agent Tesla Spyware Variant": a vendor dissection of the exact family the lab analyzes, covering the keylogging and browser/email/FTP credential-theft capabilities, the SMTPS (port 587) exfiltration, and the sample IOC hashes. Read it to know what the strings and IAT should reveal before you open the binary.

PE format internals
- Corkami, PE101 (illustrated reference): the best one-page visual map of the PE format. Print it or keep it open. (~20 min.)

Import table and API semantics
- MalAPI.io: a searchable reference mapping Windows API functions to malware techniques. Look up every import you don't recognise. Essential for IAT analysis.

Strings analysis
- Mandiant, FLOSS (FLARE Obfuscated String Solver) README: why plain strings is not enough. FLOSS additionally extracts the obfuscated and stacked strings malware decodes at runtime. Read "What does FLOSS do?" to know what each output category means before you triage the sample.

pefile Python library
- pefile GitHub README + examples: the API reference; focus on OPTIONAL_HEADER, DIRECTORY_ENTRY_IMPORT, and sections attributes.
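As a starting point for the metadata extractor, here is a hedged sketch that reads those pefile attributes and shapes them into a JSON-ready dict. The report keys and both function names are hypothetical; pefile is a third-party package, so it is imported only inside the loader.

```python
import json
from datetime import datetime, timezone


def metadata_report(pe):
    """Build a report dict from a parsed pefile.PE object."""
    imports = {}
    # DIRECTORY_ENTRY_IMPORT is absent when the binary has no import directory.
    for entry in getattr(pe, "DIRECTORY_ENTRY_IMPORT", []):
        dll = entry.dll.decode(errors="replace")
        # imp.name is None for imports by ordinal, so skip those here.
        imports[dll] = [imp.name.decode(errors="replace") for imp in entry.imports if imp.name]
    sections = [
        {"name": s.Name.rstrip(b"\x00").decode(errors="replace"),
         "entropy": round(s.get_entropy(), 2)}
        for s in pe.sections
    ]
    stamp = pe.FILE_HEADER.TimeDateStamp
    return {
        "compile_time_utc": datetime.fromtimestamp(stamp, tz=timezone.utc).isoformat(),
        "entry_point": hex(pe.OPTIONAL_HEADER.AddressOfEntryPoint),
        "imports": imports,
        "sections": sections,
    }


def report_for_file(path):
    """Parse a PE from disk (never executes it) and return the report as JSON text."""
    import pefile  # third-party: pip install pefile
    return json.dumps(metadata_report(pefile.PE(path)), indent=2)
```

Keeping `metadata_report` separate from file loading lets you test the report shape without a live sample on disk.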

Key concepts

PE header fields: MZ stub, COFF header, Optional Header, section table
Import Address Table (IAT) — the capability declaration
strings output: what each category of string implies
Shannon entropy per section: packed vs. unpacked code
COFF compile timestamp and timestamp manipulation (T1027.005)
Mutex strings as family indicators
MITRE ATT&CK T1106 (Native API), T1027 (Obfuscated Files or Information)
Author then verify: write the YARA rule on the recovered IAT/string findings and prove it matches the sample, not the benign corpus — the build half
Real worked family, Agent Tesla (.NET infostealer): the declared API imports (P/Invoke names in a .NET build) expose its keylogging hooks, the strings expose credential-store paths and the SMTP exfil server; the sample declares its purpose statically
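The "author then verify" step can be sketched as two small helpers: one renders a YARA rule from your findings, the other encodes the pass condition for the corpus check. The helper names and the rule layout are assumptions for illustration; the generated text relies on YARA's pe module (`pe.imports`) and the usual `uint16(0) == 0x5A4D` MZ check.

```python
def _escape(text):
    """Escape backslashes and double quotes for a YARA text string."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def build_yara_rule(name, dll, apis, strings):
    """Render a YARA rule that needs every listed import plus any one string."""
    if not strings:
        raise ValueError("need at least one distinctive string")
    string_lines = [
        f'        $s{i} = "{_escape(s)}" ascii wide' for i, s in enumerate(strings)
    ]
    import_terms = [f'pe.imports("{dll}", "{api}")' for api in apis]
    condition = " and ".join(["uint16(0) == 0x5A4D", *import_terms, "any of ($s*)"])
    return "\n".join([
        'import "pe"',
        "",
        f"rule {name}",
        "{",
        "    strings:",
        *string_lines,
        "    condition:",
        f"        {condition}",
        "}",
    ])


def corpus_verified(matches, malicious, benign):
    """True only if the rule fires on every malicious sample and on no benign one."""
    return all(matches(p) for p in malicious) and not any(matches(p) for p in benign)
```

Here `matches` is any callable that takes a path and returns whether the rule fired; with the yara-python package it could wrap `yara.compile(source=rule_text).match(path)`. For a .NET sample, where the API names may not appear in the native import table, you may need to express them as string terms instead of `pe.imports` conditions.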

AI acceleration

AI is good at pattern-matching a list of imports against known malicious combinations: give it your IAT dump and ask "which combinations of these APIs are associated with which MITRE ATT&CK techniques?" Verify the result against MalAPI.io, because AI sometimes hallucinates API names or maps them to incorrect techniques.

AI caveat

A model is fast at mapping imports to ATT&CK, but it hallucinates API names and mis-maps techniques. Treat its output as a lead list to confirm against MalAPI.io — never as the finding itself.

Check yourself

Why is "the binary declares its intentions" a more accurate description of the IAT than "a list of functions"?
Your strings/IAT dump comes back nearly empty. What does that most likely mean, and what does it not mean?
"It matched the sample" — why isn't a freshly authored YARA rule done until it's run against a benign control?
