Lab 01: Map an Attack Surface


Setup

git clone https://github.com/plaintext-security/plaintext-labs
cd plaintext-labs/offensive/01-recon
make up      # builds the Python recon harness
make demo    # runs the four passive recon steps on the bundled example.com data (offline)
make down

Bundled data: data/crt_sh.json (8 CT log entries), data/dns_enum.json (12 resolved subdomains plus NS, MX, and SPF records), and data/tech_stack.json (8 fingerprinted hosts). The recon.py harness implements the four passive recon steps (CT log parsing, DNS grouping, tech fingerprinting, and priority scoring) against these files without making live network calls, so the demo output is deterministic.
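The first step, CT log parsing, has one subtlety worth seeing in code: a single crt.sh entry's name_value field can hold several newline-separated names, including wildcards such as *.example.com. Below is a minimal sketch, not recon.py's own code, assuming data/crt_sh.json follows crt.sh's JSON layout with a name_value field per entry. The function names are hypothetical.

```python
import json
from pathlib import Path


def load_ct(path: str = "data/crt_sh.json") -> list[dict]:
    """Read a crt.sh-style JSON array from disk."""
    return json.loads(Path(path).read_text())


def subdomains_from_ct(entries: list[dict], apex: str) -> list[str]:
    """Turn crt.sh-style entries into a sorted, de-duplicated host list.

    Assumes each entry has a ``name_value`` string that may contain several
    names separated by newlines, as crt.sh's JSON output does.
    """
    apex = apex.lower().rstrip(".")
    hosts: set[str] = set()
    for entry in entries:
        for name in entry.get("name_value", "").splitlines():
            name = name.strip().lower().rstrip(".")
            if name.startswith("*."):
                name = name[2:]  # keep the parent name of a wildcard cert
            # Keep the apex itself and true subdomains only; a bare suffix
            # check would wrongly accept look-alikes such as evil-example.com.
            if name == apex or name.endswith("." + apex):
                hosts.add(name)
    return sorted(hosts)
```

For example, `subdomains_from_ct(load_ct(), "example.com")` returns the sorted host list. Stripping the `*.` prefix keeps the parent name rather than discarding wildcard certificates entirely.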

Authorization: the bundled dataset is yours to analyze freely. The habit still matters everywhere else: only test systems you own or have explicit written permission to test (DVWA, PortSwigger Academy, targets you own).

Scenario

A client has asked for an external attack-surface assessment. You start with no credentials, only a domain name. Map every externally visible asset, fingerprint the stack, and identify priority targets for the next phase.

This lab is live-first. The primary path is real passive recon (crt.sh plus DNS) against a real domain you control or one that is in scope for a public bug-bounty program. That is how the work is actually done, and real data surfaces real internet-facing products (VPN gateways, collaboration servers) that you can tie to named CVEs. The bundled example.com dataset (reserved by RFC 2606) is the offline fallback: it keeps the demo deterministic and lets you validate the harness before pointing it at a live target. Do the live run; fall back to the bundled data only when you have no authorized domain handy.

Live mode: the primary path (authorized targets only)

Run real passive recon against a domain you control (your own site, a lab tenant) or a domain that is in scope for a public bug-bounty program. Read the program scope first and touch only in-scope assets.

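# 1. Pull live crt.sh output for your authorized domain. This overwrites the bundled
#    data/crt_sh.json; %25 is the URL-encoded form of crt.sh's % wildcard.
curl -s "https://crt.sh/?q=%25.yourdomain.com&output=json" > data/crt_sh.json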

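# 2. Resolve each discovered host (requires jq and the host CLI). Wildcard prefixes
#    are stripped and names de-duplicated first; output is "queried-name ip":
jq -r '.[].name_value' data/crt_sh.json | sed 's/^\*\.//' | tr '[:upper:]' '[:lower:]' | sort -u |
while read -r sub; do
  host "$sub" 2>/dev/null | awk -v s="$sub" '/has address/ {print s, $NF}'
done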

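# 3. Fingerprint and score with the harness against your populated data. make shell
#    opens an interactive shell in the container; run recon.py from inside it.
#    Only step 1 writes a file (data/crt_sh.json); step 2 prints to the terminal.
make shell
python3 recon.py --report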

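Authorization: only run live recon against assets you own or that are explicitly in scope (your domain, a written-permission engagement, a bug-bounty program's listed scope). Passive does not mean invisible: the crt.sh query never touches the target, but the DNS lookups in step 2 can reach the target's authoritative name servers and appear in their logs.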
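Step 2 prints hostname and address pairs but does not save or organize them. Grouping those pairs by address shows which names share a server, load balancer, or CDN edge, so one finding there may apply to several hosts. The sketch below illustrates one plausible reading of the harness's "DNS grouping" step; it is not recon.py's implementation. It assumes you redirected the loop's output to a file with one `hostname ip` pair per line.

```python
from collections import defaultdict


def group_by_address(lines: list[str]) -> dict[str, list[str]]:
    """Group 'hostname ip' lines (the step 2 loop's output) by IP address.

    Returns {ip: [sorted hostnames]}; blank or malformed lines are skipped.
    """
    groups: dict[str, set[str]] = defaultdict(set)
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # not a "hostname ip" pair
        host, ip = parts
        groups[ip].add(host.lower())
    return {ip: sorted(hosts) for ip, hosts in sorted(groups.items())}
```

For example, `group_by_address(Path("resolved.txt").read_text().splitlines())` (resolved.txt is a hypothetical filename) returns each address with every name that resolves to it. Addresses that carry many names are worth a closer look before scanning.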

Do

Work through the five tasks below against your live target if you have one; otherwise use the bundled example.com dataset as the offline fallback.

[ ] Run the recon (live, or make demo for the bundled fallback). Read the priority ranking: what are the top three targets and why does each score high? Which CVE is the most critical?
[ ] Add a CT entry for a backup VPN host (e.g. vpn2.<domain>). Against a live target, such a host appears in crt.sh output only if a certificate has been issued for it; offline, add it to data/crt_sh.json as vpn2.example.com. Re-run and confirm it appears in the subdomain list.
[ ] If the SPF record includes a third-party relay (e.g. sendgrid.net), what does that mean from a phishing-simulation perspective? What check would confirm whether the target actually uses that relay for outbound email?
[ ] The score_interest() function in recon.py uses hardcoded rules. Extend it: if the tech stack contains WordPress, add 15 points (WP has a large CVE surface and many discoverable plugins). Confirm a WordPress host (e.g. www.example.com in the bundled data) rises in the ranking.
[ ] Run python3 recon.py --report (in the container shell via make shell) and read the generated recon-report.md. This is your deliverable template.
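The WordPress rule needs only a few lines once the fingerprint is available. This lab does not show recon.py's actual score_interest() signature, so the sketch below uses a hypothetical helper that you would call from inside score_interest(). It assumes each host's fingerprint is a list of technology strings such as "WordPress 6.4". The rank() function and the (score, tech_stack) tuple layout are also illustrative assumptions.

```python
# Hypothetical helpers: recon.py's real score_interest() signature is not shown
# in this lab. Assumes a host's fingerprint is a list of technology strings.

WORDPRESS_BONUS = 15  # large CVE surface and many discoverable plugins


def apply_wordpress_bonus(score: int, tech_stack: list[str]) -> int:
    """Return score plus WORDPRESS_BONUS if any fingerprint mentions WordPress."""
    if any("wordpress" in tech.lower() for tech in tech_stack):
        return score + WORDPRESS_BONUS
    return score


def rank(hosts: dict[str, tuple[int, list[str]]]) -> list[str]:
    """Order hostnames by bonus-adjusted score, highest first (ties by name)."""
    return sorted(hosts, key=lambda h: (-apply_wordpress_bonus(*hosts[h]), h))
```

A case-insensitive substring match tolerates version suffixes in the fingerprint. If your fingerprints use a fixed vocabulary, an exact match avoids false positives.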

Success criteria: you're done when

[ ] You can explain why certificate transparency is more comprehensive than brute-force DNS enumeration for passive recon.
[ ] The score_interest() WordPress extension fires correctly.
[ ] You generated recon-report.md with the asset inventory and top-target rationale.

Deliverables

recon-report.md (generated by python3 recon.py --report): the scope statement, full asset inventory with sources, and the top three priority targets with CVE justification. This feeds directly into module 02's scanning phase.

AI acceleration

Have a model interpret the tech-stack fingerprint and suggest CVEs to check — then validate each against NVD before including it in your report. Models hallucinate CVE numbers; always cross-reference the NVD entry.
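A model can produce a CVE ID that is malformed, one that is well-formed but does not exist, or a real ID for the wrong product. The sketch below automates the first two checks using only the standard library and NVD's public CVE API 2.0 (`cveId` query parameter). It assumes that NVD reports a missing CVE with `totalResults` equal to 0. Unauthenticated requests are rate-limited, so look up IDs slowly; HTTP errors are left to the caller.

```python
import json
import re
import urllib.parse
import urllib.request

# "CVE-", a four-digit year, "-", then a sequence number of four or more digits.
CVE_ID = re.compile(r"CVE-\d{4}-\d{4,}")
NVD_CVE_API = "https://services.nvd.nist.gov/rest/json/cves/2.0"


def well_formed(cve_id: str) -> bool:
    """Cheap offline filter for IDs that cannot be real."""
    return CVE_ID.fullmatch(cve_id) is not None


def nvd_url(cve_id: str) -> str:
    """Build the NVD CVE API 2.0 query URL for a single CVE ID."""
    return f"{NVD_CVE_API}?{urllib.parse.urlencode({'cveId': cve_id})}"


def in_nvd(cve_id: str, timeout: float = 30.0) -> bool:
    """Network call: True if NVD reports at least one record for cve_id.

    Assumes a missing CVE comes back with totalResults == 0; HTTP errors
    (including rate limiting) are raised to the caller.
    """
    if not well_formed(cve_id):
        return False
    with urllib.request.urlopen(nvd_url(cve_id), timeout=timeout) as resp:
        return json.load(resp).get("totalResults", 0) > 0
```

Passing both checks is necessary, not sufficient. Before a CVE goes into recon-report.md, read the NVD entry's affected products and versions and compare them against your fingerprint.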

Automate & own it

Required. With AI drafting and you reviewing every line, extend recon.py with a --live flag that calls the real crt.sh API (https://crt.sh/?q=%25.{domain}&output=json, with the % wildcard URL-encoded as %25) and writes the response to data/crt_sh.json before running the analysis. Commit the extended script.
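The sketch below shows the --live plumbing as standalone functions, because this lab does not show recon.py's internals. Review these assumptions before committing: the --domain option, the 60-second timeout, and the User-Agent string are illustrative choices, not part of the lab. Like the curl command in live mode, the sketch URL-encodes the % wildcard (urlencode turns it into %25) and overwrites data/crt_sh.json.

```python
import argparse
import json
import urllib.parse
import urllib.request
from pathlib import Path

CT_PATH = Path("data/crt_sh.json")


def crt_sh_url(domain: str) -> str:
    """crt.sh JSON query for every name under domain; '%' is sent as '%25'."""
    query = urllib.parse.urlencode({"q": f"%.{domain}", "output": "json"})
    return f"https://crt.sh/?{query}"


def fetch_ct(domain: str, out: Path = CT_PATH, timeout: float = 60.0) -> int:
    """Download crt.sh JSON for domain, overwrite out, return the entry count."""
    req = urllib.request.Request(crt_sh_url(domain), headers={"User-Agent": "recon-lab"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        entries = json.load(resp)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(entries, indent=2))
    return len(entries)


def parse_args(argv=None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Passive recon harness")
    parser.add_argument("--report", action="store_true", help="write recon-report.md")
    parser.add_argument("--live", action="store_true",
                        help="refresh data/crt_sh.json from crt.sh (authorized targets only)")
    parser.add_argument("--domain", help="domain to query in --live mode (hypothetical option)")
    args = parser.parse_args(argv)
    if args.live and not args.domain:
        parser.error("--live requires --domain")
    return args


def main(argv=None) -> None:
    args = parse_args(argv)
    if args.live:
        count = fetch_ct(args.domain)
        print(f"saved {count} CT entries to {CT_PATH}")
    # ...the existing analysis (CT parsing, DNS grouping, fingerprinting,
    # scoring, optional report) would run here on the refreshed data.
```

crt.sh can be slow and occasionally fails under load, so expect to retry. Call main() from recon.py's existing entry point so that the fetch runs before the analysis.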

Connects forward

The priority targets from this lab — especially vpn.example.com (FortiGate CVE-2024-21762) and jira.example.com (Confluence CVE-2023-22515) — are the scope input for module 02 (active scanning). The subdomains feed module 03 (vuln ID).

Marketable proof

"I map an external attack surface passively — CT logs, DNS enumeration, tech fingerprinting, and priority scoring — and deliver a structured recon report the way bug-bounty and red-team engagements actually start."

Stretch

Add email-infrastructure recon: check the DMARC record (_dmarc.<domain>), parse the SPF include chain recursively, and assess the spoofing risk level.
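The sketch below is a minimal offline version of the stretch's three pieces. The TXT lookup is passed in as a function (a dict works for testing), so the logic runs without DNS; in practice you would back it with a resolver library such as dnspython. The risk rating is a hypothetical heuristic based only on the DMARC p= tag. A fuller assessment would also weigh pct=, sp=, and whether SPF ends in -all or ~all. The walk only lists the include chain; it does not enforce RFC 7208's limit of 10 DNS-querying mechanisms.

```python
from typing import Callable, Optional

# Hypothetical lookup type: maps a DNS name to its TXT record strings.
TxtLookup = Callable[[str], list[str]]


def parse_tags(record: str) -> dict[str, str]:
    """Split a DMARC record such as 'v=DMARC1; p=none; rua=...' into tags."""
    tags = {}
    for part in record.split(";"):
        key, sep, value = part.strip().partition("=")
        if sep:
            tags[key.strip().lower()] = value.strip()
    return tags


def spf_includes(domain: str, lookup: TxtLookup, seen: Optional[set[str]] = None) -> list[str]:
    """Walk include: mechanisms and redirect= modifiers depth-first."""
    seen = set() if seen is None else seen
    if domain in seen:
        return []  # guard against include loops and repeats
    seen.add(domain)
    chain = [domain]
    for txt in lookup(domain):
        if not txt.lower().startswith("v=spf1"):
            continue  # not an SPF record
        for term in txt.split()[1:]:
            mech = term.lstrip("+-~?")  # drop the qualifier, if any
            if mech.lower().startswith("include:"):
                chain += spf_includes(mech.split(":", 1)[1], lookup, seen)
            elif mech.lower().startswith("redirect="):
                chain += spf_includes(mech.split("=", 1)[1], lookup, seen)
    return chain


def spoofing_risk(dmarc: dict[str, str]) -> str:
    """Rough, hypothetical rating based only on the DMARC policy tag."""
    policy = dmarc.get("p", "").lower()
    if policy == "reject":
        return "low"
    if policy == "quarantine":
        return "medium"
    return "high"  # p=none or no DMARC record: receivers are not told to act
```

Feed spoofing_risk() the tags parsed from the TXT record at _dmarc.<domain>. A missing record yields an empty dict, which this heuristic rates as high risk.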
