# GhostMap
A behavioral web reconnaissance tool for authorized bug bounty research.
GhostMap crawls a target with a real browser, captures the network traffic the browser sees, classifies the endpoints it observes, and surfaces the ones most likely to be worth manual investigation. It's designed to fit into how bug bounty hunters actually work: scoped to one program at a time, careful about authorization, and structured so verification state survives across re-scans.

```bash
ghostmap workspace init
ghostmap -w --idor --html-report
ghostmap workflow -w
ghostmap verify -w --finding F-a3f1c8 --status testing
```

## Why this exists
Most security recon tools optimize for finding *more*. GhostMap optimizes for finding *less, but reliable*. After watching scans surface hundreds of false positives — Datadog RUM beacons flagged as IDOR candidates, ad-tracker pixels scored as priority endpoints, sequential numeric IDs in third-party analytics URLs lighting up the dashboard — the design goal became: filter aggressively, surface only what a hunter could plausibly verify in 30 minutes, and make the audit trail of *what was tested and why* easy to reconstruct months later.
The tool is built around the workflow a bug bounty hunter actually uses: one program at a time, careful scope verification before every scan, manual verification of each finding before submission, and persistent state across the many re-scans a single engagement requires.
## Design decisions worth calling out
**Scope-aware authorization gating.** Every program has its own YAML scope file declaring in-scope hosts, out-of-scope paths, and per-program permissions for active behaviors (IDOR mutation, hidden route probing, authenticated scanning). The scan refuses to run against out-of-scope hosts and refuses active behaviors the program hasn't permitted. This isn't policy in the user's head — it's enforced in code. ([`core/scope.py`](core/scope.py))
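
As an illustration of what this kind of gate can look like, here is a minimal sketch. It is not the code in `core/scope.py`. The `SCOPE` dictionary, `url_in_scope`, and `behavior_permitted` are hypothetical names, and the sketch assumes the scope file has already been parsed into a plain dict with the same keys as the YAML shown under "Scope files".

```python
import re
from fnmatch import fnmatch
from urllib.parse import urlsplit

# Hypothetical in-memory form of a scope file (same keys as the YAML example).
SCOPE = {
    "in_scope": {"hosts": ["example.com", "*.example.com"]},
    "out_of_scope": {
        "hosts": ["admin.example.com"],
        "path_patterns": ["^/admin", "^/internal"],
    },
    "permissions": {"active_idor_testing": False, "hidden_route_probing": True},
}


def url_in_scope(url: str, scope: dict) -> bool:
    """Return True only if the host is in scope and no exclusion matches."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    path = parts.path or "/"
    excluded = scope.get("out_of_scope", {})
    # Exclusions win over inclusions, so they are checked first.
    if any(fnmatch(host, pat) for pat in excluded.get("hosts", [])):
        return False
    if any(re.search(pat, path) for pat in excluded.get("path_patterns", [])):
        return False
    allowed = scope.get("in_scope", {}).get("hosts", [])
    return any(fnmatch(host, pat) for pat in allowed)


def behavior_permitted(name: str, scope: dict) -> bool:
    """Active behaviors are denied unless the scope file sets them to true."""
    return scope.get("permissions", {}).get(name, False) is True
```

In this sketch a permission that is missing from the file is treated the same as `false`, so a behavior has to be switched on explicitly before it can run.
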
**Cross-scan finding registry with stable IDs.** Findings are keyed by a content hash of (finding type, normalized target, distinguishing parameter), not by sort position in a particular scan. `F-a3f1c8` for a given IDOR candidate stays `F-a3f1c8` across every re-scan, so verification state, evidence bundles, and report drafts attach to the actual finding, not to a number that shifts when the next scan finds something new. ([`core/workspace.py`](core/workspace.py))
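
A rough sketch of the idea, under stated assumptions: the normalization rule (lowercase host, path only, no trailing slash), the SHA-256 hash, the six-character prefix, and the names `finding_id` and `merge_scan` are all illustrative choices, not necessarily what `core/workspace.py` does.

```python
import hashlib
from urllib.parse import urlsplit


def finding_id(finding_type: str, url: str, parameter: str) -> str:
    """Derive an ID from what the finding is, not from where it sorts in a scan."""
    parts = urlsplit(url)
    # Assumed normalization: lowercase host, keep the path, drop the scheme,
    # query string, fragment, and any trailing slash.
    target = f"{(parts.hostname or '').lower()}{parts.path.rstrip('/')}"
    key = "\x1f".join((finding_type, target, parameter))
    return "F-" + hashlib.sha256(key.encode("utf-8")).hexdigest()[:6]


def merge_scan(registry: dict, scan_findings: list) -> dict:
    """Add unseen findings; leave existing entries and their status untouched."""
    for finding in scan_findings:
        fid = finding_id(finding["type"], finding["url"], finding["param"])
        registry.setdefault(fid, {**finding, "status": "new"})
    return registry
```

Because the ID depends only on the finding's content, a re-scan that rediscovers the same candidate maps to the same registry entry, and whatever status was recorded there is preserved.
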
**Defense-in-depth credential redaction.** When a scan uses `--cookie` or `--auth-header`, the auth credential never reaches disk. The crawler redacts request/response headers at capture time. The output writer runs a second redaction pass before serializing. POST bodies are scrubbed for JSON tokens, form-encoded credentials, Bearer/Basic auth values, OTP fields, and CSRF tokens. The goal is that someone reading a scan JSON committed to a public repo by mistake can't extract a usable session. ([`core/auth.py`](core/auth.py))
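
The sketch below shows one way header and body redaction could be written. It is an assumption-laden illustration rather than the contents of `core/auth.py`: the header list, the field names in `_SECRET_KEYS`, the `[REDACTED]` marker, and both function names are hypothetical.

```python
import re

REDACTED = "[REDACTED]"
SENSITIVE_HEADERS = {"authorization", "cookie", "set-cookie", "x-csrf-token"}

# Assumed field names; a real list would be longer.
_SECRET_KEYS = r"(?:access_token|refresh_token|token|password|otp|csrf_token)"
_JSON_FIELD = re.compile(rf'("{_SECRET_KEYS}"\s*:\s*)"[^"]*"', re.IGNORECASE)
_FORM_FIELD = re.compile(rf"(\b{_SECRET_KEYS}=)[^&\s]+", re.IGNORECASE)
_AUTH_VALUE = re.compile(r"\b(Bearer|Basic)\s+[A-Za-z0-9._~+/=-]+")


def redact_headers(headers: dict) -> dict:
    """Replace the value of any sensitive header, matching names case-insensitively."""
    return {
        name: (REDACTED if name.lower() in SENSITIVE_HEADERS else value)
        for name, value in headers.items()
    }


def redact_body(body: str) -> str:
    """Scrub JSON string fields, form-encoded fields, and Bearer/Basic values."""
    body = _JSON_FIELD.sub(rf'\1"{REDACTED}"', body)
    body = _FORM_FIELD.sub(rf"\1{REDACTED}", body)
    return _AUTH_VALUE.sub(rf"\1 {REDACTED}", body)
```

Both functions are idempotent on already-redacted input, which is what makes it safe to call them once at capture time and again in the output writer.
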
**Workspaces separate code from data.** Earlier versions wrote scan output to the same directory that held renderer modules, making `rm output/*.json` a foot-gun. Workspaces give each program its own directory tree: scope file, scans, reports, evidence bundles, and the findings registry. No accidental deletion of renderer code while cleaning up old scans.
## What it actually does

```
Input: one target URL + a scope file + optional auth credential
            │
            ▼
  ┌───────────────────┐
  │ Playwright crawl  │  follows in-scope links only, captures network XHRs
  └───────────────────┘
            │
            ▼
  ┌───────────────────┐
  │ Analyzers         │  noise filter, IDOR candidate detector, response
  │                   │  classifier, hidden route prober (if scope permits)
  └───────────────────┘
            │
            ▼
  ┌───────────────────┐
  │ Workspace         │  stable F-### IDs, deduplicated across scans,
  │ registry          │  verification state attached to findings
  └───────────────────┘
            │
            ▼
Output: scan JSON, HTML dashboard, markdown triage workflow,
        evidence-redacted by default
```

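
To make the analyzer stage more concrete, here is a small sketch of a noise filter feeding an IDOR candidate detector. The vendor host suffixes are placeholders, the "numeric segment or numeric query value" heuristic is an assumption about what counts as a candidate, and `is_noise` and `idor_candidates` are hypothetical names.

```python
import re
from urllib.parse import parse_qsl, urlsplit

# Placeholder vendor suffixes, for illustration only.
NOISE_HOST_SUFFIXES = ("rum-vendor.example", "ads-vendor.example")
_NUMERIC = re.compile(r"\d{1,12}")


def is_noise(url: str) -> bool:
    """True when the host is a known third-party vendor or one of its subdomains."""
    host = (urlsplit(url).hostname or "").lower()
    return any(host == s or host.endswith("." + s) for s in NOISE_HOST_SUFFIXES)


def idor_candidates(urls):
    """Yield (url, location, value) for numeric IDs found in non-noise URLs."""
    for url in urls:
        if is_noise(url):
            continue  # drop vendor traffic before any scoring happens
        parts = urlsplit(url)
        for segment in parts.path.split("/"):
            if _NUMERIC.fullmatch(segment):
                yield url, "path", segment
        for name, value in parse_qsl(parts.query):
            if _NUMERIC.fullmatch(value):
                yield url, f"query:{name}", value
```

Filtering before detection is the point of the ordering: a sequential ID in an analytics beacon never reaches the candidate list in the first place.
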
## Current state
- 99 unit tests, ruff clean, pyright clean
- Tested against six real bug bounty programs (gonzagatech [self-owned], Compass, Meesho, Twilio, Remitly, others) with permissions explicitly verified before each scan
- Single-author project, ~3500 LOC Python
- Not yet packaged for PyPI; install from source
## Install
Requires Python 3.10+ and Playwright.

```bash
git clone https://github.com/andrewliera/ghostmap.git
cd ghostmap
pip install -e .
playwright install chromium
```

## Quickstart

```bash
# Initialize a workspace for one bug bounty program
ghostmap workspace init
# (auto-uses scopes/.yaml if it exists)

# Run a scoped scan
ghostmap https://target.example.com -w --idor --html-report

# Triage the findings
ghostmap workflow -w
cat workspaces//reports/workflow.md

# Mark findings as you work through them
ghostmap verify -w --finding F-a3f1c8 --status testing --note "reproducible in browser"

# Attach sanitized evidence
ghostmap evidence -w --finding F-a3f1c8 \
  --request evidence/req.txt --response evidence/resp.txt

# Draft a conservative bug bounty report
ghostmap report -w --finding F-a3f1c8
```

## Authenticated scanning
For programs that explicitly permit authenticated automated testing:

```bash
# Read the credential into an environment variable so it never enters shell history
# (-s hides the input, -r keeps any backslashes in the token literal)
read -rs GHOSTMAP_TOKEN
export GHOSTMAP_TOKEN

# Use --auth-header for bearer-token APIs (most modern SPAs)
ghostmap https://target.example.com/dashboard -w \
  --auth-header "Authorization: Bearer $GHOSTMAP_TOKEN" \
  --html-report

# Or --cookie for session-cookie APIs (set GHOSTMAP_COOKIE the same way first)
ghostmap https://target.example.com/dashboard -w \
  --cookie "$GHOSTMAP_COOKIE" \
  --html-report

# When finished, invalidate the credential and unset the variables
unset GHOSTMAP_TOKEN GHOSTMAP_COOKIE
```

The credential is redacted from all on-disk output. It is still present in memory during the scan, and it is your responsibility to invalidate it when the engagement is done.
## Scope files
Each program gets a YAML file declaring what's testable:

```yaml
program: example-bbp
platform: hackerone
in_scope:
  hosts:
    - example.com
    - "*.example.com"
  url_patterns: []
out_of_scope:
  hosts:
    - admin.example.com
  path_patterns:
    - "^/admin"
    - "^/internal"
    - "^/staging"
permissions:
  active_idor_testing: false     # automated identifier mutation
  hidden_route_probing: true     # wordlist-based route discovery
  authenticated_scanning: false  # use of --cookie / --auth-header
notes: |
  Source:
  Last verified: YYYY-MM-DD
```

A starter template lives at [`scopes/_template.yaml`](scopes/_template.yaml). Defaults are strict on purpose — enable active behaviors only after reading the specific program's rules.
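
One way to get strict defaults is to model the `permissions` block as a frozen dataclass whose fields all default to `False`. The sketch below is only an illustration of that approach. `Permissions` and `load_permissions` are hypothetical names, and `raw` is assumed to be the `permissions` mapping after the YAML has been parsed (for example with PyYAML's `yaml.safe_load`).

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Permissions:
    # Every active behavior is off unless the scope file turns it on.
    active_idor_testing: bool = False
    hidden_route_probing: bool = False
    authenticated_scanning: bool = False


def load_permissions(raw: dict) -> Permissions:
    """Build Permissions from a parsed mapping, rejecting anything ambiguous."""
    known = {f.name for f in fields(Permissions)}
    unknown = set(raw) - known
    if unknown:
        # A misspelled key should fail loudly instead of being ignored.
        raise ValueError(f"unknown permission keys: {sorted(unknown)}")
    for key, value in raw.items():
        if not isinstance(value, bool):
            raise ValueError(f"{key} must be true or false, got {value!r}")
    return Permissions(**raw)
```

Rejecting unknown keys and non-boolean values means a typo in the scope file stops the scan instead of silently changing which behaviors are enabled.
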
## What this tool is NOT
- Not a vulnerability scanner. It surfaces candidates worth manual investigation. It doesn't claim to find vulnerabilities.
- Not autonomous. Every active behavior requires the user to verify scope and program rules permit it.
- Not for unauthorized testing. The scope gate exists to protect both the user and the target. Scanning hosts that the scope file does not list as in scope requires explicitly enabling permissive mode, and doing so is your legal responsibility.
- Not a substitute for understanding the target. The hunter's job is the manual verification work — the tool just makes the candidate list smaller and the evidence cleaner.
## Roadmap
Near-term:

- Expand vendor noise filter (Taboola, BidSwitch, Adobe Audience Manager, 1rx.io, additional ad-tech)
- SPA-aware crawler wait strategies for sites where `wait_until="commit"` returns before JS hydration
- Authenticated recon helper: parse a HAR file to discover XHR endpoints the crawler can't reach via link extraction

Medium-term:

- Per-finding stable hash that's also human-meaningful (not just `F-XXXXXX`)
- Optional hosted scan tier for users who want to run scans from a clean IP and shared workspace
- Burp Suite session import: pull auth context from an existing Burp project

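
For the HAR helper on the near-term list, the core of the idea could look something like the sketch below. This feature is not implemented yet, so everything here is an assumption: the function name `endpoints_from_har` is hypothetical, host matching is exact with no wildcard support, and only the standard HAR fields `log.entries[].request.method` and `log.entries[].request.url` are used.

```python
import json
from urllib.parse import urlsplit


def endpoints_from_har(har_text: str, in_scope_hosts: set) -> list:
    """Return sorted unique (method, host, path) tuples for in-scope requests."""
    har = json.loads(har_text)
    seen = set()
    for entry in har.get("log", {}).get("entries", []):
        request = entry.get("request", {})
        parts = urlsplit(request.get("url", ""))
        host = (parts.hostname or "").lower()
        # Exact host match only; query strings are dropped so that
        # ?page=2 and ?page=3 collapse into one endpoint.
        if host in in_scope_hosts:
            seen.add((request.get("method", "GET"), host, parts.path or "/"))
    return sorted(seen)
```

A HAR export can contain cookies and authorization headers, so a real helper would also need to pass it through the same redaction as scan output before anything is written to disk.
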
## Project status and contact
GhostMap is an open-source project, MIT licensed.
Built by Andrew Gonzaga ([andrewliera](https://github.com/andrewliera)) — software engineer, embedded systems and AppSec background. I'm available for AppSec consulting, security tooling work, and full-time security engineering roles. Reach me at andrew.lira.gonzaga@gmail.com or via [gonzagatech.com](https://gonzagatech.com).
If GhostMap helps you on a real engagement, I'd genuinely like to hear about it. If it surfaces a false positive that I should be filtering, please open an issue with the URL pattern.
## License
MIT — see [LICENSE](LICENSE).