erkanrzgc/cyberm4fia-scanner

GitHub: erkanrzgc/cyberm4fia-scanner


# cyberm4fia-scanner



 ██████╗██╗   ██╗██████╗ ███████╗██████╗ ███╗   ███╗██╗  ██╗███████╗██╗ █████╗

██╔════╝╚██╗ ██╔╝██╔══██╗██╔════╝██╔══██╗████╗ ████║██║  ██║██╔════╝██║██╔══██╗

██║      ╚████╔╝ ██████╔╝█████╗  ██████╔╝██╔████╔██║███████║█████╗  ██║███████║

██║       ╚██╔╝  ██╔══██╗██╔══╝  ██╔══██╗██║╚██╔╝██║╚════██║██╔══╝  ██║██╔══██║

╚██████╗   ██║   ██████╔╝███████╗██║  ██║██║ ╚═╝ ██║     ██║██║     ██║██║  ██║

 ╚═════╝   ╚═╝   ╚═════╝ ╚══════╝╚═╝  ╚═╝╚═╝     ╚═╝     ╚═╝╚═╝     ╚═╝╚═╝  ╚═╝


cyberm4fia-scanner is an AI-powered autonomous penetration testing framework for web applications, APIs, networks, and cloud infrastructure.
It combines nuclei-fast scanning, Burp-deep verification, methodology-driven exploitation, and LLM-driven reasoning in one tool.

## Why cyberm4fia-scanner

| What others give you | What cyberm4fia-scanner gives you |
|---|---|
| Template / fingerprint hits | **Verified exploits**: active verifiers (Playwright, preload list, polyglot probe) promote `Missing_*` advisories to `*_Exploitable` only when the bug is actually triggerable |
| A flat finding list | **Attack chain detection**: Missing CSP + reflected input → Stored XSS Exfil; Insecure Cookie + XSS → Cookie Theft Chain (deterministic patterns + AI-discovered chains) |
| 5-line CVE descriptions | **93 AI-loaded methodologies** (`core/ai_skills/`): every offensive/defensive skill ships its own playbook that the AI consults per finding |
| One huge `index.html` | **SARIF + Burp XML + Markdown + HTML + JSON + JSONL**: import into GitHub Code Scanning, Burp Pro, DefectDojo, Jenkins warnings-ng, or GitLab SAST without glue code |
| Hard fail / pass | **Per-severity exit codes**: `0` clean / `1` critical / `2` high / `3` medium / `4` low, gated by `SCAN_EXIT_THRESHOLD` so each CI/CD pipeline picks its own threshold |
| Vendor lock-in | **NVIDIA NIM only**: no OpenAI / Anthropic dependency; defaults to `meta/llama-3.3-70b-instruct`, with dual-model routing for cost |
| "Re-run from scratch" on crash | **Phase-resume + URL-resume**: checkpoints persist per phase and per URL, so a resumed scan continues exactly where it stopped |
| "Trust the results" | **0-day machine validation gates**: every finding traverses `suspected → evidence_confirmed → verified → exploitable`, and SPA / catch-all 200 responses are dropped by a content-fingerprint filter |

### Core capabilities

- **90+ attack modules** spanning web, API, network, cloud, and OSINT, not just an `nmap` / `nuclei` wrapper.
- **Self-healing exploit agent**: the LLM writes exploit code, a sandbox runs it, and errors loop back as repair attempts until the exploit works.
- **Adaptive LLM orchestration**: the planner reacts to recon and findings each round and chains discoveries (e.g. `LFI` → log poisoning → `RCE`) instead of running a fixed checklist.
- **External tool adapter layer**: battle-tested CLIs (`masscan`, `sslyze`, `wpscan`, `arjun`, `nuclei`, `gowitness`, `gitleaks`, `testssl.sh`, `smbmap`, `kube-hunter`, `CloudHunter`) plug in under one `BaseTool` contract.
- **MITRE ATT&CK-tagged findings**, scope enforcement, and a sandboxed exploit runner: built for authorized testing, not pranks.

## Comparison with other scanners

| | **cyberm4fia** | nuclei | OWASP ZAP | Burp Pro | Acunetix |
|---|:---:|:---:|:---:|:---:|:---:|
| Template-based detection | ✅ | ✅ | ✅ | ✅ | ✅ |
| AI-driven exploit generation | ✅ | ❌ | ❌ | ❌ | partial |
| Skill / methodology-based reasoning | ✅ (93 skills) | ❌ | ❌ | ❌ | ❌ |
| Multi-stage attack chain detection | ✅ (det+AI) | ❌ | ❌ | manual | partial |
| Content-fingerprint FP filter | ✅ (simhash) | ❌ | partial | ❌ | ✅ |
| Active exploit verifiers (Playwright) | ✅ | ❌ | ❌ | ✅ | ✅ |
| 0-day validation gates | ✅ | ❌ | ❌ | ❌ | ❌ |
| SARIF + Burp XML + DefectDojo | ✅ | partial | partial | native | partial |
| Per-severity CI exit codes | ✅ | ❌ | partial | ❌ | partial |
| Phase-level scan resume | ✅ | ❌ | ❌ | partial | ❌ |
| Headless / SPA crawl | ✅ | ❌ | ✅ | ✅ | ✅ |
| Open source / self-hostable | ✅ | ✅ | ✅ | ❌ | ❌ |

Niche: **AI-augmented offensive scanner with verifiable exploits**, positioned between nuclei's template velocity, Burp's manual depth, and an AI assistant's reasoning.
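The content-fingerprint filter credited above (simhash) works by reducing each HTTP response body to a short fingerprint and dropping candidate findings whose fingerprint is nearly identical to a known catch-all page. The snippet below is a minimal sketch of that idea, not the scanner's actual code: it assumes a 64-bit simhash over lowercase word tokens with MD5 as the token hash, and the names `simhash`, `hamming`, `is_spa_fallback`, and the 3-bit distance cutoff are hypothetical choices. The DOM-skeleton and title-hash signals mentioned in the changelog are not modeled here.

```python
import hashlib
import re

FINGERPRINT_BITS = 64  # assumption: 64-bit fingerprints


def simhash(text: str) -> int:
    """Classic simhash: every token votes +1/-1 on each bit of its hash."""
    weights = [0] * FINGERPRINT_BITS
    for token in re.findall(r"\w+", text.lower()):
        digest = hashlib.md5(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")  # first 8 bytes -> 64-bit int
        for i in range(FINGERPRINT_BITS):
            weights[i] += 1 if (h >> i) & 1 else -1
    # A bit is set in the fingerprint when the majority of tokens voted for it.
    fingerprint = 0
    for i, weight in enumerate(weights):
        if weight > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


def is_spa_fallback(body: str, baseline: int, max_distance: int = 3) -> bool:
    """True when `body` is near-identical to the catch-all baseline page."""
    return hamming(simhash(body), baseline) <= max_distance
```

In practice the baseline would be computed once per host, for example from the response to a random path that should not exist; a "finding" whose body lands within a few bits of that baseline is most likely the SPA shell echoing a 200 rather than real content.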
## Architecture

```
scanner.py / cyberm4fia CLI / REST API / Interactive wizard / MCP server
  │
  ▼
Phase pipeline (checkpointed)
  │
  ▼
┌────────────────┬────────────────┬────────────────┬────────────────┬────────────────┐
│ recon          │ discovery      │ per-URL active │ post-scan      │ reporting      │
│ subdomain      │ crawler        │ XSS/SQLi/LFI   │ auth bypass    │ SARIF          │
│ tech detect    │ fuzzer         │ SSRF/CMDi/SSTI │ business logic │ Burp XML       │
│ port scan      │ param disc     │ CORS/CSRF/XXE  │ race / jwt     │ HTML/MD        │
│ wayback        │ api spec       │ passive hooks  │ smuggle/proto  │ findings       │
└────────────────┴────────────────┴────────────────┴────────────────┴────────────────┘
  │
  ▼
Cross-cutting: scope enforcement · request budget · WAF auto-calibration ·
               proxy interceptor · OOB client
  │
  ▼
Finding pipeline
  │
  ▼
┌────────────────┬────────────────┬────────────────┬────────────────┬────────────────┐
│ normalize      │ SPA-FP         │ active verifier│ chain detector │ AI:FP          │
│ +registry      │ filter         │ (CJ/HSTS/MIME/ │ (det patterns  │ + remed        │
│ +CVSS/CWE      │ (simhash)      │ Referrer/Perm) │ + AI-discov)   │ + skill        │
└────────────────┴────────────────┴────────────────┴────────────────┴────────────────┘
  │
  ▼
Validation gates (0-day machine): suspected → evidence_confirmed →
                                  verified → confirmed → exploitable
  │
  ▼
scans// : report.html, .md, results.sarif, issues.burp.xml,
          findings.json, scan.json, pocs/*.html
```

## What's new (last 3 sprints)

- **`feat(verifiers+ci+exports)`**: 5 active verifiers (Clickjacking, HSTS, MIME, Referrer, Permissions-Policy), Burp Issues XML export, per-severity CI exit codes, GitHub Code Scanning SARIF upload, an auth-session audit module, and `docs/INTEGRATIONS.md`.
- **`feat(fp+headers)`**: SPA-fallback FP filter (simhash + DOM skeleton + title hash), `Missing_Security_Header` → real exploit payload + chain promotion, 5 new offensive/defensive SKILL.md files (clickjacking, hsts-downgrade, mime-confusion, referrer-policy-leak, defensive-fp-filter), and 4 new chain patterns.
- **`feat(hardening)`**: phase-boundary checkpoints in `ScanSession` (resume skips completed phases and persists mid-pipeline findings), a real-binary integration suite for all 10 external tool wrappers, AI budget enforcement, and a sandboxed exploit runner.

## 60-Second Quick Start

```bash
git clone https://github.com/erkanrzgc/cyberm4fia-scanner.git
cd cyberm4fia-scanner
pip install -r requirements.txt

# Full passive + active scan
python3 scanner.py -u https://your-target.example/ --all

# + AI analysis (set NVIDIA_API_KEY first)
export NVIDIA_API_KEY=nvapi-...
python3 scanner.py -u https://your-target.example/ --all --ai
```

## Modules

### Web Application Scanning

| Module | Flag | Description |
|---|---|---|
| XSS | `--xss` | Reflected and stored XSS checks with context-aware payload selection. |
| SQLi | `--sqli` | Union-based SQL injection with blind fallback and exploit post-processing. |
| LFI | `--lfi` | Local File Inclusion checks using traversal and wrapper payloads. |
| RFI | `--rfi` | Remote File Inclusion checks for remote fetch and execution sinks. |
| CMDi | `--cmdi` | OS command injection checks with an optional interactive shell workflow. |
| SSRF | `--ssrf` | Server-Side Request Forgery checks, including cloud metadata probes. |
| CSRF | `--csrf` | CSRF token and form protection checks for discovered forms. |
| CORS | `--cors` | Cross-Origin Resource Sharing misconfiguration checks. |
| Header Injection | `--header-inject` | CRLF and header injection checks. |
| DOM XSS | `--dom-xss` | DOM-based XSS checks with Playwright browser execution. |
| SSTI | `--ssti` | Template injection checks for common server-side template engines. |
| XXE | `--xxe` | XML External Entity injection checks. |
| Open Redirect | `--redirect` | Redirect abuse checks across discovered URLs. |
| Passive Scan | `--passive` | Passive checks for headers, debug leakage, and lightweight disclosures. |
| Secrets Scan | `--secrets` | HTML and JavaScript secret exposure scanning for API keys and tokens. |
| OOB Testing | `--oob` | Out-of-band callback support for blind vulnerability verification. |

### API Security

| Module | Flag | Description |
|---|---|---|
| API Scanner | `--api-scan` | OWASP API tests with OpenAPI import, schema-aware bodies, and auth intel. |

### Network & Infrastructure

| Module | Flag | Description |
|---|---|---|
| Recon | `--recon` | Deep port, DNS, and TLS recon. Lightweight server and header intel runs on every scan. |
| Subdomain Discovery | `--subdomain` | Subdomain enumeration for the target host. |
| Endpoint Fuzzer | `--fuzz` | Directory and API endpoint brute forcing with smart 404 calibration. |
| Crawler | `--crawl` | Recursive crawler with form and link discovery. |
| Headless Discovery | `--headless` | Playwright-based SPA rendering and background endpoint discovery. |
| Cloud Buckets | `--cloud` | Open S3, Azure Blob, and GCP bucket detection. |
| Subdomain Takeover | `--takeover` | Dangling DNS and takeover fingerprint checks. |
| Credential Spray | `--spray` | Default credential checks for exposed services. |

### Intelligence & OSINT

| Module | Flag | Description |
|---|---|---|
| Technology Fingerprinting | `--tech` | Wappalyzer-style technology detection with CVE enrichment. |
| OSINT Enrichment | `--osint` | Shodan InternetDB, WHOIS, and ASN enrichment. |
| Email Harvesting | `--email` | Email discovery from public sources and on-page content. |

### Automation & Reporting

| Module | Flag | Description |
|---|---|---|
| JWT Attack Suite | `--jwt` | Weak secret, algorithm confusion, and claim tampering checks. |
| Race Condition | `--race` | TOCTOU and replay-style concurrency checks. |
| HTTP Smuggling | `--smuggle` | CL.TE and TE.CL request smuggling checks. |
| Prototype Pollution | `--proto` | Node.js prototype pollution probes. |
| Deserialization | `--deser` | Insecure deserialization checks. |
| Business Logic | `--bizlogic` | Multi-step business logic flaw checks. |
| Vulnerability Chaining | `--chain` | Attack path correlation across discovered findings. |
| Wordlist Generation | `--wordlist` | Site-specific password wordlist generation. |
| AI Analysis | `--ai` | Dual-model AI with autonomous exploit agent, PoC generation, and false-positive filtering. |
| Proxy Interceptor | `--proxy-listen PORT` | Built-in MITM proxy that captures traffic and feeds it into scanner workflows. |
| PoC Generator | `(auto)` | Automatic HTML and JSON proof-of-concept generation for findings. |
| Template Engine | `(auto via --all)` | Built-in template-based checks, enabled through all-modules mode. |

## Quick Start

The scanner defaults to an **Interactive Setup Wizard** built with `rich`. Run it from anywhere in your terminal by passing the target URL:

```bash
# Start the interactive interface directly with a target
cyberm4fia https://target.com
```

It displays the banner and then prompts you for scan modes, attack profiles, and runtime behavior.
If you prefer classic CLI usage:

```bash
# Full scan via CLI
python3 scanner.py -u https://target.com --all

# Specific modules
python3 scanner.py -u https://target.com --xss --sqli

# API scan with a local OpenAPI spec
python3 scanner.py -u https://api.target.com --api-scan --api-spec openapi.yaml

# Multi-target
python3 scanner.py -l targets.txt --all

# Through a proxy
python3 scanner.py -u https://target.com --all --proxy socks5://127.0.0.1:9050

# Session resume
python3 scanner.py --resume scan1.json
```

## 🧨 Active Exploitation Framework

cyberm4fia-scanner goes beyond finding vulnerabilities: it verifies and exploits them. Select an attack profile that supports exploitation, or pass the `--exploit` flag, to activate the post-exploitation modules:

- **Interactive Shells:** Automatically catch reverse shells when Command Injection or RCE is discovered.
- **Out-of-Band (OOB) Testing:** Spin up local HTTP listeners to detect blind/asynchronous vulnerabilities (with automatic port fallback).
- **Automated Looting:** Dump database contents (SQLi) or grab sensitive system files (LFI) directly into a `loot/` directory.
- **Offline PoC Generation:** Generate standalone `.html` or `.json` artifacts that demonstrate the exact vulnerability (e.g. Clickjacking, CSRF HTML forms).
- **Auto-Pwn Hand-off:** Generate ready-to-run Nuclei templates or Metasploit (MSF) commands to reproduce and exploit findings.
- **Headless Browser Escalation:** Use Playwright to execute DOM XSS or CSRF payloads inside a real headless Chromium instance.

## 🛡️ Built-in Proxy Interceptor

You can route your manual browser traffic through the scanner using the built-in MITM proxy.
```bash
# Start the proxy on port 8081, scoped to target.com
python3 scanner.py --proxy-listen 8081 --scope-proxy target.com
```

Any traffic you generate through your browser is intercepted, fed into the main scanning engine, and tested for vulnerabilities in real time.

## Usage with AI

The scanner integrates an **NVIDIA-powered AI system** for autonomous exploit generation, analysis, and automated PoC creation.

### Prerequisites

You need an **NVIDIA API key**. Set it as an environment variable:

```bash
export NVIDIA_API_KEY="your_nvapi_key_here"
```

### AI-Powered Scanning

```bash
# Full scan with AI enabled (NVIDIA API)
python3 scanner.py -u https://target.com --all --ai

# AI + specific modules
python3 scanner.py -u https://target.com --xss --sqli --ai

# Run the multi-agent autonomous mode
python3 scanner.py -u https://target.com --agent --ai
```

### Unified AI Architecture (70B)

Both the exploit and code generation roles use the **meta/llama-3.3-70b-instruct** model via NVIDIA NIM:

| Role | Implementation | What It Does |
|---|---|---|
| 🐇 **Exploit Agent** | Llama 3.3 70B | Payload crafting, WAF bypass, exploit planning, false-positive filtering |
| 🧠 **Code Agent** | Llama 3.3 70B | PoC script writing, remediation code, code analysis |

### AI Exploit Agent

When standard payload lists fail, the **Autonomous AI Exploit Agent** takes over.

**Supported vulnerability types:** XSS, SQLi, LFI, CMDi, SSRF

**Anti-hallucination pipeline:** Every AI-generated exploit is validated with regex checks plus a second AI verification pass. Exploits with confidence below 70% are rejected automatically.
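The anti-hallucination gate described above can be pictured as two checks in sequence: deterministic regex evidence in the target's response, then a confidence score from the second AI pass compared against the 70% threshold. This is a minimal sketch under stated assumptions, not the scanner's internals: `ExploitAttempt`, `EVIDENCE_PATTERNS`, `regex_evidence`, and `accept` are hypothetical names, the patterns cover only a few illustrative types, and the AI confidence is assumed to arrive as a float in `[0, 1]`.

```python
import re
from dataclasses import dataclass

# Threshold from the README: confidence below 70% is rejected.
CONFIDENCE_THRESHOLD = 0.70

# Hypothetical evidence regexes, one list per vulnerability type.
EVIDENCE_PATTERNS = {
    "LFI": [r"root:x:0:0:"],                       # /etc/passwd leaked
    "CMDi": [r"uid=\d+\([^)]*\)\s+gid=\d+"],       # output of `id`
    "SQLi": [r"SQL syntax.*MySQL", r"ORA-\d{5}"],  # DBMS error strings
}


@dataclass
class ExploitAttempt:
    vuln_type: str
    response_body: str
    ai_confidence: float  # assumed score from the second AI verification pass


def regex_evidence(attempt: ExploitAttempt) -> bool:
    """Stage 1: does the response contain hard evidence for this vuln type?"""
    patterns = EVIDENCE_PATTERNS.get(attempt.vuln_type, [])
    return any(re.search(p, attempt.response_body) for p in patterns)


def accept(attempt: ExploitAttempt) -> bool:
    # Types without any evidence pattern fail closed in this sketch.
    if not regex_evidence(attempt):
        return False
    # Stage 2: the AI verification score must clear the threshold.
    return attempt.ai_confidence >= CONFIDENCE_THRESHOLD
```

Ordering the cheap deterministic check first means the model's self-reported confidence can never promote an exploit on its own; a high score without matching evidence is still rejected.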
### Public Exploit Intelligence

The scanner automatically searches for known public exploits when CVEs are discovered:

```
Technology Detected → SiberAdar CVE Feed → Public Exploit Search
                                                    ↓
                    ExploitDB (searchsploit): offline archive
                    GitHub PoC: starred repositories
                    sploitscan: multi-source aggregation
                                                    ↓
                    AI Agent uses real PoCs as templates
```

Optional tools for enhanced coverage:

```bash
# ExploitDB offline archive
sudo apt install exploitdb

# Multi-source exploit search
pip3 install sploitscan
```

### AI Features Summary

| Feature | Description |
|---|---|
| **Autonomous Exploit Agent** | Iterative think → generate → execute → validate → learn loop |
| **Dual-Model Routing** | WhiteRabbitNeo for exploits, Qwen3-Coder for code |
| **Anti-Hallucination** | Regex + AI verification, 70% confidence threshold |
| **PoC Generation** | Auto-generates cURL, Python scripts, Nuclei templates |
| **Exploit Chain Detection** | 7 built-in patterns + AI-discovered chains |
| **Public Exploit Search** | ExploitDB, GitHub, sploitscan integration |
| **WAF Bypass Mutation** | Evolving AI mutation engine for adaptive bypass |
| **False Positive Filtering** | AI-assisted validation reduces noise |
| **Remediation Guidance** | AI-generated code fixes and best practices |
| **Executive Summaries** | AI-written C-level security reports |

## Attack Profiles

| Profile | Coverage | Included Flags | Suggested Extras |
|---|---|---|---|
| `1-Fast Recon` | Recon, subdomain discovery, endpoint fuzzing, technology intel, and passive checks. | `--fuzz`, `--passive`, `--recon`, `--subdomain`, `--tech` | `--crawl`, `--osint`, `--headless` |
| `2-Core Web Vulns` | Core web checks such as XSS, SQLi, file inclusion, CMDi, CSRF, CORS, and DOM XSS. | `--cmdi`, `--cors`, `--csrf`, `--dom-xss`, `--header-inject`, `--lfi`, `--passive`, `--rfi`, `--sqli`, `--xss` | `--secrets`, `--oob`, `--headless`, `--exploit` |
| `3-Advanced / Modern` | JWT, deserialization, SSTI, race, prototype pollution, SSRF, business logic, API, OOB, and XXE coverage. | `--api-scan`, `--ato`, `--auth-bypass`, `--bizlogic`, `--deser`, `--file-upload`, `--forbidden-bypass`, `--jwt`, `--oob`, `--proto`, `--race`, `--redirect`, `--smuggle`, `--ssrf`, `--ssti`, `--xxe` | `--tech`, `--passive`, `--chain`, `--exploit` |
| `4-All-In-One` | Enables nearly every scan module except opt-in extras such as AI and SARIF. | `(auto via --all)`, `--api-scan`, `--ato`, `--auth-bypass`, `--bizlogic`, `--chain`, `--cloud`, `--cmdi`, `--cors`, `--crawl`, `--csrf`, `--deser`, `--dom-xss`, `--dorking`, `--email`, `--file-upload`, `--forbidden-bypass`, `--fuzz`, `--har-output`, `--header-inject`, `--headless`, `--html`, `--jwt`, `--lfi`, `--oob`, `--osint`, `--passive`, `--proto`, `--race`, `--recon`, `--redirect`, `--rfi`, `--secrets`, `--smuggle`, `--spray`, `--sqli`, `--ssrf`, `--ssti`, `--subdomain`, `--takeover`, `--tech`, `--urlscan`, `--wayback`, `--xss`, `--xxe`, `Git History Secret Scanner (white-box + exposed-.git probe)`, `Multi-Provider Asset Search (Censys/ZoomEye/FOFA/Onyphe/Netlas/FullHunt/LeakIX)`, `Nuclei Community Templates`, `SSH/FTP Brute-Force`, `Scan Drift Detection` | `--wordlist`, `--exploit` |
| `5-Custom Choice` | Prompts for every module one by one. | `manual selection` | - |
| `6-Web Recon + Audit` | OctoScan-style web chain: tech intel + nuclei community templates + endpoint fuzz + crawl + 7-provider asset search + passive. | `--cookie COOKIE`, `--cors`, `--crawl`, `--fuzz`, `--header-inject`, `--passive`, `--recon`, `--subdomain`, `--tech`, `Multi-Provider Asset Search (Censys/ZoomEye/FOFA/Onyphe/Netlas/FullHunt/LeakIX)`, `Nuclei Community Templates`, `csp_bypass` | `--secrets`, `git_history` |

## CI/CD & Integrations

Every scan emits **SARIF** (`results.sarif`), **Burp Issues XML** (`issues.burp.xml`), enhanced JSON, HTML, Markdown, and PoC HTML files. Import targets:

| Target | Format | One-liner |
|---|---|---|
| **GitHub Code Scanning** | SARIF | Upload via `github/codeql-action/upload-sarif@v3` (already wired in `.github/workflows/security-scan.yml`) |
| **Burp Suite Pro** | Burp Issues XML | *Project → Import issues → `issues.burp.xml`* |
| **DefectDojo** | SARIF or Burp XML | API: `POST /api/v2/import-scan/` (see `docs/INTEGRATIONS.md`) |
| **Jenkins** | SARIF | `recordIssues(tools: [sarif(pattern: 'scans/**/results.sarif')])` |
| **GitLab Ultimate** | SARIF as SAST | `artifacts.reports.sast: scans/*/results.sarif` |

### Per-severity exit codes for pipeline gating

```bash
python3 scanner.py -u https://my-app/ --xss --sqli --sarif
# exit 0  = clean
# exit 1  = CRITICAL findings
# exit 2  = HIGH findings
# exit 3  = MEDIUM findings
# exit 4  = LOW/INFO findings
# exit 10 = scanner internal error
```

Pick a threshold with `SCAN_EXIT_THRESHOLD=critical|high|medium|low|info|never`. The full mapping table and Burp / DefectDojo / Jenkins / GitLab examples are in **[`docs/INTEGRATIONS.md`](docs/INTEGRATIONS.md)**.

## REST API

The scanner includes a FastAPI-based REST API with auto-generated documentation.
```bash
python3 scanner.py --api --port 8080
```

| Endpoint | Method | Description |
|---|---|---|
| `/api/scan` | POST | Start a new scan |
| `/api/scan/{id}` | GET | Get scan results |
| `/api/scans` | GET | List all scans |
| `/api/report/{id}` | GET | Download HTML report |
| `/api/scan/{id}` | DELETE | Cancel a scan |
| `/docs` | GET | Swagger UI |
| `/redoc` | GET | ReDoc |

## Project Structure

```
cyberm4fia-scanner/
├── scanner.py              # main orchestrator
├── api_server.py           # FastAPI REST API
├── modules/                # 40+ scanning modules
├── utils/                  # HTTP client, WAF detection, auth, AI engine
│   ├── ai.py               # dual-model AI client (WhiteRabbitNeo + Qwen3-Coder)
│   ├── ai_exploit_agent.py # autonomous exploit agent + chain detector
│   └── exploit_finder.py   # ExploitDB, GitHub PoC, sploitscan search
├── payloads/               # XSS, SQLi, LFI, SSRF, CMDi payload files
├── wordlists/              # fuzzer wordlists
├── tests/                  # pytest test suite (470+ tests)
├── .github/workflows/      # CI/CD pipelines
├── .env.example            # environment variable template
└── requirements.txt        # dependencies
```

## Configuration

Copy `.env.example` to `.env` and set your values:

```bash
cp .env.example .env
```

| Variable | Description |
|---|---|
| `NVIDIA_API_KEY` | NVIDIA NIM API key for AI-powered scanning |
| `WATCHSTACK_API_KEY` | WatchStack.io API key for verified PoC intelligence (free tier: 30 req/min) |
| `GITHUB_TOKEN` | GitHub API token for higher rate limits on PoC search |
| `SHODAN_API_KEY` | Shodan API key for OSINT enrichment |
| `DEFAULT_THREADS` | Default thread count |
| `DEFAULT_DELAY` | Default request delay |
| `VERIFY_SSL` | SSL verification toggle |
| `HTTP_PROXY` | Proxy URL |

## Testing

```bash
pip install pytest
pytest tests/ -v
```

## Legal Disclaimer

## License

This project is licensed under the MIT License. See the LICENSE file for details.