niteshghimire0147/web-vuln-scanner

GitHub: niteshghimire0147/web-vuln-scanner

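An integrated Web/API/AI vulnerability scanner with attack chain correlation and CVSS v3.1 scoring. It covers three OWASP security frameworks and supports CI/CD security gating and multiple report output formats.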


# 🛡️ Web Vulnerability Scanner v2.0.0

[![CI](https://github.com/niteshghimire0147/web-vuln-scanner/actions/workflows/ci.yml/badge.svg)](https://github.com/niteshghimire0147/web-vuln-scanner/actions) [![Python](https://img.shields.io/badge/Python-3.9%2B-blue?logo=python)](https://python.org) [![OWASP](https://img.shields.io/badge/OWASP-Top%2010%20%7C%20API%20%7C%20AI-red)](https://owasp.org) [![MITRE ATT&CK](https://img.shields.io/badge/MITRE-ATT%26CK%20Mapped-orange)](https://attack.mitre.org) [![Tests](https://img.shields.io/badge/Tests-65%2F65%20Passing-brightgreen)]() [![License](https://img.shields.io/badge/License-MIT-green)](LICENSE) [![Authorized Testing Only](https://img.shields.io/badge/Use-Authorized%20Testing%20Only-yellow)]()

## ⚡ Core Features

| | Feature |
|---|---|
| 🔍 | Web vulnerability scanning: OWASP Top 10 (2021) |
| 🌐 | API security testing: OWASP API Top 10 (2023) |
| 🤖 | AI / LLM security testing: OWASP AI Top 10 (2025) |
| 🧠 | Attack chain correlation engine |
| 📊 | CVSS v3.1 severity scoring, applied automatically to every finding |
| ⚡ | Multithreaded scan engine (ThreadPoolExecutor) |
| 🧾 | HTML report (always generated) plus optional JSON / Markdown output |
| 🔐 | CI/CD security gate support with configurable exit-code thresholds |

## Table of Contents

- [Why This Project Matters](#why-this-project-matters)
- [Attack Chain Intelligence](#-attack-chain-intelligence)
- [Security Coverage Matrix](#security-coverage-matrix)
- [MITRE ATT&CK Mapping](#mitre-attck-mapping)
- [Security Methodology](#security-methodology)
- [Architecture Overview](#architecture-overview)
- [Installation](#installation)
- [Usage](#usage)
- [CI/CD Integration](#cicd-integration)
- [Scanner Modules](#scanner-modules)
- [Payload System](#payload-system)
- [False-Positive Filtering](#false-positive-filtering)
- [Report Output](#report-output)
- [Testing](#-testing)
- [Safe Testing Environment](#safe-testing-environment)
- [Tech Stack](#tech-stack)
- [Known Limitations](#known-limitations)
- [Roadmap](#roadmap)
- [Key Highlights](#-key-highlights)
- [Disclaimer](#disclaimer)

## Why This Project Matters

### The Attacker's Perspective

Real-world breaches rarely exploit a single vulnerability in isolation. Attackers chain findings together. For example, an exposed `.git/config` leaks credentials. Those credentials are then used to authenticate to an admin endpoint with broken authorization, which in turn exposes a SQL injection vector. A scanner that only reports individual findings, without modeling these relationships, understates the real risk an organization faces.

This tool is built around **attack chain correlation**. It identifies not only whether vulnerabilities exist but also how they compound. Every finding is cross-referenced against adjacent findings to surface exploitable paths that a flat checklist report would miss.

### Why CVSS v3.1 Scoring

[CVSS v3.1](https://www.first.org/cvss/v3.1/specification-document) provides a vendor-neutral, standardized severity score (0.0–10.0). The score is derived from the attack vector, attack complexity, privileges required, user interaction, scope, and impact metrics. Severity labels alone (High/Medium/Low) are not enough to prioritize heterogeneous findings. CVSS scores enable:

- Consistent severity comparison across vulnerability classes
- Risk-based remediation prioritization
- Alignment with industry-standard vulnerability databases (NVD, CVE)
- CI/CD gating logic based on numeric thresholds

### Why Attack Chains Matter

Vulnerability correlation turns isolated findings into an actionable threat model. Consider an information-disclosure finding (CVSS 5.3) combined with an authentication bypass (CVSS 7.5). Together they may enable full account takeover, a compound risk profile that neither finding expresses on its own. The `core/attack_chain.py` engine models these relationships explicitly.

## 🔥 Attack Chain Intelligence

Traditional scanners report vulnerabilities in isolation. This framework correlates findings into **multi-step attack chains** that reflect how real adversaries operate.

### Attack Chain Flow

```
┌───────────────────────────────────────────────────────────────────────────┐
│                          ADVERSARY ENTRY POINTS                           │
├────────────────┬─────────────────┬─────────────────┬──────────────────────┤
│ Info Leak      │ SQL Injection   │ XSS / CSRF      │ API / SSRF           │
│ A05:2021       │ A03:2021        │ A03:2021        │ A10:2021             │
└───────┬────────┴────────┬────────┴────────┬────────┴───────────┬──────────┘
        │                 │                 │                    │
        ▼                 ▼                 ▼                    ▼
┌────────────────┬─────────────────┬─────────────────┬──────────────────────┐
│ .git exposed   │ DB dump /       │ Session cookie  │ Cloud metadata fetch │
│ .env leaked    │ auth bypass     │ theft via JS    │ 169.254.169.254      │
│ stack trace    │ UNION SELECT    │ DOM injection   │ Internal port scan   │
└───────┬────────┴────────┬────────┴────────┬────────┴───────────┬──────────┘
        │                 │                 │                    │
        └────────┬────────┘                 └─────────┬──────────┘
                 │                                    │
                 ▼                                    ▼
      ┌──────────────────────┐             ┌──────────────────────┐
      │ CREDENTIAL ACCESS    │             │ SESSION HIJACKING    │
      │ T1110 · A07:2021     │             │ T1539 · A07:2021     │
      │ Default creds        │             │ Cookie theft         │
      │ JWT alg confusion    │             │ Token replay         │
      └──────────┬───────────┘             └──────────┬───────────┘
                 │                                    │
                 └─────────────────┬──────────────────┘
                                   │
                                   ▼
                       ┌────────────────────────┐
                       │ PRIVILEGE ESCALATION   │
                       │ T1078 · A01:2021       │
                       │ IDOR · Forced Browse   │
                       │ Verb Tampering         │
                       └───────────┬────────────┘
                                   │
              ┌────────────────────┼────────────────────┐
              ▼                    ▼                    ▼
      ┌────────────────┐   ┌────────────────┐   ┌────────────────┐
      │ DATA THEFT     │   │ LATERAL        │   │ PERSISTENCE    │
      │ T1041          │   │ MOVEMENT       │   │ T1505          │
      │ PII · Keys     │   │ API abuse      │   │ Backdoor user  │
      │ DB records     │   │ BOLA chain     │   │ Token plant    │
      └────────────────┘   └────────────────┘   └────────────────┘
```
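The correlation idea can be pictured as a small rule registry that is evaluated once all findings are collected. The sketch below is a minimal, hypothetical illustration and not the code in `core/attack_chain.py`. Several details are assumptions made only for this example: the `Finding`, `ChainRule`, and `ChainEntry` shapes, the category labels, and the policy that a chain is rated at least `escalate_to` and never below its worst step.

```python
from dataclasses import dataclass

# Hypothetical severity scale, lowest to highest (labels match the scan summary).
SEVERITY = ["INFORMATIONAL", "LOW", "MEDIUM", "HIGH", "CRITICAL"]


@dataclass(frozen=True)
class Finding:
    category: str  # hypothetical label, e.g. "info_disclosure"
    severity: str  # one of SEVERITY
    url: str


@dataclass(frozen=True)
class ChainRule:
    """Hypothetical rule: fires when every category in `requires` was found."""
    name: str
    requires: tuple           # ordered chain steps, expressed as finding categories
    outcome: str              # impact of the completed chain
    mitre: tuple = ()         # ATT&CK technique IDs for the steps
    escalate_to: str = "HIGH"


@dataclass(frozen=True)
class ChainEntry:
    rule: str
    steps: tuple
    outcome: str
    severity: str
    mitre: tuple


def rank(severity: str) -> int:
    return SEVERITY.index(severity)


def correlate(findings, rules):
    """Evaluate each rule against the aggregated findings and return chain entries."""
    by_category = {}
    for finding in findings:
        by_category.setdefault(finding.category, []).append(finding)

    chains = []
    for rule in rules:
        if not all(cat in by_category for cat in rule.requires):
            continue  # a link in the chain is missing, so no chain is reported
        # Represent each step by its most severe matching finding.
        steps = tuple(
            max(by_category[cat], key=lambda f: rank(f.severity)) for cat in rule.requires
        )
        worst_step = max(rank(step.severity) for step in steps)
        # The compound chain is never rated below its worst individual step.
        severity = SEVERITY[max(worst_step, rank(rule.escalate_to))]
        chains.append(ChainEntry(rule.name, steps, rule.outcome, severity, rule.mitre))
    return chains


if __name__ == "__main__":
    rules = [
        ChainRule("leak-to-takeover", ("info_disclosure", "broken_auth"),
                  "Account takeover", mitre=("T1592", "T1110"), escalate_to="CRITICAL"),
    ]
    findings = [
        Finding("info_disclosure", "MEDIUM", "https://app.example/.env"),
        Finding("broken_auth", "HIGH", "https://app.example/api/login"),
    ]
    for chain in correlate(findings, rules):
        print(chain.rule, chain.severity, [s.url for s in chain.steps])
```

Running the demo prints `leak-to-takeover CRITICAL ['https://app.example/.env', 'https://app.example/api/login']`. Because rules are plain data, adding a new correlation in this sketch means appending a list entry rather than writing new control flow.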
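The example scores cited earlier (CVSS 5.3 for an information disclosure, 7.5 for an authentication bypass) come directly out of the CVSS v3.1 base-score equations. The minimal sketch below implements those equations using the metric weights and the `Roundup` function from the FIRST v3.1 specification. It is an independent illustration, not the project's `core/cvss.py`. The `base_score` name and the vector-string parsing are assumptions. The two demo vectors are also assumptions: they are typical network-reachable, no-privilege vectors that produce those scores.

```python
import math

# Base-metric weights from the CVSS v3.1 specification (FIRST).
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
PR = {"U": {"N": 0.85, "L": 0.62, "H": 0.27},   # scope unchanged
      "C": {"N": 0.85, "L": 0.68, "H": 0.5}}    # scope changed
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}


def roundup(value: float) -> float:
    """Spec Roundup: smallest one-decimal number >= value, robust to float error."""
    int_input = round(value * 100_000)
    if int_input % 10_000 == 0:
        return int_input / 100_000.0
    return (math.floor(int_input / 10_000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Base score for a vector like 'CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:L/I:N/A:N'."""
    metrics = dict(
        part.split(":", 1) for part in vector.split("/") if not part.startswith("CVSS:")
    )
    scope = metrics["S"]  # "U" (unchanged) or "C" (changed)

    # Impact Sub-Score from the confidentiality, integrity, and availability impacts.
    iss = 1 - ((1 - CIA[metrics["C"]]) * (1 - CIA[metrics["I"]]) * (1 - CIA[metrics["A"]]))
    if scope == "U":
        impact = 6.42 * iss
    else:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15

    exploitability = (8.22 * AV[metrics["AV"]] * AC[metrics["AC"]]
                      * PR[scope][metrics["PR"]] * UI[metrics["UI"]])

    if impact <= 0:
        return 0.0
    if scope == "U":
        return roundup(min(impact + exploitability, 10))
    return roundup(min(1.08 * (impact + exploitability), 10))


if __name__ == "__main__":
    print(base_score("CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:L/I:N/A:N"))  # 5.3
    print(base_score("CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:N/A:N"))  # 7.5
```

The `roundup` helper follows the specification's integer-based definition rather than a naive `math.ceil(x * 10) / 10`. The naive version can round a value such as 4.000000000000001 (really 4.0 with floating-point error) up to 4.1.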
### Chain Examples Modeled by the Engine

```
[INFO DISCLOSURE]──────► [BROKEN AUTH]──────────────► [ACCOUNT TAKEOVER]
.env exposes JWT         Weak secret cracks token     Full admin access

[SQL INJECTION]────────► [AUTH BYPASS]──────────────► [DATA EXFILTRATION]
UNION SELECT leaks       Password hash exposed        DB dump via API

[XSS]──────────────────► [SESSION HIJACK]───────────► [PRIVILEGE ESCALATION]
Payload reflected        Cookie stolen via JS         IDOR to admin panel

[SSRF]─────────────────► [CLOUD METADATA]───────────► [CREDENTIAL LEAKAGE]
URL param injected       AWS IMDSv1 reached           IAM keys extracted

[API MISCONFIGURATION]─► [BOLA]─────────────────────► [UNAUTHORIZED EXFIL]
No rate limit / auth     Object ID enumerated         All user records dumped

[PROMPT INJECTION]─────► [LLM OUTPUT ABUSE]─────────► [DATA LEAKAGE via AI]
Instruction override     Unsafe content rendered      Sensitive data in reply
```

**Adversary movement modeled with MITRE ATT&CK:**

```
Reconnaissance ──► Initial Access ──► Execution ──► Privilege Escalation ──► Exfiltration
T1592              T1190              T1059         T1078                    T1041
```

Each chain is mapped to MITRE ATT&CK tactics and OWASP categories. When the scan completes, the `core/attack_chain.py` engine evaluates all findings against a rule registry. Adding a correlation rule requires only a single `ChainRule` entry, with no changes to the engine itself.

## Security Coverage Matrix

### OWASP Web Application Top 10 (2021)

| Module | OWASP Category | Detection Techniques |
|---|---|---|
| `bac` | A01: Broken Access Control | IDOR, path traversal, forced browsing, HTTP verb tampering |
| `crypto` | A02: Cryptographic Failures | Plain-HTTP usage, insecure cookies, missing HSTS, credential exposure |
| `sqli` | A03: Injection | Error-based SQLi, time-based blind SQLi, database fingerprinting |
| `xss` | A03: Injection | Reflected XSS, payload injection, reflection validation |
| `insecure-design` | A04: Insecure Design | Business-logic abuse, missing workflow validation |
| `headers` | A05: Security Misconfiguration | Missing security headers (CSP, HSTS, X-Frame-Options, X-Content-Type-Options) |
| `vuln-components` | A06: Vulnerable and Outdated Components | Outdated libraries, exposed version fingerprints |
| `auth` | A07: Identification and Authentication Failures | Default credentials, JWT misconfiguration, weak sessions |
| `integrity` | A08: Software and Data Integrity Failures | Insecure updates, untrusted input streams |
| `logging` | A09: Security Logging and Monitoring Failures | Missing logging, verbose error exposure |
| `ssrf` | A10: Server-Side Request Forgery | Cloud metadata abuse, loopback, URL injection |

### OWASP API Security Top 10 (2023)

| API Risk | Detection Techniques |
|---|---|
| API1: Broken Object Level Authorization (BOLA) | Object ID tampering, unauthorized access verification |
| API2: Broken Authentication | Token bypass, session validation failures |
| API3: Broken Object Property Level Authorization | Excessive data exposure in JSON responses |
| API4: Unrestricted Resource Consumption | Rate-limit bypass, request flooding |
| API5: Broken Function Level Authorization | Unauthorized endpoint method access |
| API6: Unrestricted Access to Sensitive Business Flows | Workflow abuse simulation |
| API7: Server Side Request Forgery | Internal endpoint probing, metadata access |
| API8: Security Misconfiguration | CORS misconfiguration, debug information exposure |
| API9: Improper Inventory Management | Shadow/deprecated API discovery |
| API10: Unsafe Consumption of APIs | Detection of untrusted external API usage |

### OWASP AI Security Top 10 (2025)

The IDs and names in this table match the 2023 (v1.1) edition of the OWASP Top 10 for LLM Applications. The 2025 edition renumbers and renames several of these categories.

| AI Risk | Detection Techniques |
|---|---|
| LLM01: Prompt Injection | Jailbreak attempts, instruction-override payloads |
| LLM02: Insecure Output Handling | XSS/SSTI payloads in model output |
| LLM03: Training Data Poisoning | Malicious pattern injection detection |
| LLM04: Model Denial of Service | Token flooding, resource exhaustion |
| LLM05: Supply Chain Vulnerabilities | Insecure model/API dependencies |
| LLM06: Sensitive Information Disclosure | Prompt-based data exfiltration attempts |
| LLM07: Insecure Plugin Design | Plugin/tool execution abuse |
| LLM08: Excessive Agency | Over-privileged AI actions |
| LLM09: Overreliance | Lack of validation of AI output |
| LLM10: Model Theft | Extraction attempts / model probing |

## MITRE ATT&CK Mapping

| Vulnerability | Technique ID | Technique Name | Tactic |
|---|---|---|---|
| SQL Injection | T1190 | Exploit Public-Facing Application | Initial Access |
| XSS | T1059.007 | Command and Scripting Interpreter: JavaScript | Execution |
| SSRF | T1190 | Exploit Public-Facing Application | Initial Access |
| Path Traversal | T1083 | File and Directory Discovery | Discovery |
| Information Disclosure | T1592 | Gather Victim Host Information | Reconnaissance |
| Credential Attack | T1110 | Brute Force | Credential Access |
| API Abuse | T1071 | Application Layer Protocol | Command and Control |
| Prompt Injection | T1059 | Command and Scripting Interpreter | Execution |

## Security Methodology

### Scan Phase Architecture

```
CLI Entry Point (main.py)
        │
        ▼
Session Setup ──────── Auth handler, proxy config, custom headers
        │
        ├──────────────────────────────────┐
        ▼                                  ▼
Passive Module Execution          Active Web Crawler
(no payload injection)            (depth-limited, scope-enforced)
headers, info, crypto,            Form extraction, URL param
auth, api, ai                     discovery, endpoint dedup
        │                                  │
        │                                  ▼
        │                         Active Module Execution
        │                         (payload injection)
        │                         sqli, xss, bac, ssrf
        │                                  │
        └──────────────┬───────────────────┘
                       ▼
               Result Collector
          (thread-safe, deduplicated)
                       │
                       ▼
            CVSS v3.1 Scoring Engine
                       │
                       ▼
            Attack Chain Correlation
      (vulnerability relationship mapping)
                       │
                       ▼
              False-Positive Filter
(reflection validation, error signature matching)
                       │
                       ▼
                Report Generator
             HTML | JSON | Markdown
```

### Module Interaction Model

**Passive modules** run directly against the target origin, with no form submission and no parameter injection. They inspect server responses, headers, and configuration signals. Passive results are available immediately and feed into attack chain correlation before active scanning begins.

**Active modules** consume the endpoints discovered by the crawler. `EndpointManager` deduplicates endpoints across discovery sources and exposes a typed query interface (`with_params()`, `api_endpoints()`). As a result, each active module receives only the subset of endpoints relevant to its technique.

**Vulnerability correlation** in `attack_chain.py` runs over the aggregated finding set and applies rule-based relationship modeling. Two examples:

- An information-disclosure finding that exposes technology version details escalates an adjacent vulnerable-component finding.
- Authentication failures that appear before injection findings generate compound attack-chain entries with escalated severity.

### False-Positive Reduction Strategy

The scanner applies two layers of validation before finalizing a finding:

1. **Signature confirmation**: SQLi findings require a recognizable database error pattern in the response body. Generic HTTP 500 responses that also appear for a clean baseline request are suppressed.
2. **Reflection validation**: XSS findings require the payload to appear unencoded in the response. HTML-entity-encoded reflections (`&lt;script&gt;`) are classified as non-exploitable and filtered out.

Thresholds are configurable in `config.yaml` to balance sensitivity against noise across different target environments.

## Architecture Overview

### Data Flow

```
                        ┌──────────────────────────────┐
                        │ CLI · main.py                │
                        │ URL · modules · auth · proxy │
                        └──────────────┬───────────────┘
                                       │
                        ┌──────────────▼───────────────┐
                        │ Session Setup                │
                        │ cookies · headers · proxy    │
                        └──────────────┬───────────────┘
                 ┌─────────────────────┴───────────────────┐
  ┌──────────────▼───────────────┐          ┌──────────────▼───────────────┐
  │ Passive Modules              │          │ Active Crawler               │
  │ headers · info               │          │ depth-limited · scoped       │
  │ crypto · auth                │          │ form extraction              │
  │ api · ai                     │          │ URL param discovery          │
  └──────────────┬───────────────┘          └──────────────┬───────────────┘
                 │                                         │
                 │                          ┌──────────────▼───────────────┐
                 │                          │ Active Modules               │
                 │                          │ sqli · xss · bac · ssrf      │
                 │                          └──────────────┬───────────────┘
                 │                                         │
                 └─────────────────────┬───────────────────┘
                        ┌──────────────▼───────────────┐
                        │ Result Collector             │
                        │ deduplicate · normalize      │
                        └──────────────┬───────────────┘
                                       │
                        ┌──────────────▼───────────────┐
                        │ CVSS v3.1 Scoring            │
                        │ 0.0 – 10.0 per finding       │
                        └──────────────┬───────────────┘
                                       │
                        ┌──────────────▼───────────────┐
                        │ Attack Chain Correlation     │
                        │ model adversary paths        │
                        │ MITRE ATT&CK · OWASP refs    │
                        └──────────────┬───────────────┘
                                       │
                        ┌──────────────▼───────────────┐
                        │ False-Positive Filter        │
                        │ signature · reflection check │
                        └─────┬──────────────────┬─────┘
                              │                  │
                 ┌────────────┘                  └─────────┐
  ┌──────────────▼───────────────┐          ┌──────────────▼───────────────┐
  │ HTML Report                  │          │ JSON Report · Markdown       │
  │ visual dashboard             │          │ SIEM · CI/CD · tickets       │
  └──────────────────────────────┘          └──────────────────────────────┘
```

### File Structure

```
web-vuln-scanner/
│
├── main.py                        CLI entry point and scan orchestrator
│
├── core/                          Engine layer
│   ├── target.py                  Target metadata, scope enforcement
│   ├── crawler.py                 Enhanced crawler with endpoint management
│   ├── endpoint_manager.py        Thread-safe endpoint registry and deduplication
│   ├── scanner_engine.py          ThreadPoolExecutor-based module runner
│   ├── result_collector.py        Thread-safe finding aggregator with dedup and CVSS
│   ├── attack_chain.py            Vulnerability correlation and chain modeling
│   ├── cvss.py                    CVSS v3.1 base score calculator
│   ├── report.py                  HTML and JSON report generation
│   ├── auth.py                    Authentication handler (cookies, headers, proxy)
│   └── utils.py                   HTTP session factory, URL normalization, timing
│
├── modules/                       Scanner modules (one per vulnerability class)
│   ├── scanner_base.py            Abstract base class — standardizes finding schema
│   ├── header_auditor.py          A05:2021 — Security header analysis
│   ├── info_disclosure.py         A05:2021 — Sensitive path and error detection
│   ├── sql_injection.py           A03:2021 — Error-based and time-based blind SQLi
│   ├── xss_scanner.py             A03:2021 — Reflected XSS with reflection validation
│   ├── broken_access_control.py   A01:2021 — IDOR, path traversal, verb tampering
│   ├── cryptographic_failures.py  A02:2021 — TLS, cookie flags, HSTS
│   ├── broken_auth_scanner.py     A07:2021 — Default credentials, lockout, JWT
│   ├── ssrf_scanner.py            A10:2021 — Cloud metadata, loopback, header injection
│   ├── api_scanner.py             API Top 10 (2023) — BOLA, rate limits, CORS
│   ├── ai_scanner.py              AI Top 10 (2025) — Prompt injection, model theft
│   ├── crawler.py                 Web crawler with form and parameter extraction
│   └── false_positive_filter.py   Post-scan noise reduction
│
├── reporter/                      Output layer
│   ├── html_reporter.py           Interactive HTML dashboard with severity charts
│   ├── json_reporter.py           Machine-readable structured JSON
│   └── markdown_reporter.py       Human-readable Markdown format
│
├── utils/                         Shared infrastructure
│   ├── logger.py                  Structured, color-coded logging (all output to stderr)
│   ├── config.py                  YAML configuration loader
│   ├── mitre.py                   MITRE ATT&CK technique mapping database
│   └── payload_loader.py          OWASP-mapped payload file resolver (-w support)
│
├── data/                          OWASP-mapped payload files (auto-loaded per module)
│   ├── sqli.txt                   SQL injection payloads (error-based, time-based, union)
│   ├── xss.txt                    XSS payloads with reflection marker
│   ├── ssrf.txt                   SSRF probe targets (loopback + cloud metadata)
│   ├── bac.txt                    Forced browsing and IDOR probe paths
│   ├── auth.txt                   Default credentials and auth bypass payloads
│   └── paths.txt                  Information disclosure path list
│
├── tests/                         pytest test suite with HTTP mocking
├── examples/                      Docker Compose lab environment (DVWA)
└── config.yaml                    Default scan configuration
```

### Component Responsibilities

| Component | Responsibility |
|---|---|
| `main.py` | CLI argument parsing, session construction, scan-phase orchestration, report dispatch |
| `core/target.py` | Immutable target representation, same-origin scope enforcement |
| `core/endpoint_manager.py` | Thread-safe, deduplicating endpoint registry with a typed query interface |
| `core/scanner_engine.py` | Parallel module execution via ThreadPoolExecutor, module adapter pattern |
| `core/result_collector.py` | Finding normalization, fingerprint-based deduplication, CVSS score application |
| `core/attack_chain.py` | Finding relationship modeling, severity escalation, chain entry generation |
| `core/cvss.py` | CVSS v3.1 base score calculation per finding type |
| `modules/scanner_base.py` | Standardized finding-schema factory, shared logging interface |

## Installation

```
git clone https://github.com/niteshghimire0147/web-vuln-scanner.git
cd web-vuln-scanner
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
pip install -r requirements.txt

# Optional: development dependencies (testing, linting)
pip install -r requirements-dev.txt
```

**Requirements:** Python 3.9+

## Usage

```
python main.py --url <URL> [OPTIONS]

Options:
  --url, -u        Target URL (required)
  --modules        Comma-separated module list (default: all)
                   headers,info,sqli,xss,bac,crypto,auth,ssrf,api,ai
  --depth          Crawler recursion depth (default: 2)
  --max-pages      Maximum pages to crawl (default: 50)
  --timeout        HTTP request timeout in seconds (default: 10)
  --delay          Inter-request delay for rate limiting (default: 0)
  --cookie         Session cookie string ("PHPSESSID=abc; security=low")
  --header         Additional HTTP header — repeatable ("Name: Value")
  --proxy          HTTP/HTTPS proxy URL ("http://127.0.0.1:8080")
  -o, --output     Report base name saved in output/ (default: scan_<host>_<timestamp>)
  --json           Also save a JSON report (output/<name>.json)
  --markdown       Also save a Markdown report (output/<name>.md)
  -w, --wordlist   Custom payload: file path, folder, or literal string
  --fail-on        Exit code 1 threshold: critical / high / medium / none (default: high)
  -v, --verbose    Enable verbose progress output

Exit codes:
  0   No findings at or above --fail-on threshold
  1   Findings detected at or above threshold (security alert, not build failure)
```

### Examples

```
# Full scan: HTML report saved automatically to output/scan_<host>_<timestamp>.html
python main.py -u http://testapp.local -v

# Full scan with a named report and all formats
python main.py -u http://testapp.local -o report --json --markdown -v

# Injection-focused scan with an authenticated session
python main.py -u http://app.com --modules sqli,xss,bac --cookie "session=abc123"

# Override the defaults with a custom payload file
python main.py -u http://app.com --modules sqli,xss -w custom/my_payloads.txt

# Custom payloads from a folder (loads every .txt file in it)
python main.py -u http://app.com --modules sqli -w custom/sqli-payloads/

# Rate-limited deep crawl
python main.py -u http://app.com --depth 3 --max-pages 200 --delay 0.5

# API and AI security assessment
python main.py -u http://api.target.com --modules api,ai,auth,crypto --json

# Intercept traffic through Burp Suite
python main.py -u http://target.com --proxy http://127.0.0.1:8080 -v
```

## CI/CD Integration

The scanner is designed to act as a security gate in automated pipelines. All human-readable output (banner, progress, summary) goes to **stderr**. **stdout** stays silent, and machine-readable data is written only to files.

### Exit Code Contract

| Code | Meaning |
|---|---|
| `0` | Scan completed: no findings at or above the configured threshold |
| `1` | Security alert: findings detected at or above the threshold |

### `--fail-on` Threshold Control

```
# Audit only: always exits with code 0 but still reports findings (never blocks the pipeline)
python main.py -u https://staging.app.com --json --fail-on none

# Block only on CRITICAL findings
python main.py -u https://staging.app.com --json --fail-on critical

# Block on HIGH or CRITICAL findings (default)
python main.py -u https://staging.app.com --json --fail-on high

# Block on MEDIUM and above
python main.py -u https://staging.app.com --json --fail-on medium
```

### Supported CI Platforms

| Platform | Integration Method |
|---|---|
| GitHub Actions | `continue-on-error: true` + `upload-artifact`; see `.github/workflows/ci.yml` |
| GitLab CI | `allow_failure: true` with a JSON artifact |
| Jenkins | `catchError(buildResult: 'UNSTABLE')` block |
| Azure DevOps | `continueOnError: true` task flag |

### GitHub Actions Quick Example

```
- name: Security Scan
  id: scan
  run: python main.py --url ${{ vars.STAGING_URL }} --json --fail-on critical
  continue-on-error: true

- name: Upload Report
  uses: actions/upload-artifact@v4
  with:
    name: security-report
    path: output/*.json

- name: Enforce Gate
  if: steps.scan.outcome == 'failure'
  run: echo "CRITICAL findings — review artifact" && exit 1
```

## Scanner Modules

| Module | OWASP Reference | Detection Techniques |
|---|---|---|
| `headers` | A05:2021 | Missing CSP, HSTS, X-Frame-Options, X-Content-Type-Options, Referrer-Policy |
| `info` | A05:2021 | 25+ sensitive path probes, stack traces, and verbose errors |
| `sqli` | A03:2021 | Error-based (20+ DB signatures) and time-based blind detection, database fingerprinting |
| `xss` | A03:2021 | Reflected XSS, 8 payload variants, unescaped-reflection validation |
| `bac` | A01:2021 | IDOR, path traversal, forced browsing, HTTP verb tampering |
| `crypto` | A02:2021 | Cleartext HTTP transport, cookie flag analysis, missing HSTS |
| `auth` | A07:2021 | Default credential enumeration, lockout bypass, JWT algorithm confusion |
| `ssrf` | A10:2021 | Cloud metadata probes (AWS/GCP/Azure), loopback injection, header-based SSRF |
| `api` | API Top 10 (2023) | BOLA, broken authentication, missing rate limiting, CORS misconfiguration, shadow APIs |
| `ai` | AI Top 10 (2025) | Prompt injection, insecure output handling, model theft, supply-chain exposure |

## Payload System

Each injection module automatically loads its payloads from the matching file in `data/`. No configuration is needed.

| Module | Default File | Contents |
|---|---|---|
| `sqli` | `data/sqli.txt` | Error-based, time-based blind, and UNION-based payloads |
| `xss` | `data/xss.txt` | Reflected XSS payloads with an embedded reflection marker |
| `ssrf` | `data/ssrf.txt` | Loopback, IPv6, and cloud metadata probe URLs |
| `bac` | `data/bac.txt` | Admin panel paths and IDOR parameter probes |
| `auth` | `data/auth.txt` | Default credentials and authentication bypass payloads |
| `info` | `data/paths.txt` | Sensitive file and debug endpoint paths |

### Custom Payloads (`-w`)

The `-w` / `--wordlist` flag overrides the default `data/` files for all active injection modules:

```
# A single literal payload (no file needed)
python main.py -u http://target.com --modules sqli -w "' OR 1=1--"

# A single payload file (replaces data/sqli.txt and data/xss.txt for this run)
python main.py -u http://target.com --modules sqli,xss -w custom/my_payloads.txt

# A folder: every .txt file inside it is loaded dynamically
python main.py -u http://target.com --modules sqli -w custom/sqli-payloads/
```

**Precedence:** `-w` always overrides the `data/` defaults. When `-w` is not given and the module's `data/` file does not exist, the module falls back to its built-in, hard-coded payload list, so execution is never interrupted.

## False-Positive Filtering

The scanner applies two layers of false-positive reduction before finalizing findings.

**Pre-scan baseline fingerprinting (per module)**

- **Broken Access Control**: before probing any admin or sensitive paths, the scanner requests a URL that should not exist and records the response's body hash and size. Any path whose response is identical or nearly identical to that baseline (body size within 2%) is silently discarded. This eliminates the most common false positive in SPAs and Next.js / React applications, where the client-side router renders a 404 UI but the server always responds with 200.
- **Info Disclosure**: the same baseline fingerprinting applies, so sensitive paths that return the SPA's catch-all `index.html` are suppressed. A content validator also confirms that each flagged file actually contains the expected content. For example, `.env` must match a `KEY=VALUE` pattern and `.git/HEAD` must match `ref: refs/heads/`.

**Post-scan signature validation** (`modules/false_positive_filter.py`)

- **SQL Injection**: requires a recognizable database error signature in the response body. HTTP 500 responses without a DB-specific pattern are suppressed.
- **XSS**: verifies that the injected payload appears *unencoded* in the response. HTML-entity-encoded reflections are classified as non-exploitable and excluded.

Configurable thresholds in `config.yaml`:

```
false_positive:
  enabled: true
  min_body_diff_bytes: 50     # Minimum response delta to flag anomalous behavior
  reflection_threshold: 0.8   # Minimum unencoded reflection match ratio for XSS
```

## Report Output

### HTML Dashboard

![Scanner Dashboard](https://static.pigsec.cn/wp-content/uploads/repos/2026/04/19f6d7e747180930.png)

*HTML report: severity breakdown, collapsible findings, CVSS scores, and MITRE ATT&CK context*

### Sample Report Output

![Report Output](https://static.pigsec.cn/wp-content/uploads/repos/2026/04/78610cf4ea180937.png)

*Sample scan report showing findings grouped by severity, with evidence and remediation guidance*

All reports are saved automatically to the `output/` folder. **HTML is always generated.** JSON and Markdown are optional extras.

```
# Default: HTML only, auto-named
python main.py -u http://target.com
# output/scan_target.com_20260426_153000.html

# Named report
python main.py -u http://target.com -o myreport
# output/myreport.html

# Named report plus optional formats
python main.py -u http://target.com -o myreport --json --markdown
# output/myreport.html + output/myreport.json + output/myreport.md
```

| Format | Flag | Primary Use |
|---|---|---|
| HTML | Always on | Client deliverables, visual severity dashboard, collapsible findings |
| JSON | `--json` | SIEM ingestion, CI/CD pipeline integration, custom automation |
| Markdown | `--markdown` | Documentation, ticket creation, peer review |

### Sample Terminal Output

```
+==================================================================+
|           Web Application Vulnerability Scanner v2.0.0           |
|  OWASP Web Top 10 (2021) · API Top 10 (2023) · AI Top 10 (2025)  |
|                 *** AUTHORIZED TESTING ONLY ***                  |
+==================================================================+

[1/10] Security Headers (A05:2021)...             3 findings
[2/10] Information Disclosure (A05:2021)...       1 finding
[3/10] SQL Injection (A03:2021)...                2 findings
[4/10] Cross-Site Scripting (A03:2021)...         1 finding
[5/10] Broken Access Control (A01:2021)...        2 findings
[6/10] Cryptographic Failures (A02:2021)...       1 finding
[7/10] Authentication Failures (A07:2021)...      2 findings
[8/10] SSRF (A10:2021)...                         1 finding
[9/10] API Security (API Top 10)...               3 findings
[10/10] AI Security (AI Top 10)...                2 findings

============================================================
 Scan Summary
============================================================
 Total findings : 18
 Elapsed time   : 47.3s

 CRITICAL        3
 HIGH            6
 MEDIUM          5
 LOW             2
 INFORMATIONAL   2
============================================================
```

## 🧪 Testing

```
65 / 65 tests passing
```

```
pip install -r requirements-dev.txt
pytest tests/ -v --cov=modules --cov-report=term-missing
```

| Test Module | Coverage |
|---|---|
| `test_sql_injection.py` | Error-based and time-based blind SQLi detection, form and URL parameter injection |
| `test_xss_scanner.py` | Reflected XSS reflection, script-context detection, hidden-input skipping |
| `test_header_auditor.py` | Missing-header detection, server information disclosure, weak CSP flagging |
| `test_info_disclosure.py` | Sensitive path probing, stack trace detection, SPA false-positive suppression |
| `test_false_positive_filter.py` | SQL error signature validation, XSS reflection confirmation, baseline comparison |

All tests use HTTP response mocking (the `responses` library), so no live network access is required.

## Safe Testing Environment

Use deliberately vulnerable applications for authorized lab testing:

```
# DVWA (Damn Vulnerable Web Application), low security level
python main.py -u http://localhost:8080 \
  --cookie "security=low; PHPSESSID=test" \
  -v

# Include all report formats
python main.py -u http://localhost:8080 \
  --cookie "security=low; PHPSESSID=test" \
  --json --markdown -v

# Full lab environment via Docker Compose (see examples/)
docker-compose -f examples/docker-compose.yml up -d
python main.py -u http://localhost:8080 --json --markdown -v
```

Recommended lab targets: [DVWA](http://dvwa.co.uk/), [WebGoat](https://owasp.org/www-project-webgoat/), [Juice Shop](https://owasp.org/www-project-juice-shop/)

## Tech Stack

| Component | Technology |
|---|---|
| Language | Python 3.9+ |
| HTTP client | requests ≥ 2.31.0 |
| HTML parsing | beautifulsoup4 ≥ 4.12.0 |
| Configuration | PyYAML ≥ 6.0 |
| Terminal output | colorama ≥ 0.4.6 |
| Testing | pytest + responses (HTTP mocking) |
| Static analysis | flake8 (linting), bandit (security) |
| CI/CD | GitHub Actions |

## Known Limitations

| Area | Limitation |
|---|---|
| SQLi | Time-based blind detection relies on response timing, so it is susceptible to network jitter on high-latency targets |
| Crawling | Depth-limited; JavaScript-rendered SPAs (React, Vue, Angular) require supplementary tooling (Playwright/Selenium) |
| XSS | Reflection-based detection only; DOM-based and stored XSS require runtime execution or two-phase retrieval |
| Authentication | Session cookie pass-through only; no login-form automation or MFA handling |
| CSRF | Out of scope; requires execution in a browser context |
| Stored XSS | Excluded; the two-phase submit-and-retrieve approach risks leaving persistent payloads on production targets |

## Roadmap

- [ ] JavaScript-rendered SPA support via Playwright integration
- [ ] SARIF output format for native GitHub Code Scanning integration
- [ ] YAML-defined custom scan rule engine (no Python required)
- [ ] GraphQL introspection and schema enumeration module
- [ ] Continuous scanning mode with delta-only reports
- [ ] Stored XSS detection via a sandboxed two-phase approach
- [ ] Subdomain enumeration to broaden attack-surface discovery
- [ ] WebSocket security testing module

## 📌 Key Highlights

| | |
|---|---|
| ✔ | Modular, plugin-based architecture: add a scanner by subclassing `ScannerBase`, with no engine changes |
| ✔ | Real attack chain correlation: modeled adversary paths instead of isolated findings |
| ✔ | Production-ready CI/CD behavior: configurable exit codes and clean stdout/stderr separation |
| ✔ | Three security frameworks in one tool: Web + API + AI, all aligned with OWASP |
| ✔ | Extensible for enterprise workflows: SIEM-ready JSON with MITRE ATT&CK context on every finding |

## Contributing

See [CONTRIBUTING.md](CONTRIBUTING.md) for development setup, module-authoring guidelines, and the pull request process.

## Security Policy

See [SECURITY.md](SECURITY.md) for the responsible disclosure policy.

## Disclaimer

**Authorized testing only.** This tool is intended for use against systems you own or have explicit written permission to test. Using it against third-party systems without authorization is illegal. See [DISCLAIMER.md](DISCLAIMER.md).

Licensed under [MIT](LICENSE).

**Author:** Nitesh Ghimire, Security Researcher
GitHub: [@niteshghimire0147](https://github.com/niteshghimire0147)
