Berkayy123-h/web-scanner

GitHub: Berkayy123-h/web-scanner

# Web Scanner: Security Intelligence Engine

Proof-driven web vulnerability intelligence platform for automated scanning, attack chain analysis, exploit simulation, and multi-tenant security monitoring.

## Architecture

    ┌─────────────────────────────────────────────────────────┐
    │                   SaaS Platform Layer                   │
    │  ┌──────────┐  ┌──────────┐  ┌────────┐  ┌──────────┐   │
    │  │   Auth   │  │ Billing  │  │  Rate  │  │  Audit   │   │
    │  │ JWT+RBAC │  │  Stripe  │  │Governor│  │ Pipeline │   │
    │  └──────────┘  └──────────┘  └────────┘  └──────────┘   │
    ├─────────────────────────────────────────────────────────┤
    │                  API Server (FastAPI)                   │
    │  ┌──────────┐  ┌──────────┐  ┌────────┐  ┌──────────┐   │
    │  │  Scans   │  │   Org    │  │ Queue  │  │ Dashboard│   │
    │  │   CRUD   │  │  Multi-  │  │ Async  │  │ Chart.js │   │
    │  │          │  │  Tenant  │  │ Worker │  │    UI    │   │
    │  └──────────┘  └──────────┘  └────────┘  └──────────┘   │
    ├─────────────────────────────────────────────────────────┤
    │                      Storage Layer                      │
    │  ┌──────────┐  ┌──────────┐  ┌────────┐  ┌──────────┐   │
    │  │SQLAlchemy│  │  Redis   │  │ JSONL  │  │ Alembic  │   │
    │  │  Async   │  │  Queue   │  │ Audit  │  │ Migrate  │   │
    │  └──────────┘  └──────────┘  └────────┘  └──────────┘   │
    ├─────────────────────────────────────────────────────────┤
    │                      Workers Layer                      │
    │  ┌──────────┐  ┌──────────┐  ┌────────┐  ┌──────────┐   │
    │  │  Celery  │  │  Crawl   │  │Exploit │  │  Post-   │   │
    │  │  Dist.   │  │  Shard   │  │ Chain  │  │ Exploit  │   │
    │  └──────────┘  └──────────┘  └────────┘  └──────────┘   │
    ├─────────────────────────────────────────────────────────┤
    │                     Scanning Engine                     │
    │  ┌──────────┐  ┌──────────┐  ┌────────┐  ┌──────────┐   │
    │  │   XSS    │  │   SQLi   │  │  SSRF  │  │ DOM XSS  │   │
    │  │  Module  │  │  Module  │  │ Module │  │  Module  │   │
    │  └──────────┘  └──────────┘  └────────┘  └──────────┘   │
    │  ┌──────────┐  ┌──────────┐  ┌────────┐                 │
    │  │  Spider  │  │  Scorer  │  │ Proof  │                 │
    │  │ Crawler  │  │Confidence│  │ Verify │                 │
    │  └──────────┘  └──────────┘  └────────┘                 │
    ├─────────────────────────────────────────────────────────┤
    │               Observability & Deployment                │
    │  ┌──────────┐  ┌──────────┐  ┌────────┐  ┌──────────┐   │
    │  │Prometheus│  │OpenTele. │  │ Docker │  │  Nginx   │   │
    │  │ Metrics  │  │ Tracing  │  │ Compose│  │  Proxy   │   │
    │  └──────────┘  └──────────┘  └────────┘  └──────────┘   │
    └─────────────────────────────────────────────────────────┘

## What's New in v1.1.0

### Sprint 1: Discovery

- SPA route seeding fallback with 10 common API routes.
- Parametric nav expansion (`--enable-nav-expansion`).
- Auth scope boundary (`--auth-scope-paths`).

### Sprint 2: Auth & Session

- Login verification with `check_url` / `check_pattern`.
- Post-auth re-crawl for authenticated endpoint discovery.
- CLI flags: `--auth-scope-paths`, `--enable-nav-expansion`.

### Sprint 3: Verifier

- DOM XSS verifier polling loop (10 × 200ms).
- SQLi verifier retry (2 rounds).
- XSS marker request retry on transient failure.

### Sprint 4: Metrics

- Cross-target drift monitor (`DriftMonitor`).
- Priority stability tracker (`PriorityTracker`).

### Validation

| Metric | Value |
|---|---:|
| DVWA Precision | 1.00 |
| DVWA Recall | 1.00 |
| DVWA F1 | 1.00 |
| Session variance | 0 (3 runs) |
| Test suite | 118 collected, 105 passed, 3 skipped |

## Quick Start

### Prerequisites

- Python 3.11+
- Redis (optional for dev distributed queue)

### Install

    git clone https://github.com/Berkayy123-h/web-scanner
    cd web-scanner
    pip install -e .

### CLI Scanner

    # Basic scan
    webscanner --url http://example.com

    # Deep scan with all modules
    webscanner --url http://example.com --depth 3 --ssrf-internal

    # Scan with auth + scope
    webscanner --url http://example.com \
      --login-url http://example.com/login \
      --login-user admin --login-pass secret \
      --auth-scope-paths /dashboard,/api \
      --enable-nav-expansion

### Start API Server

    uvicorn web_scanner.api.server:app --reload --port 8000

### Docker Compose

    # Development
    docker compose --profile dev up -d

    # Staging (PostgreSQL, Prometheus, Grafana)
    docker compose --profile stage up -d

    # Production (Nginx, OTel, autoscaling)
    docker compose --profile prod up -d

## Features

### Scanning Engine

| Module | Detects | Technique |
|---|---|---|
| XSS | Reflected, Stored, DOM-based | Context-aware polyglot payloads, 100+ vectors |
| SQLi | Error-based, Boolean, Time-blind | 50+ injection patterns, DB fingerprinting |
| SSRF | Cloud metadata, Internal pivot, Blind | 30+ bypass techniques, redirect chains |
| DOM XSS | Sink analysis, Taint flow | innerHTML, eval, document.write, setTimeout |

All engines use multi-signal confidence scoring and deterministic proof verification.

### SPA Discovery

    webscanner --url http://juice-shop.local --spa --enable-nav-expansion

- Network interception + XHR/fetch/pushState monkeypatching.
- Framework detection: Angular, React, Vue, Svelte.
- SPA route seeding fallback with 10 common API routes.
- `enrich_spider()` feeds discovered endpoints into the scan pipeline.

### Intel Reports

    webscanner --url http://example.com --intel-dir ./reports

- Priority Engine: CRITICAL (≥85), HIGH (≥60), MEDIUM (≥40), LOW (≥25), INFO.
- Evidence Summary: execution proof, error patterns, timing, content matches.
- Reasoning Chain: rule-by-rule decision trace with confidence delta.
- Triage Summary: `report.md` + `report.json` (schema v2).

### Platform

- Multi-tenant auth: API keys + JWT (access/refresh, revoke, rotation).
- RBAC: Admin / Member / Viewer with granular permissions.
- Rate governance: per-org concurrency limits, per-target cooldown.
- Audit pipeline: dual-write (JSONL + SQL), 15 event types.
- Scope policy: `--auth-scope-paths` allow/block rules; block takes precedence.

### Metrics & Observability

- `DriftMonitor`: cross-target drift detection across runs.
- `PriorityTracker`: priority stability across N runs.
- Prometheus metrics: scan count/duration, queue depth, active scans.
- OpenTelemetry tracing: distributed traces, X-Trace-ID propagation.
- Grafana dashboards: pre-configured datasource.

### Distributed

- Celery workers: Redis broker, concurrent task processing.
- Crawl sharding: partition URLs across workers.
- Redis-backed queue: drop-in for in-memory queue.

### Exploit Analysis

- Attack chaining: 7 chain templates.
- Post-exploit simulation: cookie theft, credential extraction, pivot, exfil.
- Damage scoring: 0–1000 severity score per chain.

### Billing

- Plan tiers: Free / Pro / Enterprise.
- Usage metering: daily scan counts, per-minute API rate tracking.
- Stripe integration: checkout, webhooks, subscription lifecycle.

## Benchmark

### Phase 1: Flask Test Server

| Type | TP | FP | FN | Precision | Recall | F1 |
|---|---:|---:|---:|---:|---:|---:|
| xss | 14 | 0 | 0 | 1.000 | 1.000 | 1.000 |
| sqli | 3 | 0 | 0 | 1.000 | 1.000 | 1.000 |
| ssrf | 2 | 0 | 0 | 1.000 | 1.000 | 1.000 |
| dom_xss | 5 | 0 | 0 | 1.000 | 1.000 | 1.000 |
| TOTAL | 24 | 0 | 0 | 1.000 | 1.000 | 1.000 |

### DVWA Validation

    P=1.00  R=1.00  F1=1.00
    Session stability: 0 variance across 3 runs

## API

Interactive docs: `http://localhost:8000/docs` (Swagger UI).

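As a rough illustration of calling the API from a script, the sketch below builds (but does not send) an authenticated request using the `X-API-Key` header and the `POST /api/v1/scans` endpoint documented in this section. The JSON body field `url` and the helper name `build_scan_request` are assumptions made for illustration; check the Swagger UI for the actual request schema before relying on it.

```python
import json
import urllib.request

# Port 8000 matches the uvicorn example in Quick Start.
BASE_URL = "http://localhost:8000"


def build_scan_request(target_url: str, api_key: str,
                       base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST /api/v1/scans request without sending it.

    ASSUMPTION: the body shape {"url": ...} is hypothetical; the real
    schema is whatever the server's /docs page reports.
    """
    body = json.dumps({"url": target_url}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/v1/scans",
        data=body,
        method="POST",
        headers={
            "X-API-Key": api_key,
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    # Placeholder key in the documented scan_<48_hex_chars> shape.
    req = build_scan_request("http://example.com", "scan_" + "0" * 48)
    print(req.get_method(), req.full_url)
    # To actually send it (requires a running server):
    # with urllib.request.urlopen(req) as resp:
    #     print(json.load(resp))
```

Keeping request construction separate from sending makes it easy to inspect headers and payload before pointing the script at a live instance.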
### Authentication

    X-API-Key: scan_<48_hex_chars>
    Authorization: Bearer <token>

### Key Endpoints

| Method | Path | Description |
|---|---|---|
| GET | /health | Health check |
| GET | /metrics | Prometheus metrics |
| POST | /api/v1/scans | Start a scan |
| GET | /api/v1/scans | List scans |
| GET | /api/v1/scans/{id} | Get scan result |
| POST | /api/v1/scans/{id}/cancel | Cancel scan |
| POST | /api/v1/auth/login | Login (JWT) |
| POST | /api/v1/auth/refresh | Refresh token |
| POST | /api/v1/auth/logout | Logout |
| POST | /api/v1/api-keys | Create API key |
| GET | /api/v1/billing/plan | Get current plan |
| POST | /api/v1/billing/create-checkout | Upgrade plan |

## Test Suite

    # All tests
    PYTHONPATH=src pytest tests/

    # Specific suites
    PYTHONPATH=src pytest tests/test_platform.py
    PYTHONPATH=src pytest tests/test_harden.py
    PYTHONPATH=src pytest tests/test_benchmark.py
    PYTHONPATH=src pytest tests/test_safety.py

## Project Structure

    src/web_scanner/
    ├── scanner.py
    ├── __init__.py
    ├── log.py
    ├── reporter.py
    ├── core/
    │   ├── engine.py
    │   ├── auth.py
    │   ├── scorer.py
    │   ├── verifier.py
    │   ├── evidence.py
    │   ├── signals.py
    │   ├── jwt.py
    │   └── taskqueue.py
    ├── crawler/
    │   └── spider.py
    ├── modules/
    │   ├── xss.py
    │   ├── sqli.py
    │   ├── ssrf.py
    │   └── dom.py
    ├── discovery/
    │   ├── spa_discovery.py
    │   └── schema.py
    ├── metrics/
    │   └── engine.py
    ├── intel/
    │   ├── priority.py
    │   ├── explain.py
    │   └── report_v2.py
    ├── api/
    │   ├── server.py
    │   ├── auth.py
    │   ├── dashboard.py
    │   ├── queue.py
    │   ├── scope.py
    │   ├── schemas.py
    │   ├── rate_governor.py
    │   ├── rbac.py
    │   ├── billing.py
    │   ├── metrics.py
    │   ├── tracing.py
    │   └── stripe_integration.py
    ├── workers/
    │   ├── celery_app.py
    │   ├── distributed_queue.py
    │   └── crawl_shard.py
    ├── exploit/
    │   ├── chainer.py
    │   ├── post_exploit.py
    │   ├── credential_abuse.py
    │   └── runner.py
    └── proxy/
        └── server.py

    deploy/
    ├── entrypoint.sh
    ├── nginx.conf
    ├── prometheus.yml
    ├── otel-collector.yml
    ├── grafana-datasource.yml
    └── env/
        ├── .env.dev
        ├── .env.stage
        └── .env.prod

    tests/
    ├── test_benchmark.py
    ├── test_scanner.py
    ├── test_safety.py
    ├── test_integration.py
    ├── test_platform.py
    ├── test_harden.py
    ├── test_stress.py
    ├── test_server.py
    ├── test_realworld_server.py
    └── test_juice_shop_spa.py

## Configuration

| Variable | Default | Description |
|---|---|---|
| `SCANNER_DATABASE_URL` | `sqlite:///scanner.db` | Database URL |
| `SCANNER_REDIS_URL` | `redis://localhost:6379/0` | Redis URL |
| `SCANNER_JWT_SECRET` | *(required)* | JWT signing secret |
| `SCANNER_JWT_ALGORITHM` | `HS256` | JWT algorithm |
| `SCANNER_JWT_ACCESS_EXPIRE_MINUTES` | `15` | Access token TTL |
| `SCANNER_JWT_REFRESH_EXPIRE_DAYS` | `7` | Refresh token TTL |
| `SCANNER_MAX_CONCURRENT` | `10` | Max concurrent scans per org |
| `SCANNER_RATE_PER_MINUTE` | `30` | Max API requests per minute |
| `SCANNER_STRIPE_SECRET_KEY` | *(empty)* | Stripe API key |
| `OTEL_EXPORTER_OTLP_ENDPOINT` | *(empty)* | OpenTelemetry endpoint |

## License

MIT