# Web Scanner — Security Intelligence Engine
Proof-driven web vulnerability intelligence platform for automated scanning, attack chain analysis, exploit simulation, and multi-tenant security monitoring.
## Architecture
```text
┌─────────────────────────────────────────────────────────┐
│                   SaaS Platform Layer                   │
│  ┌──────────┐  ┌──────────┐  ┌────────┐  ┌──────────┐   │
│  │   Auth   │  │ Billing  │  │  Rate  │  │  Audit   │   │
│  │ JWT+RBAC │  │  Stripe  │  │Governor│  │ Pipeline │   │
│  └──────────┘  └──────────┘  └────────┘  └──────────┘   │
├─────────────────────────────────────────────────────────┤
│                  API Server (FastAPI)                   │
│  ┌──────────┐  ┌──────────┐  ┌────────┐  ┌──────────┐   │
│  │  Scans   │  │   Org    │  │ Queue  │  │Dashboard │   │
│  │   CRUD   │  │  Multi-  │  │ Async  │  │ Chart.js │   │
│  │          │  │  Tenant  │  │ Worker │  │    UI    │   │
│  └──────────┘  └──────────┘  └────────┘  └──────────┘   │
├─────────────────────────────────────────────────────────┤
│                      Storage Layer                      │
│  ┌──────────┐  ┌──────────┐  ┌────────┐  ┌──────────┐   │
│  │SQLAlchemy│  │  Redis   │  │ JSONL  │  │ Alembic  │   │
│  │  Async   │  │  Queue   │  │ Audit  │  │ Migrate  │   │
│  └──────────┘  └──────────┘  └────────┘  └──────────┘   │
├─────────────────────────────────────────────────────────┤
│                      Workers Layer                      │
│  ┌──────────┐  ┌──────────┐  ┌────────┐  ┌──────────┐   │
│  │  Celery  │  │  Crawl   │  │Exploit │  │  Post-   │   │
│  │  Dist.   │  │  Shard   │  │ Chain  │  │ Exploit  │   │
│  └──────────┘  └──────────┘  └────────┘  └──────────┘   │
├─────────────────────────────────────────────────────────┤
│                     Scanning Engine                     │
│  ┌──────────┐  ┌──────────┐  ┌────────┐  ┌──────────┐   │
│  │   XSS    │  │   SQLi   │  │  SSRF  │  │ DOM XSS  │   │
│  │  Module  │  │  Module  │  │ Module │  │  Module  │   │
│  └──────────┘  └──────────┘  └────────┘  └──────────┘   │
│  ┌──────────┐  ┌──────────┐  ┌────────┐                 │
│  │  Spider  │  │  Scorer  │  │ Proof  │                 │
│  │ Crawler  │  │Confidence│  │ Verify │                 │
│  └──────────┘  └──────────┘  └────────┘                 │
├─────────────────────────────────────────────────────────┤
│               Observability & Deployment                │
│  ┌──────────┐  ┌──────────┐  ┌────────┐  ┌──────────┐   │
│  │Prometheus│  │OpenTele. │  │ Docker │  │  Nginx   │   │
│  │ Metrics  │  │ Tracing  │  │ Compose│  │  Proxy   │   │
│  └──────────┘  └──────────┘  └────────┘  └──────────┘   │
└─────────────────────────────────────────────────────────┘
```
## What's New in v1.1.0
### Sprint 1 — Discovery
- SPA route seeding fallback with 10 common API routes.
- Parametric nav expansion (`--enable-nav-expansion`).
- Auth scope boundary (`--auth-scope-paths`).
### Sprint 2 — Auth & Session
- Login verification with `check_url` / `check_pattern`.
- Post-auth re-crawl for authenticated endpoint discovery.
- CLI flags: `--auth-scope-paths`, `--enable-nav-expansion`.
### Sprint 3 — Verifier
- DOM XSS verifier polling loop (10 × 200ms).
- SQLi verifier retry (2 rounds).
- XSS marker request retry on transient failure.
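
The polling and retry behaviour listed under Sprint 3 can be sketched roughly as below. This is an illustration only, not the project's verifier code: `poll_until` and `retry` are hypothetical helper names, and only the 10 × 200 ms and 2-round figures come from the list above.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def poll_until(check: Callable[[], bool], attempts: int = 10,
               interval_s: float = 0.2,
               sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call `check` up to `attempts` times, sleeping `interval_s` between tries."""
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:  # no sleep after the final try
            sleep(interval_s)
    return False


def retry(action: Callable[[], T], rounds: int = 2,
          transient: tuple = (ConnectionError, TimeoutError)) -> Optional[T]:
    """Run `action` up to `rounds` times; return None if every round fails transiently."""
    for _ in range(rounds):
        try:
            return action()
        except transient:
            continue  # assumed-transient failure: try the next round
    return None
```

Passing `sleep` as a parameter keeps the loop testable without real delays.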
### Sprint 4 — Metrics
- Cross-target drift monitor (`DriftMonitor`).
- Priority stability tracker (`PriorityTracker`).
### Validation
| Metric | Value |
|---|---:|
| DVWA Precision | 1.00 |
| DVWA Recall | 1.00 |
| DVWA F1 | 1.00 |
| Session variance | 0 (3 runs) |
| Test suite | 118 collected, 105 passed, 3 skipped |
## Quick Start
### Prerequisites
- Python 3.11+
- Redis (optional; only needed for the distributed queue)
### Install
```bash
git clone https://github.com/Berkayy123-h/web-scanner
cd web-scanner
pip install -e .
```
### CLI Scanner
```bash
# Basic scan
webscanner --url http://example.com

# Deep scan with all modules
webscanner --url http://example.com --depth 3 --ssrf-internal

# Scan with auth + scope
webscanner --url http://example.com \
  --login-url http://example.com/login \
  --login-user admin --login-pass secret \
  --auth-scope-paths /dashboard,/api \
  --enable-nav-expansion
```
### Start API Server
```bash
uvicorn web_scanner.api.server:app --reload --port 8000
```
### Docker Compose
```bash
# Development
docker compose --profile dev up -d

# Staging (PostgreSQL, Prometheus, Grafana)
docker compose --profile stage up -d

# Production (Nginx, OTel, autoscaling)
docker compose --profile prod up -d
```
## Features
### Scanning Engine
| Module | Detects | Technique |
|---|---|---|
| XSS | Reflected, Stored, DOM-based | Context-aware polyglot payloads, 100+ vectors |
| SQLi | Error-based, Boolean, Time-blind | 50+ injection patterns, DB fingerprinting |
| SSRF | Cloud metadata, Internal pivot, Blind | 30+ bypass techniques, redirect chains |
| DOM XSS | Sink analysis, Taint flow | innerHTML, eval, document.write, setTimeout |
All engines use multi-signal confidence scoring and deterministic proof verification.
### SPA Discovery
```bash
webscanner --url http://juice-shop.local --spa --enable-nav-expansion
```
- Network interception + XHR/fetch/pushState monkeypatching.
- Framework detection: Angular, React, Vue, Svelte.
- SPA route seeding fallback with 10 common API routes.
- `enrich_spider()` feeds discovered endpoints into the scan pipeline.
### Intel Reports
```bash
webscanner --url http://example.com --intel-dir ./reports
```
- Priority Engine: CRITICAL (≥85), HIGH (≥60), MEDIUM (≥40), LOW (≥25), INFO.
- Evidence Summary: execution proof, error patterns, timing, content matches.
- Reasoning Chain: rule-by-rule decision trace with confidence delta.
- Triage Summary: `report.md` + `report.json` (schema v2).
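
To make the Priority Engine bands concrete, here is a small sketch of the threshold mapping. The function name `priority_label` is hypothetical, and the 0 to 100 score range is an assumption; only the thresholds themselves come from the list above.

```python
def priority_label(score: float) -> str:
    """Map a priority score to the bands listed in this README."""
    # Checked from highest to lowest, so the first match wins.
    bands = [(85, "CRITICAL"), (60, "HIGH"), (40, "MEDIUM"), (25, "LOW")]
    for threshold, label in bands:
        if score >= threshold:
            return label
    return "INFO"  # anything below the LOW threshold
```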
### Platform
- Multi-tenant auth: API keys + JWT (access/refresh, revoke, rotation).
- RBAC: Admin / Member / Viewer with granular permissions.
- Rate governance: per-org concurrency limits, per-target cooldown.
- Audit pipeline: dual-write (JSONL + SQL), 15 event types.
- Scope policy: `--auth-scope-paths` allow/block rules, block takes precedence.
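
The "block takes precedence" rule in the scope policy could look something like the sketch below. It assumes rules are plain path prefixes and that allow and block lists are supplied separately; the actual rule syntax accepted by `--auth-scope-paths` is not documented here, and `in_scope` is a hypothetical name.

```python
from urllib.parse import urlsplit


def in_scope(url: str, allow: list[str], block: list[str]) -> bool:
    """Return True if the URL path is allowed and not blocked (block wins)."""
    path = urlsplit(url).path or "/"

    def matches(prefixes: list[str]) -> bool:
        # Match the prefix itself or anything beneath it, so "/api"
        # covers "/api/users" but not "/apiary".
        return any(
            path == p or path.startswith(p.rstrip("/") + "/")
            for p in prefixes
        )

    if matches(block):  # a block rule overrides any allow rule
        return False
    return matches(allow)
```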
### Metrics & Observability
- `DriftMonitor`: cross-target drift detection across runs.
- `PriorityTracker`: priority stability across N runs.
- Prometheus metrics: scan count/duration, queue depth, active scans.
- OpenTelemetry tracing: distributed traces, X-Trace-ID propagation.
- Grafana dashboards: pre-configured datasource.
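
One way to quantify "priority stability across N runs" is the share of findings that keep the same label in every run. The sketch below is an assumption about what such a metric might compute, not the `PriorityTracker` API; the function name and the input shape (one `finding_id -> label` mapping per run) are hypothetical.

```python
def priority_stability(runs: list[dict[str, str]]) -> float:
    """Fraction of findings that carry the same priority label in every run."""
    finding_ids: set[str] = set()
    for run in runs:
        finding_ids.update(run)
    if not finding_ids:
        return 1.0  # nothing to disagree about
    stable = sum(
        1 for fid in finding_ids
        # A finding missing from a run yields None and counts as unstable.
        if len({run.get(fid) for run in runs}) == 1
    )
    return stable / len(finding_ids)
```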
### Distributed
- Celery workers: Redis broker, concurrent task processing.
- Crawl sharding: partition URLs across workers.
- Redis-backed queue: drop-in for in-memory queue.
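
Crawl sharding can be illustrated with a stable hash partition. This is a generic sketch of the technique, not the contents of `crawl_shard.py`; `shard_for` and `partition` are hypothetical names, and hashing the full URL is an assumption.

```python
import hashlib


def shard_for(url: str, num_workers: int) -> int:
    """Pick a worker index from a stable hash of the URL."""
    # hashlib is used instead of hash() so the result is identical
    # across processes and machines.
    digest = hashlib.sha256(url.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_workers


def partition(urls: list[str], num_workers: int) -> list[list[str]]:
    """Split URLs into one list per worker."""
    shards: list[list[str]] = [[] for _ in range(num_workers)]
    for url in urls:
        shards[shard_for(url, num_workers)].append(url)
    return shards
```

Because the assignment depends only on the URL and the worker count, the same URL always lands on the same worker for a given pool size.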
### Exploit Analysis
- Attack chaining: 7 chain templates.
- Post-exploit simulation: cookie theft, credential extraction, pivot, exfil.
- Damage scoring: 0–1000 severity score per chain.
### Billing
- Plan tiers: Free / Pro / Enterprise.
- Usage metering: daily scan counts, per-minute API rate tracking.
- Stripe integration: checkout, webhooks, subscription lifecycle.
## Benchmark
### Phase 1 — Flask Test Server
| Type | TP | FP | FN | Precision | Recall | F1 |
|---|---:|---:|---:|---:|---:|---:|
| xss | 14 | 0 | 0 | 1.000 | 1.000 | 1.000 |
| sqli | 3 | 0 | 0 | 1.000 | 1.000 | 1.000 |
| ssrf | 2 | 0 | 0 | 1.000 | 1.000 | 1.000 |
| dom_xss | 5 | 0 | 0 | 1.000 | 1.000 | 1.000 |
| TOTAL | 24 | 0 | 0 | 1.000 | 1.000 | 1.000 |
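
For reference, the precision, recall, and F1 columns follow from the TP/FP/FN counts in the usual way. The snippet below recomputes the TOTAL row from the per-type rows; the helper name `prf` is illustrative and is not part of the benchmark code.

```python
def prf(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) for the given confusion counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# (TP, FP, FN) per vulnerability type, copied from the table above.
rows = {"xss": (14, 0, 0), "sqli": (3, 0, 0), "ssrf": (2, 0, 0), "dom_xss": (5, 0, 0)}
total = tuple(sum(col) for col in zip(*rows.values()))
print(total, prf(*total))
```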
### DVWA Validation
```text
P=1.00 R=1.00 F1=1.00
Session stability: 0 variance across 3 runs
```
## API
Interactive docs: `http://localhost:8000/docs` (Swagger UI).
### Authentication
```text
X-API-Key: scan_<48_hex_chars>
Authorization: Bearer <jwt_access_token>
```
### Key Endpoints
| Method | Path | Description |
|---|---|---|
| GET | /health | Health check |
| GET | /metrics | Prometheus metrics |
| POST | /api/v1/scans | Start a scan |
| GET | /api/v1/scans | List scans |
| GET | /api/v1/scans/{id} | Get scan result |
| POST | /api/v1/scans/{id}/cancel | Cancel scan |
| POST | /api/v1/auth/login | Login (JWT) |
| POST | /api/v1/auth/refresh | Refresh token |
| POST | /api/v1/auth/logout | Logout |
| POST | /api/v1/api-keys | Create API key |
| GET | /api/v1/billing/plan | Get current plan |
| POST | /api/v1/billing/create-checkout | Upgrade plan |
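
As a rough illustration of calling the API with an API key, the sketch below builds a `POST /api/v1/scans` request using only the standard library. The JSON body field `url` is an assumption (check the Swagger UI for the real schema), and `build_scan_request` is a hypothetical helper.

```python
import json
import urllib.request


def build_scan_request(base_url: str, api_key: str,
                       target_url: str) -> urllib.request.Request:
    """Build (but do not send) a request that starts a scan."""
    # ASSUMPTION: the request body shape is not documented in this README.
    body = json.dumps({"url": target_url}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/scans",
        data=body,
        method="POST",
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
    )


req = build_scan_request("http://localhost:8000", "scan_" + "0" * 48,
                         "http://example.com")
```

With the server running, `urllib.request.urlopen(req)` would send it.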
## Test Suite
```bash
# All tests
PYTHONPATH=src pytest tests/

# Specific suites
PYTHONPATH=src pytest tests/test_platform.py
PYTHONPATH=src pytest tests/test_harden.py
PYTHONPATH=src pytest tests/test_benchmark.py
PYTHONPATH=src pytest tests/test_safety.py
```
## Project Structure
```text
src/web_scanner/
├── scanner.py
├── __init__.py
├── log.py
├── reporter.py
├── core/
│   ├── engine.py
│   ├── auth.py
│   ├── scorer.py
│   ├── verifier.py
│   ├── evidence.py
│   ├── signals.py
│   ├── jwt.py
│   └── taskqueue.py
├── crawler/
│   └── spider.py
├── modules/
│   ├── xss.py
│   ├── sqli.py
│   ├── ssrf.py
│   └── dom.py
├── discovery/
│   ├── spa_discovery.py
│   └── schema.py
├── metrics/
│   └── engine.py
├── intel/
│   ├── priority.py
│   ├── explain.py
│   └── report_v2.py
├── api/
│   ├── server.py
│   ├── auth.py
│   ├── dashboard.py
│   ├── queue.py
│   ├── scope.py
│   ├── schemas.py
│   ├── rate_governor.py
│   ├── rbac.py
│   ├── billing.py
│   ├── metrics.py
│   ├── tracing.py
│   └── stripe_integration.py
├── workers/
│   ├── celery_app.py
│   ├── distributed_queue.py
│   └── crawl_shard.py
├── exploit/
│   ├── chainer.py
│   ├── post_exploit.py
│   ├── credential_abuse.py
│   └── runner.py
└── proxy/
    └── server.py
deploy/
├── entrypoint.sh
├── nginx.conf
├── prometheus.yml
├── otel-collector.yml
├── grafana-datasource.yml
└── env/
    ├── .env.dev
    ├── .env.stage
    └── .env.prod
tests/
├── test_benchmark.py
├── test_scanner.py
├── test_safety.py
├── test_integration.py
├── test_platform.py
├── test_harden.py
├── test_stress.py
├── test_server.py
├── test_realworld_server.py
└── test_juice_shop_spa.py
```
## Configuration
| Variable | Default | Description |
|---|---|---|
| `SCANNER_DATABASE_URL` | `sqlite:///scanner.db` | Database URL |
| `SCANNER_REDIS_URL` | `redis://localhost:6379/0` | Redis URL |
| `SCANNER_JWT_SECRET` | *(required)* | JWT signing secret |
| `SCANNER_JWT_ALGORITHM` | `HS256` | JWT algorithm |
| `SCANNER_JWT_ACCESS_EXPIRE_MINUTES` | `15` | Access token TTL |
| `SCANNER_JWT_REFRESH_EXPIRE_DAYS` | `7` | Refresh token TTL |
| `SCANNER_MAX_CONCURRENT` | `10` | Max concurrent scans per org |
| `SCANNER_RATE_PER_MINUTE` | `30` | Max API requests per minute |
| `SCANNER_STRIPE_SECRET_KEY` | *(empty)* | Stripe API key |
| `OTEL_EXPORTER_OTLP_ENDPOINT` | *(empty)* | OpenTelemetry endpoint |
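
The table above can be read as environment lookups with defaults. The sketch below shows one way to load these values; it is not the project's settings module, and the `Settings` class and `load_settings` function are hypothetical. Variable names and defaults are taken from the table.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    database_url: str
    redis_url: str
    jwt_secret: str
    jwt_algorithm: str
    jwt_access_expire_minutes: int
    jwt_refresh_expire_days: int
    max_concurrent: int
    rate_per_minute: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read SCANNER_* variables, falling back to the documented defaults."""
    secret = env.get("SCANNER_JWT_SECRET", "")
    if not secret:
        # The table marks this variable as required, so fail fast.
        raise RuntimeError("SCANNER_JWT_SECRET must be set")
    return Settings(
        database_url=env.get("SCANNER_DATABASE_URL", "sqlite:///scanner.db"),
        redis_url=env.get("SCANNER_REDIS_URL", "redis://localhost:6379/0"),
        jwt_secret=secret,
        jwt_algorithm=env.get("SCANNER_JWT_ALGORITHM", "HS256"),
        jwt_access_expire_minutes=int(env.get("SCANNER_JWT_ACCESS_EXPIRE_MINUTES", "15")),
        jwt_refresh_expire_days=int(env.get("SCANNER_JWT_REFRESH_EXPIRE_DAYS", "7")),
        max_concurrent=int(env.get("SCANNER_MAX_CONCURRENT", "10")),
        rate_per_minute=int(env.get("SCANNER_RATE_PER_MINUTE", "30")),
    )
```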
## License
MIT