LevaAverGit/mini-siem-detection-lab-v2
# Mini SIEM Detection Lab
[CI workflow](https://github.com/LevaAverGit/mini-siem-detection-lab/actions/workflows/ci.yml)
A lab-grade detection pipeline that simulates a SOC monitoring workflow:
event source → log ingestion → normalization → detection rules → alerts → incident grouping → report → analyst playbook.
Built to demonstrate Python backend, security engineering, and SOC workflow skills. Not a production SIEM.
## What This Project Demonstrates
- **Event pipeline design** — four log sources → unified normalized Event model → detection engine → alert/incident lifecycle
- **FastAPI backend** — ingest, list, triage, and report endpoints with Pydantic v2 models throughout
- **Detection rule engine** — 9 rules loaded from YAML, deterministic, no ML or external API
- **Incident grouping** — alerts correlated by shared source IP into incidents with timeline and entity tracking
- **SQLite persistence** — schema-first init, per-test isolation via `tmp_path`
- **CLI tool** — `ingest`, `demo`, `alerts list`, `incidents list`, `incidents report` commands
- **124 tests, 0 warnings** — unit tests for each service layer, API tests via `httpx.ASGITransport`
- **Structured reporting** — Markdown and JSON incident reports
## Architecture

    Log Sources (4)                             Detection Engine
      linux_auth.log    ─┐                      ┌── SSH Brute Force (threshold-based)
      nginx_access.log  ─┤                      ├── Brute Force Success
      windows_sec.jsonl ─┼─► Normalize ──►      ├── Web Dir Scanning
      cloud_audit.jsonl ─┘   (Event)            ├── Sensitive Path Access
                                                ├── Suspicious User Agent
    POST /events/ingest                         ├── Windows Account Created
            ↓                                   ├── Cloud SG Opened to 0.0.0.0/0
    Normalization Service                       ├── IAM Change After Login Failure
            ↓                                   └── Multi-Source Suspicious IP
    Detection Engine (9 rules)
            ↓                                   Storage
    Alert List                                  SQLite (events / alerts / incidents)
            ↓
    Incident Grouping (by source_ip)
            ↓
    Incident + Timeline
            ↓
    Report (Markdown / JSON)

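To make the normalization step in the diagram concrete, here is a minimal sketch of how one raw `linux_auth` line could be mapped to a unified event. The `Event` fields, the `ssh_login_failed` action name, and the regular expression are illustrative assumptions, not the project's actual schema. The project uses Pydantic v2 models; a standard-library dataclass is swapped in here only to keep the sketch dependency-free. The sample line follows the common OpenSSH log format and is not taken from the lab's sample data.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:
    """Hypothetical unified event; field names are assumptions."""
    source_type: str
    action: str
    source_ip: Optional[str]
    user: Optional[str]
    raw: str


# Matches the usual OpenSSH "Failed password" message (assumed format).
SSHD_FAILED = re.compile(
    r"Failed password for (?:invalid user )?(?P<user>\S+) "
    r"from (?P<ip>\d{1,3}(?:\.\d{1,3}){3})"
)


def normalize_linux_auth(line: str) -> Optional[Event]:
    """Return an Event for a failed SSH login, or None if the line does not match."""
    match = SSHD_FAILED.search(line)
    if match is None:
        return None  # unrecognized lines are skipped rather than raising
    return Event(
        source_type="linux_auth",
        action="ssh_login_failed",
        source_ip=match["ip"],
        user=match["user"],
        raw=line.strip(),
    )


sample = (
    "Jan 10 03:12:01 web01 sshd[1234]: Failed password for invalid user "
    "admin from 203.0.113.99 port 52144 ssh2"
)
event = normalize_linux_auth(sample)
```

Returning `None` for unmatched lines keeps the parser total, so a caller can count skipped lines instead of handling exceptions.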
## Quickstart

    # Install
    python3.11 -m venv .venv && .venv/bin/pip install -r requirements-dev.txt

    # Or with make
    make install

    # Run all tests
    make test

    # Run full demo (ingest all sample logs, show results)
    make demo

## Demo

    $ make demo
    ============================================================
    Mini SIEM Detection Lab — Demo Run
    ============================================================
    Ingested 129 events from sample_logs/linux_auth.log
    Ingested 118 events from sample_logs/nginx_access.log
    Ingested 12 events from sample_logs/windows_security.jsonl
    Ingested 8 events from sample_logs/cloud_audit.jsonl

    Total events ingested : 267
    Total skipped         : 0
    Alerts generated      : 22
    Incidents created     : 8

    Alert breakdown by severity:
      CRITICAL : 5
      HIGH     : 11
      MEDIUM   : 6

    Incidents:
      [INC-0001] [CRITICAL] Critical Incident — 203.0.113.99
      [INC-0002] [HIGH] High Incident — 192.0.2.150
      ...

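The incident lines in the demo output come from grouping alerts by source IP. A minimal sketch of that idea follows, assuming hypothetical `Alert` and `Incident` shapes and assuming that an incident takes the highest severity among its alerts. The real logic in `incident_grouping_service.py` may differ, for example in how it scores incidents or builds timelines.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Severity ranking used to pick the worst alert in a group (assumed ordering).
SEVERITY_ORDER = ["low", "medium", "high", "critical"]


@dataclass
class Alert:
    rule_id: str
    severity: str
    source_ip: str


@dataclass
class Incident:
    incident_id: str
    source_ip: str
    severity: str
    alerts: list = field(default_factory=list)


def group_alerts_by_ip(alerts):
    """Group alerts that share a source IP into one incident each."""
    by_ip = defaultdict(list)
    for alert in alerts:
        by_ip[alert.source_ip].append(alert)

    incidents = []
    # Dicts keep insertion order, so IDs follow first appearance of each IP.
    for number, (ip, group) in enumerate(by_ip.items(), start=1):
        worst = max(group, key=lambda a: SEVERITY_ORDER.index(a.severity))
        incidents.append(Incident(f"INC-{number:04d}", ip, worst.severity, group))
    return incidents


incidents = group_alerts_by_ip([
    Alert("SSH_BRUTE_FORCE", "high", "203.0.113.99"),
    Alert("WEB_DIR_SCAN", "high", "192.0.2.150"),
    Alert("SSH_BRUTE_FORCE_SUCCESS", "critical", "203.0.113.99"),
])
```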
## API Overview
| Endpoint | Method | Description |
|---|---|---|
| `/health` | GET | Liveness check |
| `/events/ingest` | POST | Ingest raw log content |
| `/events/` | GET | List events (filter: `source_type`, `limit`) |
| `/alerts/` | GET | List alerts (filter: `status`, `severity`) |
| `/alerts/{id}/status` | PATCH | Update alert triage status |
| `/incidents/` | GET | List incidents (filter: `status`, `severity`) |
| `/incidents/{id}` | GET | Get incident detail |
| `/incidents/{id}/report.md` | GET | Markdown report |
| `/incidents/{id}/report.json` | GET | JSON report |
See `docs/API_OVERVIEW.md` for request/response examples.
**Start the API:**

    make run-api
    # FastAPI available at http://127.0.0.1:8000
    # Docs at http://127.0.0.1:8000/docs

## CLI Usage

    # Ingest a log file
    python -m cli.main ingest --source linux_auth --file sample_logs/linux_auth.log

    # Run full demo
    python -m cli.main demo

    # List alerts
    python -m cli.main alerts list

    # List incidents
    python -m cli.main incidents list

    # Export incident report
    python -m cli.main incidents report --id INC-0001 --format md --output reports/INC-0001.md
    python -m cli.main incidents report --id INC-0001 --format json

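The command surface above can be modelled with nested subcommands. The sketch below uses `argparse` from the standard library; this README does not say which CLI framework `cli/main.py` is built on, so treat the parser layout, the `command` and `action` destination names, and the choice to make `--format` required as assumptions.

```python
import argparse

SOURCES = ["linux_auth", "nginx_access", "windows_security", "cloud_audit"]


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented commands (illustrative only)."""
    parser = argparse.ArgumentParser(prog="cli.main")
    commands = parser.add_subparsers(dest="command", required=True)

    ingest = commands.add_parser("ingest")
    ingest.add_argument("--source", required=True, choices=SOURCES)
    ingest.add_argument("--file", required=True)

    commands.add_parser("demo")

    alerts = commands.add_parser("alerts").add_subparsers(dest="action", required=True)
    alerts.add_parser("list")

    incidents = commands.add_parser("incidents").add_subparsers(dest="action", required=True)
    incidents.add_parser("list")
    report = incidents.add_parser("report")
    report.add_argument("--id", required=True)
    report.add_argument("--format", required=True, choices=["md", "json"])
    report.add_argument("--output")  # omitted in the JSON example above
    return parser


args = build_parser().parse_args(
    ["incidents", "report", "--id", "INC-0001", "--format", "json"]
)
```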
## Detection Rules
| Rule ID | Source | Logic | Severity | MITRE Technique |
|---|---|---|---|---|
| `SSH_BRUTE_FORCE` | linux_auth | >= 10/30/100 failed SSH logins from same IP | medium/high/critical | T1110.001 |
| `SSH_BRUTE_FORCE_SUCCESS` | linux_auth | Same IP: >= 5 failures then accepted login | critical | T1078 |
| `WEB_DIR_SCAN` | nginx_access | >= 30/80 HTTP 404s from same IP | medium/high | T1595.002 |
| `SENSITIVE_PATH_ACCESS` | nginx_access | Access to /.env, /.git, /admin, /phpmyadmin | medium/high | T1083 |
| `SUSPICIOUS_USER_AGENT` | nginx_access | sqlmap, nikto, gobuster, masscan, dirbuster | medium/high | T1595 |
| `WIN_ACCOUNT_CREATED_AFTER_FAILURES` | windows_security | 4720 follows multiple 4625 on same host | high | T1136.001 |
| `CLOUD_SG_OPEN` | cloud_audit | SG rule 0.0.0.0/0 on port 22/3389/5432/3306 | high/critical | T1562.007 |
| `CLOUD_IAM_CHANGE_AFTER_FAILURE` | cloud_audit | IAM policy change by user with recent login failures | high | T1098 |
| `MULTI_SOURCE_SUSPICIOUS_IP` | all | Same IP in suspicious events across 2+ source types | critical | Multiple |
All rules are configurable via `app/rules/default_rules.yml`. Each rule includes a MITRE ATT&CK tactic, technique, and mapping confidence (`direct` or `approximate`). See `docs/DETECTION_RULES.md` for per-rule mapping notes.
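As a concrete reading of two rows in the table, the sketch below implements the tiered `SSH_BRUTE_FORCE` thresholds (10/30/100) and the `SSH_BRUTE_FORCE_SUCCESS` sequence check (at least 5 failures, then an accepted login). The `(source_ip, action)` tuple format and the action names are assumptions made for illustration, and the thresholds are hard-coded here rather than loaded from `app/rules/default_rules.yml`.

```python
from collections import Counter

# Tiers from the rules table, highest threshold first.
SSH_TIERS = [(100, "critical"), (30, "high"), (10, "medium")]


def severity_for_count(count, tiers=SSH_TIERS):
    """Return the severity of the highest tier reached, or None below all tiers."""
    for threshold, severity in tiers:
        if count >= threshold:
            return severity
    return None


def ssh_brute_force(events):
    """Map each offending IP to a severity based on its failed-login count."""
    failures = Counter(ip for ip, action in events if action == "ssh_login_failed")
    result = {}
    for ip, count in failures.items():
        severity = severity_for_count(count)
        if severity is not None:
            result[ip] = severity
    return result


def brute_force_success(events, min_failures=5):
    """Return IPs whose accepted login follows at least min_failures failures.

    Events are assumed to be in chronological order.
    """
    failures = Counter()
    flagged = []
    for ip, action in events:
        if action == "ssh_login_failed":
            failures[ip] += 1
        elif action == "ssh_login_accepted":
            if failures[ip] >= min_failures and ip not in flagged:
                flagged.append(ip)
    return flagged


events = (
    [("203.0.113.99", "ssh_login_failed")] * 30
    + [("203.0.113.99", "ssh_login_accepted")]
    + [("192.0.2.150", "ssh_login_failed")] * 9
)
alerts_by_ip = ssh_brute_force(events)
compromised = brute_force_success(events)
```

Both functions count over the whole input with no time window, which matches the static-threshold behaviour listed under Limitations.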
## Sigma-Style Rules
The `sigma_rules/` directory contains Sigma-format YAML examples that map the lab's detection logic to the [Sigma](https://sigmahq.io/) open standard:
| Sigma rule file | Lab rule mapped | MITRE technique |
|---|---|---|
| `sigma_rules/ssh_brute_force.yml` | SSH_BRUTE_FORCE | T1110.001 |
| `sigma_rules/web_path_traversal_scan.yml` | WEB_DIR_SCAN, SENSITIVE_PATH_ACCESS, SUSPICIOUS_USER_AGENT | T1595.002, T1083, T1595 |
| `sigma_rules/windows_failed_logons_account_creation.yml` | WIN_ACCOUNT_CREATED_AFTER_FAILURES | T1136.001, T1078 |
## Example Incident Report
`reports/example_incident_report.md` contains a synthetic SOC analyst incident report demonstrating the output format for an SSH brute-force-with-compromise scenario:
- Full timeline from first failed login to successful compromise
- MITRE ATT&CK tactic/technique chain (T1110.001 → T1078)
- Evidence from correlated rules
- Recommended response steps
- False positive assessment
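A report with this general shape could be rendered from an incident record roughly as follows. The dictionary keys, the section headings, and the placeholder timeline entries are hypothetical and do not reproduce the layout of `reports/example_incident_report.md` or the output of `report_service.py`.

```python
import json


def render_markdown(incident: dict) -> str:
    """Render a hypothetical incident dict as a small Markdown report."""
    lines = [
        f"# Incident {incident['id']}",
        "",
        f"**Severity:** {incident['severity']}",
        "",
        "## Timeline",
    ]
    lines += [f"- {timestamp} {text}" for timestamp, text in incident["timeline"]]
    lines += ["", "## MITRE ATT&CK", " -> ".join(incident["techniques"])]
    return "\n".join(lines)


def render_json(incident: dict) -> str:
    """Serialize the same record as indented JSON."""
    return json.dumps(incident, indent=2)


# Placeholder data for illustration; timestamps are symbolic, not real log times.
example = {
    "id": "INC-0001",
    "severity": "critical",
    "timeline": [
        ["t0", "first failed SSH login"],
        ["t1", "accepted SSH login from the same IP"],
    ],
    "techniques": ["T1110.001", "T1078"],
}
markdown_report = render_markdown(example)
json_report = render_json(example)
```

Rendering both formats from one record keeps the Markdown and JSON reports consistent with each other.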
## Sample Incident Workflow
1. Ingest the first log source

       python -m cli.main ingest --source linux_auth --file sample_logs/linux_auth.log

2. Ingest the remaining sources

       python -m cli.main ingest --source nginx_access --file sample_logs/nginx_access.log
       python -m cli.main ingest --source windows_security --file sample_logs/windows_security.jsonl
       python -m cli.main ingest --source cloud_audit --file sample_logs/cloud_audit.jsonl

3. Review incidents

       python -m cli.main incidents list

4. Export a report

       python -m cli.main incidents report --id INC-0001 --format md --output reports/report.md

5. Triage an alert via the API (replace `{alert_id}` with a real alert ID)

       curl -X PATCH http://127.0.0.1:8000/alerts/{alert_id}/status \
         -H "Content-Type: application/json" \
         -d '{"status": "triaged"}'

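The same triage call can be made from Python. The sketch below builds the request with `urllib` from the standard library, using the URL, method, header, and body from the `curl` example; `build_status_update` is a hypothetical helper name, and the response format is not documented here, so the sending step is left as a comment.

```python
import json
import urllib.request


def build_status_update(alert_id: str, status: str,
                        base_url: str = "http://127.0.0.1:8000") -> urllib.request.Request:
    """Build a PATCH /alerts/{id}/status request without sending it."""
    body = json.dumps({"status": status}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/alerts/{alert_id}/status",
        data=body,
        method="PATCH",
        headers={"Content-Type": "application/json"},
    )


request = build_status_update("example-alert-id", "triaged")

# Sending requires the API to be running (make run-api):
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```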
## Tests

    make test   # 124 tests

| Test module | Coverage |
|---|---|
| `test_normalization.py` | Linux auth, Nginx, Windows, Cloud parsers; malformed lines; file-level ingestion |
| `test_detection_engine.py` | All 9 rules; threshold boundaries; severity escalation; multi-source correlation |
| `test_incident_grouping.py` | IP grouping; severity escalation; timeline; entity collection; score cap |
| `test_storage_service.py` | Insert/list/update for events/alerts/incidents; DB isolation via `tmp_path` |
| `test_api_events.py` | Health, ingest, list endpoints; source_type filter |
| `test_api_alerts.py` | Alert list, status update, incidents list, report MD/JSON |
| `test_cli.py` | Demo, ingest, report export |
| `test_report_service.py` | Markdown sections, JSON structure, AI trace check |
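The `tmp_path` isolation mentioned for `test_storage_service.py` can be sketched as below: each test creates its own SQLite file inside the per-test directory that pytest provides, so no state leaks between tests. The one-table schema and this `init_db` function are simplified stand-ins, not the project's actual `schema.sql` or `database.py`.

```python
import sqlite3
from pathlib import Path

# Simplified stand-in schema (assumption); the real tables live in schema.sql.
SCHEMA = """
CREATE TABLE IF NOT EXISTS alerts (
    id INTEGER PRIMARY KEY,
    rule_id TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'new'
);
"""


def init_db(db_path: Path) -> sqlite3.Connection:
    """Create the schema in a fresh database file and return the connection."""
    conn = sqlite3.connect(db_path)
    conn.executescript(SCHEMA)
    return conn


def test_alert_insert_is_isolated(tmp_path: Path) -> None:
    # pytest injects tmp_path, a directory unique to this test invocation.
    conn = init_db(tmp_path / "siem.db")
    conn.execute("INSERT INTO alerts (rule_id) VALUES (?)", ("SSH_BRUTE_FORCE",))
    conn.commit()
    rows = conn.execute("SELECT rule_id, status FROM alerts").fetchall()
    conn.close()
    # Exactly one row: nothing left over from any other test.
    assert rows == [("SSH_BRUTE_FORCE", "new")]
```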
## Project Structure

    mini-siem-detection-lab/
    ├── app/
    │   ├── main.py                           FastAPI app factory, lifespan
    │   ├── api/
    │   │   ├── routes_events.py              POST /events/ingest, GET /events/
    │   │   ├── routes_alerts.py              GET/PATCH /alerts/
    │   │   ├── routes_incidents.py           GET /incidents/, reports
    │   │   └── routes_health.py              GET /health
    │   ├── core/
    │   │   ├── config.py                     pydantic-settings (SIEM_ prefix)
    │   │   └── logging.py                    Structured JSON logging
    │   ├── db/
    │   │   ├── database.py                   SQLite connection, init_db
    │   │   └── schema.sql                    CREATE TABLE statements
    │   ├── models/
    │   │   └── schemas.py                    Event, Alert, Incident, Pydantic v2
    │   ├── services/
    │   │   ├── normalization_service.py      4 source parsers → unified Event
    │   │   ├── detection_engine.py           9 detection rules → Alert list
    │   │   ├── incident_grouping_service.py  Alert → Incident (by IP)
    │   │   ├── storage_service.py            SQLite CRUD
    │   │   ├── report_service.py             Markdown + JSON report generation
    │   │   └── ingestion_service.py          File-level ingest orchestration
    │   └── rules/
    │       └── default_rules.yml             Detection rule thresholds and config
    ├── cli/
    │   └── main.py                           CLI: ingest, demo, alerts, incidents
    ├── sample_logs/
    │   ├── linux_auth.log                    129 synthetic Linux auth events
    │   ├── nginx_access.log                  118 synthetic Nginx access events
    │   ├── windows_security.jsonl            12 synthetic Windows Security events
    │   └── cloud_audit.jsonl                 8 synthetic cloud audit events
    ├── tests/                                124 tests
    ├── docs/                                 11 documentation files
    ├── .github/workflows/ci.yml              GitHub Actions CI
    ├── Makefile
    ├── pyproject.toml
    └── requirements.txt

## What This Is Not
- Not a production SIEM
- Not a replacement for Wazuh, MaxPatrol SIEM, KUMA, Splunk, or ELK
- Not a real-time distributed event processing system
- Not an ML/UEBA anomaly detection system
- Not an EDR, DLP, PAM, or NGFW
- Not agent-based log collection
- Not a legal compliance product
## Limitations
See `docs/LIMITATIONS.md` for a full list. Key points:
- All data is synthetic — no real systems are involved
- Detection rules use static thresholds without time-windowing
- Single-threaded processing — not designed for high-volume ingestion
- No authentication on API endpoints (local lab use only)
- No real-time streaming
## Skills Demonstrated
See `docs/BIGTECH_SKILLS_MAPPING.md` for a full competency mapping
and `docs/SOC_INTERVIEW_DEFENSE.md` for interview talking points and scope.
This project demonstrates junior to junior+ readiness for security tooling, SOC automation, and Python backend tasks:
- Python 3.11, FastAPI, Pydantic v2, SQLite, pytest
- Event-driven pipeline thinking
- Detection rule engineering with rule-level MITRE ATT&CK mapping (direct/approximate confidence)
- SOC alert lifecycle (new → triaged → escalated → closed)
- Structured Markdown and JSON reporting
- CLI tool design
- Test isolation with `tmp_path` and `ASGITransport`
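The alert lifecycle listed above (new → triaged → escalated → closed) can be expressed as a small transition table. This sketch assumes a strictly linear lifecycle in which each status may only move to the next one; whether the project also allows shortcuts such as closing a triaged alert directly is not stated here, and `next_status` is a hypothetical helper name.

```python
# Allowed next statuses for each status (strictly linear; an assumption).
ALLOWED_TRANSITIONS = {
    "new": {"triaged"},
    "triaged": {"escalated"},
    "escalated": {"closed"},
    "closed": set(),
}


def next_status(current: str, target: str) -> str:
    """Return target if the move is allowed, otherwise raise ValueError."""
    if current not in ALLOWED_TRANSITIONS:
        raise ValueError(f"unknown status: {current!r}")
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"cannot move alert from {current!r} to {target!r}")
    return target


status = "new"
for step in ("triaged", "escalated", "closed"):
    status = next_status(status, step)
```

Keeping the rules in one dictionary means a relaxed policy only requires editing the table, not the validation code.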
## License
MIT