shehrozmajeed/soc-log-analyzer

GitHub: shehrozmajeed/soc-log-analyzer

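A lightweight, AI-driven SOC log analysis platform that combines rule-based detection with Isolation Forest anomaly detection to automatically discover and visualize threats in Apache, SSH, and syslog logs.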


```
███████╗ ██████╗  ██████╗    ██╗      ██████╗  ██████╗
██╔════╝██╔═══██╗██╔════╝    ██║     ██╔═══██╗██╔════╝
███████╗██║   ██║██║         ██║     ██║   ██║██║  ███╗
╚════██║██║   ██║██║         ██║     ██║   ██║██║   ██║
███████║╚██████╔╝╚██████╗    ███████╗╚██████╔╝╚██████╔╝
╚══════╝ ╚═════╝  ╚═════╝    ╚══════╝ ╚═════╝  ╚═════╝
                    A N A L Y Z E R
```

### **AI-Powered Security Operations Center: Log Intelligence Platform**

*Parse. Detect. Visualize. Respond.*

[🚀 Quick Start](#-quick-start) · [📐 Architecture](#-architecture) · [🔍 Detection Engine](#-detection-engine) · [📊 Dashboard](#-dashboard) · [🌐 API Reference](#-api-reference) · [⚙️ Configuration](#%EF%B8%8F-configuration)

## 📌 Overview

**SOC Log Analyzer** is a production-grade, full-stack cybersecurity tool that turns raw server logs into actionable threat intelligence. It combines deterministic rule-based detection with unsupervised machine learning to surface brute-force attacks, DDoS patterns, suspicious HTTP behavior, and zero-day behavioral anomalies, all presented through a real-time analyst dashboard.

The system is designed as a **lightweight, transparent alternative to commercial SIEM platforms (Splunk, IBM QRadar, Microsoft Sentinel)** for academic settings, small and medium-sized organizations, and security research.

```
Raw Logs ──► Parser ──► Detection Engine ──► Risk Scorer ──► Dashboard
           (3 formats)    (Rules + ML)       (0–100 score)   (Streamlit)
```

### Why This Project Exists

## 🎯 Core Capabilities

| Capability | Description |
|---|---|
| **Multi-format log parsing** | Apache Combined Log, OpenSSH auth.log, RFC 3164 syslog, with automatic format detection |
| **Rule-based detection** | 5 deterministic detectors using sliding time-window analysis |
| **ML anomaly detection** | Isolation Forest over 7-dimensional per-IP behavioral feature vectors |
| **Risk scoring** | Quantitative 0–100 score plus qualitative LOW / MEDIUM / HIGH severity |
| **SOC dashboard** | 5-page Streamlit UI with Plotly charts, alert triage, and a live log stream |
| **GeoIP enrichment** | Country/city/ISP lookup via ip-api.com, with an LRU cache |
| **REST API** | 12 documented FastAPI endpoints with OpenAPI/Swagger UI |
| **Export** | CSV and PDF report generation for incident documentation |
| **Live streaming** | Server-Sent Events (SSE) endpoint for real-time log replay |

## 📐 Architecture

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                              SOC LOG ANALYZER                               │
├──────────────────────────────┬──────────────────────────────────────────────┤
│     STREAMLIT DASHBOARD      │               FASTAPI BACKEND                │
│                              │                                              │
│  ┌────────────────────────┐  │   ┌──────────────────────────────────────┐   │
│  │  📊 Dashboard          │  │   │              API LAYER               │   │
│  │  🚨 Alert Management   │◄─┼──►│  POST   /logs/upload                 │   │
│  │  📋 Log Explorer       │  │   │  GET    /logs/stats                  │   │
│  │  📤 Upload Center      │  │   │  GET    /logs/stream (SSE)           │   │
│  │  📈 Reports & Export   │  │   │  GET    /alerts/                     │   │
│  └────────────────────────┘  │   │  GET    /alerts/summary              │   │
│                              │   │  PATCH  /alerts/{id}/resolve         │   │
│                              │   │  GET    /reports/alerts/csv|pdf      │   │
│                              │   └─────────────┬────────────────────────┘   │
│                              │                 │                            │
│                              │   ┌─────────────▼────────────────────────┐   │
│                              │   │             PARSER LAYER             │   │
│                              │   │                                      │   │
│                              │   │  ┌──────────┐  ┌────────┐  ┌───────┐ │   │
│                              │   │  │  Apache  │  │  SSH   │  │Syslog │ │   │
│                              │   │  │  Parser  │  │ Parser │  │Parser │ │   │
│                              │   │  └──────────┘  └────────┘  └───────┘ │   │
│                              │   │            Auto-Detection            │   │
│                              │   │          Normalizer + GeoIP          │   │
│                              │   └─────────────┬────────────────────────┘   │
│                              │                 │                            │
│                              │   ┌─────────────▼────────────────────────┐   │
│                              │   │           DETECTION ENGINE           │   │
│                              │   │                                      │   │
│                              │   │    RULE ENGINE        ML ENGINE      │   │
│                              │   │  ┌──────────────┐  ┌──────────────┐  │   │
│                              │   │  │ Brute-Force  │  │  Isolation   │  │   │
│                              │   │  │ DDoS Detect  │  │    Forest    │  │   │
│                              │   │  │ Status Spike │  │  (7-feature  │  │   │
│                              │   │  │ Susp. Paths  │  │ IP vectors)  │  │   │
│                              │   │  │ Syslog Anom. │  └──────────────┘  │   │
│                              │   │  └──────────────┘                    │   │
│                              │   │         Risk Scorer (0–100)          │   │
│                              │   └─────────────┬────────────────────────┘   │
│                              │                 │                            │
│                              │   ┌─────────────▼────────────────────────┐   │
│                              │   │          PERSISTENCE LAYER           │   │
│                              │   │       SQLite · SQLAlchemy ORM        │   │
│                              │   │         log_entries · alerts         │   │
│                              │   └──────────────────────────────────────┘   │
└──────────────────────────────┴──────────────────────────────────────────────┘
```

### Folder Structure

```
soc-log-analyzer/
│
├── backend/                     # Python FastAPI application
│   ├── main.py                  # App entry point, CORS, startup lifecycle
│   ├── config.py                # All thresholds, constants, env overrides
│   ├── database.py              # SQLAlchemy models: LogEntry, Alert
│   │
│   ├── parser/
│   │   ├── log_parser.py        # Apache / SSH / syslog regex parsers
│   │   └── normalizer.py        # ORM conversion + GeoIP enrichment
│   │
│   ├── detection/
│   │   ├── rule_engine.py       # 5 rule-based threat detectors
│   │   ├── ml_engine.py         # Isolation Forest anomaly detection
│   │   └── risk_scorer.py       # Score normalization + alert persistence
│   │
│   ├── api/
│   │   ├── routes_logs.py       # /logs/* endpoints (upload, list, stats, stream)
│   │   ├── routes_alerts.py     # /alerts/* endpoints (CRUD, resolve, summary)
│   │   └── routes_reports.py    # /reports/* (CSV and PDF export)
│   │
│   └── utils/
│       ├── geoip.py             # ip-api.com lookup with LRU cache
│       └── logger.py            # Structured logging with consistent format
│
├── frontend/
│   └── dashboard.py             # Streamlit SOC dashboard (5 pages)
│
├── data/
│   ├── sample_apache.log        # Apache access log with attack patterns
│   ├── sample_ssh.log           # SSH auth log with brute-force sequences
│   └── sample_syslog.log        # Syslog with privilege escalation events
│
├── reports/                     # Generated CSV / PDF exports (auto-created)
├── soc_alerts.db                # SQLite database (auto-created on first run)
├── requirements.txt
└── README.md
```

## 🚀 Quick Start

### Prerequisites

| Requirement | Version | Notes |
|---|---|---|
| Python | ≥ 3.11 | 3.12 recommended |
| pip | ≥ 23.0 | Bundled with Python |
| Git | Any | For cloning the repository |

### 1 · Clone the Repository

```
git clone https://github.com/your-org/soc-log-analyzer.git
cd soc-log-analyzer
```

### 2 · Create a Virtual Environment

```
# macOS / Linux
python -m venv .venv
source .venv/bin/activate

# Windows (PowerShell)
python -m venv .venv
.venv\Scripts\Activate.ps1
```

### 3 · Install Dependencies

```
pip install -r requirements.txt
```

### 4 · Start the Backend API

```
uvicorn backend.main:app --reload --host 0.0.0.0 --port 8000
```

On startup you should see:

```
INFO backend.main: SOC Log Analyzer API started. DB initialised.
INFO Uvicorn running on http://0.0.0.0:8000
```

Visit **[http://localhost:8000/docs](http://localhost:8000/docs)** for the interactive Swagger UI.

### 5 · Launch the Dashboard

Open a second terminal (with the virtual environment activated):

```
streamlit run frontend/dashboard.py
```

Visit **[http://localhost:8501](http://localhost:8501)** to open the SOC dashboard.

### 6 · Load Sample Data

**Via the dashboard:** navigate to **📤 Upload Logs** and click one of the three sample dataset buttons:

- 🌐 Apache log
- 🔑 SSH log
- ⚙️ Syslog

**Via cURL:**

```
# Upload the Apache log
curl -X POST "http://localhost:8000/logs/upload" \
  -F "file=@data/sample_apache.log" \
  -F "log_type=apache"

# Upload the SSH log
curl -X POST "http://localhost:8000/logs/upload" \
  -F "file=@data/sample_ssh.log" \
  -F "log_type=ssh"

# Upload the syslog
curl -X POST "http://localhost:8000/logs/upload" \
  -F "file=@data/sample_syslog.log" \
  -F "log_type=syslog"
```

**Expected output for the sample Apache log:**

```
{
  "filename": "sample_apache.log",
  "log_type": "apache",
  "lines_parsed": 38,
  "entries_saved": 38,
  "alerts_generated": 9,
  "alert_summary": {
    "HIGH": 5,
    "MEDIUM": 4,
    "LOW": 0
  }
}
```

## 🔍 Detection Engine

### Rule-Based Detectors

All rule-based detectors use a **sliding time-window** algorithm. For each unique source IP, events are sorted chronologically and a fixed-duration window slides along the timeline. If the event count within any window meets or exceeds the threshold, an alert is generated.

#### Brute-Force Login Detection

Monitors rapid sequences of authentication failures from a single IP, covering SSH (`Failed password`, `Invalid user`) and HTTP (`POST /login` returning `401`).

| Parameter | Default | Description |
|---|---|---|
| `BRUTE_FORCE_THRESHOLD` | `5` | Minimum number of failed logins that triggers an alert |
| `BRUTE_FORCE_WINDOW_SECS` | `60` | Sliding time window (seconds) |
| Base risk score | `70 + (count × 2)` | Capped at 100 |

**Trigger example:**

```
10.0.0.45 - [13:55:40] "POST /login" 401
10.0.0.45 - [13:55:41] "POST /login" 401
10.0.0.45 - [13:55:42] "POST /login" 401   ← 3
10.0.0.45 - [13:55:43] "POST /login" 401
10.0.0.45 - [13:55:44] "POST /login" 401   ← 5 → BRUTE_FORCE HIGH (score: 80)
```

#### DDoS / Flood Detection

Detects abnormally high HTTP request rates from a single source within a very short time window, which indicates automated flooding.

| Parameter | Default | Description |
|---|---|---|
| `DDOS_THRESHOLD` | `10` | Requests per window that trigger an alert |
| `DDOS_WINDOW_SECS` | `2` | Sliding time window (seconds) |
| Base risk score | `60 + (count × 3)` | Capped at 100 |

#### HTTP Status Code Spike

Identifies IPs that produce repeated 4xx or 5xx error responses, which indicates directory traversal scanning, credential stuffing, or misconfigured attack tooling.

| Parameter | Default | Description |
|---|---|---|
| `STATUS_SPIKE_THRESHOLD` | `5` | Number of identical status codes per window |
| `STATUS_SPIKE_WINDOW_SECS` | `60` | Sliding time window (seconds) |
| Base risk score | `50 + (count × 5)` | Capped at 100 |

#### Suspicious Request Path Detection

Scans every HTTP request URI against a library of known dangerous patterns. A match raises an alert immediately (no time window is required).

```
Detected Patterns
─────────────────
.env          → Exposed environment file probe
wp-admin      → WordPress admin panel scan
phpmyadmin    → Database admin interface probe
/.git/        → Source code repository exposure
/etc/passwd   → Unix credential file traversal
/etc/shadow   → Shadow password file traversal
UNION SELECT  → SQL injection attempt