Sahil-K39/agentic-soc-engine

GitHub: Sahil-K39/agentic-soc-engine

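A security operations center that uses a large language model to analyze system logs in real time, automatically classify cyber threats, map them to the MITRE ATT&CK framework, and generate firewall remediation rules with one click.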

Stars: 0 | Forks: 0

# 🛡️ Antigravity SOC: AI-Powered Cyber Threat Intelligence Engine

**Real-time threat detection, MITRE ATT&CK classification, and automated remediation, powered by Google Gemini AI.**

![Python](https://img.shields.io/badge/Python-3.9+-3776AB?style=for-the-badge&logo=python&logoColor=white)
![Gemini](https://img.shields.io/badge/Google_Gemini-AI_Agent-4285F4?style=for-the-badge&logo=google&logoColor=white)
![Pydantic](https://img.shields.io/badge/Pydantic-v2-E92063?style=for-the-badge&logo=pydantic&logoColor=white)
![Security](https://img.shields.io/badge/Security-Hardened-2ECC71?style=for-the-badge&logo=shield&logoColor=white)
![MITRE](https://img.shields.io/badge/MITRE_ATT%26CK-Mapped-FF6F00?style=for-the-badge)

*A production-grade Security Operations Center (SOC) dashboard that uses an autonomous AI agent to analyze system logs, classify cyber threats against the MITRE ATT&CK framework, and generate one-click firewall remediation, all running locally.*

## 📋 Table of Contents

- [Key Features](#-key-features)
- [Live Demo Preview](#-live-demo-preview)
- [System Architecture](#-system-architecture)
- [AI / ML Pipeline](#-ai--ml-pipeline)
- [Tech Stack](#-tech-stack)
- [Setup & Installation](#-setup--installation)
- [Running the System](#-running-the-system)
- [API Reference](#-api-reference)
- [Security Hardening](#-security-hardening)
- [Project Structure](#-project-structure)
- [How It Works / End to End](#-how-it-works--end-to-end)
- [Future Roadmap](#-future-roadmap)
- [License](#-license)

## ✨ Key Features

| Category | Feature |
|----------|---------|
| 🤖 **AI Agent** | An autonomous Gemini 3.5 Flash agent analyzes raw syslog events and returns structured threat intelligence via constrained decoding |
| 🧠 **Threat Classification** | Real-time classification of SSH brute force, SQL injection, XSS, directory traversal, privilege escalation, and web authentication attacks |
| 🎯 **MITRE ATT&CK Mapping** | Every detected threat is mapped to its MITRE ATT&CK technique ID (T1110, T1190, T1078, T1548) together with the full technique name |
| 🔥 **One-Click Remediation** | The dashboard generates and executes an `iptables` firewall block rule at the press of a button |
| 📊 **Real-Time Dashboard** | A polished dark-mode SOC dashboard with a live threat feed, agent reasoning traces, and incident artifact cards |
| 🛡️ **Security Hardened** | XSS-safe DOM rendering, CORS lockdown, rate limiting, directory traversal protection, payload size limits, atomic file writes |
| 🔄 **Graceful Degradation** | AI agent → rule-based fallback classifier pipeline; the system runs with or without an API key |
| 📈 **Live Statistics API** | RESTful endpoints for threat distribution, top attacker IPs with GeoIP, login success rate, and blocked-IP tracking |

## 🖥️ Live Demo Preview

```
┌─────────────────────────────────────────────────────────────────────┐
│  ANTIGRAVITY // THREAT INTELLIGENCE ENGINE          ● Agent Active  │
├─────────────────────┬──────────────────────┬────────────────────────┤
│ // SYSTEM FEED      │ // AGENT REASONING   │ // INCIDENT ARTIFACT   │
│                     │                      │                        │
│ [CRITICAL]          │ ● Step 01: Trace     │ Attack: SQL Injection  │
│ 195.133.40.12       │   Isolated           │ Source: 195.133.40.12  │
│ SQL Injection on    │   Evaluating input   │ MITRE: T1190           │
│ backend API         │   against signature  │                        │
│                     │   indices...         │ $ iptables -A INPUT    │
│ [INFO] 127.0.0.1    │                      │   -s 195.133.40.12     │
│ Health check 200 OK │ ● Step 02: Strategy  │   -j DROP              │
│                     │   Staged             │                        │
│                     │   Writing recovery   │ [APPROVE CONTAINMENT]  │
│                     │   blocks to JSON...  │                        │
└─────────────────────┴──────────────────────┴────────────────────────┘
```

## 🏗️ System Architecture

```
                 ┌──────────────────────────────────────────────┐
                 │                PRODUCER LAYER                │
                 │                                              │
mock_auth.log ──►│  agent_platform.py                           │
(syslog events)  │  ┌─────────────┐    ┌───────────────────┐    │
                 │  │ Log Watcher │───►│ Gemini 3.5 Flash  │    │
                 │  │ (async tail)│    │ AI Agent          │    │
                 │  └─────────────┘    │ (structured out)  │    │
                 │         │           └────────┬──────────┘    │
                 │         │ fallback           │               │
                 │         ▼                    │               │
                 │  ┌─────────────┐             │               │
                 │  │ Rule-Based  │             │               │
                 │  │ Classifier  │             │               │
                 │  └──────┬──────┘             │               │
                 │         └──────────┬─────────┘               │
                 │                    ▼                         │
                 │         mitigation_report.json               │
                 └──────────────────────┬───────────────────────┘
                                        │
                 ┌──────────────────────┴───────────────────────┐
                 │                CONSUMER LAYER                │
                 │                                              │
                 │  app.py (HTTP Server)                        │
                 │  ┌────────────────┐  ┌───────────────────┐   │
                 │  │ Log Parser     │  │ Rate Limiter      │   │
                 │  │ (7 regex       │  │ IP Validator      │   │
                 │  │  patterns)     │  │ Path Sanitiser    │   │
                 │  ├────────────────┤  ├───────────────────┤   │
                 │  │ GET /          │  │ POST /api/block   │   │
                 │  │ GET /api/logs  │  │ (block / unblock) │   │
                 │  │ GET /api/stats │  │                   │   │
                 │  └────────────────┘  └───────────────────┘   │
                 │         │                                    │
                 │         ▼                                    │
                 │  ┌─────────────────────────────────────┐     │
                 │  │ Threat Simulator (background thread)│     │
                 │  │ Generates realistic attack traffic  │     │
                 │  │ every 3-6 seconds                   │     │
                 │  └─────────────────────────────────────┘     │
                 └──────────────────────┬───────────────────────┘
                                        │
                 ┌──────────────────────┴───────────────────────┐
                 │              PRESENTATION LAYER              │
                 │                                              │
                 │  index.html (Tailwind CSS Dashboard)         │
                 │  • Polls /mitigation_report.json every 3s    │
                 │  • XSS-safe DOM rendering (no innerHTML)     │
                 │  • Pauses polling when tab is hidden         │
                 │  • One-click firewall block via POST         │
                 └──────────────────────────────────────────────┘
```

## 🤖 AI / ML Pipeline

This project implements a **two-tier AI inference pipeline** for real-time cyber threat analysis:

### Tier 1: Large Language Model (Gemini 3.5 Flash)

```python
# An autonomous AI agent receives raw syslog text and returns structured threat intel.
# Excerpt: MitigationReport is the Pydantic schema from models.py, and the
# awaits below must run inside an async function.
config = LocalAgentConfig(
    model="gemini-3.5-flash",
    response_schema=MitigationReport,  # constrained decoding → valid JSON
)
security_analyst = Agent(config)

# For each suspicious log event:
response = await security_analyst.chat(
    "Analyze this log event: 'Failed password for root from 195.133.40.12 port 22 ssh2'. "
    "Map to MITRE ATT&CK. Generate mitigation command."
)
data = await response.structured_output()
# → {"status": "Active Threat", "ip": "195.133.40.12",
#    "vector": "SSH Brute Force", "mitre_id": "T1110 - Brute Force",
#    "patch_command": "iptables -A INPUT -s 195.133.40.12 -j DROP"}
```

**What the LLM does:**

- **Natural language understanding**: parses unstructured syslog text that has no fixed schema
- **Multi-class threat classification**: picks the attack type out of dozens of possible attack vectors
- **Knowledge retrieval**: maps the attack onto the MITRE ATT&CK taxonomy (learned during pre-training)
- **Constrained decoding**: `response_schema` forces the transformer to emit valid JSON that matches the Pydantic model

### Tier 2: Rule-Based Fallback Classifier

When the API is unavailable, a **keyword-based classifier** backed by a curated MITRE knowledge base provides deterministic results:

```python
MITRE_ATTACK_DB = {
    "sshd_fail":            {"vector": "SSH Brute Force",           "mitre_id": "T1110"},
    "web_sql":              {"vector": "SQL Injection",             "mitre_id": "T1190"},
    "web_traversal":        {"vector": "Directory Traversal",       "mitre_id": "T1190"},
    "sudo_abuse":           {"vector": "Privilege Escalation",      "mitre_id": "T1548.003"},
    "sshd_accept_external": {"vector": "Suspicious External Login", "mitre_id": "T1078"},
}
```

### ML Techniques Used

| Technique | Location | Purpose |
|-----------|-------|---------|
| **LLM inference** | `agent_platform.py` | Threat classification through the Gemini transformer |
| **Constrained decoding** | `response_schema=MitigationReport` | Forces the model to emit structured JSON |
| **Feature extraction** | `constants.py` (7 regex patterns) | Extracts IP, user, port, and attack type from raw text |
| **Pattern classification** | `local_fallback_parse()` | Keyword-based classifier with a lookup table |
| **Deterministic GeoIP** | `get_geoip_details()` | Last-octet modulo mapping for reproducible demos |

## 🛠️ Tech Stack

| Layer | Technology | Purpose |
|-------|-----------|---------|
| **AI engine** | Google Antigravity SDK + Gemini 3.5 Flash | Autonomous threat analysis agent |
| **Backend** | Python 3.9+ / `http.server` (standard library) | Zero-dependency HTTP server |
| **Data models** | Pydantic v2 | Type-safe schemas with validation |
| **Frontend** | Tailwind CSS + Vanilla JS | Responsive dark-mode SOC dashboard |
| **Configuration** | python-dotenv | Secure environment variable management |
| **Concurrency** | `asyncio` (agent) + `threading` (server) | Non-blocking log tailing + background simulation |

## 🚀 Setup & Installation

### Prerequisites

- Python 3.9 or later
- pip (the Python package manager)
- A Gemini API key *(optional; without it the system runs in fallback mode)*

### Installation Instructions
```bash
# 1. Clone the repository
git clone https://github.com/yourusername/cyber-threat-dashboard.git
cd cyber-threat-dashboard

# 2. Create and activate a virtual environment
python3 -m venv .venv
source .venv/bin/activate    # macOS / Linux
# .venv\Scripts\activate     # Windows

# 3. Install dependencies
pip install google-antigravity python-dotenv pydantic

# 4. Configure the environment (optional; enables the AI agent)
cp .env.example .env
# Edit .env and add your GEMINI_API_KEY
```

### Verify the Installation

```bash
python3 -c "import google.antigravity; from pydantic import BaseModel; print('✅ All dependencies OK')"
```

## ▶️ Running the System

### Option A: Dashboard Only (Quick Start)

```bash
python3 app.py
```

Open **http://localhost:8080** in a browser. The built-in threat simulator generates realistic attack traffic automatically.

### Option B: Dashboard + AI Agent (Full Pipeline)

```bash
# Terminal 1: start the dashboard server
python3 app.py

# Terminal 2: start the AI agent watcher
python3 agent_platform.py
```

The AI agent tails `mock_auth.log`, analyzes each threat event, and updates `mitigation_report.json` in real time.

## 📡 API Reference

| Method | Endpoint | Description | Response |
|--------|----------|-------------|----------|
| `GET` | `/` | Serves the SOC dashboard UI | `text/html` |
| `GET` | `/api/logs` | Returns all parsed log events (newest first) | `[{timestamp, ip, status, threat_level, ...}]` |
| `GET` | `/api/stats` | Aggregated threat statistics | `{total_alerts, failed_logins, threat_level_distribution, top_ips, ...}` |
| `GET` | `/mitigation_report.json` | Latest AI-generated mitigation report | `{status, ip, vector, mitre_id, patch_command}` |
| `POST` | `/api/block` | Blocks or unblocks an IP address | `{status: "success", action, ip}` |

### Example: Blocking an Attacker IP

```bash
curl -X POST http://localhost:8080/api/block \
  -H "Content-Type: application/json" \
  -d '{"ip": "195.133.40.12", "action": "block"}'

# Response: {"status": "success", "action": "block", "ip": "195.133.40.12"}
```

## 🔒 Security Hardening

This project implements **defense in depth** with multiple layers of protection:

| Vulnerability | Mitigation | File |
|--------------|------------|------|
| **XSS (cross-site scripting)** | All dynamic content is rendered with `textContent` / `createElement`; `innerHTML` is never used | `index.html` |
| **Directory traversal** | Paths are resolved with `os.path.realpath()` and validated against `BASE_DIR` | `app.py` |
| **CORS abuse** | Origin is locked to `http://localhost:8080` (not the wildcard `*`) | `app.py` |
| **Rate limiting** | Sliding-window rate limiter: 120 requests per 60 seconds per client IP | `app.py` |
| **Header spoofing** | The rate limiter uses `client_address[0]` directly and never trusts `X-Forwarded-For` | `app.py` |
| **Memory exhaustion** | POST bodies are capped at 4096 bytes; oversized payloads are rejected with status `413` | `app.py` |
| **IP validation** | Every IP is validated with `ipaddress.ip_address()` before processing | `app.py` |
| **API key leakage** | `.env` is git-ignored; a `.env.example` with placeholders is provided | `.gitignore` |
| **Atomic writes** | The mitigation report is written to `.tmp` and then moved into place with `os.replace()` for crash safety | `agent_platform.py` |
| **Loopback protection** | Blocking `127.0.0.1` and `192.168.1.100` is explicitly forbidden | `app.py` |
| **Graceful shutdown** | A SIGTERM handler cleans up and stops the server | `app.py` |
| **Tab visibility** | Polling pauses while the browser tab is hidden to avoid wasted resources | `index.html` |

## 📂 Project Structure

```
cyber-threat-dashboard/
│
├── app.py                    # HTTP server: routing, log parsing, threat simulator
├── agent_platform.py         # AI agent: Gemini-powered log watcher & threat classifier
├── constants.py              # Centralised config: paths, regex, MITRE DB, GeoIP, tunables
├── models.py                 # Pydantic v2 schemas: LogEntry, MitigationReport
├── index.html                # Real-time SOC dashboard (Tailwind CSS)
│
├── .env.example              # Template for environment variables
├── .env                      # Your API key (git-ignored)
├── .gitignore                # Excludes secrets, logs, caches, build artifacts
├── .flake8                   # Linter configuration
├── mypy.ini                  # Type checker configuration
├── README.md                 # This document
│
├── mock_auth.log             # Runtime: simulated syslog events (git-ignored)
├── mitigation_report.json    # Runtime: latest AI threat assessment (git-ignored)
│
└── .agents/
    └── rules/
        └── security-guardrails.md   # Agent execution constraints & policies
```

## 🔄 How It Works / End to End

```
  1. SIMULATE                    2. DETECT                      3. CLASSIFY
┌──────────────┐               ┌──────────────┐               ┌──────────────┐
│ Threat       │  writes log   │ Log Watcher  │   trigger     │ Gemini Agent │
│ Simulator    │──────────────►│ (async tail) │──keyword─────►│ (LLM)        │
│ (3-6s loop)  │               │              │   detected    │              │
└──────────────┘               └──────────────┘               └──────┬───────┘
                                                                     │
       ┌─────────────────────────────────────────────────────────────┘
       │ structured output
       ▼
  4. REPORT                      5. SERVE                       6. REMEDIATE
┌──────────────┐               ┌──────────────┐               ┌──────────────┐
│ Atomic JSON  │   read by     │ HTTP Server  │   renders     │ Dashboard    │
│ File Write   │──────────────►│ /api/stats   │──────────────►│ UI           │
│              │               │ /api/logs    │               │ [BLOCK IP]   │
└──────────────┘               └──────────────┘               └──────────────┘
```

**Step by step:**

1. **Threat simulation**: a background thread in `app.py` generates realistic syslog events (SSH brute force, SQL injection, XSS, privilege escalation) every 3-6 seconds and writes them to `mock_auth.log`.
2. **Log detection**: `agent_platform.py` tails the log file continuously using async I/O. When a trigger keyword is detected (`Failed password`, `WARNING:`, `FAIL LOGIN`, `Accepted password`), the line is forwarded for analysis.
3. **AI classification**: the Gemini 3.5 Flash agent receives the raw log line, identifies the attack vector, maps it to a MITRE ATT&CK technique, and generates a remediation command. If the AI is unavailable, the rule-based classifier handles the line instead.
4. **Report persistence**: the structured assessment is written atomically to `mitigation_report.json` (written to `.tmp`, then moved with `os.replace()`).
5. **API serving**: `app.py` serves the dashboard UI and the RESTful endpoints. The log parser uses 7 compiled regex patterns to extract structured fields from raw syslog lines. Statistics are aggregated on the fly.
6. **Remediation**: the dashboard displays the threat assessment and offers a one-click **"Approve Active Containment Patch"** button. Clicking it sends a `POST /api/block` request, which adds the attacker IP to the blocked set and logs a firewall rule event.

## 🗺️ Future Roadmap

- [ ] **WebSocket / SSE**: replace polling with real-time push updates
- [ ] **Persistent storage**: store threat history in SQLite or Firestore
- [ ] **Multi-agent architecture**: dedicated agents for network, endpoint, and application threats
- [ ] **Real log integration**: point at a live `/var/log/auth.log` for production deployments
- [ ] **Threat heatmap**: geographic visualization of attacker origins using GeoIP coordinates
- [ ] **SIEM integration**: export threat events to Splunk, Elastic, or Google Chronicle
- [ ] **Authentication**: add a login flow for multi-user SOC environments
- [ ] **CI/CD pipeline**: automated testing, linting, and deployment

## 📄 License

MIT License. Feel free to fork, modify, and deploy.
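## 🧪 Sketch: Rule-Based Fallback Classifier

The README names a keyword-based fallback with a MITRE lookup table but does not show the matching logic. The block below is a minimal sketch of how such a classifier might look. The `MITRE_ATTACK_DB` entries are copied from the README. The `KEYWORD_RULES` list, the `classify_line` name, and the choice to skip logins from private addresses are assumptions made for illustration. They are not the project's actual `local_fallback_parse()` implementation.

```python
import ipaddress
import re
from typing import Optional

# Lookup table as listed in the README's fallback section.
MITRE_ATTACK_DB = {
    "sshd_fail": {"vector": "SSH Brute Force", "mitre_id": "T1110"},
    "web_sql": {"vector": "SQL Injection", "mitre_id": "T1190"},
    "web_traversal": {"vector": "Directory Traversal", "mitre_id": "T1190"},
    "sudo_abuse": {"vector": "Privilege Escalation", "mitre_id": "T1548.003"},
    "sshd_accept_external": {"vector": "Suspicious External Login", "mitre_id": "T1078"},
}

# ASSUMPTION: illustrative keyword triggers, checked in order; first match wins.
KEYWORD_RULES = [
    ("Failed password", "sshd_fail"),
    ("UNION SELECT", "web_sql"),
    ("../", "web_traversal"),
    ("sudo", "sudo_abuse"),
    ("Accepted password", "sshd_accept_external"),
]

IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def classify_line(line: str) -> Optional[dict]:
    """Return a report-shaped dict for the first matching rule, else None."""
    match = IPV4_RE.search(line)
    if match is None:
        return None  # this sketch only reports events that carry a source IP
    try:
        ip = ipaddress.ip_address(match.group(0))
    except ValueError:
        return None  # looked like an IP but is not valid (e.g. 999.1.1.1)

    lowered = line.lower()
    for keyword, key in KEYWORD_RULES:
        if keyword.lower() not in lowered:
            continue
        if key == "sshd_accept_external" and ip.is_private:
            continue  # ASSUMPTION: logins from private ranges are not flagged
        entry = MITRE_ATTACK_DB[key]
        return {
            "status": "Active Threat",
            "ip": str(ip),
            "vector": entry["vector"],
            "mitre_id": entry["mitre_id"],
            "patch_command": f"iptables -A INPUT -s {ip} -j DROP",
        }
    return None
```

Calling `classify_line("Failed password for root from 195.133.40.12 port 22 ssh2")` yields a dict with the same five keys as the mitigation report. Because the output depends only on the input line, the fallback stays deterministic, which is the property the README highlights.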
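## 🧪 Sketch: Atomic Report Write

The README states that the mitigation report is written to a `.tmp` file and then moved into place with `os.replace()`. The following is a minimal sketch of that pattern using only the standard library. The function name `write_report_atomically` and the explicit `fsync` call are assumptions for illustration and may differ from what `agent_platform.py` does.

```python
import json
import os


def write_report_atomically(report: dict, path: str) -> None:
    """Write `report` as JSON so readers never observe a half-written file."""
    tmp_path = path + ".tmp"  # sibling file, so it sits on the same filesystem
    with open(tmp_path, "w", encoding="utf-8") as handle:
        json.dump(report, handle, indent=2)
        handle.flush()
        os.fsync(handle.fileno())  # ASSUMPTION: push bytes to disk before the swap
    # os.replace swaps the file in a single step and overwrites any existing
    # report, so a poller sees either the old file or the new one.
    os.replace(tmp_path, path)
```

The temporary file lives next to the target because a rename is only atomic within a single filesystem. A dashboard that polls the report every few seconds therefore never has to parse a partial JSON document.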
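## 🧪 Sketch: Request Guards for `/api/block`

The hardening table describes a sliding-window rate limiter (120 requests per 60 seconds per client IP), validation through `ipaddress.ip_address()`, and a refusal to block `127.0.0.1` and `192.168.1.100`. The block below sketches how those three checks could fit together. The class and function names are hypothetical, and the real `app.py` may structure this differently.

```python
import ipaddress
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional

MAX_REQUESTS = 120       # per client IP, as stated in the hardening table
WINDOW_SECONDS = 60.0
PROTECTED_IPS = {"127.0.0.1", "192.168.1.100"}


class SlidingWindowLimiter:
    """Keeps recent request timestamps per client; hypothetical helper."""

    def __init__(self, max_requests: int = MAX_REQUESTS,
                 window: float = WINDOW_SECONDS) -> None:
        self.max_requests = max_requests
        self.window = window
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, client_ip: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        hits = self._hits[client_ip]
        # Drop timestamps that have slid out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


def validate_block_target(raw_ip: str) -> str:
    """Return the normalised IP, or raise ValueError if it must not be blocked."""
    ip = str(ipaddress.ip_address(raw_ip.strip()))  # ValueError if malformed
    if ip in PROTECTED_IPS:
        raise ValueError(f"refusing to block protected address {ip}")
    return ip
```

In a request handler, the limiter would be keyed by the socket peer address (the README mentions `client_address[0]`) and never by a client-supplied header, so a spoofed `X-Forwarded-For` cannot reset the counter.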

**Built by 🛡️ Sahil**

*Autonomous AI × Cybersecurity × Real-Time Intelligence*

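Tags: AI agent, CISA project, DLL hijacking, Python, web report viewer, large language model, threat intelligence, security operations center, developer tools, no backdoor, network mapping, automated defense, reverse engineering tools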