zhadyz/AI_SOC

GitHub: zhadyz/AI_SOC

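An open-source AI-augmented Security Operations Center platform that combines LLMs and machine learning for intelligent alert analysis, incident correlation, attack prediction, and automatic detection-rule generation.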

Stars: 125 | Forks: 31

# AI-Augmented Security Operations Center (AI-SOC)

An AI-driven SOC platform that uses ML to detect network threats, uses LLMs to explain alerts, correlates attacks into incidents, learns from analyst feedback, and generates detection rules automatically.

[![License: Apache 2.0](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0) [![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/) [![Docker](https://img.shields.io/badge/docker-required-blue.svg)](https://www.docker.com/)

## What It Does

Security analysts receive thousands of alerts every day, and most of them are noise. This system condenses that noise into actionable intelligence:

1. **Detects threats** with ML models trained on CICIDS2017 (99.28% accuracy, <5ms inference)
2. **Explains alerts** in plain English using a local LLM (Ollama), with mappings to MITRE ATT&CK
3. **Correlates related alerts** into incidents based on IP affinity, time windows, and kill-chain stage
4. **Learns from analyst decisions**: false-positive flags and severity corrections improve future detection
5. **Predicts the next attack stage** using kill-chain transition probabilities
6. **Generates detection rules**: the LLM writes Sigma rules for novel attack patterns and queues them for analyst approval

The LLM runs locally via Ollama. Security data never leaves the network.

## Architecture

```
      ┌─────────────────────────────────────────┐
      │             DETECTION LAYER             │
      │    Wazuh SIEM | Suricata IDS | Zeek     │
      └──────────────┬──────────────────────────┘
                     │
      ┌──────────────▼──────────────────────────┐
      │            INTEGRATION LAYER            │
      │        Wazuh Integration (:8002)        │
      │     Webhook receiver + alert router     │
      └──────┬───────────────┬──────────────────┘
             │               │
┌────────────▼─────┐  ┌──────▼───────────┐
│ AI ANALYSIS      │  │ KNOWLEDGE BASE   │
│ Alert Triage     │  │ RAG Service      │
│ (:8100)          │  │ (:8300)          │
│ LLM + ML + RAG   │◄─│ MITRE ATT&CK     │
│ Async workers    │  │ CVE database     │
│ Context memory   │  │ Security runbooks│
└────────┬─────────┘  └──────────────────┘
         │
┌────────▼─────────┐  ┌──────────────────┐
│ ML INFERENCE     │  │ CORRELATION      │
│ (:8500)          │  │ ENGINE (:8600)   │
│ RF/XGB/DT        │  │ Incident grouping│
│ 77 features      │  │ Kill chain track │
│ Hot-reload       │  │ Attack prediction│
└────────┬─────────┘  └──────────────────┘
         │
┌────────▼─────────┐  ┌──────────────────┐
│ FEEDBACK LOOP    │  │ RULE GENERATOR   │
│ (:8400)          │  │ (:8700)          │
│ PostgreSQL       │  │ LLM-generated    │
│ Alert history    │  │ Sigma rules      │
│ Analyst feedback │  │ Back-testing     │
│ Retraining data  │  │ Approval queue   │
└──────────────────┘  └──────────────────┘
```

## Quick Start

```
git clone https://github.com/zhadyz/AI_SOC.git
cd AI_SOC

# Option 1: single-command deployment
./deploy-ai-soc.sh

# Option 2: manual deployment
docker compose -f docker-compose/phase1-siem-core.yml up -d    # SIEM
docker compose -f docker-compose/ai-services.yml up -d         # AI services
docker compose -f docker-compose/monitoring-stack.yml up -d    # Monitoring
```

The first run downloads roughly 8GB of Docker images and LLM models.

## Services

| Service | Port | Purpose |
|---------|------|---------|
| Wazuh Dashboard | :443 | SIEM alerts and agent management |
| Alert Triage | :8100 | LLM alert analysis with an async worker pool |
| RAG Service | :8300 | Semantic search over MITRE ATT&CK, CVEs, and runbooks |
| Feedback Service | :8400 | Alert persistence and analyst feedback collection |
| ML Inference | :8500 | Network intrusion detection (RF/XGB/DT, 99.28% accuracy) |
| Correlation Engine | :8600 | Alert-to-incident grouping and attack prediction |
| Rule Generator | :8700 | LLM-generated Sigma detection rules |
| Wazuh Integration | :8002 | Webhook receiver, alert routing, RAG enrichment |
| Grafana | :3000 | 4 monitoring dashboards |
| Prometheus | :9090 | Metrics collection (29 alert rules) |

## Usage Examples

### Analyze an Alert

```
curl -X POST http://localhost:8100/analyze \
  -H "Content-Type: application/json" \
  -d '{
    "alert_id": "test-001",
    "rule_description": "SSH brute force attack detected",
    "rule_level": 10,
    "source_ip": "203.0.113.42",
    "dest_ip": "10.0.1.50",
    "dest_port": 22,
    "raw_log": "Failed password for root from 203.0.113.42 port 45678 ssh2"
  }'
```

Returns the severity, category, confidence score, MITRE technique mappings, IOCs, and recommended actions.

### Submit Analyst Feedback

```
curl -X POST http://localhost:8400/feedback/test-001 \
  -H "Content-Type: application/json" \
  -d '{
    "analyst_id": "analyst1",
    "is_false_positive": false,
    "true_label": "ATTACK",
    "notes": "Confirmed brute force from known malicious range"
  }'
```

Feedback drives the learning loop: false-positive flags and severity corrections improve future ML models.

### View Incidents

```
# List correlated incidents
curl http://localhost:8600/incidents

# Predict the next attack stage
curl http://localhost:8600/predict/reconnaissance
```

Related alerts are grouped into incidents automatically. The predictor returns the likely next attack stages along with suggested preventive measures.

### Generate Detection Rules

```
curl -X POST http://localhost:8700/generate \
  -H "Content-Type: application/json" \
  -d '{
    "alert_id": "novel-001",
    "alert_description": "Unusual PowerShell encoded command execution",
    "mitre_techniques": ["T1059.001"],
    "severity": "high"
  }'
```

The LLM generates a Sigma detection rule, back-tests it against historical alerts, and queues it for analyst approval.

## ML Models

Three models trained on CICIDS2017 (2.1 million network flow records, 77 features):

| Model | Accuracy | FPR | Inference Time |
|-------|----------|-----|----------------|
| Random Forest | 99.28% | 0.25% | <1ms |
| XGBoost | 99.10% | 0.30% | <1ms |
| Decision Tree | 98.90% | 0.50% | <0.5ms |

The models are retrained on analyst-labeled feedback through a continuous retraining pipeline. Champion/challenger evaluation ensures that only improved models are promoted.

## Knowledge Base

The RAG service provides semantic search over:

- **MITRE ATT&CK**: 835 techniques, with descriptions, tactics, and platforms
- **CVE database**: critical- and high-severity vulnerabilities from the NVD API v2
- **Security runbooks**: 8 incident response playbooks (SSH brute force, malware, phishing, ransomware, privilege escalation, data exfiltration, unauthorized access, DDoS)

## Key Design Decisions

- **Local LLM** (Ollama): security events never leave the network, and there is no dependency on cloud APIs.
- **Honest ML confidence**: when network flow data is unavailable, ML confidence is capped at 50% and the source is tagged as `alert_metadata`.
- **Graceful degradation**: every service tolerates upstream failures. If Ollama is down, ML-only results are returned; if the feedback service is down, alerts are still processed normally.
- **Async worker pool**: 3 concurrent LLM workers fed by a priority queue. A circuit breaker skips LLM processing for low-severity alerts during incident-scale surges.
- **Feedback flywheel**: analyst decisions feed into model retraining, false-positive pattern detection, and contextual LLM memory.

## Project Structure

```
AI_SOC/
├── services/
│   ├── alert-triage/          # LLM alert analysis (FastAPI)
│   ├── rag-service/           # Knowledge base retrieval (ChromaDB)
│   ├── feedback-service/      # Alert persistence + analyst feedback (PostgreSQL)
│   ├── correlation-engine/    # Incident grouping + attack prediction
│   ├── rule-generator/        # LLM Sigma rule generation
│   ├── wazuh-integration/     # Wazuh webhook receiver
│   ├── retraining/            # Continuous ML retraining pipeline
│   └── common/                # Shared utilities (auth, metrics, security)
├── ml_training/               # ML training pipeline + inference API
├── models/                    # Trained model artifacts (.pkl)
├── docker-compose/            # Docker Compose files for all stacks
├── config/                    # Prometheus, Grafana, Wazuh, Suricata configs
├── datasets/                  # CICIDS2017 dataset
├── tests/                     # Unit, integration, E2E, load, security tests
├── docs/                      # Documentation site (MkDocs)
└── deploy-ai-soc.sh           # Single-command deployment script
```

## System Requirements

- Docker Engine 23+ and Docker Compose v2
- At least 16GB RAM (32GB recommended)
- 20GB of disk space
- Linux for the full stack (Suricata/Zeek require `network_mode: host`)
- Windows/macOS for SIEM + AI services only (no network sensors)

## Documentation

Full documentation is available at [research.onyxlab.ai](https://research.onyxlab.ai).

- [Installation Guide](docs/getting-started/installation.md)
- [Architecture Overview](docs/architecture/overview.md)
- [API Documentation](http://localhost:8100/docs) (Swagger UI, live)
- [Security Guide](docs/security/guide.md)
- [Deployment Guide](docs/deployment/guide.md)

## Research Background

This project is the research implementation of the paper *"AI-Augmented SOC: A Survey of LLMs and Agents for Security Automation"*.

**Author:** Abdul Bari (abdul.bari8019@coyote.csusb.edu)

**Affiliation:** California State University, San Bernardino

**License:** Apache 2.0

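
The Correlation Engine (:8600) groups related alerts into incidents by IP affinity, time window, and kill-chain stage. The Python sketch below shows one minimal way to do that under stated assumptions: an alert joins an existing incident if it shares a source or destination IP with it and arrives within 30 minutes of that incident's latest alert, and each incident records the kill-chain stages it has reached. The stage list, the window length, and the `Alert`/`Incident`/`correlate` names are hypothetical, not the engine's actual API, and the real engine may use stage as a grouping signal rather than only tracking it.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

# Hypothetical stage taxonomy (Lockheed Martin kill chain); the engine's own list may differ.
KILL_CHAIN = [
    "reconnaissance", "weaponization", "delivery", "exploitation",
    "installation", "command_and_control", "actions_on_objectives",
]


@dataclass
class Alert:
    alert_id: str
    source_ip: str
    dest_ip: str
    timestamp: datetime
    stage: str  # kill-chain stage assigned upstream (assumed)


@dataclass
class Incident:
    incident_id: int
    ips: set[str] = field(default_factory=set)
    alerts: list[Alert] = field(default_factory=list)

    @property
    def last_seen(self) -> datetime:
        return max(a.timestamp for a in self.alerts)

    @property
    def stages(self) -> list[str]:
        """Distinct stages observed so far, in kill-chain order."""
        seen = {a.stage for a in self.alerts}
        return [s for s in KILL_CHAIN if s in seen]


def correlate(alerts: list[Alert], window: timedelta = timedelta(minutes=30)) -> list[Incident]:
    """Greedy grouping: join the first incident that shares an IP and is still inside the window."""
    incidents: list[Incident] = []
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        alert_ips = {alert.source_ip, alert.dest_ip}
        match = next(
            (inc for inc in incidents
             if inc.ips & alert_ips and alert.timestamp - inc.last_seen <= window),
            None,
        )
        if match is None:
            match = Incident(incident_id=len(incidents) + 1)
            incidents.append(match)
        match.alerts.append(alert)
        match.ips |= alert_ips
    return incidents
```

A greedy first-match rule keeps the sketch short. Matching on a shared internal destination IP can over-merge unrelated activity, so a production engine may weight source IPs more heavily.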
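
The attack predictor (for example `curl http://localhost:8600/predict/reconnaissance`) ranks likely next stages using kill-chain transition probabilities. One simple reading of that is a first-order Markov chain: count stage-to-stage transitions in past incidents, normalize each row, and rank the successors of the current stage. This is a sketch under that assumption only. The function names and the sample history are hypothetical, and the README does not say how the engine estimates its probabilities.

```python
from collections import Counter, defaultdict


def fit_transitions(sequences: list[list[str]]) -> dict[str, dict[str, float]]:
    """Estimate P(next stage | current stage) from ordered per-incident stage sequences."""
    counts: defaultdict[str, Counter] = defaultdict(Counter)
    for seq in sequences:
        # Count each adjacent (current, next) pair in the sequence.
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    probs: dict[str, dict[str, float]] = {}
    for current, nexts in counts.items():
        total = sum(nexts.values())
        probs[current] = {stage: n / total for stage, n in nexts.items()}
    return probs


def predict_next(probs: dict[str, dict[str, float]], current: str,
                 top_k: int = 3) -> list[tuple[str, float]]:
    """Return up to top_k (stage, probability) pairs, most likely first."""
    successors = probs.get(current, {})
    return sorted(successors.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
```

A stage with no observed successors yields an empty list, which a service can map to "no prediction" instead of guessing.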
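
Retrained models go through champion/challenger evaluation so that only improved models are promoted. A minimal promotion gate might compare both models on the same held-out set and require a real accuracy gain without a worse false positive rate. The thresholds, the `EvalResult` type, and the `should_promote`/`select_model` helpers are hypothetical; the README does not specify the pipeline's actual criteria.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalResult:
    name: str
    accuracy: float  # fraction correct on a shared held-out set
    fpr: float       # false positive rate on the same set


def should_promote(champion: EvalResult, challenger: EvalResult,
                   min_gain: float = 0.001, max_fpr_increase: float = 0.0) -> bool:
    """Promote only if accuracy improves by at least min_gain and FPR does not regress."""
    gains_accuracy = challenger.accuracy - champion.accuracy >= min_gain
    keeps_fpr = challenger.fpr - champion.fpr <= max_fpr_increase
    return gains_accuracy and keeps_fpr


def select_model(champion: EvalResult, challenger: EvalResult) -> EvalResult:
    """Keep serving the champion unless the challenger clears the promotion gate."""
    return challenger if should_promote(champion, challenger) else champion
```

For sample input, a champion with the Random Forest figures from the ML Models table (0.9928 accuracy, 0.0025 FPR) would be replaced by a challenger scoring 0.9940 / 0.0022, but kept against one scoring 0.9950 / 0.0030, because that challenger's FPR is worse.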
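
Under the honest-confidence rule, ML confidence is capped at 50% and the source is tagged `alert_metadata` whenever network flow data is unavailable. The sketch below encodes that rule. The 50% cap and the `alert_metadata` tag come from the README; the `MLVerdict` type, the `honest_confidence` function, and the `network_flow` tag for the uncapped case are hypothetical.

```python
from dataclasses import dataclass

# From the README: without flow features, ML confidence is capped at 50%.
CAP_WITHOUT_FLOWS = 0.50


@dataclass(frozen=True)
class MLVerdict:
    label: str
    confidence: float
    source: str  # what the prediction was based on


def honest_confidence(label: str, raw_confidence: float, has_flow_features: bool) -> MLVerdict:
    if has_flow_features:
        # The model saw the flow features it was trained on: report confidence as-is.
        return MLVerdict(label, raw_confidence, source="network_flow")  # hypothetical tag
    # Only alert metadata was available: cap the confidence and label its source.
    return MLVerdict(label, min(raw_confidence, CAP_WITHOUT_FLOWS), source="alert_metadata")
```

Capping rather than discarding keeps the ML signal available to the LLM and to analysts, while making clear that it was computed without the 77 flow features the models were trained on.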
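
Graceful degradation means the triage path still answers when the LLM is unavailable: if Ollama is down, the ML-only result is returned. Below is a sketch of that fallback. It assumes the ML classifier and the LLM call are passed in as plain callables and that an unreachable LLM surfaces as `ConnectionError` (which includes `ConnectionRefusedError`) or `TimeoutError`. The names `triage`, `ml_classify`, `llm_explain`, and the `mode` field are hypothetical.

```python
from typing import Callable


def triage(alert: dict,
           ml_classify: Callable[[dict], dict],
           llm_explain: Callable[[dict, dict], str]) -> dict:
    """Return an LLM-enriched verdict, or the ML-only verdict if the LLM is unreachable."""
    verdict = ml_classify(alert)  # the local ML model is assumed to always be available
    try:
        explanation = llm_explain(alert, verdict)
    except (ConnectionError, TimeoutError):
        # Degrade instead of failing the whole request.
        return {**verdict, "explanation": None, "mode": "ml_only"}
    return {**verdict, "explanation": explanation, "mode": "llm+ml"}
```

Catching only connectivity errors keeps genuine bugs in the LLM path visible instead of silently masking them as an ML-only result.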
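
Alert Triage runs 3 concurrent LLM workers fed by a priority queue, and a circuit breaker skips LLM processing for low-severity alerts during surges. The asyncio sketch below models this with `asyncio.PriorityQueue` keyed on the Wazuh `rule_level`. It rests on several assumptions. The breaker trips on queue depth, which is a load-shedding simplification of a classic error-rate circuit breaker. The backlog depth of 5 and the low-severity cutoff of `rule_level <= 5` are arbitrary illustrative values. The names `TriagePool`, `submit`, and `drain` are hypothetical.

```python
import asyncio
import itertools
from typing import Awaitable, Callable

NUM_WORKERS = 3        # from the README: 3 concurrent LLM workers
SURGE_DEPTH = 5        # hypothetical backlog at which the breaker trips
LOW_SEVERITY_MAX = 5   # hypothetical: rule_level <= 5 counts as low severity


class TriagePool:
    """Create inside a running event loop (e.g. within asyncio.run)."""

    def __init__(self, analyze: Callable[[dict], Awaitable[str]]):
        self.analyze = analyze                      # async LLM call; assumed not to raise
        self.queue: asyncio.PriorityQueue = asyncio.PriorityQueue()
        self.results: dict[str, str] = {}
        self._seq = itertools.count()               # tie-breaker so alert dicts are never compared

    def submit(self, alert: dict) -> None:
        level = alert["rule_level"]
        if self.queue.qsize() >= SURGE_DEPTH and level <= LOW_SEVERITY_MAX:
            # Breaker open: low-severity alerts bypass the LLM (ML verdict only).
            self.results[alert["alert_id"]] = "ml_only"
            return
        # Higher rule_level -> smaller key -> dequeued first.
        self.queue.put_nowait((-level, next(self._seq), alert))

    async def _worker(self) -> None:
        while True:
            _, _, alert = await self.queue.get()
            try:
                self.results[alert["alert_id"]] = await self.analyze(alert)
            finally:
                self.queue.task_done()

    async def drain(self) -> None:
        """Run the workers until every queued alert has been analyzed, then stop them."""
        workers = [asyncio.create_task(self._worker()) for _ in range(NUM_WORKERS)]
        await self.queue.join()
        for w in workers:
            w.cancel()
        await asyncio.gather(*workers, return_exceptions=True)
```

Because the breaker condition is re-evaluated on every `submit`, it closes on its own once the backlog falls below `SURGE_DEPTH`. High-severity alerts are always queued, even during a surge.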