gapilongo/SOC

GitHub: gapilongo/SOC

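A multi-agent framework for an intelligent Security Operations Center (SOC), built on LangGraph, that uses AI-driven workflows to automate alert triage, threat correlation analysis, and incident response.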

Stars: 11 | Forks: 4

# SOC Proof of Concept Built on LangGraph

## 🔒 Intelligent Security Operations Center - Multi-Agent Workflow System

An advanced Security Operations Center (SOC) proof-of-concept system built on LangGraph, with autonomous AI agents for alert processing, threat analysis, and incident response. The system demonstrates how modern AI can strengthen security operations through intelligent automation, human-AI collaboration, and continuous learning.

## 🏗️ Architecture Overview

The SOC proof of concept implements a multi-layer architecture designed for scalability, reliability, and intelligent decision-making:

### Core Framework

- **State Management**: centralized state handling with versioning and audit trails
- **Workflow Engine**: multi-agent workflow orchestration powered by LangGraph
- **Configuration**: dynamic, policy-driven configuration management

### Agent Ecosystem

Six specialized AI agents that work together:

1. **Ingestion Agent** - multi-source alert collection with deduplication
2. **Triage Agent** - intelligent alert prioritization and initial classification
3. **Analysis Agent** - deep threat investigation using a ReAct reasoning loop
4. **Human-in-the-Loop** - structured analyst collaboration and escalation
5. **Response Agent** - automated containment and remediation actions
6. **Learning Agent** - continuous model improvement and knowledge extraction

The workflow diagram below also includes a Correlation Agent, which sits between triage and analysis.

### Tool Orchestration Layer

A unified interface to security tools:

- **SIEM Integration** - Splunk, QRadar, and Sentinel connectors
- **Threat Intelligence** - IOC enrichment and reputation services
- **Sandbox Analysis** - automated malware sandbox execution
- **EDR/XDR Tools** - endpoint investigation and response

## 🚀 Core Features

### Intelligent Decision-Making

- **Confidence-based routing** - alerts move between agents according to confidence thresholds
- **Policy-driven workflows** - configurable decision points that encode business logic
- **Adaptive thresholds** - dynamic adjustment based on historical performance

### Advanced State Management

```
Enhanced State Object:
├── alert_id & raw_alert
├── enriched_data & triage_status
├── confidence_score & FP/TP_indicators
├── workflow_history & agent_executions
├── human_feedback & response_actions
└── metadata & audit_trail
```

### Asynchronous Processing

- **Concurrent tool execution** - security tool queries run in parallel
- **Retry mechanisms** - robust error handling and fallback strategies
- **Caching layer** - Redis-based performance optimization
- **Rate limiting** - API protection and resource management

### Human-AI Collaboration

- **Structured feedback interface** - clearly defined escalation and review processes
- **Role-based access control** - workflows for analysts, senior analysts, and managers
- **SLA tracking** - response-time monitoring and alerting
- **Knowledge transfer** - human insights are fed back into the learning system

## 🔄 Workflow Process

```mermaid
graph TD
    %% Enhanced State Definition
    State["Enhanced State Object<br/>• alert_id<br/>• raw_alert<br/>• enriched_data<br/>• triage_status<br/>• confidence_score<br/>• FP/TP_indicators<br/>• workflow_history<br/>• agent_executions<br/>• human_feedback<br/>• response_actions<br/>• metadata"]

    %% Enhanced Nodes with Async Support
    Ingestion["Ingestion Agent<br/>• Async Sources<br/>• Batching<br/>• Deduplication<br/>• Rate Limiting"]
    Triage["Triage Agent<br/>• Rule Engine<br/>• ML Scoring<br/>• Thresholds<br/>• Fallback"]
    Correlation["Correlation Agent<br/>• Async Queries<br/>• Caching<br/>• Timeouts<br/>• Retry Logic"]
    Analysis["Analysis Agent<br/>• ReAct Loop<br/>• Tool Orchestration<br/>• Fallback to Human<br/>• Reasoning Logs"]
    HumanLoop["Human-in-the-Loop<br/>• Structured Feedback<br/>• Role-Based Access<br/>• Escalation Levels<br/>• SLA Tracking"]
    Response["Response Agent<br/>• Playbook Engine<br/>• Approval Workflow<br/>• Rollback Support<br/>• Action Audit"]
    Learning["Learning Agent<br/>• Model Versioning<br/>• Training Pipeline<br/>• Performance Metrics<br/>• A/B Testing"]
    Close["Close Alert<br/>• State Validation<br/>• Audit Trail<br/>• Archive Process<br/>• Metrics Collection"]

    %% Enhanced Decision Points with Thresholds
    IsFP{"Confidence > 80%<br/>AND FP Indicators?<br/>Policy: FP_THRESHOLD"}
    NeedsCorrelation{"Confidence 40-70%<br/>OR Needs Context?<br/>Policy: CORRELATION_POLICY"}
    NeedsAnalysis{"Confidence < 60%<br/>OR Complex Alert?<br/>Policy: ANALYSIS_POLICY"}
    NeedsHuman{"Confidence Grey Zone<br/>OR High Risk?<br/>Policy: HUMAN_REVIEW_POLICY"}
    NeedsResponse{"Confirmed Threat<br/>AND Auto-Response<br/>Enabled?<br/>Policy: RESPONSE_POLICY"}

    %% Enhanced Tool Orchestration
    Tools["Tool Orchestration Layer<br/>• Async Execution<br/>• Retry Strategies<br/>• Caching Layer<br/>• Metrics Collection<br/>• Timeout Handling<br/>• Fallback Logic"]

    %% Storage Layer
    Storage["Storage Layer<br/>• PostgreSQL<br/>• Redis Cache<br/>• Vector DB<br/>• State History<br/>• Audit Logs"]

    %% Monitoring & Observability
    Monitoring["Monitoring & Observability<br/>• Metrics<br/>• Tracing<br/>• Logging<br/>• Alerts<br/>• Dashboards"]

    %% Enhanced Flow with Async and Feedback Loops
    Ingestion -->|Initialize State| State
    State -->|New Alert| Triage
    Triage -->|Update State| State
    State -->|Triage Complete| IsFP
    IsFP -->|Yes| Close
    IsFP -->|No| NeedsCorrelation
    NeedsCorrelation -->|Yes| Correlation
    NeedsCorrelation -->|No| NeedsAnalysis
    Correlation -->|Update State| State
    State -->|Correlation Complete| NeedsAnalysis
    NeedsAnalysis -->|Yes| Analysis
    NeedsAnalysis -->|No| NeedsHuman
    Analysis -->|ReAct Loop| State
    State -->|Analysis Complete| NeedsHuman
    NeedsHuman -->|Yes| HumanLoop
    NeedsHuman -->|No| NeedsResponse
    HumanLoop -->|Update State| State
    State -->|Human Feedback| NeedsResponse
    NeedsResponse -->|Yes| Response
    NeedsResponse -->|No| Learning
    Response -->|Update State| State
    State -->|Response Complete| Learning
    Learning -->|Update State| State
    State -->|Learning Complete| Close

    %% Enhanced Tool Connections with Orchestration
    Triage -.->|Call via Orchestrator| Tools
    Correlation -.->|Async Call via Orchestrator| Tools
    Analysis -.->|ReAct via Orchestrator| Tools
    HumanLoop -.->|Call via Orchestrator| Tools
    Response -.->|Call via Orchestrator| Tools
    Learning -.->|Call via Orchestrator| Tools

    %% Storage Connections
    State <-->|Persist/Load| Storage
    Tools <-->|Cache/Store| Storage

    %% Monitoring Connections
    State -.->|State Changes| Monitoring
    Tools -.->|Tool Metrics| Monitoring
    Triage -.->|Agent Metrics| Monitoring
    Correlation -.->|Agent Metrics| Monitoring
    Analysis -.->|Agent Metrics| Monitoring
    HumanLoop -.->|Agent Metrics| Monitoring
    Response -.->|Agent Metrics| Monitoring
    Learning -.->|Agent Metrics| Monitoring

    %% Feedback Loops
    HumanLoop -.->|Feedback| Learning
    Learning -.->|Improved Models| Analysis
    Learning -.->|Improved Models| Triage
    Learning -.->|Improved Models| Correlation

    %% Styling
    classDef agent fill:#e1f5fe,stroke:#01579b,stroke-width:2px
    classDef decision fill:#fff3e0,stroke:#e65100,stroke-width:2px
    classDef state fill:#f3e5f5,stroke:#4a148c,stroke-width:2px
    classDef tools fill:#e8f5e9,stroke:#1b5e20,stroke-width:2px
    classDef storage fill:#fff8e1,stroke:#ff8f00,stroke-width:2px
    classDef monitoring fill:#fce4ec,stroke:#880e4f,stroke-width:2px
    classDef terminal fill:#ffebee,stroke:#b71c1c,stroke-width:2px

    class Ingestion,Triage,Correlation,Analysis,HumanLoop,Response,Learning agent
    class IsFP,NeedsCorrelation,NeedsAnalysis,NeedsHuman,NeedsResponse decision
    class State state
    class Tools tools
    class Storage storage
    class Monitoring monitoring
    class Close terminal
```

### Architecture Layers

```mermaid
graph TB
    subgraph "Core Framework"
        Core[Core Framework]
        State[State Management]
        Workflow[Workflow Engine]
        Config[Configuration]
    end

    subgraph "Agent Layer"
        IngestionMod[Ingestion Module]
        TriageMod[Triage Module]
        AnalysisMod[Analysis Module]
        HumanMod[Human Loop Module]
        ResponseMod[Response Module]
        LearningMod[Learning Module]
    end

    subgraph "Tool Layer"
        ToolOrchestrator[Tool Orchestrator]
        SIEMTools[SIEM Tools]
        IntelTools[Intel Tools]
        SandboxTools[Sandbox Tools]
        EDRTools[EDR Tools]
    end

    subgraph "Storage Layer"
        StateStorage[State Storage]
        CacheLayer[Cache Layer]
        VectorDB[Vector DB]
        AuditLogs[Audit Logs]
    end

    subgraph "Monitoring Layer"
        Metrics[Metrics Collection]
        Tracing[Distributed Tracing]
        Logging[Structured Logging]
        Alerting[Alerting System]
    end

    Core --> State
    Core --> Workflow
    Core --> Config

    Workflow --> IngestionMod
    Workflow --> TriageMod
    Workflow --> AnalysisMod
    Workflow --> HumanMod
    Workflow --> ResponseMod
    Workflow --> LearningMod

    IngestionMod --> ToolOrchestrator
    TriageMod --> ToolOrchestrator
    AnalysisMod --> ToolOrchestrator
    HumanMod --> ToolOrchestrator
    ResponseMod --> ToolOrchestrator
    LearningMod --> ToolOrchestrator

    ToolOrchestrator --> SIEMTools
    ToolOrchestrator --> IntelTools
    ToolOrchestrator --> SandboxTools
    ToolOrchestrator --> EDRTools

    State --> StateStorage
    ToolOrchestrator --> CacheLayer
    LearningMod --> VectorDB
    State --> AuditLogs

    IngestionMod --> Metrics
    TriageMod --> Metrics
    AnalysisMod --> Metrics
    HumanMod --> Metrics
    ResponseMod --> Metrics
    LearningMod --> Metrics

    Metrics --> Tracing
    Metrics --> Logging
    Metrics --> Alerting
```

### 1. Alert Ingestion

```mermaid
graph LR
    A[Alert Sources] --> B[Ingestion Agent]
    B --> C[State Initialization]
    C --> D[Deduplication]
    D --> E[Rate Limiting]
    E --> F[Triage Queue]
```

### 2. Intelligent Triage

```mermaid
graph TD
    A[Alert Input] --> B{Rule Engine}
    B --> C[ML Scoring]
    C --> D{Confidence Score}
    D -->|">80%"| E[Auto-Close FP]
    D -->|"40-80%"| F[Correlation Queue]
    D -->|"<40%"| G[Analysis Queue]
    E --> H[Learning Feedback]
    F --> I[Correlation Agent]
    G --> J[Analysis Agent]
```

### 3. Context Correlation

```mermaid
graph LR
    A[Alert] --> B[Correlation Agent]
    B --> C[Async Queries]
    C --> D[Threat Intel APIs]
    C --> E[SIEM Historical Data]
    C --> F[Asset Information]
    D --> G[Enrichment Results]
    E --> G
    F --> G
    G --> H[Temporal Analysis]
    H --> I[Entity Resolution]
    I --> J[Updated State]
```

### 4. Deep Analysis (ReAct Loop)

```mermaid
graph TD
    A[Analysis Agent] --> B[Reasoning]
    B --> C{Need More Data?}
    C -->|Yes| D[Action: Query Tools]
    D --> E[Tool Execution]
    E --> F[Observation]
    F --> B
    C -->|No| G[Conclusion]
    G --> H{Confidence Level}
    H -->|High| I[Auto Response]
    H -->|Medium| J[Human Review]
    H -->|Low| K[Escalate to Senior]
    I --> L[Response Agent]
    J --> M[Human-in-Loop]
    K --> M
```

### 5. Human Escalation

```mermaid
graph TD
    A[Human Review Required] --> B{Risk Level}
    B -->|Critical| C[Immediate Escalation]
    B -->|High| D[Senior Analyst Queue]
    B -->|Medium| E[Standard Review Queue]
    C --> F[Manager/CISO Alert]
    D --> G[Senior Analyst]
    E --> H[Analyst]
    F --> I[Structured Feedback]
    G --> I
    H --> I
    I --> J{Decision}
    J -->|Approve| K[Response Agent]
    J -->|Deny| L[Close Alert]
    J -->|Need More Info| M[Back to Analysis]
```

### 6. Automated Response

```mermaid
graph TD
    A[Response Triggered] --> B[Playbook Selection]
    B --> C{Approval Required?}
    C -->|Yes| D[Approval Workflow]
    C -->|No| E[Execute Actions]
    D --> F{Approved?}
    F -->|Yes| E
    F -->|No| G["Log Decision & Close"]
    E --> H[Block IP/Domain]
    E --> I[Quarantine File]
    E --> J[Disable Account]
    E --> K[Network Isolation]
    H --> L[Action Audit]
    I --> L
    J --> L
    K --> L
    L --> M{Success?}
    M -->|Yes| N[Update State]
    M -->|No| O["Rollback & Alert"]
    N --> P[Learning Agent]
    O --> Q[Human Intervention]
```

### 7. Continuous Learning

```mermaid
graph TD
    A[Learning Agent] --> B[Collect Feedback]
    B --> C[Human Feedback]
    B --> D[Agent Performance]
    B --> E[Response Outcomes]
    C --> F[Model Training Data]
    D --> F
    E --> F
    F --> G[Model Versioning]
    G --> H{A/B Testing}
    H -->|Champion| I[Deploy New Model]
    H -->|Challenger| J[Performance Analysis]
    J --> K{Better Performance?}
    K -->|Yes| I
    K -->|No| L[Keep Current Model]
    I --> M[Update Agent Configs]
    L --> N[Log Results]
    M --> O[Performance Monitoring]
    N --> O
    O --> P[Metrics Dashboard]
```

## 📊 Monitoring and Observability

### Comprehensive Telemetry

- **Metrics collection**: agent performance, tool latency, accuracy
- **Distributed tracing**: end-to-end workflow visibility
- **Structured logging**: searchable audit trails and debugging
- **Real-time dashboards**: operations center visualization

### Key Performance Indicators

- **Mean Time to Detect (MTTD)**
- **Mean Time to Respond (MTTR)**
- **False positive rate**
- **Agent accuracy score**
- **Human escalation rate**
- **Tool utilization metrics**

## 🛠️ Technology Stack

### Core Technologies

- **LangGraph**: multi-agent workflow orchestration
- **LangChain**: LLM integration and tool connectivity
- **PostgreSQL**: primary state and audit storage
- **Redis**: caching and session management
- **Vector database**: similarity search and embeddings

### AI/ML Components

- **Large language models**: GPT-4 and Claude for reasoning
- **Custom ML models**: specialized threat detection
- **Embedding models**: semantic similarity analysis
- **Classification models**: alert classification

### Security Tool Integrations

- **SIEM platforms**: Splunk, IBM QRadar, Microsoft Sentinel
- **Threat intelligence**: VirusTotal, MISP, commercial intelligence feeds
- **Sandbox solutions**: Cuckoo, Joe Sandbox, Falcon Sandbox
- **EDR/XDR**: CrowdStrike, SentinelOne, Microsoft Defender

## 🚦 Getting Started

### Prerequisites

```
Python 3.11+
PostgreSQL 14+
Redis 7+
Docker & Docker Compose
Security tool API credentials
```

### Quick Setup

```bash
# Clone repository
git clone https://github.com/your-org/soc-langgraph-poc
cd soc-langgraph-poc

# Install dependencies
pip install -r requirements.txt

# Configure environment
cp .env.example .env
# Edit .env with your API keys and database URLs

# Initialize database
python scripts/init_db.py

# Start services
docker-compose up -d

# Launch SOC workflow
python main.py
```

### Configuration

Main configuration areas:

- **Agent policies**: confidence thresholds and routing logic
- **Tool credentials**: API keys and connection strings
- **Workflow rules**: business logic and escalation procedures
- **Learning settings**: model update frequency and training data

## 📈 Use Cases and Benefits

### Automated Threat Detection

- **24/7 operation**: continuous alert processing without analyst fatigue
- **Consistent analysis**: standardized investigation procedures
- **Rapid response**: sub-minute cycle from detection to containment

### Analyst Augmentation

- **Decision support**: AI-driven recommendations with explanations
- **Workload optimization**: analysts focus on high-value investigations
- **Knowledge scaling**: junior analysts gain access to senior-level insight

### Operational Excellence

- **Fewer false positives**: intelligent filtering and correlation
- **Improved MTTR**: faster incident response through automation
- **Audit compliance**: complete workflow documentation and traceability

## 🔮 Roadmap and Future Enhancements

### Planned Features

- **Multi-tenant architecture**: support for multiple customer environments
- **Advanced ML models**: custom threat detection model training
- **Integration marketplace**: plug-and-play security tool connectors
- **Mobile interface**: a mobile app that lets analysts respond from anywhere

### Research Areas

- **Federated learning**: cross-organization threat intelligence sharing
- **Explainable AI**: greater transparency in AI decisions
- **Attack graph analysis**: multi-stage attack detection and visualization

## 📄 License

This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.

## 🙏 Acknowledgments

- **LangGraph team**: for an excellent multi-agent framework
- **Security community**: for threat intelligence and best practices
- **Open-source contributors**: for the foundational tools and libraries

**Built with ❤️ for the cybersecurity community**

*Empowering security teams through intelligent automation while keeping humans in control of critical decisions.*
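
### Example: Confidence-Based Routing (Illustrative Sketch)

The confidence-based routing that follows triage in the workflow diagram can be sketched in plain Python. This is a minimal illustration, not the project's actual code: `AlertState`, `RoutingPolicy`, `route_after_triage`, the `needs_context` flag, and the returned node names are all hypothetical, and the threshold values are simply copied from the `IsFP` and `NeedsCorrelation` decision nodes.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class AlertState:
    """Hypothetical, trimmed-down version of the Enhanced State Object."""
    alert_id: str
    raw_alert: dict[str, Any]
    confidence_score: float = 0.0  # 0-100 scale, as in the diagram
    fp_indicators: list[str] = field(default_factory=list)
    needs_context: bool = False  # assumed flag for "Needs Context?"
    workflow_history: list[str] = field(default_factory=list)


@dataclass(frozen=True)
class RoutingPolicy:
    """Assumed policy object; defaults mirror the diagram's thresholds."""
    fp_threshold: float = 80.0
    correlation_low: float = 40.0
    correlation_high: float = 70.0


def route_after_triage(state: AlertState,
                       policy: RoutingPolicy = RoutingPolicy()) -> str:
    """Return the (hypothetical) name of the next node for a triaged alert."""
    score = state.confidence_score
    if score > policy.fp_threshold and state.fp_indicators:
        # High confidence plus false-positive indicators: close the alert.
        next_node = "close"
    elif (policy.correlation_low <= score <= policy.correlation_high
          or state.needs_context):
        # Mid-range confidence or missing context: enrich via correlation.
        next_node = "correlation"
    else:
        # Otherwise fall through to the next decision point.
        next_node = "analysis_check"
    # Record the decision so the routing step stays auditable.
    state.workflow_history.append(f"triage->{next_node}")
    return next_node


if __name__ == "__main__":
    alert = AlertState("alert-001", {"source": "siem"}, confidence_score=55.0)
    print(route_after_triage(alert))  # correlation
```

In a LangGraph implementation, a function of this shape would typically be registered as the path function of `StateGraph.add_conditional_edges`, with the returned string mapped to the next node. Whether the project wires its routing this way is an assumption.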