PierreRamez/Automated-Threat-Intelligence-Pipeline

GitHub: PierreRamez/Automated-Threat-Intelligence-Pipeline

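A lightweight, Python-based OT threat intelligence pipeline that automatically collects NVD CVEs, uses an LLM to classify them in context, and displays the results.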

Stars: 0 | Forks: 0

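# Automated OT Threat Intelligence Pipeline

![OT Security Pipeline](https://static.pigsec.cn/wp-content/uploads/repos/2026/04/35c7353869134246.svg)

A lightweight Python agent that periodically polls NVD, filters for CVEs relevant to OT/ICS, asks Gemini for OT classification *and* operational reasoning in a single pass, and presents the results in a SOC-style Streamlit dashboard.

**Interactive dashboard:** [https://automated-threat-intelligence-agent.streamlit.app/](https://automated-threat-intelligence-agent.streamlit.app/)

## Repository Layout

```
.
├─ agent.py              # continuous scanner + AI classification + reasoning
├─ st_dashboard.py       # Streamlit frontend reading output_sample.json
├─ output_sample.json    # persisted validated OT alerts (sample)
├─ requirements.txt
├─ .env.example
└─ README.md
```

## Quick Start (Local)

**Requirements:** Python 3.11+, pip

1. **Clone the repository**

```
git clone https://github.com/PierreRamez/Automated-Threat-Intelligence-Agent.git
cd Automated-Threat-Intelligence-Agent
```

2. **Create a virtual environment and install dependencies**

```
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
```

3. **Configure environment variables**

Create a `.env` file in the repository root:

```
NVD_API_KEY=your_nvd_api_key_here
GEMINI_API_KEY=your_google_ai_key_here
```

4. **Run the backend agent** (keep it running)

```
python agent.py
```

5. **Start the Streamlit dashboard** (in a new terminal)

```
streamlit run st_dashboard.py
```

**Notes**

- `agent.py` polls NVD every 10 minutes and writes its results to `output_sample.json`.
- `st_dashboard.py` reads directly from that file and renders the UI.
- The polling interval and output path are currently hardcoded in the scripts.

## Pipeline Overview

1. **Ingestion:** Queries NVD for newly published CVEs.
2. **Deduplication:** Uses an in-memory cache to skip CVEs that have already been processed.
3. **Keyword Filtering:** Runs a fast lexical scan for OT-specific vendors, protocols, and assets.
4. **AI Analysis:** Gemini classifies OT relevance and explains the operational impact in a single request.
5. **Persistence:** Appends the structured results to the JSON store.
6. **Visualization:** The Streamlit dashboard presents metrics and analysis views.

## Prompt Design and Decision Logic

The agent uses a **single prompt** with **Gemini 2.5 Flash Lite** to maximize efficiency and simplicity. In one call, the model both classifies the CVE and, where applicable, explains the risk it poses to industrial systems.

### Combined Prompt Structure

```
You are an expert OT threat analyst with 160 IQ.
Thoroughly analyze the following CVE description.
Return ONLY a JSON object with exactly these keys:
1. "ot_related" : boolean (True if OT/ICS/SCADA related, False otherwise).
2. "reason" : string (an expert-level, detailed explanation of why.
If "ot_related" is True, explain why this vulnerability is dangerous).
---------------------
Description:
---------------------
{description}
```

**Why this approach?**

- **Low latency:** One API call per CVE keeps end-to-end processing time to a minimum.
- **Cost efficient:** Uses the lightweight Flash Lite model with no follow-up calls.
- **Simplicity:** Reduces prompt orchestration complexity and the number of failure modes.

## Architecture Diagram

```
flowchart LR
    subgraph Ingest
        NVD_API[NVD API]
        AgentPoll[agent.py poller]
    end
    subgraph Filtering
        Cache{Seen CVE?}
        KeywordFilter[Keyword Engine]
    end
    subgraph Analysis
        Gemini[Gemini: Classification + Reasoning]
    end
    subgraph Storage
        OutputJSON[(output_sample.json)]
    end
    subgraph UI
        Streamlit[st_dashboard.py]
        Analyst[Security Analyst]
    end
    NVD_API --> AgentPoll --> Cache
    Cache -- No --> KeywordFilter
    Cache -- Yes --> AgentPoll
    KeywordFilter -- Match --> Gemini
    KeywordFilter -- No Match --> AgentPoll
    Gemini --> OutputJSON
    OutputJSON --> Streamlit --> Analyst
```

## Technical Design Details

- **Keyword engine:** A curated OT lexicon (roughly 50+ terms) tuned for high recall on ICS vendors and protocols.
- **LLM usage:** One deterministic request per candidate CVE, with strict parsing of the JSON response.
- **Rate handling:** Basic retry logic for API calls and a fixed polling interval.
- **Fault tolerance:** JSON reads and writes are wrapped in exception handling so the dashboard does not crash.
- **Secret management:** All API keys are loaded from environment variables.

## Sample Output Structure

```
{
  "cve_id": "CVE-2026-0528",
  "cvss": 6.5,
  "severity": "MEDIUM",
  "description": "Improper Validation of Array Index in Metricbeat...",
  "ai_insight": "This CVE impacts Metricbeat. In an OT context, this could disrupt the collection of operational telemetry from SCADA or industrial monitoring systems, affecting visibility and response."
}
```

## Security and Privacy Considerations

- Only public CVE metadata is sent to the Gemini API.
- No internal asset identifiers or operational telemetry are exposed.
- API keys are kept out of version control and should be rotated regularly.

## Known Limitations and Honest Trade-offs

- **Single-pass AI analysis:** Classification and reasoning are coupled; false positives require manual review.
- **Hardcoded configuration:** The polling interval and output path cannot currently be configured externally.
- **JSON storage:** Simple and portable, but unsuitable for concurrent writes or for scaling.

## Design Philosophy

- **Operational impact over CVSS:** The system prioritizes *context* (OT relevance) over the raw score, recognizing that a "medium" CVE can still bring a plant to a halt.
- **Deliberately simple:** By avoiding complex databases and microservices, this prototype reduces points of failure and stays reliable during evaluation.
- **Future roadmap:** The architecture supports a clear upgrade path, including two-stage prompting, external configuration, and SQLite storage, without rewriting the core logic.

This project demonstrates an end-to-end, OT-focused threat intelligence prototype with a realistic scope, clearly stated limitations, and a solid foundation for iteration.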

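Tags: AI classification, API integration, CVE, DLL hijacking, Gemini, GPT, ICS security, Kubernetes, LLM, NVD, OT security, Python, SCADA security, Streamlit, Unmanaged PE, dashboard, critical infrastructure, keyword filtering, deduplication, observability, large language model, threat intelligence, industrial control security, developer tools, continuous monitoring, digital signature, no backdoor, vulnerability management, cybersecurity, access control, operational reasoning, reverse engineering tools, privacy protection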