Microck/delvn

GitHub: Microck/delvn

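A multi-agent CTI pipeline tool that aggregates CVE feeds, threat intelligence pulses, and security advisory RSS feeds and turns them into prioritized executive briefs tailored to your tech stack.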

Stars: 1 | Forks: 0

[![License](https://img.shields.io/badge/license-Apache--2.0-blue.svg)](LICENSE)

**A multi-agent CTI pipeline that turns raw CVE feeds and threat intelligence into prioritized executive briefs tailored to your tech stack.**

The name comes from the verb **"delve"** (to dig deep): the project digs through large volumes of threat intelligence to surface the specific risks that matter to your organization.

Delvn ingests signals from three source families (NVD CVEs, AlienVault OTX intel pulses, and security RSS advisories), correlates related activity using vector similarity, ranks every finding against your declared tech stack, and produces a concise markdown brief that a human can act on. Built for the **Microsoft AI Dev Days Hackathon 2026**.

## How it works

```
CVE feed (NVD) ───────┐
OTX intel pulses ─────┤──► collector agents ──► normalize ──► Cosmos DB (threats)
RSS advisories ───────┘                                                │
                                                                       │
                                                  correlator agent (Azure AI Search vectors)
                                                                       │
                                                             CorrelationLink docs
                                                                       │
                                 prioritizer agent (stack profile: products / platforms / keywords / exclude)
                                                                       │
                                                          HIGH / MEDIUM / LOW / NONE
                                                                       │
                                                 reporter agent (Azure OpenAI / hash fallback)
                                                                       │
                                                           executive markdown brief
```

**Key properties:**

- **Three ingestion paths**: NVD CVEs (paginated), OTX pulses (subscribed, with a public fallback), and configurable RSS feeds each run as an independent collector agent.
- **Vector correlation**: the correlator embeds threat text with Azure OpenAI (falling back to deterministic hash vectors when credentials are missing), indexes it into Azure AI Search, and links nearest-neighbor pairs whose confidence is above a threshold.
- **Stack-aware prioritization**: each threat is scored `HIGH / MEDIUM / LOW / NONE` by keyword overlap with your `products`, `platforms`, and `keywords` lists. `HIGH` requires a product match plus either severity ≥ 7.0 or an `exploited` tag.
- **Executive brief rendering**: the reporter turns ranked threats into structured `BriefEntry` objects with a headline, evidence, and recommended actions, then writes a review-ready markdown file.
- **Safe by default**: `--dry-run` validates the config and prints the planned pipeline without making any external API calls or Azure writes.

## Features

- CVE collection via the NVD 2.0 API, with an optional API key for a higher rate limit
- Intel collection from AlienVault OTX (falls back from subscribed pulses to public pulses)
- Security advisory collection from configurable RSS feeds via `feedparser`
- Normalization of all signals into a canonical `UnifiedThreat` model (Pydantic v2)
- Embedding and indexing of threat text into Azure AI Search (HNSW vector profile)
- Correlation of related threats as `CorrelationLink` records with explainable confidence scores
- Relevance scoring: `HIGH` (product + severity ≥ 7 or exploited) → `MEDIUM` (product match) → `LOW` (platform/keyword) → `NONE`
- `ExecutiveBrief` generation with top risks, notable mentions, evidence, and recommended actions
- Clean executive markdown brief rendering for leadership review
- Dry-run mode (for judges, CI, and local rehearsal; no credentials required)
- Shared HTTP client with exponential backoff (up to 5 retries), used by all integrations
- Deterministic hash-based embedding fallback when Azure OpenAI credentials are missing

## Quick start

**Prerequisites:**

| Tool | Version |
|------|---------|
| Python | 3.11+ |
| uv | any recent |

```
git clone https://github.com/Microck/delvn.git
cd delvn
uv sync --frozen
uv run python demo/run_demo.py --dry-run
```

Expected output:

```
DRY RUN: no external APIs will be called and no Azure writes will occur.
Config: /path/to/demo/config.yaml
Stack: Apache HTTP Server, PostgreSQL, React
Planned flow:
1) collect -> CVE(results_per_page=25), Intel(limit=25), News(default feeds)
2) correlate -> top_k=8, limit=120, min_confidence=0.35
3) prioritize -> limit=120 for stack profile from config
4) report -> top_risks=5, notable_mentions=3, output=docs/demo_brief.md
```

## Installation

```
uv sync --frozen
```

All runtime and dev dependencies are pinned in `uv.lock`.

## Usage

### Dry-run mode (no credentials required)

Validates the config, prints the planned pipeline, and exits cleanly. Useful for judges, CI, and local rehearsal.

```
uv run python demo/run_demo.py --dry-run
```

Pass a custom config:

```
uv run python demo/run_demo.py --dry-run --config path/to/my-config.yaml
```

### Live mode (Azure credentials required)

Runs the full end-to-end pipeline: collect → correlate → prioritize → report.

```
uv run python demo/run_demo.py --live
```

Or specify a config file:

```
uv run python demo/run_demo.py --live --config demo/config.yaml
```

A live run requires `COSMOS_ENDPOINT`, `COSMOS_KEY`, `SEARCH_ENDPOINT`, and `SEARCH_KEY` to be set. The runner checks these variables at startup; if any are missing, it lists all of them and exits with an error before writing any data.

## Configuration

### Stack profile

Edit `demo/config.yaml` to describe your environment. The prioritizer scores every threat against this profile.

```
stack:
  products:
    - Apache HTTP Server
    - PostgreSQL
    - React
  platforms:
    - Linux
  keywords:
    - httpd
    - postgres
    - frontend
  exclude:
    - Windows

run:
  collection_window_days: 7          # How far back to pull CVEs
  max_items_per_source: 25           # Default cap per source
  cve_results_per_page: 25           # NVD results_per_page (max 2000)
  intel_limit: 25                    # OTX pulse limit
  correlation_limit: 120             # Max threats fed into correlator
  correlation_top_k: 8               # Nearest neighbors per query
  correlation_min_confidence: 0.35   # Minimum confidence to store a link
  prioritization_limit: 120          # Max threats scored by prioritizer
  report_top_risks: 5                # Max top-risk entries in brief (capped 3–5)
  report_notable_mentions: 3         # Additional notable entries
  report_output_path: docs/demo_brief.md
```

**Scoring rules:**

- `HIGH`: product match + (severity ≥ 7.0 or `exploited` tag)
- `MEDIUM`: product match only (severity below 7.0 and no `exploited` tag)
- `LOW`: platform or keyword match (no product match)
- `NONE`: no stack terms found in the threat content

### Environment variables

Create a `.env` file in the project root (loaded automatically by `pydantic-settings`):

```
cp .env.example .env   # or create manually
```

| Variable | Required for | Description |
|---|---|---|
| `COSMOS_ENDPOINT` | `--live` | Azure Cosmos DB account endpoint URL |
| `COSMOS_KEY` | `--live` | Azure Cosmos DB primary key |
| `SEARCH_ENDPOINT` | `--live` | Azure AI Search service endpoint |
| `SEARCH_KEY` | `--live` | Azure AI Search admin key |
| `AZURE_OPENAI_ENDPOINT` | `--live` (embeddings) | Azure OpenAI service endpoint |
| `AZURE_OPENAI_API_KEY` | `--live` (embeddings) | Azure OpenAI API key |
| `AZURE_OPENAI_EMBEDDING_DEPLOYMENT` | `--live` (embeddings) | Deployment name of the embedding model |
| `AZURE_OPENAI_API_VERSION` | `--live` (embeddings) | API version (default: `2024-02-01`) |
| `FOUNDRY_ENDPOINT` | optional | Azure AI Foundry project endpoint |
| `FOUNDRY_API_KEY` | optional | Azure AI Foundry API key |
| `NVD_API_KEY` | optional | NVD API key for a higher rate limit than anonymous access |
| `OTX_API_KEY` | optional | AlienVault OTX API key (falls back to public pulses) |

When the `AZURE_OPENAI_*` variables are missing, the correlator uses a deterministic hash-based embedding client, so the pipeline still runs end to end without Azure OpenAI credentials. (`--live` still needs the Cosmos DB and Azure AI Search variables.)

## Output

The pipeline writes the executive markdown brief to the path set by `run.report_output_path` (default: `docs/demo_brief.md`).

**Brief structure:**

```
# Executive Threat Brief

_Generated: _

## Executive Summary

## Top Risks

### 1. <headline>
- **Relevance:** HIGH | MEDIUM
- **Why it matters:**
- **Evidence:**
  -
- **Recommended actions:**
  -
  -
  -

## Notable Mentions
-
-
-

## Recommended Next Steps
1.
2.
...
```

**Example output excerpt (`docs/example_brief.md`):**

```
## Top Risks

### 1. Apache HTTP Server path traversal chain observed in active exploitation traffic
- **Relevance:** HIGH
- **Why it matters:** Directly matches internet-facing Apache services in scope and has active public proof-of-concept exploitation.
- **Evidence:**
  - NVD feed flagged high severity web server vulnerability
  - OTX indicators include exploit host and callback domains
  - Multiple matching references across CVE + intel + reporting feeds
- **Recommended actions:**
  - Patch Apache instances to the latest fixed release within 24 hours
  - Add temporary WAF rule for suspicious traversal payloads
  - Hunt logs for matching indicators in the last 7 days
```

See `docs/example_brief.md` for the full example.

## Architecture

```mermaid
flowchart TD
    subgraph Ingestion
        A1[cve_agent<br/>NVD 2.0 API]
        A2[intel_agent<br/>AlienVault OTX]
        A3[news_agent<br/>Security RSS]
    end

    subgraph Storage
        C1[(Cosmos DB<br/>threats container)]
        C2[(Azure AI Search<br/>threats index)]
    end

    subgraph Correlation
        B1[correlator_agent<br/>Embed + HNSW query]
        B2[(Cosmos DB<br/>correlations container)]
    end

    subgraph Analysis
        D1[prioritizer_agent<br/>Stack profile scoring]
    end

    subgraph Output
        E1[reporter_agent<br/>Brief assembly]
        E2[render_brief_md<br/>Markdown renderer]
        E3[docs/demo_brief.md]
    end

    A1 -->|UnifiedThreat| C1
    A2 -->|UnifiedThreat| C1
    A3 -->|UnifiedThreat| C1
    C1 -->|list_recent_threats| B1
    B1 -->|index_threats| C2
    C2 -->|vector_query top_k| B1
    B1 -->|CorrelationLink| B2
    C1 -->|ranked list| D1
    B2 -->|correlation_count| D1
    D1 -->|HIGH/MEDIUM/LOW/NONE| E1
    E1 --> E2
    E2 --> E3
```

For the full component table and failure-mode reference, see [`docs/architecture.md`](docs/architecture.md).

### Data models

| Model | Purpose |
|---|---|
| `UnifiedThreat` | Canonical cross-source threat object (id, source, type, title, summary, severity, indicators, tags, references) |
| `CorrelationLink` | Relationship between two threats (source_id → target_id, confidence, reasons) |
| `ExecutiveBrief` | Structured brief payload (stack_summary, top_risks, notable_mentions, generated_at) |
| `BriefEntry` | Single risk entry (headline, relevance, why_it_matters, evidence, recommended_actions) |

### Storage

| Resource | Container / Index | Partition key / vector field |
|---|---|---|
| Cosmos DB `delvn` | `threats` | `/source` |
| Cosmos DB `delvn` | `correlations` | `/id` |
| Azure AI Search | `threats` (HNSW `tf-vector-profile`) | `contentVector` (dim 1536) |

## Repository layout

```
delvn/
├── demo/
│   ├── run_demo.py          # Orchestration entrypoint (--dry-run / --live)
│   └── config.yaml          # Stack profile + run parameters
├── docs/
│   ├── architecture.md      # Component table, data models, failure modes
│   ├── example_brief.md     # Example pipeline output
│   └── brand/               # Logo assets
├── src/
│   ├── agents/              # cve_agent, intel_agent, news_agent, correlator_agent,
│   │                        # prioritizer_agent, reporter_agent
│   ├── common/              # Shared HTTP client (retry + backoff)
│   ├── config/              # Settings (pydantic-settings), user stack model
│   ├── correlation/         # Correlation candidate matching and scoring
│   ├── embeddings/          # AzureOpenAI + deterministic hash fallback clients
│   ├── integrations/        # NVD, OTX, RSS source adapters
│   ├── models/              # UnifiedThreat, CorrelationLink, ExecutiveBrief, BriefEntry
│   ├── normalization/       # Source-specific normalizers → UnifiedThreat
│   ├── prioritization/      # Relevance scoring against stack profile
│   ├── reporting/           # Markdown renderer
│   └── storage/             # Cosmos DB + Azure AI Search clients
├── tests/                   # pytest test suite (14 test modules)
├── infra/                   # Infrastructure configs
├── pyproject.toml
└── uv.lock
```

## Development

```
# Install all dependencies (including the dev group)
uv sync --frozen

# Lint
uv run ruff check src tests

# Format check
uv run ruff format --check src tests

# Type check
uv run mypy src

# Run tests
uv run pytest
```

## Testing

The test suite covers normalization, priority scoring, brief rendering, embedding clients, correlation scoring, and storage client smoke tests.

```
uv run pytest
```

For verbose output:

```
uv run pytest -v
```

All tests are self-contained and run without external credentials. The storage client smoke tests exercise the initialization paths without requiring live Azure endpoints.

## Contributing

Issues and pull requests are welcome. Please open an issue to discuss significant changes before starting on them. Keep changes focused: one logical concern per pull request.

## Security

`--live` mode uses real Azure credentials and writes to live Cosmos DB and Azure AI Search resources. Prefer `--dry-run` for evaluation, CI, and demos where writes are not appropriate. Never commit a `.env` file containing real credentials; `.gitignore` excludes `.env` by default.

API keys (`NVD_API_KEY`, `OTX_API_KEY`, `COSMOS_KEY`, `SEARCH_KEY`, `AZURE_OPENAI_API_KEY`) should be stored as secrets in your CI environment or a secrets manager, not in `demo/config.yaml` or source code.

## License

Apache-2.0. See [`LICENSE`](LICENSE).

## Origin

Built at the **Microsoft AI Dev Days Hackathon 2026** as a prototype multi-agent CTI pipeline that demonstrates Azure AI Foundry, Azure OpenAI, Azure Cosmos DB, and Azure AI Search in a security triage workflow.
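The README says the correlator falls back to deterministic hash vectors when Azure OpenAI credentials are missing, but it does not say how those vectors are built. A minimal sketch of one common approach, signed feature hashing (the "hashing trick"), is shown below. The names `hash_embed` and `cosine`, the lowercase alphanumeric tokenization, and the use of SHA-256 are assumptions, not Delvn's actual client in `src/embeddings/`; the default dimension of 1536 only mirrors the `contentVector` field of the search index.

```python
import hashlib
import math
import re


def hash_embed(text: str, dim: int = 1536) -> list[float]:
    """Deterministic text vector via signed feature hashing (hypothetical sketch)."""
    vector = [0.0] * dim
    # Assumption: lowercase alphanumeric runs are the features.
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        # SHA-256 gives the same digest in every process, unlike the salted
        # built-in hash(), so vectors stay stable across runs.
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:4], "big") % dim
        # A separate digest bit picks the sign, so bucket collisions do not
        # inflate similarity on average.
        sign = 1.0 if digest[4] & 1 else -1.0
        vector[bucket] += sign
    # L2-normalize so similarity depends on shared terms, not text length.
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm else vector


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```

Vectors like these only capture literal word overlap (for example, two advisories that both mention `httpd` and `traversal`), not meaning, which is why they suit a credential-free fallback rather than a replacement for a real embedding model.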
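The correlation step described above queries Azure AI Search (HNSW) for the `correlation_top_k` nearest neighbors of each threat and keeps a link only when its confidence reaches `correlation_min_confidence`. The sketch below swaps the HNSW index for an exact brute-force cosine search so it runs without Azure. It assumes confidence is simply the cosine similarity and that a symmetric pair is stored once; the `Link` dataclass and `correlate` function are hypothetical stand-ins for `CorrelationLink` (whose `reasons` field is omitted) and the correlator agent.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """Hypothetical stand-in for CorrelationLink (source_id -> target_id)."""
    source_id: str
    target_id: str
    confidence: float


def _cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def correlate(vectors: dict[str, list[float]], top_k: int = 8,
              min_confidence: float = 0.35) -> list[Link]:
    """Exact k-nearest-neighbor linking; an HNSW index would approximate this."""
    links: list[Link] = []
    seen: set[frozenset[str]] = set()  # assumption: store A-B once, not A->B and B->A
    for source_id, vec in vectors.items():
        # Score every other threat and keep the top_k most similar ones.
        neighbors = sorted(
            ((other_id, _cosine(vec, other_vec))
             for other_id, other_vec in vectors.items() if other_id != source_id),
            key=lambda pair: pair[1],
            reverse=True,
        )[:top_k]
        for target_id, score in neighbors:
            pair = frozenset((source_id, target_id))
            if score >= min_confidence and pair not in seen:
                seen.add(pair)
                links.append(Link(source_id, target_id, round(score, 4)))
    return links


# Usage: a CVE and an OTX pulse pointing the same way get linked; the RSS item does not.
vectors = {"cve-1": [1.0, 0.0], "otx-1": [0.9, 0.1], "rss-1": [0.0, 1.0]}
links = correlate(vectors, top_k=1, min_confidence=0.35)
```

Brute force costs O(n²) comparisons, which is fine for the default `correlation_limit` of 120 threats; an approximate index such as HNSW matters once the threat pool grows much larger.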
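All integrations share an HTTP client with exponential backoff and up to 5 retries. The sketch below shows that retry schedule as a generic wrapper rather than the actual client in `src/common/`. The name `retry_with_backoff`, the 0.5-second base delay, the 30-second cap, reading "5 retries" as five attempts after the first, and retrying on every exception are all assumptions. The injectable `sleep` parameter exists only so the schedule can be tested without waiting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(call: Callable[[], T], max_retries: int = 5,
                       base_delay: float = 0.5, max_delay: float = 30.0,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Run `call`, retrying up to `max_retries` times with exponential backoff."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except Exception:
            if attempt == max_retries:
                raise  # retries exhausted: surface the last error
            # Delays double each time: 0.5s, 1s, 2s, 4s, 8s, capped at max_delay.
            sleep(min(base_delay * 2 ** attempt, max_delay))
    raise AssertionError("unreachable")  # keeps type checkers satisfied


# Usage: a call that fails three times, then succeeds on the fourth attempt.
delays: list[float] = []
calls = {"n": 0}


def flaky() -> str:
    calls["n"] += 1
    if calls["n"] < 4:
        raise ConnectionError("transient failure")
    return "ok"


result = retry_with_backoff(flaky, sleep=delays.append)
```

A production client would normally retry only transient failures (timeouts, HTTP 429, 5xx) and add random jitter so many workers do not retry in lockstep.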

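Tags: Azure AI, Azure AI Search, Azure Cosmos DB, GPT, NVD, PyRIT, Python, RSS subscriptions, enterprise security, prioritization, vector similarity, multi-agent systems, threat intelligence, security briefings, real-time processing, password management, developer tools, open-source security tools, Microsoft AI, backdoor-free, intelligent security analysis, vulnerability management, cybersecurity, network asset management, automated reporting, reverse engineering tools, reverse engineering platform, configuration auditing, privacy protection, hackathon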