likhitham78/ThreatIQ-AI-Cyber-Threat-Intelligence

GitHub: likhitham78/ThreatIQ-AI-Cyber-Threat-Intelligence

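An AI cyber threat intelligence analysis platform built on RAG and a multi-agent architecture. It automatically parses security reports and produces comprehensive assessments covering threat classification, CVEs, and IOCs.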


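# 🛡️ ThreatIQ: AI-Powered Cyber Threat Intelligence Assistant

**ThreatIQ** is a production-grade, model-agnostic cyber threat intelligence analysis platform. It uses Retrieval-Augmented Generation (RAG) and a collaborative multi-agent architecture to parse, index, and analyze cybersecurity threat reports in both `.txt` and `.pdf` format.

ThreatIQ chunks each report, indexes the chunks in ChromaDB, and orchestrates several specialized security agents. The result is a comprehensive, high-fidelity security assessment covering:

- threat classification
- vulnerability analysis (CVEs)
- indicators of compromise (IOCs)
- an actionable executive summary

## 🏗️ Architecture & Data Flow

```
            ┌───────────────────────────────┐
            │  User Interface (Streamlit)   │
            └───────────────┬───────────────┘
          Upload .txt/.pdf  │  Ask Question / Run Analysis
                            ▼
            ┌───────────────────────────────┐
            │     Text & PDF Extraction     │
            └───────────────┬───────────────┘
                            │  Extract & Chunk
                            ▼
            ┌───────────────────────────────┐
            │ SentenceTransformer Embedder  │
            └───────────────┬───────────────┘
                            │  Generate Embeddings
                            ▼
            ┌───────────────────────────────┐
            │     ChromaDB Vector Store     │
            └───────────────┬───────────────┘
                            │  Retrieve Context (RAG Query)
                            ▼
            ┌───────────────────────────────┐
            │   Multi-Agent Orchestrator    │
            └───────────────┬───────────────┘
                            │
      ┌─────────────────────┼─────────────────────┐
      ▼                     ▼                     ▼
[Threat Agent]        [CVE Agent]           [IOC Agent]
Analyzes type,        Extracts CVEs,        Identifies IPs,
severity & findings   severity & desc       domains & hashes
      │                     │                     │
      └─────────────────────┼─────────────────────┘
                            ▼
                     [Report Agent]
               Aggregates findings into a
               unified security assessment
                            │
                            ▼
                 [Final Security Report]
```

### Core Components:

1. **Dynamic RAG ingestion**: Extracts text from PDF files (using `pypdf`) or TXT files. It then splits the text with a recursive text splitter into 1,000-character chunks with a 200-character overlap, which preserves security context across chunk boundaries. The resulting vectors are stored locally in a persistent ChromaDB database.
2. **Model-agnostic core**: A centralized text-generation layer (`llm_client.py`) decouples agent code from any specific API client. Environment variables switch the backend between Google Gemini and a local Ollama/Llama instance.
3. **Specialized multi-agent workflow**: Runs the security-specialist agents (Threat, CVE, and IOC analysts) sequentially over the retrieved context. Their combined results then feed into the Report Agent, which compiles a high-fidelity report.

## 📁 Project Structure

```
ThreatIQ/
│
├── streamlit_app.py            # Streamlit UI & main entrypoint
├── requirements.txt            # Python package dependencies
├── .env                        # Environment variables config (local only)
│
├── src/                        # Source codebase
│   ├── agents/                 # Multi-agent modules
│   │   ├── threat_agent.py     # Classifies threat profiles and severities
│   │   ├── cve_agent.py        # Extracts CVEs and matches vulnerabilities
│   │   ├── ioc_agent.py        # Parses IPs, domains, hashes, and URLs
│   │   └── report_agent.py     # Aggregates findings into executive summary
│   │
│   ├── rag/                    # RAG components
│   │   ├── embeddings.py       # Caches & manages SentenceTransformer models
│   │   ├── vector_store.py     # Handles chunking, database clients, and indexing
│   │   ├── retriever.py        # Queries ChromaDB using embedding vectors
│   │   └── multi_agent_rag.py  # Coordinates retrieve-and-run agent execution
│   │
│   ├── utils/                  # Central utility functions
│   │   └── llm_client.py       # Model-agnostic LLM interface (Gemini/Ollama)
│   │
│   └── generate_reports.py     # Script to generate the evaluation dataset
│
├── uploads/                    # Input data folder (contains 20 pre-generated reports)
├── chroma_db/                  # Local persistent database directory
└── demo_archive/               # Archived learning & prototype scripts
```

## 🚀 Local Setup & Installation

1. **Clone the repository**:
   ```bash
   git clone <repository-url>
   cd ThreatIQ
   ```
2. **Set up a virtual environment**:
   ```bash
   python -m venv venv
   # On Windows:
   venv\Scripts\activate
   # On macOS/Linux:
   source venv/bin/activate
   ```
3. **Install dependencies**:
   ```bash
   pip install -r requirements.txt
   ```
4. **Configure environment variables**: create a `.env` file in the project root:
   ```ini
   # LLM provider (options: 'gemini' or 'ollama')
   LLM_PROVIDER=gemini

   # Gemini API credentials (required when using the gemini provider)
   GEMINI_API_KEY=your_actual_gemini_api_key_here

   # Ollama configuration (required when using the ollama provider)
   OLLAMA_URL=http://localhost:11434/api/generate
   OLLAMA_MODEL=llama3
   ```
5. **Generate the threat-report dataset**: ThreatIQ ships with a dataset generator script. It outputs 15 `.txt` and 5 `.pdf` realistic incident-response reports covering ransomware, insider threats, cloud security, and more:
   ```bash
   python src/generate_reports.py
   ```
6. **Run the application**:
   ```bash
   streamlit run streamlit_app.py
   ```

## 🌐 Streamlit Community Cloud Deployment

Follow these steps to host your ThreatIQ app online:

1. **Push your code to GitHub**: Make sure all files except `.env`, `venv/`, and `chroma_db/` are pushed to a public GitHub repository.
2. **Sign in to Streamlit Community Cloud**: Go to [share.streamlit.io](https://share.streamlit.io/) and log in with your GitHub account.
3. **Deploy the app**:
   - Click the **"New app"** button.
   - Select your repository and branch (usually `main`), and set the main file path to `streamlit_app.py`.
   - Click **"Deploy!"**.
4. **Configure secrets and API keys**:
   - In your app dashboard, click **Settings** (the gear icon) in the lower-right corner.
   - Open the **"Secrets"** tab.
   - Enter your environment variables in TOML format:
     ```toml
     LLM_PROVIDER = "gemini"
     GEMINI_API_KEY = "your_actual_gemini_api_key_here"
     ```
   - Save the settings. The app restarts automatically and connects securely to the Gemini API.

## 📝 Portfolio & Resume Highlights

### GitHub Description

### Resume Bullet Points

* **Developed ThreatIQ**, a model-agnostic cyber threat intelligence system that uses **Retrieval-Augmented Generation (RAG)** and **Streamlit** to automate the analysis of unstructured security logs and reports.
* **Designed a collaborative multi-agent architecture** with Threat, CVE, and IOC agents. The agents parse security details and aggregate the results into a professional, downloadable executive security report.
* **Implemented dynamic PDF and TXT report indexing** with `pypdf`, `SentenceTransformers`, and `ChromaDB`. This enables efficient query retrieval and eases LLM context-window and quota limits.
* **Centralized the LLM logic** to make the codebase provider-agnostic. The system supports seamless runtime hot-swapping between a cloud endpoint (**Google Gemini API**) and local open-source models (**Ollama/Llama**).
* **Built a test suite and report generator** that creates 20 complex, highly realistic threat profiles (including ransomware, phishing, cloud breaches, and zero-day exploits). The profiles contain real CVE metadata and network indicators.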
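The ingestion step under Core Components begins by pulling raw text out of each uploaded report. Below is a minimal sketch of that step. It assumes `pypdf`'s `PdfReader` handles PDFs and that `.txt` files are read as UTF-8. The function name `extract_report_text` is hypothetical and is not taken from the ThreatIQ source.

```python
from pathlib import Path


def extract_report_text(path: str) -> str:
    """Return the plain text of a .txt or .pdf threat report (illustrative sketch)."""
    p = Path(path)
    suffix = p.suffix.lower()
    if suffix == ".txt":
        # errors="replace" keeps ingestion going if a report has stray bytes.
        return p.read_text(encoding="utf-8", errors="replace")
    if suffix == ".pdf":
        # Imported lazily so TXT-only use does not require pypdf.
        from pypdf import PdfReader

        reader = PdfReader(str(p))
        # Scanned or image-only pages may yield no text; treat them as empty.
        return "\n".join(page.extract_text() or "" for page in reader.pages)
    raise ValueError(f"Unsupported report type: {suffix or '(none)'}")
```

The extension is checked before anything is read. An unsupported upload therefore fails fast with a clear error instead of reaching the chunker.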
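Core Components describes 1,000-character chunks with a 200-character overlap. A small, dependency-free recursive splitter can illustrate this. It is a simplified stand-in for a library splitter such as LangChain's `RecursiveCharacterTextSplitter`, not ThreatIQ's actual code. The splitter works as follows:

- It tries paragraph, line, and word separators in turn.
- It hard-cuts text only as a last resort.
- It carries a character-level tail of each finished chunk forward as overlap.

`recursive_split` and `_split_pieces` are hypothetical names.

```python
from typing import List, Sequence

DEFAULT_SEPARATORS = ("\n\n", "\n", " ", "")


def _split_pieces(text: str, chunk_size: int, separators: Sequence[str]) -> List[str]:
    """Recursively break text into pieces of at most chunk_size characters.

    Each separator stays attached to the piece before it, so concatenating
    the pieces reproduces the original text exactly.
    """
    if not text:
        return []
    if len(text) <= chunk_size:
        return [text]
    if not separators or separators[0] == "":
        # Last resort: hard-cut into fixed-size slices.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    sep, rest = separators[0], separators[1:]
    parts = text.split(sep)
    pieces: List[str] = []
    for i, part in enumerate(parts):
        if i < len(parts) - 1:
            part += sep  # re-attach the separator that split() removed
        pieces.extend(_split_pieces(part, chunk_size, rest))
    return pieces


def recursive_split(text: str, chunk_size: int = 1000, chunk_overlap: int = 200,
                    separators: Sequence[str] = DEFAULT_SEPARATORS) -> List[str]:
    """Merge pieces into chunks of at most chunk_size chars with a trailing overlap."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    chunks: List[str] = []
    current = ""
    for piece in _split_pieces(text, chunk_size, separators):
        if current and len(current) + len(piece) > chunk_size:
            chunks.append(current)
            # Carry the tail of the finished chunk forward, trimmed so that
            # tail + piece still fits inside chunk_size.
            keep = min(chunk_overlap, chunk_size - len(piece))
            current = current[-keep:] if keep > 0 else ""
        current += piece
    if current:
        chunks.append(current)
    return chunks
```

Character-level overlap can cut a word in half at the start of a chunk. Library splitters usually overlap at separator boundaries instead, and that difference is the main simplification in this sketch.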
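The workflow runs the Threat, CVE, and IOC agents sequentially and then hands their results to the Report Agent. A minimal sketch of that sequential flow follows. Each specialist receives the same retrieved context, while the report step sees only the specialists' outputs. The `llm` argument is any prompt-to-text callable, for example a thin wrapper around whatever `llm_client.py` exposes. The prompts and the names `AGENT_PROMPTS`, `build_prompt`, and `run_analysis` are hypothetical.

```python
from typing import Callable, Dict, Sequence

LLM = Callable[[str], str]

# Hypothetical instructions; the real agents' prompts are not documented here.
AGENT_PROMPTS: Dict[str, str] = {
    "threat": "Classify the threat type and severity, and summarize the key findings.",
    "cve": "List every CVE mentioned, with its severity and a one-line description.",
    "ioc": "Extract all IP addresses, domains, file hashes, and URLs.",
}


def build_prompt(instruction: str, context_chunks: Sequence[str]) -> str:
    context = "\n---\n".join(context_chunks)
    return f"{instruction}\n\nContext:\n{context}"


def run_analysis(context_chunks: Sequence[str], llm: LLM) -> Dict[str, str]:
    findings: Dict[str, str] = {}
    # Specialists run one after another, in the dict's insertion order.
    for name, instruction in AGENT_PROMPTS.items():
        findings[name] = llm(build_prompt(instruction, context_chunks))
    # The report agent sees only the specialists' outputs, not the raw context.
    combined = "\n\n".join(f"## {name.upper()} FINDINGS\n{text}" for name, text in findings.items())
    findings["report"] = llm("Write an executive security report from these findings.\n\n" + combined)
    return findings
```

Passing the LLM in as a parameter keeps the orchestrator testable with a stub function. The order of `AGENT_PROMPTS` fixes the order in which the agents execute.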
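The IOC and CVE agents are described as extracting IPs, domains, hashes, URLs, and CVE IDs. The README does not say whether this is done purely through the LLM. A common complement is a deterministic regex pre-pass whose candidates an LLM agent can then verify and enrich. Below is a sketch of such a pass, which also re-fangs defanged indicators such as `hxxp://` and `[.]`. The patterns and the names `refang` and `extract_indicators` are assumptions made for illustration.

```python
import re
from typing import Dict, List
from urllib.parse import urlsplit

_OCTET = r"(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)"

PATTERNS = {
    "ipv4": re.compile(rf"\b(?:{_OCTET}\.){{3}}{_OCTET}\b"),
    "url": re.compile(r"\bhttps?://[^\s\"'<>]+", re.IGNORECASE),
    # Standalone domains only: skip matches glued to a word char, ".", "/" or "-",
    # so paths inside URLs (e.g. "/payload.bin") are not reported as domains.
    "domain": re.compile(
        r"(?<![\w./-])(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,63}\b",
        re.IGNORECASE,
    ),
    "md5": re.compile(r"\b[a-f0-9]{32}\b", re.IGNORECASE),
    "sha1": re.compile(r"\b[a-f0-9]{40}\b", re.IGNORECASE),
    "sha256": re.compile(r"\b[a-f0-9]{64}\b", re.IGNORECASE),
    "cve": re.compile(r"\bCVE-\d{4}-\d{4,}\b", re.IGNORECASE),
}


def refang(text: str) -> str:
    """Undo common defanging so the patterns can match (hxxp -> http, [.] -> .)."""
    return re.sub(r"hxxp", "http", text, flags=re.IGNORECASE).replace("[.]", ".")


def extract_indicators(text: str) -> Dict[str, List[str]]:
    clean = refang(text)
    found = {name: [m.group(0) for m in pattern.finditer(clean)] for name, pattern in PATTERNS.items()}
    found["url"] = [u.rstrip(".,;:)]'\"") for u in found["url"]]
    found["cve"] = [c.upper() for c in found["cve"]]
    # URL hostnames count as domains too (the domain pattern skips them on purpose).
    hosts = [urlsplit(u).hostname or "" for u in found["url"]]
    found["domain"] = [d.lower() for d in found["domain"]] + [
        h for h in hosts if h and not PATTERNS["ipv4"].fullmatch(h)
    ]
    # De-duplicate while keeping first-seen order.
    return {name: list(dict.fromkeys(values)) for name, values in found.items()}
```

Bare-domain matching is deliberately loose: strings such as `invoice.pdf` also look like domains. Results from this pass are therefore candidates for the agents to review, not confirmed indicators.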

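Tags: AI, AI risk mitigation, ChromaDB, Kubernetes, RAG, multi-agent, threat intelligence analysis, vulnerability analysis, cybersecurity, automated code review, path probing, reverse-engineering tools, privacy protection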