zzz-piyush/secRAG-X

GitHub: zzz-piyush/secRAG-X

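An AI-driven cybersecurity reasoning system that combines a knowledge graph, vector search, and a local LLM for vulnerability mapping and threat analysis.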

Stars: 2 | Forks: 0

# 🛡️ SecRAG-X

### An AI-driven cybersecurity reasoning system combining a knowledge graph, vector search, and a local LLM

![Python](https://img.shields.io/badge/Python-3.8%2B-blue?style=for-the-badge&logo=python) ![License](https://img.shields.io/badge/License-MIT-green?style=for-the-badge) ![Status](https://img.shields.io/badge/Status-Active-brightgreen?style=for-the-badge) ![Neo4j](https://img.shields.io/badge/Neo4j-Knowledge%20Graph-blue?style=for-the-badge&logo=neo4j) ![Ollama](https://img.shields.io/badge/Ollama-Local%20LLM-black?style=for-the-badge) ![FAISS](https://img.shields.io/badge/FAISS-Vector%20Search-orange?style=for-the-badge) [![Demo](https://img.shields.io/badge/▶%20Watch%20Demo-Video-red?style=for-the-badge)](#-demo)

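## 🏗️ Architecture and Data Flow

### High-Level System Architecture

```mermaid
graph TD
    A[👤 User / Browser Dashboard] --> B[🌐 Flask API - server.py]
    B --> C[🧠 Reasoning Engine - explane.py]
    C --> D[(🗄️ Neo4j Knowledge Graph)]
    C --> E[🔍 FAISS Vector Store]
    C --> F[🤖 Ollama LLM + Embeddings]
    D --> G[CVEs / CWEs / CPEs]
    D --> H[Assets / Network Topology]
    D --> I[MITRE ATT&CK Techniques]
```

### RAG Data Flow Pipeline

```mermaid
graph LR
    UserQuery["👤 User Query"] --> LLM["🤖 Llama 3 (Ollama)"]
    LLM --> KG[("🗄️ Neo4j Knowledge Graph")]
    LLM --> VS[("🔍 FAISS Vector Store")]
    KG --> RAG["🛡️ RAG Reasoning Response"]
    VS --> RAG
```

## 🔐 Features

| Feature | Description |
|---------|-------------|
| 🗄️ Knowledge Graph | Neo4j graph of assets, software, CVEs, CWEs, network topology, and MITRE ATT&CK |
| 🔍 Hybrid Retrieval | FAISS vector search plus graph traversal for accurate, context-aware answers |
| 🤖 Local LLM | Ollama-based inference; fully offline, no API key required |
| 🛡️ Intent Detection | Safely handles ambiguous, unsafe, or out-of-scope queries |
| 📊 Live Dashboard | Browser UI with graph visualization, risk summaries, and asset detail drill-down |
| 🧪 Test Suite | Tests covering the API, graph schema, alignment, reasoning, and the no-graph fallback |

## 🆚 Why SecRAG-X?

| Feature | Traditional Tools | SecRAG-X |
|---------|-------------------|----------|
| Vulnerability analysis | Analyzed in isolation | Graph-based, context-aware analysis |
| Attack mapping | Limited | Integrated MITRE ATT&CK |
| Query handling | Manual filtering | Natural language |
| Semantic retrieval | ❌ | FAISS-based |
| AI reasoning | ❌ | Ollama-based |
| Visualization | Basic dashboards | Interactive graph |

## 🧰 Tech Stack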
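As a rough illustration of how the layers of this stack might cooperate during hybrid retrieval, the sketch below merges graph facts and semantically similar passages into one context string for the LLM. It is not the project's implementation: a plain dictionary stands in for Neo4j, a cosine-similarity scan stands in for FAISS, and every node name, relation label, document, and function name is hypothetical.

```python
import math

# Toy stand-in for the Neo4j graph: node -> list of (relation, neighbor).
# All node names and relation labels are hypothetical placeholders.
GRAPH = {
    "asset:web-01": [("RUNS", "software:example-server")],
    "software:example-server": [("HAS_VULN", "cve:EXAMPLE-1")],
    "cve:EXAMPLE-1": [("HAS_WEAKNESS", "cwe:EXAMPLE")],
    "cwe:EXAMPLE": [("ENABLES", "attack:EXAMPLE")],
}

# Toy stand-in for the FAISS index: (passage text, embedding vector).
DOCS = [
    ("Patch guidance for the example web server.", [0.9, 0.1, 0.0]),
    ("Password rotation policy.", [0.0, 0.2, 0.9]),
]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def vector_search(query_vec, k=1):
    """Return the k passages whose embeddings are closest to query_vec."""
    ranked = sorted(DOCS, key=lambda doc: cosine(query_vec, doc[1]), reverse=True)
    return [text for text, _vec in ranked[:k]]


def graph_paths(start, max_hops=4):
    """Follow outgoing edges from start for up to max_hops, one fact per edge."""
    facts, frontier = [], [start]
    for _ in range(max_hops):
        next_frontier = []
        for node in frontier:
            for relation, neighbor in GRAPH.get(node, []):
                facts.append(f"{node} -[{relation}]-> {neighbor}")
                next_frontier.append(neighbor)
        frontier = next_frontier
    return facts


def build_context(asset, query_vec):
    """Combine graph evidence and retrieved passages into one prompt context."""
    facts = graph_paths(asset)
    passages = vector_search(query_vec)
    return (
        "Graph evidence:\n" + "\n".join(facts)
        + "\n\nRetrieved passages:\n" + "\n".join(passages)
    )


context = build_context("asset:web-01", [1.0, 0.0, 0.0])
```

In a real deployment the resulting context would be passed to the local model together with the user's question, so that answers stay tied to evidence retrieved from the graph and the document index.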

### Component Breakdown

| Layer | Technology | Purpose |
|-------|------------|---------|
| **Language Model** | Llama 3 (via Ollama) | Local cybersecurity reasoning and explanation |
| **Embeddings** | Nomic Embed Text | Semantic embeddings for local document retrieval |
| **Graph Database** | Neo4j | Knowledge graph of CVEs, CWEs, assets, and MITRE ATT&CK |
| **Vector Store** | FAISS | Semantic similarity search over offline documents |
| **Backend Framework** | Flask (Python) | REST API endpoints for reasoning and queries |
| **Frontend UI** | HTML5 / CSS3 / Vanilla JS | Interactive browser dashboard with D3.js graph visualization |

## 📊 Results and Metrics

The system has been benchmarked and validated against a comprehensive cybersecurity dataset:

| Metric | Target | Validation Status |
|--------|--------------|-----------------|
| **Vulnerability nodes (CVEs)** | ~60,000 | ✅ 59,210 populated |
| **Weakness nodes (CWEs)** | ~1,000 | ✅ 969 populated |
| **Attack techniques (MITRE ATT&CK)** | ~700 | ✅ 691 populated |
| **Enterprise assets** | 50 | ✅ 50 mock assets linked |
| **Relationships (edges)** | ~120,000+ | ✅ 122,877 edges mapped |
| **Intent detection accuracy** | >95% | ✅ 100% in reliability tests |
| **Multi-hop reasoning depth** | Up to 4 hops | ✅ Asset → Software → CVE → CWE → ATT&CK |
| **Reliability test suite pass rate** | 100% | ✅ 186/186 test cases passed |
| **Hallucination rate** | 0.0% | ✅ Zero hallucinations (answers restricted to graph-based evidence) |

## 📁 Project Structure

```
secRAG-X/
├── 📁 static/              → Browser dashboard (HTML/CSS/JS)
├── 🖥️ server.py            → Flask API and graph endpoints
├── 🧠 explane.py           → Main reasoning and intent engine
├── 📥 data_ingest.py       → Neo4j ingestion pipeline
├── 🏗️ build_knowledge.py   → FAISS knowledge base builder
├── 🔍 vector_store.py      → Embedding and vector search helpers
├── ⚙️ rag_engine.py        → Lightweight RAG wrapper
├── 🗺️ mapping_engine.py    → Graph mapping utilities
├── 🏢 asset.py             → Mock enterprise asset generator
├── 🌐 network_topology.py  → Mock topology/SBOM generator
├── 🧪 test_*.py            → Validation and regression tests
├── 📄 requirements.txt     → Python dependencies
├── 🔒 .env.example         → Environment variable template
└── 📜 LICENSE              → MIT License
```

## ⚡ Quick Start

**Clone the repository**

```
git clone https://github.com/JENITH47/secRAG-X.git
cd secRAG-X
```

**Install dependencies**

```
pip install -r requirements.txt
```

**Start Neo4j and configure credentials**

```
docker run -d --name neo4j \
  -p 7474:7474 -p 7687:7687 \
  -e NEO4J_AUTH=neo4j/your_password_here \
  neo4j:latest
cp .env.example .env
# Edit .env with your Neo4j credentials
```

**Pull the Ollama models and build the knowledge graph**

```
ollama pull llama3
ollama pull nomic-embed-text
python data_ingest.py
python build_knowledge.py
```

**Start the server**

```
python server.py
```

Open http://localhost:5000 in your browser.

## 📡 API Reference

| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | `/api/ask` | Answers a natural-language security question |
| GET | `/api/summary` | Graph summary: assets, CVEs, weaknesses, attacks |
| GET | `/api/risks` | Highest-risk assets |
| GET | `/api/attacks` | Likely attack techniques |
| GET | `/api/exposure` | Attack exposure metrics |
| GET | `/api/asset/` | Detailed context for a single asset |

**Example request:**

```
curl -X POST http://localhost:5000/api/ask \
  -H "Content-Type: application/json" \
  -d '{"question": "Which systems are most vulnerable?"}'
```

## 💬 Example Questions

## 🎬 Demo

[![Watch Demo](https://img.shields.io/badge/▶%20Watch%20Full%20Demo-Walkthrough-red?style=for-the-badge&logo=youtube&logoColor=white)](https://drive.google.com/file/d/1vLTG0lMg6HAn1Js3o_MlLf30cL28yMXT/view?usp=sharing)
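For scripted use outside the dashboard, the `curl` request from the API Reference can be reproduced in Python using only the standard library. This is a minimal sketch: the helper names `build_ask_request` and `ask` are hypothetical, and it assumes the endpoint returns a JSON body, whose exact fields are not documented here.

```python
import json
import urllib.request


def build_ask_request(question, base_url="http://localhost:5000"):
    """Build the POST request for /api/ask with a JSON-encoded question."""
    payload = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/ask",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question, base_url="http://localhost:5000", timeout=60):
    """Send the question to a running server and decode the reply.

    Assumes the response body is JSON; the schema is not specified here.
    """
    request = build_ask_request(question, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example (requires `python server.py` to be running locally):
# print(ask("Which systems are most vulnerable?"))
```

Separating request construction from sending keeps the payload easy to inspect or log before anything reaches the server.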

## 🧪 Testing

Run the test suite after Neo4j has been populated:

```
python test_api.py
python test_graph_schema.py
python test_alignment.py
python test_no_graph.py
```

Run the full 11-part reliability test:

```
python test_full_system.py
```

## 🔮 Future Scope

- Real-time threat intelligence integration
- Real-time intrusion detection support
- Automated cybersecurity response mechanisms
- Large-scale distributed deployment
- Real-time network traffic analysis

## 📝 Notes

- Large or generated datasets and vector index files are excluded from git.
- Do not commit production credentials to version control; use `.env` for local configuration.
- The `.env.example` file lists all required environment variables.

## 👤 Authors

**Jenith**

[![GitHub](https://img.shields.io/badge/GitHub-JENITH47-181717?style=for-the-badge&logo=github)](https://github.com/JENITH47) [![LinkedIn](https://img.shields.io/badge/LinkedIn-Connect-0A66C2?style=for-the-badge&logo=linkedin)](https://www.linkedin.com/in/jenith-golyan/)

**Piyush Kumar**

[![GitHub](https://img.shields.io/badge/GitHub-zzz-piyush?style=for-the-badge&logo=github)](https://github.com/zzz-piyush) [![LinkedIn](https://img.shields.io/badge/LinkedIn-Connect-0A66C2?style=for-the-badge&logo=linkedin)](https://www.linkedin.com/in/piyush-kumar-linkdin-profile/)

**⭐ If you find this project useful, please star this repository!**

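Tags: AI risk mitigation, DLL hijacking, FAISS, Neo4j, RAG, large language models, threat intelligence, developer tools, data visualization, request interception, reverse-engineering tools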