VedzKun/DigiFortress

GitHub: VedzKun/DigiFortress

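An LLM agent architecture with semantic memory persistence and real-time security validation that protects model memory from injection and poisoning attacks through multi-agent consensus and adversarial simulation.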

Stars: 0 | Forks: 0

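# 🛡️ DigiFortress

An advanced, secure LLM agent architecture featuring **semantic memory persistence**, active **contextual reasoning**, and real-time **security defense validation**.

DigiFortress dynamically decides when to query its vector memory store, classifies incoming memories, screens them for prompt injection and logical contradictions, and synthesizes answers with a local LLM.

## 🏗️ Core Architecture & Flow

```mermaid
flowchart TD
    User([User Input]) --> Agent[Agent Orchestrator]

    %% Short-term memory
    Agent -->|1. Store context| Conv[Conversation Memory]

    %% Security validation pipeline on Remember (Single-Agent)
    Agent -->|2. On Remember: Validate| Security[🛡️ Validator Core]
    Security -->|Rule-based & LLM Scorer| Trust{Trust Score >= 0.4?}
    Trust -->|No| Quarantine[🛑 Quarantine Containment]
    Trust -->|Yes| Conflict{Contradiction Detected?}
    Conflict -->|Yes| Block[⚠️ Conflict Blocked]
    Conflict -->|No| Accept[✅ Save to DBs]

    %% Database registry
    Accept --> Chroma[(ChromaDB Vector Store)]
    Accept --> SQL[(SQLite Security Registry)]

    %% Query pipeline on Ask
    Agent -->|3. On Ask: Check necessity| Reasoning{Reasoning Layer}
    Reasoning -->|No| Prompt[LLM Prompt Assembly]
    Reasoning -->|Yes| Embed[Embedder]
    Embed --> Chroma
    Chroma -->|4. Retrieve & Update stats| SQL
    SQL --> Prompt
    Prompt -->|5. Synthesize response| LLM[Ollama: Qwen2.5-3B]
    LLM --> Agent

    %% Multi-Agent Security Framework
    subgraph MultiAgent[Multi-Agent Security & Governance Framework]
        Comm[Agent Communication Client] -->|Sign & Send| Auth[Agent Authenticator]
        Auth -->|Verify cryptographic keys| Registry[Agent Registry]
        Comm -->|Extract Claim| Broadcast[Agent Network Claim Broadcast]
        Broadcast -->|Validate conflicting claims| CAV[Cross-Agent Validator]
        CAV -->|Determine winner| Consensus[Consensus Engine]

        %% Graph & Metrics
        Comm -->|Auto-register edge| Graph[Agent Network Graph]
        Graph -->|Trust, Influence & Blame| NetAnalysis[Network Analyzer]

        %% Attacks & Benchmarks
        Simulator[Poisoning Simulator] -->|Inject malicious claims| Broadcast
        Benchmark[Resilience Benchmark Runner] -->|Attack library suite| Simulator
        Benchmark -->|Evaluate containment| Report[Benchmark Report]

        %% Containment
        Consensus -->|Low score / quarantine| Containment[Containment Engine]
        Containment -->|Block messages / isolate| Comm
    end

    %% Database connections
    Registry --> SQL
    Graph --> SQL
    Report --> SQL
```

## ✨ Features

* 🧠 **Persistent Semantic Memory**: Integrates ChromaDB with HuggingFace `sentence-transformers` (`all-MiniLM-L6-v2`) to embed and recall user data persistently.
* 🚦 **Intelligent Reasoning Layer**: Intercepts each query to decide whether it needs semantic context retrieval or can be answered directly from short-term context.
* 🛡️ **Active Multi-Agent Memory Validation Core**:
  * **Multi-Agent Consensus Review**: Filters incoming beliefs through independent **Trust**, **Security**, and **Consistency** agents, and computes a consensus score with a consensus engine.
  * **Security Override**: If the security agent flags a dangerous payload (score `0.0`), the payload is explicitly blocked or quarantined, even when a high source reputation lifts the overall trust score above the threshold.
  * **LLM Conflict Detector**: Evaluates new memories against similar, overlapping historical beliefs to detect and block logical contradictions in real time.
  * **Decay & Reputation Analytics**: Automatically computes memory decay scores over time and a dynamic active reputation score from access counts and trust weights.
  * **Security Event Logging & Risk Auditing**: Uses a dynamic risk engine to assess real-time risk scores (0 to 100) and risk levels (Low, Medium, High, Critical), and persists every memory validation audit to the SQLite registry.
* 🔍 **Counterfactual Audit Layer**: Generates a counterfactual baseline (a response produced without the retrieved context) to measure semantic divergence (cosine distance) and the judgment drift caused by retrieved memories, enabling real-time detection of stealthy prompt injection or memory overrides.
* ⏱️ **Session Anomaly & Burst Detection**: Monitors per-session user write behavior to flag rate-limit anomalies and high-frequency injection bursts (e.g., 5+ writes within 60 seconds), and adjusts the session risk score accordingly.
* 🕸️ **Knowledge Graph Extraction**: Integrates `NetworkX` to build a dynamic, persistent semantic network. Uses LLM-based parsing to automatically extract entities and their relations from accepted memories.
* ⚔️ **Adversarial Attack Simulator**: Launches prompt injection attacks (e.g., system override, data exfiltration) to test the security boundaries of the validation layer, and logs the results to `red_team_results`.
* 🤝 **Multi-Agent Security Platform**:
  * **Cross-Agent Validation**: Resolves conflicting claims across multiple agent sources using a reputation-weighted Consensus Engine.
  * **Cryptographic Agent Authentication**: Verifies inter-agent messages using immutable `agent_id` tracking and key verification.
  * **Containment Engine & Poisoning Simulator**: Automatically simulates and tracks adversarial agent poisoning cascades, blocks malicious broadcasts, and isolates compromised agents based on dynamic trust scores.
  * **Agent Trust Network Graph**: Represents the entire multi-agent communication ecosystem as a NetworkX graph, persists node links to SQLite, computes topology metrics (degree and betweenness centrality), and tracks blast radius.
  * **Multi-Agent Attack Benchmark**: Evaluates the network against an attack library of 10 distinct agent topologies (poisoning, data exfiltration, privilege escalation) to compute a standardized Resilience Score.
* 📊 **Memory Security Dashboard**: A console dashboard engine that aggregates accepted/conflicted/quarantined memory metrics, average risk, top threat sources, and recent security events.
* 🖥️ **Interactive Shell & Streamlit Web UI**: A standard console terminal menu, plus a premium **Streamlit Web UI** that visualizes metrics and pipeline updates dynamically.

## ⚡ Performance Optimizations

DigiFortress's pipeline is optimized for low-latency execution and a small memory footprint:

* 🎛️ **Class-Level Model Caching**: The `SentenceTransformer` model weights are cached in a class variable of `Embedder` on first import. All subsequent agents and validation checkers share the same in-memory instance, which cuts startup latency (saving 1-3 seconds per initialization) and prevents redundant instances from bloating RAM/VRAM.
* 🗄️ **Shared ChromaDB PersistentClient**: A single persistent client connection is shared across all `MemoryManager` instances. This prevents SQLite lock contention ("database is locked" errors) under heavy concurrent multi-agent validation.
* 🏎️ **Optimized Trigger Greedy Search**: The target concept embedding is computed once at the start of a `TriggerOptimizer` run instead of on every token evaluation inside the greedy search loop. This eliminates hundreds of redundant model encoding calls and greatly speeds up trigger optimization in the **AgentPoison Lab**.

## 📁 Repository Structure

```
DigiFortress/
├── src/
│   ├── agent/
│   │   ├── agent.py                  # Main Agent orchestrating memory, LLM, and reasoning
│   │   ├── conversation.py           # Conversation history buffer & flow manager
│   │   ├── reasoning.py              # Intercepts queries to check if memory is required
│   │   ├── agent_registry.py         # Manages immutable agent identities
│   │   ├── agent_authenticator.py    # Cryptographically signs and verifies agent messages
│   │   ├── agent_communication.py    # Validates message integrity across agents
│   │   ├── agent_network.py          # Central message broker triggering claim validation
│   │   └── agent_claim.py            # Dataclass structure for agent knowledge claims
│   │
│   ├── memory/
│   │   ├── memory_manager.py         # Persistent ChromaDB client integration
│   │   └── memory_classifier.py      # Classifies memories into preferences, tasks, facts, etc.
│   │
│   ├── defenses/
│   │   ├── validator.py              # Evaluates trust and coordinates conflicts/decisions
│   │   ├── trust_scorer.py           # Rule-based static check evaluating trust weights
│   │   ├── llm_trust_scorer.py       # Dynamic trust checks utilizing HuggingFace models
│   │   ├── llm_conflict_detector.py  # Contradiction detector running on local Qwen LLM
│   │   └── quarantine.py             # Temporary containment for quarantined memories
│   │
│   ├── security/
│   │   ├── agents/
│   │   │   ├── trust_agent.py        # Core trust classification evaluation
│   │   │   ├── security_agent.py     # Rule-based static security policy checker
│   │   │   └── consistency_agent.py  # Checks incoming memory contradiction overlaps
│   │   ├── cross_agent_validator.py  # Resolves network-wide conflicting agent claims
│   │   ├── consensus_engine.py       # Computes consensus ratings and resolves conflicts
│   │   ├── containment_engine.py     # Blocks broadcasts and quarantines compromised agents
│   │   ├── agent_poison_simulator.py # Launches and benchmarks agent network poisoning attacks
│   │   ├── propagation_tracker.py    # Logs the spread depth of compromised network messages
│   │   ├── explanation_engine.py     # Generates multi-agent security audit reasoning
│   │   ├── dashboard_service.py      # Aggregates overall system security metrics
│   │   └── risk_engine.py            # Risk assessment engine calculating risk scores and levels
│   │
│   ├── graph/
│   │   ├── knowledge_graph.py        # NetworkX semantic entity link network
│   │   ├── relation_extractor.py     # LLM-based entity-relation parser
│   │   ├── agent_network_graph.py    # NetworkX representation of agent ecosystem nodes/edges
│   │   ├── trust_network.py          # Measures overall trust across agent connection paths
│   │   ├── influence_tracker.py      # Tracks degree centrality to identify highly-linked agents
│   │   ├── network_analyzer.py       # Identifies vulnerable/bottleneck agents using centrality
│   │   └── propagation_graph.py      # Simulates and tracks compromise blast-radius paths
│   │
│   ├── benchmarks/
│   │   ├── attack_library.py         # Curated multi-agent attack topology payloads
│   │   ├── benchmark_runner.py       # Orchestrates and logs benchmark runs to SQLite
│   │   ├── benchmark_report.py       # Summarizes detection, containment, and resilience
│   │   └── multi_agent_benchmark.py  # Wrapper interface to run the full benchmark suite
│   │
│   ├── database/
│   │   └── security_db.py            # SQLite analytics db tracking access, metrics & reputations
│   │
│   ├── embeddings/
│   │   └── embedder.py               # Local Sentence Transformers vectorizer wrapper
│   │
│   ├── attacks/
│   │   └── poisoning_simulator.py    # Injector simulator launching adversarial payloads
│   │
│   ├── redteam/
│   │   ├── red_team_engine.py        # Runs adversarial payload suites and logs to DB
│   │   └── attack_library.py         # Curated prompt injection and system override datasets
│   │
│   └── llm/
│       └── llm_handler.py            # Ollama connector client for local model generation
│
├── data/
│   ├── security.db                   # SQLite database storing analytics records
│   └── chroma_db/                    # Persistent Vector database
│
├── requirements.txt                  # System dependencies
├── README.md                         # Project documentation
├── main.py                           # Entry interactive CLI shell
└── app.py                            # Premium Streamlit Web Application
```

## 🚀 Setup & Installation

### 1. Prerequisites

Make sure **Python 3.10+** and [Ollama](https://ollama.com/) are installed on your machine.

### 2. Clone the Repository

```
git clone https://github.com/VedzKun/DigiFortress.git
cd DigiFortress
```

### 3. Set Up a Virtual Environment

Create and activate a local Python virtual environment:

```
# On Windows
python -m venv digifortress_env
.\digifortress_env\Scripts\activate
```

### 4. Install Dependencies

```
pip install -r requirements.txt
```

### 5. Download the Local LLM

Make sure Ollama is running (check your taskbar), then pull the required **Qwen2.5** model:

```
ollama pull qwen2.5:3b
```

## 🎮 How to Run

### Option A: Streamlit Web UI (Recommended) 🛡️

Launch the premium web console, which provides interactive pages, step-by-step pipeline animations, and live simulator charts:

```
python -m streamlit run app.py
```

This opens `http://localhost:8501` in your default browser.

* **🔒 Security Dashboard**: Real-time KPI metrics, active threat assessment levels, and dynamic Plotly bar charts.
* **🧠 Core Memory Manager**: Interactive search panel for viewing access logs and reputations or purging memories.
* **✍️ Remember (New Memory)**: Visualizes embedding generation, context overlap, trust scoring, and the final consolidation decision.
* **💬 Ask Agent (Chat)**: A sandbox for chatting with the agent and inspecting exactly which episodic context was retrieved.
* **⚔️ Attack Simulator**: Launches adversarial attack waves and shows validation logs, blocks, and metric updates in real time.

### Option B: Interactive CLI Shell 💻

Launch the standard shell interface inside your terminal:

```
python main.py
```

* **`1` (Remember)**: Enter a new belief, run the validation defenses, and store accepted values.
* **`2` (Ask)**: Synthesize a response based on the active context.
* **`3` (View Memories)**: Formatted list of saved memories.
* **`4` (Analytics)**: Prints detailed reputations, access counts, and decay scores from SQLite.
* **`5` (Security Dashboard)**: Shows the defense success rate and block counts.
* **`6` (Run Attack Simulation)**: Injects test payloads and reports their status.
* **`7` (Exit)**: Closes connections safely and exits.
* **`8` (Source Reputation)**: Shows the active reputation score and metrics (accepted, conflicted, and quarantined counts) for each belief source.
* **`9` (Security Events)**: Shows a detailed, reverse-chronological log of all security evaluation events (payload, source, status, risk score, risk level, and timestamp).
* **`10` (Run Red Team Tests)**: Runs automated Red Team adversarial injection waves covering all categories.
* **`11` (View Red Team Results)**: Shows detailed historical results of previous Red Team runs.
* **`12` (Knowledge Graph Neighbors)**: Queries any node in the knowledge graph to view its extracted entity neighbors.
* **`13` (Memory Security Overview)**: Prints an aggregated, console-based dashboard summary report of all system states.
* **`14` (Session Analytics)**: Shows active session information, total session writes, burst anomaly flags, and the computed session risk level.
* **`15` (Audit Query (Counterfactual))**: Audits a query with the counterfactual audit engine and prints the divergence, judgment drift, and computed memory influence score.
* **`16` (Run MINJA Benchmark)**: Runs the automated, parallel MINJA adversarial benchmark suite across worker agents to compute active, passive, and behavioral safety rates.
* **`17` (Test Agent Communication)**: Verifies cryptographic signing of inter-agent messages based on immutable IDs.
* **`18` (Test Multi-Agent Validation & Poisoning)**: Benchmarks the Cross-Agent Validator by simulating contradictory network claims, agent poisoning cascades, and automatic containment blocking.
* **`19` (Multi-Agent Resilience Benchmark)**: Runs the automated multi-agent attack library against the ecosystem, builds the network graph, analyzes centrality metrics, and computes the overall Network Resilience Score.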
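
The remember-time decision path from the architecture diagram (trust threshold of 0.4, the security override on a `0.0` score, then the contradiction check) can be sketched roughly as follows. This is only an illustration and not the code in `src/defenses/validator.py`: the `decide` function, the `Decision` enum, and the parameter names are hypothetical. Only the 0.4 threshold and the `0.0` override come from the description above.

```python
from enum import Enum


class Decision(Enum):
    ACCEPT = "accept"          # saved to ChromaDB and the SQLite registry
    QUARANTINE = "quarantine"  # held in quarantine containment
    CONFLICT = "conflict"      # blocked because it contradicts stored beliefs


# Threshold shown in the architecture diagram ("Trust Score >= 0.4?").
TRUST_THRESHOLD = 0.4


def decide(trust_score: float, security_score: float,
           contradicts_existing: bool) -> Decision:
    """Hypothetical sketch of the validation decision for one incoming memory."""
    # Security override: a 0.0 security score quarantines the payload even
    # when a high source reputation keeps the trust score above the threshold.
    if security_score == 0.0:
        return Decision.QUARANTINE
    # Low-trust memories never reach the contradiction check.
    if trust_score < TRUST_THRESHOLD:
        return Decision.QUARANTINE
    # Trusted but contradictory memories are blocked rather than stored.
    if contradicts_existing:
        return Decision.CONFLICT
    return Decision.ACCEPT


if __name__ == "__main__":
    print(decide(trust_score=0.9, security_score=0.0, contradicts_existing=False))
    print(decide(trust_score=0.7, security_score=1.0, contradicts_existing=False))
```

Checking the security override first keeps a well-reputed source from carrying a flagged payload past the threshold, which matches the "Security Override" behavior described in the feature list.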

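The session burst detection mentioned in the feature list (for example, 5+ writes within 60 seconds) could be implemented with a sliding window over write timestamps. The sketch below is an assumption about one way to do it, not the project's code: the `BurstDetector` class, its method names, and the choice to treat exactly 5 writes as a burst are all hypothetical.

```python
from collections import deque


class BurstDetector:
    """Hypothetical sliding-window detector for per-session write bursts."""

    def __init__(self, max_writes: int = 5, window_seconds: float = 60.0):
        self.max_writes = max_writes          # writes that count as a burst
        self.window_seconds = window_seconds  # length of the sliding window
        self._timestamps = deque()            # write times still in the window

    def record_write(self, now: float) -> bool:
        """Record a write at time `now` (seconds); return True if it is a burst."""
        self._timestamps.append(now)
        # Drop writes that have fallen out of the window.
        while now - self._timestamps[0] > self.window_seconds:
            self._timestamps.popleft()
        return len(self._timestamps) >= self.max_writes


if __name__ == "__main__":
    detector = BurstDetector()
    for t in (0, 10, 20, 30, 40):
        print(t, detector.record_write(t))
```

Passing the timestamp in explicitly, rather than reading the clock inside the method, keeps the detector deterministic and easy to test. A caller would typically pass `time.time()`.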
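Tags: AI Security, AI Risk Mitigation, Chat Copilot, Kubernetes, LLM Agent, LLM Evaluation, Ollama, RAG Architecture, Web Report Viewer, Vector Database, Prompt Injection Defense, Privilege Detection, Reverse Engineering Tools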