zedarvates/botte-secrete

GitHub: zedarvates/botte-secrete

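A multi-agent token optimization platform that sharply reduces token consumption and cloud cost during AI code auditing and fixing, using intelligent local↔cloud routing and a range of compression and caching techniques.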

Stars: 1 | Forks: 0

# 🧦 Botte Secrète: Multi-Agent Token Optimization Platform

[![CI](https://static.pigsec.cn/wp-content/uploads/repos/2026/06/cfc32d269a014038.svg)](https://github.com/zedarvates/botte-secrete/actions) [![Tests](https://img.shields.io/badge/tests-277%2F277-brightgreen)](https://github.com/zedarvates/botte-secrete) [![Release](https://img.shields.io/badge/release-v1.3.0-blue)](https://github.com/zedarvates/botte-secrete/releases) [![Token Savings](https://img.shields.io/badge/token%20savings-85%25-blue)](https://github.com/zedarvates/botte-secrete) [![Self-Audit](https://img.shields.io/badge/self--audit-75%2F100%20(B-yellowgreen)](https://github.com/zedarvates/botte-secrete)

A multi-agent pipeline for code auditing, automated fixing, token optimization, and adversarial red-team/blue-team confrontation, all built to run efficiently on local hardware.

**Goal: lower your project's development cost.** Deploy it into any repo and its agents route cheap tasks to local models (LM Studio / Ollama) for **0 cloud tokens**, escalate to the cloud only for the hard parts, select tools locally, and tell you which hardware/infrastructure changes would cut costs further.

## 🎯 What it does

1. **Audit**: static analysis (dead code, duplication, complexity, secrets, boundaries)
2. **Fix**: automated corrections with verification
3. **Optimize**: per-project skill filtering to reduce tokens
4. **Red team vs. blue team**: adversarial agents that challenge the Blue Team
5. **Local↔cloud routing**: automatic effort estimation sends cheap tasks to a local LLM and hard ones to the cloud (DeepSeek/GLM/Nemotron/Grok/Gemma); fusion lets them collaborate
6. **Deploy**: one command wires the whole stack into any project via MCP

## ⚡ Deploy into your project

```
python -m skills.bootstrap.cli /path/to/your-project   # wire MCP tools, audit directives, write .botte config
python -m skills.infra_advisor.cli auto .              # one-pass cost audit (directives + infra + duplication)
python -m skills.llm_backends.cli audit --fresh        # what local models can this machine run?
```

After deploying, restart the agent in your project. It gains the `botte-llm` MCP tools: `auto_route`, `local_chat`, `fusion`, `find_skills`, `infra_tips`, `auto_audit`, `audit_local_usage`, and more.

## 📋 Copy-paste prompt

**No setup knowledge required.** Paste this into your favorite LLM project chat (Claude, ChatGPT, Cursor, …). It installs and wires everything for you:

```
Set up "Botte Secrète" to cut my AI token/cost on THIS project. Work step by step,
run the commands, and stop to ask me only if something fails.

1. Get the toolkit (skip if the folder already exists):
     git clone https://github.com/zedarvates/botte-secrete
2. From inside the botte-secrete folder, deploy it into MY project:
     python -m skills.bootstrap.cli ""
   (this only ADDS to my .mcp.json; it never deletes my existing setup)
3. Tell me what local models I can run:
     python -m skills.llm_backends.cli audit --fresh
4. Show me my cost report (per component + tokens/session):
     python -m skills.metrics.cli ""
5. Restart yourself in my project so you pick up the new MCP tools, then from now on
   route cheap work (classification, extraction, summaries, and choosing which
   skills/tools to use) to my LOCAL model (tools: local_chat, auto_route, find_skills).
   Keep the expensive cloud model only for hard reasoning, and prefer `rtk ` for
   terminal commands.

Finish by telling me the headline: how many tokens per session this saves.
```

**Run the tests.** Paste this into a terminal at the repo root (one command, works on Windows/macOS/Linux):

```
python scripts/run_tests.py
```

(Or ask your agent: *"Run the Botte Secrète test suite and report any failures: `python scripts/run_tests.py`"*)

## ⚔️ Architecture

```
                 👑 Athos (Orchestrator)
          ┌─────────┼─────────┐
          │         │         │
     🥊 Porthos  📿 Aramis  ⚔️ d'Artagnan
      (Audit)  ∥ (Optimize)   (Fix)
          └─────────┼─────────┘
                    │
             👑 Le Cardinal (Red Team)
          ┌─────────┼─────────┐
          │         │         │
   🗡️ Rochefort  🔪 Milady  🕯️ Cte Wardes
 (Counter-Audit) (Counter-Fix)(Counter-Optimize)
```

Blue Team: **Porthos ∥ Aramis → d'Artagnan → Athos** (audit and optimize run in parallel)

Red Team: **Rochefort ∥ Milady ∥ Cte Wardes → Le Cardinal** (counter-attacks run in parallel)

## 📦 Modules

| Module | Purpose | Effect |
|--------|---------|--------|
| `core-agent.md` | Shared rules: botte, anti-patterns, clarification, budgets | -57% pre-prompt |
| `mousquetaires/` | Blue Team: 4 agents (audit, fix, optimize, orchestrate) | Automated pipeline |
| `cardinal/` | Red Team: 4 adversarial agents (counter-audit, counter-fix, counter-optimize) | Quality gate |
| `clarification/` | Proactive questions: 5 max, silence = auto | -80% wasted work |
| `cache/` | `.botte-cache/`: avoids repeated scans between agents | -50% re-scan tokens |
| `llm_backends/` | **P15**: discover/audit/call local LLM servers (LM Studio, Ollama, …) | Offload → 0 cloud tokens |
| `llm_mcp/` | **P15**: MCP server that exposes local LLM tools to agents | Automatic local routing |
| `local_router/` | **P9**: routes tasks to local models (Hailo/ComfyUI/LocalAI) | -40% cloud tokens |
| `media_loader/` | **P10**: extracts text from media before the LLM sees it | -95% media-task tokens |
| `response_cache/` | **P8**: semantic response cache (hash + similarity) | -60% repeated queries |
| `vector_protocol/` | **P11**: agents communicate via quantized vectors | -70% inter-agent tokens |
| `ultra_compact/` | **P12**: single-character keys, array format, deltas only | -90% iteration tokens |
| `code_fingerprint/` | **P13**: hashes functions, skips unchanged code | -80% re-analysis tokens |
| `tiered_router/` | **P14**: 5-tier model selection + agent compression | -95% vs. all-PREMIUM |
| `auto_router/` | **P17**: automatic local↔cloud routing (DeepSeek/GLM/Nemotron/Grok/Gemma) + fusion | Effort-based, budget-aware |
| `loader/` | Pre-prompt loader for `delegate_task` | Correct agent context |
| `fallow_like/` | 8 static analyzers (dead code, duplication, complexity, secrets, boundaries, etc.) | Code quality |
| `skill_project_optimizer/` | Per-project skill filtering, token profiling | -73% skill tokens |
| `directives_audit/` | **P16**: audits agent directive files (CLAUDE.md/AGENTS.md/specs, md+html) | No more running agents blind |
| `skill_finder/` | **P18**: finds relevant skills/tools locally (0 cloud tokens) | -100% selection cost |
| `bootstrap/` | **P19**: deploys the whole stack into a project (one command) | Makes it real |
| `capabilities/` | **P26**: self-model with capability registry + manager + ASCII system tree | Collection → system |
| `cluster/` | **P27**: treats the homelab as one schedulable resource and dispatches work to idle machines | Recovers idle capacity |
| `conductor/` | **P28/P32**: routes a goal → ordered local-first plan, and executes its safe steps | General router + executor |
| `control_loop/` | **P29**: measures routing outcomes → tunes thresholds (self-improving) | A router that learns |
| `report/` | **P30**: saves any audit as timestamped .md/.html, browsable at any time | Audits you can revisit |
| `cost_estimator/` | **P31**: estimates tokens · model · money · time for a task or fix | Know the cost up front |
| `fix/` | **P31**: lists correctable issues, each with a cost estimate (plan only) | Cost of correction |
| `trends/` | **P31**: tracks audit metrics over time + deltas | See progress |
| `metrics/` | **P22**: cost-centric metrics per component (LOC, always-on cost, savings) | Quantified gains |
| `preflight/` | **P23**: committed policy + per-turn hook (prefer local, automatic) | Enforced, not voluntary |
| `checkup/` | **P23**: canonical one-shot project check + drift detection | No hand-written prompts |
| `infra_advisor/` | **P20**: hardware/software/MCP cluster tips + auto audit (ASCII diagram) | Cuts cost beyond code |
| `prompt_improver/` | **P21**: rewrites rough prompts locally into professional structured/JSON prompts | 0-token prompt engineering |
| `ingest/` | **P24/P33**: local web scraping + Qdrant ingestion with real local embeddings (auto-resolved, hash fallback) | 0-token extraction |
| `docgen/` | **P24**: docs with local draft → cloud refine + local session review | Long docs, done locally |
| `app_test/` | **P25**: local-first GUI testing (OculiX image matching) + HTML post-mortem (logs/screenshots) | 0 cloud vision tokens |
| `botte` | Terminal wrapper: compresses command output | -60-99% terminal tokens |
| `code-rules/` | Coding standards: stdlib first, flat architecture | -30% context |
| `simplify-code/` | Parallel 3-agent code review | -25% post-edit tokens |
| `understand-anything/` | Codebase knowledge graph | -50% exploration tokens |

## 🚀 Quick start

```
git clone https://github.com/zedarvates/botte-secrete.git
cd botte-secrete

# Full Blue Team pipeline
python3 -m skills.mousquetaires.cli run ~/your-project --output ./reports

# Blue + Red Team (adversarial)
python3 -m skills.mousquetaires.cli run ~/your-project --output ./blue
python3 -m skills.cardinal.cli run ~/your-project --blue-reports ./blue --output ./red
python3 -m skills.cardinal.cli confront --blue ./blue --red ./red

# Token optimization only
python3 -m skills.skill_project_optimizer.cli optimize ~/your-project

# Code audit only
python3 skills/mousquetaires/scripts/porthos_audit.py ~/your-project ./audit-output

# View token savings
botte gain
```

## 💰 Token savings

| Technique | Savings | How |
|-----------|---------|-----|
| Shared core prompt | 57% | DRY: the core is loaded once, with 8 deltas instead of 8 full copies |
| JSON output format | 75-80% | Compact schema instead of verbose markdown |
| Project cache | 50% | `.botte-cache/` avoids repeated scans |
| Per-project skill filtering | 73% | `.skills-profile` excludes irrelevant skills |
| botte terminal wrapper | 60-99% | Compresses command output |
| Token budget enforcer | Qualitative | Hard limit per agent (800-2500 tok) |
| **Combined** | **~65%** | **Reduction across the whole pipeline** |

## 🎯 Token budgets

| Agent | Budget | Strategy when exceeded |
|-------|--------|---------------------|
| Porthos | 2000 tok | Truncate findings beyond 10 |
| d'Artagnan | 1500 tok | Report skipped fixes |
| Aramis | 2500 tok | P0 actions only |
| Athos | 1000 tok | Synthesis + links |
| Rochefort | 1500 tok | Top 5 missed findings |
| Milady | 1200 tok | Top 5 regressions |
| Cte Wardes | 1200 tok | Top 5 over-optimizations |
| Le Cardinal | 800 tok | Verdict + top 3 actions |

## 🛠️ botte: token-optimized terminal

```
botte cargo build    # -80%
botte cargo test     # -90%
botte git status     # -59%
botte git diff       # -80%
botte pnpm install   # -90%
botte docker ps      # -85%
botte gain           # View savings
botte discover       # Find missed optimization opportunities
```

## 📂 Project structure

```
botte-secrete/
├── skills/
│   ├── core-agent.md              # Shared rules (loaded once for all agents)
│   ├── cache/                     # .botte-cache/ system
│   ├── clarification/             # Proactive question engine
│   ├── loader/                    # Pre-prompt loader for delegate_task
│   ├── fallow_like/               # 8 static analyzers
│   ├── skill_project_optimizer/   # Per-project token optimizer
│   ├── mousquetaires/             # Blue Team (4 agents)
│   │   ├── prompts/               # Agent pre-prompts (deltas)
│   │   ├── scripts/               # Agent execution scripts
│   │   ├── templates/             # Report templates
│   │   └── cli.py                 # CLI (typer + rich)
│   └── cardinal/                  # Red Team (4 agents)
│       ├── prompts/               # Adversarial pre-prompts
│       └── scripts/               # Confrontation scripts
├── docs/
│   ├── plans/                     # Architecture design docs
│   └── schemas/                   # JSON report schemas
├── scripts/
│   └── botte                      # Token-optimized terminal wrapper
└── README.md
```

## 🏠 Local LLM backends (P15)

Detects and uses local model servers so that cheap tasks never go to the cloud. Works with **LM Studio, Ollama, LocalAI, vLLM, llama.cpp, Jan, KoboldCpp**, or any tool that speaks the OpenAI `/v1` schema.

```
# Discover what is reachable (localhost, a single host, or a /24 scan)
python -m skills.llm_backends.cli scan
python -m skills.llm_backends.cli scan --subnet
python -m skills.llm_backends.cli scan 192.168.1.47

# Audit: are you using local models? What can this machine run? Next steps?
python -m skills.llm_backends.cli audit --fresh

# Run a prompt locally, 0 cloud tokens
python -m skills.llm_backends.cli chat "classify: bug or feature?" --max-tokens 128
```

The audit profiles your hardware (RAM/VRAM/GPU). If you do **not** have a local model yet, it gives you a step-by-step setup guide tailored to what your machine can actually run.

### Wire it into Claude Code (MCP)

Copy `configs/mcp.example.json` to `.mcp.json` and set `cwd` to this repo. The agent then gains five tools: `discover_backends`, `list_models`, `audit_local_usage`, `route_task`, `local_chat`. Tell it *"classify these locally"* and it offloads the work to your GPU. See [`skills/llm_mcp/SKILL.md`](skills/llm_mcp/SKILL.md).

## 🧭 Auto-router + fusion (P17)

Decides **local vs. cloud** automatically from an effort estimate, covering local backends *and* cloud providers (DeepSeek, Zhipu GLM, NVIDIA Nemotron, xAI Grok, Google Gemma, all OpenAI-compatible).

```
python -m skills.auto_router.cli route "classify: bug or feature?"                         # → LOCAL
python -m skills.auto_router.cli route "design a distributed cache, prove correctness"     # → cloud tier
python -m skills.auto_router.cli providers                                                 # catalog + availability
python -m skills.auto_router.cli run "summarize this in 2 lines"                           # decide + execute
```

Trivial tasks stay local (0 cloud tokens). Hard tasks go to the cheapest capable cloud model. The router is budget-aware and falls back to local when no cloud key is set. Cloud access goes through `OPENROUTER_API_KEY` (all models) or native keys (`DEEPSEEK_API_KEY`, `XAI_API_KEY`, `ZHIPUAI_API_KEY`, `NVIDIA_API_KEY`).

**Fusion** makes models collaborate:

```
python -m skills.auto_router.cli fusion cascade "is 17 prime?"                  # cheap→escalate if unsure
python -m skills.auto_router.cli fusion draft "explain the CAP theorem"         # local drafts, cloud refines
python -m skills.auto_router.cli fusion vote "capital of France, one word?"     # consensus
```

Also exposed as MCP tools (`auto_route`, `fusion`). See [`skills/auto_router/SKILL.md`](skills/auto_router/SKILL.md).

## 🔬 Hardware acceleration

- **Hailo-8** (EUREKAI 192.168.1.47): YOLOv8, ResNet-18, PaddleOCR
- **ComfyUI** (EUREKAI :8188): local Stable Diffusion
- **Bonsai Image**: WebGPU ternary model
- Zero cloud API cost for vision/generation

## 🗺️ Roadmap

- [x] P0: Shared core + slim prompts + JSON schema (-57% prompt)
- [x] P1: Project cache + parallel pipeline + token budgets (-50% re-scans)
- [x] P2: Output truncation + smart prefetch
- [x] P3: Pre-prompt loader + agent delta language + consolidated README
- [x] P4: SKILL.md for every module + unified pipeline script
- [x] P7: HTML dashboard + ticket generator for coding agents
- [x] P8: Semantic response cache via Qdrant (-60% repeated queries)
- [x] P9: Local model router (Hailo/ComfyUI/LocalAI) (-40% cloud tokens)
- [x] P10: Progressive media loader (-95% media tokens)
- [x] P11: Vector agent protocol (-70% inter-agent tokens)
- [x] P12: Ultra-compact JSON format (-90% iteration reports)
- [x] P13: Code fingerprinting (-80% re-analysis)
- [x] P5: CI/CD integration (pre-commit hooks, GitHub Actions)
- [x] P6: Live dashboard (auto-watch, savings charts, live status)
- [x] P14: Tiered model selection + inter-agent compression
- [x] P15: Local LLM backends (LM Studio/Ollama discovery, audit, MCP server)
- [x] P16: Directive file audit that validates CLAUDE.md/AGENTS.md/specs across formats (including HTML)
- [x] P17: Automatic local↔cloud router (effort-based) + multi-provider catalog + fusion (cascade/draft-refine/vote)
- [x] P18: Local skill/tool finder, zero-token retrieval for skill selection
- [x] P19: Project deployer: `botte setup ` wires MCP + directive file audit + config
- [x] P20: Infrastructure advisor: cluster hardware/software/MCP tips + auto audit (ASCII cluster diagram)
- [x] P21: Prompt improver: a local LLM rewrites prompts into professional structured/JSON prompts (0 cloud tokens)
- [x] P22: Project metrics (per-component LOC, always-on cost, savings) + idle scanner that scales to large repos
- [x] P23: Enforcement layer: committed policy + preflight hook (automatically prefers local) + canonical /checkup
- [x] P24: Local web scraping + Qdrant ingestion (basic) + doc draft→refine + session review
- [x] P25: Local-first GUI app testing (OculiX) + HTML post-mortem reports; OculiX visual-control MCP server wired in by the deployer
- [x] P26: Capability registry + manager + layered system map (the toolkit's self-model)
- [x] P27: Cluster scheduler: discovers machines, dispatches work to idle hosts (LRU), agent delegation handoff
- [x] P28: Conductor: routes a goal to an ordered, local-first capability plan (general router)
- [x] P29: Control loop: measures routing outcomes and tunes the effort→tier thresholds (self-improving)
- [x] P30: Report persistence: saves audits as timestamped .md/.html (name + date + time), browsable via `report list`
- [x] P31: Cost estimation (tokens · model · money · time) + fix plan with per-correction cost + metric trends
- [x] P32: Conductor executor: runs a plan's read-only steps unattended; steps that mutate state or call the cloud are gated behind confirmation
- [x] P33: Real local embeddings for ingestion: auto-resolves a local /v1/embeddings endpoint (hash fallback)
- [x] P34: GitHub Action that posts (and updates) the local-first /checkup verdict as a PR comment, 0 cloud tokens

## 📝 Changelog

See [CHANGELOG.md](CHANGELOG.md). Current version: **v1.3.0**.

## 📜 License

MIT. Free to use, keep improving.

## 👤 Author

Sylvain Galliez ([@zedarvates](https://github.com/zedarvates))
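
## 🧪 Illustrative sketch: code fingerprinting (P13)

The modules table describes `code_fingerprint/` only as hashing functions and skipping unchanged code. The sketch below shows one way that idea could work in plain Python. It is not taken from the repository: the helper names `fingerprint_functions` and `changed_functions` are hypothetical, and hashing the AST is an assumption about the approach, chosen because comments and line positions are not part of the dump, so moving or re-commenting a function does not trigger re-analysis.

```python
import ast
import hashlib


def fingerprint_functions(source: str) -> dict[str, str]:
    """Map each function name in `source` to a SHA-256 hash of its AST (hypothetical helper)."""
    fingerprints = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # ast.dump omits line numbers by default and the AST carries no comments,
            # so purely cosmetic edits leave the hash unchanged.
            digest = hashlib.sha256(ast.dump(node).encode("utf-8")).hexdigest()
            # Simplification: same-named functions (e.g. methods in different
            # classes) overwrite each other in this sketch.
            fingerprints[node.name] = digest
    return fingerprints


def changed_functions(old: dict[str, str], new: dict[str, str]) -> list[str]:
    """Return names that are new or whose hash differs; all others can be skipped."""
    return sorted(name for name, digest in new.items() if old.get(name) != digest)


before = "def add(a, b):\n    return a + b\n\ndef mul(a, b):\n    return a * b\n"
after = "# reformatted\ndef add(a, b):\n    return a + b\n\n\ndef mul(a, b):\n    return a * b * 1\n"
print(changed_functions(fingerprint_functions(before), fingerprint_functions(after)))  # ['mul']
```

Only `mul` would be sent back for analysis here. A real implementation would also need to persist the fingerprints between runs (for example under `.botte-cache/`) and qualify names by module and class.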

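Tags: AI risk mitigation, DLL hijacking, SOC Prime, token optimization, artificial intelligence, multi-agent, large language models, developer tools, user-mode hook bypass, reverse-engineering tools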