soyun9947/codebase-intel

GitHub: soyun9947/codebase-intel

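A codebase context layer built on the MCP protocol. It gives AI coding agents deep context (architectural decisions, quality contracts, and dependency graphs) so that AI-generated code is both fast and correct.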


# codebase-intel

Your AI agent can write code. But does it know why your code exists?

AI coding agents can autocomplete, refactor, and generate code. But on the things that matter most in production, they are still in the dark:

- They don't know your team **decided** to use token bucket over sliding window, or why
- They don't know the compliance team **requires** rate-limit headers on every response
- They don't know that changing `config.py` will **break** billing and analytics
- They generate code that **looks right** but violates your project's architectural patterns

**codebase-intel** fixes this. It is a context layer that sits between your codebase and any AI agent, providing not just *what* code exists, but *why* it exists, *what rules* it must follow, and *what breaks* if you change it.

## Before and After

```
WITHOUT codebase-intel                WITH codebase-intel
─────────────────────                 ───────────────────
Agent reads: every file in the dir    Agent reads: only what matters
Tokens used: 16,063                   Tokens used: 5,955
Knows why code exists: No             Knows why: Yes (13 decisions)
Quality guardrails: None              Guardrails: 4 contracts enforced
Drift awareness: None                 Drift: stale context detected
Impact analysis: None                 Impact: knows what else breaks

Result: faster but fragile            Result: faster AND correct
```

### Real Benchmarks on Production Codebases

| Project | Files | Naive Tokens | codebase-intel Tokens | Reduction | Decisions | Contracts |
|---|---:|---:|---:|---:|---:|---:|
| **FastAPI monolith** (auth + frontend) | 359 | 16,063 | 5,955 | **63%** | 13 | 4 |
| **Microservice A** (AI processing) | 358 | 14,611 | 5,955 | **59%** | 0 | 7 |
| **Microservice B** (document generation) | 87 | 2,461 | 1,275 | **48%** | 0 | 6 |
| **Microservice C** (user management) | 153 | 5,904 | 1,476 | **75%** | 0 | 4 |

## What's Different

There are already great code graph tools (code-review-graph is excellent, with 6K+ stars). **We are not competing with them.** We solve the problem they leave open:

| Capability | Graph-only tools | codebase-intel |
|---|---|---|
| Code graph + dependencies | Yes | Yes (19 languages) |
| **Decision Journal**: *why* the code is the way it is | No | **Yes** |
| **Quality Contracts**: rules the AI must follow | No | **Yes** |
| **AI anti-pattern detection**: catches hallucinated imports, over-abstraction | No | **Yes** |
| **Drift detection**: stale context and context-rot alerts | No | **Yes** |
| **Token budgeting**: fits context to any agent's window | No | **Yes** |
| **Live analytics**: proves efficiency gains over time | No | **Yes** |

### The Missing Layer

```
What exists today:           What codebase-intel adds:

Code → Graph → Agent         Code → Graph ──────────────────→ Agent
                                      ↓                          ↑
                             Decision Journal ──→ WHY ────────┤
                             Quality Contracts → RULES ───────┤
                             Drift Detector ──→ WARNINGS ─────┘

                             "Here are the 3 files that matter,
                              the decision your team made 6 months ago,
                              and the 2 rules you must not violate."
```

## Quick Start

```
# Install globally (no venv needed)
uvx codebase-intel --help      # ephemeral, like npx
# or
pipx install codebase-intel    # persistent global install
# or
pip install codebase-intel     # traditional

# Initialize in your project
cd your-project
codebase-intel init

# See what it found
codebase-intel status

# Mine decisions from git history
codebase-intel mine --save

# Auto-detect code patterns and generate quality contracts
codebase-intel detect-patterns --save

# Run benchmarks (see the before/after comparison)
codebase-intel benchmark

# View the live dashboard
codebase-intel dashboard
```

### Connect to Claude Code / Any MCP Client

**Single project:**

```
{
  "mcpServers": {
    "codebase-intel": {
      "command": "codebase-intel",
      "args": ["serve", "/path/to/your/project"]
    }
  }
}
```

**Multi-project (global mode):**

```
# Register repos once, from anywhere
codebase-intel register ~/projects/user-service
codebase-intel register /opt/services/payment-api
codebase-intel register /var/repos/notification-service
```

```
{
  "mcpServers": {
    "codebase-intel": {
      "command": "uvx",
      "args": ["codebase-intel", "serve", "--auto"]
    }
  }
}
```

In `--auto` mode, the MCP server routes each request to the correct project based on the file paths in the request. No manual switching is needed; it works just like `npx @playwright/mcp`.

Your agent now automatically receives the relevant context, decisions, and contracts before it writes code.

## The Three Pillars

### 1. Decision Journal: "Why does this code exist?"

Every team makes hundreds of decisions that never get written down. *Why* did you choose Postgres over Mongo? *Why* is the auth middleware structured that way? *Why* was the sliding window approach rejected?

codebase-intel captures these automatically from git history and links them to the code:

```
# .codebase-intel/decisions/DEC-042.yaml
id: DEC-042
title: "Use token bucket for rate limiting"
status: active
context: "Payment endpoint was getting hammered during flash sales"
decision: "Token bucket algorithm with per-user buckets, 100 req/min"
alternatives:
  - name: sliding_window
    rejection_reason: "Memory overhead too high at scale"
constraints:
  - description: "Must not add >2ms p99 latency"
    source: sla
    is_hard: true
code_anchors:
  - "src/middleware/rate_limiter.py:15-82"
```

**Without this**: your agent proposes a sliding window (exactly what you rejected 6 months ago).

**With this**: your agent sees the decision, follows it, and respects the SLA constraint.

### 2. Quality Contracts: "What rules must the AI follow?"

Linters check syntax. Contracts enforce **your project's patterns**:

```
# .codebase-intel/contracts/api-rules.yaml
rules:
  - id: no-raw-sql
    name: No raw SQL in API layer
    severity: error
    # The alternation is grouped so it only applies inside an execute(...) call
    pattern: "execute\\(.*(SELECT|INSERT|UPDATE)"
    fix_suggestion: "Use the repository pattern"

  - id: async-everywhere
    name: All I/O must be async
    severity: error
    pattern: "requests\\.(get|post)"
    fix_suggestion: "Use httpx.AsyncClient"
```

**Built-in AI guardrails** catch the patterns AI agents most often get wrong:

- Hallucinated imports (modules that don't exist)
- Over-abstraction (base classes with a single subclass)
- Unnecessary error handling for cases that cannot happen
- Comments that restate the code instead of explaining why
- Features nobody asked for (YAGNI violations)

### 3. Drift Detection: "Is our context still valid?"

Context rots. Decisions go stale. Code anchors point to deleted files. codebase-intel detects these cases:

```
$ codebase-intel drift

╭──────────────── Drift Report ────────────────╮
│ Overall: MEDIUM                              │
│ 3 items need attention                       │
╰──────────────────────────────────────────────╯

- [MEDIUM] Decision DEC-012 anchored to deleted file
- [MEDIUM] Decision DEC-008 is past its review date
- [LOW] 2 files changed since last graph index
```

## 19 Languages

Full tree-sitter parsing via [tree-sitter-language-pack](https://github.com/nicolo-ribaudo/tree-sitter-language-pack):

| Category | Languages |
|---|---|
| **Web** | JavaScript, TypeScript, TSX |
| **Backend** | Python, Java, Go, Ruby, PHP, Elixir |
| **Systems** | Rust, C, C++ |
| **Mobile** | Swift, Kotlin, Dart |
| **Other** | C#, Scala, Lua, Haskell |

## CLI Commands

```
# Core
codebase-intel init [path]               # Initialize: build graph, create configs
codebase-intel analyze [--incremental]   # Rebuild or update the code graph
codebase-intel mine [--save]             # Mine git history for decision candidates
codebase-intel detect-patterns [--save]  # Auto-detect code patterns → quality contracts
codebase-intel drift                     # Run drift detection
codebase-intel benchmark                 # Measure token efficiency (before/after)
codebase-intel dashboard                 # Live efficiency tracking over time
codebase-intel serve [path]              # Start MCP server (single project)
codebase-intel status                    # Component health check
codebase-intel intent [--verify]         # Track and verify delivery goals

# Global workspace (multi-project)
codebase-intel register                  # Register a project globally
codebase-intel unregister                # Remove from global registry
codebase-intel projects                  # List all registered projects
codebase-intel serve --auto              # Start MCP server for ALL registered projects

# Cross-repo (microservices)
codebase-intel crossrepo                 # Scan repos for cross-service dependencies
codebase-intel crossrepo --all           # Scan all registered projects
```

## MCP Tools (12 tools)

| Tool | What it does |
|---|---|
| `get_context` | **The main tool.** Assembles relevant files + decisions + contracts within a token budget. |
| `query_graph` | Queries dependencies and dependents, or runs impact analysis. |
| `get_decisions` | Returns the architectural decisions relevant to specific files. |
| `get_contracts` | Returns the quality contracts for the files you are editing. |
| `check_drift` | Verifies context freshness before you trust old decisions. |
| `impact_analysis` | "What breaks if I change this file?" |
| `get_status` | Health check: graph stats, decision count, contract count. |
| `record_feedback` | Records whether AI output was accepted or rejected; this powers the learning loop. |
| `get_efficiency_report` | Live token savings, acceptance rates, and before/after evidence. |
| `set_intent` | Captures what the user wants, with machine-verifiable acceptance criteria. |
| `check_intent` | Verifies that the acceptance criteria are actually met before a task is marked done. |
| `list_intents` | Shows all tracked intents and their completion status. |

## Community Contract Packs

Pre-built quality rules for popular frameworks:

| Pack | Rules | Covers |
|---|---|---|
| **fastapi.yaml** | 10 | Layered architecture, Pydantic schemas, async, Depends(), secrets |
| **react-typescript.yaml** | 11 | Functional components, no `any`, custom hooks, lazy loading |
| **nodejs-express.yaml** | 12 | Error handling, helmet, rate limiting, structured logging |

```
cp community-contracts/fastapi.yaml .codebase-intel/contracts/
```

## What's New in v0.2.0

### Global Workspace: One Server, All Projects

You no longer need to run a separate MCP server for each project. Register your codebases once and serve them all together:

```
codebase-intel register ~/projects/user-service
codebase-intel register /opt/services/payment-api
codebase-intel serve --auto
```

The server routes each request to the correct project based on file paths. An LRU cache keeps the 5 most recently used projects loaded; older projects are evicted and reloaded on demand.

### Intent Tracking: "Did you actually build what was asked?"

AI agents say "done" when the code compiles. But did they actually deliver what was asked? Intent tracking captures goals with **machine-verifiable** acceptance criteria:

```
# Agent sets the intent at the start of the task (via the set_intent MCP tool)
# Agent does the work...
# Before marking the task done, the agent calls check_intent
# System runs automated checks: file_exists, function_exists, grep_match, test_passes...
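# Returns: 18/21 criteria met, 3 gaps identified
```

11 verification types: `file_exists`, `file_contains`, `function_exists`, `wired`, `cli_works`, `mcp_tool_exists`, `grep_match`, `grep_no_match`, `test_passes`, `custom`, `manual`.

### Cross-Repo Awareness: Microservice Dependency Mapping

Scans 14 web frameworks across 10 languages to find exposed endpoints and outbound HTTP calls, then maps which service depends on which endpoint:

```
codebase-intel crossrepo --all --impact user-service

# Output:
# CRITICAL: 3 services depend on /api/v1/users/{id}
#   → payment-service calls this endpoint [CRITICAL] at src/clients/user.py:42
#   → notification-service calls this endpoint at lib/api/users.ts:18
```

Supported: FastAPI, Express, Flask, Django, Spring Boot, Gin, Actix, Rails, Phoenix, Laravel, ASP.NET, Ktor, Vapor, Echo.

### Feedback Loop: Learning from Accept/Reject

Records whether AI-generated code was accepted, modified, or rejected. Over time, this identifies which context patterns lead to better output and which rejection reasons are most common.

## Architecture

```
AI Agent (any) ──→ MCP Server (12 tools)
                        │
                 Workspace Manager ←── Global Registry (~/.codebase-intel/)
             (multi-project routing, LRU cache)
                        │
                Context Orchestrator
        (token budgeting, priority, conflicts)
              ╱         │         ╲
      Code Graph     Decision     Quality
      (19 langs)     Journal      Contracts
      SQLite+WAL     YAML files   YAML+builtins
              ╲         │         ╱
                  Drift Detector
            (staleness, rot, orphans)
                        │
             ┌──────────┼──────────┐
         Analytics   Feedback    Intent
         Tracker     Loop        Tracker
             └──────────┼──────────┘
                        │
               Cross-Repo Scanner
           (14 frameworks, 10 languages)
```

## Philosophy

**We don't make AI agents smarter. We make them informed.**

An agent with a 1M-token context is like a developer with access to every file in the company: overloaded with information and lacking focus. An agent with codebase-intel is like a developer who just had a 5-minute conversation with a senior engineer: *"Here is what you need to know, here is why we did it this way, and here are the three things you absolutely must not break."*

**We don't compete with graph tools. We complete them.**

A code graph answers "what depends on what." That is necessary, but not sufficient. codebase-intel answers the harder questions: *Why was this decision made? What constraints apply? What will the compliance team flag? What have we already tried and rejected?*

**We don't hide the truth. We prove it.**

Run `codebase-intel benchmark` on your own project and look at the numbers. Run `codebase-intel dashboard` over time and watch the improvement. Every claim is backed by reproducible, project-specific data.

## License

MIT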
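
To make the Quality Contracts section more concrete, here is a minimal sketch of how regex-based rules like the two in `api-rules.yaml` could be checked against source lines. It is not codebase-intel's implementation: the `Rule` class and `check_source` function are hypothetical names introduced for illustration, and the rules are hard-coded rather than loaded from YAML.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    """Hypothetical in-memory form of one contract rule."""
    id: str
    severity: str
    pattern: str
    fix_suggestion: str


# Patterns modeled on the api-rules.yaml example. The alternation is grouped
# so that it only applies inside an execute(...) call.
RULES = [
    Rule("no-raw-sql", "error",
         r"execute\(.*(SELECT|INSERT|UPDATE)", "Use the repository pattern"),
    Rule("async-everywhere", "error",
         r"requests\.(get|post)", "Use httpx.AsyncClient"),
]


def check_source(source, rules):
    """Return (line_number, rule_id, fix_suggestion) for every matching line."""
    violations = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for rule in rules:
            if re.search(rule.pattern, line):
                violations.append((lineno, rule.id, rule.fix_suggestion))
    return violations


sample = '''\
rows = db.execute("SELECT * FROM users")
resp = requests.get(url)
UPDATE_INTERVAL = 30
'''

violations = check_source(sample, RULES)
for violation in violations:
    print(violation)
```

The grouping in the first pattern matters: without the parentheses, `|` splits the whole expression, so a bare `UPDATE` anywhere on a line (such as the constant on the third sample line) would also be flagged.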

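The `--auto` routing and the 5-project LRU cache described under Global Workspace can be pictured with the sketch below. It rests on assumptions: the README only says requests are routed by file path, so the longest-matching-root rule is a guess, and `route`, `ProjectCache`, and the example paths are hypothetical names, not part of the tool.

```python
from collections import OrderedDict
from pathlib import PurePosixPath


def route(file_path, registered_roots):
    """Pick the registered project root that contains file_path.

    Assumption: the longest matching root wins, so nested projects
    resolve to the innermost one. Returns None when nothing matches.
    """
    path = PurePosixPath(file_path)
    matches = [root for root in registered_roots
               if path.is_relative_to(PurePosixPath(root))]
    return max(matches, key=len, default=None)


class ProjectCache:
    """Keeps at most `capacity` projects loaded, evicting the least recently used."""

    def __init__(self, capacity=5):
        self.capacity = capacity
        self._loaded = OrderedDict()

    def get(self, root):
        if root in self._loaded:
            self._loaded.move_to_end(root)        # mark as most recently used
        else:
            self._loaded[root] = {"root": root}   # stand-in for loading a project
            if len(self._loaded) > self.capacity:
                self._loaded.popitem(last=False)  # evict the least recently used
        return self._loaded[root]

    def loaded_roots(self):
        return list(self._loaded)


registered = ["/home/dev/projects/user-service", "/opt/services/payment-api"]
cache = ProjectCache(capacity=5)

root = route("/opt/services/payment-api/src/clients/user.py", registered)
if root is not None:
    project = cache.get(root)
    print(project["root"])
```

An `OrderedDict` keeps the sketch short because `move_to_end` and `popitem(last=False)` give the recency ordering and eviction directly; `PurePosixPath.is_relative_to` requires Python 3.9 or later.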