Scaffoldic/agentforge-graph

GitHub: Scaffoldic/agentforge-graph

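This project parses a code repository into a structured knowledge graph of symbols, framework semantics, architecture decisions, and Git history, which AI coding agents can query and reason over via MCP.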


# agentforge-graph

[![CI](https://static.pigsec.cn/wp-content/uploads/repos/cas/ad/ad5834178f7599af9fdda11629d49cae07f2997beec49821b2920eff5bfd50e7.svg)](https://github.com/Scaffoldic/agentforge-graph/actions/workflows/ci.yml) [![PyPI](https://img.shields.io/pypi/v/agentforge-graph.svg)](https://pypi.org/project/agentforge-graph/) [![Python](https://img.shields.io/badge/python-3.13%2B-blue.svg)](https://pypi.org/project/agentforge-graph/) [![License](https://img.shields.io/badge/license-Apache--2.0-green.svg)](https://github.com/Scaffoldic/agentforge-graph/blob/main/LICENSE)

Ordinary code-graph tools can only answer "what connects to what." An agent also needs to know what the code is **for**, which decisions constrain it, what the API surface is, which data tables it touches, who calls it, and what changed. **agentforge-graph merges parsed code structure, framework semantics, architecture decisions, git evolution history, and LLM-extracted enrichment into one agent-traversable graph, and every fact carries its provenance.** Built on [AgentForge](https://pypi.org/project/agentforge-py/).

```
pip install agentforge-graph   # ← the engine is in the box; nothing else to run
ckg index .                    # repo → typed graph in seconds (no creds, no server)
ckg serve-mcp --repo .         # → 10 read-only tools for your agent
```

Index a FastAPI + SQLAlchemy app → its routes, ORM models (with relations), and DI graph; no creds, no server.

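## What you get out of the box

- 🧩 **A typed code graph in one command**: `pip install` → `ckg index .` → files, classes, functions, and methods with descriptor-based stable IDs and `CONTAINS`/`IMPORTS`/`CALLS`/`INHERITS` edges. **Embedded Kuzu + LanceDB under `.ckg/`: no server, no cloud, no config.** 10 languages: Python, TypeScript, JavaScript, Go, Ruby, PHP, Java, C#, C++, Rust.
- 🌐 **Framework semantics as graph edges** *(core differentiator)*: routes, ORM models, and DI, not just function calls. `ckg routes` is your API surface, `ckg models` is your data model, and `ckg services` is your dependency-injection graph, across **11 packs**: FastAPI, Flask, SQLAlchemy, Django (Python); Express, NestJS (JS/TS); Spring (Java); Gin (Go); ASP.NET (C#); Laravel (PHP); Rails (Ruby).
- 🏛️ **Decisions ↔ code** *(core differentiator)*: extracts ADRs/docs and links them to the code they `GOVERN`. Touching `payments/` surfaces *"ADR-0012 (accepted): idempotency keys must be generated client-side"* **before** the agent edits anything.
- 🕰️ **Built-in git evolution tracking**: `ckg history `, `ckg changed-since `, and `--as-of ` historical reconstruction. Churn and authorship live in the graph too.
- 🔎 **Hybrid retrieval**: vector search **entry** → typed **graph expansion**. Ask in natural language and get *connected* context: the symbol, its callers, *and* the decisions that constrain it.
- ⚡ **Incremental by default**: only the diff is re-indexed. Edit 3 files in a 5k-file repo → seconds, not minutes. Embeddings and enrichment are recomputed only for what changed.
- 🤖 **Agent-native**: read-only serving over **MCP (10 tools)** or as a native AgentForge toolset. Every response carries a freshness stamp.
- 🧠 **LLM enrichment, controlled and opt-in**: design-pattern tags (e.g. *"this class is a Repository"*, with confidence + rationale) and bottom-up module summaries, all carrying `llm` provenance. **CI needs no model calls and no cloud credentials.**
- 🏢 **Scales from one repo to the whole org**: host the index **centrally** (a shared directory or SurrealDB/Neo4j), build it once, and consume it **read-only** from many places; serve a multi-repo **workspace** through one federated MCP endpoint; and **trace requests across services**: `ckg services-map` / `ckg trace` draw the cross-service call graph (HTTP client → route, matched by path or by OpenAPI contract).

**Status: 0.6.1. Org-scale, and now built with a single command.** 0.5 added central hosting, federated multi-repo workspaces, and cross-service tracing; 0.6 adds the **build side**: one `workspace.yaml` + one config + `ckg build --workspace` (members can be local or given by git URL) stands up a multi-repo CKG, with fail-fast `ckg doctor` validation and tracing controlled by config/CLI. Published on [PyPI](https://pypi.org/project/agentforge-graph/). Every language pack has been validated on real OSS repos, including credentialed embedding/retrieval/enrichment runs; real agents answered questions through these tools unattended. See [`CHANGELOG`](https://github.com/Scaffoldic/agentforge-graph/blob/main/CHANGELOG.md) and [`docs/features/TRACKER.md`](https://github.com/Scaffoldic/agentforge-graph/blob/main/docs/features/TRACKER.md).

## Three ways to run: single repo, workspace, or central store

```
# 1) single repo: a typed graph in seconds
ckg index . && ckg routes .

# 2) central store: host the index outside the repo and consume it read-only
#    (set store.central_root in ckg.yaml; CI builds once, many places read)
ckg status . --read-only

# 3) workspace: build many repos with one command, then the cross-service graph
ckg build --workspace workspace.yaml                                # index (+embed) every member, one command
ckg services-map --workspace workspace.yaml                         # who calls whom (HTTP → route)
ckg trace payments --workspace workspace.yaml --direction upstream  # blast radius
```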

Single repo → central store → workspace: the cross-service call graph and blast-radius trace, all creds-free.
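The `--direction upstream` trace in the workspace example is, conceptually, a reverse-reachability query over the cross-service call graph that `services-map` draws. Below is a minimal Python sketch of that idea under stated assumptions: it only does simple `{param}` path-template matching (ckg can also match by OpenAPI contract, which is not modelled here), and the service names, routes, and function names are hypothetical illustrations, not ckg's internals or API.

```python
import re
from collections import deque

# Hypothetical route table: service -> (METHOD, path template) pairs it serves.
ROUTES = {
    "payments": [("POST", "/payments/{pid}/refund"), ("GET", "/health")],
    "orders": [("POST", "/orders")],
}

# Hypothetical outbound HTTP calls found in each service's client code.
CLIENT_CALLS = {
    "orders": [("POST", "/payments/42/refund")],
    "gateway": [("POST", "/orders"), ("GET", "/health")],
}


def template_to_regex(template: str) -> re.Pattern:
    """Turn '/payments/{pid}/refund' into a regex where each {param} matches one path segment."""
    parts = re.split(r"(\{[^/{}]+\})", template)  # keeps the {param} pieces
    pattern = "".join("[^/]+" if p.startswith("{") else re.escape(p) for p in parts)
    return re.compile(f"^{pattern}$")


def build_service_edges(routes, client_calls):
    """Return caller -> set of callee services by matching each client call to a route."""
    compiled = [
        (svc, method, template_to_regex(tpl))
        for svc, entries in routes.items()
        for method, tpl in entries
    ]
    edges = {}
    for caller, calls in client_calls.items():
        for method, path in calls:
            for callee, route_method, rx in compiled:
                if method == route_method and rx.match(path) and callee != caller:
                    edges.setdefault(caller, set()).add(callee)
    return edges


def upstream_blast_radius(edges, target):
    """Every service that can reach `target` through call edges (BFS on reversed edges)."""
    reverse = {}
    for caller, callees in edges.items():
        for callee in callees:
            reverse.setdefault(callee, set()).add(caller)
    seen, queue = set(), deque([target])
    while queue:
        node = queue.popleft()
        for caller in reverse.get(node, ()):
            if caller not in seen:
                seen.add(caller)
                queue.append(caller)
    seen.discard(target)  # a cycle back to the target is not part of its own blast radius
    return seen


if __name__ == "__main__":
    edges = build_service_edges(ROUTES, CLIENT_CALLS)
    print(sorted(upstream_blast_radius(edges, "payments")))  # ['gateway', 'orders']
```

Matching `POST /payments/42/refund` against `/payments/{pid}/refund` yields an `orders → payments` edge, so the upstream blast radius of `payments` contains both `orders` and `gateway` (directly via `GET /health`, and transitively via `orders`).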

→ Pick your path: **[Single repo](https://github.com/Scaffoldic/agentforge-graph/blob/main/docs/guides/getting-started/1-single-repo.md)** · **[Workspace](https://github.com/Scaffoldic/agentforge-graph/blob/main/docs/guides/getting-started/2-workspace.md)** · **[Central store](https://github.com/Scaffoldic/agentforge-graph/blob/main/docs/guides/getting-started/3-central-store.md)**.

## Quickstart

```
pip install agentforge-graph        # engine included (tree-sitter + kuzu + lancedb)

# 1) index the repo into a graph; every run after the first is incremental
ckg index .                         # files/classes/functions/calls (+ ADRs, routes, models…)

# explore the graph: no embeddings, no creds, no server
ckg map --budget 2000               # centrality-ranked repo orientation
ckg routes                          # API surface: METHOD PATH → handler
ckg models                          # ORM data models: table, fields, relations
ckg services                        # dependency-injection map
ckg decisions --status accepted     # ADRs and what they govern
ckg history                         # when/who/churn for a symbol
ckg status                          # indexed commit, staleness, node counts
```

Add **semantic search** with any embedding provider (AWS Bedrock, OpenAI, or a local OpenAI-compatible server; see [Models](#models-pick-a-provider-or-bring-your-own)):

```
pip install "agentforge-graph[bedrock]"      # or [openai]
ckg embed .                                  # AST chunks → vectors
ckg query "how are auth tokens validated"    # ranked, *connected* context
ckg query --symbol "" --mode impact          # reverse deps: "who calls this"

# optional: explicit, budgeted LLM enrichment
ckg enrich . --all --budget-usd 2            # design-pattern tags + module summaries
ckg tagged Repository                        # symbols tagged with a design pattern
```

### See it run

```
$ ckg index .
indexed 1c2f3a4 · 412 files · 5,290 nodes / 9,133 edges · 3.1s

$ ckg routes
POST /payments/{pid}/refund  → refund()  (app/api.py:42)
GET  /health                 → health()  (app/api.py:16)

$ ckg models users
[users]  (app/models.py:7)
  fields: id, name, email
  relations: posts→posts (relationship)

$ ckg query "how are auth tokens validated"
auth/tokens.py:88  TokenValidator.validate  (cosine 0.71)
  ← called by  api/middleware.py:23  require_auth
  ⚖ governed by ADR-0007 (accepted): signing keys must rotate every 90 days
```

That last one is the whole point: a single natural-language query returns the symbol, **who calls it**, *and* **the decision that constrains it**, connected and with provenance.

### Hand it to an agent

Serve it read-only over MCP, **10 tools**: `ckg_repo_map`, `ckg_search`, `ckg_symbol`, `ckg_impact`, `ckg_neighbors`, `ckg_status`, `ckg_routes`, `ckg_decisions`, `ckg_explain`, `ckg_history`:

```
claude mcp add ckg -- ckg serve-mcp --repo .   # stdio (subprocess)
ckg serve-mcp --repo . --transport http        # or HTTP → http://127.0.0.1:8765/mcp
```

Over HTTP, point any MCP client at the URL: `{"mcpServers": {"ckg": {"url": "http://127.0.0.1:8765/mcp"}}}`. Or use it as a native AgentForge toolset:

```
from agentforge import Agent
from agentforge_graph.serve import code_graph_tools

agent = Agent(model="anthropic:claude-sonnet-4-6", tools=code_graph_tools("."))
```

→ Full guide (tool schemas, client config, guardrails, freshness stamps): [`docs/guides/10-using-over-mcp.md`](https://github.com/Scaffoldic/agentforge-graph/blob/main/docs/guides/10-using-over-mcp.md).

## What's in the graph

| Feature | What you get |
|---|---|
| **Typed code graph** | Files, classes, functions, and methods with descriptor-based stable IDs; `CONTAINS`/`IMPORTS`/`CALLS`/`INHERITS` edges. Conservative, no-guess resolution across 10 language packs. |
| **Framework-aware** *(differentiator)* | `Route → HANDLED_BY → handler`, `DataModel → HAS_FIELD`/`RELATES_TO`, `Service → INJECTED_INTO`, across 11 packs: FastAPI, Flask, SQLAlchemy, Django, Express, NestJS, Spring, Gin, ASP.NET, Laravel, Rails. `ckg routes`/`models`/`services`. |
| **Decisions ↔ code** *(differentiator)* | ADRs/docs extracted and linked to the code they `GOVERN`; doc bodies are embedded and searchable. |
| **Temporal / git evolution** | Per-symbol history, churn, and authorship; `changed-since` and `as-of` historical reconstruction. |
| **Hybrid retrieval** | Vector entry → typed graph expansion. Connected context instead of a flat list. |
| **LLM enrichment** *(differentiator)* | Controlled design-pattern tags + bottom-up module summaries, carrying `llm` provenance and switchable off. |
| **Agent-native** | Read-only MCP (10 tools) or a native AgentForge toolset; every response carries a freshness stamp. |
| **Embedded-first** | Local Kuzu graph + LanceDB vectors under `.ckg/`. No server. Pluggable stores and models. |

## Retrieval quality (measured)

Retrieval is the agent's primary interface, so we measure it instead of going by feel. The benchmark is an **objective** natural-language → code task: each documented symbol's docstring is the query and the symbol itself is the gold answer, with labels taken straight from the graph's `DESCRIBES` edges and verified leak-free. It covers **388 queries** across **4 real OSS repos** (click, httpx, flask, fastapi), using Bedrock `cohere.embed-v4`:

| | Base hybrid retrieval | + Bedrock cross-encoder rerank (w=0.3) |
|---|---|---|
| **MRR** | 0.952 | 0.971 |
| **recall@1** | 0.915 | 0.948 |

Out of the box, base retrieval puts the correct symbol at **rank 1** for 91.5% of queries. The optional cross-encoder reranker (Bedrock Rerank, no torch required) adds a small but **statistically significant** precision gain (ΔMRR +0.019, 95% CI [+0.008, +0.031], p < 0.001 by paired bootstrap) at roughly 440 ms per query, so it is **opt-in**, for cases where the top-1 gain is worth the latency. Full method and data: [`docs/validation/rerank/benchmark.md`](docs/validation/rerank/benchmark.md).

## Storage: which database, and can I swap it?

**By default, there is nothing to run.** The graph lives in an embedded **Kuzu** database and the vectors in an embedded **LanceDB** index, both under your repo's `.ckg/` directory (ADR-0006). Zero config, no server. Storage is **pluggable** behind two contracts, `GraphStore` and `VectorStore` ([`core/contracts.py`](https://github.com/Scaffoldic/agentforge-graph/blob/main/src/agentforge_graph/core/contracts.py)), which are resolved through a **driver registry** with entry-point groups ([`store/registry.py`](https://github.com/Scaffoldic/agentforge-graph/blob/main/src/agentforge_graph/store/registry.py)):

```
# agentforge.yaml (engine config lives under app:)
app:
  store:
    graph:   { driver: kuzu }      # built-in
    vectors: { driver: lancedb }   # built-in
```

Three server-side backends ship first-party as optional extras: **Neo4j** (graph), **Postgres/pgvector** (vectors), and **SurrealDB**, which is multi-model, so one server can act as both the graph and the vector store. Each passes the *same* `GraphStoreConformance` / `VectorStoreConformance` suites as the embedded defaults (run in CI against live servers). Anything beyond these three is an **out-of-tree adapter**: implement the contract, pass the conformance suite, and register an entry point; after that it is `pip install` + one config line, with no core changes.

→ [`docs/guides/09-storage-backends.md`](https://github.com/Scaffoldic/agentforge-graph/blob/main/docs/guides/09-storage-backends.md).

## Models: pick a provider or bring your own

Every model boundary is an **interface** resolved by a provider registry, so switching providers is a one-line config change (under `app:` in `agentforge.yaml`), not a code change.
| Interface | First-party providers | Selected by |
|---|---|---|
| `Embedder` | `bedrock` (Cohere `embed-v4`) · `openai` (including **local** OpenAI-compatible servers) · `fake` (CI) | `embed.driver` |
| `PatternJudge` / `Summarizer` | `bedrock` · `anthropic` (direct API) · `scripted` (CI) | `enrich.provider` |

- **On AWS?** The default `bedrock` (Claude + Cohere) uses your AWS credentials.
- **Not on AWS?** `enrich.provider: anthropic` (set `ANTHROPIC_API_KEY`) + `embed.driver: openai` (set `OPENAI_API_KEY`) gives a complete real-model path with no AWS.
- **Local / self-hosted?** Point `embed.base_url` at any OpenAI-compatible server (Ollama, vLLM, LM Studio) and use the same `openai` driver.
- **Fully offline?** `embed.driver: fake` + `enrich.provider: scripted`.

CI uses deterministic fake models, so building and testing need **no model calls and no cloud credentials**.

→ [`docs/guides/08-model-providers.md`](https://github.com/Scaffoldic/agentforge-graph/blob/main/docs/guides/08-model-providers.md).

## Architecture

For the full overview (layer diagram, data model, ASCII pipelines, and extension points), see **[`docs/ARCHITECTURE.md`](https://github.com/Scaffoldic/agentforge-graph/blob/main/docs/ARCHITECTURE.md)**. In short: a `core` of contracts and value types; a **deterministic engine** that never imports the framework (parse, store, resolve, embed, retrieve, **frameworks**, knowledge); and a thin **framework layer** on top (`serve` = MCP/Tools, `enrich` = budgeted LLM).

```
            ckg CLI  /  MCP server  /  Agent
                          │
   serve · enrich                  (framework layer: may import agentforge)
                          │
   ingest · store · chunking · embed · retrieve · repomap · frameworks · knowledge · temporal
                          │        (deterministic engine: no agentforge)
   core: contracts · models · SymbolID · provenance · kinds
                          │
   Kuzu (graph) + LanceDB (vectors) under .ckg/
```

## Configuration and install extras

One config file: **`agentforge.yaml`**.

- **Framework keys** live at the top level (agent model, budgets, MCP) and are strict.
- **Engine config** lives under the framework's **`app:`** passthrough block: `store`, `ingest`, `chunking`, `embed`, `retrieve`, `repomap`, `serve`, `frameworks`, `knowledge`, `enrich`, `temporal`. The engine reads `app:` with plain pyyaml, never imports the framework (ADR-0001), and is lenient (unknown keys are ignored).
- A standalone **`ckg.yaml`** (the same blocks at the top level) is still supported for use without the framework; the engine auto-discovers either file.

The base `pip install agentforge-graph` includes the deterministic engine (tree-sitter, kuzu, lancedb, networkx). Optional extras add providers and backends:

| Install | Adds |
|---|---|
| `pip install agentforge-graph` | Base: engine + framework runtime + MCP serving |
| `…[bedrock]` | `boto3`: Bedrock embeddings + Claude enrichment |
| `…[openai]` | `openai`: OpenAI / local OpenAI-compatible embeddings |
| `…[neo4j]` / `…[pgvector]` | Optional server-side graph / vector backends |
| `…[surrealdb]` | Optional single server for both graph **and** vectors (multi-model) |
| `…[rerank]` | sentence-transformers cross-encoder (off by default) |

The Anthropic-API enrichment path (`enrich.provider: anthropic`) needs no extra: the base install already includes the `anthropic` SDK.

## Docs map

| Doc | What's inside |
|---|---|
| [`docs/ARCHITECTURE.md`](https://github.com/Scaffoldic/agentforge-graph/blob/main/docs/ARCHITECTURE.md) | High-level architecture diagram + every pipeline (ASCII) |
| [**`docs/guides/`**](https://github.com/Scaffoldic/agentforge-graph/blob/main/docs/guides/README.md) | **Step-by-step guides** (each with a TL;DR). **Getting started, pick your setup:** [Single repo](https://github.com/Scaffoldic/agentforge-graph/blob/main/docs/guides/getting-started/1-single-repo.md) · [Workspace](https://github.com/Scaffoldic/agentforge-graph/blob/main/docs/guides/getting-started/2-workspace.md) · [Central store](https://github.com/Scaffoldic/agentforge-graph/blob/main/docs/guides/getting-started/3-central-store.md). **Topic guides:** [02 Indexing & retrieval](https://github.com/Scaffoldic/agentforge-graph/blob/main/docs/guides/02-indexing-and-retrieval.md) · [03 Framework extraction](https://github.com/Scaffoldic/agentforge-graph/blob/main/docs/guides/03-framework-extraction.md) · [04 Cross-file framework resolution](https://github.com/Scaffoldic/agentforge-graph/blob/main/docs/guides/04-cross-file-framework-resolution.md) · [05 Architecture decisions](https://github.com/Scaffoldic/agentforge-graph/blob/main/docs/guides/05-architecture-decisions.md) · [06 Temporal/history](https://github.com/Scaffoldic/agentforge-graph/blob/main/docs/guides/06-temporal-history.md) · [07 Enrichment](https://github.com/Scaffoldic/agentforge-graph/blob/main/docs/guides/07-enrichment.md) · [08 Model providers](https://github.com/Scaffoldic/agentforge-graph/blob/main/docs/guides/08-model-providers.md) · [09 Storage backends](https://github.com/Scaffoldic/agentforge-graph/blob/main/docs/guides/09-storage-backends.md) · [10 Using over MCP](https://github.com/Scaffoldic/agentforge-graph/blob/main/docs/guides/10-using-over-mcp.md) |
| [`examples/`](https://github.com/Scaffoldic/agentforge-graph/tree/main/examples) | Runnable example repos (index → routes/models/services/query) |
| [`docs/adr/`](https://github.com/Scaffoldic/agentforge-graph/tree/main/docs/adr/) | 9 Architecture Decision Records (the *why*) |
| [`docs/features/`](https://github.com/Scaffoldic/agentforge-graph/tree/main/docs/features/) + [`TRACKER.md`](https://github.com/Scaffoldic/agentforge-graph/blob/main/docs/features/TRACKER.md) | 12 feature specs + status board |
| [`docs/design/`](https://github.com/Scaffoldic/agentforge-graph/tree/main/docs/design/) | Per-feature design docs (the *how*, written before building) |

## License

[**Apache-2.0**](https://github.com/Scaffoldic/agentforge-graph/blob/main/LICENSE): a permissive license with an explicit patent grant and a patent-retaliation clause. See [`LICENSE`](https://github.com/Scaffoldic/agentforge-graph/blob/main/LICENSE) and [`NOTICE`](https://github.com/Scaffoldic/agentforge-graph/blob/main/NOTICE). Like AgentForge, this project is licensed under Apache-2.0.
