mertcanaltin/composto

GitHub: mertcanaltin/composto

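Composto is an AST-based code context compression engine that sharply reduces the token consumption of AI agents by converting source code into a structured intermediate representation.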

Stars: 74 | Forks: 6

# Composto

**Token-efficient code context for AI agents. The full structure of a file in a fraction of the tokens, with its causal history built in.**

Composto compresses any source file into a Health-Aware IR (intermediate representation) that keeps exactly what your agent needs (signatures, types, control flow, dependencies) while using 60-95% fewer tokens than the raw code. On top of that, it surfaces the file's causal history, meaning the changes and failures that have historically occurred around the code you are about to modify, as advisory context. Local-first, MIT-licensed. Works with Claude Code, Cursor, and Gemini CLI.

```
$ composto ir src/memory/confidence.ts L1

USE:./types.js
OUT INTERFACE:ConfidenceContext
OUT INTERFACE:ScoreAndConfidence
FN:calibrationFactor(signals: Signal[])
  GUARD:[firing.length === 0 → 1.0, avg < 20 → 0.3, avg < 100 → 0.6]
FN:historyFactor(totalCommits: number)
  GUARD:[totalCommits < 50 → 0.2, totalCommits < 200 → 0.5, totalCommits < 1000 → 0.8]
OUT FN:computeScoreAndConfidence(signals: Signal[], ctx: ConfidenceContext)

# 541 tokens of raw code → 230 tokens of IR (57% fewer). Structure intact:
# every signature, dependency, and decision threshold is preserved.
```

## Get started in 3 steps

```
# 1. See what your repo costs in AI context: zero install, no API key, about 2 seconds
cd your-project
npx composto-ai score        # scorecard: tokens, $/load, risk hotspots, a README badge

# 2. Install
npm install -g composto-ai

# 3. Wire it into your AI agent so it gets compact context automatically
composto init --client=claude-code                   # or cursor, or gemini-cli
composto init --client=claude-code --with-compress   # also auto-compress large Reads (saves tokens; see `stats`)

# Restart your client. Existing settings are merged, never overwritten.
```

That's it. Your agent now reads structure-preserving IR instead of raw files.
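The savings figure quoted with the IR example above (541 raw tokens down to 230) is simple arithmetic, and it can be handy to check the same ratio for your own files. The sketch below is only an illustration: `token_reduction` is a hypothetical helper, not part of Composto.

```python
def token_reduction(raw_tokens: int, ir_tokens: int) -> float:
    """Percentage of tokens saved when raw code is replaced by its IR."""
    if raw_tokens <= 0:
        raise ValueError("raw_tokens must be positive")
    return (1 - ir_tokens / raw_tokens) * 100


# Figures taken from the confidence.ts example in this README.
saving = token_reduction(541, 230)
print(f"{saving:.1f}% fewer tokens")  # 57.5% fewer tokens
```

The README reports this as 57%, which matches the computed 57.5% up to rounding.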

More commands

```
composto score .                      # shareable scorecard + README badge (add --json to pipe)
composto ir src/app.ts                # compress one file to IR (L0/L1/L2/L3)
composto context src/ --budget 4000   # pack a directory into a token budget
composto context . --target           # target file raw, surroundings as IR
composto context . --json             # machine-readable context for piping into agents
composto proxy --port 8787            # compression proxy: point your LLM base URL at it
composto impact src/auth/login.ts     # advisory causal history for a file
composto stats                        # hook telemetry + cumulative tokens saved
```
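To make the idea behind `composto context --budget` more concrete, here is a minimal Python sketch of one possible greedy packing strategy. It is not Composto's actual implementation: `FileCost`, `pack`, and all token numbers are hypothetical. It only illustrates the behavior this README describes, where hotspot files get L1, the rest get L0, and the total never exceeds the budget.

```python
from dataclasses import dataclass


@dataclass
class FileCost:
    """Hypothetical per-file token costs; not a Composto data structure."""
    path: str
    l0_tokens: int  # cost of the L0 structure map
    l1_tokens: int  # cost of the full L1 IR
    hotspot: bool   # flagged as a hotspot from git history


def pack(files: list[FileCost], budget: int) -> tuple[dict[str, str], int]:
    """Greedy sketch: L0 for every file that fits, then upgrade hotspots to L1."""
    plan: dict[str, str] = {}
    used = 0
    # Pass 1: cheapest structure maps first, so as many files as possible appear.
    for f in sorted(files, key=lambda f: f.l0_tokens):
        if used + f.l0_tokens <= budget:
            plan[f.path] = "L0"
            used += f.l0_tokens
    # Pass 2: spend the remaining budget on upgrading hotspots to full IR.
    for f in files:
        extra = f.l1_tokens - f.l0_tokens
        if f.hotspot and plan.get(f.path) == "L0" and used + extra <= budget:
            plan[f.path] = "L1"
            used += extra
    return plan, used


files = [
    FileCost("src/auth/login.ts", 10, 85, hotspot=True),
    FileCost("src/utils/helpers.ts", 10, 85, hotspot=False),
    FileCost("src/big.ts", 10, 500, hotspot=True),
]
plan, used = pack(files, budget=120)
print(plan, used)
```

With a budget of 120, `src/auth/login.ts` is upgraded to L1, while `src/big.ts` stays at L0 because its full IR would not fit; 105 tokens are used in total. A real packer would presumably rank hotspots by importance rather than by input order.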

### Core: token-efficient structural context

Composto's backbone is a tree-sitter-based AST compressor and a smart context packer. Compress any file to IR, or pack a whole directory into a token budget:

```
composto ir src/app.ts                # compress a file to IR (L0/L1/L2/L3)
composto context src/ --budget 2000   # smart context within a token budget
composto benchmark .                  # see compression stats
```

See [IR Layers](#ir-layers), [Health-Aware IR](#health-aware-ir), and [Context Budget](#context-budget) below for details.

### Advanced: causal history as advisory context

Composto also indexes your git history and surfaces the changes and failures that have historically accompanied the file you are editing. This is advisory context for the agent to weigh, not a hard gate. See [Causal Context](#causal-context) below.

### MCP plugin

The MCP server ships inside `composto-ai`. Install the package globally first, then register the server with your client:

```
npm install -g composto-ai
```

**Claude Code:**

```
claude mcp add composto -- composto-mcp
```

**Cursor:** add to `~/.cursor/mcp.json` (or the project-local `.cursor/mcp.json`):

```
{
  "mcpServers": {
    "composto": {
      "command": "composto-mcp"
    }
  }
}
```

Then restart Cursor and verify under Settings → MCP that `composto` shows green.

**Claude Desktop:** add the same block to `~/Library/Application Support/Claude/claude_desktop_config.json`.

Composto adds 5 tools to your AI assistant: `composto_ir`, `composto_benchmark`, `composto_context`, `composto_scan`, and `composto_blastradius` (the last one is gated behind the `COMPOSTO_BLASTRADIUS=1` flag during the beta).

#### Cursor: one-command setup

Registering the MCP server only *exposes* the tools; Cursor's agent usually defaults to its built-in `read_file` / `codebase_search`. To configure both the MCP server **and** a project rule that tells the agent when to call Composto, run:

```
cd your-project
composto init
```

This writes `.cursor/mcp.json` (project-local MCP registration) and `.cursor/rules/composto.mdc` (an `alwaysApply: true` rule that is injected into every chat). Existing files are merged, never overwritten. Restart Cursor and check under Settings → MCP that `composto` shows green.

Without the rule, the hit rate is about 30-50%; with it, about 85-95%. The rule template is inlined in [`src/cli/init.ts`](src/cli/init.ts) (`CURSOR_RULES_MDC`); you can open the generated `.cursor/rules/composto.mdc` to customize it per project.

#### Hook enforcement (v0.6.0+)

Instead of asking the agent to remember to call `composto_blastradius`, install a hook so that it is consulted **automatically before** every Edit / Write / MultiEdit. When the verdict is `medium` or `high`, the agent receives a context block inline. You do nothing; warnings appear only where they are needed.

```
cd your-project
composto init --client=claude-code   # or cursor, or gemini-cli
```

This writes:

- The platform's MCP config (same as before).
- A **`PreToolUse` hook** (for Claude Code / Gemini CLI) that fires `composto hook pretooluse` on every file-targeted tool call. The hook extracts the target file, runs `composto_blastradius`, and injects the verdict as `additionalContext`. A `low` verdict passes through silently, with no noise.
- For Cursor: a `.cursor/hooks.json` entry that **denies** the tool call on `verdict: high`. Per [forum #155689](https://forum.cursor.com/t/...), Cursor drops `additional_context`, so the strategy is hybrid: the existing `.cursor/rules/composto.mdc` rule handles `medium`/`low`, and the hook only interrupts on `high`.

Existing settings are merged, never overwritten. Re-running `composto init` is idempotent: it produces no duplicate hook entries.

**Watch it run:**

```
composto stats             # hook invocations, verdict distribution, p50/p95 latency
composto stats --json      # machine-readable
composto stats --disable   # opt out (writes .composto/telemetry-disabled marker)
```

Telemetry **stays entirely local**: it is written to `.composto/memory.db` in your repo, and nothing leaves your machine. No user ID, no cloud sync, no account.

**Platform support matrix:**

| Platform | MCP | Hook | Strategy |
|---|:---:|:---:|---|
| Claude Code | ✅ | ✅ `PreToolUse` | Attaches `additionalContext` on `medium`\|`high`\|`unknown`; passes through on `low` |
| Cursor | ✅ | ✅ `preToolUse` | Denies on `high` via `permissionDecision`; `medium`/`low` are handled by the `.mdc` rule |
| Gemini CLI | ✅ | ✅ `BeforeTool` | Attaches `additionalContext` on `medium`\|`high`\|`unknown` |
| Claude Desktop | ✅ | n/a | MCP only (no hook API yet) |

## How it works

Composto parses your code into an AST using [tree-sitter](https://tree-sitter.github.io/), then walks every node and classifies it:

| Tier | Action | What | % of nodes |
|------|--------|------|-----------|
| **Tier 1** | Keep | imports, functions, classes, interfaces, types, enums | 0.8% |
| **Tier 2** | Summarize | if, for, while, switch, return, throw, try/catch | 0.9% |
| **Tier 3** | Compress | variable declarations → one-liner, await → kept | 6.9% |
| **Tier 4** | Drop | string contents, operators, punctuation, comments | **86.6%** |

86.6% of the AST nodes in your code are noise. Composto drops them.

## Commands

```
# Shareable scorecard: AI context cost + risk hotspots + a README badge
composto score .              # add --json to pipe into scripts/agents

# Benchmark token savings across a project
composto benchmark .

# Run the compression proxy: point your LLM client's base URL at it
composto proxy --port 8787    # swaps raw code blocks for IR in-flight (BYOK)

# Generate IR at different detail levels
composto ir <file> L0         # Structure map (~10 tokens): just names
composto ir <file> L1         # Full IR: compressed code + health signals
composto ir <file> L2         # Delta context: only what changed
composto ir <file> L3         # Raw source: original code

# Smart context packing within a token budget
composto context <path> --budget <tokens>
# Fits maximum information into your budget:
# hotspot files get L1 (detailed), the rest get L0 (structure)

# Scan for security issues and debug artifacts
composto scan .

# Analyze git history for health trends
composto trends .

# Compare LLM quality: raw code vs IR (requires ANTHROPIC_API_KEY)
composto benchmark-quality

# Historical blast radius: beta, gated behind COMPOSTO_BLASTRADIUS=1
composto index                # bootstrap .composto/memory.db from git history
composto impact <file>        # risk verdict + signals for a file
composto index --status       # diagnostics: schema, freshness, calibration
```

## Causal Context

Beyond compression, Composto indexes your repo's git history into a local SQLite graph database and surfaces what the current code cannot tell you: *"Has this area been reverted? Does it have a fix cluster? Which files have historically changed and failed together with it?"* No LLM can infer this context from the file alone. It is provided as **advisory context for the agent to weigh**, not as a hard gate.

Signals per query: `revert_match`, `hotspot`, `fix_ratio`, `author_churn`, `cochange`. When confidence is low, the tool returns `unknown` rather than guessing.

```
$ composto impact src/auth/login.ts

revert_match  ■■■■■■■■■■  this file was touched by a Revert commit
cochange      ■■■■■       historically co-changed with session.ts, token.ts in fixes
hotspot       ■           14 changes in the last 90 days
```

**Honestly, where we stand.** Time-travel backtests on 4 repos (fastify, express, got, flask; each rewound to a pre-fix snapshot) show that the causal layer is a **high-recall, advisory-grade** signal: on mature repos it recovers 67-80% of the files that the fix actually touched. Precision is moderate (~0.55). The signals point you at *candidates* rather than vouching for them, which is exactly why Composto surfaces them as context for the agent's judgment rather than as a blocking verdict. Recall rises with the amount of git history, so the value grows as your repo matures (young repos get very little until they accumulate fix history).

The honest framing: **causal context is a high-recall memory layer the agent consults before editing** ("these files have historically broken together"), not a precision gate. The compression core runs unconditionally; the causal layer adds repo-specific memory on top.

Available as a CLI (`composto impact`, `composto index`) and as an MCP tool (`composto_blastradius`, gated behind `COMPOSTO_BLASTRADIUS=1` during the beta).

## Quality validation

We tested 4 files ranging from simple to hard, asking the same question of the raw code and of the IR: "What does this file do?"

| File | Complexity | Raw tokens | IR tokens | Savings | Comprehension |
|------|-----------|-----------|----------|---------|--------------|
| hotspot.ts | Simple | 299 | 77 | 74.2% | Full |
| layers.ts | Medium | 765 | 249 | 67.5% | Full |
| detector.ts | Medium | 704 | 160 | 77.3% | Full |
| ast-walker.ts | **Hard (448 lines)** | 3,782 | 663 | 82.5% | ~90% |

Even for a 448-line recursive AST walker with nested switches, the LLM can explain the architecture, all 12 functions, and the data flow from the IR alone.

**What the IR keeps:** function signatures, parameter types, imports, control flow, return values, class/interface declarations.

**What the IR drops:** string contents, regular expressions, operator details, formatting; in other words, things the LLM already knows.

Full benchmark: [docs/benchmark-proof.md](docs/benchmark-proof.md)

## IR Layers

| Layer | Tokens | Use case |
|-------|--------|----------|
| **L0** | ~10 | "What's in this file?" Function/class names only |
| **L1** | ~85 | "What does this file do?" Compressed code + health signals |
| **L2** | ~65 | "What changed?" Git diff with context |
| **L3** | Variable | "Show me the exact code." Raw source |

### When to use which layer

```
"Explain the architecture"       → L1 for all files
"Fix this bug"                   → L3 for target file, L1 for context
"Review this PR"                 → L2 for changed files, L1 for context
"What files are in this repo?"   → L0 for everything
```

## Health-Aware IR

Composto analyzes git history and embeds health signals directly into the IR:

```
FN:handleAuth({credentials}) [HOT:15/30 FIX:73% COV:↓ INCON]
  IF:!session → RET 401
  RET { token, expiresAt }
```

- `[HOT:15/30]`: changed in 15 of the last 30 commits (hotspot)
- `[FIX:73%]`: 73% of the changes are bug fixes
- `[COV:↓]`: test coverage is declining
- `[INCON]`: inconsistent patterns caused by multiple authors

Only unhealthy code gets annotated. Healthy files stay clean.

## Context Budget

No need to guess which files to send. Let Composto decide:

```
composto context src/ --budget 2000
```

Output:

```
== L1 (detailed) ==
[hotspot] src/auth/login.ts
USE:[./types.js, ./session.js]
OUT ASYNC FN:login(credentials)
  TRY
    IF:!valid → THROW:AuthError
    RET { token, user }

== L0 (structure) ==
src/utils/helpers.ts
  FN:formatDate L5
  FN:parseQuery L23
...

Budget: 1994/2000 tokens
Files: 9 at L1, 16 at L0
```

Hotspot files get full detail. Everything else keeps its structure only. The output never exceeds the budget.

## Stats

```
L1 compression:   ~81% fewer tokens (full IR, structure preserved)
L0 compression:   ~97% fewer tokens (structure map)
Token counts:     verified against a real BPE tokenizer, not estimates
AST engine:       AST-parsed, 0 regex fallback
Languages:        TypeScript, JavaScript, Python, Go, Rust
Causal layer:     high-recall advisory (0.67-0.80 recall on mature repos,
                  time-travel backtest across 4 public repos); precision ~0.55,
                  surfaced as context not a gate.
```

## Configuration

Optional `.composto/config.yaml`:

```
watchers:
  security:
    enabled: true
    severity:
      "src/**": warning
      "tests/**": info
  consoleLog:
    enabled: true
trends:
  enabled: true
  hotspotThreshold: 10
  bugFixRatioThreshold: 0.5
```

All settings have sensible defaults. The config file is optional.

## Contributing

```
git clone https://github.com/mertcanaltin/composto
cd composto
pnpm install
pnpm test                  # 145 tests
pnpm build                 # builds to dist/
npx composto benchmark .   # see compression stats
```

## License

MIT

Tags: LLM-assisted, MITM proxy, SOC Prime, context optimization, artificial intelligence, code compression, developer tools, user-mode hook bypass, automated attacks