ll1r1k-1337/llm-tool-capability

GitHub: ll1r1k-1337/llm-tool-capability

Provides OpenAI-compatible tool calling for LLMs that do not natively support function calling, with a proxy mode, a client wrapper, and an automatic agent loop.


# llm-tool-capability

**Drop-in, OpenAI-compatible tool calling** for LLMs that do not natively support function calling. It injects your tools into the prompt, parses the model's text reply back into OpenAI-shaped `tool_calls`, and can run the whole agentic loop for you, with streaming support. It works with any OpenAI-compatible endpoint: **Ollama, vLLM, LM Studio, llama.cpp**, text-generation-webui, and others.

```
npm install llm-tool-capability
```

`openai` is an optional peer dependency. Install it if you want to wrap the real OpenAI client (you can also pass any OpenAI-compatible client object).

## Why

Many open-source models follow instructions very well but **do not expose a `tools` parameter**: the server either rejects it or silently ignores it. This package makes `tools` / `tool_choice` work anyway by:

1. Rendering your tool schemas and a calling contract into the system prompt.
2. Asking the model to emit calls as tagged ` ```tool_call ` JSON blocks.
3. Parsing those blocks back into the **exact** OpenAI `message.tool_calls` shape (`{ id, type: "function", function: { name, arguments } }`, where `arguments` is a JSON **string**).

You can use it in **three ways**: as a zero-code **proxy** (run a server and point your OpenAI client at it), as a **drop-in client** (wrap your client in code), or as an **agentic runner** (it runs the tool loop for you).

## Proxy mode (no code changes)

Run a local OpenAI-compatible proxy in front of your model that lacks tool support. Any OpenAI client only needs its `baseURL` pointed at the proxy; nothing else changes.

```
npx llm-tool-proxy --upstream http://localhost:11434/v1 --port 8787
# llm-tool-proxy listening on http://127.0.0.1:8787/v1
#   → upstream: http://localhost:11434/v1
```

Now point any OpenAI client at it and pass `tools` as usual:

```
import OpenAI from "openai";

const client = new OpenAI({ baseURL: "http://localhost:8787/v1", apiKey: "unused" });

const res = await client.chat.completions.create({
  model: "qwen2.5:7b",
  messages: [{ role: "user", content: "What's the weather in Paris?" }],
  tools: [/* … */], // ← works, even though the model has no native tools
});
// res.choices[0].message.tool_calls → populated, OpenAI shape (streaming too)
```

The proxy forwards everything to the upstream, injects the tool contract, parses tool calls back out, and streams over SSE in the same wire format as OpenAI.

**CLI flags:** `--upstream` (required), `--upstream-key`, `--port`, `--host`, `--api-key` (require clients to present a bearer token), `--base-path`, `--tag`, `--no-examples`, `--system-injection merge|prepend`, `--cors` (enable wildcard CORS; off by default), `--max-body-size` (default 10 MiB), `--log-file` (append a JSON-lines debug log of client requests, transformed upstream requests, and responses; this is verbose, and bodies are logged but headers and tokens never are), `--max-log-size` (cap the log file size; default 100 MiB). The main flags have matching environment variables (`UPSTREAM_BASE_URL`, `PORT`, `PROXY_API_KEY`, `PROXY_LOG_FILE`, …).

To embed the proxy in your own server instead of using the CLI:

```
import { createProxyServer } from "llm-tool-capability/proxy";

createProxyServer({
  upstreamBaseURL: "http://localhost:11434/v1",
  apiKey: process.env.PROXY_API_KEY, // optional client auth
}).listen(8787);
```

**Endpoints:** `POST /v1/chat/completions` (with tool support) and `GET /health`. Every other route under the base path (`/v1/completions`, `/v1/embeddings`, `/v1/models`, and so on) is **passed through transparently** to the upstream. No tool injection happens there, because those endpoints have no tools. The proxy is therefore a complete drop-in replacement, not just a chat endpoint.

## Layer A: drop-in client

`wrapToolSupport(client)` returns a client whose `chat.completions.create` is a drop-in replacement for the OpenAI method. Pass `tools` as usual and read `tool_calls` from the response as usual. When you do not pass `tools`, the wrapper is fully transparent.

```
import OpenAI from "openai";
import { wrapToolSupport } from "llm-tool-capability";

const openai = new OpenAI({ baseURL: "http://localhost:11434/v1", apiKey: "ollama" });
const client = wrapToolSupport(openai);

const res = await client.chat.completions.create({
  model: "llama3.1",
  messages: [{ role: "user", content: "What's the weather in Paris?" }],
  tools: [
    {
      type: "function",
      function: {
        name: "get_weather",
        description: "Get the current weather for a city.",
        parameters: {
          type: "object",
          properties: { city: { type: "string" } },
          required: ["city"],
        },
      },
    },
  ],
});

const toolCalls = res.choices[0].message.tool_calls;
// [{ id: "call_…", type: "function",
//    function: { name: "get_weather", arguments: '{"city":"Paris"}' } }]
```

You drive the loop yourself: execute the call, append a `role: "tool"` message (with its `tool_call_id`), and call again. The wrapper automatically rewrites these native tool roles back into the prompt contract, so an ordinary OpenAI tool-calling loop works unchanged.

### Streaming (Layer A)

```
const stream = await client.chat.completions.create({
  model: "llama3.1",
  messages,
  tools,
  stream: true,
});

for await (const chunk of stream) {
  const delta = chunk.choices[0]?.delta;
  if (delta?.content) process.stdout.write(delta.content); // prose, token by token
  if (delta?.tool_calls) handleToolCallDelta(delta.tool_calls); // OpenAI chunk deltas
}
```

Prose content streams token by token. Each tool call is emitted **atomically** when its block closes (the complete `arguments` arrive in a single delta), which avoids partial or invalid JSON mid-stream. Accumulate by `index`, just as you would with OpenAI.

## Layer B: agentic runner

`createToolRunner` runs the loop for you: request → parse → run handler → feed back the result → repeat, until the model answers without calling a tool.

```
import OpenAI from "openai";
import { createToolRunner, defineTool } from "llm-tool-capability";

const openai = new OpenAI({ baseURL: "http://localhost:11434/v1", apiKey: "ollama" });

const runner = createToolRunner(openai, {
  tools: [
    defineTool({
      name: "get_weather",
      description: "Get the current weather for a city.",
      parameters: {
        type: "object",
        properties: { city: { type: "string" } },
        required: ["city"],
      },
      handler: async ({ city }) => {
        // Encode the city so spaces and special characters survive the query string.
        const r = await fetch(`https://api.example.com/weather?city=${encodeURIComponent(city)}`);
        return r.json();
      },
    }),
  ],
  maxIterations: 8,
});

const result = await runner.run({
  model: "llama3.1",
  messages: [{ role: "user", content: "Is it raining in Paris?" }],
});

console.log(result.content); // final answer
console.log(result.toolExecutions); // every tool call + result, in order
console.log(result.messages); // full transcript
```

Pass the **raw** client to `createToolRunner`; it wraps the client internally.

### Streaming events (Layer B)

```
for await (const ev of runner.runStream({ model: "llama3.1", messages })) {
  switch (ev.type) {
    case "text":
      process.stdout.write(ev.delta);
      break;
    case "tool_call":
      console.log("→ calling", ev.toolCall.function.name);
      break;
    case "tool_result":
      console.log("← result", ev.execution.content);
      break;
    case "final":
      console.log("\ndone:", ev.content);
      break;
  }
}
```

### Error feedback

Unknown tools, arguments that fail JSON Schema validation, malformed JSON, and handlers that throw are **not** fatal: the error is fed back to the model as the tool result so that it can correct itself on the next turn. Every error is recorded in `result.toolExecutions[i]` and flagged with `isError: true`.

## How it works

| Concern | Behavior |
| --- | --- |
| Call format | A ` ```tool_call ` block containing `{"name", "arguments"}` (arguments is a JSON object). The tag is configurable. |
| Multiple calls | Several consecutive blocks, or an array inside one block. |
| Malformed JSON | Light repair (trailing commas, comments); falls back to the raw string. |
| Lenient parsing | If no tagged block is found, accepts ` ```json ` or untagged blocks that look like a call (toggle with `lenientFences`). |
| History | Native `assistant.tool_calls` and `role: "tool"` messages are automatically flattened back into the contract. |
| Validation | Arguments are validated with `ajv` against each tool's JSON Schema (toggle with `validate`). |
| `tool_choice` | `auto` (default), `required`, `{ function: { name } }`, and `none` are enforced through prompt instructions. |
| Loop safety | `maxIterations` cap (default 10); returns `finishReason: "max_iterations"`, or throws if `throwOnMaxIterations` is set. |

## Options

`wrapToolSupport(client, options)` and `createToolRunner(client, options)` share these options:

- `toolCallTag` / `toolResultTag`: the fence tags (default `tool_call` / `tool_result`).
- `includeExamples`: include a few-shot example (default `true`; weaker models benefit from it).
- `template`: fully customize the instruction block.
- `systemInjection`: `"merge"` (append to the existing system message, the default) or `"prepend"`.
- `lenientFences`: accept ` ```json ` or untagged look-alikes (default `true`).
- `generateId`: custom tool-call id generator.

Runner-only options: `tools`, `maxIterations`, `validate`, `throwOnMaxIterations`, `onToolCall`, `onToolResult`.

## Building blocks

The internals are exported for building custom pipelines: `buildToolPrompt`, `parseToolCalls`, `ToolCallStreamParser`, `ToolValidator`, `flattenMessages`, `extractFencedBlocks`, `tryParseJson`.

## Limitations

- **Tool-call arguments stream atomically**, not token by token (prose content does stream token by token). This is a deliberate trade-off in favor of robustness.
- Quality depends on how well the model follows instructions. Smaller models do better with `includeExamples: true` and a short, clear tool list.
- Only `function` tools are supported (matching OpenAI's function tools); the newer `custom` tools are out of scope.
- **Streaming handles only the first choice.** `n > 1` works for non-streaming requests (each choice is parsed independently) but not for streaming. This is fine in practice, because OpenAI disallows combining `n > 1` with `tools`.

## License

MIT
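
## Appendix: parsing sketch

To make the parse-back step more concrete (tagged blocks in the model's text become OpenAI-shaped `tool_calls` whose `arguments` is a JSON string), here is a minimal, self-contained sketch. It is not the library's `parseToolCalls`: the function name `parseToolCallBlocks` and the `call_N` id scheme are hypothetical, invented for this illustration, and the sketch deliberately omits JSON repair, lenient fences, and schema validation.

```typescript
// Illustrative only: a simplified stand-in, NOT the library's implementation.
interface ToolCall {
  id: string;
  type: "function";
  function: { name: string; arguments: string };
}

// Hypothetical helper. Assumes `tag` contains no regex metacharacters.
function parseToolCallBlocks(text: string, tag: string = "tool_call"): ToolCall[] {
  const fence = "`".repeat(3);
  // Capture the body of every fenced block opened with the tag.
  const pattern = new RegExp(fence + tag + "[ \\t]*\\r?\\n([\\s\\S]*?)" + fence, "g");
  const calls: ToolCall[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(text)) !== null) {
    let parsed: unknown;
    try {
      parsed = JSON.parse(match[1]);
    } catch (_err) {
      continue; // this sketch simply skips malformed JSON
    }
    // A block may hold a single call or an array of calls.
    const items: unknown[] = Array.isArray(parsed) ? parsed : [parsed];
    for (const item of items) {
      if (typeof item !== "object" || item === null) continue;
      const { name, arguments: args } = item as { name?: unknown; arguments?: unknown };
      if (typeof name !== "string") continue;
      calls.push({
        id: "call_" + calls.length, // hypothetical id scheme
        type: "function",
        // OpenAI expects `arguments` as a JSON string, not an object.
        function: { name, arguments: JSON.stringify(args ?? {}) },
      });
    }
  }
  return calls;
}

// Usage: a model reply containing prose followed by one tagged block.
const tick3 = "`".repeat(3);
const reply =
  "Let me check.\n" + tick3 + "tool_call\n" +
  '{"name": "get_weather", "arguments": {"city": "Paris"}}\n' + tick3;
const calls = parseToolCallBlocks(reply);
console.log(calls[0].function.arguments); // {"city":"Paris"}
```

Serializing `arguments` back to a string is what lets existing OpenAI client code call `JSON.parse` on it without changes.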
