studiomeyer-io/skilldoctor

GitHub: studiomeyer-io/skilldoctor

一款专门针对 AI agent 技能文件（SKILL.md / AGENTS.md / 子 agent）进行格式校验与安全扫描的 linter 工具，旨在防范 agent 技能供应链中的 prompt 注入与数据窃取风险。

Stars: 0 | Forks: 0

# skilldoctor **一个针对 AI-agent 技能与指令文件的 linter 和安全扫描工具。** 可以把它想象成 `eslint`，只不过它是用来检查如今像软件包一样被 agent 安装的 `SKILL.md`、`AGENTS.md` 以及子 agent 文件的。 [![CI](https://static.pigsec.cn/wp-content/uploads/repos/2026/06/2dc498b7fb200649.svg)](https://github.com/studiomeyer-io/skilldoctor/actions/workflows/ci.yml) [![npm](https://img.shields.io/npm/v/skilldoctor.svg)](https://www.npmjs.com/package/skilldoctor) [![license](https://img.shields.io/npm/l/skilldoctor.svg)](./LICENSE) ``` npx @studiomeyer-io/skilldoctor check .claude/skills ``` ``` my-skill/SKILL.md [skill] F (0/100) 2:1 ✖ error `name` "My_Skill" is invalid. Use 1-64 lowercase chars … skill/invalid-name 6:1 ✖ error Contains "ignore previous instructions"-style injection. sec/prompt-injection 7:11 ✖ error Outbound network call near secret/env values — possible … sec/data-exfiltration ``` ## 为什么需要这个工具到了 2026 年，“技能”已经爆发式地成为扩展编程 agent 的主流方式。Claude Code 会读取 `SKILL.md` 文件；**`AGENTS.md`** 已经成为了一种跨工具的通用规范（被 Cursor、Codex、Gemini CLI、Copilot 以及[其他数十种 agent](https://agentskills.io/clients) 所采用）；子 agent 则携带着它们自己的 YAML frontmatter。[Agent Skills](https://agentskills.io) 格式现在已经成为了一项开放标准。这些文件**像软件包一样被共享和安装** —— 从 gist 复制、从 repo 克隆、或是从 marketplace 拉取。这在缺乏现成工具支持的情况下造成了两个空缺： 1. **没有 linter。** 没有任何工具来验证 frontmatter、捕捉过于宽泛的 `tools:` 授权，或是标记出会导致技能对 agent 不可见的缺失或模糊的 `description`。 2. **没有安全扫描。** 技能的*主体本身就是一段 agent 会去遵循的 prompt*。一个恶意的或是不严谨的技能可能会在看似有用的工作流中隐藏 prompt 注入文本、形如 `curl … $(env)` 的数据窃取指令，或是“禁用审批提示”的命令。这是一个真实的供应链攻击面 —— NVIDIA 就为此类风险专门发布了研究扫描器（"SkillSpector"），而且更广泛的 prompt 注入 / agent 供应链问题也已经有了详尽的记录（例如 [Simon Willison 谈 prompt 注入](https://simonwillison.net/series/prompt-injection/)，[OWASP LLM Top 10](https://genai.owasp.org/)）。用来**同步/安装**技能的工具已经存在。但针对这些技能文件的 **linter + 安全扫描器**却一直缺席。`skilldoctor` 就是为此而生的工具。 ## 安装 ``` # 一次性 npx @studiomeyer-io/skilldoctor check # 或添加到 project npm install --save-dev @studiomeyer-io/skilldoctor # （安装的命令是 `skilldoctor`） ``` 要求 Node.js ≥ 20。默认仅使用启发式算法 —— **无需 API key**。 ## 用法 ``` skilldoctor check [options] ``` `` 可以是一个文件、一个目录（递归扫描 `SKILL.md`、`AGENTS.md` 和 `agents/*.md`），或者是一个 glob 匹配模式，比如 `"**/SKILL.md"`。 | 选项 | 描述 | | --- | --- | | `--json ` | 输出机器可读的 JSON 报告。 | | `--sarif ` | 输出 SARIF 2.1.0 报告（用于 GitHub code scanning）。 | | `--fix` | 就地应用机械修复。**仅限 frontmatter —— 绝不修改主体。** | | `--fail-on ` | 如果有任何发现达到或超过 `error` \| `warning` \| `info`，则以非零状态退出。默认为 `error`。 | | `--no-color` | 禁用 ANSI 颜色（同时也会读取并遵循 `NO_COLOR` 环境变量）。 | | `--quiet` | 抑制终端报告输出（仍会写入 `--json`/`--sarif`）。 | | `-h, --help` / `-v, --version` | 帮助 / 版本信息。 | **退出码：** `0` 正常（相对于 `--fail-on` 而言），`1` 存在达到或超过阈值的发现，`2` 用法错误 / 未找到文件。 ``` skilldoctor check .claude/skills --fail-on warning skilldoctor check "**/SKILL.md" --sarif results.sarif skilldoctor check AGENTS.md --json report.json skilldoctor check ./skills --fix ``` ## 检查内容 skilldoctor 会严格根据 **基础 [Agent Skills 规范](https://agentskills.io/specification)** 进行验证，**识别**官方文档中记录的 Claude Code 扩展字段（因此它们绝不会被错误标记），并以**宽容**的态度对待真正未知的字段（标记为 `info`，而不是 error）—— 因为客户端可以自由添加它们自己的元数据，而工具不应凭空捏造规则。它支持理解三种文件类型： - **`SKILL.md`** —— Agent Skills / Claude Code 技能（包含 frontmatter `name` + `description`，可选 `allowed-tools` 等）。 - **subagent** —— 位于 `agents/` 目录中的 `.md` 文件（包含 frontmatter `name`、`description`、`tools`、`model` 等）。 - **`AGENTS.md`** —— 纯 Markdown 文件，不需要 frontmatter；仅应用内容和安全检查。 ### Lint 规则 (17) | 规则 | 默认严重程度 | 是否可自动修复 | 检查内容 | | --- | --- | --- | --- | | `skill/missing-name` | error | 否 | 必须包含 `name`。 | | `skill/invalid-name` | error | 否 | `name` 必须是 1-64 个小写字符（`a-z 0-9 -`），不得包含前导/后置/连续连字符。 | | `skill/name-dir-mismatch` | warning | 否 | `name` 必须与父目录匹配（依据规范）。 | | `skill/missing-description` | error | 是 | 必须包含 `description`（这是 agent 决定何时加载该技能的依据）。 | | `skill/empty-description` | error | 是 | `description` 为空。 | | `skill/description-too-short` | warning | 否 | 过短，无法清楚传达内容/触发条件。 | | `skill/description-too-long` | warning | 否 | 超过规范中 1024 字符的限制。 | | `skill/vague-description` | info | 否 | 措辞过于笼统，缺乏触发关键词。 | | `skill/empty-body` | warning | 否 | 指令主体为空。 | | `skill/frontmatter-schema` | error | 否 | YAML 无法解析 / 不是映射 / 字段类型错误。 | | `skill/unknown-field` | info | 否 | 字段不在规范或已知扩展中（宽容处理）。 | | `skill/duplicate-key` | warning | 否 | 某个 frontmatter key 出现了两次（YAML 会保留最后一个）。 | | `skill/trailing-whitespace` | info | 是 | frontmatter 中存在行尾空格。 | | `skill/duplicate-name` | error | 否 | 文件集中有两个文件声明了相同的 `name`。 | | `tools/wildcard-grant` | warning | 否 | 赤裸的 `*` / `all` 授权 —— 违反了最小权限原则。 | | `tools/over-broad-for-readonly` | warning | 否 | 描述声明为只读，却授予了写入/执行/网络工具。 | | `tools/duplicate-tool` | info | 是 | 列出了两次相同的工具。 | ### 安全扫描规则 (8) 对被视作不受信任输入的 **description + body** 执行扫描： | 规则 | 默认严重程度 | 是否可自动修复 | 检测内容 | | --- | --- | --- | --- | | `sec/prompt-injection` | error | 否 | "ignore previous instructions", "disregard your system prompt", 角色覆盖/越狱人设，注入的 "new instructions:"。 | | `sec/disable-safety` | error | 否 | 禁用安全防护/守卫/hook/审批的指令，或 `--dangerously-skip-permissions`。 | | `sec/data-exfiltration` | error | 否 | 在靠近 secrets/env 的位置存在对外请求（curl/POST/fetch 到外部 URL）—— 典型的数据窃取特征。 | | `sec/env-base64` | warning | 否 | 对 `env`/secrets 进行 base64/encode（隐蔽式数据窃取的先兆）。 | | `sec/secret-access` | warning | 否 | 读取 `~/.ssh`、`.aws/credentials`、`.env`、已知的 secret 环境变量等。 | | `sec/suspicious-tool-combo` | warning | 否 | 标榜为“只读/文档”的技能却授予了 **Bash + 网络**权限 —— 构成了可进行数据窃取的致命组合。 | | `sec/destructive-command` | warning | 否 | `rm -rf /`、`curl … \| sh`、`git push --force`、递归 `chmod 777`。 | | `sec/hidden-unicode` | warning | 否 | 零宽 / 双向控制字符，这些字符会对人工审查者隐藏文本（木马源攻击风格）。 | 所有的模式匹配均基于正则表达式/启发式算法，**确保 ReDoS 安全**（已锚定、设置了有限的匹配窗口 —— 不会发生灾难性回溯；我们包含了一个测试用例，会向扫描器抛入 100 KB 的对抗性输入，并断言其能在远低于一秒的时间内完成），并且**绝对不会执行任何操作**。 ## 评分每个文件都会获得一个 `0-100` 的分数以及对应的 `A`–`F` 等级。系统会根据发现的问题扣除分数，权重取决于类别和严重程度 —— **安全类问题的权重远高于 lint 类问题**，因此哪怕仅出现一次严重的安全命中，也无法让该文件保持及格分数。批量评级是基于所有文件分数的平均值，同时会向表现最差的文件倾斜，因此集合中哪怕存在一个危险的技能，也无法通过平均分被掩盖过去。 ## CI：GitHub Action 将 [`examples/skilldoctor.yml`](examples/skilldoctor.yml) 复制到 `.github/workflows/` 目录下，即可在每次 push 时对你的技能进行 lint 检查，并将发现的问题上传至 GitHub code scanning： ``` name: skilldoctor on: [push, pull_request] permissions: contents: read security-events: write # required to upload SARIF jobs: lint-skills: runs-on: ubuntu-latest steps: - uses: actions/checkout@v4 - uses: actions/setup-node@v4 with: { node-version: 20 } - run: npx @studiomeyer-io/skilldoctor check ".claude/skills" "**/AGENTS.md" --sarif skilldoctor.sarif --fail-on warning - if: always() uses: github/codeql-action/upload-sarif@v3 with: sarif_file: skilldoctor.sarif ``` ## 库 API skilldoctor 同时提供了 ESM + CJS 的双模块支持，并附带 TypeScript 类型。 ``` import { analyzeContent, fixFile, parseFile, sarifString } from "skilldoctor"; const report = analyzeContent("my-skill/SKILL.md", contents); console.log(report.grade, report.score); for (const f of report.findings) { console.log(`${f.line}:${f.column} ${f.severity} ${f.ruleId} ${f.message}`); } // mechanical fixes (frontmatter only) const fixed = fixFile(parseFile("my-skill/SKILL.md", contents)); if (fixed.changed) writeFileSync("my-skill/SKILL.md", fixed.output); ``` 核心导出项包括：`analyzeContent`、`analyzeFiles`、`analyzePaths`、`fixFile`、`parseFile`、`discoverFiles`、`renderTerminal`、`toJsonReport`/`jsonString`、`toSarif`/`sarifString`、`RULES` 以及所有相关的 TypeScript 类型。 ## 资料来源（格式经过验证，而非凭空捏造） skilldoctor 的规则是基于当前实际的规范制定的（在开发时进行了核对，而非凭记忆）： - **Agent Skills 标准** —— [agentskills.io/specification](https://agentskills.io/specification)：`name`（≤64，符合 `^[a-z0-9]+(-[a-z0-9]+)*$`，必须与目录匹配）、`description`（1-1024，必填）、`license`、`compatibility`（≤500）、`metadata`、`allowed-tools`（以空格分隔，实验性功能）。 - **Claude Code 技能** —— [code.claude.com/docs/en/skills](https://code.claude.com/docs/en/skills)：支持识别扩展字段（`when_to_use`、`disable-model-invocation`、`user-invocable`、`disallowed-tools`、`model`、`effort`、`context`、`agent`、`paths`、`shell` 等）；在技能列表中，合并后的 `description`+`when_to_use` 会在达到 1,536 字符时被截断。 - **Claude Code 子 agent** —— [code.claude.com/docs/en/sub-agents](https://code.claude.com/docs/en/sub-agents)：`name`（必填，小写+连字符）+ `description`（必填）、`tools`（以逗号分隔或列表形式，省略则继承）、`model`（`sonnet`/`opus`/`haiku`/`fable`/full-id/`inherit`）。 - **AGENTS.md** —— [agents.md](https://agents.md)：“只是标准的 Markdown ……没有必填字段” —— 因此 skilldoctor 只会对这些文件进行内容/安全检查。当某个字段的含义不确定时，skilldoctor **会选择宽容地进行警告，而不是凭空制定硬性规则**。 ## StudioMeyer MCP 工具包的一部分这是一个专为构建和运维 MCP 服务器及 agent 打造的、专注于特定领域的生产级小型工具家族 —— 您可以根据需要自由组合使用： - [mcp-armor](https://github.com/studiomeyer-io/mcp-armor) —— 运行时防御 sidecar：扫描工具调用、验证已签名的 manifest、拦截已知的恶意 CVE - [mcp-gauntlet](https://github.com/studiomeyer-io/mcp-gauntlet) —— 部署前的 `mcp-fuzz`（具备 schema 感知能力的 fuzzer）+ `mcp-storm`（负载测试工具） - [mcp-otel](https://github.com/studiomeyer-io/mcp-otel) —— W3C Trace Context → OpenTelemetry 桥接器 - [m-cache-kit](https://github.com/studiomeyer-io/mcp-cache-kit) —— 防泄漏的 SEP-2549 缓存（`ttlMs` + `cacheScope`） - **skilldoctor** *(本项目)* —— 针对 agent 技能文件的 linter + 安全扫描器 ## 许可证 [MIT](./LICENSE) © 2026 StudioMeyer。有关安全策略和威胁模型的边界说明，请参阅 [SECURITY.md](./SECURITY.md)。

标签：Apache Flink, MITM代理, SARIF, 代码规范检查, 提示词注入检测, 暗色界面, 自动化攻击, 静态代码扫描