goldmembrane/cleaner-code

GitHub: goldmembrane/cleaner-code

A locally run AI code-security scanning MCP server that detects threats such as invisible Unicode, supply-chain attacks, and code obfuscation.

Stars: 0 | Forks: 0

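# CodeSafer (cleaner-code)

[![MCP server](https://img.shields.io/badge/mcp-server-blue)](https://modelcontextprotocol.io) [![license: ISC](https://img.shields.io/badge/license-ISC-green)](#license) [![Node](https://img.shields.io/badge/node-%3E%3D18-brightgreen)](https://nodejs.org)

**Website:** [codesafer.org](https://codesafer.org/) · **MCP Clients:** Claude Code, Cursor, VS Code + Copilot, Cline

## Why Use CodeSafer?

AI coding assistants generate code fast, but who checks that code for hidden security threats? Recent supply-chain attacks have shown that malicious code can hide in ways that human reviewers and traditional linters routinely miss:

- **Invisible Unicode characters** injected into identifiers (30+ variants)
- **Bidirectional text / Trojan Source** attacks, where display order differs from execution order (CVE-2021-42574)
- **Homoglyphs**: Cyrillic characters disguised as Latin ones (CVE-2021-42694)
- **Glassworm-style Unicode steganography** hiding payloads in whitespace
- **Rules file backdoors** planted in AI configuration files such as `.cursorrules` and `CLAUDE.md`
- **Homoglyph-substituted dependency names** in `package.json`
- **Obfuscation patterns**: `eval` with base64, reverse shells, packed payloads

CodeSafer scans for all of these threats before the code runs on your machine.

## How It Works

CodeSafer runs as a local MCP server. Your AI client (Claude Code, Cursor, etc.) calls its tools while reviewing or generating code, and the results come back inline.

**Hybrid detection:**

1. **8 static analysis scanners**: deterministic rules for known attack classes, with zero false positives on the patterns they cover.
2. **CodeBERT deep analysis**: a transformer model that classifies code chunks as malicious or benign and attaches a confidence score. It catches obfuscated or novel patterns that the static rules miss.

Nothing leaves your machine: the AI analysis runs locally against a tokenizer server.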
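The sketch below makes the Unicode-focused threats above concrete. It is a minimal TypeScript illustration and not CodeSafer's scanner code. The function `scanText`, the `Finding` shape and the small character sets are hypothetical, and a real scanner covers many more code points (the README cites 30+ invisible variants).

The sketch checks each line for three things:

- zero-width and other invisible characters
- the BiDi embedding, override and isolate controls used in Trojan Source attacks
- identifier-like words that mix Latin letters with Cyrillic or Greek ones

```typescript
// Hypothetical sketch, not CodeSafer's implementation: flags a small subset of
// invisible characters, BiDi controls, and mixed-script identifiers per line.

type Category = "invisible" | "bidi" | "homoglyph";

interface Finding {
  line: number;   // 1-based line number
  column: number; // 1-based column, counted in UTF-16 code units
  category: Category;
  detail: string; // code point (e.g. "U+200B") or the suspicious word
}

// Illustrative subset only. U+FEFF is also a legitimate byte-order mark at the
// start of a file; this sketch does not special-case it.
const INVISIBLE = new Set([0x200b, 0x200c, 0x200d, 0x2060, 0xfeff, 0x180e]);

// BiDi embedding, override, and isolate controls abused by Trojan Source (CVE-2021-42574).
const BIDI = new Set([
  0x202a, 0x202b, 0x202c, 0x202d, 0x202e, 0x2066, 0x2067, 0x2068, 0x2069,
]);

const hex = (cp: number): string =>
  "U+" + cp.toString(16).toUpperCase().padStart(4, "0");

function scanText(source: string): Finding[] {
  const findings: Finding[] = [];
  source.split("\n").forEach((text, i) => {
    const line = i + 1;

    // Pass 1: walk code points and flag invisible or BiDi control characters.
    let col = 0;
    for (const ch of text) {
      const cp = ch.codePointAt(0)!;
      if (INVISIBLE.has(cp)) {
        findings.push({ line, column: col + 1, category: "invisible", detail: hex(cp) });
      } else if (BIDI.has(cp)) {
        findings.push({ line, column: col + 1, category: "bidi", detail: hex(cp) });
      }
      col += ch.length; // astral code points occupy two UTF-16 units
    }

    // Pass 2: words mixing Latin with Cyrillic or Greek letters (CVE-2021-42694).
    for (const m of text.matchAll(/[\p{L}\p{N}_$]+/gu)) {
      const word = m[0];
      const hasLatin = /\p{Script=Latin}/u.test(word);
      const hasConfusable = /[\p{Script=Cyrillic}\p{Script=Greek}]/u.test(word);
      if (hasLatin && hasConfusable) {
        findings.push({ line, column: (m.index ?? 0) + 1, category: "homoglyph", detail: word });
      }
    }
  });
  return findings;
}
```

For example, `scanText("const p\u0430ssword = '';")` reports a single `homoglyph` finding at line 1, column 7, because `\u0430` is the Cyrillic small letter a. The sketch checks whole words for mixed scripts rather than flagging every non-Latin letter, so identifiers or strings written entirely in another script do not trigger on their own.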
## Features

| Capability | Details |
|---|---|
| Invisible character detection | 30+ Unicode variants, including Zero-Width Space and Mongolian Vowel Separator |
| BiDi / Trojan Source | Full CVE-2021-42574 coverage |
| Homoglyph detection | Cyrillic/Greek/Latin confusables (CVE-2021-42694) |
| Unicode steganography | Glassworm-style whitespace payloads |
| Rules file backdoors | Scans `.cursorrules`, `CLAUDE.md`, `.claude/`, Cursor rules |
| Dependency scanning | Typosquatting and suspicious install scripts in `package.json` |
| Obfuscation detection | `eval` + base64, reverse shells, packed payloads |
| AI deep analysis | CodeBERT transformer classifier with confidence scores |
| MCP native | 6 MCP tools, stdio transport |
| Local-first | No code uploaded; runs entirely on your machine |

## MCP Tools

CodeSafer exposes six tools to your MCP client:

| Tool | Purpose |
|---|---|
| `scan_file` | Scan a single file for hidden malicious code patterns |
| `scan_directory` | Recursively scan all source files in a directory |
| `scan_rules_file` | Scan an AI configuration/rules file for prompt injection and Rules File Backdoor attacks |
| `check_dependencies` | Check `package.json` for typosquatting, suspicious install scripts, and other dependency risks |
| `ai_analyze` | Deep AI analysis with the trained CodeBERT model (classifies chunks as malicious/benign with a confidence score) |
| `explain_finding` | Get a detailed explanation of a specific threat category, with attack scenarios and remediation |

## Installation

### Prerequisites

- Node.js 18 or later
- An MCP-compatible client (Claude Code, Cursor, VS Code + Copilot, Cline)

### From Source

```
git clone https://github.com/goldmembrane/cleaner-code.git
cd cleaner-code
npm install
npm run build
```

### Configure Your MCP Client

**Claude Code** (`~/.claude.json` or project `.mcp.json`):

```
{
  "mcpServers": {
    "codesafer": {
      "command": "node",
      "args": ["/absolute/path/to/cleaner-code/dist/index.js"]
    }
  }
}
```

**Cursor** (`.cursor/mcp.json`):

```
{
  "mcpServers": {
    "codesafer": {
      "command": "node",
      "args": ["/absolute/path/to/cleaner-code/dist/index.js"]
    }
  }
}
```

Restart your client, and the CodeSafer tools will appear in the tool picker.

## Usage

Once configured, ask your AI client things like:

- *"Scan this file for hidden security issues."*
- *"Check the dependencies in package.json for typosquatting."*
- *"Scan `.cursorrules` for a rules-file backdoor."*
- *"Run a deep AI analysis of `src/auth.ts`."*
- *"Explain what a Trojan Source attack is and how to fix the finding above."*

The client calls the appropriate MCP tool and returns findings with severity, line numbers, and remediation guidance.

## Free Tier & Plans

CodeSafer is free to use.

- **Static analysis** (`scan_file`, `scan_directory`, `scan_rules_file`, `check_dependencies`, `explain_finding`) has no limits.
- **AI deep analysis** (`ai_analyze`) includes **10 free runs per session**.

Paid plans with higher AI quotas are available at [codesafer.org](https://codesafer.org/).

## Detection Categories

CodeSafer detects threats across **9 categories**:

1. **Invisible Unicode characters**: 30+ variants, including Zero-Width Space and Zero-Width Joiner
2. **BiDi / Trojan Source attacks**: CVE-2021-42574
3. **Homoglyphs**: Cyrillic/Greek characters masquerading as Latin (CVE-2021-42694)
4. **Unicode steganography**: Glassworm patterns in whitespace
5. **Rules file backdoors**: malicious instructions in `.cursorrules`, `CLAUDE.md`, etc.
6. **Dependency risks**: typosquatting and suspicious install scripts
7. **Obfuscation patterns**: `eval` + base64, packed payloads, reverse shells
8. **Static analysis findings**: 8 deterministic scanners
9. **AI deep analysis**: CodeBERT transformer for novel and obfuscated threats

## Project Structure

```
cleaner-code/
├── src/
│   ├── index.ts              # MCP server entry point
│   ├── api-server.ts         # Optional HTTP API server
│   ├── types.ts              # Scanner interfaces
│   ├── utils.ts              # File collection, summary formatting
│   └── scanner/
│       ├── invisible.ts      # Invisible Unicode scanner
│       ├── bidi.ts           # BiDi / Trojan Source scanner
│       ├── homoglyph.ts      # Homoglyph scanner
│       ├── encoding.ts       # Encoding / charset scanner
│       ├── obfuscation.ts    # Obfuscation pattern scanner
│       ├── steganography.ts  # Unicode steganography scanner
│       ├── rules-backdoor.ts # Rules file backdoor scanner
│       ├── dependency.ts     # Dependency risk scanner
│       └── ai-analyzer.ts    # CodeBERT deep analyzer
├── ml/                       # ML model assets and tokenizer
├── functions/                # Cloud function deployments
├── deploy/                   # Deployment manifests
└── web/                      # Landing page assets
```

## License

ISC. See the `LICENSE` file for details.

## Links

- **Website:** [codesafer.org](https://codesafer.org/)
- **Model Context Protocol:** [modelcontextprotocol.io](https://modelcontextprotocol.io/)
- **Report issues:** [GitHub Issues](https://github.com/goldmembrane/cleaner-code/issues)
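The checks behind `check_dependencies` (typosquatting and suspicious install scripts in `package.json`) can be sketched in a few lines of TypeScript. This is a hypothetical illustration, not the project's code. The names `checkDependencies`, `editDistance` and `Risk`, and the tiny `POPULAR` allow-list, are assumptions made for the example.

The sketch raises three kinds of risk:

- dependency names exactly one Levenshtein edit away from a popular package
- names containing non-ASCII characters, a cheap signal for homoglyph substitution
- the `preinstall`, `install` and `postinstall` scripts, which npm runs automatically during install

```typescript
// Hypothetical sketch, not CodeSafer's `check_dependencies` implementation.

interface PackageJson {
  dependencies?: Record<string, string>;
  devDependencies?: Record<string, string>;
  scripts?: Record<string, string>;
}

interface Risk {
  kind: "typosquat" | "non-ascii" | "install-script";
  target: string; // dependency name or script name
  detail: string;
}

// Assumed allow-list for illustration; a real checker would use a far larger one.
const POPULAR = ["react", "express", "lodash", "axios", "chalk", "commander"];

// Standard Levenshtein distance, keeping only the previous DP row.
function editDistance(a: string, b: string): number {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

function checkDependencies(pkg: PackageJson): Risk[] {
  const risks: Risk[] = [];
  const names = Object.keys({ ...pkg.dependencies, ...pkg.devDependencies });

  for (const name of names) {
    // Non-ASCII names are a common sign of homoglyph substitution.
    if (/[^\x00-\x7F]/.test(name)) {
      risks.push({ kind: "non-ascii", target: name, detail: "name contains non-ASCII characters" });
    }
    // One edit away from a popular package (but not the package itself).
    for (const popular of POPULAR) {
      if (name !== popular && editDistance(name, popular) === 1) {
        risks.push({ kind: "typosquat", target: name, detail: `one edit away from "${popular}"` });
      }
    }
  }

  // Lifecycle scripts that npm executes automatically during install.
  for (const hook of ["preinstall", "install", "postinstall"]) {
    const cmd = pkg.scripts?.[hook];
    if (cmd !== undefined) {
      risks.push({ kind: "install-script", target: hook, detail: cmd });
    }
  }
  return risks;
}
```

A threshold of exactly one edit is deliberately strict. It catches `expres` for `express`, but Levenshtein distance counts a swapped pair such as `lodahs` as two edits. A real checker might therefore use a Damerau-style distance or a larger threshold, at the cost of more false positives.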

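Tags: AI code security scanning, AI-generated code, CodeBERT, CVE-2021-42574, CVE-2021-42694, Glassworm steganography, MCP server, MITM proxy, Node.js, Trojan Source defense, TypeScript, Unicode security, invisible character detection, cloud security monitoring, code obfuscation detection, dependency attack protection, anti-reverse-engineering, homoglyph detection, backend development, security plugin, developer security, runs locally, deep learning, rules file backdoor, static analysis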