goldmembrane/cleaner-code
GitHub: goldmembrane/cleaner-code
A locally run AI code-security scanning MCP server that detects threats such as invisible Unicode, supply-chain attacks, and code obfuscation.
# CodeSafer (cleaner-code)
**Website:** [codesafer.org](https://codesafer.org/) · **MCP Clients:** Claude Code, Cursor, VS Code + Copilot, Cline
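## Why Use CodeSafer?
AI coding assistants generate code quickly. But who checks that code for hidden security threats?
Recent supply-chain attacks have shown that malicious code can hide in ways that human reviewers and traditional linters routinely miss:
- **Invisible Unicode characters** injected into identifiers (30+ variants)
- **Bidirectional text / Trojan Source** attacks, where the displayed order differs from the execution order (CVE-2021-42574)
- **Homoglyphs**: Cyrillic characters disguised as Latin ones (CVE-2021-42694)
- **Glassworm-style Unicode steganography** hiding payloads in whitespace
- **Rules file backdoors** planted in AI configuration files such as `.cursorrules` and `CLAUDE.md`
- **Homoglyph dependency substitution** in `package.json`
- **Obfuscation patterns**: `eval` plus base64, reverse shells, packed payloads
CodeSafer scans for all of these threats before the code runs on your machine.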
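To make the first two threat classes concrete, here is a minimal TypeScript sketch of a hidden-character check. It is not CodeSafer's scanner. The code-point tables are a small illustrative subset, while CodeSafer covers 30+ invisible variants. The names `findHiddenChars` and `HiddenCharFinding` are hypothetical.

```typescript
// Hypothetical sketch, not CodeSafer's rule set: a small subset of the
// invisible and BiDi code points a hidden-character scanner might flag.
const INVISIBLE: Record<number, string> = {
  0x00ad: "SOFT HYPHEN",
  0x180e: "MONGOLIAN VOWEL SEPARATOR",
  0x200b: "ZERO WIDTH SPACE",
  0x200c: "ZERO WIDTH NON-JOINER",
  0x200d: "ZERO WIDTH JOINER",
  0x2060: "WORD JOINER",
  0xfeff: "ZERO WIDTH NO-BREAK SPACE",
};

// The BiDi embedding, override, and isolate controls named in CVE-2021-42574.
const BIDI: Record<number, string> = {
  0x202a: "LEFT-TO-RIGHT EMBEDDING",
  0x202b: "RIGHT-TO-LEFT EMBEDDING",
  0x202c: "POP DIRECTIONAL FORMATTING",
  0x202d: "LEFT-TO-RIGHT OVERRIDE",
  0x202e: "RIGHT-TO-LEFT OVERRIDE",
  0x2066: "LEFT-TO-RIGHT ISOLATE",
  0x2067: "RIGHT-TO-LEFT ISOLATE",
  0x2068: "FIRST STRONG ISOLATE",
  0x2069: "POP DIRECTIONAL ISOLATE",
};

interface HiddenCharFinding {
  line: number;      // 1-based line number
  column: number;    // 1-based column, in UTF-16 code units
  codePoint: string; // e.g. "U+200B"
  name: string;
  kind: "invisible" | "bidi";
}

function findHiddenChars(source: string): HiddenCharFinding[] {
  const findings: HiddenCharFinding[] = [];
  const lines = source.split("\n");
  for (let i = 0; i < lines.length; i++) {
    const text = lines[i];
    for (let j = 0; j < text.length; j++) {
      // Every code point in both tables is in the BMP, so one UTF-16 unit suffices.
      const cp = text.charCodeAt(j);
      const kind = cp in INVISIBLE ? "invisible" : cp in BIDI ? "bidi" : null;
      if (kind === null) continue;
      findings.push({
        line: i + 1,
        column: j + 1,
        codePoint: "U+" + ("0000" + cp.toString(16)).slice(-4).toUpperCase(),
        name: kind === "invisible" ? INVISIBLE[cp] : BIDI[cp],
        kind,
      });
    }
  }
  return findings;
}
```

For example, `findHiddenChars("const a\u200B = 1;")` reports one `invisible` finding at line 1, column 8 (`U+200B`). A production scanner would also need to handle characters outside the BMP, such as the tag and variation-selector ranges.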
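## How It Works
CodeSafer runs as a local MCP server. Your AI client (Claude Code, Cursor, etc.) calls its tools while reviewing or generating code, and the results are returned inline.
**Hybrid detection:**
1. **8 static analysis scanners**: deterministic rules for known attack classes, with zero false positives on the patterns they cover.
2. **CodeBERT deep analysis**: a transformer model classifies code chunks as malicious or benign with a confidence score, catching obfuscated or novel patterns that the static rules miss.
Nothing leaves your machine: the AI analysis runs locally against a tokenizer server.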
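The chunking step that precedes classification could look roughly like the sketch below. CodeBERT, like other RoBERTa-style models, accepts at most 512 tokens per input, so long files must be split before they are scored. The function `chunkByLines` and its line-based window sizes are hypothetical stand-ins. A real pipeline would measure chunk length in tokenizer tokens. CodeSafer's actual chunking strategy is not documented in this README.

```typescript
// Hypothetical sketch: split a file into overlapping line windows so each
// chunk can be classified independently. maxLines and overlap are
// illustrative stand-ins for a token-based budget, not CodeSafer's settings.
interface CodeChunk {
  startLine: number; // 1-based, inclusive
  endLine: number;   // 1-based, inclusive
  text: string;
}

function chunkByLines(source: string, maxLines = 40, overlap = 10): CodeChunk[] {
  if (maxLines <= 0 || overlap < 0 || overlap >= maxLines) {
    throw new RangeError("require maxLines > 0 and 0 <= overlap < maxLines");
  }
  const lines = source.split("\n");
  const chunks: CodeChunk[] = [];
  const step = maxLines - overlap;
  for (let start = 0; start < lines.length; start += step) {
    const end = Math.min(start + maxLines, lines.length);
    chunks.push({ startLine: start + 1, endLine: end, text: lines.slice(start, end).join("\n") });
    if (end === lines.length) break; // this window already reaches the end of the file
  }
  return chunks;
}
```

The overlap keeps a payload that straddles a window boundary fully inside at least one chunk, as long as the payload spans at most `overlap + 1` lines. Each chunk's `startLine` lets a per-chunk verdict be mapped back to a location in the file.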
## Features
| Capability | Details |
|---|---|
| Invisible character detection | 30+ Unicode variants including Zero-Width Space, Mongolian Vowel Separator |
| BiDi / Trojan Source | Full CVE-2021-42574 coverage |
| Homoglyph detection | Cyrillic/Greek/Latin confusables (CVE-2021-42694) |
| Unicode steganography | Glassworm-style whitespace payloads |
| Rules file backdoors | Scans `.cursorrules`, `CLAUDE.md`, `.claude/`, Cursor rules |
| Dependency scanning | Typosquatting + suspicious install scripts in `package.json` |
| Obfuscation detection | `eval` + base64, reverse shells, packed payloads |
| AI deep analysis | CodeBERT transformer classifier with confidence scores |
| MCP native | 6 MCP tools, stdio transport |
| Local-first | No code uploaded — runs entirely on your machine |
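Homoglyph detection can be illustrated with a mixed-script check on identifiers. The sketch below is hypothetical and not CodeSafer's homoglyph scanner. It looks only at the basic Greek and Cyrillic Unicode blocks, and it uses a deliberately rough regex tokenizer instead of a real parser.

```typescript
// Hypothetical sketch, not CodeSafer's homoglyph scanner: flag identifiers
// that mix Latin letters with Greek or Cyrillic ones, the mixed-script
// pattern behind CVE-2021-42694. Only the basic Greek (U+0370-U+03FF) and
// Cyrillic (U+0400-U+04FF) blocks are considered.
type Script = "latin" | "greek" | "cyrillic" | "other";

function scriptOf(code: number): Script {
  if ((code >= 0x41 && code <= 0x5a) || (code >= 0x61 && code <= 0x7a)) return "latin";
  if (code >= 0x0370 && code <= 0x03ff) return "greek";
  if (code >= 0x0400 && code <= 0x04ff) return "cyrillic";
  return "other"; // digits, '_', '$', and everything else
}

interface MixedScriptFinding {
  identifier: string;
  scripts: Script[]; // scripts in order of first appearance
}

function findMixedScriptIdentifiers(source: string): MixedScriptFinding[] {
  // Rough tokenizer: runs of ASCII word characters, '$', Greek, or Cyrillic.
  const tokens = source.match(/[A-Za-z0-9_$\u0370-\u04ff]+/g) || [];
  const reported = new Set<string>();
  const findings: MixedScriptFinding[] = [];
  for (const token of tokens) {
    const scripts = new Set<Script>();
    for (let i = 0; i < token.length; i++) {
      const s = scriptOf(token.charCodeAt(i));
      if (s !== "other") scripts.add(s);
    }
    // Report each distinct mixed-script identifier once.
    if (scripts.size > 1 && !reported.has(token)) {
      reported.add(token);
      findings.push({ identifier: token, scripts: Array.from(scripts) });
    }
  }
  return findings;
}
```

In this sketch, identifiers written entirely in one script (for example, a fully Cyrillic variable name) are not flagged. That avoids false positives on non-English code, but it misses a spoof made entirely of lookalike letters. Catching those requires a confusables table such as the one in Unicode TS #39.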
## MCP Tools
CodeSafer exposes six tools to your MCP client:
| Tool | Purpose |
|---|---|
| `scan_file` | Scan a single file for hidden malicious code patterns |
| `scan_directory` | Recursively scan a directory across all source files |
| `scan_rules_file` | Scan an AI configuration/rules file for prompt injection and Rules File Backdoor attacks |
| `check_dependencies` | Check `package.json` for typosquatting, suspicious install scripts, and dependency risks |
| `ai_analyze` | Deep AI analysis using the trained CodeBERT model (classifies chunks as malicious/benign with confidence) |
| `explain_finding` | Get detailed explanation of a specific threat category, with attack scenarios and remediation |
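Under the hood, a client invokes each of these tools with a JSON-RPC 2.0 `tools/call` request, as defined by the MCP specification. The sketch below builds one. The argument key `path` is a hypothetical placeholder, because this README does not list the tools' input schemas (clients discover them via `tools/list`). The helper names `buildToolCall` and `encodeForStdio` are also illustrative.

```typescript
// Sketch of the JSON-RPC 2.0 message an MCP client sends to invoke a tool.
// The "tools/call" method with { name, arguments } follows the MCP
// specification; the argument key used below ("path") is a hypothetical
// placeholder, not CodeSafer's documented input schema.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

function buildToolCall(id: number, name: string, args: Record<string, unknown>): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// Over the stdio transport, each message is a single line of JSON
// written to the server's stdin.
function encodeForStdio(message: ToolCallRequest): string {
  return JSON.stringify(message) + "\n";
}

// Example: ask the server to scan one file (argument name assumed).
const request = encodeForStdio(buildToolCall(1, "scan_file", { path: "src/auth.ts" }));
```

In a real session, the client must complete the MCP `initialize` handshake before it calls any tool. Clients such as Claude Code and Cursor do all of this for you.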
## Installation
### Prerequisites
- Node.js 18 or later
- An MCP-compatible client (Claude Code, Cursor, VS Code + Copilot, Cline)
### From Source
```
git clone https://github.com/goldmembrane/cleaner-code.git
cd cleaner-code
npm install
npm run build
```
### Configure Your MCP Client
**Claude Code** (`~/.claude.json` or project `.mcp.json`):
```
{
  "mcpServers": {
    "codesafer": {
      "command": "node",
      "args": ["/absolute/path/to/cleaner-code/dist/index.js"]
    }
  }
}
```
**Cursor** (`.cursor/mcp.json`):
```
{
  "mcpServers": {
    "codesafer": {
      "command": "node",
      "args": ["/absolute/path/to/cleaner-code/dist/index.js"]
    }
  }
}
```
Restart your client, and CodeSafer tools will appear in the tool picker.
## Usage
Once configured, ask your AI client things like:
- *"Scan this file for hidden security issues."*
- *"Check the dependencies in package.json for typosquatting."*
- *"Scan `.cursorrules` for a rules-file backdoor."*
- *"Run a deep AI analysis of `src/auth.ts`."*
- *"Explain what a Trojan Source attack is and how to fix the finding above."*
The client will call the appropriate MCP tool and return findings with severity, line numbers, and remediation guidance.
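If you post-process results yourself, the findings could be ranked and printed with something like the sketch below. The `Finding` shape, the four severity levels, and `formatFindings` are assumptions for illustration. CodeSafer's actual output format is not specified in this README.

```typescript
// Hypothetical sketch: rank and print scan findings, most severe first.
// The Finding shape and severity levels are assumptions, not CodeSafer's
// documented output format.
type Severity = "critical" | "high" | "medium" | "low";

interface Finding {
  file: string;
  line: number;
  severity: Severity;
  category: string;
  remediation: string;
}

const SEVERITY_RANK: Record<Severity, number> = { critical: 0, high: 1, medium: 2, low: 3 };

function formatFindings(findings: Finding[]): string {
  // Sort a copy: severity first, then file name, then line number.
  const sorted = [...findings].sort(
    (a, b) =>
      SEVERITY_RANK[a.severity] - SEVERITY_RANK[b.severity] ||
      a.file.localeCompare(b.file) ||
      a.line - b.line,
  );
  return sorted
    .map((f) => `[${f.severity.toUpperCase()}] ${f.file}:${f.line} ${f.category}: ${f.remediation}`)
    .join("\n");
}
```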
## Free Tier and Plans
CodeSafer is free to use. Static analysis (`scan_file`, `scan_directory`, `scan_rules_file`, `check_dependencies`, `explain_finding`) has no limits.
AI deep analysis (`ai_analyze`) includes **10 free runs per session**. Paid plans for higher AI quotas are available at [codesafer.org](https://codesafer.org/).
## Detection Categories
CodeSafer detects threats across **9 categories**:
1. **Invisible Unicode characters** — 30+ variants including Zero-Width Space, Zero-Width Joiner
2. **BiDi / Trojan Source attacks** — CVE-2021-42574
3. **Homoglyphs** — Cyrillic/Greek characters masquerading as Latin (CVE-2021-42694)
4. **Unicode steganography** — Glassworm patterns in whitespace
5. **Rules file backdoors** — malicious instructions in `.cursorrules`, `CLAUDE.md`, etc.
6. **Dependency risks** — typosquatting and suspicious install scripts
7. **Obfuscation patterns** — `eval` + base64, packed payloads, reverse shells
8. **Static analysis findings** — 8 deterministic scanners
9. **AI deep analysis** — CodeBERT transformer for novel and obfuscated threats
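Category 6 (dependency risks) can be sketched with two simple heuristics: edit distance to well-known package names, and lifecycle scripts that fetch or decode code at install time. The sketch below is hypothetical and not CodeSafer's dependency scanner. The `POPULAR` list and the suspicious-script regex are small illustrative assumptions.

```typescript
// Hypothetical sketch, not CodeSafer's actual rules: flag dependency names
// within edit distance 1-2 of a well-known package, plus install-time
// lifecycle scripts that download or decode code. POPULAR is a tiny
// illustrative list; a real scanner would use a much larger one.
const POPULAR = ["react", "lodash", "express", "axios", "chalk"];

// Classic dynamic-programming Levenshtein distance, one row at a time.
function levenshtein(a: string, b: string): number {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const cur = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      cur[j] = Math.min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = cur;
  }
  return prev[b.length];
}

interface PackageJson {
  dependencies?: Record<string, string>;
  devDependencies?: Record<string, string>;
  scripts?: Record<string, string>;
}

const LIFECYCLE = ["preinstall", "install", "postinstall"];
const SUSPICIOUS_SCRIPT = /curl|wget|base64|eval|\|\s*(ba)?sh/;

function checkPackageJson(pkg: PackageJson): string[] {
  const warnings: string[] = [];
  const names = Object.keys({ ...pkg.dependencies, ...pkg.devDependencies });
  for (const name of names) {
    for (const popular of POPULAR) {
      const d = levenshtein(name, popular);
      if (d > 0 && d <= 2) warnings.push(`"${name}" looks like "${popular}" (edit distance ${d})`);
    }
  }
  for (const hook of LIFECYCLE) {
    const cmd = pkg.scripts?.[hook];
    if (cmd && SUSPICIOUS_SCRIPT.test(cmd)) warnings.push(`${hook} script looks suspicious: ${cmd}`);
  }
  return warnings;
}
```

Edit distance alone produces false positives. For example, the legitimate package `preact` is one edit away from `react`, so a practical scanner would combine it with an allowlist or registry metadata.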
## Project Structure
```
cleaner-code/
├── src/
│   ├── index.ts                # MCP server entry point
│   ├── api-server.ts           # Optional HTTP API server
│   ├── types.ts                # Scanner interfaces
│   ├── utils.ts                # File collection, summary formatting
│   └── scanner/
│       ├── invisible.ts        # Invisible Unicode scanner
│       ├── bidi.ts             # BiDi / Trojan Source scanner
│       ├── homoglyph.ts        # Homoglyph scanner
│       ├── encoding.ts         # Encoding / charset scanner
│       ├── obfuscation.ts      # Obfuscation pattern scanner
│       ├── steganography.ts    # Unicode steganography scanner
│       ├── rules-backdoor.ts   # Rules file backdoor scanner
│       ├── dependency.ts       # Dependency risk scanner
│       └── ai-analyzer.ts      # CodeBERT deep analyzer
├── ml/                         # ML model assets and tokenizer
├── functions/                  # Cloud function deployments
├── deploy/                     # Deployment manifests
└── web/                        # Landing page assets
```
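The layout suggests one module per detector behind shared scanner interfaces. The sketch below shows how such a pluggable design could work. The `Scanner` interface, the `ScanFinding` shape, the `runScanners` helper, and the eval-plus-base64 regex are all assumptions. They are not the contents of `types.ts` or `obfuscation.ts`.

```typescript
// Hypothetical sketch of a pluggable scanner architecture; none of these
// names are taken from CodeSafer's source.
interface ScanFinding {
  scanner: string;
  line: number; // 1-based
  message: string;
}

interface Scanner {
  name: string;
  scan(fileName: string, source: string): ScanFinding[];
}

// eval(...) or Function(...) whose argument starts by decoding base64 via
// atob() or Buffer.from(..., "base64"). Checked one line at a time.
const EVAL_BASE64 =
  /\b(eval|Function)\s*\(\s*(atob\s*\(|Buffer\.from\s*\([^)]*['"]base64['"])/;

// Example static scanner implementing the interface.
const obfuscationScanner: Scanner = {
  name: "obfuscation",
  scan(_fileName: string, source: string): ScanFinding[] {
    const findings: ScanFinding[] = [];
    source.split("\n").forEach((text, i) => {
      if (EVAL_BASE64.test(text)) {
        findings.push({ scanner: "obfuscation", line: i + 1, message: "eval of base64-decoded data" });
      }
    });
    return findings;
  },
};

// Runs every registered scanner over one file and orders results by line.
function runScanners(scanners: Scanner[], fileName: string, source: string): ScanFinding[] {
  const all: ScanFinding[] = [];
  for (const scanner of scanners) {
    all.push(...scanner.scan(fileName, source));
  }
  return all.sort((a, b) => a.line - b.line);
}
```

In this sketch, keeping every detector behind one interface lets a single entry point run all detectors in one pass. New detectors can then be added without changing the existing ones.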
## License
ISC — see the `LICENSE` file for details.
## Links
- **Website:** [codesafer.org](https://codesafer.org/)
- **Model Context Protocol:** [modelcontextprotocol.io](https://modelcontextprotocol.io/)
- **Report issues:** [GitHub Issues](https://github.com/goldmembrane/cleaner-code/issues)