cloakllm/CloakLLM-JS
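PII cloaking middleware for Node.js that automatically detects and replaces sensitive information before LLM API calls and produces a tamper-evident audit log.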
# CloakLLM
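**PII cloaking and tamper-evident audit logs for LLM API calls.**
CloakLLM intercepts your LLM API calls, detects and cloaks PII (personally identifiable information) before the data reaches the provider, and records every event in a tamper-evident audit chain designed to support compliance with Article 12 of the EU AI Act.
## Installation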
```
npm install cloakllm
```
## Quick Start
### With the OpenAI SDK (one line of code)
```
const cloakllm = require('cloakllm');
const OpenAI = require('openai');

const client = new OpenAI();
cloakllm.enable(client); // That's it. All calls are now cloaked.

// CommonJS has no top-level await, so the call runs inside an async function.
async function main() {
  const response = await client.chat.completions.create({
    model: 'gpt-4o-mini',
    messages: [{
      role: 'user',
      content: 'Write a reminder for sarah.j@techcorp.io about the Q3 audit'
    }]
  });
  // Provider never saw "sarah.j@techcorp.io"
  // Response has the real email restored automatically
  console.log(response.choices[0].message.content);
}

main();
```
### With the Vercel AI SDK
```
const { createCloakLLMMiddleware } = require('cloakllm');
const { generateText, wrapLanguageModel } = require('ai');
const { openai } = require('@ai-sdk/openai');

const middleware = createCloakLLMMiddleware();
const model = wrapLanguageModel({ model: openai('gpt-4o'), middleware });

// CommonJS has no top-level await, so the call runs inside an async function.
async function main() {
  const { text } = await generateText({
    model,
    prompt: 'Write a reminder for sarah.j@techcorp.io about the Q3 audit'
  });
  // Provider never saw "sarah.j@techcorp.io"
  // Response has the real email restored automatically
  console.log(text);
}

main();
```
Works with any AI SDK provider (OpenAI, Anthropic, Google, Mistral, and others) and supports both `generateText` and `streamText`.
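Restoring tokens in a streamed response is harder than in a complete one, because a token such as `[EMAIL_0]` can be split across two chunks. The sketch below shows one way this could be handled: it holds back any text after an unclosed `[` until the next chunk arrives. It is an assumption about the general technique, not a description of CloakLLM's internals, and `createStreamRestorer` is a hypothetical name.

```javascript
// Hypothetical sketch; CloakLLM's own streaming logic may differ.
function createStreamRestorer(tokenToValue) {
  let pending = ''; // text held back because it may end in a partial token

  // Replace every known token in a piece of text with its original value.
  const restore = (text) => {
    let out = text;
    for (const [token, value] of tokenToValue) {
      out = out.split(token).join(value);
    }
    return out;
  };

  return {
    // Feed one streamed chunk; returns the text that is safe to emit now.
    push(chunk) {
      pending += chunk;
      const open = pending.lastIndexOf('[');
      const unfinished = open !== -1 && pending.indexOf(']', open) === -1;
      const ready = unfinished ? pending.slice(0, open) : pending;
      pending = unfinished ? pending.slice(open) : '';
      return restore(ready);
    },
    // Call once the stream ends to emit whatever is still buffered.
    flush() {
      const rest = restore(pending);
      pending = '';
      return rest;
    },
  };
}

const restorer = createStreamRestorer(new Map([['[EMAIL_0]', 'sarah.j@techcorp.io']]));
const chunks = ['Remind [EM', 'AIL_0] about', ' the Q3 audit'];
const output = chunks.map((c) => restorer.push(c)).join('') + restorer.flush();
console.log(output); // Remind sarah.j@techcorp.io about the Q3 audit
```

The trade-off is latency: text after an opening bracket is delayed until the bracket closes, so a real implementation would likely cap how much it buffers.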
### Standalone usage
```
const { Shield } = require('cloakllm');
const shield = new Shield();
const [sanitized, tokenMap] = shield.sanitize(
'Send report to john@acme.com, SSN 123-45-6789'
);
// sanitized: "Send report to [EMAIL_0], SSN [SSN_0]"
// ... send sanitized text to any LLM ...
const restored = shield.desanitize(llmResponse, tokenMap);
// Original values restored
```
### Redact mode (irreversible)
```
const { Shield, ShieldConfig } = require('cloakllm');
const shield = new Shield(new ShieldConfig({ mode: 'redact' }));
const [redacted] = shield.sanitize('Email john@acme.com about Sarah Johnson');
// redacted: "Email [EMAIL_REDACTED] about [PERSON_REDACTED]"
// No token map stored — cannot be reversed
```
### Entity details (compliance metadata)
```
const { Shield } = require('cloakllm');
const shield = new Shield();
const [sanitized, tokenMap] = shield.sanitize('Email john@acme.com, SSN 123-45-6789');
// Per-entity metadata (no original text — PII-safe)
console.log(tokenMap.entityDetails);
// [
// { category: 'EMAIL', start: 6, end: 19, length: 13, confidence: 0.95, source: 'regex', token: '[EMAIL_0]' },
// { category: 'SSN', start: 25, end: 36, length: 11, confidence: 0.95, source: 'regex', token: '[SSN_0]' }
// ]
// Full report for dashboards
console.log(tokenMap.toReport());
```
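The shape of the `toReport()` output is not shown here. As a rough illustration of how the per-entity metadata could feed a dashboard, the sketch below rolls an `entityDetails` array up into counts. `summarize` is a hypothetical helper written for this example, not part of CloakLLM.

```javascript
// Entity details copied from the example output above.
const entityDetails = [
  { category: 'EMAIL', start: 6, end: 19, length: 13, confidence: 0.95, source: 'regex', token: '[EMAIL_0]' },
  { category: 'SSN', start: 25, end: 36, length: 11, confidence: 0.95, source: 'regex', token: '[SSN_0]' },
];

// Hypothetical helper (not part of CloakLLM): count entities per category
// and per detection source, and track the lowest confidence seen.
function summarize(details) {
  const summary = { total: details.length, byCategory: {}, bySource: {}, minConfidence: null };
  for (const { category, source, confidence } of details) {
    summary.byCategory[category] = (summary.byCategory[category] ?? 0) + 1;
    summary.bySource[source] = (summary.bySource[source] ?? 0) + 1;
    if (summary.minConfidence === null || confidence < summary.minConfidence) {
      summary.minConfidence = confidence;
    }
  }
  return summary;
}

console.log(summarize(entityDetails));
```

Because the details carry positions and tokens but no original text, a summary like this can be shipped to a dashboard without exposing PII.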
## What it detects
| Category | Example | Method |
|----------|----------|--------|
| `EMAIL` | `john@acme.com` | Regex |
| `SSN` | `123-45-6789` | Regex |
| `CREDIT_CARD` | `4111111111111111` | Regex |
| `PHONE` | `+1-555-0142` | Regex |
| `IP_ADDRESS` | `192.168.1.1` | Regex |
| `API_KEY` | `sk_live_abc123...` | Regex |
| `AWS_KEY` | `AKIAIOSFODNN7EXAMPLE` | Regex |
| `JWT` | `eyJhbG...` | Regex |
| `IBAN` | `DE89370400440532013000` | Regex |
| `PERSON` | John Smith | LLM (local) |
| `ORG` | Acme Corp, Google | LLM (local) |
| `GPE` | New York, Israel | LLM (local) |
| `ADDRESS` | 742 Evergreen Terrace | LLM (local) |
| `DATE_OF_BIRTH` | 1990-01-15 | LLM (local) |
| `MEDICAL` | diabetes mellitus | LLM (local) |
| `FINANCIAL` | account 4521-XXX | LLM (local) |
| `NATIONAL_ID` | TZ 12345678 | LLM (local) |
| `BIOMETRIC` | fingerprint hash | LLM (local) |
| `USERNAME` | @johndoe42 | LLM (local) |
| `PASSWORD` | P@ssw0rd123 | LLM (local) |
| `VEHICLE` | plate ABC-1234 | LLM (local) |
## How it works
```
Your app: "Email sarah.j@techcorp.io about Project Falcon"
Provider sees: "Email [EMAIL_0] about Project Falcon"
You receive: Original email restored in the response
```
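1. **Detect**: regex patterns find structured PII (emails, SSNs, credit cards, and so on)
2. **Cloak**: each match is replaced with a deterministic token such as `[EMAIL_0]` or `[SSN_0]`
3. **Log**: the event is written to a hash-chained audit trail (each entry includes the SHA-256 hash of the previous entry)
4. **Uncloak**: the original values are restored in the LLM response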
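The detect, cloak, and uncloak steps above can be sketched in a few lines. This is a simplified illustration, not CloakLLM's implementation: the two regexes are deliberately naive, the `cloak` and `uncloak` names are hypothetical, and the logging step is left out.

```javascript
// Illustrative sketch only; not CloakLLM's actual implementation.
// Hypothetical patterns, far simpler than production-grade detectors.
const PATTERNS = {
  EMAIL: /[\w.+-]+@[\w-]+(?:\.[\w-]+)+/g,
  SSN: /\b\d{3}-\d{2}-\d{4}\b/g,
};

// Replace each match with a token such as [EMAIL_0]. Within one call,
// the same value always maps to the same token.
function cloak(text) {
  const valueToToken = new Map(); // "EMAIL:john@acme.com" -> "[EMAIL_0]"
  const tokenToValue = new Map(); // "[EMAIL_0]" -> "john@acme.com"
  const counters = {};
  let sanitized = text;
  for (const [category, pattern] of Object.entries(PATTERNS)) {
    sanitized = sanitized.replace(pattern, (match) => {
      const key = `${category}:${match}`;
      if (!valueToToken.has(key)) {
        const index = counters[category] ?? 0;
        counters[category] = index + 1;
        const token = `[${category}_${index}]`;
        valueToToken.set(key, token);
        tokenToValue.set(token, match);
      }
      return valueToToken.get(key);
    });
  }
  return { sanitized, tokenToValue };
}

// Put the original values back wherever their tokens appear.
function uncloak(text, tokenToValue) {
  let restored = text;
  for (const [token, value] of tokenToValue) {
    restored = restored.split(token).join(value);
  }
  return restored;
}

const { sanitized, tokenToValue } = cloak('Send report to john@acme.com, SSN 123-45-6789');
console.log(sanitized); // Send report to [EMAIL_0], SSN [SSN_0]
console.log(uncloak('Report sent to [EMAIL_0].', tokenToValue)); // Report sent to john@acme.com.
```

Keeping the token map outside the sanitized text is what makes the round trip possible: the provider only ever sees the tokens, while the caller holds the mapping needed to restore them.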
## Tamper-evident audit chain
Every event is appended to a JSONL file as part of a hash chain:
```
{
  "seq": 42,
  "event_type": "sanitize",
  "entity_count": 3,
  "categories": {"EMAIL": 1, "SSN": 1, "PHONE": 1},
  "prompt_hash": "sha256:9f86d0...",
  "entity_details": [
    {"category": "EMAIL", "start": 0, "end": 13, "length": 13, "confidence": 0.95, "source": "regex", "token": "[EMAIL_0]"},
    {"category": "SSN", "start": 15, "end": 26, "length": 11, "confidence": 0.95, "source": "regex", "token": "[SSN_0]"},
    {"category": "PHONE", "start": 28, "end": 40, "length": 12, "confidence": 0.95, "source": "regex", "token": "[PHONE_0]"}
  ],
  "prev_hash": "sha256:7c4d2e...",
  "entry_hash": "sha256:b5e8f3..."
}
```
Modifying any entry invalidates every hash that follows it. Verify the chain with:
```
npx cloakllm verify ./cloakllm_audit/
```
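To make the chaining concrete, here is a minimal sketch of how such a log can be built and checked with Node's built-in `crypto` module. The field names follow the sample entry above, but the serialization (`JSON.stringify` over the entry without its own hash), the all-zero genesis value, and the `append`/`verify` helpers are assumptions made for illustration; the real on-disk format may differ.

```javascript
const { createHash } = require('node:crypto');

// Hypothetical starting value for the first entry's prev_hash.
const GENESIS = 'sha256:' + '0'.repeat(64);

// Hash an entry body (everything except entry_hash itself).
function hashOf(body) {
  return 'sha256:' + createHash('sha256').update(JSON.stringify(body)).digest('hex');
}

// Append an event, linking it to the hash of the current last entry.
function append(chain, event) {
  const prev_hash = chain.length > 0 ? chain[chain.length - 1].entry_hash : GENESIS;
  const body = { seq: chain.length, ...event, prev_hash };
  chain.push({ ...body, entry_hash: hashOf(body) });
  return chain;
}

// Walk the chain once (O(n)) and report the first entry that does not check out.
function verify(chain) {
  let expectedPrev = GENESIS;
  for (let i = 0; i < chain.length; i++) {
    const { entry_hash, ...body } = chain[i];
    if (body.seq !== i || body.prev_hash !== expectedPrev || hashOf(body) !== entry_hash) {
      return { valid: false, brokenAt: i };
    }
    expectedPrev = entry_hash;
  }
  return { valid: true, brokenAt: null };
}

const chain = [];
append(chain, { event_type: 'sanitize', entity_count: 3 });
append(chain, { event_type: 'desanitize', entity_count: 3 });
console.log(verify(chain)); // { valid: true, brokenAt: null }

chain[0].entity_count = 0; // tamper with the first entry
console.log(verify(chain)); // { valid: false, brokenAt: 0 }
```

Even if an attacker recomputes the hash of the entry they changed, the next entry's `prev_hash` no longer matches, so the break simply moves one position down the chain.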
## CLI
```
# Scan text for PII
npx cloakllm scan "Email john@test.com, SSN 123-45-6789"
# Verify audit chain integrity
npx cloakllm verify ./cloakllm_audit/
# Show audit statistics
npx cloakllm stats ./cloakllm_audit/
```
## Configuration
```
const { Shield, ShieldConfig } = require('cloakllm');

const shield = new Shield(new ShieldConfig({
  detectEmails: true, // default: true
  detectPhones: true,
  detectSsns: true,
  detectCreditCards: true,
  detectApiKeys: true,
  detectIpAddresses: true,
  detectIban: true,
  logDir: './my-audit-logs', // default: ./cloakllm_audit
  auditEnabled: true, // default: true
  skipModels: ['ollama/'], // skip local models
  customPatterns: [
    { name: 'EMPLOYEE_ID', pattern: 'EMP-\\d{6}' }
  ],
  // LLM Detection (opt-in, requires Ollama)
  llmDetection: true, // Enable LLM-based detection
  llmModel: 'llama3.2', // Ollama model
  llmOllamaUrl: 'http://localhost:11434', // Ollama endpoint
  llmTimeout: 10000, // Timeout in ms
  llmConfidence: 0.85, // Confidence score
}));
```
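Two of these options are worth a closer look. In `customPatterns`, the pattern is a string, so the backslash must be doubled (`'EMP-\\d{6}'` is the regex `EMP-\d{6}`). For `skipModels`, the example value `'ollama/'` suggests prefix matching on the model name. The sketch below illustrates both ideas; how CloakLLM actually compiles patterns and matches model names is an assumption here, and all three helper names are hypothetical.

```javascript
// Same custom pattern as in the configuration example above.
const customPatterns = [{ name: 'EMPLOYEE_ID', pattern: 'EMP-\\d{6}' }];

// Hypothetical: turn each pattern string into a global RegExp.
function compilePatterns(patterns) {
  return patterns.map(({ name, pattern }) => ({ name, regex: new RegExp(pattern, 'g') }));
}

// Hypothetical: list every match with its category and position.
function findCustomEntities(text, compiled) {
  const found = [];
  for (const { name, regex } of compiled) {
    for (const match of text.matchAll(regex)) {
      found.push({ category: name, start: match.index, end: match.index + match[0].length });
    }
  }
  return found;
}

// Assumed semantics: an entry in skipModels is a prefix of the model name.
function shouldSkip(model, skipModels) {
  return skipModels.some((prefix) => model.startsWith(prefix));
}

const compiled = compilePatterns(customPatterns);
console.log(findCustomEntities('Ticket from EMP-004217 and EMP-12', compiled));
// [ { category: 'EMPLOYEE_ID', start: 12, end: 22 } ]
console.log(shouldSkip('ollama/llama3.2', ['ollama/'])); // true
console.log(shouldSkip('gpt-4o-mini', ['ollama/'])); // false
```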
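## EU AI Act compliance
Article 12 of the EU AI Act requires high-risk AI systems to support automatic recording of events (logs) over their lifetime. Enforcement begins on **August 2, 2026**. CloakLLM provides:
- **Hash-chained logs**: entries are cryptographically linked, so any modification breaks the chain
- **O(n) verification**: `cloakllm verify` audits the entire chain
- **No PII in logs**: only hashes and token counts are recorded (original values are never stored)
- **Event-level detail**: every sanitize/desanitize event is recorded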
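As an illustration of the "no PII in logs" point, the sketch below builds the PII-free part of a log record from a prompt and its detected entities: a SHA-256 digest of the prompt plus per-category counts. The field names mirror the sample audit entry earlier in this README, but `buildAuditFields` is a hypothetical helper, and whether CloakLLM hashes the original or the sanitized prompt is not stated here, so treat that choice as an assumption.

```javascript
const { createHash } = require('node:crypto');

// Hypothetical helper: derive log fields that contain no original values.
function buildAuditFields(prompt, entities) {
  const categories = {};
  for (const { category } of entities) {
    categories[category] = (categories[category] ?? 0) + 1;
  }
  return {
    event_type: 'sanitize',
    entity_count: entities.length,
    categories,
    // Only a digest of the prompt is kept, never the prompt itself.
    prompt_hash: 'sha256:' + createHash('sha256').update(prompt, 'utf8').digest('hex'),
  };
}

const fields = buildAuditFields('Email john@acme.com, SSN 123-45-6789', [
  { category: 'EMAIL', token: '[EMAIL_0]' },
  { category: 'SSN', token: '[SSN_0]' },
]);
console.log(fields.categories); // { EMAIL: 1, SSN: 1 }
console.log(JSON.stringify(fields).includes('john@acme.com')); // false
```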
## Roadmap
- [x] NER-based detection via a local LLM (names, organizations, locations)
- [x] Local LLM detection (opt-in, via Ollama)
- [x] Streaming response support
- [x] Vercel AI SDK middleware
- [x] Redact/sanitize modes
- [x] Field-level PII metadata
- [ ] LangChain.js integration
- [ ] OpenTelemetry span emission
- [ ] RFC 3161 trusted timestamping
## License
MIT. See [LICENSE](LICENSE).
## See also
- **[CloakLLM Python](https://github.com/cloakllm/CloakLLM-PY)**: Python version with spaCy NER and LiteLLM integration