cloakllm/CloakLLM-JS

GitHub: cloakllm/CloakLLM-JS

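PII cloaking middleware for Node.js that automatically detects and replaces sensitive information before LLM API calls and produces a tamper-evident audit log.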

Stars: 0 | Forks: 0

# CloakLLM

**PII cloaking and tamper-evident audit logs for LLM API calls.**

CloakLLM intercepts your LLM API calls, detects and cloaks PII (personally identifiable information) before the data reaches the provider, and records every event in a tamper-evident audit chain designed for compliance with Article 12 of the EU AI Act.

## Installation

```
npm install cloakllm
```

## Quick start

### With the OpenAI SDK (one line)

```
const cloakllm = require('cloakllm');
const OpenAI = require('openai');

const client = new OpenAI();
cloakllm.enable(client); // That's it. All calls are now cloaked.

// CommonJS has no top-level await, so the call is wrapped in an async function.
async function main() {
  const response = await client.chat.completions.create({
    model: 'gpt-4o-mini',
    messages: [{
      role: 'user',
      content: 'Write a reminder for sarah.j@techcorp.io about the Q3 audit'
    }]
  });
  // Provider never saw "sarah.j@techcorp.io"
  // Response has the real email restored automatically
  console.log(response.choices[0].message.content);
}

main().catch(console.error);
```

### With the Vercel AI SDK

```
const { createCloakLLMMiddleware } = require('cloakllm');
const { generateText, wrapLanguageModel } = require('ai');
const { openai } = require('@ai-sdk/openai');

const middleware = createCloakLLMMiddleware();
const model = wrapLanguageModel({ model: openai('gpt-4o'), middleware });

// CommonJS has no top-level await, so the call is wrapped in an async function.
async function main() {
  const { text } = await generateText({
    model,
    prompt: 'Write a reminder for sarah.j@techcorp.io about the Q3 audit'
  });
  // Provider never saw "sarah.j@techcorp.io"
  // Response has the real email restored automatically
  console.log(text);
}

main().catch(console.error);
```

Works with any AI SDK provider (OpenAI, Anthropic, Google, Mistral, and others) and supports both `generateText` and `streamText`.

### Standalone

```
const { Shield } = require('cloakllm');

const shield = new Shield();
const [sanitized, tokenMap] = shield.sanitize(
  'Send report to john@acme.com, SSN 123-45-6789'
);
// sanitized: "Send report to [EMAIL_0], SSN [SSN_0]"

// ... send sanitized text to any LLM ...
// llmResponse below stands for the text returned by that call.

const restored = shield.desanitize(llmResponse, tokenMap);
// Original values restored
```

### Redaction mode (irreversible)

```
const { Shield, ShieldConfig } = require('cloakllm');

const shield = new Shield(new ShieldConfig({ mode: 'redact' }));
const [redacted] = shield.sanitize('Email john@acme.com about Sarah Johnson');
// redacted: "Email [EMAIL_REDACTED] about [PERSON_REDACTED]"
// No token map stored, so this cannot be reversed
```

### Entity details (compliance metadata)

```
const { Shield } = require('cloakllm');

const shield = new Shield();
const [sanitized, tokenMap] = shield.sanitize('Email john@acme.com, SSN 123-45-6789');

// Per-entity metadata (no original text, so it is PII-safe)
console.log(tokenMap.entityDetails);
// [
//   { category: 'EMAIL', start: 6, end: 19, length: 13, confidence: 0.95, source: 'regex', token: '[EMAIL_0]' },
//   { category: 'SSN', start: 25, end: 36, length: 11, confidence: 0.95, source: 'regex', token: '[SSN_0]' }
// ]

// Full report for dashboards
console.log(tokenMap.toReport());
```

## What it detects

| Category | Example | Method |
|----------|----------|--------|
| `EMAIL` | `john@acme.com` | Regex |
| `SSN` | `123-45-6789` | Regex |
| `CREDIT_CARD` | `4111111111111111` | Regex |
| `PHONE` | `+1-555-0142` | Regex |
| `IP_ADDRESS` | `192.168.1.1` | Regex |
| `API_KEY` | `sk_live_abc123...` | Regex |
| `AWS_KEY` | `AKIAIOSFODNN7EXAMPLE` | Regex |
| `JWT` | `eyJhbG...` | Regex |
| `IBAN` | `DE89370400440532013000` | Regex |
| `PERSON` | John Smith | LLM (local) |
| `ORG` | Acme Corp, Google | LLM (local) |
| `GPE` | New York, Israel | LLM (local) |
| `ADDRESS` | 742 Evergreen Terrace | LLM (local) |
| `DATE_OF_BIRTH` | 1990-01-15 | LLM (local) |
| `MEDICAL` | diabetes mellitus | LLM (local) |
| `FINANCIAL` | account 4521-XXX | LLM (local) |
| `NATIONAL_ID` | TZ 12345678 | LLM (local) |
| `BIOMETRIC` | fingerprint hash | LLM (local) |
| `USERNAME` | @johndoe42 | LLM (local) |
| `PASSWORD` | P@ssw0rd123 | LLM (local) |
| `VEHICLE` | plate ABC-1234 | LLM (local) |

## How it works

```
Your app:      "Email sarah.j@techcorp.io about Project Falcon"
Provider sees: "Email [EMAIL_0] about Project Falcon"
You receive:   Original email restored in the response
```

1. **Detect**: regex patterns find structured PII (emails, SSNs, credit cards, and so on); the optional local LLM detector covers unstructured entities such as names
2. **Cloak**: each match is replaced with a deterministic token such as `[EMAIL_0]` or `[SSN_0]`
3. **Log**: the event is written to a hash-chained audit trail (each entry contains the SHA-256 hash of the previous entry)
4. **Uncloak**: the original values are restored in the LLM response

## Tamper-evident audit chain

Every event is recorded in a hash-chained JSONL file:

```
{
  "seq": 42,
  "event_type": "sanitize",
  "entity_count": 3,
  "categories": {"EMAIL": 1, "SSN": 1, "PHONE": 1},
  "prompt_hash": "sha256:9f86d0...",
  "entity_details": [
    {"category": "EMAIL", "start": 0, "end": 13, "length": 13, "confidence": 0.95, "source": "regex", "token": "[EMAIL_0]"},
    {"category": "SSN", "start": 15, "end": 26, "length": 11, "confidence": 0.95, "source": "regex", "token": "[SSN_0]"},
    {"category": "PHONE", "start": 28, "end": 40, "length": 12, "confidence": 0.95, "source": "regex", "token": "[PHONE_0]"}
  ],
  "prev_hash": "sha256:7c4d2e...",
  "entry_hash": "sha256:b5e8f3..."
}
```

Modifying any entry invalidates every hash that follows it. Verify the chain with:

```
npx cloakllm verify ./cloakllm_audit/
```

## CLI

```
# Scan text for PII
npx cloakllm scan "Email john@test.com, SSN 123-45-6789"

# Verify audit chain integrity
npx cloakllm verify ./cloakllm_audit/

# Show audit statistics
npx cloakllm stats ./cloakllm_audit/
```

## Configuration

```
const { Shield, ShieldConfig } = require('cloakllm');

const shield = new Shield(new ShieldConfig({
  detectEmails: true,             // default: true
  detectPhones: true,
  detectSsns: true,
  detectCreditCards: true,
  detectApiKeys: true,
  detectIpAddresses: true,
  detectIban: true,
  logDir: './my-audit-logs',      // default: ./cloakllm_audit
  auditEnabled: true,             // default: true
  skipModels: ['ollama/'],        // skip local models
  customPatterns: [
    { name: 'EMPLOYEE_ID', pattern: 'EMP-\\d{6}' }
  ],

  // LLM Detection (opt-in, requires Ollama)
  llmDetection: true,                       // Enable LLM-based detection
  llmModel: 'llama3.2',                     // Ollama model
  llmOllamaUrl: 'http://localhost:11434',   // Ollama endpoint
  llmTimeout: 10000,                        // Timeout in ms
  llmConfidence: 0.85,                      // Confidence score
}));
```

## EU AI Act compliance

Article 12 of the EU AI Act requires high-risk AI systems to keep automatic event logs (record-keeping). Enforcement begins **August 2, 2026**. CloakLLM provides:

- **Hash-chained logs**: entries are cryptographically linked, so any modification breaks the chain
- **O(n) verification**: `cloakllm verify` audits the entire chain
- **No PII in logs**: only hashes and token counts are recorded (raw values are never stored)
- **Event-level detail**: every sanitize/desanitize event is recorded

## Roadmap

- [x] NER-based detection via a local LLM (names, organizations, locations)
- [x] Local LLM detection (opt-in, via Ollama)
- [x] Streaming response support
- [x] Vercel AI SDK middleware
- [x] Redaction/scrubbing mode
- [x] Field-level PII metadata
- [ ] LangChain.js integration
- [ ] OpenTelemetry span emission
- [ ] RFC 3161 trusted timestamps

## License

MIT. See [LICENSE](LICENSE).

## See also

- **[CloakLLM Python](https://github.com/cloakllm/CloakLLM-PY)**: the Python version, with spaCy NER and LiteLLM integration
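
To make the detect, cloak, and uncloak steps described earlier more concrete, here is a minimal sketch of deterministic token substitution. It is not CloakLLM's implementation: the `PATTERNS` list, the `cloak` and `uncloak` functions, and the simplified regexes are all assumptions made for illustration, and real detectors need far more careful patterns.

```javascript
// Hypothetical, deliberately simplified patterns for two categories.
const PATTERNS = [
  { category: 'EMAIL', regex: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g },
  { category: 'SSN', regex: /\b\d{3}-\d{2}-\d{4}\b/g },
];

// Replace each detected value with a token such as [EMAIL_0].
// The same value always maps to the same token within one call.
function cloak(text) {
  const tokenToValue = new Map();
  const valueToToken = new Map();
  const counters = {};
  let out = text;
  for (const { category, regex } of PATTERNS) {
    out = out.replace(regex, (match) => {
      if (!valueToToken.has(match)) {
        const n = counters[category] ?? 0;
        counters[category] = n + 1;
        const token = `[${category}_${n}]`;
        valueToToken.set(match, token);
        tokenToValue.set(token, match);
      }
      return valueToToken.get(match);
    });
  }
  return [out, tokenToValue];
}

// Restore the original values in a response that echoes the tokens.
function uncloak(text, tokenToValue) {
  let out = text;
  for (const [token, value] of tokenToValue) {
    out = out.split(token).join(value);
  }
  return out;
}

const [cloaked, map] = cloak('Send report to john@acme.com, SSN 123-45-6789');
console.log(cloaked); // Send report to [EMAIL_0], SSN [SSN_0]
console.log(uncloak('Reminder sent to [EMAIL_0]', map)); // Reminder sent to john@acme.com
```

Keeping the token map on the caller's side is what makes the substitution reversible; a redaction mode would simply discard the map.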
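
The hash-chained audit log described above can be illustrated with a short sketch as well. This is an assumption-laden illustration rather than CloakLLM's actual log format or code: `appendEntry`, `verifyChain`, the `GENESIS` value, and the choice to hash `JSON.stringify` of the entry are all hypothetical. It shows why editing one entry is detectable in a single O(n) pass.

```javascript
const crypto = require('node:crypto');

// SHA-256 of a string, rendered in the "sha256:<hex>" style shown in the log example.
function sha256(text) {
  return 'sha256:' + crypto.createHash('sha256').update(text, 'utf8').digest('hex');
}

// Hypothetical prev_hash for the very first entry.
const GENESIS = 'sha256:' + '0'.repeat(64);

// Append an event; its hash covers the event fields and the previous entry's hash.
function appendEntry(chain, event) {
  const prevHash = chain.length ? chain[chain.length - 1].entry_hash : GENESIS;
  const body = { seq: chain.length, ...event, prev_hash: prevHash };
  const entry = { ...body, entry_hash: sha256(JSON.stringify(body)) };
  chain.push(entry);
  return entry;
}

// Walk the chain once and return the index of the first invalid entry, or -1 if intact.
function verifyChain(chain) {
  let prevHash = GENESIS;
  for (let i = 0; i < chain.length; i++) {
    const { entry_hash, ...body } = chain[i];
    const linked = body.prev_hash === prevHash;
    const unmodified = sha256(JSON.stringify(body)) === entry_hash;
    if (!linked || !unmodified) return i;
    prevHash = entry_hash;
  }
  return -1;
}

const demo = [];
appendEntry(demo, { event_type: 'sanitize', entity_count: 2 });
appendEntry(demo, { event_type: 'desanitize', entity_count: 2 });
console.log(verifyChain(demo)); // -1 (chain intact)

demo[0].entity_count = 0; // tamper with the first entry
console.log(verifyChain(demo)); // 0 (first invalid entry)
```

Even if an attacker recomputes the `entry_hash` of the entry they edited, the next entry's `prev_hash` no longer matches, so the break only moves one position down the chain.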

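Tags: AI risk mitigation, API interception, EU AI Act, MITM proxy, Node.js SDK, OpenAI, PII masking, SDK integration, Vercel AI SDK, personally identifiable information, middleware, AI security, enterprise compliance, memory evasion, compliance, LLM security, property graph, sensitive information masking, data visualization, data masking, tamper-evident logging, secrets management, cybersecurity, custom scripts, privacy protection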