cloakllm/CloakLLM-JS

GitHub: cloakllm/CloakLLM-JS

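A PII-cloaking middleware for Node.js that automatically detects and replaces sensitive information before LLM API calls and generates tamper-evident audit logs.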


# CloakLLM

**PII cloaking and tamper-evident audit logs for LLM API calls.**

CloakLLM intercepts your LLM API calls. It detects and cloaks PII (personally identifiable information) before the data reaches the provider, and it records every event in a tamper-evident audit chain designed to support EU AI Act Article 12 compliance.

## Installation

```
npm install cloakllm
```

## Quick start

### With the OpenAI SDK (one line)

```
const cloakllm = require('cloakllm');
const OpenAI = require('openai');

const client = new OpenAI();
cloakllm.enable(client);  // That's it. All calls are now cloaked.

async function main() {
  const response = await client.chat.completions.create({
    model: 'gpt-4o-mini',
    messages: [{ role: 'user', content: 'Write a reminder for sarah.j@techcorp.io about the Q3 audit' }]
  });
  // Provider never saw "sarah.j@techcorp.io"
  // Response has the real email restored automatically
  console.log(response.choices[0].message.content);
}

main();
```

### With the Vercel AI SDK

```
const { createCloakLLMMiddleware } = require('cloakllm');
const { generateText, wrapLanguageModel } = require('ai');
const { openai } = require('@ai-sdk/openai');

const middleware = createCloakLLMMiddleware();
const model = wrapLanguageModel({ model: openai('gpt-4o'), middleware });

async function main() {
  const { text } = await generateText({
    model,
    prompt: 'Write a reminder for sarah.j@techcorp.io about the Q3 audit'
  });
  // Provider never saw "sarah.j@techcorp.io"
  // Response has the real email restored automatically
  console.log(text);
}

main();
```

The middleware works with any AI SDK provider (OpenAI, Anthropic, Google, Mistral, etc.) and supports both `generateText` and `streamText`.

### Standalone

```
const { Shield } = require('cloakllm');

const shield = new Shield();
const [sanitized, tokenMap] = shield.sanitize(
  'Send report to john@acme.com, SSN 123-45-6789'
);
// sanitized: "Send report to [EMAIL_0], SSN [SSN_0]"

// ... send `sanitized` to any LLM and keep its reply in `llmResponse` ...
const llmResponse = sanitized; // placeholder: replace with your LLM's reply

const restored = shield.desanitize(llmResponse, tokenMap);
// Original values restored
```

### Redact mode (irreversible)

```
const { Shield, ShieldConfig } = require('cloakllm');

const shield = new Shield(new ShieldConfig({ mode: 'redact' }));
const [redacted] = shield.sanitize('Email john@acme.com about Sarah Johnson');
// redacted: "Email [EMAIL_REDACTED] about [PERSON_REDACTED]"
// ([PERSON_REDACTED] requires the opt-in LLM detection; see Configuration)
// No token map is stored, so this cannot be reversed
```

### Entity details (compliance metadata)

```
const { Shield } = require('cloakllm');

const shield = new Shield();
const [sanitized, tokenMap] = shield.sanitize('Email john@acme.com, SSN 123-45-6789');

// Per-entity metadata (contains no original text, so it is PII-safe)
console.log(tokenMap.entityDetails);
// [
//   { category: 'EMAIL', start: 6, end: 19, length: 13, confidence: 0.95, source: 'regex', token: '[EMAIL_0]' },
//   { category: 'SSN', start: 25, end: 36, length: 11, confidence: 0.95, source: 'regex', token: '[SSN_0]' }
// ]

// Full report for dashboards
console.log(tokenMap.toReport());
```

## What gets detected

| Category | Example | Method |
|----------|---------|--------|
| `EMAIL` | `john@acme.com` | Regex |
| `SSN` | `123-45-6789` | Regex |
| `CREDIT_CARD` | `4111111111111111` | Regex |
| `PHONE` | `+1-555-0142` | Regex |
| `IP_ADDRESS` | `192.168.1.1` | Regex |
| `API_KEY` | `sk_live_abc123...` | Regex |
| `AWS_KEY` | `AKIAIOSFODNN7EXAMPLE` | Regex |
| `JWT` | `eyJhbG...` | Regex |
| `IBAN` | `DE89370400440532013000` | Regex |
| `PERSON` | John Smith | LLM (local) |
| `ORG` | Acme Corp, Google | LLM (local) |
| `GPE` | New York, Israel | LLM (local) |
| `ADDRESS` | 742 Evergreen Terrace | LLM (local) |
| `DATE_OF_BIRTH` | 1990-01-15 | LLM (local) |
| `MEDICAL` | diabetes mellitus | LLM (local) |
| `FINANCIAL` | account 4521-XXX | LLM (local) |
| `NATIONAL_ID` | TZ 12345678 | LLM (local) |
| `BIOMETRIC` | fingerprint hash | LLM (local) |
| `USERNAME` | @johndoe42 | LLM (local) |
| `PASSWORD` | P@ssw0rd123 | LLM (local) |
| `VEHICLE` | plate ABC-1234 | LLM (local) |

Categories marked "LLM (local)" are detected only when the opt-in, Ollama-based LLM detection is enabled (see Configuration).

## How it works

```
Your app:       "Email sarah.j@techcorp.io about Project Falcon"
Provider sees:  "Email [EMAIL_0] about Project Falcon"
You receive:    Original email restored in the response
```

1. **Detect**: Regex patterns find structured PII (emails, SSNs, credit cards, etc.). When enabled, a local LLM also finds context-dependent entities such as names and organizations.
2. **Cloak**: Each detected value is replaced with a deterministic token, such as `[EMAIL_0]` or `[SSN_0]`.
3. **Log**: The event is written to a hash-chained audit trail. Each entry contains the SHA-256 hash of the previous entry.
4. **Uncloak**: Original values are restored in the LLM response.

## Tamper-evident audit chain

Every event is recorded in a JSONL file and linked into a hash chain. In the file, each entry occupies a single line; the entry below is pretty-printed for readability:

```
{
  "seq": 42,
  "event_type": "sanitize",
  "entity_count": 3,
  "categories": {"EMAIL": 1, "SSN": 1, "PHONE": 1},
  "prompt_hash": "sha256:9f86d0...",
  "entity_details": [
    {"category": "EMAIL", "start": 0, "end": 13, "length": 13, "confidence": 0.95, "source": "regex", "token": "[EMAIL_0]"},
    {"category": "SSN", "start": 15, "end": 26, "length": 11, "confidence": 0.95, "source": "regex", "token": "[SSN_0]"},
    {"category": "PHONE", "start": 28, "end": 40, "length": 12, "confidence": 0.95, "source": "regex", "token": "[PHONE_0]"}
  ],
  "prev_hash": "sha256:7c4d2e...",
  "entry_hash": "sha256:b5e8f3..."
}
```

Modifying any entry invalidates all subsequent hashes in the chain. Verify the chain with:

```
npx cloakllm verify ./cloakllm_audit/
```

## CLI

```
# Scan text for PII
npx cloakllm scan "Email john@test.com, SSN 123-45-6789"

# Verify audit chain integrity
npx cloakllm verify ./cloakllm_audit/

# Show audit statistics
npx cloakllm stats ./cloakllm_audit/
```

## Configuration

```
const { Shield, ShieldConfig } = require('cloakllm');

const shield = new Shield(new ShieldConfig({
  detectEmails: true,        // default: true
  detectPhones: true,
  detectSsns: true,
  detectCreditCards: true,
  detectApiKeys: true,
  detectIpAddresses: true,
  detectIban: true,
  logDir: './my-audit-logs', // default: ./cloakllm_audit
  auditEnabled: true,        // default: true
  skipModels: ['ollama/'],   // skip local models
  customPatterns: [
    { name: 'EMPLOYEE_ID', pattern: 'EMP-\\d{6}' }
  ],

  // LLM detection (opt-in, requires Ollama)
  llmDetection: true,                     // enable LLM-based detection
  llmModel: 'llama3.2',                   // Ollama model
  llmOllamaUrl: 'http://localhost:11434', // Ollama endpoint
  llmTimeout: 10000,                      // timeout in ms
  llmConfidence: 0.85,                    // confidence score
}));
```

## EU AI Act compliance

EU AI Act Article
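12 requires high-risk AI systems to automatically record events (logs) over their lifetime. Enforcement begins on **August 2, 2026**. CloakLLM provides:

- **Hash-chained logs**: Entries are cryptographically linked, so any modification breaks the chain.
- **O(n) verification**: `cloakllm verify` audits the entire chain in a single pass.
- **No PII in logs**: Only hashes, category counts, and per-entity metadata (offsets, tokens) are recorded. Original values are never stored.
- **Event-level detail**: Every sanitize and desanitize event is recorded.

## Roadmap

- [x] NER-based detection via a local LLM (names, organizations, locations)
- [x] Local LLM detection (optional, via Ollama)
- [x] Streaming response support
- [x] Vercel AI SDK middleware
- [x] Redact/scrub mode
- [x] Field-level PII metadata
- [ ] LangChain.js integration
- [ ] OpenTelemetry span emission
- [ ] RFC 3161 trusted timestamps

## License

MIT. See [LICENSE](LICENSE).

## See also

- **[CloakLLM Python](https://github.com/cloakllm/CloakLLM-PY)**: the Python version, with spaCy NER and LiteLLM integration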
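The Detect, Cloak, and Uncloak steps described above can be illustrated with a minimal, dependency-free sketch. The sketch is **not** CloakLLM's implementation. The two regexes, the hypothetical `cloak`/`uncloak` functions, and the choice to reuse one token for a repeated value are all assumptions made for illustration. The sketch shows why deterministic tokens make the round trip reversible, and how redact mode differs: no token map is kept.

```javascript
// Illustrative sketch only: NOT the cloakllm implementation.
// Hypothetical detection patterns for two structured categories.
const PATTERNS = [
  ['EMAIL', /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g],
  ['SSN', /\b\d{3}-\d{2}-\d{4}\b/g],
];

// Detect + cloak: replace each match with a per-category token like [EMAIL_0].
// In 'redact' mode, emit [CATEGORY_REDACTED] instead and keep no map.
function cloak(text, mode = 'tokenize') {
  const tokenMap = new Map(); // token -> original value
  const seen = new Map();     // original value -> token (assumption: reuse tokens)
  const counters = {};        // next token index per category
  let out = text;
  for (const [category, regex] of PATTERNS) {
    out = out.replace(regex, (match) => {
      if (mode === 'redact') return `[${category}_REDACTED]`;
      if (seen.has(match)) return seen.get(match);
      const index = counters[category] ?? 0;
      counters[category] = index + 1;
      const token = `[${category}_${index}]`;
      seen.set(match, token);
      tokenMap.set(token, match);
      return token;
    });
  }
  return mode === 'redact' ? [out] : [out, tokenMap];
}

// Uncloak: restore the original value wherever the model echoed a token.
function uncloak(text, tokenMap) {
  let out = text;
  for (const [token, original] of tokenMap) {
    out = out.split(token).join(original);
  }
  return out;
}
```

For example, `cloak('Send report to john@acme.com, SSN 123-45-6789')` yields `'Send report to [EMAIL_0], SSN [SSN_0]'` plus a map. Passing a model reply such as `'Reminder sent to [EMAIL_0].'` through `uncloak` with that map puts the real address back. Because redact mode never builds the map, its output cannot be reversed.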
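The `start`, `end`, and `length` values in the entity-details example above are 6/19/13 for the email and 25/36/11 for the SSN. They are consistent with zero-based offsets into the original prompt, with an exclusive `end`. The minimal sketch below reproduces those numbers with `String.prototype.matchAll`. The `entityDetails` function and its two regexes are hypothetical, and the sketch omits the `confidence`, `source`, and `token` fields that CloakLLM also reports.

```javascript
// Illustrative sketch, not cloakllm's code. The regexes are hypothetical
// stand-ins; matchAll() requires the global (g) flag.
const DETECTORS = [
  ['EMAIL', /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g],
  ['SSN', /\b\d{3}-\d{2}-\d{4}\b/g],
];

// Per-entity metadata that never includes the matched text itself.
// Offsets point into the original input; `end` is exclusive, so
// length === end - start.
function entityDetails(text, detectors = DETECTORS) {
  const details = [];
  for (const [category, regex] of detectors) {
    for (const m of text.matchAll(regex)) {
      details.push({
        category,
        start: m.index,
        end: m.index + m[0].length,
        length: m[0].length,
      });
    }
  }
  // Report entities in the order they appear in the text.
  return details.sort((a, b) => a.start - b.start);
}
```

Calling `entityDetails('Email john@acme.com, SSN 123-45-6789')` returns the same offsets as the `tokenMap.entityDetails` output shown earlier. This illustrates how metadata like this can be logged without storing the PII itself.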
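The hash-chained audit trail can be illustrated with a minimal sketch built on Node's built-in `node:crypto` module. This README does not document exactly which fields CloakLLM hashes or how it serializes them. The hypothetical `appendEntry`/`verifyChain` helpers, the all-zero genesis hash, and hashing `JSON.stringify` of each entry minus its `entry_hash` are therefore assumptions. The sketch shows two things: an edited entry cannot be hidden without recomputing every later hash, and verification is a single O(n) pass.

```javascript
const { createHash } = require('node:crypto');

// Hypothetical prev_hash for the very first entry (assumption).
const GENESIS = 'sha256:' + '0'.repeat(64);

const sha256 = (s) => 'sha256:' + createHash('sha256').update(s).digest('hex');

// Append an event. Its hash covers its own fields plus the previous entry's hash.
function appendEntry(chain, event) {
  const prev = chain.length ? chain[chain.length - 1].entry_hash : GENESIS;
  const body = { seq: chain.length, ...event, prev_hash: prev };
  const entry = { ...body, entry_hash: sha256(JSON.stringify(body)) };
  chain.push(entry);
  return entry;
}

// One pass over the chain: recompute every hash and check every link.
// Returns the seq of the first broken entry, or -1 if the chain is intact.
function verifyChain(chain) {
  let prev = GENESIS;
  for (const entry of chain) {
    const { entry_hash, ...body } = entry; // same key order as at append time
    if (body.prev_hash !== prev || sha256(JSON.stringify(body)) !== entry_hash) {
      return entry.seq;
    }
    prev = entry_hash;
  }
  return -1;
}
```

Suppose you edit a field in entry 1 of a three-entry chain. `verifyChain` then reports entry 1, because its recomputed hash no longer matches. Re-hashing entry 1 to cover the edit only moves the break to entry 2, whose `prev_hash` still points at the old value. In a JSONL file, each entry would be stored as one `JSON.stringify(entry)` line.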