code-sensei/artemiskit


An open-source LLM evaluation toolkit that helps teams systematically validate the reliability and safety of AI applications through scenario tests, red-team attacks, and stress tests.


# ArtemisKit

![ArtemisKit logo](https://artemiskit.vercel.app/artemiskit-logo.png)

**Open-source LLM evaluation toolkit** - test, evaluate, stress-test, and red-team your AI applications with scenario-based testing and multi-provider support.

[![License](https://img.shields.io/badge/license-Apache--2.0-blue.svg)](LICENSE)
[![npm](https://img.shields.io/npm/v/@artemiskit/cli.svg)](https://www.npmjs.com/package/@artemiskit/cli)
[![Documentation](https://img.shields.io/badge/docs-artemiskit.vercel.app-blue)](https://artemiskit.vercel.app)

📚 **[Documentation](https://artemiskit.vercel.app)** | 🚀 **[Getting Started](https://artemiskit.vercel.app/docs/cli/getting-started/)**

## Features

- **Scenario-based testing** - Define test cases in YAML, with multi-turn conversation support
- **Security red-teaming** - Automatically test for prompt injection, jailbreaks, and data extraction
- **Stress testing** - Measure latency, throughput, and reliability under load
- **Multi-provider support** - OpenAI, Azure OpenAI, Vercel AI SDK (20+ providers)
- **Rich reports** - Interactive HTML reports with config traceability
- **CI/CD ready** - Exit codes and JSON output for automation

## Installation

```bash
npm install -g @artemiskit/cli
# or
pnpm add -g @artemiskit/cli
# or
bun add -g @artemiskit/cli
```

## Quick Start (Basic Example)

This is the simplest way to get started with ArtemisKit.

### 1. Set your API key

```bash
export OPENAI_API_KEY="your-api-key"
```

### 2. Create a simple scenario

```yaml
# scenarios/hello.yaml
name: hello-test
description: My first ArtemisKit test

cases:
  - id: greeting-test
    prompt: "Say hello"
    expected:
      type: contains
      values:
        - "hello"
      mode: any
```

### 3. Run
```bash
artemiskit run scenarios/hello.yaml

# or use the short alias
akit run scenarios/hello.yaml
```

That's it! ArtemisKit uses OpenAI by default. See below for the full configuration options.

## Configuration

### Config File (Full Reference)

Create `artemis.config.yaml` in your project root. Here are all the available options:

```yaml
# artemis.config.yaml - full reference
# =====================================

# Project identifier (used for run storage and reports)
project: my-project

# Default provider when none is specified in the scenario or CLI
# Options: openai, azure-openai, vercel-ai
provider: openai

# Default model to use
# Note: for azure-openai this is display-only - the actual model
# is determined by your Azure deployment, not this value.
# See docs/providers/azure-openai.md for details.
model: gpt-4o

# Directory containing scenario files
scenariosDir: ./scenarios

# Provider-specific configuration
providers:
  openai:
    # API key (can use environment variable reference)
    apiKey: ${OPENAI_API_KEY}

  azure-openai:
    # API key for Azure OpenAI
    apiKey: ${AZURE_OPENAI_API_KEY}
    # Your Azure resource name (the subdomain in your endpoint URL)
    resourceName: ${AZURE_OPENAI_RESOURCE_NAME}
    # The deployment name you created in Azure Portal
    deploymentName: ${AZURE_OPENAI_DEPLOYMENT_NAME}
    # API version (optional, has a sensible default)
    apiVersion: "2024-02-15-preview"

  vercel-ai:
    # Underlying provider for the Vercel AI SDK
    underlyingProvider: openai
    apiKey: ${OPENAI_API_KEY}

# Storage configuration for run history
storage:
  # Storage type: "local" or "supabase"
  type: local
  # Base path for local storage (relative to project root)
  basePath: ./artemis-runs

# Output configuration for reports
output:
  # Output format: "json", "html", or "both"
  format: html
  # Directory for generated reports
  dir: ./artemis-output

# CI-specific settings (optional)
ci:
  # Fail if regression exceeds the threshold
  failOnRegression: true
  # Regression threshold (0-1)
  regressionThreshold: 0.05
```

### Minimal Config File

If you only want to set defaults, a minimal config also works:

```yaml
# artemis.config.yaml - minimal config
project: my-project
provider: openai
model: gpt-4o
```

## Scenario Format

### Basic Scenario (Simple Prompts)

```yaml
# scenarios/basic.yaml
name: basic-test
description: Simple prompt-response tests

# Optional: override provider/model for this scenario
provider: openai
model: gpt-4o

cases:
  - id: greeting
    prompt: "Say hello"
    expected:
      type: contains
      values:
        - "hello"
      mode: any
```
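The `contains` expectation above checks the model's response for the listed substrings. As a rough illustration of the semantics (a hedged sketch only, not ArtemisKit's actual matcher; in particular, case handling is an assumption here):

```typescript
// Illustrative sketch of `contains` semantics. `values` and `mode` mirror
// the YAML fields above. Matching is case-insensitive by assumption;
// ArtemisKit's real matcher may behave differently.
type ContainsMode = "any" | "all";

function matchContains(response: string, values: string[], mode: ContainsMode): boolean {
  const haystack = response.toLowerCase();
  const hits = values.map((v) => haystack.includes(v.toLowerCase()));
  // "any": at least one value must appear; "all": every value must appear.
  return mode === "all" ? hits.every(Boolean) : hits.some(Boolean);
}
```

With `mode: any`, a response like "Hello! How can I assist you?" passes a check for `["hello", "goodbye"]`; with `mode: all` the same check would fail.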
### Full Scenario Reference

Here are all the available options for a scenario:

```yaml
# scenarios/full-reference.yaml - complete example
# =================================================

# Required: unique name for this scenario
name: customer-support-eval

# Optional: human-readable description
description: Evaluate customer support bot responses

# Optional: scenario version
version: "1.0"

# Optional: tags for filtering (use the --tags flag)
tags:
  - support
  - production

# Optional: provider override (defaults to the config file, then "openai")
# Options: openai, azure-openai, vercel-ai
provider: openai

# Optional: model override
# Note: for azure-openai this is display-only - the actual model
# is determined by your Azure deployment. See docs/providers/azure-openai.md
model: gpt-4o

# Optional: model parameters
temperature: 0.7
maxTokens: 1024
seed: 42

# Optional: system prompt prepended to all cases
setup:
  systemPrompt: |
    You are a helpful customer support assistant.
    Always be polite and professional.

# Optional: scenario-level variables (available to all cases)
# Case-level variables override these. Use the {{var_name}} syntax.
variables:
  company_name: "Acme Corp"
  default_greeting: "Hello"

# Required: the test cases to run
cases:
  # ---- Simple prompt/response case ----
  - id: simple-greeting
    name: Simple greeting test
    description: Test basic greeting response
    # The prompt to send to the model
    prompt: "Hello, I need help"
    # Expected result validation
    expected:
      type: contains
      values:
        - "help"
        - "assist"
      mode: any
    # Optional: tags for this case
    tags:
      - basic

  # ---- Case with regex matching ----
  - id: order-number-check
    name: Order number extraction
    prompt: "My order number is #12345"
    expected:
      type: regex
      pattern: "12345"
      flags: "i"

  # ---- Case with exact match ----
  - id: yes-no-response
    name: Binary response test
    prompt: "Reply with only 'Yes' or 'No': Is the sky blue?"
    expected:
      type: exact
      value: "Yes"
      caseSensitive: false

  # ---- Case with fuzzy matching ----
  - id: fuzzy-match-test
    name: Fuzzy similarity test
    prompt: "What color is grass?"
    expected:
      type: fuzzy
      value: "green"
      threshold: 0.8

  # ---- Case with LLM grading ----
  - id: quality-check
    name: Response quality evaluation
    prompt: "Explain quantum computing in simple terms"
    expected:
      type: llm_grader
      rubric: |
        Score 1.0 if the explanation is clear and accurate.
        Score 0.5 if partially correct but confusing.
        Score 0.0 if incorrect or overly technical.
      threshold: 0.7

  # ---- Case with JSON schema validation ----
  - id: json-output-test
    name: Structured output test
    prompt: "Return a JSON object with name and age fields"
    expected:
      type: json_schema
      schema:
        type: object
        properties:
          name:
            type: string
          age:
            type: number
        required:
          - name
          - age

  # ---- Multi-turn conversation ----
  - id: multi-turn-support
    name: Multi-turn conversation
    # Use an array of messages for multi-turn
    prompt:
      - role: user
        content: "I have a problem with my order"
      - role: assistant
        content: "I'd be happy to help. What's your order number?"
      - role: user
        content: "Order number is #99999"
    expected:
      type: contains
      values:
        - "99999"
      mode: any

  # ---- Case with variables ----
  - id: dynamic-content
    name: Variable substitution test
    # Case-level variables override scenario-level
    variables:
      product_name: "Widget Pro"
      order_id: "ORD-789"
    prompt: "What's the status of my {{product_name}} order {{order_id}}?"
    expected:
      type: contains
      values:
        - "ORD-789"
      mode: any

  # ---- Case with timeout and retries ----
  - id: slow-response-test
    name: Timeout handling test
    prompt: "Generate a detailed report"
    expected:
      type: contains
      values:
        - "report"
      mode: any
    timeout: 30000
    retries: 2
```

### Variables

Variables let you create dynamic, reusable scenarios. Use the `{{variable_name}}` syntax in prompts.

```yaml
name: customer-support
description: Test with dynamic content

# Scenario-level variables - available to all cases
variables:
  company_name: "Acme Corp"
  support_email: "support@acme.com"

cases:
  # Uses scenario-level variables
  - id: contact-info
    prompt: "What is the email for {{company_name}}?"
    expected:
      type: contains
      values:
        - "support@acme.com"
      mode: any

  # Case-level variables override scenario-level
  - id: different-company
    variables:
      company_name: "TechCorp"  # Overrides "Acme Corp"
      product: "Widget"
    prompt: "Tell me about {{product}} from {{company_name}}"
    expected:
      type: contains
      values:
        - "TechCorp"
      mode: any
```

Variable precedence: **case level > scenario level**

### Expected Types

| Type | Description | Key Fields |
|------|-------------|------------|
| `contains` | Response contains a string | `values: [...]`, `mode: all\|any` |
| `exact` | Response exactly equals the value | `value: "..."`, `caseSensitive: bool` |
| `regex` | Response matches a regex pattern | `pattern: "..."`, `flags: "i"` |
| `fuzzy` | Fuzzy string similarity | `value: "..."`, `threshold: 0.8` |
| `llm_grader` | LLM-based evaluation | `rubric: "..."`, `threshold: 0.7` |
| `json_schema` | Validate JSON structure | `schema: {...}` |

## CLI Commands

| Command | Description |
|---------|-------------|
| `artemiskit run <scenario>` | Run scenario-based evaluations |
| `artemiskit validate <scenario>` | Validate scenarios without running them |
| `artemiskit redteam <scenario>` | Run security red-team tests |
| `artemiskit stress <scenario>` | Run load/stress tests |
| `artemiskit report <run-id>` | Regenerate a report from a saved run |
| `artemiskit history` | View run history |
| `artemiskit compare <run1> <run2>` | Compare two runs |
| `artemiskit baseline` | Manage baselines for regression detection |
| `artemiskit init` | Initialize configuration |

Use `akit` as a shorter alias for `artemiskit`.

### Run Command Options

```
artemiskit run [options]

Options:
  -p, --provider     Provider: openai, azure-openai, vercel-ai
  -m, --model        Model to use
  -o, --output       Output directory for results
  -v, --verbose      Verbose output
  -t, --tags         Filter test cases by tags
  -c, --concurrency  Number of concurrent test cases (default: 1)
  --parallel         Number of scenarios to run in parallel
  --timeout          Timeout per test case in milliseconds
  --retries          Number of retries per test case
  --config           Path to config file
  --save             Save results to storage (default: true)
  --ci               CI mode: machine-readable output
  --baseline         Compare against baseline for regression
  --budget           Maximum budget in USD
  --export           Export format: markdown or junit
```

### Validate Command Options

```
artemiskit validate [options]

Options:
  --json          Output results as JSON
  --strict        Treat warnings as errors
  -q, --quiet     Only output errors
  --export junit  Export to JUnit XML for CI
```
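The `--baseline` flag together with the `ci.failOnRegression` and `regressionThreshold` config settings describe a regression gate. A minimal sketch of how such a gate could work is shown below; `RunSummary`, `passRate`, and `isRegression` are hypothetical names for illustration, not ArtemisKit's internals:

```typescript
// Hedged sketch of a baseline regression check. The threshold is a 0-1
// fraction, so 0.05 means "fail if the pass rate drops by more than
// five percentage points relative to the baseline run".
interface RunSummary {
  passed: number;
  total: number;
}

function passRate(run: RunSummary): number {
  return run.total === 0 ? 0 : run.passed / run.total;
}

function isRegression(baseline: RunSummary, current: RunSummary, threshold = 0.05): boolean {
  return passRate(baseline) - passRate(current) > threshold;
}
```

For example, a baseline of 9/10 passing against a current run of 8/10 is a 0.10 drop, which would trip the default 0.05 threshold.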
### CI/CD Integration

ArtemisKit supports CI/CD pipelines with machine-readable output and JUnit export:

```bash
# Machine-readable output for CI
akit run scenarios/ --ci

# Export JUnit XML for CI platforms
akit run scenarios/ --export junit --export-output ./test-results

# Validate scenarios before running
akit validate scenarios/ --strict --export junit
```

GitHub Actions example:

```yaml
- name: Validate scenarios
  run: akit validate scenarios/ --strict

- name: Run tests
  run: akit run scenarios/ --export junit --export-output ./test-results

- name: Publish Test Results
  uses: EnricoMi/publish-unit-test-result-action@v2
  if: always()
  with:
    files: test-results/*.xml
```

## Providers

ArtemisKit supports multiple LLM providers. See the [provider documentation](docs/providers/) for detailed setup guides.

| Provider | Use Case | Docs |
|----------|----------|------|
| `openai` | Direct OpenAI API | [docs/providers/openai.md](docs/providers/openai.md) |
| `azure-openai` | Azure OpenAI Service | [docs/providers/azure-openai.md](docs/providers/azure-openai.md) |
| `vercel-ai` | 20+ providers via the Vercel AI SDK | [docs/providers/vercel-ai.md](docs/providers/vercel-ai.md) |

### Quick Setup

**OpenAI:**

```bash
export OPENAI_API_KEY="sk-..."
akit run scenario.yaml --provider openai --model gpt-4o
```

**Azure OpenAI:**

```bash
export AZURE_OPENAI_API_KEY="..."
export AZURE_OPENAI_RESOURCE_NAME="my-resource"
export AZURE_OPENAI_DEPLOYMENT_NAME="gpt-4o-deployment"
akit run scenario.yaml --provider azure-openai --model gpt-4o
# Note: --model is display-only; the actual model is your deployment
```

**Vercel AI (any provider):**

```bash
export ANTHROPIC_API_KEY="sk-ant-..."
akit run scenario.yaml --provider vercel-ai --model anthropic:claude-3-5-sonnet-20241022
```
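The `vercel-ai` model string in the example above packs the underlying provider and the model id into one identifier, separated by a colon. A minimal sketch of how such an identifier could be split (illustrative only, not the adapter's actual parser; the `"openai"` fallback is an assumption mirroring the `underlyingProvider: openai` config default):

```typescript
// Illustrative split of a "provider:model" identifier such as
// "anthropic:claude-3-5-sonnet-20241022". Not ArtemisKit's real parser.
function splitModelId(id: string): { provider: string; model: string } {
  const sep = id.indexOf(":");
  // No prefix: assume the configured underlyingProvider (here "openai").
  if (sep === -1) return { provider: "openai", model: id };
  return { provider: id.slice(0, sep), model: id.slice(sep + 1) };
}
```

Under this reading, `anthropic:claude-3-5-sonnet-20241022` selects the Anthropic provider with model `claude-3-5-sonnet-20241022`, while a bare `gpt-4o` would fall through to the configured default provider.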
## Security Testing (Red Team)

Test your LLM for vulnerabilities:

```bash
akit redteam scenarios/my-bot.yaml --mutations typo,role-spoof,cot-injection
```

### Available Mutations

| Mutation | Description |
|----------|-------------|
| `typo` | Introduces typos to bypass filters |
| `role-spoof` | Attempts role/identity spoofing |
| `instruction-flip` | Reverses or negates instructions |
| `cot-injection` | Chain-of-thought injection attacks |

## Packages

ArtemisKit is a monorepo containing the following packages:

| Package | Description |
|---------|-------------|
| `@artemiskit/cli` | Command-line interface |
| `@artemiskit/core` | Core runner, types, and storage (internal) |
| `@artemiskit/sdk` | Programmatic SDK for TypeScript/JavaScript (coming soon) |
| `@artemiskit/reports` | HTML and JSON report generation |
| `@artemiskit/redteam` | Red-team mutation strategies |
| `@artemiskit/adapter-openai` | OpenAI/Azure provider adapter |
| `@artemiskit/adapter-vercel-ai` | Vercel AI SDK adapter |
| `@artemiskit/adapter-anthropic` | Anthropic provider adapter |

## Development

```bash
# Clone the repository
git clone https://github.com/artemiskit/artemiskit.git
cd artemiskit

# Install dependencies
bun install

# Build all packages
bun run build

# Run tests
bun test

# Type check
bun run typecheck

# Lint
bun run lint
```

## Roadmap

See [ROADMAP.md](ROADMAP.md) for the full development roadmap.

## Contributing

Contributions are welcome! Please read [CONTRIBUTING.md](CONTRIBUTING.md) before submitting a pull request.

## License

Apache-2.0 - see [LICENSE](LICENSE) for details.