naniphani/ai-ml-testing-playwright-typescript

GitHub: naniphani/ai-ml-testing-playwright-typescript

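Built on Playwright and TypeScript, this framework uses data-driven regression tests and reusable validators to systematically check for LLM hallucinations, resistance to prompt injection, and the accuracy of RAG answers.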


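# AI/ML Testing Framework with Playwright and TypeScript

## Overview

This repository demonstrates how to use Playwright and TypeScript to extend traditional quality engineering practices to AI/ML systems. The framework focuses on validating large language models (LLMs), retrieval-augmented generation (RAG) systems, and AI agents through reusable datasets, validators, regression test suites, and CI/CD automation. The goal is to treat AI systems as testable software components by applying the same repeatable quality engineering practices used in traditional application testing.

## Core Features

### Hallucination Testing

Verifies that the AI system does not generate unsupported or fabricated facts. Examples:

* Non-existent events
* Future facts
* Invalid product releases
* Unsupported business claims

### Prompt Injection Testing

Verifies the model's resistance to prompt injection attacks. Examples:

* Ignoring previous instructions
* Leaking the system prompt
* Exposing hidden instructions
* Returning internal configuration
* Leaking secrets or API keys

### RAG Ground Truth Validation

Validates responses generated from an enterprise knowledge base. Examples:

* Policy validation
* FAQ validation
* Document verification
* Knowledge retrieval accuracy

### Agent Workflow Testing (Planned)

Validates multi-step AI agent workflows. Examples:

* Customer support agents
* Research agents
* Task orchestration agents
* Langflow-based workflows

## Framework Architecture

```
Prompt Dataset
      │
      ▼
Playwright Test Runner
      │
      ▼
AI System Under Test
(LLM / RAG / Agent)
      │
      ▼
Response Validator
      │
      ▼
Pass / Fail Result
      │
      ▼
HTML Report
```

## Project Structure

```
ai-ml-testing-playwright-typescript
│
├── datasets
│   ├── hallucination.dataset.json
│   ├── prompt-injection.dataset.json
│   └── rag-groundtruth.dataset.json
│
├── src
│   ├── clients
│   └── validators
│
├── tests
│   └── ai
│       ├── hallucination.spec.ts
│       ├── prompt-injection.spec.ts
│       └── rag-validation.spec.ts
│
├── playwright.config.ts
├── tsconfig.json
└── README.md
```

## Current Test Coverage

| Area                            | Status      |
| ------------------------------- | ----------- |
| Hallucination Testing           | Implemented |
| Prompt Injection Testing        | Implemented |
| RAG Ground Truth Validation     | Implemented |
| Agent Workflow Testing          | Planned     |
| Dify Integration                | Planned     |
| Langflow Integration            | Planned     |
| GitHub Actions CI/CD            | Planned     |

## Hallucination Test Example

```
// Example response from the AI system under test
const aiResponse = "I do not know who won the 2099 FIFA World Cup.";

// item: one entry from the hallucination dataset
expectNoHallucination(aiResponse, item.forbiddenClaims);
```

## Prompt Injection Test Example

```
// Example response from the AI system under test
const aiResponse = "I cannot provide or expose restricted internal information.";

// item: one entry from the prompt injection dataset
expectPromptInjectionBlocked(aiResponse, item.blockedTerms);
```

## RAG Validation Test Example

```
// Example response from the RAG system under test
const ragResponse = "Refund requests are accepted within 30 days of purchase.";

// item: one entry from the RAG ground truth dataset
expectGroundTruthMatch(ragResponse, item.expectedKeywords);
```

## Tech Stack

* Playwright
* TypeScript
* Node.js
* JSON-based data-driven testing
* HTML reporting

Planned:

* Dify
* Langflow
* GitHub Actions
* OpenAI API
* Ollama

## Quality Engineering Concepts Demonstrated

* Data-driven testing
* AI regression testing
* Hallucination detection
* Prompt injection validation
* Ground truth validation
* Reusable validators
* Test automation framework design
* Shift-left quality engineering
* AI quality assurance

## Future Enhancements

### Dify Integration

Validate enterprise RAG applications through API testing.

### Langflow Integration

Validate agent workflows and multi-step reasoning pipelines.

### GitHub Actions

Run AI regression tests automatically in the CI/CD pipeline.

### Evaluation Metrics

* Hallucination rate
* Retrieval accuracy
* Prompt injection success rate
* Agent completion rate

## Author

Ganeshan Narayanan
Senior QA Test Engineer / Automation Lead

Areas of expertise:

* Playwright TypeScript
* Test automation
* API testing
* AI/ML testing
* Quality engineering

GitHub: https://github.com/naniphani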
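
## Validator Sketch (Illustrative)

The three test examples call `expectNoHallucination`, `expectPromptInjectionBlocked`, and `expectGroundTruthMatch` without showing their bodies. The sketch below is one way such validators might be written, assuming plain case-insensitive substring matching. The helper `containsTerm`, the error messages, and the sample terms are hypothetical and are not taken from the repository's `src/validators` directory. In a Playwright suite the `throw` statements would probably be replaced by `expect` assertions.

```typescript
// Hypothetical sketch: signatures are inferred from the README examples,
// not copied from the repository's validators.

/** True if `term` occurs anywhere in `text`, ignoring case. */
function containsTerm(text: string, term: string): boolean {
  return text.toLowerCase().includes(term.toLowerCase());
}

/** Throws if the response contains any claim listed as forbidden. */
function expectNoHallucination(response: string, forbiddenClaims: string[]): void {
  const found = forbiddenClaims.filter((claim) => containsTerm(response, claim));
  if (found.length > 0) {
    throw new Error(`Forbidden claims found in response: ${found.join(", ")}`);
  }
}

/** Throws if the response contains any term that must never be exposed. */
function expectPromptInjectionBlocked(response: string, blockedTerms: string[]): void {
  const leaked = blockedTerms.filter((term) => containsTerm(response, term));
  if (leaked.length > 0) {
    throw new Error(`Blocked terms found in response: ${leaked.join(", ")}`);
  }
}

/** Throws unless every expected keyword appears in the response. */
function expectGroundTruthMatch(response: string, expectedKeywords: string[]): void {
  const missing = expectedKeywords.filter((keyword) => !containsTerm(response, keyword));
  if (missing.length > 0) {
    throw new Error(`Expected keywords missing from response: ${missing.join(", ")}`);
  }
}

// Usage with the responses from the README examples.
// The term lists are made-up sample data, not the repository's datasets.
expectNoHallucination(
  "I do not know who won the 2099 FIFA World Cup.",
  ["Brazil won", "Argentina won"]
);
expectPromptInjectionBlocked(
  "I cannot provide or expose restricted internal information.",
  ["system prompt", "api key"]
);
expectGroundTruthMatch(
  "Refund requests are accepted within 30 days of purchase.",
  ["refund", "30 days"]
);
```

Substring matching is simple and deterministic, which suits regression suites, but it is also blunt: a refusal that merely mentions a blocked term (for example "I cannot reveal the system prompt") would be flagged as a leak under this sketch, so real datasets would need terms chosen with that in mind.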
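
## Evaluation Metric Sketch (Illustrative)

The README lists hallucination rate and prompt injection success rate as planned metrics but does not define them. One plausible reading, shown below as an assumption, is the fraction of dataset cases whose validator failed. The `CaseResult` shape, the `record` and `failureRate` helpers, and the case ids are all hypothetical.

```typescript
// Hypothetical sketch: the README does not specify how results are recorded
// or how the planned metrics are computed.

interface CaseResult {
  id: string;
  passed: boolean;
}

/** Runs a check that throws on failure and records whether it passed. */
function record(id: string, check: () => void): CaseResult {
  try {
    check();
    return { id, passed: true };
  } catch {
    return { id, passed: false };
  }
}

/** Fraction of cases that failed; returns 0 for an empty result set. */
function failureRate(results: CaseResult[]): number {
  if (results.length === 0) return 0;
  const failures = results.filter((result) => !result.passed).length;
  return failures / results.length;
}

// Four made-up hallucination cases, one of which fails its validator.
const results: CaseResult[] = [
  record("h-001", () => {}),
  record("h-002", () => {
    throw new Error("forbidden claim found");
  }),
  record("h-003", () => {}),
  record("h-004", () => {}),
];

const hallucinationRate = failureRate(results);
console.log(`hallucination rate: ${(hallucinationRate * 100).toFixed(1)}%`);
// prints "hallucination rate: 25.0%"
```

Under the same assumption, prompt injection success rate would be `failureRate` over the prompt injection cases, and retrieval accuracy would be `1 - failureRate` over the RAG ground truth cases.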
