HustleDanie/LLM-Evaluation-Red-Teaming-Toolkit

GitHub: HustleDanie/LLM-Evaluation-Red-Teaming-Toolkit

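A full-stack LLM evaluation and red-teaming toolkit that automatically detects hallucination, toxicity, bias, and prompt injection, and outputs visual reports.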


# LLM Evaluation & Red-Teaming Toolkit

[![CI](https://github.com/YOUR_USERNAME/llm-eval-toolkit/actions/workflows/ci.yml/badge.svg)](https://github.com/YOUR_USERNAME/llm-eval-toolkit/actions/workflows/ci.yml) [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT) [![Python 3.13](https://img.shields.io/badge/python-3.13-blue.svg)](https://www.python.org/downloads/) [![Node 22](https://img.shields.io/badge/node-22_LTS-green.svg)](https://nodejs.org/)

A full-stack toolkit for automated evaluation of any LLM, testing for **hallucination rate**, **toxicity**, **bias**, and **prompt-injection vulnerabilities**. It includes a red-teaming suite and produces structured report cards with a visual dashboard.

![Dashboard Screenshot](https://raw.githubusercontent.com/HustleDanie/LLM-Evaluation-Red-Teaming-Toolkit/main/docs/screenshot.png)

## Features

- **Hallucination detection**: measures factual consistency using DeepEval's HallucinationMetric
- **Toxicity scoring**: flags harmful, offensive, or inappropriate content
- **Bias measurement**: detects gender, racial, and other biases
- **Prompt-injection testing**: probes for jailbreak and injection vulnerabilities
- **Red-teaming suite**: automated adversarial testing powered by Promptfoo and Garak
- **Report cards**: visual dashboard with radar charts, score badges, and exportable reports
- **Live progress**: evaluation progress streamed in real time over SSE
- **Multi-model support**: test Claude, GPT-4, Llama, and more through the LiteLLM gateway
- **Two API-key modes**: use the built-in demo key or bring your own

## Architecture

```
Next.js 16.2 Frontend ──REST + SSE──▶ FastAPI Backend
                                        ├── DeepEval (Evaluation Engine)
                                        ├── Promptfoo (Red-Teaming)
                                        ├── Garak (Vulnerability Scanner)
                                        ├── LiteLLM (LLM Gateway)
                                        └── PostgreSQL (Results DB)
```

## Tech Stack

| Layer | Technology | Version |
|-------|------------|---------|
| Frontend | Next.js (App Router) | 16.2 |
| UI | shadcn/ui + Tailwind CSS | v4 / v4.2 |
| Backend | FastAPI | 0.135.3 |
| Languages | Python / TypeScript | 3.13 / 5.x |
| Evaluation | DeepEval | 3.9+ |
| Red-teaming | Promptfoo + Garak | Latest |
| Database | PostgreSQL | 16 |
| LLM gateway | LiteLLM | Latest |
| Hosting | Vercel + Render | Free tier |

## Quick Start

### Prerequisites

- [Python 3.13+](https://www.python.org/downloads/)
- [uv](https://docs.astral.sh/uv/) (Python package manager)
- [Node.js 22+](https://nodejs.org/)
- [Docker](https://www.docker.com/) (optional, for a local PostgreSQL instance)

### Backend

```
cd backend
uv sync                                # Install dependencies
cp .env.example .env                   # Configure environment variables
uv run uvicorn app.main:app --reload   # Start dev server on :8000
```

### Frontend

```
cd frontend
npm install                   # Install dependencies
cp .env.example .env.local    # Configure environment variables
npm run dev                   # Start dev server on :3000
```

### Docker (full stack)

```
docker compose up --build     # Start backend + frontend + PostgreSQL
```

Open [http://localhost:3000](http://localhost:3000) for the dashboard and [http://localhost:8000/docs](http://localhost:8000/docs) for the API documentation.

## Project Structure

```
├── backend/                # FastAPI Python backend
│   ├── app/
│   │   ├── evaluators/     # DeepEval metric wrappers
│   │   ├── routes/         # API endpoints (REST + SSE)
│   │   ├── services/       # Business logic
│   │   ├── models/         # SQLAlchemy ORM models
│   │   └── schemas/        # Pydantic v2 schemas
│   ├── tests/              # pytest + DeepEval tests
│   └── promptfoo/          # Red-teaming configurations
├── frontend/               # Next.js 16.2 frontend
│   ├── app/                # App Router pages
│   ├── components/         # React components + shadcn/ui
│   ├── hooks/              # Custom React hooks (SSE, etc.)
│   └── lib/                # Utilities, API client, SSE helper
├── .claude/                # Claude Code project configuration
├── .github/                # CI/CD + Copilot instructions
└── docker-compose.yml      # Local development setup
```

## Contributing

1. Fork the repository
2. Create a feature branch (`git checkout -b feat/amazing-feature`)
3. Commit your changes (`git commit -m 'feat: add amazing feature'`)
4. Push to the branch (`git push origin feat/amazing-feature`)
5. Open a pull request

## License

This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.
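The live-progress feature streams evaluation status to the dashboard over Server-Sent Events (SSE). As a rough illustration of the wire format only, the sketch below serializes and parses progress events in plain Python. The event names (`progress`, `done`), the payload fields, and the helper functions are hypothetical and are not the toolkit's actual schema; the parser is deliberately simplified and handles only the `event:` and `data:` fields.

```python
import json
from typing import Iterator


def format_sse(event: str, data: dict) -> str:
    """Serialize one SSE message: a named event plus a compact JSON payload."""
    payload = json.dumps(data, separators=(",", ":"))
    # Each field is "name: value\n"; a blank line terminates the message.
    return f"event: {event}\ndata: {payload}\n\n"


def progress_events(metrics: list[str]) -> Iterator[str]:
    """Yield a hypothetical 'progress' event per metric, then a final 'done' event."""
    total = len(metrics)
    for i, name in enumerate(metrics, start=1):
        yield format_sse("progress", {"metric": name, "completed": i, "total": total})
    yield format_sse("done", {"total": total})


def parse_sse(stream: str) -> list[tuple[str, dict]]:
    """Simplified client-side parser: split on blank lines, read event/data fields."""
    events = []
    for block in stream.strip().split("\n\n"):
        event, data_lines = "message", []  # "message" is the SSE default event name
        for line in block.split("\n"):
            if line.startswith("event:"):
                event = line[len("event:"):].strip()
            elif line.startswith("data:"):
                data_lines.append(line[len("data:"):].strip())
        events.append((event, json.loads("\n".join(data_lines))))
    return events


if __name__ == "__main__":
    stream = "".join(progress_events(["hallucination", "toxicity", "bias", "prompt_injection"]))
    for name, payload in parse_sse(stream):
        print(name, payload)
```

In a FastAPI route, a generator like `progress_events` could be wrapped in `fastapi.responses.StreamingResponse(..., media_type="text/event-stream")`. On the frontend, the browser's built-in `EventSource` can consume the stream and register a listener per named event (for example, `addEventListener("progress", ...)`).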

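Tags: API keys, AV bypass, CI, Clair, DeepEval, FastAPI, Garak, LiteLLM, LLM evaluation, LLM evaluation suite, Node 22, Ollama, Promptfoo, Python 3.13, SSE, binary releases, AI safety, bias detection, full stack, compliance, multi-model support, LLM security, real-time streaming, open-source tools, report dashboard, prompt-injection testing, secrets management, test cases, structured reports, evaluation tools, radar charts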