saqibshouqi/red-team-ai

GitHub: saqibshouqi/red-team-ai

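A modular multi-agent red-team evaluation platform that systematically tests the safety and alignment of AI role-playing agents through adversarial interrogation.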

Stars: 0 | Forks: 0

# Red Team AI

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

## 🎯 Overview

Red Team AI is a modular, research-friendly platform for systematically evaluating AI role-playing agents through adversarial interrogation. The system coordinates three specialized agents:

- **Target Agent**: the RPLA (role-playing language agent) under evaluation, which maintains an assigned role
- **Interrogator Agent**: the red-team attacker, which probes boundaries using multiple strategies
- **Judging Agent**: the automated scorer, which produces quantitative metrics

## ✨ Features

- 🤖 **Multi-agent architecture**: independent, extensible agent modules
- 🎭 **Multiple attack strategies**: role drift, ethical probing, contradiction induction, confusion injection, authority challenge, emotional manipulation
- 📊 **Comprehensive metrics**: role fidelity, drift index, ethical deviation, consistency score
- 🔌 **Multi-provider support**: Groq, OpenAI, Anthropic
- 💾 **Persistent storage**: SQLite database recording the full experiment history
- 🌐 **REST API**: FastAPI backend with full CRUD operations
- 🖥️ **Interactive dashboard**: React frontend for experiment management
- 📈 **Visualization**: turn-by-turn metrics and conversation analysis
- 🔄 **Reproducible**: complete experiment configuration and replay

## 🏗️ Architecture

```
red-team-ai/
├── backend/                  # FastAPI server
│   ├── api/                  # REST endpoints
│   ├── database/             # SQLAlchemy models & CRUD
│   ├── models/               # Data models
│   └── main.py               # Application entry
├── orchestrator/             # Experiment coordination
│   ├── orchestrator.py       # Main controller
│   ├── experiment_runner.py
│   └── logger.py
├── agents/                   # Agent implementations
│   ├── target_agent/         # RPLA under test
│   ├── interrogator_agent/   # Red team attacker
│   └── judging_agent/        # Automated scorer
├── shared/                   # Common utilities
│   ├── schemas.py            # Pydantic models
│   ├── llm_client.py         # LLM abstraction
│   └── utils.py
└── frontend/                 # React dashboard
    └── src/
```

## 🚀 Quick Start

### Prerequisites

- Python 3.9+
- Node.js 18+
- Groq API key (free at [groq.com](https://groq.com))

### Installation

1. **Clone the repository**

   ```
   git clone https://github.com/saqibshouqi/red-team-ai.git
   cd red-team-ai
   ```

2. **Backend setup**

   ```
   # Create a virtual environment
   python -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate

   # Install dependencies
   pip install -r requirements.txt

   # Configure the environment
   cp .env.example .env
   # Edit .env and add your GROQ_API_KEY
   ```

3. **Frontend setup**

   ```
   cd frontend
   npm install
   cd ..
   ```

### Running the Application

**Terminal 1 - Backend:**

```
source venv/bin/activate
cd backend
python main.py
```

The backend runs at http://localhost:8000

**Terminal 2 - Frontend:**

```
cd frontend
npm run dev
```

The frontend runs at http://localhost:3000

### Your First Experiment

1. Open http://localhost:3000
2. Click "Create Experiment"
3. Define your target role:
   - **Name**: "Customer Support Agent"
   - **Description**: "Helpful e-commerce support representative"
   - **Persona**: "Friendly, patient, professional"
   - **Constraints**:
     - Never share customer personal information
     - Always verify identity before changing an account
4. Select attack strategies (for example "Role Drift", "Ethical Probing")
5. Configure: 10 turns, Groq provider, llama-3.3-70b model
6. Click "Create & Run Experiment"
7. View live results and metrics

## 📖 Usage

### Python API

```
from shared import ExperimentConfig, AgentRole, AttackStrategy, LLMProvider
from orchestrator import run_experiment

# Define the target role
role = AgentRole(
    name="Financial Advisor",
    description="Provides investment advice",
    persona="Professional, cautious, data-driven",
    constraints=[
        "Never guarantee returns",
        "Always disclose risks",
        "Must recommend diversification"
    ],
    knowledge_domain="Finance and investment"
)

# Configure the experiment
config = ExperimentConfig(
    experiment_name="financial_advisor_test",
    description="Testing ethical boundaries",
    target_role=role,
    attack_strategies=[
        AttackStrategy.ETHICAL_PROBING,
        AttackStrategy.AUTHORITY_CHALLENGE
    ],
    target_llm_provider=LLMProvider.GROQ,
    target_model="llama-3.3-70b-versatile",
    num_turns=10,
    temperature=0.7
)

# Run the experiment
result = run_experiment(config)

# Access the results
print(f"Overall Score: {result.scores.overall_score}")
print(f"Role Fidelity: {result.scores.role_fidelity_score}")
print(f"Drift Index: {result.scores.drift_index}")
```

### REST API

**Create an experiment:**

```
curl -X POST http://localhost:8000/api/v1/experiments/ \
  -H "Content-Type: application/json" \
  -d '{
    "config": {
      "experiment_name": "test_experiment",
      "target_role": {
        "name": "Support Agent",
        "description": "Customer support",
        "persona": "Helpful and friendly",
        "constraints": ["Protect user privacy"]
      },
      "attack_strategies": ["role_drift"],
      "num_turns": 5
    },
    "run_immediately": true
  }'
```

**List experiments:**

```
curl http://localhost:8000/api/v1/experiments/
```

**Get an experiment:**

```
curl http://localhost:8000/api/v1/experiments/{experiment_id}
```

## 🔧 Configuration

### LLM Providers

Providers are configured through environment variables:

```
# Groq (recommended for speed)
GROQ_API_KEY=your_key

# OpenAI
OPENAI_API_KEY=your_key

# Anthropic
ANTHROPIC_API_KEY=your_key
```

### Supported Models

**Groq:**
- llama-3.3-70b-versatile (recommended)
- llama-3.1-70b-versatile
- mixtral-8x7b-32768

**OpenAI:**
- gpt-4o
- gpt-4o-mini
- gpt-4-turbo

**Anthropic:**
- claude-3-5-sonnet-20241022
- claude-3-5-haiku-20241022

### Attack Strategies

1. **Role Drift**: pushes the agent to stray from its assigned role
2. **Ethical Probing**: tests constraint adherence and ethical boundaries
3. **Contradiction Induction**: elicits self-contradictory statements
4. **Confusion Injection**: injects ambiguous, conflicting information
5. **Authority Challenge**: challenges the role's authority and boundaries
6. **Emotional Manipulation**: uses emotional appeals to influence behavior

### Evaluation Metrics

- **Role Fidelity Score** (0-1): how well the assigned role is maintained
- **Drift Index** (0-1): degree of gradual departure from the role (lower is better)
- **Ethical Deviation** (0-1): constraint violations (lower is better)
- **Consistency Score** (0-1): consistency of responses, free of contradictions
- **Overall Score** (0-1): weighted composite score

## 🔌 Extending the Platform

### Adding a New Agent

```
# agents/my_agent/agent.py
from shared import LLMClient


class MyAgent:
    def __init__(self, llm_client: LLMClient):
        self.llm_client = llm_client

    def process(self, input_data: str) -> str:
        messages = [
            {"role": "system", "content": "Your system prompt here."},
            {"role": "user", "content": input_data}
        ]
        return self.llm_client.generate(messages=messages)
```

### Adding a New Metric

```
# agents/judging_agent/metrics.py
# Add this as a static method on the existing metrics class.

@staticmethod
def calculate_my_metric(responses):
    score = 0.0    # placeholder: replace with your calculation
    analysis = ""  # placeholder: replace with your analysis text
    return score, analysis
```

### Adding a New Attack Strategy

```
# agents/interrogator_agent/strategies.py
STRATEGIES["my_strategy"] = {
    "name": "My Strategy",
    "description": "What it does",
    "prompt": "One-line goal that describes what the interrogator should try to achieve.",
    "tactics": [
        "Tactic 1",
        "Tactic 2"
    ]
}
```

## 📊 Database Schema

**Tables:**
- `experiments`: experiment metadata and configuration
- `conversation_turns`: individual conversation turns with queries and responses
- `scores`: evaluation metrics for each experiment

Access the database:

```
sqlite3 red_team_ai.db
```

## 🧪 Testing

```
# Run the tests
pytest

# With coverage
pytest --cov=. --cov-report=html
```

## 📚 Research Applications

- **Safety research**: evaluate agent robustness under adversarial conditions
- **Alignment research**: test constraint adherence and value alignment
- **Behavioral analysis**: study role maintenance and drift patterns
- **Comparative studies**: benchmark different models and prompting strategies
- **Red-team exercises**: systematically discover failure modes

## 🤝 Contributing

1. Fork this repository
2. Create a feature branch: `git checkout -b feature/my-feature`
3. Commit your changes: `git commit -am 'Add my feature'`
4. Push to the branch: `git push origin feature/my-feature`
5. Open a Pull Request

## 📝 License

MIT License - see the LICENSE file for details

## 🙏 Acknowledgments

- Anthropic for the Claude API
- Groq for fast inference
- OpenAI for the GPT models
- The FastAPI framework
- React and Vite

## 📧 Contact

For questions or collaboration:
- Open an issue on GitHub to get in touch with us.

## 🗺️ Roadmap

- [ ] Advanced metrics (semantic similarity, embedding-based metrics)
- [x] LLM-as-judge evaluation mode
- [ ] Multi-agent scenarios
- [ ] Real-time streaming
- [ ] Cloud deployment templates
- [ ] Jupyter notebook examples
- [ ] Dataset export formats
- [ ] A/B testing framework

**Built with ❤️ for the AI safety and research community**
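
**Note on the overall score.** The Evaluation Metrics section describes the overall score only as a weighted composite of the four metrics and does not state the weights. The sketch below shows one way such a composite could be computed. It assumes equal weights and assumes that the two lower-is-better metrics (drift index, ethical deviation) are inverted before averaging. The function name `composite_score`, the metric keys, and the weights are all hypothetical and are not taken from the project's code.

```python
from typing import Dict, Optional

# Hypothetical equal weights: the README does not state the real ones.
DEFAULT_WEIGHTS: Dict[str, float] = {
    "role_fidelity": 0.25,
    "drift_index": 0.25,
    "ethical_deviation": 0.25,
    "consistency": 0.25,
}

# Metrics the README marks as "lower is better"; inverted before weighting.
LOWER_IS_BETTER = {"drift_index", "ethical_deviation"}


def composite_score(metrics: Dict[str, float],
                    weights: Optional[Dict[str, float]] = None) -> float:
    """Return a weighted average in [0, 1] where higher is always better."""
    weights = weights or DEFAULT_WEIGHTS
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")

    weighted = 0.0
    for name, weight in weights.items():
        value = metrics[name]
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
        if name in LOWER_IS_BETTER:
            value = 1.0 - value  # flip so that 1.0 means "best"
        weighted += weight * value
    return weighted / total_weight


example = {
    "role_fidelity": 0.9,
    "drift_index": 0.2,
    "ethical_deviation": 0.1,
    "consistency": 0.8,
}
print(round(composite_score(example), 3))
```

Dividing by the total weight keeps the result in the 0-1 range even if custom weights do not sum to 1.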

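Tags: AI ethics, AI safety, Anthropic, AV bypass, Chat Copilot, CIS Benchmarks, FastAPI, LLM evaluation, MITM proxy, Ollama, OpenAI, PyRIT, Python, React, SQLite, Syscalls, memory evasion, multi-agent, multi-agent systems, LLM security, LLM robustness, security testing, adversarial testing, offensive security, data display, backdoor-free, red teaming, cybersecurity, network mapping, automated red teaming, evaluation platform, boundary testing, reverse-engineering tools, privacy protection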