csb1105/ai-redteam-artifacts

GitHub: csb1105/ai-redteam-artifacts

面向大语言模型红队评估的结构化资产库，通过对抗性prompt、多轮会话测试和解释稳定性评分体系，系统性探测LLM的故障模式、对齐漂移和解释不稳定性问题。

Stars: 1 | Forks: 1

# AI 红队资产 [![许可证](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](LICENSE) ![状态](https://img.shields.io/badge/status-active-green.svg) ![领域](https://img.shields.io/badge/domain-AI%20Red%20Team-critical.svg) 一个符合方法论且由证据驱动的资产库，用于 AI 红队行动和解释稳定性分析。本仓库涵盖了对抗性评估的完整生命周期：prompt 设计、多轮对话、故障模式分类、解释稳定性评分、纵向分析以及方法论原则。 ## 目的本仓库是一个**不断更新的 AI 红队和解释稳定性资产语料库**。它旨在： - **系统性探测** 模型的故障模式和解释不稳定性 - **记录证据** 以结构化、可审计的方式进行 - **分类并追踪** 故障模式随时间的变化 - **评估解释稳定性** 使用 D/C/A/S 指标 - **支持治理和原则** 适用于任务关键型部署 - **提供仪表板和控制台** 用于纵向和对比分析该结构体现了关注点分离： - `prompts/` — 对抗性和基线测试套件 - `sessions/` — 包含记录和元数据的具体测试运行 - `reports/` — 故障模式报告和综合文档 - `libraries/` — 机器可读的 prompt 和故障模式目录 - `docs/` — 方法论、词汇表、图表和原则 - `tools/` — 解析和分析工具 - `frontends/` — 仪表板、分析师控制台和 TypeScript API 类型 - `backend/` — 数据摄入管道、实时评分和 API 层 - `data/` — 符合解释稳定性 schema 的会话级 JSON ## 如何使用本仓库选择 prompt → 运行会话 → 保存记录 + 元数据 → 分类故障模式 → 生成报告 → 更新库 ## 仓库结构 ``` ai-redteam-artifacts/ ├── README.md ├── docs/ │ ├── methodology/ │ │ ├── longitudinal_stability_dashboard.md │ │ ├── model_comparison_dashboard.md │ │ ├── analyst_console.md │ │ ├── stability_ingestion_service.md │ │ ├── realtime_stability_scoring.md │ │ ├── instrumentation_index.md │ │ └── INSTRUMENTATION_README.md │ ├── glossaries/ │ └── diagrams/ ├── frontends/ │ ├── types/ │ │ └── stabilityApi.ts │ ├── dashboards/ │ └── analyst_console/ ├── backend/ │ ├── ingestion/ │ ├── realtime_scoring/ │ └── api/ ├── prompts/ │ ├── adversarial/ │ └── baseline/ ├── sessions/ ├── reports/ │ ├── failure_mode_reports/ │ └── synthesis/ ├── libraries/ ├── schemas/ │ └── interpretive_stability_schema.json └── data/ └── stability/ Workflow See docs/diagrams/redteam_cycle_diagram.md for the full red-team cycle. See docs/diagrams/failure_mode_tagging_pipeline.md for the transcript → tags → reports → synthesis pipeline. See docs/diagrams/failure_mode_decision_tree.md for the classification decision tree. See docs/diagrams/escalation_chain_propagation.md for escalation-chain modeling. See docs/diagrams/authority_erosion_ladder.md for the authority-erosion ladder. See docs/diagrams/constraint_decay_flow.md for constraint-decay modeling. See docs/diagrams/interpretive_drift_timeline.md for drift timelines. See docs/diagrams/system_constraint_flow.md for system-constraint flow. Architecture Interpretive Stability Schema: schemas/interpretive_stability_schema.json Stability Scoring Pipeline: docs/diagrams/stability_scoring_pipeline.md Ingestion Service Spec: docs/methodology/stability_ingestion_service.md Real-Time Scoring Spec: docs/methodology/realtime_stability_scoring.md Frontend API Types: frontends/types/stabilityApi.ts Backend API Layer: backend/api/ Doctrine Failure-Mode Interaction Matrix: docs/diagrams/failure_mode_interaction_matrix.md Severity Escalation Ladder: docs/diagrams/failure_mode_severity_escalation_ladder.md Meaning Architecture Instrumentation Index: docs/methodology/instrumentation_index.md Instrumentation Longitudinal Stability Dashboard: docs/methodology/longitudinal_stability_dashboard.md Model Comparison Dashboard: docs/methodology/model_comparison_dashboard.md Analyst Console: docs/methodology/analyst_console.md Full Instrumentation README: docs/methodology/INSTRUMENTATION_README.md ```

标签：AI治理, AI红队, AI风险评估, CISA项目, DevSecOps, JSON数据, LLM漏洞评估, MITM代理, TypeScript, 上游代理, 人工智能安全, 可解释性, 合规性, 多轮对话测试, 大语言模型安全, 安全合规, 安全插件, 密码管理, 对抗性机器学习, 对抗性测试, 对齐漂移, 提示词攻击, 故障模式分析, 机密管理, 系统稳定性, 网络代理, 网络安全, 自动化攻击, 防御加固, 隐私保护