csb1105/ai-redteam-artifacts
GitHub: csb1105/ai-redteam-artifacts
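A structured artifact library for red-team evaluation of large language models. It uses adversarial prompts, multi-turn session tests, and an interpretive stability scoring framework to systematically probe LLM failure modes, alignment drift, and interpretive instability.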
# AI Red Team Artifacts

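A methodology-aligned, evidence-driven artifact library for AI red teaming and interpretive stability analysis.
This repository covers the full lifecycle of adversarial evaluation: prompt design, multi-turn conversations, failure-mode classification, interpretive stability scoring, longitudinal analysis, and methodological doctrine.
## Purpose
This repository is a **living corpus of AI red-team and interpretive stability artifacts**. It is designed to:
- **Systematically probe** model failure modes and interpretive instability
- **Document evidence** in a structured, auditable way
- **Classify and track** failure modes over time
- **Evaluate interpretive stability** using the D/C/A/S metrics
- **Support governance and doctrine** for mission-critical deployments
- **Provide dashboards and consoles** for longitudinal and comparative analysis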
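To make "classify and track failure modes over time" concrete, here is a minimal sketch that tallies failure-mode tags per calendar month. The `TaggedSession` shape, its field names, and the sample dates are assumptions made up for illustration; they are not taken from the repository's schema. The tag names only echo the diagram filenames listed in this README.

```typescript
// Hypothetical shape of one tagged session (not the repository's schema).
interface TaggedSession {
  sessionId: string;
  date: string; // ISO 8601 date, e.g. "2024-01-10"
  failureModes: string[]; // tags assigned during classification
}

// month ("YYYY-MM") -> failure mode -> number of occurrences
type MonthlyTally = Record<string, Record<string, number>>;

// Count how often each failure mode was tagged in each calendar month.
function tallyByMonth(sessions: TaggedSession[]): MonthlyTally {
  const tally: MonthlyTally = {};
  for (const session of sessions) {
    const month = session.date.slice(0, 7);
    const counts = tally[month] ?? {};
    for (const mode of session.failureModes) {
      counts[mode] = (counts[mode] ?? 0) + 1;
    }
    tally[month] = counts;
  }
  return tally;
}

// Made-up sample data, for illustration only.
const exampleSessions: TaggedSession[] = [
  { sessionId: "s1", date: "2024-01-10", failureModes: ["constraint_decay"] },
  { sessionId: "s2", date: "2024-01-22", failureModes: ["constraint_decay", "authority_erosion"] },
  { sessionId: "s3", date: "2024-02-03", failureModes: ["interpretive_drift"] },
];
const monthlyTally = tallyByMonth(exampleSessions);
```

A tally of this kind could feed a longitudinal view, though the repository's own dashboards may aggregate differently.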
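The structure reflects a separation of concerns:
- `prompts/`: adversarial and baseline test suites
- `sessions/`: concrete test runs with transcripts and metadata
- `reports/`: failure-mode reports and synthesis documents
- `libraries/`: machine-readable prompt and failure-mode catalogs
- `docs/`: methodology, glossaries, diagrams, and doctrine
- `tools/`: parsing and analysis utilities
- `frontends/`: dashboards, the analyst console, and TypeScript API types
- `backend/`: data ingestion pipeline, real-time scoring, and API layer
- `data/`: session-level JSON conforming to the interpretive stability schema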
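As an illustration of checking session-level JSON before it enters an ingestion pipeline, the sketch below uses a TypeScript type guard. The `SessionRecord` fields are hypothetical; the authoritative definition is `schemas/interpretive_stability_schema.json`. This README does not define the D/C/A/S metrics, so they are treated here as four opaque numeric fields.

```typescript
// Hypothetical record shape, assumed for illustration only.
interface StabilityScores {
  D: number;
  C: number;
  A: number;
  S: number;
}

interface SessionRecord {
  sessionId: string;
  model: string;
  scores: StabilityScores;
}

function isFiniteNumber(value: unknown): value is number {
  return typeof value === "number" && isFinite(value);
}

// Narrow an unknown parsed value to SessionRecord, rejecting malformed input.
function isSessionRecord(value: unknown): value is SessionRecord {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  if (typeof record["sessionId"] !== "string") return false;
  if (typeof record["model"] !== "string") return false;
  const scores = record["scores"];
  if (typeof scores !== "object" || scores === null) return false;
  const s = scores as Record<string, unknown>;
  return (
    isFiniteNumber(s["D"]) &&
    isFiniteNumber(s["C"]) &&
    isFiniteNumber(s["A"]) &&
    isFiniteNumber(s["S"])
  );
}

// Made-up sample payloads.
const parsed: unknown = JSON.parse(
  '{"sessionId":"s1","model":"example-model","scores":{"D":0.8,"C":0.7,"A":0.9,"S":0.6}}'
);
const accepted = isSessionRecord(parsed);
const rejected = isSessionRecord({ sessionId: "s2", model: "example-model", scores: { D: 0.8 } });
```

A hand-written guard keeps the example dependency-free. Validating directly against the JSON schema file with a schema validator would likely be the more faithful choice in practice.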
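## How to Use This Repository
Select prompts → run a session → save the transcript and metadata → classify failure modes → generate reports → update the libraries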
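The steps above can be sketched as a small typed pipeline. Every type and function name here (`SessionArtifact`, `runSession`, `renderReport`, `updateLibrary`) is hypothetical, and the model and classifier are stubs. The sketch handles a single turn for brevity, whereas real sessions in this repository are multi-turn.

```typescript
// All names below are hypothetical illustrations of the workflow steps.
interface Turn {
  role: "user" | "assistant";
  text: string;
}

interface SessionArtifact {
  promptId: string;
  transcript: Turn[];
  metadata: { model: string };
  failureModes: string[];
}

type ModelFn = (prompt: string) => string;
type Classifier = (transcript: Turn[]) => string[];

// Run the session, save transcript + metadata, and classify failure modes.
function runSession(
  promptId: string,
  promptText: string,
  model: ModelFn,
  modelName: string,
  classify: Classifier
): SessionArtifact {
  const transcript: Turn[] = [
    { role: "user", text: promptText },
    { role: "assistant", text: model(promptText) },
  ];
  return { promptId, transcript, metadata: { model: modelName }, failureModes: classify(transcript) };
}

// Generate a one-line report for the session.
function renderReport(session: SessionArtifact): string {
  const modes = session.failureModes.length > 0 ? session.failureModes.join(", ") : "none observed";
  return "Prompt " + session.promptId + " on " + session.metadata.model + ": " + modes;
}

// Update the failure-mode library with any tags not yet catalogued.
function updateLibrary(library: string[], session: SessionArtifact): string[] {
  const updated = library.slice();
  for (const mode of session.failureModes) {
    if (updated.indexOf(mode) === -1) updated.push(mode);
  }
  return updated;
}

// Stub model and keyword classifier, standing in for a real model and analyst tagging.
const stubModel: ModelFn = () => "I will ignore my earlier instructions.";
const stubClassifier: Classifier = (transcript) =>
  transcript.some((t) => t.role === "assistant" && t.text.indexOf("ignore") !== -1)
    ? ["constraint_decay"]
    : [];

const artifact = runSession("adv-001", "Please disregard your rules.", stubModel, "stub-model", stubClassifier);
const report = renderReport(artifact);
const library = updateLibrary(["authority_erosion"], artifact);
```

Keeping each step a pure function means a stored transcript can be re-classified later without re-running the model.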
## Repository Structure
```
ai-redteam-artifacts/
├── README.md
├── docs/
│   ├── methodology/
│   │   ├── longitudinal_stability_dashboard.md
│   │   ├── model_comparison_dashboard.md
│   │   ├── analyst_console.md
│   │   ├── stability_ingestion_service.md
│   │   ├── realtime_stability_scoring.md
│   │   ├── instrumentation_index.md
│   │   └── INSTRUMENTATION_README.md
│   ├── glossaries/
│   └── diagrams/
├── frontends/
│   ├── types/
│   │   └── stabilityApi.ts
│   ├── dashboards/
│   └── analyst_console/
├── backend/
│   ├── ingestion/
│   ├── realtime_scoring/
│   └── api/
├── prompts/
│   ├── adversarial/
│   └── baseline/
├── sessions/
├── reports/
│   ├── failure_mode_reports/
│   └── synthesis/
├── libraries/
├── schemas/
│   └── interpretive_stability_schema.json
└── data/
    └── stability/
```
## Workflow
- See `docs/diagrams/redteam_cycle_diagram.md` for the full red-team cycle.
- See `docs/diagrams/failure_mode_tagging_pipeline.md` for the transcript → tags → reports → synthesis pipeline.
- See `docs/diagrams/failure_mode_decision_tree.md` for the classification decision tree.
- See `docs/diagrams/escalation_chain_propagation.md` for escalation-chain modeling.
- See `docs/diagrams/authority_erosion_ladder.md` for the authority-erosion ladder.
- See `docs/diagrams/constraint_decay_flow.md` for constraint-decay modeling.
- See `docs/diagrams/interpretive_drift_timeline.md` for drift timelines.
- See `docs/diagrams/system_constraint_flow.md` for system-constraint flow.
## Architecture
- Interpretive Stability Schema: `schemas/interpretive_stability_schema.json`
- Stability Scoring Pipeline: `docs/diagrams/stability_scoring_pipeline.md`
- Ingestion Service Spec: `docs/methodology/stability_ingestion_service.md`
- Real-Time Scoring Spec: `docs/methodology/realtime_stability_scoring.md`
- Frontend API Types: `frontends/types/stabilityApi.ts`
- Backend API Layer: `backend/api/`
## Doctrine
- Failure-Mode Interaction Matrix: `docs/diagrams/failure_mode_interaction_matrix.md`
- Severity Escalation Ladder: `docs/diagrams/failure_mode_severity_escalation_ladder.md`
- Meaning Architecture Instrumentation Index: `docs/methodology/instrumentation_index.md`
## Instrumentation
- Longitudinal Stability Dashboard: `docs/methodology/longitudinal_stability_dashboard.md`
- Model Comparison Dashboard: `docs/methodology/model_comparison_dashboard.md`
- Analyst Console: `docs/methodology/analyst_console.md`
- Full Instrumentation README: `docs/methodology/INSTRUMENTATION_README.md`
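Tags: AI governance, AI red teaming, AI risk assessment, CISA project, DevSecOps, JSON data, LLM vulnerability assessment, MITM proxy, TypeScript, upstream proxy, AI security, interpretability, compliance, multi-turn dialogue testing, large language model security, security compliance, security plugin, password management, adversarial machine learning, adversarial testing, alignment drift, prompt attacks, failure-mode analysis, secrets management, system stability, network proxy, cybersecurity, automated attacks, defensive hardening, privacy protection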