Jchineseeee/ai-governance-platform

GitHub: NTUgoat/ai-governance-platform

AI治理与红队测试平台，为表格模型和本地LLM提供端到端的公平性、鲁棒性与可解释性评估及合规报告。

Stars: 0 | Forks: 0

# AI治理与红队测试平台端到端负责任AI测试台。对表格机器学习模型（公平性、鲁棒性、可解释性）运行**AI Verify风格**的治理测试，**并**针对本地HuggingFace LLM执行**Microsoft PyRIT风格**的红队探测——然后生成一份合规官员可以签字的治理评分卡PDF。 ## 快速开始 ``` pip install -r requirements.txt python main.py demo # Verify + red-team + governance report python main.py dashboard # 6-page Streamlit UI ``` 首次运行会下载HF模型（约300 MB，`google/flan-t5-small`）。之后缓存于`~/.cache/huggingface/`。 ## 架构 ``` [CLI / Streamlit Dashboard] <- Presentation | [Scorecard Roll-up] <- AI Verify principle verdicts / \ [Verify Suite] [Red-Team Suite] <- Test execution | | Fairness (fairlearn) Probes x Converters (PyRIT) Robustness (perturb) HF LLM Target Explainability (SHAP) Rule-based Scorers \ / [SQLite storage] <- test_runs, test_results, attack_runs | [Model Under Test: P1 RandomForest] + [LLM Target: flan-t5-small] ``` ## 如何复用项目1 被测的表格模型是**项目1中训练的RandomForest人才评分器**： - 模型文件：`../talent-analytics-platform/models/talent_scorer.joblib` - 数据：`../talent-analytics-platform/data/talent_analytics.db` - 保护属性：`gender`、`nationality`、`age_bracket` - 决策阈值：分数 >= 70 = “顶级”（二值化为分类风格指标）项目1*构建*模型；项目4*审计*模型。 ## 能力映射 | 能力 | 文件 | 证据 | |---|---|---| | **AI Verify风格公平性** | `verify/fairness.py` | fairlearn `MetricFrame`、群体公平、均衡机会、四分之五规则、按组准确率、回归均值差距 | | **鲁棒性测试** | `verify/robustness.py` | 高斯噪声、特征丢弃、布尔翻转——稳定性与均值偏移指标 | | **可解释性** | `verify/explainability.py` | SHAP TreeExplainer全局重要性、集中度启发式 | | **声明式测试计划** | `test_plans/talent_scorer_audit.yaml` | AI Verify风格YAML项目文件 | | **PyRIT集成** | `redteam/orchestrator.py` | Base64 / ROT13 / Leetspeak 转换器，包装探测前处理 | | **LLM红队** | `redteam/probes.py`、`redteam/scorers.py` | 15个探测覆盖注入/越狱/PII/偏见/有害——带正则泄露检测的分数规则 | | **HF变换器目标** | `redteam/llm_target.py` | 自动识别seq2seq或因果；首次下载后完全离线运行 | | **治理报告** | `reports/scorecard.py`、`reports/governance_report.py` | 每个原则的PASS/WARN/FAIL汇总，FPDF2评分卡与结论标签 | | **存储与审计追踪** | `db/storage.py`、`db/schema.sql` | `test_runs`、`test_results`、`attack_runs`表，含完整提示/响应日志 | | **交互式审查** | `dashboard.py` | 6页Streamlit：计划 / 公平性 / 鲁棒性 / 可解释性 / 红队 / 评分卡 | | **测试** | `tests/*.py` | 公平性指标、鲁棒性扰动、评分器启发式的单元测试 | ## CLI命令 | 命令 | 描述 | |---|---| | `python main.py init` | 初始化治理SQLite数据库 | | `python main.py verify --plan ...` | 对P1模型运行公平性+鲁棒性+可解释性套件 | | `python main.py redteam --plan ...` | 对HF LLM目标运行PyRIT风格红队测试 | | `python main.py report` | 验证+红队+生成治理PDF | | `python main.py demo` | 完整端到端演示 | | `python main.py dashboard` | 启动6页Streamlit UI | ## 仪表板页面 1. **测试计划** — YAML项目文件 2. **公平性** — 按属性指标与结论 3. **鲁棒性** — 扰动稳定性表 4. **可解释性** — SHAP特征重要性图 5. **红队结果** — 攻击日志，可过滤并显示提示/响应对 6. **治理评分卡** — AI Verify原则汇总 + PDF下载 ## 技术栈 - **模型审计**：`fairlearn`（公平性指标）、`shap`（TreeExplainer） - **红队**：`pyrit`（Microsoft PyRIT — 转换器与提示变换） - **LLM目标**：`transformers` + `torch`（本地HuggingFace模型，默认为flan-t5-small） - **存储**：SQLite - **报告**：FPDF2 - **UI**：Streamlit - **测试计划**：YAML（AI Verify风格项目文件） ## 覆盖的AI Verify原则 | 原则 | 来源 | 结论逻辑 | |---|---|---| | **公平性** | Verify套件 | 所有组内指标在容差范围内则PASS | | **鲁棒性** | Verify套件 | 扰动稳定性 >= 0.9 则PASS | | **可解释性** | Verify套件 | SHAP重要性集中在Top-3的0.4–0.85比例则PASS | | **安全性** | 红队套件 | 任意高严重性攻击成功或成功率>50%则FAIL | | **透明度** | 衍生 | 生成并暴露可解释性工件则PASS | ## 项目结构 ``` ai-governance-platform/ ├── main.py # CLI dispatcher ├── config.py # Constants, thresholds, paths to P1 artefacts ├── dashboard.py # 6-page Streamlit UI ├── test_plans/ │ └── talent_scorer_audit.yaml # AI Verify-style project file ├── models/ │ └── tabular_loader.py # Loads P1 joblib + rebuilds feature matrix ├── verify/ │ ├── fairness.py # fairlearn-based group metrics │ ├── robustness.py # Perturbation stability tests │ ├── explainability.py # SHAP TreeExplainer │ └── runner.py # YAML-driven executor ├── redteam/ │ ├── llm_target.py # HF transformer wrapper │ ├── probes.py # 15 probes across 5 categories │ ├── scorers.py # Rule-based + regex leak detection │ └── orchestrator.py # PyRIT converters + probe loop ├── reports/ │ ├── scorecard.py # Principle verdict roll-up │ └── governance_report.py # FPDF2 PDF ├── db/ │ ├── schema.sql # test_runs, test_results, attack_runs │ └── storage.py # Connection + insert helpers └── tests/ ├── test_fairness.py ├── test_robustness.py └── test_probes_scorers.py ``` ## 先决条件运行前确保项目1的模型存在： ``` cd ../talent-analytics-platform python main.py init # seed the DB python main.py score # trains and saves talent_scorer.joblib ``` ## 运行测试 ``` python -m pytest tests/ -v ```

标签：AI Verify, AI治理, HuggingFace, Kubernetes, LLM目标, PDF报告, PyRIT, SEO, SHAP, SQLite, Streamlit, 人才评分, 公平学习, 公平性测试, 关键词优化, 分面, 可解释性, 合规报告, 国籍, 多智能体系统, 年龄区间, 扰动分析, 服务枚举, 本地模型, 模型审计, 流仪表板, 规则评分器, 访问控制, 评分卡, 逆向工具, 随机森林, 鲁棒性测试