nhomyk/AgenticQA

GitHub: nhomyk/AgenticQA

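A full-lifecycle security governance platform for AI agent development, combining constitutional governance, compliance scanning, red-team hardening, and self-healing CI.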

Stars: 1 | Forks: 0

# AgenticQA

**The world's first Agentic Development Lifecycle (ADLC) platform, a closed-loop, self-reinforcing cycle that autonomously governs, generates, scans, tests, self-heals, and ships features: Describe → Generate → Security Scan → Test → Self-Heal → Ship → Learn → Repeat.**

Built on constitutional governance, forensic decision traceability, adversarial red-team hardening, HIPAA/GDPR/EU AI Act compliance, prompt injection detection, LLM model regression testing, cryptographic output provenance, AI model SBOM generation, multi-agent trust graph analysis, API security hardening with a 3-layer middleware stack, agent safety modules (destructive action interception, scope leasing, instruction persistence), a concurrent worker pool with git worktree isolation, cross-language coverage mapping (11 languages), an MCP scanner supporting 6 languages, and a minimalist futuristic landing page that gives non-technical users a single-input entry point to the whole cycle. No LLM is involved in the governance layer.

[![CI Pipeline](https://static.pigsec.cn/wp-content/uploads/repos/2026/03/3b43771d7c174542.svg)](https://github.com/nhomyk/AgenticQA/actions/workflows/ci.yml)
[![Pipeline Validation](https://static.pigsec.cn/wp-content/uploads/repos/2026/03/8d4d9bf4e0174543.svg)](https://github.com/nhomyk/AgenticQA/actions/workflows/pipeline-validation.yml)
[![Security Scan](https://static.pigsec.cn/wp-content/uploads/repos/2026/03/caadbfe629174544.svg)](https://github.com/nhomyk/AgenticQA/actions/workflows/feature-request.yml)
[![OpenSSF Scorecard](https://api.securityscorecards.dev/projects/github.com/nhomyk/AgenticQA/badge)](https://securityscorecards.dev/viewer/?uri=github.com/nhomyk/AgenticQA)
[![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![2308+ Tests](https://img.shields.io/badge/tests-2308%2B-brightgreen.svg)](https://github.com/nhomyk/AgenticQA/actions)

## GitHub Marketplace Actions

Standalone security and compliance scanners built on AgenticQA: free, no API key required, with results shown directly in your GitHub Security tab.

| Action | What It Does | Marketplace |
|--------|-------------|-------------|
| **MCP Security Scan** | Scans MCP servers and AI agents for 24 vulnerability classes: tool poisoning, SSRF, prompt injection, data-flow taint analysis | [![Marketplace](https://img.shields.io/badge/Marketplace-mcp--scan--action-blue?logo=github)](https://github.com/marketplace/actions/mcp-security-scan) |
| **EU AI Act Compliance** | Annex III risk classification + Article 9/13/14/22 compliance checks; fines can reach €30M | [![Marketplace](https://img.shields.io/badge/Marketplace-eu--ai--act--check--action-blue?logo=github)](https://github.com/marketplace/actions/eu-ai-act-compliance-check) |
| **AgenticQA Architecture Scan** | Maps every integration point across 13 CWE categories: attack surface score, test coverage gaps, SARIF output | [![Marketplace](https://img.shields.io/badge/Marketplace-agenticqa--scan--action-blue?logo=github)](https://github.com/marketplace/actions/agenticqa-architecture-scan) |

```
# Complete AI system security coverage in just 3 lines:
- uses: nhomyk/agenticqa-scan-action@v1      # architecture map — 13 CWE categories
- uses: nhomyk/mcp-scan-action@v1            # MCP/AI-specific security threats
- uses: nhomyk/eu-ai-act-check-action@v1     # EU AI Act compliance
```

## ADLC: A Closed Loop, Not a Pipeline

Every software development methodology (SDLC, CI/CD, DevSecOps) was invented to answer the same question: *how do we ship faster without breaking what already works?* Each one assumes a human at every critical handoff.

The **Agentic Development Lifecycle (ADLC)** removes those handoffs. It is a closed-loop, self-reinforcing cycle:

```
┌─────────────────────────────────────────────────────────────┐
│                       THE ADLC CYCLE                        │
│                                                             │
│   1. DESCRIBE ──▶ 2. GENERATE ──▶ 3. SCAN                   │
│      ▲                                      │               │
│      │              AgenticQA               ▼               │
│   7. LEARN ◀── 6. SHIP IT ◀── 5. HEAL ◀── 4. TEST           │
│                                                             │
│  Every phase governed · Every output signed · No handoffs   │
└─────────────────────────────────────────────────────────────┘
```

Humans bookend the cycle: they *describe* the feature at the start and *review the verdict* at the end. Everything in between is governed autonomously. Each cycle makes the next one smarter: patterns accumulate, thresholds adapt, and risk profiles sharpen.

### The Three Problems It Solves

1. **No governance.** AI agents take actions (deploy, delete, delegate) with no enforceable law constraining them. The Agent Constitution enforces `ALLOW / REQUIRE_APPROVAL / DENY` before every action in under 5ms.
2. **No forensics.** When something goes wrong, there is no way to reconstruct *why* an agent made a decision. AgenticQA signs every output with HMAC-SHA256 and produces a forensic audit artifact for every trace.
3. **No learning loop.** Every run is isolated and knowledge never accumulates. AgenticQA's closed feedback loop means every outcome (pass, fail, fix) feeds back into the system's pattern memory, adaptive thresholds, and developer risk profiles.

## Breakthrough Features

### 🚀 GitHub Action: One-Line CI Security

Add AI agent security scanning to any repository with **one line**:

```
# .github/workflows/agenticqa-scan.yml
name: AgenticQA Security Scan
on: [push, pull_request]
permissions:
  contents: read
  security-events: write
jobs:
  security-scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: nhomyk/AgenticQA@main
        with:
          path: '.'
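          fail-on-critical: 'true'
          sarif: 'true'
```

**What you get for free:**

- **13 security scan categories** across 6 languages (Python, TypeScript, JavaScript, Go, Rust, Java/Kotlin)
- **SARIF upload**: results appear in GitHub's Code Scanning tab, alongside CodeQL
- **Automatic PR comments**: scan results are posted directly to the pull request
- **Incremental scanning**: baseline caching shows what changed since the last push
- **Security grade**: A+ to F, benchmarked against major AI frameworks
- **Artifact upload**: the full JSON report is retained for 30 days

| Input | Default | Description |
|---|---|---|
| `path` | `.` | Repository path to scan |
| `fail-on-critical` | `true` | Fail the workflow when critical findings are present |
| `sarif` | `false` | Generate SARIF for GitHub Code Scanning |
| `scanners` | `all` | Comma-separated list of scanners |
| `pr-comment` | `true` | Post results as a PR comment |
| `baseline` | *(auto)* | Previous scan JSON used for incremental comparison |

| Output | Description |
|---|---|
| `total-findings` | Total findings across all scanners |
| `critical-findings` | Number of critical findings |
| `risk-level` | Overall risk: low/medium/high/critical |
| `security-grade` | Grade from A+ to F |
| `security-percentile` | Percentile (0-100) relative to the benchmark frameworks |
| `detected-languages` | Comma-separated list of languages |

### 🔄 Concurrent Feature Request Pipeline: Ship Features Autonomously

Submit feature requests and AgenticQA generates, scans, tests, and commits the code in parallel through an isolated, concurrent worker pool:

```
# Trigger via GitHub Actions workflow_dispatch
gh workflow run feature-request.yml \
  -f features='["Add health check endpoint", "Add input validation", "Add request logging"]'
```

**Architecture:**

- **WorkerPool**: configurable concurrent workers, each running in an isolated git worktree
- **Atomic queue pickup**: `BEGIN IMMEDIATE` SQLite transactions prevent race conditions
- **Thread-safe store**: every DB method is guarded by a `threading.Lock`
- **Per-worker orchestration**: each worker independently runs the full ADLC cycle:
  1. Pre-scan agents (SRE, QA, Performance) analyze the codebase
  2. The LLM generates the implementation code
  3. Security scan + test generation loop (SDET)
  4. Git commit to a unique feature branch
  5. Workflow artifact generated at `.agenticqa/workflows/{id}.md`

```
# Check queue status
curl http://localhost:8000/api/workflow/queue-status
# → {"pending": 0, "in_progress": 2, "completed": 5, "failed": 0}

# Submit a feature request programmatically
curl -X POST http://localhost:8000/api/workflow/submit \
  -H "Content-Type: application/json" \
  -d '{"title": "Add OAuth support", "description": "Implement OAuth2 login flow"}'
```

**Lifecycle tracking:** `RECEIVED → PLANNED → APPROVED → QUEUED → IN_PROGRESS → COMPLETED`, with a full audit trail of lifecycle events, branch names, commit SHAs, and worker assignments.

### ✅ Client Deployment Preflight: 7-Probe Readiness Check

Before running AgenticQA against any target repository, verify that the environment is configured correctly:

```
python -m agenticqa.client_preflight --repo /path/to/client/repo
# ✅ git_repo             Repository is a valid git repo
# ✅ git_config           user.name and user.email are configured
# ✅ python_tooling       Python 3.11.0, pip 24.0, pytest 8.0.0
# ✅ node_tooling         Not required (no package.json)
# ✅ linter_availability  flake8 is available for Python
# ✅ path_sanitization    Path passes security validation
# ✅ import_chain         All core modules are importable

# JSON output for CI integration
python -m agenticqa.client_preflight --repo . --json

# Strict mode: also fail on warnings
python -m agenticqa.client_preflight --repo . \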
  --fail-on-warning
```

| Probe | Checks | Fail/Warn |
|---|---|---|
| `git_repo` | Is this a git repository? | fail |
| `git_config` | Are user.name and user.email set? | fail |
| `python_tooling` | Are Python, pip, and pytest available? | fail |
| `node_tooling` | If package.json exists: are node/npm available? | warn |
| `linter_availability` | Is flake8/eslint available for the detected languages? | warn |
| `path_sanitization` | Does the repository path pass `sanitize_repo_path()`? | fail |
| `import_chain` | Can `agenticqa.workflow_requests` be loaded? | fail |

Exit codes: `0` = healthy, `1` = critical failure, `2` = warnings only.

### ⚖️ Agent Constitution: An Industry First

No other agent platform in the world does this. AgenticQA ships with a **machine-readable governance document** that every agent enforces before it acts:

```
# Any agent, anywhere, can query the law
curl http://localhost:8000/api/system/constitution

# Pre-action check: instant ALLOW / REQUIRE_APPROVAL / DENY
curl -X POST http://localhost:8000/api/system/constitution/check \
  -H "Content-Type: application/json" \
  -d '{"action_type": "delete", "context": {"ci_status": "FAILED", "trace_id": "tr-001"}}'
# → {"verdict": "DENY", "law": "T1-001", "name": "no_destructive_without_ci", "reason": "..."}
```

**Three enforcement tiers, millisecond execution:**

| Tier | Verdict | Laws |
|---|---|---|
| **Tier 1** | `DENY` | No destructive actions without passing CI · Delegation depth ≤ 3 · No PII in logs · No untraced external writes · No self-modification · **Agent file scope violations** |
| **Tier 2** | `REQUIRE_APPROVAL` | Production deployments · Infrastructure changes · Bulk operations >1K records |
| **Tier 3** | Alert | Low confidence · High failure rate · RAG similarity drop |

The gate enforces semantic aliases and typo-resistant matching: `"delet"`, `"clean"`, `"wipe"`, `"release"`, `"ship"`, and `"sync"` all route to the correct law. This is the artifact enterprise buyers ask for when they evaluate an agent platform for SOC 2, HIPAA, and GDPR compliance. No competitor has this capability.

### 🔴 Red Team Agent: Adversarial Self-Hardening

AgenticQA's eighth agent probes its own governance stack for bypasses, patches the vulnerabilities it finds, and proposes constitutional amendments, all without human intervention:

```
curl -X POST http://localhost:8000/api/red-team/scan \
  -H "Content-Type: application/json" \
  -d '{"mode": "fast", "target": "both", "auto_patch": true}'
# → {"bypass_attempts": 20, "scanner_strength": 0.64, "gate_strength": 1.0,
#    "patches_applied": 3, "proposals_generated": 2,
#    "prompt_injection_surface": 0.0, "prompt_injection_findings": 0, "status": "patched"}
```

**20 bypass techniques across 4 attack categories:**

| Category | Techniques |
|---|---|
| **credential_obfuscation** | base64-encoded keys, split-field tokens, reversed keys, hex encoding, nested JSON |
| **shell_injection** | concatenated `rm`+`-rf`, base64 `curl\|bash`, environment variable indirection, newline-split curl, Unicode homoglyphs |
| **path_traversal** | URL-encoded `%2e%2e`, triple dots, Windows backslashes, null-byte injection |
| **constitutional_gate** | typo-squatted actions, destructive aliases, deployment aliases, depth passed as a string, bulk aliases, empty trace_id |

OutputScanner defends with a **4-pass decode architecture**: raw JSON → Unicode NFKC normalization → base64 decode → URL decode. Discovered bypass patterns are persisted to `.agenticqa/red_team_patterns.json` and loaded automatically on every future scan. Constitutional gaps are written to `.agenticqa/constitutional_proposals.json` for human review, because law T1-005 prevents agents from modifying their own governance files.

### 🔍 Prompt Injection Static Analysis

Every codebase scan now checks the **prompt injection attack surface**, meaning code paths where user-controlled input flows unfiltered into an LLM system prompt:

```
curl "http://localhost:8000/api/redteam/prompt-injection?repo_path=."
# → {"surface_score": 0.85, "total_findings": 3, "findings": [
#     {"rule_id": "PROMPT_INJECTION_SURFACE", "severity": "critical",
#      "file": "app/api/chat/route.ts", "line": 14,
#      "message": "User-controlled input concatenated directly into LLM prompt..."}]}
```

**4 detection rules (SARIF-native, security severity 7.0–9.5):**

| Rule | Severity | What It Catches |
|---|---|---|
| `PROMPT_INJECTION_SURFACE` | critical (9.5) | User input in an f-string / template string assigned directly to a `system`/`prompt` variable |
| `SYSTEM_PROMPT_OVERRIDE` | high (9.0) | A user-controlled `role:` field in the message array, which lets an attacker inject system-role messages |
| `TEMPLATE_INJECTION` | high (8.0) | `.format()` / `%` / Jinja2 template rendering with user-controlled data |
| `UNVALIDATED_LLM_OUTPUT` | medium (7.0) | LLM responses passed directly to `eval()`, `subprocess`, `os.system`, or `innerHTML` |

Results appear in the Red Team step summary of every CI run, alongside bypass attempts and scanner strength.

### 🛡️ Legal, HIPAA, and Regulatory Compliance Scanning

ComplianceAgent now runs **four static scanners** on every repository. They are pure Python with no subprocesses, and a 500-file codebase scans in under a second:

#### Legal Risk Scanner

```
curl "http://localhost:8000/api/compliance/legal-risk?repo_path=."
# → {"risk_score": 0.92, "total_findings": 6, "critical_findings": 4, "findings": [
#     {"rule_id": "CREDENTIAL_EXPOSURE", "severity": "critical",
#      "file": "app/api/route.ts", "line": 5,
#      "message": "Hardcoded MongoDB Atlas URI contains embedded credentials"}]}
```

| Rule | Severity | Detects |
|---|---|---|
| `CREDENTIAL_EXPOSURE` | critical/high | MongoDB Atlas URIs, AWS AKIA keys, OpenAI `sk-` keys, private key material, hardcoded passwords |
| `PII_DOCUMENT_PUBLIC` | critical/high | Legal/employment documents committed to `public/`, `static/`, or `assets/` directories |
| `PRIVILEGE_BREACH` | | A file-content read and an LLM API call within 30 lines of each other, which breaks attorney-client privilege (ABA Rule 1.6) |
| `SSRF_RISK` | medium | Hardcoded `localhost:PORT` URLs used as proxy targets; potential SSRF if the path is user-controlled |
| `NO_AUTH_ROUTE` | medium | Next.js/Express route handlers with no authentication check |

#### HIPAA PHI Scanner

```
curl "http://localhost:8000/api/compliance/hipaa?repo_path=."
# → {"risk_score": 0.95, "total_findings": 4, "critical_findings": 3}
```

| Rule | Severity | Detects |
|---|---|---|
| `PHI_HARDCODED` | critical | SSN literals (`XXX-XX-XXXX`) and DOB/MRN variable assignments in source code |
| `PHI_TO_LLM` | critical | PHI variable names within 30 lines of an LLM API call; requires a HIPAA BAA (§164.502(e)) |
| `PHI_DOCUMENT_PUBLIC` | critical | HL7, FHIR, or patient CSV/JSON files committed to web-accessible directories |
| `PHI_IN_LOGS` | high | PHI field names (`patient_id`, `diagnosis`, `ssn`) passed to a logging sink (§164.312(b)) |
| `HIPAA_AUDIT_MISSING` | high | Health data routes (`/api/patient/`, `/api/health/`) with no `audit_log()` call |

### 🇪🇺 EU AI Act Compliance Layer

**Full enforcement begins August 2026.** High-risk AI systems (employment, legal, credit, education, critical infrastructure) face fines of up to €30M or 6% of global turnover. AgenticQA generates compliance evidence automatically:

```
curl "http://localhost:8000/api/compliance/ai-act?repo_path=."
# → {"risk_category": "high_risk", "annex_iii_match": ["legal", "employment"],
#    "conformity_score": 0.25, "findings": [
#     {"article": "Art.9", "status": "missing", "severity": "critical",
#      "remediation": "Create RISK_MANAGEMENT.md with a risk register..."},
#     {"article": "Art.22", "status": "missing", "severity": "critical",
#      "remediation": "Add human_override() before any pass/fail decision..."}]}
```

| Article | Checks For | Missing → Severity |
|---|---|---|
| **Art. 9** | `RISK_MANAGEMENT.md`, a risk register, code-level fallback handlers | critical |
| **Art. 13** | An "AI-generated" disclosure in UI code or API responses | high |
| **Art. 14** | `require_human_review`, an override mechanism, audit logging | high |
| **Art. 22** | LLM output used as a pass/fail decision with no human override or appeal | critical |

Annex III classification scans the README, `package.json`, and config files to identify 7 high-risk categories (employment, legal, credit, education, critical infrastructure, biometrics, law enforcement).

### 🔐 AI Output Provenance: Cryptographic Chain of Custody

Every agent execution is automatically **signed and recorded**. Prove what the AI said, when it said it, and which model it used:

```
# Verify any output by hash
curl "http://localhost:8000/api/provenance/verify?output_hash=a3f9c1b2...&agent=sre_agent"
# → {"valid": true, "reason": "valid", "record": {
#     "model_id": "claude-sonnet-4-6", "agent_name": "sre_agent",
#     "timestamp": "2026-02-26T14:22:01+00:00", "run_id": "12345678",
#     "output_length": 2847}}

# Audit chain for an agent
curl "http://localhost:8000/api/provenance/chain?agent=compliance_agent&limit=20"
```

- **HMAC-SHA256** signature: `hash(output) | model_id | timestamp | agent_name` is signed with `AGENTICQA_PROVENANCE_SECRET`
- **Constant-time comparison** prevents timing attacks against verification
- **Tamper detection**: `signature_mismatch` / `not_found` / `valid` states
- Stored in `.agenticqa/provenance/{agent}.jsonl` and included in the CI learning cache

### 📐 LLM Model Regression Testing

When you swap models (Sonnet → Haiku, GPT-4o → GPT-4o-mini), does agent behavior regress? AgenticQA answers that quantitatively:

```
curl "http://localhost:8000/api/regression/compare?agent=sre_agent&baseline_model=claude-sonnet-4-6&candidate_model=claude-haiku-4-5"
# → {"similarity_score": 0.43, "regression_detected": true,
#    "threshold_used": 0.75, "has_baseline": true}
```

- **Golden snapshots** are captured automatically from `BaseAgent._record_execution()` on every successful run
- **Embedding strategy**: fastembed (local, no API calls) → TF-IDF 256-bucket cosine fallback
- **Threshold**: adaptive via `ThresholdCalibrator` (default 0.75); a run is flagged as a regression when similarity falls below the threshold
- Snapshots are stored in `TestArtifactStore` as `artifact_type="llm_golden"`, using the same learning pipeline as every other agent

### 🔬 Forensic AI Decision Audit Reports

Every agent execution produces a **shareable compliance artifact** with a stable audit ID that can be embedded directly in a pull request description:

```
curl "http://localhost:8000/api/observability/traces/{trace_id}/audit-report?format=markdown"
```

```
┌─────────────────────────────────────────────────────┐
│ AUDIT REPORT — audit_id: a3f9c1b2d4e8               │
│ Verdict: ✅ PASS                                    │
│ Decision Quality: 0.87 | Completeness: 0.94         │
│ Agents: QA_Assistant, SDET_Agent, DevOps_Agent      │
│ Root Causes: none                                   │
│ Recommendations: none                               │
└─────────────────────────────────────────────────────┘
```

This is not a log dump. It is a forensic verdict with SHA-256 traceability, counterfactual analysis ("what should the agent have done?"), and root-cause attribution.

### 🛠️ Self-Healing CI: Test Repair in the Loop

SREAgent resolves test failures through a subprocess-sandboxed repair loop:

1. Haiku reads the failing test and the error message (capped at 4000 characters)
2. It generates a patched version
3. The fix is verified in an isolated subprocess; production code is never touched until the fix is confirmed to pass
4. If the sandbox run passes, the fix is applied automatically and recorded in the artifact store

```
sre.execute({
    "file_path": "src/feature.py",
    "errors": [...],
    "failing_tests": [{"test_file": "tests/test_feature.py",
                       "test_name": "test_edge_case",
                       "error_message": "AssertionError: expected 42, got None"}]
})
# → {"fixes_applied": 3, "tests_repaired": 1, "test_repairs": [...]}
```

### 🏭 Agent Factory: Natural Language to Governed Agent

Describe an agent in plain English, and the factory scaffolds a fully governed, constitution-compliant agent class:

```
curl -X POST http://localhost:8000/api/agent-factory/from-prompt \
  -H "Content-Type: application/json" \
  -d '{"description": "An agent that monitors S3 bucket sizes and alerts when storage exceeds thresholds"}'
# → {"spec": {...}, "scaffold": "class StorageMonitor_Agent(BaseAgent): ...", "persisted": true}
```

The factory automatically inserts the agent's capabilities into the Task-Agent ontology, so new agent types are routable immediately with no manual YAML editing.

### 🔒 API Security Hardening: 3-Layer Middleware Stack

Every API request passes through three security middlewares before it reaches any endpoint:

```
# Authenticated request (token auto-generated on first start, printed to stderr)
curl -H "Authorization: Bearer $AGENTICQA_AUTH_TOKEN" \
  http://localhost:8000/api/agents/execute -X POST -d '{...}'

# Health + docs endpoints bypass authentication
curl http://localhost:8000/health   # always accessible
```

| Middleware | What It Does |
|---|---|
| **BearerTokenMiddleware** | Timing-safe `hmac.compare_digest` authentication; auto-generates a token if `AGENTICQA_AUTH_TOKEN` is unset; skips `/health` and `/docs`; disabled with `AGENTICQA_AUTH_DISABLE=1` |
| **OriginValidationMiddleware** | Localhost-only Origin header (DNS rebinding defense); requests with no Origin (curl, CI) always pass |
| **ResponseScanMiddleware** | Runs OutputScanner on every JSON response; soft mode adds a warning header; `AGENTICQA_RESPONSE_SCAN_STRICT=1` blocks responses that leak credentials |

Additional hardening:

- CORS `allow_origins` is locked to an explicit localhost list (no more `*`)
- `/api/agents/execute` runs a constitutional check before dispatch (`_constitutional_check("run_agents")`)
- Modeled on Docker's MCP-gateway defense pattern

### 🛡️ Agent Safety Modules: Runtime Guardrails

Three independent safety modules that compose for defense in depth:

#### Destructive Action Interceptor

```
# Preflight before any tool call
curl -X POST http://localhost:8000/api/safety/intercept \
  -d '{"tool": "rm", "args": ["-rf", "/data"], "agent": "sre_agent"}'
# → {"classification": "destructive", "requires_approval": true, "token": "abc123"}

# Human approval
curl -X POST http://localhost:8000/api/safety/approve/abc123
```

Every tool call is classified into one of 4 tiers: `safe → reversible → irreversible → destructive`. Destructive and irreversible actions require explicit human approval through a token-based queue.

#### Agent Scope Lease Manager

```
curl -X POST http://localhost:8000/api/safety/lease \
  -d '{"agent": "sre_agent", "max_reads": 100, "max_writes": 10, "max_deletes": 0, "ttl_seconds": 300}'
# → {"lease_id": "lease-abc", "expires_at": "2026-03-01T15:05:00Z"}
```

Hard per-agent operation caps: reads, writes, deletes, executes. `check_and_consume()` is a hard block with no soft warnings. Leases expire by TTL and can be revoked immediately.

#### Instruction Persistence Warden

```
curl -X POST http://localhost:8000/api/safety/warden/check \
  -d '{"agent": "compliance_agent", "context_usage_pct": 75}'
# → {"compaction_risk": "high", "constraint_drift": false,
#    "recommended_action": "re_inject", "guardrail_block": "..."}
```

Monitors context-window compaction risk at the 50%/75%/90% thresholds. Detects constraint drift (an agent forgetting its guardrails). Recommends `continue | re_inject | pause | terminate`. Generates a `GuardrailBlock` for re-injection into the compacted context.

### 📦 AI Model SBOM: AI Software Bill of Materials

Automatically discovers every AI model dependency in the codebase and flags licensing and deprecation risks:

```
curl "http://localhost:8000/api/compliance/ai-model-sbom?repo_path=."
# → {"models_found": 4, "total_findings": 2, "findings": [
#     {"rule_id": "RESTRICTED_LICENSE", "severity": "high",
#      "file": "src/model.py", "line": 12,
#      "message": "Model 'meta-llama/Llama-2-7b' uses a restricted license (Llama 2 Community)"}]}
```

| Detection | Coverage |
|---|---|
| **Import patterns** | 25+ provider patterns: OpenAI, Anthropic, Google, HuggingFace, Cohere, Replicate, Meta, Mistral, and more |
| **Model ID extraction** | `from_pretrained()`, `model=`, `GenerativeModel()`, pipeline strings |
| **License registry** | 50+ models with a known license classification |

| Finding | Severity | What It Flags |
|---|---|---|
| `UNKNOWN_LICENSE` | high | A model in use has no known license classification |
| `RESTRICTED_LICENSE` | high | A model with non-commercial or restricted license terms |
| `EXTERNAL_API` | medium | External API calls to a model provider (data egress risk) |
| `DEPRECATED_MODEL` | medium | A model version known to be deprecated |
| `UNVERSIONED_MODEL` | medium | A model reference with no pinned version |

### 🕸️ Multi-Agent Trust Graph: Framework-Aware Analysis

Detects multi-agent architectures across **14+ frameworks** and flags trust, delegation, and human-oversight violations:

```
curl "http://localhost:8000/api/redteam/agent-trust-graph?repo_path=."
# → {"frameworks_detected": ["langchain", "crewai"], "agents_found": 5,
#    "total_findings": 3, "findings": [
#     {"rule_id": "CIRCULAR_TRUST", "severity": "critical",
#      "message": "Circular delegation chain detected: agent_a → agent_b → agent_a"},
#     {"rule_id": "MISSING_HUMAN_IN_LOOP", "severity": "high",
#      "message": "No human override mechanism found; violates EU AI Act Art.14"}]}
```

| Frameworks Detected |
|---|
| LangGraph, CrewAI, AutoGen, LangChain, Swarm, Semantic Kernel, Haystack, DSPy, ControlFlow, BabyAGI, MetaGPT, CAMEL, TaskWeaver, OpenAI Assistants |

| Finding | Severity | What It Detects |
|---|---|---|
| `CIRCULAR_TRUST` | critical | DFS cycle detection in the delegation graph |
| `MISSING_HUMAN_IN_LOOP` | critical | No `require_human_review`, override, or approval mechanism; an EU AI Act Art.14 evidence gap |
| `UNCONSTRAINED_DELEGATION` | high | An agent can delegate to any other agent with no scope restriction |
| `PRIVILEGED_TOOL_ACCESS` | high | An agent has access to destructive tools (file deletion, shell execution, DB deletion) |
| `ESCALATION_PATH` | medium | A delegation chain reaches a higher-privilege agent without an approval gate |

### 🌍 MCP Scanner: Multi-Language Security Analysis

The MCP (Model Context Protocol) scanner now analyzes codebases in **6 languages**, detecting credential leaks, injection risks, and data-flow violations:

```
# Scan a Go/Rust/Java project
python -m agenticqa.security.mcp_scanner --repo /path/to/project
# → {"files_scanned": 63, "risk_score": 1.0, "findings": [
#     {"file": "interceptors.go", "pattern": "log.Logf credential",
#      "severity": "high", "language": "go"}]}
```

| Language | File Discovery | Key Patterns |
|---|---|---|
| **Python** | `*.py` | f-string keys, `os.environ` in prompts, `eval()`/`exec()` |
| **TypeScript/JavaScript** | `*.ts`, `*.js`, `*.tsx`, `*.jsx` | template string injection, `process.env` exposure, `innerHTML` |
| **Go** | `*.go` | `os.Getenv()` in prompts, `log.Logf` credential leaks, `/bin/sh -c` injection, `os.Environ()` |
| **Rust** | `*.rs` | `std::env::var` exposure, unsafe blocks containing user input, `Command::new` injection |
| **Java/Kotlin** | `*.java`, `*.kt` | `System.getenv()` in prompts, `Runtime.exec()`, JDBC connection strings |
| **Swift** | `*.swift` | `ProcessInfo.processInfo.environment` exposure |

The data-flow tracer follows source → transform → sink paths across function boundaries. Skipped directories: `vendor` (Go), `target` (Rust), `.gradle`/`.mvn` (Java).

### 🔧 TypeScript/JavaScript SRE: Full Linter Support

The SRE agent now supports TypeScript and JavaScript projects with the same auto-fix capability it has for Python:

```
sre.execute({
    "language": "typescript",
    "file_path": "src/app.ts",
    "errors": [{"rule": "no-var", "line": 5, "message": "Unexpected var, use let or const"}]
})
# → {"fixes_applied": 3, "fix_rate": 0.75, "architectural_violations": 1}
```

**Linter detection priority:** oxlint (via pnpm `node_modules`) → system oxlint → ESLint fallback. TypeScript is auto-detected when `tsconfig.json` exists or `language` starts with `ts`.

**30+ auto-fix rules:** `no-var`, `prefer-const`, `unicorn/*`, `typescript/*`, `@typescript-eslint/*`, and more.

**Architectural rules** (excluded from the fix rate as a deliberate design choice): `typescript/no-explicit-any`, `@typescript-eslint/no-explicit-any`, `no-shadow`, `complexity`, `oxc/no-accumulating-spread`, `import/no-cycle`

Both the oxlint `{"diagnostics":[...]}` format and the flat-array ESLint JSON format are supported. Rule objects such as `{"name":"no-var","plugin":"eslint"}` are normalized automatically.

### 🌐 Minimalist AI Landing Page: Bridging Non-Technical and Technical Users

A dark, minimalist landing page (`public/index.html`, served at `GET /`) makes AgenticQA accessible to product managers, security officers, and executives, with no terminal required:

```
python agent_api.py          # start API on :8000
open http://localhost:8000   # landing page
```

**Page features:**

- Animated hero section showing *"Ship features. Fearlessly."* over a cyan CSS grid (background `#080b10`)
- A single AI feature input: type a feature description and press Cmd+Enter
- **5-step progress overlay** animated in real time: Architecture → Security → Code Gen → Tests → Release
- **Verdict card** with a gradient border and 4 metric tiles: security · tests · coverage · time
- Returns `SHIP IT` (green) or `REVIEW REQUIRED` (amber) based on `POST /api/demo/submit`
- A "View in Dashboard →" link to the Streamlit dashboard on `:8501`
- A "For advanced users" section showcasing 6 dashboard modules

Non-technical users get a single-field UI. Advanced users get the full 16-page analytics dashboard. Same backend, same governance.

### 🚀 Autonomous Feature Pipeline: End to End

Describe a feature in plain English. The pipeline generates code, scans it, tests it, self-heals failures, and delivers a verdict, with zero manual steps:

```
python run_demo.py                               # uses pre-written stubs by default (no API key needed)
ANTHROPIC_API_KEY=sk-ant-... python run_demo.py  # real Claude Haiku generates the UI components
```

**8-step demo flow:**

1. Create a temporary repository
2. Run the 5-phase onboarding (architecture + security + coverage + test generation + baseline)
3. Submit the feature description via `POST /api/demo/submit`
4. Architecture scan → security scan → UI stub code generation
5. Generate UI tests (Streamlit AppTest / Jest / Vitest)
6. If any test fails → the LLM rewrites the failing code → security re-scan → re-test (up to `max_ui_retries=2` times)
7. Record the coverage delta
8. Verdict: `SHIP IT` or `REVIEW REQUIRED`

**Self-healing loop (in `agent_api.py` + `POST /api/pipeline/ui-test-scan`):**

```
for attempt in range(max_ui_retries):
    result = run_ui_tests(generated_code)
    if result["passed"]:
        break
    generated_code = llm_rewrite(generated_code, result["failures"])
    security_result = security_scan(generated_code)
```

**`POST /api/demo/submit`** is a public endpoint that requires no authentication:

```
curl -X POST http://localhost:8000/api/demo/submit \
  -H "Content-Type: application/json" \
  -d '{"description": "Add a login form with OAuth support"}'
# → {"verdict": "SHIP IT", "elapsed_s": 4.2,
#    "security": {"findings": 0}, "tests": {"passed": 3}, "coverage": 0.34}
```

### 📦 Repository Onboarding + Coverage Intelligence

Drop AgenticQA into any repository and get a full baseline with a single API call:

```
curl -X POST http://localhost:8000/api/onboarding/run \
  -H "Content-Type: application/json" \
  -d '{"repo_path": "."}'
# → {"phases_completed": 5, "architecture": {...}, "security": {...},
#    "coverage": {"mapped_files": 42, "coverage_pct": 0.33},
#    "generated_tests": 7, "baseline_delta": {"trend": "improving"}}
```

**5-phase onboarding orchestrator:**

| Phase | What It Does |
|---|---|
| **Architecture** | Scans imports, HTTP calls, ENV usage, and EXTERNAL_HTTP exposure |
| **Security** | 8 scans: legal risk · HIPAA · EU AI Act · prompt injection · CVE reachability · AI model SBOM · agent trust graph · MCP scanner |
| **Coverage** | Maps source files → test files; computes coverage % per language |
| **Test Generation** | The LLM generates tests for unmapped source files; validated by compilation + collect-only mode |
| **Baseline** | Captures a `BaselineDelta` snapshot at `~/.agenticqa/baselines/{repo_id}.json`; trend: improving / stable / declining |

**CoverageMapper** (`src/agenticqa/onboarding/coverage_mapper.py`) supports stem-variant matching across **11 languages**: Python, TypeScript, Go, Swift, Ruby, Java, Kotlin, JavaScript, Rust, C#, PHP. `AuthService.py` matches `test_auth.py`, `auth_test.py`, and `AuthServiceTest.java` with no configuration.

**Endpoints:**

- `POST /api/onboarding/run`: run the full 5-phase onboarding
- `GET /api/onboarding/status`: retrieve the stored baseline and the last delta

**Dashboard "Onboarding" page:** Architecture | Security | Coverage | Generated Tests | Baseline Delta tabs.

### 🧠 Agents That Learn, No Retraining Required

**Case-Based Reasoning (CBR)**: deterministic pattern matching against historical embeddings. No retraining. No drift.

| | LLM-Based Agents | AgenticQA |
|---|---|---|
| **Cost per 1K decisions** | $30–100 | **$1** |
| **Latency** | 2–5 seconds | **10–50ms** |
| **Deterministic?** | No | **Yes** |
| **Works offline?** | No | **Yes** |
| **Gets better over time?** | Requires retraining | **Automatic** |

**Closed-loop ML learning cycle (5 stages):** feedback loop → adaptive thresholds → pattern-driven execution → GraphRAG-informed delegation → adaptive strategy selection (aggressive / standard / conservative).

### 🏥 DataflowHealthMonitor: Ontology-Aware Infrastructure Health

```
python -m agenticqa.monitoring.dataflow_health
# → ✅ qdrant          vector_store  healthy  786 pts (critical)
# → ✅ weaviate        vector_store  healthy  v1.27.0 (secondary)
# → ✅ neo4j           graph_db      healthy  delegation store
# → ✅ artifact_store  file_system   healthy  1534 artifacts

curl http://localhost:8000/api/health/dataflow
# → {"healthy": true, "broken_nodes": [], "affected_agents": {}}
```

When Qdrant goes down, the response lists all 8 affected agents. When Neo4j fails, it lists only the 4 delegation-capable agents. **The monitor reads the same ontology the agents use.**

### 📊 SARIF 2.1.0 Export: Findings in GitHub Code Scanning

AgenticQA findings appear natively in **GitHub's Code Scanning dashboard**, alongside CodeQL, with rule IDs, line numbers, and security severity scores:

```
python -m agenticqa.export.sarif --sre sre-output.json --compliance compliance-output.json \
  --redteam redteam-output.json --out \
  results.sarif
```

**25+ SARIF security-severity mappings**: linting, shell, bandit, CVE, legal risk, HIPAA PHI, prompt injection, EU AI Act, and AI output provenance (`UNATTESTED_OUTPUT = 8.5`).

## System Architecture

Eight specialized agents operate under constitutional governance across your entire CI/CD pipeline:

```
Natural-language prompt / CI trigger / GitHub Action
                          │
                          ▼
┌─────────────────────────────────────────────────────┐
│ API Security Middleware (3 layers)                  │
│ Bearer auth · Origin validation · Response scanning │
└─────────────────────────────────────────────────────┘
                          │
                          ▼
┌─────────────────────────────────────────────────────┐
│ ConstitutionalGate + OutputScanner (Red Team)       │
│ pre-action ALLOW/DENY · 4-pass decode · <5ms        │
└─────────────────────────────────────────────────────┘
                          │
                          ▼
┌─────────────────────────────────────────────────────┐
│ Agent Safety Modules                                │
│ DestructiveActionInterceptor · ScopeLeaseManager    │
│ InstructionPersistenceWarden · approval queue       │
└─────────────────────────────────────────────────────┘
       │
  ┌────┴──────────────────────────────────────┐
  ▼            ▼             ▼                ▼
 SDET          QA        Fullstack         RedTeam
 Agent        Agent        Agent            Agent
  │                                           │
 delegates                          Adversarial probes
  ▼                                 + trust graph analysis
 SRE ←──── Self-Healing ────────────┘
 Agent     CI Repair (Python + TS/JS)
  │
  ▼
 Compliance ── Legal · HIPAA · EU AI Act · SBOM ──▶ violations[]
 Agent         provenance + regression + trust graph
  │
  ▼
 DevOps      Performance
 Agent         Agent
  │
  ▼
┌──────────────────────────────────────────────┐
│ Hybrid RAG Layer                             │
│ Qdrant (primary) · Neo4j (graph) · SQLite    │
│ Model Regression · Output Provenance         │
│ Learning Metrics · Developer Profiles        │
│ Org Memory · Repo Profiles                   │
└──────────────────────────────────────────────┘
  │
  ▼
┌──────────────────────────────────────────────┐
│ Concurrent Feature Request Pipeline          │
│ WorkerPool · git worktree isolation          │
│ Atomic SQLite pickup · thread-safe store     │
└──────────────────────────────────────────────┘
  │
  ▼
SARIF → GitHub Code Scanning + PR Comment + Security Grade
```

## The ADLC Cycle: What Each Phase Does

| Phase | Agent(s) | What Happens | Feeds Back Into |
|---|---|---|---|
| **1. Describe** | *(user)* | Describes the feature in plain English through the landing page | Feature → code prompt |
| **2. Generate** | Fullstack + LLM | Generates UI code and writes it to the repository | Security scan input |
| **3. Scan** | ComplianceAgent + ArchitectureScanner | 13 security scan categories across 6 languages; context-aware severity | Verdict gate |
| **4. Test** | SDET + FrontendTestRunner | Auto-generates tests and runs them headlessly | Self-heal trigger |
| **5. Heal** | SRE + LLM | Failing test → LLM rewrite → security re-scan → re-test (up to 2 loops) | Updated code + test results |
| **6. Ship** | QA + ConstitutionalGate | Issues the SHIP IT / REVIEW REQUIRED verdict; signs the provenance chain | Artifact store |
| **7. Learn** | All agents | Pass/fail outcomes feed adaptive thresholds, developer profiles, and org memory | Future cycle Phase 1 |

**Every cycle strengthens the next.** After 50 cycles: 96% pattern confidence, per-developer risk profiles, and cross-repository institutional memory.

## Key Results

| Capability | What It Does | Result |
|---|---|---|
| **Model Regression** | Golden snapshots + cosine similarity on model swaps | Regressions detected before deployment |
| **Red Team Hardening** | 20 adversarial bypass attempts, scanner patching, amendment proposals | gate_strength 100%, scanner_strength 64%+ |
| **Pattern Learning** | Closed-loop feedback: boost/penalize documents, adaptive thresholds | 96% confidence after 50 deployments |
| **Constitutional Enforcement** | Pre-action check before every destructive operation | Zero unauthorized destructive actions |
| **Autonomous UI Generation** | LLM builds the frontend from a description, 0 critical security findings | SHIP IT in under 5 seconds |
| **UI Self-Healing** | Auto-generates tests, rewrites failing code, re-validates | 0 human interventions required |
| **Coverage Mapping** | Source-file → test-file stem matching across 11 languages | 0.0% → 33%+ on the first feature |
| **GitHub Action** | One-line CI integration with SARIF, PR comments, incremental scanning, security grade | Added to any repository in 30 seconds |
| **Concurrent Pipeline** | WorkerPool with git worktree isolation, atomic SQLite pickup, thread-safe store | 3+ features built in parallel |
| **API Security** | 3-layer middleware: Bearer auth, Origin validation, response scanning | Zero credential leaks in responses |
| **Agent Safety** | Destructive action interception, scope leasing, instruction persistence warden | Hard caps on agent operations |
| **AI Model SBOM** | 25+ provider patterns, 50+ model license registry, 5 finding types | Every AI dependency cataloged |
| **Trust Graph** | 14+ multi-agent frameworks, DFS cycle detection, EU AI Act Art.14 evidence | Circular delegation blocked |
| **MCP Multi-Language** | Scans Go/Rust/Java/Kotlin/Swift alongside Python/TS/JS | 6 languages, unified risk score |
| **Client Preflight** | 7-probe deployment readiness check with a CLI | Catches CI failures before they happen |

## 16-Page Analytics Dashboard

```
streamlit run dashboard/app.py
```

| Page | What It Shows |
|---|---|
| **Operator Console** | Prompt intake → approve → queue → execute → replay. Trace explorer with timeline, audit report generator. Feature request pipeline status |
| **Governance** | Agent Constitution viewer, interactive pre-action check simulator, agent scope explorer |
| **System Overview** | Stack anatomy, framework matrix, test coverage, live agent metrics |
| **Collaboration** | Interactive delegation network graph + chain tracing |
| **Performance** | Bottleneck detection, latency trends, per-agent health scores |
| **GraphRAG** | Hybrid RAG architecture diagram + live recommendation engine |
| **Ontology** | Design versus reality: expected paths vs actual delegation usage |
| **Pipeline** | Data flow, 7-layer security architecture, API connectivity tester |
| **Red Team** | Mode/target/auto-patch controls · scanner + gate strength gauges · vulnerability table · prompt injection findings · trust graph visualization |
| **Agent Learning** | Developer risk charts · org memory panel · compliance drift · repository fix rate · learning metrics · temporal graph |
| **Onboarding** | Architecture · Security · Coverage · Generated Tests · Baseline Delta tabs |
| **Compliance Scan** | Legal risk · HIPAA · EU AI Act · CVE reachability · AI model SBOM · trust graph findings |
| **Architecture Scan** | Import graph · HTTP exposure · ENV secrets · attack surface score · 13 categories · 6 languages |
| **Agent Safety** | Destructive action approval queue · scope lease creator · instruction persistence warden · interceptor simulator |
| **Release Readiness** | Pre-release risk score · developer profiles · org memory · violation prediction |
| **Agent Factory** | Natural-language agent builder · spec preview · scaffold viewer |

## Quick Start

### Add to Any Repository

```
# .github/workflows/agenticqa-scan.yml
- uses: nhomyk/AgenticQA@main
  with:
    sarif: 'true'
```

That's it. SARIF findings appear in GitHub Code Scanning, and PR comments are posted automatically.

### Run Locally

```
git clone https://github.com/nhomyk/AgenticQA.git
cd AgenticQA
pip install -e .

# Start infrastructure (optional; most features run without it)
docker compose -f docker-compose.weaviate.yml up -d

# Run the 2308+ unit tests
pytest tests/ -m unit -v

# Start the control plane
uvicorn agent_api:app --host 0.0.0.0 --port 8000

# Open the landing page (no API key needed)
open http://localhost:8000

# Run the full client demo
python run_demo.py

# Start the dashboard
streamlit run dashboard/app.py

# Run the deployment preflight against any target repository
python -m agenticqa.client_preflight --repo /path/to/repo
```

### Onboard Any Repository in 3 Commands

```
agenticqa bootstrap --repo .           # generate CI wiring + config
agenticqa ingest-junit results.xml     # convert existing test output
agenticqa doctor --repo .
# readiness check with fix commands
```

## Core API (140+ Endpoints)

**Governance**

- `GET /api/system/constitution`: the machine-readable law set
- `POST /api/system/constitution/check`: pre-action check: `ALLOW / REQUIRE_APPROVAL / DENY`
- `GET /api/system/agent-scopes`: per-agent file access scopes (8 agents)
- `POST /api/system/agent-scopes/check`: scope check for agent × action × file path

**Compliance and Regulatory Scanning**

- `GET /api/compliance/legal-risk`: credentials, PII documents, privilege breaches, SSRF, missing authentication
- `GET /api/compliance/hipaa`: PHI_HARDCODED, PHI_TO_LLM, PHI_IN_LOGS, HIPAA_AUDIT_MISSING
- `GET /api/compliance/ai-act`: EU AI Act Annex III classification + Article 9/13/14/22 conformity score
- `GET /api/compliance/ai-model-sbom`: AI model SBOM: 25+ providers, license registry, 5 finding types

**Security and Adversarial**

- `POST /api/red-team/scan`: adversarial scan (mode: fast|thorough, target: scanner|gate|both, auto_patch: bool)
- `GET /api/redteam/prompt-injection`: static prompt injection attack-surface scan (4 rules, SARIF-native)
- `GET /api/redteam/agent-trust-graph`: multi-agent trust graph analysis (14+ frameworks, cycle detection)
- `POST /api/export/sarif`: convert agent results to SARIF 2.1.0 for GitHub Code Scanning
- `GET /api/security/cve-reachability`: import-level AST + pip-audit/npm-audit CVE analysis

**AI Output Provenance**

- `GET /api/provenance/verify`: verify an output hash against the signed provenance log
- `GET /api/provenance/chain`: the most recent N provenance records for an agent

**Model Regression**

- `GET /api/regression/compare`: cosine similarity between baseline and candidate model outputs

**Agent Safety**

- `POST /api/safety/intercept`: pre-classify a tool call (safe/reversible/irreversible/destructive)
- `GET /api/safety/pending`: list pending approval requests
- `POST /api/safety/approve/{token}`: approve a destructive action by token
- `POST /api/safety/deny/{token}`: deny a destructive action by token
- `POST /api/safety/lease`: create an agent scope lease (read/write/delete/execute caps + TTL)
- `GET /api/safety/lease/{id}`: get lease status
- `DELETE /api/safety/lease/{id}`: revoke a lease
- `POST /api/safety/warden/register`: register agent guardrails for persistence monitoring
- `POST /api/safety/warden/check`: check compaction risk + constraint drift
- `GET /api/safety/warden/prompt`: generate a guardrail re-injection block

**Feature Request Pipeline**

- `POST /api/workflow/submit`: submit a feature request to the queue
- `GET /api/workflow/queue-status`: pending/in_progress/completed/failed counts
- `GET /api/workflow/request/{id}`: get request details + lifecycle events
- `GET /api/workflow/requests`: list all requests, with filters

**Observability and Audit**

- `GET /api/observability/traces/{id}/audit-report`: forensic compliance artifact with a stable audit ID
- `GET /api/observability/traces/{id}/counterfactuals`: "what should the agent have done?" analysis
- `GET /api/health/dataflow`: ontology-aware infrastructure health; returns 503 on critical failure
- `GET /api/temporal/violations`: temporal violation graph snapshot from Neo4j
- `GET /api/learning-metrics`: learning metrics history + improvement curve
- `GET /api/developer-profiles`: per-developer EWMA risk profiles via git blame
- `GET /api/org-memory`: cross-repository organizational memory and unfixable rules
- `GET /api/repo-profile`: per-repository EWMA fix rate and run history

**Agent Factory**

- `POST /api/agent-factory/from-prompt`: natural-language description → scaffold → persisted agent

**GitHub Integration**

- `POST /api/github/pr-comment`: post scan results as a PR comment (updated via `--edit-last`)
- `POST /api/github/pr-inline-comments`: post inline review comments with severity icons

**Landing Page and Onboarding**

- `GET /`: landing page (public, no authentication)
- `POST /api/demo/submit`: lightweight autonomous pipeline for the landing page (public, no authentication); returns `{ verdict, elapsed_s, security, tests, coverage }`
- `POST /api/onboarding/run`: 5-phase onboarding: architecture + 7 security scans + coverage mapping + LLM test generation + baseline snapshot
- `GET /api/onboarding/status`: retrieve the stored baseline and the last `BaselineDelta` trend
- `POST /api/pipeline/ui-test-scan`: standalone UI test scan with the autonomous self-healing loop

## Security Architecture

```
┌──────────────────────────────────────────────────────────┐
│  0. API Security Middleware (3 layers)                   │
│     BearerToken auth · Origin validation · Response scan │
├──────────────────────────────────────────────────────────┤
│  0b. Constitutional Gate                                 │
│      pre-action ALLOW/DENY · 6 Tier 1 laws · agent scopes│
├──────────────────────────────────────────────────────────┤
│  0c. OutputScanner (Red Team hardened)                   │
│      4-pass decode: raw → NFKC → base64 → URL            │
│      13 danger patterns + learned bypass patterns        │
├──────────────────────────────────────────────────────────┤
│  0d. Agent Safety Modules                                │
│      DestructiveActionInterceptor · ScopeLeaseManager    │
│      InstructionPersistenceWarden · token-based approval │
├──────────────────────────────────────────────────────────┤
│  0e. Static Security Scanners (ComplianceAgent)          │
│     Legal Risk · HIPAA PHI · Prompt Injection · EU AI Act│
│     AI Model SBOM · Agent Trust Graph · CVE Reachability │
│     MCP Scanner (6 languages) · 30+ SARIF rules          │
├──────────────────────────────────────────────────────────┤
│  0f. AI Output Provenance                                │
│    HMAC-SHA256 sign · tamper detection · chain of custody│
├──────────────────────────────────────────────────────────┤
│  1. CI/CD Pipeline Gate (16 jobs + GitHub Action)        │
├──────────────────────────────────────────────────────────┤
│  2. Delegation Guardrails (max depth=3 · whitelist)      │
├──────────────────────────────────────────────────────────┤
│  3. Task-Agent Ontology (20 task types · 70% min success)│
├──────────────────────────────────────────────────────────┤
│  4. Schema & PII Validation                              │
├──────────────────────────────────────────────────────────┤
│  5. Data Quality Testing                                 │
├──────────────────────────────────────────────────────────┤
│  6. Immutability & Integrity (SHA-256 · duplicate detect)│
└──────────────────────────────────────────────────────────┘
```

## Tech Stack

| Layer | Technology |
|---|---|
| **Agent Governance** | Constitution YAML + ConstitutionalGate (semantic alias enforcement, typo-resistant) |
| **API Security** | BearerTokenMiddleware · OriginValidationMiddleware · ResponseScanMiddleware · CORS lockdown |
| **Agent Safety** | DestructiveActionInterceptor · AgentScopeLeaseManager · InstructionPersistenceWarden |
| **Adversarial Hardening** | RedTeamAgent · AdversarialGenerator · PatternPatcher · OutputScanner (4-pass) |
| **Prompt Injection** | PromptInjectionScanner (4 rules) · SARIF PROMPT_INJECTION_SURFACE=9.5 |
| **Legal & HIPAA** | LegalRiskScanner · HIPAAPHIScanner · SARIF-native · pure Python |
| **EU AI Act** | AIActComplianceChecker · Annex III classifier · Article 9/13/14/22 checks |
| **AI Model SBOM** | AIModelSBOMScanner · 25+ provider patterns · 50+ model license registry |
| **Trust Graph** | AgentTrustGraphAnalyzer · 14+ frameworks · DFS cycle detection · EU AI Act Art.14 |
| **MCP Scanner** | MCPScanner · DataFlowTracer · 6 languages |
| **AI Output 
Provenance** | OutputProvenanceLogger · HMAC-SHA256 · JSONL 链 · 验证 API | | **Model Regression** | ModelRegressionTester · GoldenSnapshot · fastembed → TF-IDF 余弦回退 | | **Agent Factory** | NaturalLanguageSpecExtractor · Claude Haiku（规格提取） · 脚手架生成器 | | **Feature Pipeline** | WorkerPool · git worktree 隔离 · 原子 SQLite 提取 · 线程安全存储 | | **Self-Healing CI** | SREAgent._attempt_test_repair · SubprocessRunner 沙箱 · Haiku 生成的补丁 | | **TypeScript SRE** | oxlint/ESLint 自动检测 · 30+ 修复规则 · 架构规则排除 | | **Infra Health** | DataflowHealthMonitor · 本体感知探测 · Weaviate 版本检测 | | **Client Preflight** | 7 项探测部署就绪 · 带 JSON 输出的 CLI · CI 集成 | | **Security Output** | SARIFExporter · SARIF 2.1.0 · GitHub Code Scanning · 30+ 严重性映射 | | **Shell Linting** | shellcheck（SC 代码） · SREAgent._run_shell_linter | | **Vector DB** | Qdrant（主）/ Weaviate 1.27.0+（备） | | **Graph DB** | Neo4j | | **Relational DB** | SQLite / PostgreSQL | | **Embeddings** | fastembed（本地）/ Sentence-Transformers | | **API** | FastAPI + Pydantic（140+ 端点） | | **Dashboard** | Streamlit + Plotly（16 页） | | **CI/CD** | GitHub Actions（16 个作业 + GitHub Action marketplace，SARIF 上传，夜间自验证） | | **GitHub Action** | 复合操作：单行 CI，SARIF，PR 评论，增量扫描，安全评分 | | **Testing** | Pytest（2308+ 单元/集成测试，6 个 e2e 流水线测试） | | **Language** | Python 3.8+ | | **Landing Page** | 原生 HTML/CSS/JS · 动画 CSS 网格 · 未来主义深色极简 UI | | **Frontend Testing** | Streamlit AppTest · Jest · Vitest · FrontendTestGenerator · FrontendTestRunner | | **Repo Onboarding** | CoverageMapper · ArchitectureScanner · RepoOnboarder · BaselineDelta | | **GitHub Integration** | PR 评论 · 内联审查评论 · SARIF 上传 · 工件缓存 | ## 项目结构 ``` AgenticQA/ ├── action.yml # GitHub Action — one-line CI for any repo (SARIF, PR comments, grades) ├── public/ # Landing page (served at GET /) │ └── index.html # Futuristic dark minimal AI input UI ├── run_demo.py # 8-step end-to-end client demo (no API key needed) ├── src/agenticqa/ │ ├── constitution.yaml # Agent Constitution — versioned, machine-readable law set │ ├── agent_scopes.yaml # 
Per-agent file access scopes (8 agents) │ ├── constitutional_gate.py # Pre-action enforcement: ALLOW / REQUIRE_APPROVAL / DENY │ ├── audit_report.py # Forensic compliance artifact builder (PR-embeddable) │ ├── observability.py # SQLite store: complexity tracking, anomaly detection │ ├── workflow_requests.py # Thread-safe PromptWorkflowStore (atomic pickup, full locking) │ ├── workflow_worker.py # WorkerPool + WorkflowExecutionWorker (git worktree isolation) │ ├── client_preflight.py # 7-probe deployment readiness check + CLI │ ├── security/ # LegalRiskScanner · HIPAAPHIScanner · PromptInjectionScanner · CVEReachability │ │ ├── ai_model_sbom.py # AI Model SBOM: 25+ provider patterns, license registry │ │ ├── agent_trust_graph.py # Multi-agent trust graph: 14+ frameworks, cycle detection │ │ ├── mcp_scanner.py # MCP Scanner: 6 languages (Python/TS/JS/Go/Rust/Java) │ │ ├── api_middleware.py # 3-layer API security: bearer auth, origin, response scan │ │ ├── destructive_action_interceptor.py # Tool call classification + approval queue │ │ ├── scope_lease_manager.py # Hard op caps per agent with TTL │ │ ├── instruction_persistence_warden.py # Context compaction risk + drift detection │ │ └── path_sanitizer.py # Path security with GITHUB_WORKSPACE support │ ├── compliance/ # AIActComplianceChecker (EU AI Act) · ComplianceDriftDetector │ ├── provenance/ # OutputProvenanceLogger · HMAC-SHA256 signing · verify API │ ├── regression/ # ModelRegressionTester · GoldenSnapshot · cosine similarity │ ├── verification/ # Feedback loop, outcome tracker, threshold calibrator, strategy selector │ ├── graph/ # Neo4j: delegation store, GraphRAG, temporal violation store │ ├── rag/ # Weaviate/Qdrant: vector retrieval, reranking │ ├── collaboration/ # Agent delegation, registry, guardrails │ ├── redteam/ # AdversarialGenerator (20 techniques) · PatternPatcher │ ├── factory/ # NaturalLanguageSpecExtractor · agent scaffold generator │ │ └── sandbox/ # SubprocessRunner · OutputScanner (4-pass 
decode) │ ├── monitoring/ # DataflowHealthMonitor · 5 probes · ontology-aware impact │ ├── onboarding/ # CoverageMapper · RepoOnboarder · BaselineDelta │ ├── testing/ # FrontendTestGenerator · FrontendTestRunner (Streamlit/Jest/Vitest) │ ├── export/ # SARIFExporter · SARIF 2.1.0 · 30+ severity mappings │ ├── github/ # PR commenter (upsert) · inline review comments │ └── cli.py # CLI: bootstrap, doctor, ingest-junit, preflight ├── dashboard/ # 16-page Streamlit analytics dashboard ├── agent_api.py # FastAPI control plane (140+ endpoints, 3-layer security middleware) ├── src/agents.py # 8 agents: QA, Performance, Compliance, DevOps, SRE, SDET, Fullstack, RedTeam ├── src/data_store/ # PatternAnalyzer, LearningMetrics, RepoProfile, DeveloperProfile, OrgMemory ├── ingest_ci_artifacts.py # CI data bridge — ESLint, red-team, migration → learning system ├── scripts/ # github_action_entrypoint.py, post_pr_comment.py, run_custom_agents.py ├── tests/ # 2308+ tests — unit, integration, e2e pipeline, preflight │ ├── test_e2e_full_pipeline.py # 6 e2e integration tests (real SQLite, real WorkerPool, real git) │ └── test_client_preflight.py # 17 unit tests for deployment readiness └── .github/workflows/ # 16-job CI pipeline + feature-request pipeline + nightly self-validation ``` ## 许可证 MIT
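
## Sketch: HMAC-SHA256 output provenance

The AI Output Provenance layer above is described as HMAC-SHA256 signing with tamper detection and a JSONL chain. The snippet below is a minimal sketch of that general idea using only the Python standard library. It is not the code in `src/agenticqa/provenance/`: the names `SECRET_KEY`, `sign_record`, and `verify_record`, and the record fields, are assumptions made for illustration.

```python
import hashlib
import hmac
import json

# Hypothetical key for illustration; a real deployment would load it from a secret store.
SECRET_KEY = b"example-signing-key"


def sign_record(agent: str, output: str, prev_signature: str = "") -> dict:
    """Build a record whose signature covers the output hash and the previous signature."""
    record = {
        "agent": agent,
        "output_sha256": hashlib.sha256(output.encode("utf-8")).hexdigest(),
        "prev": prev_signature,  # links this record to the one before it
    }
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    record["signature"] = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()
    return record


def verify_record(record: dict, output: str) -> bool:
    """Recompute the hash and signature; a change to the output or any field fails."""
    unsigned = {k: v for k, v in record.items() if k != "signature"}
    if unsigned.get("output_sha256") != hashlib.sha256(output.encode("utf-8")).hexdigest():
        return False
    payload = json.dumps(unsigned, sort_keys=True).encode("utf-8")
    expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, record.get("signature", ""))


first = sign_record("qa_agent", "all tests passed")
second = sign_record("qa_agent", "coverage 91%", prev_signature=first["signature"])

# Each record serializes to a single line, which is how a JSONL log would store it.
print(json.dumps(second, sort_keys=True))
print(verify_record(second, "coverage 91%"))  # True
print(verify_record(second, "coverage 99%"))  # False
```

Including the previous signature in the signed payload means that editing or reordering an earlier record invalidates every later one, which is one common way to get chain-of-custody behaviour from a flat log.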

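
## Sketch: multi-pass decode before pattern matching

The security architecture lists a 4-pass decode (raw → NFKC → base64 → URL) in front of the OutputScanner's danger patterns. The following is a rough sketch of why such passes matter, under stated assumptions: the two patterns and the names `DANGER_PATTERNS`, `candidate_views`, and `is_dangerous` are hypothetical, and each decoding is applied independently to the raw text rather than chained, which may differ from the project's scanner.

```python
import base64
import binascii
import re
import unicodedata
from urllib.parse import unquote

# Hypothetical patterns for illustration; the project's own pattern list is not shown here.
DANGER_PATTERNS = [
    re.compile(r"rm\s+-rf\s+/"),
    re.compile(r"DROP\s+TABLE", re.IGNORECASE),
]


def candidate_views(text: str) -> list:
    """Return the raw text plus its NFKC, base64-decoded, and URL-decoded views."""
    views = [text, unicodedata.normalize("NFKC", text)]
    try:
        decoded = base64.b64decode(text, validate=True)
        views.append(decoded.decode("utf-8", errors="ignore"))
    except (binascii.Error, ValueError):
        pass  # not valid base64, so this pass contributes nothing
    views.append(unquote(text))
    return views


def is_dangerous(text: str) -> bool:
    """Flag the text if any view matches any pattern."""
    return any(p.search(v) for v in candidate_views(text) for p in DANGER_PATTERNS)


encoded = base64.b64encode(b"rm -rf /").decode("ascii")
print(is_dangerous(encoded))            # base64-wrapped payload
print(is_dangerous("rm%20-rf%20%2F"))   # URL-encoded payload
print(is_dangerous("hello world"))      # benign text
```

A scanner that matched only the raw string would miss the first two inputs, since neither contains the literal command until it is decoded.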