emmanuelgjr/genai_incidents


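A comprehensive dataset of more than 11,500 GenAI and Agentic AI security incidents, each mapped to standard frameworks including OWASP, NIST AI RMF, and MITRE ATLAS, providing a single source of truth for AI security research and practice.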


# GenAI & Agentic AI Security Incidents

[![incidents](https://img.shields.io/badge/dynamic/json?url=https%3A%2F%2Fraw.githubusercontent.com%2Femmanuelgjr%2Fgenai_incidents%2Fmain%2Fdata%2Fstats.json&query=%24.incident_count&label=incidents&color=ffb000&logo=databricks&logoColor=white)](https://emmanuelgjr.github.io/genai_incidents/) [![Validate dataset](https://static.pigsec.cn/wp-content/uploads/repos/2026/06/5226c7435e222614.svg)](https://github.com/emmanuelgjr/genai_incidents/actions/workflows/validate.yml) [![PyPI version](https://img.shields.io/pypi/v/genai-incidents?logo=pypi&logoColor=white&label=pypi)](https://pypi.org/project/genai-incidents/) [![Python versions](https://img.shields.io/pypi/pyversions/genai-incidents?logo=python&logoColor=white)](https://pypi.org/project/genai-incidents/) [![Hugging Face dataset](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-dataset-yellow)](https://huggingface.co/datasets/emmanuelgjr/genai-incidents) [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.20248676.svg)](https://doi.org/10.5281/zenodo.20248676) [![License: MIT (code)](https://img.shields.io/badge/code-MIT-blue.svg)](LICENSE) [![License: CC-BY-4.0 (data)](https://img.shields.io/badge/data-CC--BY--4.0-lightgrey.svg)](LICENSE-DATA)

- 🔎 **Searchable site:**
- 📦 **Python:** `pip install genai-incidents`
- 🤗 **Hugging Face:** [`emmanuelgjr/genai-incidents`](https://huggingface.co/datasets/emmanuelgjr/genai-incidents) via `load_dataset("emmanuelgjr/genai-incidents")` (built with `make huggingface`)
- 🛰️ **STIX 2.1 bundle** (for OpenCTI / MISP / TAXII): incidents are published as `x-genai-incident` SDOs, related to MITRE ATLAS `attack-pattern` and CVE `vulnerability` objects. Build locally with `make stix`.
- 📡 **TAXII 2.1 (static):** a read-only static mirror of the STIX collection, with a discovery endpoint ([usage and caveats](https://emmanuelgjr.github.io/genai_incidents/taxii2/README.md)). Build locally with `make taxii`.
- 🛡️ **MISP feed:** subscribe your MISP instance to the feed (format: *MISP Feed*); incidents are grouped into one event per year and tagged `genai-incidents:*` / `mitre-atlas:*`. Build locally with `make misp`.
- 📖 **Field reference:** [`docs/DATA_DICTIONARY.md`](docs/DATA_DICTIONARY.md) · **Sources and limitations:** [`docs/DATASHEET.md`](docs/DATASHEET.md)
- 🎯 **Scope (what is in and out):** [`INCLUSION.md`](INCLUSION.md), the inclusion policy every entry must satisfy
- 🛠️ **Found an error?** File a [data correction](https://github.com/emmanuelgjr/genai_incidents/issues/new?template=data_correction.yml) or a [scope dispute](https://github.com/emmanuelgjr/genai_incidents/issues/new?template=scope_dispute.yml); accepted changes are logged in [`CORRECTIONS.md`](CORRECTIONS.md)
- 🪪 **DOI:** [`10.5281/zenodo.20248676`](https://doi.org/10.5281/zenodo.20248676); see [`CITATION.cff`](CITATION.cff)
- 📄 **Methods:** [`docs/paper/genai-incidents-methods.md`](docs/paper/genai-incidents-methods.md), which explains how the dataset is built, mapped, and curated
- 📜 **Changelog:** [`CHANGELOG.md`](CHANGELOG.md)

A single source of truth for **[11,500+ GenAI and Agentic AI security incidents](https://emmanuelgjr.github.io/genai_incidents/)** (see the badge above for the live count), each mapped to:

- **OWASP Top 10 for LLM Applications (2025)**: `LLM01`–`LLM10`
- **OWASP Agentic Top 10 (ASI)**: `ASI01`–`ASI10`
- **NIST AI Risk Management Framework (AI 100-1)**: `GOVERN` / `MAP` / `MEASURE` / `MANAGE` subcategories
- **MITRE ATLAS**: tactics (`AML.TA00xx`) and techniques (`AML.T00xx`)
- _(companion reference)_ **MAESTRO** architecture layers (`L1`–`L7`)

The dataset is published both as machine-readable JSON (`data/incidents.json`) and as a human-readable Markdown index (`INCIDENTS.md`).

## Layout

```
.
├── data/
│   ├── incidents.json             ← full single source of truth (use this)
│   ├── incidents.min.json         ← slim variant: id, title, taxonomy mappings, primary reference
│   └── legacy_consolidated.json   ← intermediate output from the legacy parser
├── schema/
│   └── incident.schema.json       ← JSON Schema for one incident
├── mappings/
│   ├── owasp_llm_top10_2025.json
│   ├── owasp_asi_top10.json
│   ├── nist_ai_rmf.json
│   ├── mitre_atlas.json
│   └── maestro_layers.json
├── legacy/                        ← original source files (preserved verbatim)
├── ingest/                        ← per-source aggregator outputs (CVE, AIID, ATLAS, etc.)
├── scripts/
│   ├── parse_existing.py          ← parse legacy/ → data/legacy_consolidated.json
│   ├── ingest_external.py         ← parse cloned source repos under ../_external/ → ingest/*.json
│   ├── scrape_aiid.py             ← fetch all AIID incident pages (OG metadata) → ingest/aiid_full.json
│   ├── ingest_airi_navigator.py   ← MIT FutureTech AI Risk Navigator CSV → ingest/airi_navigator_incidents.json
│   ├── ingest_aiaaic_sheet.py     ← AIAAIC Repository public Google Sheet → ingest/aiaaic_sheet_incidents.json
│   ├── ingest_oecd_aim.py         ← OECD AI Incidents Monitor (10k pages) → ingest/oecd_aim_full_incidents.json
│   ├── ingest_cve_nvd_expanded.py ← pull AI-relevant CVEs from NVD/GHSA/OSV → ingest/cve_nvd_expanded.json
│   ├── merge_and_dedupe.py        ← merge legacy + ingest/* → data/incidents.json
│   ├── render_markdown.py         ← data/incidents.json → INCIDENTS.md
│   └── validate.py                ← validate JSON against schema
├── INCIDENTS.md                   ← rendered index: unified table, newest-first
├── docs/incidents/.md             ← per-year detail shards linked from INCIDENTS.md
├── tests/                         ← pytest suite for merge/render helpers
├── LICENSE                        ← MIT (covers code in scripts/)
├── LICENSE-DATA                   ← CC-BY-4.0 (covers the dataset under data/)
└── README.md
```

## What counts as an incident?

Anything that meets one or more of the following:

1. A **real-world** exploit, breach, or abuse involving a GenAI or Agentic AI system.
2. A **publicly disclosed vulnerability** (CVE or vendor advisory) affecting the AI/ML/LLM/agent stack.
3. A **research-demonstrated attack** with a credible PoC and a public write-up.
4. A **red-team finding** published by security researchers with enough detail to reproduce or replicate.

Every entry must have **at least one verifiable external URL**; unsourced entries are excluded. This repository does **not** include broad, fairness- or bias-only AI harms unless they involve a security primitive (data leakage, integrity attack, account takeover, etc.).

## Schema (summary)

See [`schema/incident.schema.json`](schema/incident.schema.json) for the canonical version.

```
{
  "id": "INC-00001",                     // stable 5-digit ID
  "source_ids": ["AIID-123", "CVE-2025-..."],
  "cve_ids": ["CVE-2025-..."],
  "cwe_ids": ["CWE-918"],
  "cvss_score": 9.8,
  "cvss_vector": "CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H",
  "aiid_id": 1234,                       // canonical AIID numeric ID when applicable
  "title": "...",
  "date": "2025-09",
  "disclosure_date": "2025-10-02",       // separate from incident date when known
  "year": 2025,
  "category": "real-world | research | red-team | vulnerability-disclosure | threat-report | policy",
  "description": "...",
  "attack_vector": "prompt-injection | rce | supply-chain | data-exfiltration | ...",
  "affected": "vendor/product",
  "impact": "...",
  "severity": "Critical | High | Medium | Low | Info",
  "owasp_llm": ["LLM01", "LLM06"],
  "owasp_asi": ["ASI01", "ASI02"],
  "nist_ai_rmf": ["MEASURE-2.7", "MAP-3.5"],
  "mitre_atlas": ["AML.T0051", "AML.T0051.001"],
  "mitre_atlas_tactics": ["AML.TA0004"],
  "maestro_layers": [{"layer": "L3", "label": "Agent Frameworks & Tooling", "role": "origin"}],
  "mitigations": ["..."],
  "references": [
    {"title": "Vendor advisory", "url": "https://...", "type": "vendor"}
  ],
  "tags": ["mcp", "supply-chain"],
  "added": "2026-05-16",                 // stable across re-runs
  "updated": "2026-05-16"                // only bumped when content actually changes
}
```

## Using the dataset

### As a Python library

```
pip install genai-incidents
```

```
from genai_incidents import query, by_cve, resolve_id

for inc in query(severity="Critical", attack_vector="prompt-injection", year=2026):
    print(inc["id"], "-", inc["title"])

print(by_cve("CVE-2026-21520"))   # all incidents that list this CVE
print(resolve_id("INC-00139"))    # follow merge history to the current canonical INC
```

### As JSON

- Full: [`data/incidents.json`](data/incidents.json)
- Slim (suited to UIs): [`data/incidents.min.json`](data/incidents.min.json)
- Schema: [`schema/incident.schema.json`](schema/incident.schema.json)
- ID deprecations: [`data/id_deprecations.json`](data/id_deprecations.json), used to resolve references to merged IDs

### As a website

A filterable, searchable table with deep links is available at .

## Regenerating the dataset

```
pip install -r requirements.txt
make build        # parse legacy, merge + dedupe, render, validate
make test         # pytest tests/
make ingest-all   # (heavy: refresh AIID/AIRI/AIAAIC/OECD AIM/NVD from network)
```

Or run the steps individually:

```
python scripts/parse_existing.py     # legacy/ -> data/legacy_consolidated.json
python scripts/merge_and_dedupe.py   # legacy + ingest/* -> data/incidents.json
python scripts/render_markdown.py    # data/incidents.json -> INCIDENTS.md + docs/incidents/.md
python scripts/validate.py           # schema check
```

Dedupe keys (first match wins): matching `cve_ids`; matching `source_ids` (with `AIID-N-OECD` normalized to `AIID-N`); matching normalized reference URLs; and a fuzzy title match within ±1 year. The index is rebuilt after every merge so that transitive duplicates are fully collapsed: for example, if entry A absorbs CVE-3 and an existing entry B also lists CVE-3, B is merged into A as well. A merge takes the union of taxonomy mappings, references, tags, CVE/CWE IDs, and source IDs; keeps the highest severity; prefers the more specific date (YYYY-MM-DD over year-only); and rejects dates in future years.

`added` and `updated` are carried over from the previous output, and `updated` is bumped only when an entry's content actually changes. This keeps `make build` deterministic for CI drift checks.

## Adding entries

Two routes:

1. **Manual**: append a well-formed object to `data/incidents.json` and run `scripts/render_markdown.py`. Make sure `references` contains at least one resolvable URL.
2. **Automated**: drop a JSON array of raw entries into `ingest/.json` (any reasonable shape; see `normalize_entry` in `scripts/merge_and_dedupe.py` for field tolerance), then re-run merge and render.

Always run `scripts/validate.py` before committing.

## Taxonomy mappings

The mapping files in `mappings/` record the controlled vocabularies used by this dataset. They are derived from these primary sources:

- OWASP LLM Top 10 (2025):
- OWASP Agentic Top 10 (ASI / "Agentic AI – Threats and Mitigations"):
- NIST AI Risk Management Framework (AI 100-1):
- NIST AI 600-1 Generative AI Profile:
- MITRE ATLAS:
- MAESTRO (companion reference):

When a framework publishes a new version, update the corresponding mapping JSON in `mappings/` and re-run merge and validate.

## Aggregated sources

The current dataset draws on the following public sources. Every entry keeps a link to the original advisory, article, or paper.

- **OWASP GenAI Security Project**: incident round-ups and Top 10 references
- **AI Incident Database (AIID)** ([incidentdatabase.ai](https://incidentdatabase.ai/), [github.com/responsible-ai-collaborative/aiid](https://github.com/responsible-ai-collaborative/aiid)): the security-relevant subset of the full corpus, scraped via OG metadata
- **OECD AI Incidents Monitor (AIM)** ([oecd.ai/en/incidents](https://oecd.ai/en/incidents)): cross-referenced with AIID through the official AIID-OECD bridge file
- **AIAAIC** ([aiaaic.org](https://www.aiaaic.org/aiaaic-repository)): AI, algorithmic, and automation incidents and controversies
- **MITRE ATLAS** ([atlas.mitre.org](https://atlas.mitre.org/), [github.com/mitre-atlas/atlas-data](https://github.com/mitre-atlas/atlas-data)): all case studies parsed from the YAML corpus
- **AVID**: AI Vulnerability Database ([avidml.org](https://avidml.org/))
- **CSET-AIID Harm Taxonomy** ([github.com/georgetown-cset/CSET-AIID-harm-taxonomy](https://github.com/georgetown-cset/CSET-AIID-harm-taxonomy)): controlled-vocabulary reference
- **NVD / CVE.org / GitHub Security Advisories / OSV.dev / CISA KEV**: AI/ML/LLM/agent CVEs pulled via REST APIs across 56 keywords
- **NVIDIA garak** ([github.com/NVIDIA/garak](https://github.com/NVIDIA/garak)): one entry per LLM vulnerability-scanner probe (canonical attack classes)
- **promptfoo** ([github.com/promptfoo/promptfoo](https://github.com/promptfoo/promptfoo)): one entry per red-team plugin or strategy
- **ModelOriented/CVE-AI** ([github.com/ModelOriented/CVE-AI](https://github.com/ModelOriented/CVE-AI)): XAI-based AI model verification findings
- **Researcher and vendor blogs**: Embrace The Red, Tenable, Palo Alto Unit 42, Trail of Bits, Aim Security, Noma Security, Wiz Research, Lakera, Invariant Labs, PromptArmor, Pillar Security, Token Security, HiddenLayer, Robust Intelligence, Protect AI, Cato Networks CTRL, Endor Labs, Sysdig, Zenity Labs, JFrog, Datadog Security Labs, Reco, AppOmni, BeyondTrust, Oasis Security, Mindgard, Koi Security, Imperva, Sonar, Oligo Security, OX Security, SentinelOne, Check Point Research, Trend Micro, Tinfoil Security, ZeroPath, Cymulate, MaccariTA, and others.
- **Vendor threat reports**: Anthropic, OpenAI, Google Threat Intelligence (GTIG/TAG/Mandiant), Microsoft Threat Intelligence (MTAC/MSRC), AWS Security Bulletins, CrowdStrike, Recorded Future.
- **Academic papers**: curated entries from USENIX Security / NDSS / S&P / CCS / arXiv that include concrete adversarial PoCs.

If a source is missing or misattributed, please open an issue or PR.

## License

- **Code** (`scripts/`, `schema/`): [MIT](LICENSE)
- **Data and documentation** (`data/`, `INCIDENTS.md`, `mappings/`): [Creative Commons Attribution 4.0 International](LICENSE-DATA)

If you use this dataset in research or tooling, please cite this repository.
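If you read the raw file from the *As JSON* route instead of installing the package, a keyword filter similar in spirit to the library's `query(...)` and `by_cve(...)` can be written with the standard library alone. This is a minimal sketch, assuming `data/incidents.json` is a top-level array of objects shaped like the schema summary (the `{"incidents": [...]}` fallback is a guess); `load_incidents` and `filter_incidents` are hypothetical helpers, not part of `genai-incidents`.

```python
import json
from pathlib import Path
from typing import Any, Dict, Iterable, Iterator, List, Union


def load_incidents(path: Union[str, Path] = "data/incidents.json") -> List[Dict[str, Any]]:
    """Read incidents from a local checkout of the repository.

    Assumption: the file is a JSON array of incident objects; an
    {"incidents": [...]} wrapper object is accepted as a fallback.
    """
    with open(path, encoding="utf-8") as fh:
        data = json.load(fh)
    return data.get("incidents", []) if isinstance(data, dict) else data


def _matches(value: Any, wanted: Any) -> bool:
    # List-valued fields (cve_ids, owasp_llm, tags, ...) match when they contain `wanted`;
    # scalar fields (severity, year, attack_vector, ...) must be equal.
    return wanted in value if isinstance(value, list) else value == wanted


def filter_incidents(incidents: Iterable[Dict[str, Any]], **criteria: Any) -> Iterator[Dict[str, Any]]:
    """Yield incidents whose fields match every keyword criterion."""
    for inc in incidents:
        if all(_matches(inc.get(field), wanted) for field, wanted in criteria.items()):
            yield inc
```

For example, `filter_incidents(load_incidents(), severity="Critical", year=2026)` filters like the first library call, and `filter_incidents(load_incidents(), cve_ids="CVE-2026-21520")` looks up incidents by CVE. The exact semantics of the real `query` may differ.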
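The dedupe keys listed under *Regenerating the dataset* can be illustrated with a union-find. This sketch is not `scripts/merge_and_dedupe.py`: it swaps the repository's "re-index after every merge" loop for a union-find over shared keys, which yields the same transitive collapse for exact keys (A and B both list CVE-3, so they land in one group). The URL normalization rules and the 0.9 `difflib` ratio for fuzzy titles are assumptions, and all function names are hypothetical.

```python
import re
from collections import defaultdict
from difflib import SequenceMatcher
from typing import Dict, List, Set
from urllib.parse import urlsplit


def normalize_source_id(sid: str) -> str:
    """Map OECD-bridged AIID IDs onto plain AIID IDs, e.g. "AIID-42-OECD" -> "AIID-42"."""
    return re.sub(r"^(AIID-\d+)-OECD$", r"\1", sid.strip().upper())


def normalize_url(url: str) -> str:
    """Assumed normalization: ignore scheme, "www.", trailing slash, and #fragment."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path.rstrip("/")
    return f"{host}{path}?{parts.query}" if parts.query else f"{host}{path}"


def exact_keys(inc: Dict) -> Set[str]:
    """Exact dedupe keys: CVE IDs, normalized source IDs, normalized reference URLs."""
    keys = {f"cve:{c.upper()}" for c in inc.get("cve_ids", [])}
    keys |= {f"src:{normalize_source_id(s)}" for s in inc.get("source_ids", [])}
    keys |= {f"url:{normalize_url(r['url'])}" for r in inc.get("references", []) if r.get("url")}
    return keys


def similar_titles(a: Dict, b: Dict, threshold: float = 0.9) -> bool:
    """Fuzzy title match within +/-1 year; the threshold is a guess, not the repo's value."""
    ya, yb = a.get("year"), b.get("year")
    if ya is None or yb is None or abs(ya - yb) > 1:
        return False
    ta, tb = a.get("title", "").lower(), b.get("title", "").lower()
    if not ta or not tb:
        return False
    return SequenceMatcher(None, ta, tb).ratio() >= threshold


def group_duplicates(incidents: List[Dict]) -> List[List[int]]:
    """Return index groups that should collapse into one entry, earliest index first."""
    parent = list(range(len(incidents)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def union(i: int, j: int) -> None:
        ri, rj = find(i), find(j)
        if ri != rj:
            # The smaller index stays the root, so the first-seen entry survives.
            parent[max(ri, rj)] = min(ri, rj)

    first_owner: Dict[str, int] = {}
    for idx, inc in enumerate(incidents):
        for key in exact_keys(inc):
            if key in first_owner:
                union(first_owner[key], idx)
            else:
                first_owner[key] = idx

    # Pairwise O(n^2) fuzzy pass: fine for a sketch, too slow for the full dataset.
    for i in range(len(incidents)):
        for j in range(i + 1, len(incidents)):
            if similar_titles(incidents[i], incidents[j]):
                union(i, j)

    groups: Dict[int, List[int]] = defaultdict(list)
    for idx in range(len(incidents)):
        groups[find(idx)].append(idx)
    return sorted(groups.values())
```

A union-find makes the transitive closure explicit in one pass over the keys; for the real 11,500-entry corpus the fuzzy pass would need blocking (for example by year) to stay tractable.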
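The merge rules from the same section (union of list fields, highest severity, most specific date, no future years) can be sketched as a pure function. This is an illustration under assumptions, not the repository's code: the field list, the severity ranking, and the names `merge_incidents` and `pick_date` are hypothetical, and references are unioned by `url`.

```python
import datetime as dt
from typing import Dict, List, Optional

SEVERITY_RANK = {"Info": 0, "Low": 1, "Medium": 2, "High": 3, "Critical": 4}
LIST_FIELDS = ("owasp_llm", "owasp_asi", "nist_ai_rmf", "mitre_atlas", "mitre_atlas_tactics",
               "tags", "cve_ids", "cwe_ids", "source_ids")


def union_list(a: List, b: List) -> List:
    """Order-preserving union: keep a's order, then append unseen items from b."""
    out = list(a)
    for item in b:
        if item not in out:
            out.append(item)
    return out


def pick_date(a: Optional[str], b: Optional[str], today: Optional[dt.date] = None) -> Optional[str]:
    """Prefer the more specific date (YYYY-MM-DD > YYYY-MM > YYYY); drop future-year dates."""
    today = today or dt.date.today()
    candidates = [d for d in (a, b) if d and int(d[:4]) <= today.year]
    if not candidates:
        return None
    # max() returns the first maximal item, so ties keep the surviving entry's date.
    return max(candidates, key=lambda d: d.count("-"))


def merge_incidents(keep: Dict, dup: Dict, today: Optional[dt.date] = None) -> Dict:
    """Fold `dup` into `keep`; `keep` retains its id, title, and other scalar fields."""
    merged = dict(keep)
    for field in LIST_FIELDS:
        if field in keep or field in dup:
            merged[field] = union_list(keep.get(field, []), dup.get(field, []))

    seen_urls = {r.get("url") for r in keep.get("references", [])}
    merged["references"] = list(keep.get("references", [])) + [
        r for r in dup.get("references", []) if r.get("url") not in seen_urls
    ]

    severities = [s for s in (keep.get("severity"), dup.get("severity")) if s]
    if severities:
        merged["severity"] = max(severities, key=lambda s: SEVERITY_RANK.get(s, -1))

    date = pick_date(keep.get("date"), dup.get("date"), today)
    if date is None:
        merged.pop("date", None)
    else:
        merged["date"] = date
    return merged
```

Passing `today` explicitly keeps the function deterministic in tests; the real pipeline may also reconcile `year` and `disclosure_date`, which this sketch leaves untouched.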
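The rule that `added` and `updated` survive re-runs, with `updated` bumped only on real content changes, is what makes `make build` reproducible. One way to get that behavior, sketched here under the assumption that "content" means every field except the two timestamps, is to compare a canonical hash of the old and new entry. `content_hash` and `stamp` are hypothetical names.

```python
import hashlib
import json
from typing import Dict, Optional

VOLATILE = ("added", "updated")  # bookkeeping fields excluded from the content comparison


def content_hash(inc: Dict) -> str:
    """Hash everything except the timestamps, serialized with a stable key order."""
    body = {k: v for k, v in inc.items() if k not in VOLATILE}
    payload = json.dumps(body, sort_keys=True, ensure_ascii=False, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def stamp(inc: Dict, previous: Optional[Dict], today: str) -> Dict:
    """Carry `added`/`updated` over from the previous build; bump `updated` only on real changes."""
    out = dict(inc)
    if previous is None:  # brand-new entry
        out["added"] = out["updated"] = today
        return out
    out["added"] = previous.get("added", today)
    unchanged = content_hash(inc) == content_hash(previous)
    out["updated"] = previous.get("updated", today) if unchanged else today
    return out
```

Because `sort_keys=True` fixes key order, reordering fields in the source JSON does not count as a change; reordering items inside a list (such as `tags`) does.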
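For the manual route under *Adding entries*, a quick pre-flight check can catch the most common mistakes before `scripts/validate.py` runs the full schema check. This is a hypothetical helper, not a replacement for the schema: the ID pattern follows the "stable 5-digit ID" comment in the schema summary, and the URL test only checks that at least one reference is an absolute http(s) URL, not that it actually resolves.

```python
import re
from typing import Dict, List
from urllib.parse import urlsplit

ID_RE = re.compile(r"^INC-\d{5}$")
DATE_RE = re.compile(r"^\d{4}(-\d{2}(-\d{2})?)?$")  # YYYY, YYYY-MM, or YYYY-MM-DD
SEVERITIES = {"Critical", "High", "Medium", "Low", "Info"}


def check_entry(inc: Dict) -> List[str]:
    """Return human-readable problems; an empty list means the quick checks passed."""
    problems = []
    if not ID_RE.match(str(inc.get("id", ""))):
        problems.append("id should look like INC-00001")
    if not inc.get("title"):
        problems.append("title is missing")
    if "date" in inc and not DATE_RE.match(str(inc["date"])):
        problems.append("date should be YYYY, YYYY-MM, or YYYY-MM-DD")
    if "severity" in inc and inc["severity"] not in SEVERITIES:
        problems.append(f"unknown severity {inc['severity']!r}")
    # Inclusion policy: at least one external URL. Syntax only; reachability is not tested.
    urls = [r.get("url", "") for r in inc.get("references", []) if isinstance(r, dict)]
    if not any(urlsplit(u).scheme in ("http", "https") and urlsplit(u).netloc for u in urls):
        problems.append("references needs at least one absolute http(s) URL")
    return problems
```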
