GoPlusSecurity/ai-auditing-benchmark
GitHub: GoPlusSecurity/ai-auditing-benchmark
Stars: 8 | Forks: 1
# ai-auditing-benchmark
A benchmark dataset for **AI smart-contract auditing**. It curates and extracts **vulnerable contract source code** from real-world historical exploits caused by smart-contract vulnerabilities, intended for evaluating and training auditing capabilities (e.g., used together with tools like `ai-auditing-engine`).
## Data sources and goals
- **Sources**: Publicly available on-chain / security incident reports and repository snapshots, corresponding to contracts exploited or found defective in each incident.
- **Goals**:
- Provide **real vulnerable samples** that are reproducible and comparable.
- Support evaluating AI auditing performance at two granularities: **full context** and **reduced attack surface**.
## Directory structure
Data lives under `dataset/` and is organized by incident. Each incident directory name follows: `{IncidentDateYYYYMMDD}_{ProjectOrProtocolSlug}` (date first, to make sorting and searching easier).
dataset/
├── benchmark_complete/ # Full code of the exploited contracts (incl. deps/libs), as close as possible to an auditable/compilable snapshot
└── benchmark_simplified/ # Only vulnerability-related functions + minimal required deps; obviously irrelevant logic removed
### `benchmark_complete`
- Contains the **full source tree** of the exploited contracts (including interfaces, libraries, third-party dependencies, etc.), useful for:
- Cross-contract and cross-module interaction analysis
- Audit workflows that require a full call graph and state-flow context
### `benchmark_simplified`
- Based on the code for the **same incident**, it **keeps only the vulnerable functions** (and the minimal dependencies required for compilation and semantic understanding) and **removes functions unrelated to the vulnerability**, for:
- **Reducing input scope** when integrating with engines like **ai-auditing-engine**, making it easier to **pinpoint vulnerabilities precisely**
- **Lowering token and compute costs**, speeding up iterative evaluation
## Incident index (CSV)
The CSV in the repository root lists **all incidents** currently included in `dataset/` and should be treated as the authoritative source of metadata:
- Chinese: [`ai-auditing-benchmark_cn.csv`](ai-auditing-benchmark_cn.csv)
- English: [`ai-auditing-benchmark_en.csv`](ai-auditing-benchmark_en.csv)
Both CSVs have identical rows; only the field language differs. Column meanings:
- **Attack date**: Incident date (`YYYY.MM.DD`).
- **Project**: The exploited project or protocol (the display name may differ slightly from the directory slug, e.g., with `@` or parenthetical notes).
- **Vulnerability**: A short description of the vulnerability type.
- **Vulnerability details**: The exploit technique and defect description.
- **Attack transaction**: Representative on-chain transaction hash.
- **Vulnerable contract address**: Related contract address(es) (may span multiple lines within a cell).
- **Loss (10k USD)**: Reported or estimated loss amount.
**Mapping to directory names**: Incident folder names under both `dataset/benchmark_complete` and `benchmark_simplified` use `{IncidentDateYYYYMMDD}_{ProjectOrProtocolSlug}`. The date is derived from **Attack date** as an 8-digit number (e.g., `2025.05.28` → `20250528`). `{ProjectOrProtocolSlug}` corresponds to the **Project** column and is typically a filesystem-safe slug in lowercase/camel case (e.g., `@Corkprotocol` in the table maps to `20250528_Corkprotocol`). If the **Project** column includes extra notes (e.g., addresses in parentheses), the directory name usually still uses a short protocol identifier; the actual folder names in the repo are authoritative.
Source-tree paths vary by incident. Browse within the corresponding directory by subproject / contract name.
## Quick start (locate code by incident)
1. Find the target row in the CSV (by **Attack date / Project**).
2. Convert **Attack date** to `YYYYMMDD`, and combine it with the project slug: `{YYYYMMDD}_{ProjectSlug}`.
3. Choose a granularity:
- `dataset/benchmark_complete/{dir}/...`: Full context (closer to real audit inputs).
- `dataset/benchmark_simplified/{dir}/...`: Minimal necessary slice (fewer tokens, faster regression).
Example: `2025.05.28` + `@Corkprotocol` → `dataset/benchmark_simplified/20250528_Corkprotocol/`
## Suggested usage with AI auditing engines
1. **Regression and comparison**: For the same incident, run the same audit prompts/pipeline on both `benchmark_complete` and `benchmark_simplified`, and compare detection rate, false positives, and cost.
2. **Day-to-day iteration**: During development, use `benchmark_simplified` for quick validation; before release, spot-check with `benchmark_complete` for more production-like context.
## License and disclaimer
- Code snippets in this repository come from publicly available project sources or incident-related public materials; **copyright belongs to the original authors**. They are provided solely for security research and benchmark evaluation.
- Vulnerable code can be **destructive**. Do not use it for illegal purposes. If you use this dataset in papers or products, please cite the dataset name and the version/commit information.