---
title: Incident Response Env
emoji: 🚨
colorFrom: red
colorTo: purple
sdk: docker
pinned: false
---
# Incident Response Environment API 🛡️
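Welcome to the **Incident Response Environment API**. This platform is an interactive, simulated sandbox in which artificial intelligence (AI) agents train on cybersecurity and IT incident response.
The API gives AI models a safe training environment for identifying their weaknesses and obtaining precise performance metrics, all without putting real-world systems at risk.
## 🚀 Why Use This API?
* **Safe sandbox testing:** Let your AI practice fixing crashed servers or stopping simulated attacks in a disposable environment. If the AI makes a mistake, no real damage is done.
* **Reinforcement learning:** Well suited to training AI agents by trial and error. The environment gives immediate feedback on whether an action helped resolve the problem or made it worse.
* **Benchmarking and validation:** Evaluate different AI models and expose gaps in their logic. Obtain hard data and concrete scores that demonstrate the accuracy of your AI's decisions.
## ⚙️ API Endpoints
The API is built with FastAPI and exposes a small set of endpoints for interacting with the simulated environment. Interactive Swagger UI documentation is available at `/docs` on the base URL.
| Method | Endpoint | Description |
| :--- | :--- | :--- |
| `GET` | `/health` | **Health check:** Verifies that the API server is awake and running. |
| `GET` | `/tasks` | **List tasks:** Returns the list of available simulation scenarios, or "levels". |
| `POST` | `/reset` | **Reset environment:** Initializes a new session and returns the starting state of the incident. |
| `POST` | `/step` | **Take an action:** Sends a command to the environment (for example, "block an IP address") and receives the result. |
| `GET` | `/state` | **Check state:** Shows the current state of the simulated environment, including your score and active alerts. |
## 💻 Quick Start Guide (Python)
The following example shows how to connect an AI agent or a Python script to the API to begin training: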
```python
import requests

BASE_URL = "https://saiamogh7-cmd-incident-response-env.hf.space"

# 1. Check that the API is alive (raises if the server returns an error status)
requests.get(f"{BASE_URL}/health", timeout=30).raise_for_status()

# 2. Reset the environment to start a new simulation
reset_data = requests.post(f"{BASE_URL}/reset", json={}, timeout=30).json()
print("Starting State:", reset_data)

# 3. Take a step (send your AI's decision to the environment)
# Note: check the /docs schema to make sure your payload matches the expected fields
action_payload = {
    "action": "Investigate server logs",
    "text": "Checking for unusual login activity",
}
step_result = requests.post(f"{BASE_URL}/step", json=action_payload, timeout=30).json()

# 4. Inspect the result of your action
print("Result of action:", step_result)
```
# Incident Response & Runbook Automation Environment
The **Incident Response & Runbook Automation Environment** is a deterministic, containerized OpenEnv simulation that tests autonomous AI agents in high-stakes Site Reliability Engineering (SRE) scenarios. Built for researchers and developers who evaluate LLMs on complex operational workflows, it presents an active microservice architecture in which simulated metrics degrade, alarms page the agent, and unstructured logs arrive in real time. This matters because true DevOps autonomy extends beyond code generation: models must synthesize telemetry, triage cascading failures under time pressure, and write precise, actionable remediation narratives without human intervention.
## Environment Overview
| Property | Details |
| --- | --- |
| **Observation Space** | Composite JSON object (Metrics, Logs, Alerts, KB Articles) |
| **Action Space** | Structured JSON object conformant to `IncidentAction` schema |
| **Reward Range** | `[0.0, 1.0]` per episode |
| **Episode Length** | Hard limit of `8` steps |
## Tasks
| Task | Difficulty | Objective | Key Metric | Score Threshold |
| --- | --- | --- | --- | --- |
| `easy` | Easy | Identify the root cause service from structured logs and metrics. | Triage accuracy | 0.7 |
| `medium` | Medium | Write a step-by-step remediation runbook for a service cascade failure. | Semantic runbook mapping | 0.7 |
| `hard` | Hard | Diagnose a multi-service cascading failure and produce a complete postmortem. | RCA and structural completeness | 0.7 |
## Observation Space
The `IncidentObservation` schema captures the telemetry snapshot returned on every environment step.
| Field | Type | Description |
| --- | --- | --- |
| `step` | `int` | The monotonically increasing counter of the current action sequence. |
| `alert` | `object` | The active page/alert payload detailing the initial trigger condition. |
| `metrics` | `array` | A snapshot array of `ServiceMetric` objects (latency, error rates, CPU). |
| `recent_logs` | `array` | A trailing snapshot of `LogEntry` objects surrounding the incident timeline. |
| `kb_articles` | `array` | Retrieved context including runbooks or architecture overviews. |
| `current_incident_status` | `string` | Progress state: `open`, `investigating`, `mitigated`, or `resolved`. |
| `previous_actions` | `array` | An immutable append-only ledger summarizing the agent's historical actions. |
| `time_elapsed_minutes` | `int` | Simulated time elapsed during the task, in minutes. |
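
As an illustration of how an agent might consume this observation, the sketch below picks the service with the highest error rate from `metrics`. The `ServiceMetric` key names used here (`service`, `error_rate`) and the helper `worst_service` are assumptions for illustration, since the table above names only the kinds of data carried (latency, error rates, CPU); check `/docs` for the actual field names.

```python
from typing import Any, Optional


def worst_service(observation: dict[str, Any]) -> Optional[str]:
    """Return the name of the service with the highest error rate, or None.

    Assumes each entry in observation["metrics"] is a dict with the
    hypothetical keys "service" and "error_rate"; the real ServiceMetric
    fields may differ.
    """
    metrics = observation.get("metrics") or []
    if not metrics:
        return None
    worst = max(metrics, key=lambda m: m.get("error_rate", 0.0))
    return worst.get("service")


# Hand-written sample observation for illustration only (not real output).
sample = {
    "step": 1,
    "current_incident_status": "open",
    "metrics": [
        {"service": "auth-service", "error_rate": 0.42},
        {"service": "api-gateway", "error_rate": 0.07},
    ],
}
print(worst_service(sample))
```

A heuristic like this is only a starting point; an agent would normally cross-check the candidate against `recent_logs` and `alert` before committing to a diagnosis.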
## Action Space
Agents interact with the environment by submitting a JSON object that validates against the `IncidentAction` schema.
| Field | Type | Valid Values / Spec | Description |
| --- | --- | --- | --- |
| `action_type` | `string` | `'diagnose'`, `'escalate'`, `'write_runbook'`, `'apply_fix'`, `'write_postmortem'`, `'resolve'` | The functional operation the agent is electing to execute. |
| `reasoning` | `string` | Detailed rationale. | The diagnostic chain-of-thought backing the selection. |
| `target_service` | `string` | Optional name of the service. | The microservice the action targets (e.g., `auth-service`). |
| `runbook_steps` | `array` | Optional string array. | Used with `write_runbook` to lay out the remediation steps. |
| `severity_assessment` | `string` | Optional P-level tier. | Escalation tier used when altering the alert priority. |
| `postmortem_sections` | `object` | Optional dictionary. Keys: `summary`, `timeline`, `root_cause`, `impact`, `action_items` | Structured postmortem narrative payload. |
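
A minimal sketch of assembling such a payload on the client side is shown below. The helper `build_action` is hypothetical and not part of the project, and treating `reasoning` as required is an assumption; the server's Pydantic models in `server/models.py` remain the authority on what is accepted.

```python
from typing import Any, Optional

# Action types listed in the table above.
VALID_ACTION_TYPES = {
    "diagnose", "escalate", "write_runbook",
    "apply_fix", "write_postmortem", "resolve",
}


def build_action(
    action_type: str,
    reasoning: str,
    target_service: Optional[str] = None,
    runbook_steps: Optional[list[str]] = None,
) -> dict[str, Any]:
    """Build an IncidentAction-shaped dict, omitting unset optional fields."""
    if action_type not in VALID_ACTION_TYPES:
        raise ValueError(f"unknown action_type: {action_type!r}")
    action: dict[str, Any] = {"action_type": action_type, "reasoning": reasoning}
    if target_service is not None:
        action["target_service"] = target_service
    if runbook_steps is not None:
        action["runbook_steps"] = runbook_steps
    return action


action = build_action(
    "diagnose",
    "auth-service error rate spiked right before the alert fired.",
    target_service="auth-service",
)
print(action)
```

Rejecting an unknown `action_type` before sending saves a round trip, which matters when every step counts toward the episode limit.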
## Reward Function
Rewards are deterministic and depend on the task difficulty. An automated grader evaluates the distance between the state representation and the ground truth:
1. **Easy:** Returns `1.0` for correctly isolating the true anomalous service, with partial credit for closely related upstream dependencies.
2. **Medium:** Applies a semantic keyword evaluation to the proposed `runbook_steps` array and assigns continuous partial credit up to `1.0`.
3. **Hard:** Cross-validates the independently generated sections of `postmortem_sections` against required systemic themes, and also checks priority matching.
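
To make the keyword-based partial credit for the medium task more concrete, here is one way such a score could be computed. This is a sketch under assumptions, not the project's grader: the function `keyword_coverage`, the keyword list, and the plain substring matching are all illustrative, and the actual logic in `server/graders.py` may differ.

```python
def keyword_coverage(runbook_steps: list[str], required_keywords: list[str]) -> float:
    """Fraction of required keywords found in any runbook step (0.0 to 1.0)."""
    if not required_keywords:
        return 0.0
    text = " ".join(runbook_steps).lower()
    hits = sum(1 for kw in required_keywords if kw.lower() in text)
    return hits / len(required_keywords)


# Illustrative inputs only; the real scenarios define their own expectations.
steps = [
    "Restart the auth-service pods",
    "Roll back the latest deployment",
]
keywords = ["restart", "roll back", "verify", "notify"]
print(keyword_coverage(steps, keywords))  # 2 of 4 keywords matched
```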
**Time-Pressure Discount**: The environment imposes a time penalty. For any task that runs beyond `3` steps, the raw score incurs a compounding `-0.05` deduction, which rewards efficiency and simulates the cost of service downtime.
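
The exact form of the discount is not spelled out here, so the sketch below makes an explicit assumption: `0.05` is subtracted for every step taken beyond the third, and the result is clamped to the `[0.0, 1.0]` reward range. The function name and parameters are hypothetical; consult `server/graders.py` for the real rule.

```python
def discounted_score(
    raw_score: float,
    steps_taken: int,
    free_steps: int = 3,
    penalty_per_step: float = 0.05,
) -> float:
    """Subtract a penalty for each step beyond free_steps, clamped to [0.0, 1.0].

    Assumed interpretation of the time-pressure discount, not the
    project's confirmed formula.
    """
    extra_steps = max(0, steps_taken - free_steps)
    score = raw_score - penalty_per_step * extra_steps
    return min(1.0, max(0.0, score))


for steps_taken in (3, 5, 8):
    print(steps_taken, round(discounted_score(0.9, steps_taken), 2))
```

Under this reading, an agent that uses all 8 steps loses 0.25 from its raw score, so a raw score of 0.9 would fall below the 0.7 threshold.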
## Quick Start
### a. Run Locally with Docker
```bash
docker build -t incident-response-env .
docker run -p 7860:7860 incident-response-env
```
### b. Test with cURL
```bash
curl -X POST http://localhost:7860/reset \
  -H "Content-Type: application/json" \
  -d '{"task_level": "easy"}'
```
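
The same reset call can be issued from Python using only the standard library. This is a sketch rather than project code: the helpers `build_request` and `post_json` are hypothetical, and `ENV_URL` assumes the container is listening on the local port mapped above.

```python
import json
import urllib.request

ENV_URL = "http://localhost:7860"  # assumes the local Docker port mapping


def build_request(path: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request for an environment endpoint."""
    return urllib.request.Request(
        f"{ENV_URL}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(path: str, payload: dict) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(build_request(path, payload), timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Building the request does not touch the network, so it can be inspected first.
req = build_request("/reset", {"task_level": "easy"})
print(req.get_method(), req.full_url)
```

With the container running, calling `post_json("/reset", {"task_level": "easy"})` sends the same request as the cURL command and returns the decoded response.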
### c. Run the Baseline
```bash
export HF_TOKEN="your_huggingface_token_here"
export ENV_URL="http://localhost:7860"
python inference.py
```
## Baseline Scores
| Task | Model | Average Score | Steps |
| --- | --- | --- | --- |
| easy | Qwen2.5-72B | 0.85 | 3 |
| medium | Qwen2.5-72B | 0.45 | 8 |
| hard | Qwen2.5-72B | 0.20 | 8 |
## Project Structure
```
.
├── Dockerfile # Container orchestration script
├── inference.py # Native python baseline validation agent script
├── openenv.yaml # Standard OpenEnv metadata schema
├── README.md # Environment documentation (this file)
└── server/
    ├── environment.py # State-transition and core simulation mechanics
    ├── graders.py # Task difficulty logic and deterministic scoring
    ├── main.py # FastAPI container entrypoint and HTTP wrappers
    ├── models.py # Pydantic typing for observation/action spaces
    ├── scenarios.py # Static pre-rendered incident ground truth files
    └── requirements.txt # Isolated Python container dependencies
```
## License
MIT License.