Itachi-1824/warroom-env
GitHub: Itachi-1824/warroom-env
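An OpenEnv-based SOC incident response simulation environment that evaluates how well AI agents perform at security investigation, forensics, and threat containment by simulating a dynamic attacker that adapts to the defender.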
title: WarRoom - SOC Incident Response
emoji: "🛡"
colorFrom: red
colorTo: gray
sdk: docker
app_port: 8000
tags:
- openenv
# WarRoom: SOC Incident Response vs. Adaptive Adversaries
An OpenEnv environment simulating a Security Operations Center where AI agents investigate and contain live security incidents. The adversary is **active** — advancing their kill chain every step the agent wastes, and **adapting** to containment actions by switching C2 channels, accelerating exfiltration, or pivoting to other compromised accounts.
## Why This Matters
Cybersecurity incident response is among the highest-stakes tasks in enterprise IT. The global market exceeds $200B and large enterprises routinely run a SOC, yet agent benchmarks for this domain are scarce. WarRoom aims to fill that gap with a realistic agent evaluation environment for threat investigation.
**What makes WarRoom unique:**
- **Live adversary** — the attacker progresses their kill chain in real-time
- **Adaptive adversary** — responds to containment with failover C2, account pivots, accelerated exfil
- **Forensic depth** — memory analysis, file analysis, payload decoding, registry forensics
- **NIST methodology scoring** — rewards proper investigation before containment
- **5 scenarios** spanning malware, phishing, ransomware, APT, and insider threat
## Action Space
### Investigation Commands
| Command | Description |
|---------|-------------|
| `check_alerts` | View active SIEM alerts |
| `list_hosts` | List all network hosts and their status |
| `check_metrics ` | View CPU/memory/network metrics |
| `read_logs [--tail N]` | Read security and system logs |
| `check_connections ` | View active network connections |
| `check_processes ` | View running processes |
| `query_threatintel ` | Look up IOCs (IP, domain, hash) |
| `check_email_gateway` | View email gateway alerts |
| `check_dns_logs [--host ]` | View DNS query logs |
| `check_firewall [--src ] [--dst ]` | View firewall logs |
### Forensic Deep-Dive Commands
| Command | Description |
|---------|-------------|
| `analyze_memory ` | Memory forensics — injected code, C2 configs, credential artifacts |
| `analyze_file ` | Static file analysis — strings, imports, verdict |
| `decode_payload ` | Decode encoded PowerShell/shell payloads |
| `check_registry ` | Registry forensics — persistence, modifications |
### Containment Commands
| Command | Description |
|---------|-------------|
| `isolate_host ` | Network-isolate a compromised host |
| `block_ip ` | Block an IP at the firewall |
| `block_domain ` | Sinkhole a domain via DNS |
| `disable_account ` | Disable a compromised user account |
| `kill_process ` | Kill a malicious process |
| `quarantine_file ` | Quarantine a suspicious file |
### Resolution Commands
| Command | Description |
|---------|-------------|
| `submit_iocs ` | Submit identified IOCs |
| `identify_root_cause ` | Declare the root cause |
| `update_status ` | Post incident status update |
| `resolve` | Mark incident as resolved |
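
The four command groups above can be captured in a small lookup, which is handy when an agent harness wants to know whether a command is investigative or a containment action. This is a minimal sketch: the command names come from the tables, but the category labels, the `classify_command` helper, and the rule of treating the first whitespace-separated token as the command name are assumptions, not the environment's actual parser.

```python
# Command names are taken from the tables above. The category labels and
# the parsing rule (first whitespace-separated token) are assumptions.
COMMAND_CATEGORIES = {
    "investigation": {
        "check_alerts", "list_hosts", "check_metrics", "read_logs",
        "check_connections", "check_processes", "query_threatintel",
        "check_email_gateway", "check_dns_logs", "check_firewall",
    },
    "forensics": {
        "analyze_memory", "analyze_file", "decode_payload", "check_registry",
    },
    "containment": {
        "isolate_host", "block_ip", "block_domain", "disable_account",
        "kill_process", "quarantine_file",
    },
    "resolution": {
        "submit_iocs", "identify_root_cause", "update_status", "resolve",
    },
}


def classify_command(command: str) -> str:
    """Return the category of a command string, or "unknown"."""
    tokens = command.split()
    if not tokens:
        return "unknown"
    name = tokens[0]
    for category, names in COMMAND_CATEGORIES.items():
        if name in names:
            return category
    return "unknown"
```

For example, `classify_command("isolate_host WS-01")` returns `"containment"`; the host name `WS-01` is made up for illustration.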
## Observation Space
| Field | Type | Description |
|-------|------|-------------|
| `output` | str | Command output (logs, alerts, forensic results, etc.) |
| `active_alerts` | int | Number of active alerts (increases as attacker progresses) |
| `elapsed_minutes` | int | Simulated time since incident start |
| `hosts_compromised` | int | Hosts with confirmed compromise (can increase dynamically) |
| `hosts_contained` | int | Hosts successfully isolated |
| `threat_contained` | bool | Whether threat is fully contained |
| `done` | bool | Whether the episode has ended |
| `reward` | float | Step reward |
| `metadata.data_exfiltrated_mb` | float | Data lost (increases if agent is slow) |
| `metadata.attacker_active` | bool | Whether attacker is still operating |
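
As an illustration of how a client might hold these fields, the sketch below mirrors the table with a plain dataclass. The class name `WarRoomObservation` and the `summarize` helper are hypothetical, and the environment's real models may be defined differently (for example with pydantic); only the field names and types are taken from the table.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class WarRoomObservation:  # hypothetical name; fields follow the table above
    output: str
    active_alerts: int
    elapsed_minutes: int
    hosts_compromised: int
    hosts_contained: int
    threat_contained: bool
    done: bool
    reward: float
    # Holds data_exfiltrated_mb (float) and attacker_active (bool).
    metadata: Dict[str, Any] = field(default_factory=dict)


def summarize(obs: WarRoomObservation) -> str:
    """One-line status string, e.g. for an agent's scratchpad."""
    exfil = obs.metadata.get("data_exfiltrated_mb", 0.0)
    active = obs.metadata.get("attacker_active", True)
    # Assumption: contained hosts are counted among the compromised ones.
    uncontained = max(0, obs.hosts_compromised - obs.hosts_contained)
    return (
        f"t={obs.elapsed_minutes}min alerts={obs.active_alerts} "
        f"uncontained_hosts={uncontained} exfil_mb={exfil:.1f} "
        f"attacker_active={active}"
    )
```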
## Tasks (5 Scenarios)
### Task 1: Malware Detection and Containment (Easy)
Single-host infection via phishing email with macro document. Clear EDR alerts, one compromised host. Attacker escalates to credential dumping if the agent takes more than 15 steps.
### Task 2: Phishing Campaign with Lateral Movement (Medium)
Targeted phishing hit 5 employees. Two entered credentials on fake SSO page. Attacker used stolen creds to RDP into file server and access confidential data. Agent must map the full chain and contain all compromised assets.
### Task 3: Ransomware Outbreak: Race Against Encryption (Medium-Hard)
Ransomware spreading via PsExec/SMB. Multiple hosts already encrypting. **Every 5 steps, another host falls.** Agent must race to isolate hosts before the file server and backups are encrypted. Maximum time pressure.
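
The "every 5 steps, another host falls" rule can be sketched as a small state update. This is an illustration only: the function name, the fixed spread order, the host names, and the assumption that isolated hosts are skipped are all hypothetical and may not match the scenario's real logic.

```python
from typing import List, Set

SPREAD_INTERVAL = 5  # from the task description: every 5 steps, another host falls


def advance_ransomware(
    step: int,
    spread_order: List[str],
    encrypted: Set[str],
    isolated: Set[str],
) -> Set[str]:
    """Return the set of encrypted hosts after the given agent step.

    Assumption: on every multiple of SPREAD_INTERVAL, the next host in
    spread_order that is neither encrypted nor isolated is lost.
    """
    if step > 0 and step % SPREAD_INTERVAL == 0:
        for host in spread_order:
            if host not in encrypted and host not in isolated:
                return encrypted | {host}
    return set(encrypted)
```

Under these assumptions, isolating a host before its turn removes it from the spread path, which is why early, well-targeted isolation matters more than the total number of containment actions.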
### Task 4: APT Investigation Against an Adaptive Adversary (Hard)
72-hour multi-stage intrusion: spearphish -> C2 -> credential dump -> lateral movement -> data exfiltration. **The adversary adapts:**
- Block primary C2 -> attacker switches to DNS-over-HTTPS backup channel
- Isolate patient zero -> attacker accelerates exfil from other hosts
- Disable compromised account -> attacker pivots to another dumped account
- Red herring alerts from legitimate Windows Update activity
- Anti-forensics: attacker deletes logs if agent is slow
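
The first three adaptations above are action-triggered, so they can be pictured as a function from a containment action to an attacker response. The sketch below is a hedged illustration: `AptState`, `adapt`, the returned event labels, the doubling of the exfiltration rate, and all host, account, and IP values are assumptions rather than the environment's actual implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AptState:  # hypothetical attacker state
    primary_c2: str
    patient_zero: str
    active_account: str
    spare_accounts: List[str] = field(default_factory=list)
    c2_channel: str = "primary"
    exfil_rate_multiplier: float = 1.0


def adapt(state: AptState, action: str, target: str) -> Optional[str]:
    """Apply the attacker's reaction to a containment action, if any."""
    blocks_c2 = action in ("block_ip", "block_domain") and target == state.primary_c2
    if blocks_c2 and state.c2_channel == "primary":
        # Block primary C2 -> switch to the DNS-over-HTTPS backup channel.
        state.c2_channel = "dns-over-https"
        return "c2_failover"
    if action == "isolate_host" and target == state.patient_zero:
        # Isolate patient zero -> faster exfil from other hosts
        # (the factor of 2 is an assumed value).
        state.exfil_rate_multiplier *= 2.0
        return "exfil_accelerated"
    if action == "disable_account" and target == state.active_account:
        if state.spare_accounts:
            # Disable the account in use -> pivot to another dumped account.
            state.active_account = state.spare_accounts.pop(0)
            return "account_pivot"
    return None
```

Read this way, each containment action has a cost as well as a benefit, so an agent gains from mapping the backup channel and the other dumped accounts before acting.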
### Task 5: Insider Threat: Subtle Behavioral Analysis (Hard)
Senior engineer exfiltrating intellectual property. No malware, no external attackers. Subtle signals: after-hours bulk data transfers, personal cloud sync tools, password-protected archives of source code, HR portal access (resignation policy). Agent must distinguish malicious intent from normal behavior and build an evidence-based case before acting.
## Reward Design
### Per-Step Rewards
- **Investigation (first-time)**: +0.01 to +0.05 (forensic deep-dives earn more)
- **Correct containment**: +0.05 to +0.06
- **Incorrect containment** (wrong target): -0.03
- **Status updates/IOC submission**: +0.02
- **Root cause**: +0.03
- **Resolution bonus**: up to +0.30 (based on grader score)
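
To make the reward list concrete, here is a minimal sketch of a step-reward function. The values come from the list above, but several choices are assumptions: the `kind` labels, picking the ends of the investigation range for ordinary versus deep-dive commands, using +0.05 for correct containment, giving zero for repeated investigation, and scaling the resolution bonus linearly with the grader score.

```python
def step_reward(
    kind: str,
    *,
    first_time: bool = True,
    deep_dive: bool = False,
    correct: bool = True,
    grader_score: float = 0.0,
) -> float:
    """Illustrative per-step reward; not the environment's actual code."""
    if kind == "investigation":
        if not first_time:
            return 0.0  # assumption: repeated queries earn nothing
        return 0.05 if deep_dive else 0.01  # ends of the stated range
    if kind == "containment":
        return 0.05 if correct else -0.03  # wrong target is penalized
    if kind in ("status_update", "ioc_submission"):
        return 0.02
    if kind == "root_cause":
        return 0.03
    if kind == "resolution":
        # Up to +0.30; linear scaling with the grader score is assumed.
        clamped = min(max(grader_score, 0.0), 1.0)
        return 0.30 * clamped
    raise ValueError(f"unknown action kind: {kind}")
```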
### Grader Breakdown (0.0-1.0)
| Component | Weight | Description |
|-----------|--------|-------------|
| IOC Identification | 30% | Recall of ground-truth IOCs, penalize false positives |
| Root Cause Analysis | 20% | Keyword matching against ground-truth |
| Containment Actions | 20% | Recall of required + bonus actions |
| NIST Methodology | 15% | Investigation depth before first containment action |
| Data Loss Prevention | 10% | Less data exfiltrated = higher score |
| Communication | 5% | Status updates posted |
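
The weights in the table sum to 100%, so the final score can be read as a weighted average of per-component scores in [0, 1]. The sketch below shows that aggregation along with one possible IOC component. The key names, the `ioc_score` and `total_score` helpers, and the false-positive penalty of 0.1 per wrong IOC are assumptions; the table only says that false positives are penalized.

```python
from typing import Dict, Iterable

# Weights from the grader table above; the key names are hypothetical.
WEIGHTS: Dict[str, float] = {
    "ioc": 0.30,
    "root_cause": 0.20,
    "containment": 0.20,
    "nist": 0.15,
    "dlp": 0.10,
    "communication": 0.05,
}


def ioc_score(
    submitted: Iterable[str],
    truth: Iterable[str],
    fp_penalty: float = 0.1,
) -> float:
    """Recall of ground-truth IOCs minus an assumed per-false-positive penalty."""
    submitted_set, truth_set = set(submitted), set(truth)
    if not truth_set:
        return 0.0
    recall = len(submitted_set & truth_set) / len(truth_set)
    false_positives = len(submitted_set - truth_set)
    return max(0.0, recall - fp_penalty * false_positives)


def total_score(components: Dict[str, float]) -> float:
    """Weighted sum of component scores, each clamped to [0, 1]."""
    return sum(
        weight * min(max(components.get(name, 0.0), 0.0), 1.0)
        for name, weight in WEIGHTS.items()
    )
```

With these assumptions, an agent that finds half the IOCs and submits one wrong one gets an IOC component of 0.4, which contributes 0.12 to the final score.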
## Dynamic Adversary Mechanics
The attacker advances their kill chain based on step count:
- **Early steps**: Baseline scenario (initial compromise already happened)
- **Mid-game**: Attacker escalates (credential dumping, lateral movement)
- **Late game**: Catastrophic events (domain admin compromise, backup destruction, mass exfiltration)
Adaptations trigger when specific containment actions are taken, making containment a strategic decision — not just "block everything."
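
The step-driven part of this behavior can be sketched as a schedule lookup. The phase names follow the list above, but the step thresholds (10 and 25) and the `phase_for_step` helper are placeholders chosen for illustration; the real thresholds differ per scenario and are not stated here.

```python
from typing import List, Tuple

# (first step of the phase, phase name). The thresholds are hypothetical.
ESCALATION_SCHEDULE: List[Tuple[int, str]] = [
    (0, "baseline"),        # initial compromise already happened
    (10, "escalation"),     # credential dumping, lateral movement
    (25, "catastrophic"),   # domain admin compromise, mass exfiltration
]


def phase_for_step(step: int) -> str:
    """Return the kill-chain phase that applies at a given agent step."""
    current = ESCALATION_SCHEDULE[0][1]
    for start, name in ESCALATION_SCHEDULE:
        if step >= start:
            current = name
    return current
```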
## Setup
```bash
# Install dependencies
pip install "openenv-core[core]" pydantic openai
# Run locally
cd warroom_env
uvicorn server.app:app --host 0.0.0.0 --port 8000
# Or with uv
uv run server
# Docker
docker build -t warroom-env:latest .
docker run -p 8000:8000 warroom-env:latest
# Run inference
export API_BASE_URL="https://integrate.api.nvidia.com/v1"
export MODEL_NAME="nvidia/llama-3.1-nemotron-ultra-253b-v1"
export HF_TOKEN="your-api-key"
python inference.py
# Validate
openenv validate
```
## Baseline Scores
Tested with multiple models via NVIDIA NIM (`https://integrate.api.nvidia.com/v1`):
### google/gemma-4-31b-it (recommended)
| Task | Difficulty | Score | Steps |
|------|-----------|-------|-------|
| task_1_malware | Easy | 0.68 | 16 |
| task_2_phishing | Medium | 0.61 | 15 |
| task_3_ransomware | Medium-Hard | 0.62 | 13 |
| task_4_apt | Hard | ~0.30 | ~35 |
| task_5_insider | Hard | ~0.25 | ~30 |
### Model Comparison
| Model | T1 | T2 | T3 | T4 | T5 | Avg |
|-------|-----|-----|-----|-----|-----|-----|
| gemma-4-31b-it | 0.68 | 0.61 | 0.62 | ~0.30 | ~0.25 | ~0.49 |
| nemotron-3-super-120b | 0.70 | 0.13 | 0.18 | 0.00 | 0.34 | 0.27 |
| mistral-small-4-119b | 0.23 | 0.22 | 0.17 | 0.14 | 0.56 | 0.26 |
| qwen3.5-122b | 0.47 | 0.13 | 0.16 | 0.02 | 0.01 | 0.16 |
The hard tasks remain difficult for the tested models: none scores above ~0.30 on the APT task with adaptation, and only mistral-small-4-119b (0.56) exceeds 0.35 on the insider-threat task.
## MITRE ATT&CK Coverage
Scenarios cover 15+ techniques across the full kill chain:
| Tactic | Techniques |
|--------|-----------|
| Initial Access | T1566.001/002 (Spearphishing) |
| Execution | T1059.001 (PowerShell), T1204.002 (Malicious File) |
| Persistence | T1547.001 (Registry Run Keys) |
| Credential Access | T1003.001 (LSASS), T1078 (Valid Accounts) |
| Lateral Movement | T1021.001 (RDP), T1021.002 (SMB), T1570 (Tool Transfer) |
| Collection | T1005 (Local Data), T1213 (Data from Repos), T1560.001 (Archive) |
| Exfiltration | T1041 (Over C2), T1052.001 (USB) |
| Impact | T1486 (Ransomware), T1490 (Inhibit Recovery) |