4161726f6e/Synthetic-Log-Hunt-Forge
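A synthetic log generator for blue-team training and detection engineering. It produces Windows and Syslog datasets that contain attack scenarios and packages them as CTFd challenges.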
# Synthetic Log Hunt Forge (SLHF)
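**SLHF** generates realistic synthetic Windows Event Log (JSONL) and Syslog datasets for blue-team training, threat hunting, and detection engineering, packaged as fully automated CTFd challenges.
## What it does
Each run produces:
- **Per-host Windows JSONL logs**: Security channel events with realistic correlation fields (`logon_id`, process lineage, AD object attributes)
- **Per-host Syslog**: RFC 5424-style text lines from Linux hosts
- **Ground truth labels**: attack windows, attack phases, MITRE ATT&CK IDs
- **CTFd challenge export**: five fully auto-graded challenges with correct flags, hints, a prerequisite chain, and log file attachments
- **Learner scenario brief**: investigation objectives, a primer on tracing techniques, and strategy-category hints (no answers)
- **Instructor notes**: the full answer reference, anchor events, and MITRE mapping
- **Validation chain report**: diagnostic JSON for each attempt, including remediation suggestions
- **Analytics report**: selected playbooks, event counts, and historical success rate
## Quick start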
```
git clone https://github.com/YOUR-ORG/slhf.git
cd slhf
python -m venv .venv
# Windows:
.venv\Scripts\activate
# Linux / macOS:
source .venv/bin/activate
pip install -r requirements.txt
# Generate a dataset containing all 9 attack scenarios
python cli.py --output ./out --seed 1337 --attacks 9 --anomalies 5 --days 8 --noise high
```
On success, the output is written to `./out/`.
## Output structure
```
out/
├─ logs/
│  ├─ windows/.jsonl                  # One file per Windows host
│  └─ syslog/.log                     # One file per Linux host
├─ ground_truth/
│  └─ labels.jsonl                    # Malicious event labels
├─ reports/
│  ├─ scenario_brief.md               # Learner-facing brief (no answers)
│  ├─ instructor_notes.md             # Instructor answer key
│  └─ timeline.md                     # Ground-truth phase/anchor timeline
├─ ctfd/
│  ├─ ctf.toml                        # ctfcli reference file
│  ├─ README.txt                      # Setup instructions
│  ├─ logs.zip                        # Shared log archive
│  ├─ challenges/
│  │  ├─ investigation__identify-malicious-hosts/
│  │  │  ├─ challenge.yml
│  │  │  └─ dist/
│  │  │     ├─ logs.zip
│  │  │     └─ scenario_brief.md
│  │  ├─ detection__critical-event-ids/
│  │  ├─ timeline__attack-phases-ordered/
│  │  ├─ threat-intel__mitre-attack-techniques/
│  │  └─ investigation__false-positives/
│  └─ instructor/
│     ├─ answer_key.json
│     └─ full_labels.jsonl
├─ validation_chain/
│  ├─ attempt_001_validation_report.json
│  └─ summary.json
└─ analytics/
   └─ analytics_report.json
```
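As a starting point for exploring the generated files, the sketch below tallies records per event ID in a single JSONL log. The field name `event_id` and both helper names are assumptions made for illustration; inspect a generated file to confirm the actual keys before relying on them.

```python
import json
from collections import Counter
from pathlib import Path
from typing import Iterator


def read_jsonl(path: Path) -> Iterator[dict]:
    """Yield one dict per non-empty line of a JSONL file."""
    with path.open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def count_event_ids(path: Path, key: str = "event_id") -> Counter:
    """Count records per event ID; `key` is an assumed field name."""
    return Counter(record.get(key) for record in read_jsonl(path))


if __name__ == "__main__":
    # Walk every per-host Windows log under the documented output directory.
    for log_file in sorted(Path("out/logs/windows").glob("*.jsonl")):
        print(log_file.name, count_event_ids(log_file).most_common(5))
```

The same `read_jsonl` helper should also work for `ground_truth/labels.jsonl`, since it is a JSONL file as well.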
## CLI reference
```
python cli.py [OPTIONS]

Options:
  --output DIR           Output directory (required)
  --seed INT             Random seed (default: 1337)
  --attacks INT          Attack playbooks to inject, min 1 (default: 2)
  --anomalies INT        Benign decoy playbooks to inject, min 0 (default: 3)
  --days INT             Observation window in days, min 1 (default: 3)
  --noise LEVEL          Background noise level: low | medium | high (default: high)
  --noise-multiplier N   Scale factor on top of noise preset (default: 1.0)
                         Use 0.25 for fast test runs, 2.0 for denser noise
  --max-attempts INT     Validation retry limit (default: 10)
  --list-playbooks       Print available playbooks and exit
  --reset-metrics        Clear accumulated run history and exit
```
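To make the defaults and minimums above concrete, here is a minimal `argparse` sketch that mirrors the generation options. It is not the project's actual `cli.py`; the helper names `bounded_int` and `build_parser` are hypothetical, and the two exit-only flags are left out for brevity.

```python
import argparse


def bounded_int(minimum: int):
    """Return an argparse type callable that rejects values below `minimum`."""
    def parse(value: str) -> int:
        number = int(value)
        if number < minimum:
            raise argparse.ArgumentTypeError(f"must be >= {minimum}, got {number}")
        return number
    return parse


def build_parser() -> argparse.ArgumentParser:
    """Declare the documented generation options with their documented defaults."""
    parser = argparse.ArgumentParser(prog="cli.py")
    parser.add_argument("--output", metavar="DIR", required=True)
    parser.add_argument("--seed", type=int, default=1337)
    parser.add_argument("--attacks", type=bounded_int(1), default=2)
    parser.add_argument("--anomalies", type=bounded_int(0), default=3)
    parser.add_argument("--days", type=bounded_int(1), default=3)
    parser.add_argument("--noise", choices=("low", "medium", "high"), default="high")
    parser.add_argument("--noise-multiplier", type=float, default=1.0)
    parser.add_argument("--max-attempts", type=bounded_int(1), default=10)
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

With this sketch, `--attacks 0` is rejected at parse time, which matches the documented minimum of 1.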
### Example invocations
```
# All 9 attacks, all 5 decoys, 8-day window
python cli.py --output ./out --seed 1337 --attacks 9 --anomalies 5 --days 8
# Quick smoke test with minimal noise
python cli.py --output ./test-out --seed 42 --attacks 1 --noise low --noise-multiplier 0.25
# List the available playbooks
python cli.py --list-playbooks
# Reset historical metrics after fixing a bug
python cli.py --reset-metrics
```
## Attack playbooks
| Playbook | Phases | Key event IDs | MITRE |
|---|---|---|---|
| `windows_dcsync_chain` | initial_access → privileged_context → credential_access | 4624, 4662 | T1078, T1003.006 |
| `windows_lolbin_chain` | initial_access → script_execution → payload_execution | 4688 chain | T1204.002, T1059.005, T1059.001 |
| `cloud_to_ad` | web_exploit → shell → ad_activity → credential_tool_execution | syslog + 4769 + 4688 | T1190, T1059, T1558, T1003 |
| `lateral_movement_pth` | initial_compromise → explicit_credential_use → lateral_logon → remote_execution | 4624, 4648, 4688 | T1078, T1550.002, T1569.002 |
| `persistence_scheduled_task` | initial_access → task_creation → payload_execution | 4624, 4698, 4688 | T1078, T1053.005, T1059.001 |
| `ransomware_precursor` | initial_access → shadow_copy_deletion → volume_enumeration → service_install | 4624, 4688, 7045 | T1078, T1490, T1082, T1543.003 |
| `brute_force_spray` | spray_attempt × 3 → successful_logon → discovery | 4625, 4624, 4688 | T1110.003, T1078, T1087.002 |
| `kerberoasting` | initial_access → spn_ticket_request × 3 | 4624, 4769 (etype 0x17) | T1078, T1558.003 |
| `data_exfiltration` | initial_access → data_staging → archive_creation → exfiltration_http | 4624, 4688, syslog | T1078, T1074.001, T1560.001, T1048.002 |
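
As an illustration of how a row in this table translates into a hunt, the sketch below looks for the `kerberoasting` pattern: repeated 4769 service-ticket requests with encryption type 0x17 from the same account. The field names `event_id`, `ticket_encryption_type`, and `account_name` are assumptions, not SLHF's documented schema, and the threshold of 3 simply mirrors the `spn_ticket_request × 3` phase.

```python
from collections import defaultdict
from typing import Iterable


def rc4_ticket_requesters(events: Iterable[dict], threshold: int = 3) -> dict[str, int]:
    """Return accounts with at least `threshold` 4769 requests using etype 0x17.

    All field names used here are assumed for illustration.
    """
    counts: dict[str, int] = defaultdict(int)
    for event in events:
        if event.get("event_id") != 4769:
            continue  # only Kerberos service-ticket requests are relevant
        if str(event.get("ticket_encryption_type", "")).lower() != "0x17":
            continue  # keep only RC4-encrypted tickets
        counts[event.get("account_name", "<unknown>")] += 1
    return {account: total for account, total in counts.items() if total >= threshold}
```

A real hunt would also constrain the time window and correlate on `logon_id` with the preceding 4624 logon.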
## Benign decoy playbooks
| Playbook | Mimics | Distinguishing factor |
|---|---|---|
| `backup_activity` | DCSync (4662 with replication properties) | `backup_service` account |
| `admin_powershell` | LOLBIN chain (wscript → powershell) | `admin` account, business hours |
| `helpdesk_remote_logon` | Pass-the-Hash (4624 type 3 + 4648) | `helpdesk_svc` account |
| `legitimate_scheduled_task` | Persistence task (4698) | `patch_svc` account, signed binary path |
| `backup_shadow_copy` | Ransomware precursor (vssadmin + 7045) | `backup_svc` account, uses `list` rather than `delete` |
## CTFd integration
The five generated challenges are fully auto-graded and linked by a prerequisite chain:
```
Identify Malicious Hosts (200 pts) ─┬─▶ Critical Event IDs (150 pts)
                                    ├─▶ Attack Phases (250 pts)
                                    ├─▶ MITRE ATT&CK (150 pts)
                                    └─▶ False Positives (100 pts)
Total: 850 pts
```
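The dependency diagram above can be modeled as a small lookup table. The sketch reads the diagram as "each of the four follow-up challenges requires Identify Malicious Hosts"; the data structure and the `unlocked` helper are illustrative and are not part of CTFd or SLHF.

```python
# Names and point values are taken from the diagram; the layout is an assumption.
CHALLENGES = {
    "Identify Malicious Hosts": {"points": 200, "requires": []},
    "Critical Event IDs": {"points": 150, "requires": ["Identify Malicious Hosts"]},
    "Attack Phases": {"points": 250, "requires": ["Identify Malicious Hosts"]},
    "MITRE ATT&CK": {"points": 150, "requires": ["Identify Malicious Hosts"]},
    "False Positives": {"points": 100, "requires": ["Identify Malicious Hosts"]},
}


def unlocked(solved: set[str]) -> list[str]:
    """Return unsolved challenges whose prerequisites are all solved."""
    return [
        name
        for name, info in CHALLENGES.items()
        if name not in solved and all(req in solved for req in info["requires"])
    ]


TOTAL_POINTS = sum(info["points"] for info in CHALLENGES.values())

if __name__ == "__main__":
    print(unlocked(set()))  # only the entry challenge is open at the start
    print(TOTAL_POINTS)
```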
### Setup (once per CTFd instance)
```
pip install ctfcli
cd out/ctfd
ctf init # enter your CTFd URL and admin API token when prompted
ctf challenge install challenges/investigation__identify-malicious-hosts
ctf challenge install challenges/detection__critical-event-ids
ctf challenge install challenges/timeline__attack-phases-ordered
ctf challenge install challenges/threat-intel__mitre-attack-techniques
ctf challenge install challenges/investigation__false-positives
```
Then, in the CTFd admin panel, set each challenge to **Visible** when you are ready.
### Regenerating (new seed)
```
python cli.py --output ./out --seed 9999 --attacks 9 --anomalies 5 --days 8
cd out/ctfd
ctf challenge sync challenges/*
```
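`sync` updates the flags and re-uploads the log attachments without losing solve history.
## Writing new playbooks
Place attack playbooks in `playbooks/attack/*.yaml` and benign decoys in `playbooks/benign/*.yaml`. The schema is validated at load time.
**Minimum required fields:**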
```
playbook_id: my_new_attack
classification: malicious        # malicious | suspicious
description: "Brief description"
phases:
  - id: phase_name               # snake_case, unique within playbook
    event_id: 4688               # Windows event ID, OR use source: syslog
    process_chain:               # for 4688: explicit parent/child
      parent: cmd.exe
      child: powershell.exe
    command_line_contains:
      - "-enc"
    technique: T1059.001         # MITRE ATT&CK technique ID
anchors:
  - event_id: 4688
    condition: "description of what makes this distinctive"
correlation:
  - logon_id                     # fields used to link phases together
```
**Supported event IDs:** 4624, 4625, 4648, 4662, 4663, 4688, 4698, 4769, 7045, plus `source: syslog` for Linux log phases.
Before generating a dataset, run `python cli.py --list-playbooks` to verify that your new playbook loads correctly.
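
For a quick pre-flight check before invoking the CLI, the sketch below applies a few of the rules stated above to an already-parsed playbook: top-level fields are present, phase IDs are unique, and each phase uses a supported event ID or `source: syslog`. It is not the project's schema validator; the function name and the choice of which top-level fields to require are assumptions.

```python
# Assumed required keys, based on the top-level fields in the example playbook.
REQUIRED_TOP_LEVEL = ("playbook_id", "classification", "description", "phases")
# Event IDs listed as supported in this README.
SUPPORTED_EVENT_IDS = {4624, 4625, 4648, 4662, 4663, 4688, 4698, 4769, 7045}


def check_playbook(playbook: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means no issues found."""
    problems = [f"missing field: {name}" for name in REQUIRED_TOP_LEVEL if name not in playbook]
    seen_ids: set[str] = set()
    for index, phase in enumerate(playbook.get("phases") or []):
        phase_id = phase.get("id")
        if not phase_id:
            problems.append(f"phase {index}: missing id")
        elif phase_id in seen_ids:
            problems.append(f"phase {index}: duplicate id {phase_id!r}")
        else:
            seen_ids.add(phase_id)
        if phase.get("source") == "syslog":
            continue  # syslog phases carry no Windows event ID
        if phase.get("event_id") not in SUPPORTED_EVENT_IDS:
            problems.append(f"phase {index}: unsupported event_id {phase.get('event_id')!r}")
    return problems
```

With `pyyaml` installed, a file can be parsed with `yaml.safe_load` and the resulting dict passed to `check_playbook`.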
## Architecture
```
cli.py
└── slhf/regenerator.py                Adaptive retry loop with seed bumping
    └── slhf/engine.py                 Dataset orchestration
        ├── slhf/topology.py           Seed-variable network topology
        ├── slhf/playbook_loader.py    YAML loading + schema validation
        ├── slhf/noise.py              Background event generation
        ├── slhf/injector.py           Playbook-driven event injection
        ├── slhf/ground_truth.py       Label collection
        ├── slhf/emitters.py           JSONL / syslog file writers
        ├── slhf/ctfd.py               CTFd challenge export
        └── slhf/scenario_brief.py     Learner + instructor documents
slhf/provability_validator.py          Post-generation validation (8 check types)
slhf/adaptive_regenerator.py           Config adjustment on validation failure
slhf/learning/
├── learner.py                         Failure pattern memory (cross-run persistence)
├── metrics.py                         Run success/failure metrics
└── tuner.py                           Pre-run config tuning from history
```
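The "adaptive retry loop with seed bumping" can be pictured roughly as follows. This is a sketch of the general pattern rather than the code in `slhf/regenerator.py`: the `generate` and `validate` callables are hypothetical stand-ins, and bumping the seed by one per attempt is an assumed strategy.

```python
from typing import Callable


def generate_with_retries(
    generate: Callable[[int], dict],
    validate: Callable[[dict], list[str]],
    seed: int = 1337,
    max_attempts: int = 10,
) -> tuple[dict, int]:
    """Regenerate with a bumped seed until validation reports no failures.

    Returns the accepted dataset together with the seed that produced it.
    """
    failures: list[str] = []
    for attempt in range(max_attempts):
        current_seed = seed + attempt  # assumed bump strategy: increment by one
        dataset = generate(current_seed)
        failures = validate(dataset)
        if not failures:
            return dataset, current_seed
    raise RuntimeError(f"validation failed after {max_attempts} attempts: {failures}")
```

The `max_attempts` default mirrors the documented `--max-attempts` default of 10. According to the module list above, the real pipeline also adjusts the configuration between attempts, which this sketch omits.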
## Running tests
```
pip install pytest
pytest tests/test_slhf.py -v
```
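The suite contains 49 tests covering flag formatting, playbook schema validation, injector phase handling, provability validator failure codes, CLI argument validation, RNG determinism, and investigation rule derivation.
## Requirements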
- Python 3.10+
- `pyyaml`: used for playbook loading and CTFd challenge export
- `ctfcli`: used to upload challenges to CTFd (optional, installed separately: `pip install ctfcli`)
- `git`: required by ctfcli when running `ctf init`
## License
MIT
Tags: CTFd, security training, malware classification, data synthesis, log generation, time-series database, reverse-engineering tools