4161726f6e/Synthetic-Log-Hunt-Forge

GitHub: 4161726f6e/Synthetic-Log-Hunt-Forge

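A synthetic log generator for blue-team training and detection engineering. It automatically produces Windows and Syslog datasets containing attack scenarios and packages them as CTFd challenges.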


# Synthetic Log Hunt Forge (SLHF)

**SLHF** generates realistic synthetic Windows Event Log (JSONL) and Syslog datasets for blue-team training, threat hunting, and detection engineering, packaged as fully automated CTFd challenges.

## What it does

Each run generates:

- **Per-host Windows JSONL logs**: Security channel events with realistic correlation fields (`logon_id`, process lineage, AD object attributes)
- **Per-host Syslog**: RFC 5424-style text lines from Linux hosts
- **Ground truth labels**: attack windows, attack phases, MITRE ATT&CK IDs
- **CTFd challenge export**: five fully auto-graded challenges with correct flags, hints, a prerequisite chain, and log file attachments
- **Learner scenario brief**: investigation goals, a primer on tracing techniques, and hints at the strategy-category level (no answers)
- **Instructor notes**: full answer reference, anchor events, MITRE mapping
- **Validation chain report**: diagnostic JSON for each attempt, including remediation suggestions
- **Analytics report**: selected playbooks, event counts, historical success rate

## Quick start

```
git clone https://github.com/YOUR-ORG/slhf.git
cd slhf
python -m venv .venv
# Windows: .venv\Scripts\activate
# Linux / macOS: source .venv/bin/activate
pip install -r requirements.txt

# Generate a dataset containing all 9 attack scenarios
python cli.py --output ./out --seed 1337 --attacks 9 --anomalies 5 --days 8 --noise high
```

On success, the output is written to `./out/`.

## Output structure

```
out/
├─ logs/
│  ├─ windows/.jsonl            # One file per Windows host
│  └─ syslog/.log               # One file per Linux host
├─ ground_truth/
│  └─ labels.jsonl              # Malicious event labels
├─ reports/
│  ├─ scenario_brief.md         # Learner-facing brief (no answers)
│  ├─ instructor_notes.md       # Instructor answer key
│  └─ timeline.md               # Ground-truth phase/anchor timeline
├─ ctfd/
│  ├─ ctf.toml                  # ctfcli reference file
│  ├─ README.txt                # Setup instructions
│  ├─ logs.zip                  # Shared log archive
│  ├─ challenges/
│  │  ├─ investigation__identify-malicious-hosts/
│  │  │  ├─ challenge.yml
│  │  │  └─ dist/
│  │  │     ├─ logs.zip
│  │  │     └─ scenario_brief.md
│  │  ├─ detection__critical-event-ids/
│  │  ├─ timeline__attack-phases-ordered/
│  │  ├─ threat-intel__mitre-attack-techniques/
│  │  └─ investigation__false-positives/
│  └─ instructor/
│     ├─ answer_key.json
│     └─ full_labels.jsonl
├─ validation_chain/
│  ├─ attempt_001_validation_report.json
│  └─ summary.json
└─ analytics/
   └─ analytics_report.json
```

## CLI reference

```
python cli.py [OPTIONS]

Options:
  --output DIR          Output directory (required)
  --seed INT            Random seed (default: 1337)
  --attacks INT         Attack playbooks to inject, min 1 (default: 2)
  --anomalies INT       Benign decoy playbooks to inject, min 0 (default: 3)
  --days INT            Observation window in days, min 1 (default: 3)
  --noise LEVEL         Background noise level: low | medium | high (default: high)
  --noise-multiplier N  Scale factor on top of noise preset (default: 1.0)
                        Use 0.25 for fast test runs, 2.0 for denser noise
  --max-attempts INT    Validation retry limit (default: 10)
  --list-playbooks      Print available playbooks and exit
  --reset-metrics       Clear accumulated run history and exit
```

### Example invocations

```
# All 9 attacks, all 5 decoys, 8-day window
python cli.py --output ./out --seed 1337 --attacks 9 --anomalies 5 --days 8

# Fast smoke test with minimal noise
python cli.py --output ./test-out --seed 42 --attacks 1 --noise low --noise-multiplier 0.25

# List available playbooks
python cli.py --list-playbooks

# Reset historical metrics after fixing a bug
python cli.py --reset-metrics
```

## Attack playbooks

| Playbook | Phases | Key event IDs | MITRE |
|---|---|---|---|
| `windows_dcsync_chain` | initial_access → privileged_context → credential_access | 4624, 4662 | T1078, T1003.006 |
| `windows_lolbin_chain` | initial_access → script_execution → payload_execution | 4688 chain | T1204.002, T1059.005, T1059.001 |
| `cloud_to_ad` | web_exploit → shell → ad_activity → credential_tool_execution | syslog + 4769 + 4688 | T1190, T1059, T1558, T1003 |
| `lateral_movement_pth` | initial_compromise → explicit_credential_use → lateral_logon → remote_execution | 4624, 4648, 4688 | T1078, T1550.002, T1569.002 |
| `persistence_scheduled_task` | initial_access → task_creation → payload_execution | 4624, 4698, 4688 | T1078, T1053.005, T1059.001 |
| `ransomware_precursor` | initial_access → shadow_copy_deletion → volume_enumeration → service_install | 4624, 4688, 7045 | T1078, T1490, T1082, T1543.003 |
| `brute_force_spray` | spray_attempt × 3 → successful_logon → discovery | 4625, 4624, 4688 | T1110.003, T1078, T1087.002 |
| `kerberoasting` | initial_access → spn_ticket_request × 3 | 4624, 4769 (etype 0x17) | T1078, T1558.003 |
| `data_exfiltration` | initial_access → data_staging → archive_creation → exfiltration_http | 4624, 4688, syslog | T1078, T1074.001, T1560.001, T1048.002 |

## Benign decoy playbooks

| Playbook | Mimics | Distinguishing factor |
|---|---|---|
| `backup_activity` | DCSync (4662 with replication properties) | `backup_service` account |
| `admin_powershell` | LOLBIN chain (wscript → powershell) | `admin` account, business hours |
| `helpdesk_remote_logon` | Pass-the-Hash (4624 type 3 + 4648) | `helpdesk_svc` account |
| `legitimate_scheduled_task` | Persistence task (4698) | `patch_svc` account, signed binary path |
| `backup_shadow_copy` | Ransomware precursor (vssadmin + 7045) | `backup_svc` account, uses `list` rather than `delete` |

## CTFd integration

The five generated challenges are fully auto-graded and linked by a prerequisite chain:

```
Identify Malicious Hosts (200 pts) ─┬─▶ Critical Event IDs (150 pts)
                                    ├─▶ Attack Phases (250 pts)
                                    ├─▶ MITRE ATT&CK (150 pts)
                                    └─▶ False Positives (100 pts)

Total: 850 pts
```

### Setup (once per CTFd instance)

```
pip install ctfcli
cd out/ctfd
ctf init   # enter your CTFd URL and admin API token when prompted
ctf challenge install challenges/investigation__identify-malicious-hosts
ctf challenge install challenges/detection__critical-event-ids
ctf challenge install challenges/timeline__attack-phases-ordered
ctf challenge install challenges/threat-intel__mitre-attack-techniques
ctf challenge install challenges/investigation__false-positives
```

Then, in the CTFd admin panel, set each challenge to **Visible** when you are ready.

### Regenerating (new seed)

```
python cli.py --output ./out --seed 9999 --attacks 9 --anomalies 5 --days 8
cd out/ctfd
ctf challenge sync challenges/*
```

`sync` updates the flags and re-uploads the log attachments without losing solve history.

## Writing a new playbook

Place attack playbooks in `playbooks/attack/*.yaml` and benign decoys in `playbooks/benign/*.yaml`. The schema is validated at load time.

**Minimum required fields:**

```
playbook_id: my_new_attack
classification: malicious        # malicious | suspicious
description: "Brief description"
phases:
  - id: phase_name               # snake_case, unique within playbook
    event_id: 4688               # Windows event ID, OR use source: syslog
    process_chain:               # for 4688: explicit parent/child
      parent: cmd.exe
      child: powershell.exe
    command_line_contains:
      - "-enc"
    technique: T1059.001         # MITRE ATT&CK technique ID
anchors:
  - event_id: 4688
    condition: "description of what makes this distinctive"
correlation:
  - logon_id                     # fields used to link phases together
```

**Supported event IDs:** 4624, 4625, 4648, 4662, 4663, 4688, 4698, 4769, 7045, plus `source: syslog` for Linux log phases.

Before generating a dataset, run `python cli.py --list-playbooks` to verify that your new playbook loads correctly.

## Architecture

```
cli.py
└── slhf/regenerator.py              Adaptive retry loop with seed bumping
    └── slhf/engine.py               Dataset orchestration
        ├── slhf/topology.py         Seed-variable network topology
        ├── slhf/playbook_loader.py  YAML loading + schema validation
        ├── slhf/noise.py            Background event generation
        ├── slhf/injector.py         Playbook-driven event injection
        ├── slhf/ground_truth.py     Label collection
        ├── slhf/emitters.py         JSONL / syslog file writers
        ├── slhf/ctfd.py             CTFd challenge export
        └── slhf/scenario_brief.py   Learner + instructor documents

slhf/provability_validator.py        Post-generation validation (8 check types)
slhf/adaptive_regenerator.py         Config adjustment on validation failure
slhf/learning/
├── learner.py                       Failure pattern memory (cross-run persistence)
├── metrics.py                       Run success/failure metrics
└── tuner.py                         Pre-run config tuning from history
```

## Running the tests

```
pip install pytest
pytest tests/test_slhf.py -v
```

The suite contains 49 tests covering flag formatting, playbook schema validation, injector phase handling, provability validator failure codes, CLI argument validation, RNG determinism, and investigation rule derivation.

## Requirements

- Python 3.10+
- `pyyaml`: used for playbook loading and CTFd challenge export
- `ctfcli`: used to upload challenges to CTFd (optional, installed separately with `pip install ctfcli`)
- `git`: required by ctfcli when running `ctf init`

## License

MIT
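
## Example: correlating events by `logon_id`

The README names `logon_id` as a correlation field that links playbook phases together. The sketch below shows one way a learner might use it: group Windows JSONL events by session, then list the sessions that contain every key event ID of a playbook (here 4624, 4698, 4688, as listed for `persistence_scheduled_task`). The record layout is an assumption: the field names `event_id` and `host`, the sample values, and both helper functions are hypothetical and are not taken from SLHF's actual output or code.

```python
import json
from collections import defaultdict

# Hypothetical sample lines; real SLHF records may use different field names.
SAMPLE_JSONL = """\
{"event_id": 4624, "host": "ws01", "logon_id": "0x3e7a1"}
{"event_id": 4688, "host": "ws01", "logon_id": "0x3e7a1"}
{"event_id": 4624, "host": "ws02", "logon_id": "0x51b20"}
{"event_id": 4698, "host": "ws01", "logon_id": "0x3e7a1"}
"""


def group_by_logon_id(lines):
    """Map each logon_id to the ordered list of event IDs seen in that session."""
    sessions = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        event = json.loads(line)
        logon_id = event.get("logon_id")
        if logon_id is not None:
            sessions[logon_id].append(event["event_id"])
    return dict(sessions)


def sessions_matching(sessions, required_ids):
    """Return the logon_ids whose sessions contain every ID in required_ids."""
    required = set(required_ids)
    return sorted(lid for lid, ids in sessions.items() if required <= set(ids))


sessions = group_by_logon_id(SAMPLE_JSONL.splitlines())
suspects = sessions_matching(sessions, [4624, 4698, 4688])
print(suspects)
```

A match on event IDs alone does not separate an attack from a decoy: `legitimate_scheduled_task` also produces 4698, so a real investigation would still need to check the distinguishing factors, such as the account and the binary path.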

Tags: CTFd, security training, malware classification, data synthesis, log generation, time-series database, reverse-engineering tools