aquarink/humanized-cti-llm

GitHub: aquarink/humanized-cti-llm

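This project turns network telemetry into explainable, LLM-ready, humanized cyber threat intelligence reports through incident abstraction and MITRE ATT&CK alignment.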

Stars: 0 | Forks: 0

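# humanized-cti-llm

**Repository description:** Incident-level network telemetry abstraction, MITRE ATT&CK alignment, explainable evidence tracing, and LLM-ready humanized cyber threat intelligence (CTI) report generation.

This prototype follows the methodological direction in `Pebri___CTI.pdf`: raw network telemetry is never sent directly to the LLM. The data is first converted into incident abstractions, then enriched with MITRE ATT&CK context, and finally used to generate analyst-style CTI reports.

## Suggested GitHub Metadata

- Repository name: `humanized-cti-llm`
- Short description: `Explainable, LLM-ready CTI reports from network telemetry using incident abstraction and MITRE ATT&CK alignment.`
- Topics: `cyber-threat-intelligence`, `cti`, `llm`, `mitre-attack`, `xai`, `network-telemetry`, `ddos`, `incident-response`

## Suggested Datasets

- `datasets/UWF-ZeekData24-csv/`: the primary dataset for the prototype, because it already provides flow features and MITRE labels (`label_tactic`, `label_technique`, `label_cve`).
- `datasets/Syn.csv`, `datasets/SDN-TCP-SYN ATTACK-DDOS-CLEAN.csv`, and `datasets/BCCC-Cpacket-Cloud-DDoS-2024.csv`: simple DDoS datasets for volumetric case studies. Their schemas contain only IPs, timestamps, and labels, so the resulting abstraction is more limited.

## Running the Pipeline

Generate CTI reports from UWF-ZeekData24:

```
python3 scripts/generate_cti_reports.py --dataset uwf24 --limit-files 4 --max-incidents 8
```

By default, the pipeline filters out benign/normal/`none` traffic so that the reports focus on CTI-relevant incidents. To generate benign control reports as well, add `--include-benign`.

Generate reports from the simple SYN/DDoS datasets:

```
python3 scripts/generate_cti_reports.py --dataset syn --max-incidents 5
python3 scripts/generate_cti_reports.py --dataset sdn-syn --max-incidents 5
python3 scripts/generate_cti_reports.py --dataset bccc --max-incidents 5
```

Results are written to:

```
outputs/incidents.jsonl
outputs/reports.md
outputs/evaluation_rubric.csv
```

## Optional LLM Integration

By default, the script uses a local template renderer so that the pipeline can be tested without an API key. To use the OpenAI API:

```
export OPENAI_API_KEY="..."
python3 scripts/generate_cti_reports.py --dataset uwf24 --use-llm --model gpt-4.1-mini
```

The only input passed to the LLM is the incident-abstraction JSON, never the raw telemetry.

## DistilBERT Baseline

DistilBERT serves as a quantitative baseline for intermediate incident classification. It is not used to generate CTI reports.

```
python3 scripts/run_distilbert_baseline.py \
  --task uwf24_tactic \
  --uwf24-root datasets/UWF-ZeekData24-csv \
  --output-dir distilbert_outputs \
  --epochs 2
```

For a cloud environment:

```
python3 scripts/run_distilbert_baseline.py \
  --task uwf24_tactic \
  --uwf24-root ~/cti_project/dataset/UWF-ZeekData24-csv \
  --dataset-root ~/datasets \
  --output-dir ~/cti_project/distilbert_outputs
```

The default DistilBERT baseline avoids label leakage, drops ambiguous text abstractions, and uses class weighting. For imbalanced tasks, use macro-F1 and per-class F1 as the primary evaluation metrics.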
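
To illustrate why macro-F1 and per-class F1 are recommended over accuracy for imbalanced tasks, here is a minimal, dependency-free sketch. It is not taken from `run_distilbert_baseline.py`: the function names, the toy labels, and the inverse-frequency weighting formula are all assumptions made for illustration, and the repository may compute its metrics and class weights differently.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1} for every label seen in y_true or y_pred."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted label gets a false positive
            fn[truth] += 1  # true label gets a false negative
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so rare classes count equally."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def balanced_class_weights(y_true):
    """Inverse-frequency weights: n_samples / (n_classes * class_count).

    Assumption: this is one common weighting scheme, not necessarily
    the one used by the repository.
    """
    counts = Counter(y_true)
    n, k = len(y_true), len(counts)
    return {label: n / (k * count) for label, count in counts.items()}


# Toy, hypothetical labels: 8 majority-class rows, 2 minority-class rows,
# and a classifier that always predicts the majority class.
y_true = ["Discovery"] * 8 + ["Exfiltration"] * 2
y_pred = ["Discovery"] * 10

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"accuracy = {accuracy:.3f}")
print(f"macro-F1 = {macro_f1(y_true, y_pred):.3f}")
print("per-class F1 =", per_class_f1(y_true, y_pred))
print("class weights =", balanced_class_weights(y_true))
```

In this toy case accuracy is 0.8 even though the minority class is never detected, while macro-F1 falls to about 0.44 and the per-class F1 for the minority class is 0, which makes the failure visible.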
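
The README says that raw telemetry is first converted into incident abstractions and that only the abstraction JSON reaches the LLM. The sketch below shows one way such an abstraction step could look. It is not the code in `generate_cti_reports.py`: the grouping key, the `ts`, `src_ip`, and `dest_ip` field names, and the output schema are hypothetical. Only `label_tactic`, `label_technique`, and the benign/normal/`none` filter come from the text above.

```python
import json
from collections import defaultdict

# Labels treated as non-CTI traffic, mirroring the default filter described above.
BENIGN_LABELS = {"benign", "normal", "none", ""}


def abstract_incidents(flows, include_benign=False):
    """Group flow records into incident-level summaries.

    Each incident keeps the indices of its source rows so that a report
    can be traced back to the evidence without exposing the raw rows.
    """
    groups = defaultdict(list)
    for row_id, flow in enumerate(flows):
        tactic = flow.get("label_tactic", "none")
        if not include_benign and tactic.strip().lower() in BENIGN_LABELS:
            continue
        key = (flow["src_ip"], flow["dest_ip"], tactic,
               flow.get("label_technique", ""))
        groups[key].append((row_id, flow))

    incidents = []
    for (src, dst, tactic, technique), members in groups.items():
        times = [flow["ts"] for _, flow in members]
        incidents.append({
            "src_ip": src,
            "dest_ip": dst,
            "tactic": tactic,
            "technique": technique,
            "flow_count": len(members),
            "first_seen": min(times),
            "last_seen": max(times),
            "evidence_rows": [row_id for row_id, _ in members],
        })
    return incidents


# Hypothetical flow records; real datasets will have many more columns.
flows = [
    {"ts": 100.0, "src_ip": "10.0.0.5", "dest_ip": "10.0.0.9",
     "label_tactic": "Discovery", "label_technique": "T1046"},
    {"ts": 101.5, "src_ip": "10.0.0.5", "dest_ip": "10.0.0.9",
     "label_tactic": "Discovery", "label_technique": "T1046"},
    {"ts": 102.0, "src_ip": "10.0.0.7", "dest_ip": "10.0.0.9",
     "label_tactic": "none", "label_technique": "none"},
]

incidents = abstract_incidents(flows)
# This JSON, not the raw rows, is what would be handed to a report generator.
print(json.dumps(incidents, indent=2))
```

Keeping `evidence_rows` in each incident is one way to support the evidence tracing the README mentions, since an analyst can look up the original rows behind any statement in the generated report.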

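Tags: DLL hijacking, bulk IP address processing, PetitPotam, explainable AI, large language models, threat intelligence, library, incident response, developer tools, cybersecurity, network telemetry, reverse engineering tools, misconfiguration, privacy protection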