tanzimulalam/Mirage

GitHub: tanzimulalam/Mirage

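Mirage is a research platform for generating STIX 2.1-compliant, privacy-preserving synthetic cyber threat intelligence knowledge graphs. It uses multi-agent adversarial simulation and evolutionary algorithms to produce threat data that is realistic in both topology and semantics.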


### Synthetic Cyber Threat Intelligence Research Platform

**A topology-first, semantics-second framework for generating STIX 2.1-compliant synthetic CTI knowledge graphs, with four generation engines, a complete quality evaluation suite, and RDF triple export.**

[![Python 3.9+](https://img.shields.io/badge/Python-3.9%2B-3776ab?logo=python&logoColor=white)](https://www.python.org/) [![STIX 2.1](https://img.shields.io/badge/STIX-2.1-f57c00?logoColor=white)](https://oasis-open.github.io/cti-documentation/stix/intro) [![GPT-4o](https://img.shields.io/badge/GPT--4o-Powered-412991?logo=openai&logoColor=white)](https://openai.com/) [![IEEE](https://img.shields.io/badge/IEEE-Research%20Paper-00629b)](https://ieee.org/) [![License: MIT](https://img.shields.io/badge/License-MIT-22c55e.svg)](LICENSE)

## Abstract

Mirage is a research platform for synthesizing **privacy-preserving cyber threat intelligence (CTI) knowledge graphs**. The generated graphs are structurally and semantically faithful to real operational data but do not leak sensitive indicators or victim information. The platform addresses a key gap in CTI research: there is little shareable, high-fidelity, graph-structured threat data for training and evaluating machine learning systems.

Mirage introduces a **topology-first, semantics-second** architecture that goes beyond per-object LLM generation pipelines:

```
v1 (legacy): LLM per object → Pydantic repair loop → post-hoc relationship stitching
v2 (Mirage): topology/causal engine → IR DSL → deterministic compiler → STIX 2.1 bundle
```

An intermediate representation (IR) decouples structure generation from semantic annotation:

```
TA:abc123 -uses-> MAL:def456
MAL:def456 -targets-> INFRA:host-1
MAL:def456 -exploits-> VUL:CVE-2019-0708
```

The compiler generates UUIDs, instantiates minimal STIX 2.1 objects, and enforces the relationship vocabulary through a curated whitelist. As a result, **no LLM is involved in the compliance-enforcement loop**.

## Four Generation Engines

| Engine | Method | Typical output size |
|--------|--------|---------------------|
| **Legacy v1** | Per-object LLM generation with schema enforcement via Pydantic. Retained as the baseline for comparative evaluation. | 12–20 objects |
| **Procedural** | Scale-free topology construction (Barabási–Albert model via NetworkX), followed by LLM-guided semantic annotation per node type. | 10–100 nodes |
| **Wargaming** | Stateful Red/Blue agent simulation on a randomized host network. STIX edges are the causal residue of resolved simulation actions. | 20–60 nodes |
| **Evolutionary** | Genetic algorithm: Wargaming-seeded population → LLM-guided mutation and crossover operators → per-generation selection using quality as fitness. | 20–60 nodes |

### Wargaming Engine: Design Parameters

| Parameter | Value | Rationale |
|-----------|-------|-----------|
| `TURNS_PER_HOST` | 1.5 | `max_turns = max(25, ⌊num_hosts × 1.5⌋)`; ensures sufficient exploration depth on larger networks. |
| `ALERT_THRESHOLD` | 3 | Minimum accumulated alerts before Blue isolates a host. |
| Entry-node selection | `out_degree ≥ 2` (tiered fallback) | Ensures the attacker has a lateral-movement path. |
| Crown-jewel placement | Directed reachable set from the entry node | Guarantees the simulation is structurally winnable. |

## Quality Evaluation Protocol

Each generated bundle is scored along four orthogonal dimensions:

| Dimension | Metric | Description |
|-----------|--------|-------------|
| **Compliance** | `compliance_rate` | STIX 2.1 specification validity: required fields, type constraints, identifier format. |
| **Structural** | `largest_component_fraction` | Graph connectivity and component coherence. |
| **Semantic** | `pass_rate` | Adherence to CTI domain rules (e.g., malware must reference a target). |
| **Diversity** | Shannon entropy | Entropy of relationship types; variety of distinct SDOs and relationships. |

Overall score: the weighted harmonic mean of the four dimension scores.

## Evaluation Results

Cross-engine comparison (N=10 per engine, GPT-4o, `num_hosts=20`, `max_turns=30`):

| Engine | Compliance | Structural | Semantic | Diversity (H) |
|--------|------------|------------|----------|---------------|
| Legacy v1 | — | — | — | — |
| Procedural | — | — | — | — |
| Wargaming | — | — | — | — |
| **Evolutionary** | **1.00 ± 0.00** | **0.86 ± 0.11** | **1.00 ± 0.00** | **1.71 ± 0.14** |

**Discriminator interpretation:** AUC-vs-MITRE ≈ 1.0 for every engine is a **negative finding**. MITRE ATT&CK is a curated knowledge base, not operational incident CTI, so even random graphs separate cleanly from it. The headline metric is **AUC-vs-Legacy**. Per-engine results are reported in `output/cross_engine_comparison.csv`.

## Repository Structure

```
Mirage/
├── app.py                         # Flask server — /generate-graph endpoint
├── requirements.txt
│
├── sakura/                        # Core v2 pipeline
│   ├── engines/
│   │   ├── procedural.py          # Scale-free topology + semantic painting
│   │   ├── wargaming.py           # Red/Blue agent simulation
│   │   └── evolutionary.py        # GA: mutation + crossover + fitness
│   ├── ir/
│   │   ├── dsl.py                 # IR DSL parser + shorthand map
│   │   └── compiler.py            # Deterministic STIX 2.1 compiler
│   ├── quality/
│   │   ├── compliance.py          # STIX 2.1 spec validation
│   │   ├── structural.py          # Graph topology metrics
│   │   ├── semantic.py            # CTI domain rule checks
│   │   └── diversity.py           # Shannon entropy + type variety
│   ├── relationships/
│   │   └── stix_relationship_whitelist.py  # Valid (source, relation, target) triples
│   ├── evaluation/
│   │   ├── discriminator.py       # RandomForest AUC vs MITRE / vs Legacy
│   │   ├── features.py            # Graph-level feature extraction
│   │   └── fetcher.py             # MITRE ATT&CK STIX reference fetcher
│   └── export/
│       ├── batch_generator.py     # Cross-engine evaluation harness
│       └── triple_exporter.py     # RDF / N-Triples / CSV export
│
├── legacy_stix_object_builder.py  # v1 per-object LLM generator (baseline only)
├── StixObjectLang/                # v1 SDO schema templates (deprecated)
│
├── static/
│   ├── css/sakura.css             # Design system tokens
│   ├── js/graph.js                # ForceGraph2D visualisation
│   └── js/sakura.js               # UI interaction logic
│
├── templates/
│   ├── index.html                 # Generator UI
│   └── results.html               # Quality report + graph visualisation
│
├── scenario_configs/              # YAML scenario presets
│   ├── iot_botnet.yaml
│   ├── apt_campaign.yaml
│   └── ransomware_supply_chain.yaml
│
├── scripts/
│   ├── diag_wargaming.py          # Deterministic wargaming diagnostic (no LLM)
│   └── rerun_evolutionary.py      # Evolutionary re-run with corrected turn budget
│
├── _archive/
│   └── two_pass_redblue.py        # Rejected two-shot wargaming variant
│
└── tests/
    ├── test_ir.py
    ├── test_quality.py
    ├── test_whitelist.py
    └── test_export.py
```

## Quick Start

### Prerequisites

```
pip install -r requirements.txt
```

An OpenAI API key is required. Provide it either in a `.env` file or in the shell environment:

```
export OPENAI_API_KEY=sk-...
```

### Run the Web Interface

```
python app.py
```

Navigate to **http://localhost:5000**.

### Generate via the REST API

```
curl -X POST http://localhost:5000/generate-graph \
  -H "Content-Type: application/json" \
  -d '{"engine":"wargaming","num_hosts":20}'

curl -X POST http://localhost:5000/generate-graph \
  -H "Content-Type: application/json" \
  -d '{"engine":"evolutionary","num_hosts":20}'

curl -X POST http://localhost:5000/generate-graph \
  -H "Content-Type: application/json" \
  -d '{"engine":"legacy","sdo_counts":{"threat-actor":1,"malware":3,"attack-pattern":5}}'
```

**Response schema:**

```
{
  "bundle": { "...STIX 2.1 bundle..." },
  "quality_report": {
    "overall_score": 85.0,
    "compliance": { "score": 100.0 },
    "structural": { "score": 83.0 },
    "semantic":   { "score": 100.0 },
    "diversity":  { "score": 52.0 }
  },
  "story": "Wargaming engine (20 hosts, 30 turns): Attacker Wins (Crown Jewel Compromised).",
  "engine": "wargaming"
}
```

### Reproduce the Evaluation Table

```
python -m sakura.export.batch_generator   # N=10 default
```

This writes `output/cross_engine_comparison.csv`. At N=10, under the GPT-4o limit of 30k tokens per minute (TPM), expect a runtime of 15–30 minutes. N≥30 is recommended for stable variance estimates.

## Known Limitations

1. **AUC-vs-MITRE is a negative finding.** MITRE ATT&CK is a curated knowledge base, not operational incident CTI. Meaningful realism measurement requires a reference corpus of real operational CTI.
2. **KGE transfer uses MITRE ATT&CK as the cross-corpus reference.** `scripts/kge_experiment.py` implements the full TransE experiment (PyKEEN 1.11). It covers both in-distribution (Experiment 1) and cross-corpus (Experiment 2) evaluation. `sakura/evaluation/kge_transfer.py` is retained as a lightweight interface stub for downstream integration.
3. **Evolution collapses after one generation.** This is expected with a budget of 2 API calls per generation. Increase `max_generations` and `max_api_calls` for richer population evolution.

## Citation

```
@software{mirage2026,
  title  = {Mirage: A Topology-First Framework for Synthetic CTI Knowledge Graph Generation},
  author = {Anonymous Author(s)},
  year   = {2026},
  url    = {https://anonymous.4open.science/r/Project-Mirage}
}
```

## License

MIT License. See [LICENSE](LICENSE).
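The IR DSL and deterministic compiler described in the Abstract can be illustrated with a short standard-library sketch. It parses `PREFIX:key -relation-> PREFIX:key` lines, rejects any edge that is not on a whitelist, and derives repeatable STIX identifiers. Several details are assumptions and are not taken from `sakura/ir/`: the shorthand map (`TA`, `MAL`, `INFRA`, `VUL`), the three-entry whitelist, the `example.org` namespace, and the use of `uuid5`. STIX 2.1 generally recommends UUIDv4 for SDOs, so `uuid5` here is purely a reproducibility choice. The emitted objects are STIX-shaped only and omit required properties such as `created`/`modified`.

```python
import re
import uuid
from dataclasses import dataclass

# Hypothetical shorthand map; the real one in sakura/ir/dsl.py may differ.
SHORTHAND = {"TA": "threat-actor", "MAL": "malware", "INFRA": "infrastructure", "VUL": "vulnerability"}

# Hypothetical subset of a (source_type, relationship_type, target_type) whitelist.
WHITELIST = {
    ("threat-actor", "uses", "malware"),
    ("malware", "targets", "infrastructure"),
    ("malware", "exploits", "vulnerability"),
}

# Assumed namespace: identical IR always compiles to identical UUIDs.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.org/mirage-ir")
EDGE_RE = re.compile(r"^(\w+):(\S+)\s+-([a-z-]+)->\s+(\w+):(\S+)$")


@dataclass(frozen=True)
class Edge:
    src_type: str
    src_key: str
    rel: str
    dst_type: str
    dst_key: str


def parse_ir(text: str) -> list[Edge]:
    """Parse one `PREFIX:key -rel-> PREFIX:key` edge per non-empty line."""
    edges = []
    for line in text.strip().splitlines():
        m = EDGE_RE.match(line.strip())
        if not m:
            raise ValueError(f"unparseable IR line: {line!r}")
        sp, sk, rel, dp, dk = m.groups()
        edges.append(Edge(SHORTHAND[sp], sk, rel, SHORTHAND[dp], dk))
    return edges


def stix_id(stix_type: str, key: str) -> str:
    # Deterministic: the same (type, key) pair always yields the same identifier.
    return f"{stix_type}--{uuid.uuid5(NAMESPACE, f'{stix_type}:{key}')}"


def compile_ir(edges: list[Edge]) -> dict:
    """Build a STIX-shaped bundle; raise on any non-whitelisted relationship."""
    objects: dict[str, dict] = {}
    relationships = []
    for e in edges:
        if (e.src_type, e.rel, e.dst_type) not in WHITELIST:
            raise ValueError(f"relationship not whitelisted: {e}")
        src, dst = stix_id(e.src_type, e.src_key), stix_id(e.dst_type, e.dst_key)
        for sid, stype, key in ((src, e.src_type, e.src_key), (dst, e.dst_type, e.dst_key)):
            objects.setdefault(sid, {"type": stype, "spec_version": "2.1", "id": sid, "name": key})
        relationships.append({
            "type": "relationship",
            "spec_version": "2.1",
            "id": stix_id("relationship", f"{src}|{e.rel}|{dst}"),
            "relationship_type": e.rel,
            "source_ref": src,
            "target_ref": dst,
        })
    all_ids = sorted(objects) + sorted(r["id"] for r in relationships)
    return {"type": "bundle", "id": stix_id("bundle", "|".join(all_ids)),
            "objects": list(objects.values()) + relationships}
```

Compiling the three-line IR example from the Abstract yields four SDOs and three relationships. Running the compiler twice on the same input gives byte-identical output, which is the property that keeps the compliance loop free of LLM calls.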
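The Wargaming design parameters amount to a small piece of deterministic setup logic. The sketch below runs without an LLM and uses a plain adjacency dict for the host network. It computes `max_turns = max(25, ⌊num_hosts × 1.5⌋)`, picks an entry host with `out_degree ≥ 2`, and places the crown jewel inside the entry host's directed reachable set. The specific fallback tiers (`≥ 2`, then `≥ 1`, then any host) and all function names are hypothetical, not the code in `sakura/engines/wargaming.py`.

```python
import math
import random
from collections import deque

TURNS_PER_HOST = 1.5
MIN_TURNS = 25
ALERT_THRESHOLD = 3  # Blue isolates a host once its alert count reaches this value


def max_turns(num_hosts: int) -> int:
    return max(MIN_TURNS, math.floor(num_hosts * TURNS_PER_HOST))


def reachable(adj: dict[str, list[str]], start: str) -> set[str]:
    """Directed BFS: every host reachable from `start`, excluding `start` itself."""
    seen, queue = {start}, deque([start])
    while queue:
        for nxt in adj.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen - {start}


def pick_entry(adj: dict[str, list[str]], rng: random.Random) -> str:
    # Hypothetical tiers: prefer out_degree >= 2, then >= 1, then any host.
    for min_deg in (2, 1, 0):
        candidates = sorted(h for h in adj if len(adj[h]) >= min_deg)
        if candidates:
            return rng.choice(candidates)
    raise ValueError("empty network")


def pick_crown_jewel(adj: dict[str, list[str]], entry: str, rng: random.Random):
    # Restricting placement to the reachable set keeps the scenario winnable.
    targets = sorted(reachable(adj, entry))
    return rng.choice(targets) if targets else None


def should_isolate(alerts: int) -> bool:
    return alerts >= ALERT_THRESHOLD
```

With `num_hosts=20`, `max_turns` returns 30, which matches the `max_turns=30` setting in the evaluation table. Below 17 hosts the floor of 25 turns applies.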
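The Evolutionary engine's loop (seed population → mutation and crossover → fitness-based selection) can be sketched generically. In Mirage the operators are LLM-guided and fitness is the quality score. Here they are passed in as plain callables, so the sketch runs offline. Genomes are modeled as sets of IR-style edges, and the selection scheme (truncation selection that keeps the top half as elites) is an assumption. The `toy_*` operators are hypothetical stand-ins for the LLM calls.

```python
import random
from typing import Callable, List, Set, Tuple

Edge = Tuple[str, str, str]  # (source, relation, target), IR-style
Genome = Set[Edge]


def evolve(seeds: List[Genome], fitness: Callable[[Genome], float],
           mutate: Callable[[Genome, random.Random], Genome],
           crossover: Callable[[Genome, Genome, random.Random], Genome],
           generations: int, rng: random.Random) -> Genome:
    """Truncation selection with elitism; population size stays constant."""
    population = list(seeds)
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: max(2, len(ranked) // 2)]  # elites survive unchanged
        children = []
        while len(parents) + len(children) < len(population):
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = parents + children
    return max(population, key=fitness)


def toy_crossover(a: Genome, b: Genome, rng: random.Random) -> Genome:
    # Hypothetical stand-in for LLM-guided crossover: keep a random half of the union.
    pool = sorted(a | b)
    return set(rng.sample(pool, k=max(1, len(pool) // 2)))


def toy_mutate(g: Genome, rng: random.Random) -> Genome:
    # Hypothetical stand-in for LLM-guided mutation: add one new edge.
    rel = rng.choice(["uses", "targets", "exploits"])
    return g | {("MAL:m1", rel, f"NODE:{rng.randrange(100)}")}
```

Because the elites carry over unchanged, the best fitness never decreases across generations. With only a few generations and a tiny population, however, the loop can still stall almost immediately, much as the Known Limitations note for the 2-call budget describes.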
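Three of the metrics in the Quality Evaluation Protocol are simple enough to sketch directly over a bundle's `objects` list: relationship-type Shannon entropy, `largest_component_fraction`, and the weighted harmonic mean used for the overall score. The README does not give the log base, the component definition, or the weights. This sketch therefore assumes bits (log base 2), weakly connected components over non-relationship objects, and caller-supplied weights. Equal weights would turn the sample scores in the response schema (100, 83, 100, 52) into roughly 78 rather than the reported 85.0, so the real weighting presumably differs.

```python
import math
from collections import Counter


def relationship_entropy(bundle: dict) -> float:
    """Shannon entropy, in bits, of the relationship_type distribution."""
    counts = Counter(o["relationship_type"] for o in bundle["objects"] if o["type"] == "relationship")
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values()) if total else 0.0


def largest_component_fraction(bundle: dict) -> float:
    """Share of non-relationship objects in the largest weakly connected component."""
    nodes = {o["id"] for o in bundle["objects"] if o["type"] != "relationship"}
    parent = {n: n for n in nodes}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for o in bundle["objects"]:
        if o["type"] == "relationship" and o["source_ref"] in parent and o["target_ref"] in parent:
            parent[find(o["source_ref"])] = find(o["target_ref"])  # union, direction ignored
    if not nodes:
        return 0.0
    sizes = Counter(find(n) for n in nodes)
    return max(sizes.values()) / len(nodes)


def weighted_harmonic_mean(scores: dict, weights: dict) -> float:
    """sum(w) / sum(w / s); any non-positive score drives the result to 0."""
    if any(scores[k] <= 0 for k in weights):
        return 0.0
    return sum(weights.values()) / sum(w / scores[k] for k, w in weights.items())
```

A harmonic mean was presumably chosen because it punishes one weak dimension more than an arithmetic mean would. A bundle cannot compensate for poor diversity with perfect compliance.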
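The repository tree lists `triple_exporter.py` as providing RDF / N-Triples / CSV export but does not show its output format. A minimal sketch of the N-Triples case turns each STIX relationship into one `<source> <predicate> <target> .` line and each other object into an `rdf:type` triple. The `https://example.org/mirage/` base IRI and the `id/`, `rel/`, `type/` path segments are hypothetical. Only the `rdf:type` IRI is a standard W3C term.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "https://example.org/mirage/"  # hypothetical namespace


def iri(kind: str, value: str) -> str:
    # STIX identifiers (type--uuid) contain only IRI-safe characters.
    return f"<{BASE}{kind}/{value}>"


def to_ntriples(bundle: dict) -> str:
    """One N-Triples statement per object: rdf:type for SDOs, an edge per relationship."""
    lines = []
    for o in bundle["objects"]:
        if o["type"] == "relationship":
            lines.append(f'{iri("id", o["source_ref"])} {iri("rel", o["relationship_type"])} {iri("id", o["target_ref"])} .')
        else:
            lines.append(f'{iri("id", o["id"])} <{RDF_TYPE}> {iri("type", o["type"])} .')
    return "\n".join(lines) + "\n"
```

N-Triples is line-oriented with one statement per line, so the output can be concatenated or streamed into a triple store or a KGE toolkit without further parsing.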

_IEEE Research · STIX 2.1 · GPT-4o · Python 3.9+_

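Tags: Apex, GPT-4, OPA alternative, Python, RDF, STIX, intermediate representation, artificial intelligence, synthetic data, causal reasoning, multi-agent, threat modeling, threat intelligence, adversarial simulation, developer tools, topology, backdoor-free, machine learning, privilege detection, user-mode hook bypass, compiler, cybersecurity, network mapping, quality assessment, reverse-engineering tools, privacy protection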