ait-aecid/alert-data-set

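A toolset for generating and analyzing an alert data set from multiple intrusion detection systems. It supports alert prioritization, filtering, and aggregation to evaluate how detectors perform in multi-step attack scenarios.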

# AIT Alert Data Set (AIT-ADS)

This repository contains scripts to generate and analyze the [AIT Alert Data Set (AIT-ADS)](https://zenodo.org/record/8263181). The data set contains alerts produced by three intrusion detection systems (AMiner, Wazuh, and Suricata) applied to the [AIT Log Data Set V2.0 (AIT-LDSv2)](https://zenodo.org/record/5789064). The following sections explain how to generate the alert data set in case you need to change the configuration of the detectors. You do not need to generate the data yourself; if you only want to analyze the data and use it for evaluation, download AIT-ADS from [Zenodo](https://zenodo.org/record/8263181) or go directly to the [Analysis](#analysis) section.

If you use AIT-ADS or any of the resources provided in this repository, please cite the following publication:

* Landauer, M., Skopik, F., Wurzenberger, M. (2024): [Introducing a New Alert Data Set for Multi-Step Attack Analysis.](https://dl.acm.org/doi/abs/10.1145/3675741.3675748) Proceedings of the 17th Cyber Security Experimentation and Test Workshop. \[[PDF](https://dl.acm.org/doi/pdf/10.1145/3675741.3675748)\]

## Generation

### Wazuh and Suricata

To generate alerts, first make sure that [Wazuh](https://wazuh.com/) is installed and set up on your system. Then download one of the scenarios from [AIT-LDSv2](https://zenodo.org/record/5789064) and unzip it under `/home/ubuntu/aitldsv2/` (if you use a different path, make sure to change it in the python scripts used below).

Since Wazuh needs to ingest logs in real time, we prepared a script that reads the timestamps of the log events and replays them. This means that generating the alerts takes as long as the time span covered by AIT-LDSv2, which is roughly 4-6 days per scenario. Create a directory for the generated logs; we use `/var/log/replay/` in the following (if you use a different path, make sure to change it in the python script and in the ossec.conf file).

Finally, start the Wazuh agent, clone this repository, specify the scenario name in the `replay_logs.py` script, and run the script to replay the log data. The following snippet summarizes these steps for the fox scenario.

```
ubuntu@ubuntu:~$ mkdir aitldsv2
ubuntu@ubuntu:~$ cd aitldsv2/
ubuntu@ubuntu:~/aitldsv2$ wget https://zenodo.org/record/5789064/files/fox.zip
ubuntu@ubuntu:~/aitldsv2$ unzip fox.zip -d fox
ubuntu@ubuntu:~/aitldsv2$ cd ..
ubuntu@ubuntu:~$ systemctl restart wazuh-agent.service
ubuntu@ubuntu:~$ git clone https://github.com/ait-aecid/alert-data-set.git
ubuntu@ubuntu:~$ cd alert-data-set/
ubuntu@ubuntu:~/alert-data-set$ vim replay_logs.py
ubuntu@ubuntu:~/alert-data-set$ python3 replay_logs.py
```

Once the script has finished, collect the alerts from the Wazuh Manager. Note that the alerts stored in the elastic database also include alerts from Suricata. The reason is that Suricata was already deployed when the AIT-LDSv2 simulations were run; conveniently, Wazuh collects these alerts and stores them in the database together with the alerts triggered by its own rules. One way to copy the alerts from the database to a local file is [elasticdump](https://github.com/elasticsearch-dump/elasticsearch-dump):

```
ubuntu@ubuntu:~$ export NODE_TLS_REJECT_UNAUTHORIZED=0
ubuntu@ubuntu:~$ npx elasticdump --input=https:// --output=/home/ubuntu/fox_wazuh.json --type=data --limit=5000
```

### AMiner

Since the `replay_logs.py` script stores all rotated logs in single files and renames them according to a uniform scheme, we reuse these logs for anomaly detection with the AMiner. Copy the logs from `/var/log/replay/` to `/home/ubuntu/replay//` (the snippet below uses `/home/ubuntu/replay/fox` for the fox scenario; if you use a different path, make sure to change it in the `aminer_config.yml` file). Make sure that an [AMiner](https://github.com/ait-aecid/logdata-anomaly-miner) instance is set up and running on your system. Then specify the paths to the input files and to the output file that will contain the anomalies (default: `/tmp/aminer_out.log`) in the `aminer_config.yml` file, and run the AMiner. Note that the AMiner can process logs forensically, so it should finish within a few minutes.

```
ubuntu@ubuntu:~/alert-data-set$ cp -r /var/log/replay /home/ubuntu/replay/fox
ubuntu@ubuntu:~/alert-data-set$ vim aminer_config.yml
ubuntu@ubuntu:~/alert-data-set$ aminer -C -c aminer_config.yml
```

Repeat the same steps for all other scenarios.

## Analysis

### Download AIT-ADS

Create a directory in this repository and download AIT-ADS as follows.

```
ubuntu@ubuntu:~/alert-data-set$ mkdir alerts_raw
ubuntu@ubuntu:~/alert-data-set$ cd alerts_raw
ubuntu@ubuntu:~/alert-data-set/alerts_raw$ wget https://zenodo.org/record/8263181/files/ait_ads.zip
ubuntu@ubuntu:~/alert-data-set/alerts_raw$ unzip ait_ads.zip
ubuntu@ubuntu:~/alert-data-set/alerts_raw$ cd ..
```

### Alert prioritization

Run the following script to analyze the data. This will (i) apply prioritization and output a table (in LaTeX format) containing all detectors, and (ii) create csv files of alert occurrences in the `alerts_csv` directory for further analysis. These csv files contain labels based on time intervals (`time_label`) and on individual events (`event_label`). The latter, however, requires that AIT-LDSv2 and AIT-NDS are available at the paths specified in `analyze.py` and that `do_event_labeling` is set to True.

```
ubuntu@ubuntu:~/alert-data-set$ python3 analyze.py
 & network\_scans & service\_scans & wpscan & dirb & webshell & cracking & reverse\_shell & privilege\_escalation & service\_stop & dnsteal & false\_positive\_test & robustness & detection \\ \hline
W-Aut-Ssh2 & & 8 & & & & & & & & & & 1.0 & 1.0 \\ \hline
W-Err-Fbd2 & & 5 & 3 & 8 & & & & & & & & 1.0 & 1.0 \\ \hline
W-All-Mul3 & & 5 & 8 & 8 & & & & & & & & 1.0 & 1.0 \\ \hline
W-Acc-Sus & & & 6 & 8 & & & & & & & & 1.0 & 1.0 \\ \hline
...
ubuntu@ubuntu:~/alert-data-set$ head alerts_csv/russellmitchell_alerts.txt
time,name,ip,host,short,time_label,event_label
1642723347,Wazuh: ClamAV database update,172.19.130.4,mail,W-Sys-Cav,false_positive,-
1642723352,Wazuh: ClamAV database update,172.19.130.4,mail,W-Sys-Cav,false_positive,-
1642723357,Wazuh: ClamAV database update,172.19.130.4,mail,W-Sys-Cav,false_positive,-
1642723362,Wazuh: ClamAV database update,172.19.130.4,mail,W-Sys-Cav,false_positive,-
1642723367,Wazuh: ClamAV database update,172.19.130.4,mail,W-Sys-Cav,false_positive,-
1642723368,Wazuh: ClamAV database update,172.19.130.4,mail,W-Sys-Cav,false_positive,-
1642723432,Wazuh: ClamAV database update,192.168.231.56,davey_mail,W-Sys-Cav,false_positive,-
1642724061,Suricata: Alert - ET POLICY GNU/Linux APT User-Agent Outbound likely related to package management,10.143.0.103,internal_share,S-Flw-Apt,false_positive,-
1642724061,Wazuh: First time this IDS alert is generated.,10.143.0.103,internal_share,W-All-Ids,false_positive,-
```

### Alert aggregation

Aggregation of alerts into meta-alerts is done with the [aecid-alert-aggregation](https://github.com/ait-aecid/aecid-alert-aggregation) tool. In this repository we provide `attacktimes.py` and `aggregate_config.py`, which are needed in combination with aecid-alert-aggregation to process AIT-ADS. In addition, to aggregate only relevant alerts, we provide a `filter.py` script that removes noisy alerts based on the alert prioritization described above and removes false positives by selecting only alerts that occur during attack phases. Run the following commands to generate meta-alerts with the aecid-alert-aggregation tool.

```
ubuntu@ubuntu:~/alert-data-set$ mkdir alerts_filtered
ubuntu@ubuntu:~/alert-data-set$ python3 filter.py
ubuntu@ubuntu:~/alert-data-set$ git clone https://github.com/ait-aecid/aecid-alert-aggregation.git
ubuntu@ubuntu:~/alert-data-set$ cp attacktimes.py aecid-alert-aggregation/
ubuntu@ubuntu:~/alert-data-set$ cp aggregate_config.py aecid-alert-aggregation/
ubuntu@ubuntu:~/alert-data-set$ cd aecid-alert-aggregation/
ubuntu@ubuntu:~/alert-data-set/aecid-alert-aggregation$ python3 aggregate.py
delta = 2: 18 groups in ['../alerts_filtered/fox_aminer.json', '../alerts_filtered/fox_wazuh.json']
delta = 2: 24 groups in ['../alerts_filtered/harrison_aminer.json', '../alerts_filtered/harrison_wazuh.json']
delta = 2: 18 groups in ['../alerts_filtered/russellmitchell_aminer.json', '../alerts_filtered/russellmitchell_wazuh.json']
delta = 2: 19 groups in ['../alerts_filtered/santos_aminer.json', '../alerts_filtered/santos_wazuh.json']
delta = 2: 17 groups in ['../alerts_filtered/shaw_aminer.json', '../alerts_filtered/shaw_wazuh.json']
delta = 2: 17 groups in ['../alerts_filtered/wardbeck_aminer.json', '../alerts_filtered/wardbeck_wazuh.json']
delta = 2: 15 groups in ['../alerts_filtered/wheeler_aminer.json', '../alerts_filtered/wheeler_wazuh.json']
delta = 2: 22 groups in ['../alerts_filtered/wilson_aminer.json', '../alerts_filtered/wilson_wazuh.json']
Now processing file 1/8...
Processing groups with delta = 2
Processed group 1/18 from {'service_stop'} phase with 2 alerts. New meta-alert 0 generated. (sim=-1.0)
Processed group 2/18 from {'service_scans'} phase with 39 alerts. New meta-alert 1 generated. (sim=0.0)
Processed group 3/18 from {'service_scans'} phase with 22 alerts. New meta-alert 2 generated. (sim=0.0)
Processed group 4/18 from {'service_scans'} phase with 154 alerts. New meta-alert 3 generated. (sim=0.0)
Processed group 5/18 from {'service_scans'} phase with 24 alerts. New meta-alert 4 generated. (sim=0.0)
Processed group 6/18 from {'wpscan'} phase with 28 alerts. New meta-alert 5 generated. (sim=0.21)
Processed group 7/18 from {'wpscan'} phase with 5 alerts. New meta-alert 6 generated. (sim=0.0)
Processed group 8/18 from {'wpscan'} phase with 9482 alerts. New meta-alert 7 generated. (sim=0.21)
Processed group 9/18 from {'dirb'} phase with 410333 alerts. New meta-alert 8 generated. (sim=0.11)
Processed group 10/18 from {'webshell'} phase with 1 alerts. New meta-alert 9 generated. (sim=0.0)
Processed group 11/18 from {'webshell'} phase with 1 alerts. Add group to meta-alert 9 (sim=0.71) representing {'webshell'}
Processed group 12/18 from {'cracking'} phase with 1 alerts. Add group to meta-alert 9 (sim=0.71) representing {'webshell', 'cracking'}
Processed group 13/18 from {'cracking'} phase with 1 alerts. New meta-alert 10 generated. (sim=0.0)
Processed group 14/18 from {'cracking'} phase with 1 alerts. New meta-alert 11 generated. (sim=0.0)
Processed group 15/18 from {'reverse_shell'} phase with 1 alerts. Add group to meta-alert 9 (sim=0.71) representing {'webshell', 'cracking', 'reverse_shell'}
Processed group 16/18 from {'privilege_escalation'} phase with 10 alerts. New meta-alert 12 generated. (sim=0.05)
Processed group 17/18 from {'privilege_escalation'} phase with 4 alerts. New meta-alert 13 generated. (sim=0.0)
Processed group 18/18 from {'privilege_escalation'} phase with 3 alerts. Add group to meta-alert 13 (sim=0.7) representing {'privilege_escalation'}
...
Results:
delta = 2: 42 meta-alerts generated
Meta-alerts are stored in data/out/aggregate/meta_alerts.txt
```

If you use AIT-ADS, please cite the following publications:

* Landauer, M., Skopik, F., Wurzenberger, M. (2024): [Introducing a New Alert Data Set for Multi-Step Attack Analysis.](https://dl.acm.org/doi/abs/10.1145/3675741.3675748) Proceedings of the 17th Cyber Security Experimentation and Test Workshop. \[[PDF](https://dl.acm.org/doi/pdf/10.1145/3675741.3675748)\]
* Landauer, M., Skopik, F., Frank, M., Hotwagner, W., Wurzenberger, M., Rauber, A. (2023): [Maintainable Log Datasets for Evaluation of Intrusion Detection Systems.](https://ieeexplore.ieee.org/abstract/document/9866880) IEEE Transactions on Dependable and Secure Computing, vol. 20, no. 4, pp. 3466-3482. \[[PDF](https://arxiv.org/pdf/2203.08580.pdf)\]
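
The csv files that `analyze.py` writes to `alerts_csv` start with the header `time,name,ip,host,short,time_label,event_label`, so they can be inspected with a few lines of Python. The following is a minimal sketch and not part of this repository: the helper name `count_alerts` is hypothetical, the sample rows are copied from the `head` output shown for the russellmitchell scenario, and it assumes that alert names contain no unquoted commas.

```python
import csv
import io
from collections import Counter

# Sample rows copied from the `head` output of
# alerts_csv/russellmitchell_alerts.txt shown in this README.
SAMPLE = """time,name,ip,host,short,time_label,event_label
1642723347,Wazuh: ClamAV database update,172.19.130.4,mail,W-Sys-Cav,false_positive,-
1642723432,Wazuh: ClamAV database update,192.168.231.56,davey_mail,W-Sys-Cav,false_positive,-
1642724061,Suricata: Alert - ET POLICY GNU/Linux APT User-Agent Outbound likely related to package management,10.143.0.103,internal_share,S-Flw-Apt,false_positive,-
1642724061,Wazuh: First time this IDS alert is generated.,10.143.0.103,internal_share,W-All-Ids,false_positive,-
"""


def count_alerts(handle):
    """Count alerts per (short, time_label) pair (hypothetical helper)."""
    counts = Counter()
    for row in csv.DictReader(handle):
        counts[(row["short"], row["time_label"])] += 1
    return counts


# Use the in-memory sample here; for a real file, pass an open file handle.
counts = count_alerts(io.StringIO(SAMPLE))
for (short, label), n in sorted(counts.items()):
    print(f"{short}\t{label}\t{n}")
```

To run the same count on a generated file, replace the `io.StringIO(SAMPLE)` argument with a handle such as `open("alerts_csv/russellmitchell_alerts.txt", newline="")`.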