ait-aecid/alert-data-set
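A toolset for generating and analyzing an alert data set produced by multiple intrusion detection systems. It supports alert prioritization, filtering, and aggregation for evaluating detectors in multi-step attack scenarios.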
# AIT Alert Data Set (AIT-ADS)
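This repository contains scripts for generating and analyzing the [AIT Alert Data Set (AIT-ADS)](https://zenodo.org/record/8263181). The data set comprises alerts produced by three intrusion detection systems (AMiner, Wazuh, and Suricata) applied to the [AIT Log Data Set V2.0 (AIT-LDSv2)](https://zenodo.org/record/5789064). The following sections explain how to generate the alert data set, in case you need to change the configuration of the detectors. You do not need to generate the data yourself, of course: if you only want to analyze the data and use it for evaluation, download the AIT-ADS from [Zenodo](https://zenodo.org/record/8263181) or go directly to the [Analysis](#analysis) section. If you use the AIT-ADS or any of the resources provided in this repository, please cite the following publication: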
* Landauer, M., Skopik, F., Wurzenberger, M. (2024): [Introducing a New Alert Data Set for Multi-Step Attack Analysis.](https://dl.acm.org/doi/abs/10.1145/3675741.3675748) Proceedings of the 17th Cyber Security Experimentation and Test Workshop. \[[PDF](https://dl.acm.org/doi/pdf/10.1145/3675741.3675748)\]
## Generation
### Wazuh and Suricata
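To generate the alerts, first make sure that [Wazuh](https://wazuh.com/) is installed and set up on your system. Then download one of the scenarios from [AIT-LDSv2](https://zenodo.org/record/5789064) and extract it to `/home/ubuntu/aitldsv2/` (if you use another path, adjust the corresponding paths in the Python scripts used below). Since Wazuh ingests logs in real time, we provide a script that reads the timestamps of the log events and replays the events accordingly. As a consequence, generating the alerts takes as long as the time span of the AIT-LDSv2, i.e., about 4 to 6 days per scenario. Create a directory for the generated logs; we use `/var/log/replay/` in the following (if you use another path, adjust the corresponding paths in the Python script and in the `ossec.conf` file). Finally, start the Wazuh agent, clone this repository, specify the scenario name in the `replay_logs.py` script, and run the script to replay the log data. The following snippet summarizes these steps for the fox scenario.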
```
ubuntu@ubuntu:~$ mkdir aitldsv2
ubuntu@ubuntu:~$ cd aitldsv2/
ubuntu@ubuntu:~/aitldsv2$ wget https://zenodo.org/record/5789064/files/fox.zip
ubuntu@ubuntu:~/aitldsv2$ unzip fox.zip -d fox
ubuntu@ubuntu:~/aitldsv2$ cd ..
ubuntu@ubuntu:~$ systemctl restart wazuh-agent.service
ubuntu@ubuntu:~$ git clone https://github.com/ait-aecid/alert-data-set.git
ubuntu@ubuntu:~$ cd alert-data-set/
ubuntu@ubuntu:~/alert-data-set$ vim replay_logs.py
ubuntu@ubuntu:~/alert-data-set$ python3 replay_logs.py
```
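The internals of `replay_logs.py` are not reproduced here. As a rough illustration of timestamp-driven replay, the following minimal sketch interleaves several already-sorted log sources by timestamp and sleeps for the gap between consecutive events before writing each line. The function name `replay`, the `parse_ts` callback, and the `speedup` factor are hypothetical and not part of this repository; `sleep` is injectable so the logic can be tried without actually waiting.

```python
import heapq
import time
from typing import Callable, Iterable, TextIO


def replay(
    sources: Iterable[Iterable[str]],
    parse_ts: Callable[[str], float],
    out: TextIO,
    speedup: float = 1.0,
    sleep: Callable[[float], object] = time.sleep,
) -> None:
    """Write log lines to `out` so that the delays between them match their timestamps.

    Hypothetical helper: every source must already be sorted by timestamp,
    and `parse_ts` must return epoch seconds for a line.
    """
    previous = None
    # Merge all sources into a single stream ordered by timestamp.
    for line in heapq.merge(*sources, key=parse_ts):
        ts = parse_ts(line)
        if previous is not None and ts > previous:
            # Wait for the original inter-event gap, shortened by `speedup`.
            sleep((ts - previous) / speedup)
        out.write(line if line.endswith("\n") else line + "\n")
        out.flush()  # hand the line to the OS right away so a tailing agent sees it
        previous = ts
```

With `speedup=1.0` the original timing is reproduced, which is why replaying a scenario takes as long as the scenario itself.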
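Once the script has finished, collect the alerts from the Wazuh manager. Note that the alerts stored in the Elasticsearch database also include alerts from Suricata. This is because Suricata was already deployed when the simulations of the AIT-LDSv2 were run; conveniently, Wazuh collects these alerts and stores them in the database alongside the alerts triggered by its own rules. One way to copy the alerts from the database to a local file is [elasticdump](https://github.com/elasticsearch-dump/elasticsearch-dump):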
```
ubuntu@ubuntu:~$ export NODE_TLS_REJECT_UNAUTHORIZED=0
ubuntu@ubuntu:~$ npx elasticdump --input=https:// --output=/home/ubuntu/fox_wazuh.json --type=data --limit=5000
```
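With `--type=data`, elasticdump writes one JSON document per line, with the indexed alert under `_source`. The following minimal sketch separates the Suricata alerts forwarded through Wazuh from Wazuh's own alerts. It assumes that Wazuh tags the former with the `suricata` rule group in `rule.groups`, as Wazuh's bundled Suricata rules do; check this against your own dump before relying on it. `split_by_source` is a hypothetical helper, not part of this repository.

```python
import json
from typing import Iterable, List, Tuple


def split_by_source(dump_lines: Iterable[str]) -> Tuple[List[dict], List[dict]]:
    """Split an elasticdump line-delimited JSON export into (wazuh_alerts, suricata_alerts).

    Assumption: Suricata alerts ingested by Wazuh carry 'suricata' in rule.groups.
    """
    wazuh, suricata = [], []
    for raw in dump_lines:
        raw = raw.strip()
        if not raw:
            continue  # skip blank lines, e.g. a trailing newline
        alert = json.loads(raw)["_source"]
        groups = alert.get("rule", {}).get("groups", [])
        (suricata if "suricata" in groups else wazuh).append(alert)
    return wazuh, suricata
```

For example, `with open("/home/ubuntu/fox_wazuh.json") as dump: wazuh_alerts, suricata_alerts = split_by_source(dump)`.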
### AMiner
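Since the `replay_logs.py` script stores all rotated logs in a single file and renames them according to a uniform scheme, we use these logs for anomaly detection with the AMiner. Copy the logs from `/var/log/replay/` to `/home/ubuntu/replay//` (if you use another path, adjust the corresponding paths in the `aminer_config.yml` file). Make sure that an [AMiner](https://github.com/ait-aecid/logdata-anomaly-miner) instance is set up and running on your system. Then specify the paths of the input files and of the output file that will contain the anomalies (`/tmp/aminer_out.log` by default) in `aminer_config.yml`, and run the AMiner. Note that the AMiner processes the logs forensically (offline rather than in real time), so this step should finish within a few minutes.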
```
ubuntu@ubuntu:~/alert-data-set$ cp -r /var/log/replay /home/ubuntu/replay/fox
ubuntu@ubuntu:~/alert-data-set$ vim aminer_config.yml
ubuntu@ubuntu:~/alert-data-set$ aminer -C -c aminer_config.yml
```
Repeat these steps for all other scenarios.
## Analysis
### Download AIT-ADS
Create a directory in this repository and download the AIT-ADS as follows.
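The exact renaming scheme of `replay_logs.py` is not documented here. To illustrate what combining rotated logs into a single file involves, the following minimal sketch assumes the common logrotate naming convention (`<base>`, `<base>.1`, `<base>.2.gz`, ..., where a higher number means an older file) and concatenates the parts oldest first, so that the result is chronological as long as each part is. `merge_rotated` is a hypothetical helper.

```python
import gzip
import re
from pathlib import Path
from typing import List, Union


def merge_rotated(directory: Union[str, Path], base: str, out_path: Union[str, Path]) -> List[str]:
    """Concatenate '<base>.N[.gz]', ..., '<base>.1', '<base>' into one file, oldest first.

    Hypothetical helper assuming logrotate-style names; returns the merged file names in order.
    """
    pattern = re.compile(re.escape(base) + r"(?:\.(\d+))?(?:\.gz)?")
    parts = []
    for path in Path(directory).iterdir():
        match = pattern.fullmatch(path.name)
        if match:
            # The live file has no number; treat it as rotation index 0.
            parts.append((int(match.group(1) or 0), path))
    # A higher rotation index means an older file, so sort in descending order.
    parts.sort(key=lambda item: item[0], reverse=True)
    with open(out_path, "w", encoding="utf-8") as out:
        for _, path in parts:
            opener = gzip.open if path.suffix == ".gz" else open
            with opener(path, "rt", encoding="utf-8", errors="replace") as src:
                out.writelines(src)
    return [path.name for _, path in parts]
```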
```
ubuntu@ubuntu:~/alert-data-set$ mkdir alerts_raw
ubuntu@ubuntu:~/alert-data-set$ cd alerts_raw
ubuntu@ubuntu:~/alert-data-set/alerts_raw$ wget https://zenodo.org/record/8263181/files/ait_ads.zip
ubuntu@ubuntu:~/alert-data-set/alerts_raw$ unzip ait_ads.zip
ubuntu@ubuntu:~/alert-data-set/alerts_raw$ cd ..
```
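After extracting the archive, it can help to check that every scenario has alerts from each detector. The following sketch assumes the files follow the `<scenario>_<detector>.json` naming that appears in the aggregation output further below (e.g. `fox_aminer.json`, `fox_wazuh.json`) and searches subdirectories too, in case the archive extracts into a nested folder. Both functions are hypothetical helpers.

```python
from collections import defaultdict
from pathlib import Path
from typing import Dict, Iterable, Set, Union


def index_alert_files(directory: Union[str, Path]) -> Dict[str, Dict[str, Path]]:
    """Map scenario -> {detector: path} for files named '<scenario>_<detector>.json' (assumed)."""
    index: Dict[str, Dict[str, Path]] = defaultdict(dict)
    for path in sorted(Path(directory).rglob("*_*.json")):
        scenario, _, detector = path.stem.rpartition("_")  # split at the last underscore
        index[scenario][detector] = path
    return dict(index)


def missing_detectors(
    index: Dict[str, Dict[str, Path]], expected: Iterable[str] = ("aminer", "wazuh")
) -> Dict[str, Set[str]]:
    """Return, for each incomplete scenario only, the expected detectors without a file."""
    wanted = set(expected)
    return {s: wanted - set(d) for s, d in index.items() if wanted - set(d)}
```

For example, `missing_detectors(index_alert_files("alerts_raw"))` returns an empty dict if every scenario has both an AMiner and a Wazuh file.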
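### Alert Prioritization
Run the following script to analyze the data. It (i) applies the prioritization and outputs a table (in LaTeX format) covering all detectors, and (ii) creates csv files of alert occurrences in the `alerts_csv` directory for further analysis. These csv files contain labels based on time intervals (`time_label`) and on individual events (`event_label`). The latter, however, requires that the AIT-LDSv2 and the AIT-NDS are located at the paths specified in `analyze.py` and that `do_event_labeling` is set to `True`.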
```
ubuntu@ubuntu:~/alert-data-set$ python3 analyze.py
& network\_scans & service\_scans & wpscan & dirb & webshell & cracking & reverse\_shell & privilege\_escalation & service\_stop & dnsteal & false\_positive\_test & robustness & detection \\ \hline
W-Aut-Ssh2 & & 8 & & & & & & & & & & 1.0 & 1.0 \\ \hline
W-Err-Fbd2 & & 5 & 3 & 8 & & & & & & & & 1.0 & 1.0 \\ \hline
W-All-Mul3 & & 5 & 8 & 8 & & & & & & & & 1.0 & 1.0 \\ \hline
W-Acc-Sus & & & 6 & 8 & & & & & & & & 1.0 & 1.0 \\ \hline
...
ubuntu@ubuntu:~/alert-data-set$ head alerts_csv/russellmitchell_alerts.txt
time,name,ip,host,short,time_label,event_label
1642723347,Wazuh: ClamAV database update,172.19.130.4,mail,W-Sys-Cav,false_positive,-
1642723352,Wazuh: ClamAV database update,172.19.130.4,mail,W-Sys-Cav,false_positive,-
1642723357,Wazuh: ClamAV database update,172.19.130.4,mail,W-Sys-Cav,false_positive,-
1642723362,Wazuh: ClamAV database update,172.19.130.4,mail,W-Sys-Cav,false_positive,-
1642723367,Wazuh: ClamAV database update,172.19.130.4,mail,W-Sys-Cav,false_positive,-
1642723368,Wazuh: ClamAV database update,172.19.130.4,mail,W-Sys-Cav,false_positive,-
1642723432,Wazuh: ClamAV database update,192.168.231.56,davey_mail,W-Sys-Cav,false_positive,-
1642724061,Suricata: Alert - ET POLICY GNU/Linux APT User-Agent Outbound likely related to package management,10.143.0.103,internal_share,S-Flw-Apt,false_positive,-
1642724061,Wazuh: First time this IDS alert is generated.,10.143.0.103,internal_share,W-All-Ids,false_positive,-
```
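The csv files in `alerts_csv` can be processed with Python's standard `csv` module. The following minimal sketch counts, for each alert type (the `short` column), how often each `time_label` occurs and derives the share of alerts labeled `false_positive`. The column names are taken from the sample output above; the helper names are hypothetical.

```python
import csv
from collections import Counter, defaultdict
from typing import Dict, Iterable


def count_labels(rows: Iterable[str]) -> Dict[str, Counter]:
    """Count time_label values per alert type ('short') in an alerts_csv file."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for row in csv.DictReader(rows):
        counts[row["short"]][row["time_label"]] += 1
    return dict(counts)


def false_positive_share(counts: Dict[str, Counter]) -> Dict[str, float]:
    """Fraction of each alert type's occurrences that are labeled 'false_positive'."""
    return {short: c["false_positive"] / sum(c.values()) for short, c in counts.items()}
```

For example, `with open("alerts_csv/russellmitchell_alerts.txt", newline="") as f: shares = false_positive_share(count_labels(f))`.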
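### Alert Aggregation
Alerts are aggregated into meta-alerts with the [aecid-alert-aggregation](https://github.com/ait-aecid/aecid-alert-aggregation) tool. This repository provides `attacktimes.py` and `aggregate_config.py`, which need to be used together with aecid-alert-aggregation to process the AIT-ADS. In addition, to aggregate only relevant alerts, we provide a `filter.py` script that removes noisy alerts based on the alert prioritization described above and removes false positives by selecting only alerts that occur during attack phases. Run the following commands to generate meta-alerts with the aecid-alert-aggregation tool.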
```
ubuntu@ubuntu:~/alert-data-set$ mkdir alerts_filtered
ubuntu@ubuntu:~/alert-data-set$ python3 filter.py
ubuntu@ubuntu:~/alert-data-set$ git clone https://github.com/ait-aecid/aecid-alert-aggregation.git
ubuntu@ubuntu:~/alert-data-set$ cp attacktimes.py aecid-alert-aggregation/
ubuntu@ubuntu:~/alert-data-set$ cp aggregate_config.py aecid-alert-aggregation/
ubuntu@ubuntu:~/alert-data-set$ cd aecid-alert-aggregation/
ubuntu@ubuntu:~/alert-data-set/aecid-alert-aggregation$ python3 aggregate.py
delta = 2: 18 groups in ['../alerts_filtered/fox_aminer.json', '../alerts_filtered/fox_wazuh.json']
delta = 2: 24 groups in ['../alerts_filtered/harrison_aminer.json', '../alerts_filtered/harrison_wazuh.json']
delta = 2: 18 groups in ['../alerts_filtered/russellmitchell_aminer.json', '../alerts_filtered/russellmitchell_wazuh.json']
delta = 2: 19 groups in ['../alerts_filtered/santos_aminer.json', '../alerts_filtered/santos_wazuh.json']
delta = 2: 17 groups in ['../alerts_filtered/shaw_aminer.json', '../alerts_filtered/shaw_wazuh.json']
delta = 2: 17 groups in ['../alerts_filtered/wardbeck_aminer.json', '../alerts_filtered/wardbeck_wazuh.json']
delta = 2: 15 groups in ['../alerts_filtered/wheeler_aminer.json', '../alerts_filtered/wheeler_wazuh.json']
delta = 2: 22 groups in ['../alerts_filtered/wilson_aminer.json', '../alerts_filtered/wilson_wazuh.json']
Now processing file 1/8...
Processing groups with delta = 2
Processed group 1/18 from {'service_stop'} phase with 2 alerts. New meta-alert 0 generated. (sim=-1.0)
Processed group 2/18 from {'service_scans'} phase with 39 alerts. New meta-alert 1 generated. (sim=0.0)
Processed group 3/18 from {'service_scans'} phase with 22 alerts. New meta-alert 2 generated. (sim=0.0)
Processed group 4/18 from {'service_scans'} phase with 154 alerts. New meta-alert 3 generated. (sim=0.0)
Processed group 5/18 from {'service_scans'} phase with 24 alerts. New meta-alert 4 generated. (sim=0.0)
Processed group 6/18 from {'wpscan'} phase with 28 alerts. New meta-alert 5 generated. (sim=0.21)
Processed group 7/18 from {'wpscan'} phase with 5 alerts. New meta-alert 6 generated. (sim=0.0)
Processed group 8/18 from {'wpscan'} phase with 9482 alerts. New meta-alert 7 generated. (sim=0.21)
Processed group 9/18 from {'dirb'} phase with 410333 alerts. New meta-alert 8 generated. (sim=0.11)
Processed group 10/18 from {'webshell'} phase with 1 alerts. New meta-alert 9 generated. (sim=0.0)
Processed group 11/18 from {'webshell'} phase with 1 alerts. Add group to meta-alert 9 (sim=0.71) representing {'webshell'}
Processed group 12/18 from {'cracking'} phase with 1 alerts. Add group to meta-alert 9 (sim=0.71) representing {'webshell', 'cracking'}
Processed group 13/18 from {'cracking'} phase with 1 alerts. New meta-alert 10 generated. (sim=0.0)
Processed group 14/18 from {'cracking'} phase with 1 alerts. New meta-alert 11 generated. (sim=0.0)
Processed group 15/18 from {'reverse_shell'} phase with 1 alerts. Add group to meta-alert 9 (sim=0.71) representing {'webshell', 'cracking', 'reverse_shell'}
Processed group 16/18 from {'privilege_escalation'} phase with 10 alerts. New meta-alert 12 generated. (sim=0.05)
Processed group 17/18 from {'privilege_escalation'} phase with 4 alerts. New meta-alert 13 generated. (sim=0.0)
Processed group 18/18 from {'privilege_escalation'} phase with 3 alerts. Add group to meta-alert 13 (sim=0.7) representing {'privilege_escalation'}
...
Results:
delta = 2: 42 meta-alerts generated
Meta-alerts are stored in data/out/aggregate/meta_alerts.txt
```
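`filter.py` and aecid-alert-aggregation implement the actual filtering and grouping. Purely as an illustration of the two ideas described above, the following sketch (i) drops alert types deemed noisy and keeps only alerts inside attack-phase time windows, tagging each with its phase, and (ii) splits the remaining alerts into groups whenever the gap between consecutive alerts exceeds `delta`. Reading `delta` as a time gap in seconds is an assumption based on the `delta = 2` lines in the output, not a description of the tool's internals; the alert tuples, phase windows, and noise set are hypothetical inputs.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical simplified records: (epoch seconds, short alert name[, phase]).
Alert = Tuple[float, str]
PhasedAlert = Tuple[float, str, str]


def filter_alerts(
    alerts: Iterable[Alert],
    phases: Dict[str, Tuple[float, float]],
    noisy: Set[str],
) -> List[PhasedAlert]:
    """Drop noisy alert types and keep only alerts inside an attack-phase window."""
    kept: List[PhasedAlert] = []
    for ts, short in alerts:
        if short in noisy:
            continue  # alert type ranked as noise by the prioritization
        # Alerts that fall into no window are dropped (treated as false positives).
        for phase, (start, end) in phases.items():
            if start <= ts <= end:
                kept.append((ts, short, phase))
                break  # first matching phase wins
    return kept


def group_by_gap(alerts: Iterable[PhasedAlert], delta: float) -> List[List[PhasedAlert]]:
    """Start a new group whenever an alert follows the previous one by more than `delta`."""
    groups: List[List[PhasedAlert]] = []
    for alert in sorted(alerts):  # tuples sort by timestamp first
        if groups and alert[0] - groups[-1][-1][0] <= delta:
            groups[-1].append(alert)
        else:
            groups.append([alert])
    return groups
```

Sorting before grouping matters because the alerts of one scenario come from several files (e.g. `fox_aminer.json` and `fox_wazuh.json`) that are combined into a single stream.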
If you use the AIT-ADS, please cite the following publications:
* Landauer, M., Skopik, F., Wurzenberger, M. (2024): [Introducing a New Alert Data Set for Multi-Step Attack Analysis.](https://dl.acm.org/doi/abs/10.1145/3675741.3675748) Proceedings of the 17th Cyber Security Experimentation and Test Workshop. \[[PDF](https://dl.acm.org/doi/pdf/10.1145/3675741.3675748)\]
* Landauer M., Skopik F., Frank M., Hotwagner W., Wurzenberger M., Rauber A. (2023): [Maintainable Log Datasets for Evaluation of Intrusion Detection Systems.](https://ieeexplore.ieee.org/abstract/document/9866880) IEEE Transactions on Dependable and Secure Computing, vol. 20, no. 4, pp. 3466-3482. \[[PDF](https://arxiv.org/pdf/2203.08580.pdf)\]