Recon53/ransomware-ml-static-features
Detects ransomware without executing the file, using static PE features and supervised machine learning models.
Figure: ROC curve (left) and confusion matrix (right) for the SVM (RBF) model.
## Quick Start

```
pip install -r requirements.txt
python src/train_models.py
```

### Run with your own dataset (CSV)

```
python src/train_models.py --data data/your_dataset.csv --label-col label
```

---

# Dataset

**Ransomware Dataset 2024**

- **21,752 samples**
  - 10,876 benign
  - 10,876 ransomware
- Numeric PE-based features only
- Preprocessing removed:
  - Hashes
  - Filenames
  - Non-numeric identifiers

> “These values describe file behavior without needing to run the malware.”

---

# Models Implemented

Four supervised learning models were trained and evaluated:

- Logistic Regression
- Random Forest
- Support Vector Machine (RBF kernel)
- K-Nearest Neighbors (k = 5)

Evaluation metrics:

- Accuracy
- Precision
- Recall
- F1-score
- Confusion Matrix
- ROC-AUC

---

## Results

### Model Performance Comparison

| Model               | Accuracy  | ROC-AUC |
| ------------------- | --------- | ------- |
| Logistic Regression | 0.74–0.83 | N/A     |
| Random Forest       | 0.95      | N/A     |
| SVM (RBF)           | 0.94      | 0.9738  |
| K-Nearest Neighbors | ~0.93     | N/A     |

### Key Findings

* **SVM (RBF)** achieved the best overall performance (ROC-AUC: 0.9738; ROC-AUC is not reported for the other models)
* **Random Forest** achieved the highest accuracy (0.95) and strong generalization
* Static features were effective for ransomware detection without requiring file execution

### ROC Curve Highlights

The ROC analysis showed that **SVM (RBF)** delivered the strongest overall class separation, with a **ROC-AUC of 0.9738**. This indicates excellent discrimination between benign and ransomware samples using static PE-based features alone.

### Important Features (Random Forest)

- `registry_total`
- `registry_read`
- `total_processes`
- `network_dns`
- `EntryPoint`

> These results confirm that both ensemble and kernel-based models are highly effective for ransomware detection using static PE features.
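The comparison above can be sketched roughly as follows. This is a minimal illustration, not the contents of `src/train_models.py`: the synthetic data from `make_classification`, the 80/20 split, the feature scaling, and every hyperparameter other than the RBF kernel and k = 5 are assumptions made for the example.

```python
# Illustrative sketch only: synthetic data stands in for the real dataset,
# and the split, scaling, and most hyperparameters are assumptions.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Balanced binary problem with numeric features (0 = benign, 1 = ransomware).
X, y = make_classification(n_samples=1000, n_features=20, n_informative=8,
                           random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# The four model families named above; scaling is added for the
# distance- and margin-based models.
models = {
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "SVM (RBF)": make_pipeline(
        StandardScaler(), SVC(kernel="rbf", probability=True, random_state=42)),
    "K-Nearest Neighbors": make_pipeline(
        StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    y_score = model.predict_proba(X_test)[:, 1]  # P(class 1), used for ROC-AUC
    results[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred),
        "recall": recall_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred),
        "roc_auc": roc_auc_score(y_test, y_score),
    }
    print(name, {k: round(v, 4) for k, v in results[name].items()})
```

Computing ROC-AUC from predicted probabilities for every model, as done here, would also fill in the N/A cells of the table; the numbers printed by this sketch come from synthetic data and are not comparable to the reported results.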
---

# Repository Structure

```
ransomware-ml-static-features/
├── src/            # Training + evaluation scripts
├── report/         # Final report (DOCX/PDF)
├── results/        # Confusion matrices, ROC curves, feature importance
├── presentation/   # Final slide deck
├── assets/         # Images, charts
├── data/           # Dataset placeholder
├── requirements.txt
├── LICENSE
└── README.md
```

---

# Presentation (CAP 5610)

This repository includes the final course presentation and written report for the Machine Learning project **“Detection of Ransomware Using Static Features.”**

### Included Files

- **Slide Deck (PowerPoint):** `presentation/Ransomware_ML_Presentation_Miguel.pptx`
- **Final Report (Word/PDF):** `report/Ransomware_Static_Features_Report.docx`

### Key Takeaway

**Random Forest consistently outperformed Logistic Regression**, supporting ensemble-based approaches for ransomware detection using static features.

---

# Installation

### 1) Clone the repository

```bash
git clone https://github.com/Recon53/ransomware-ml-static-features.git
cd ransomware-ml-static-features
```

### 2) Install dependencies

```
pip install -r requirements.txt
```

Dependencies include:

- numpy
- pandas
- scikit-learn
- matplotlib

# How to Run

### 1) Demo mode (no dataset required)

```
python src/train_models.py
```

### 2) Run with your own dataset (CSV)

```
python src/train_models.py --data path/to/your_dataset.csv --label-col label
```

# Expected Output

The script prints evaluation metrics such as:

- Accuracy
- Precision
- Recall
- F1-score

It also saves result images to the `results/` folder, including:

- `results/confusion_matrix_random_forest.png`
- `results/model_accuracy_random_forest.png`
- `results/feature_importance_random_forest.png`

# Results (Screenshots)

### Confusion Matrix (Logistic Regression)
### Confusion Matrix (Random Forest)

### Model Accuracy (Random Forest)

### Important Features (Random Forest)
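A ranking of this kind can be read from a fitted forest's `feature_importances_` attribute. The sketch below is a hedged illustration on synthetic data: it borrows the five feature names listed in this README purely as column labels, the random values and the rule that generates the labels are assumptions, and the resulting order says nothing about the real dataset.

```python
# Illustrative sketch only: random data with hypothetical labels, not the
# repository's dataset or its plotting code.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

feature_names = ["registry_total", "registry_read", "total_processes",
                 "network_dns", "EntryPoint"]

rng = np.random.default_rng(0)
X = rng.normal(size=(500, len(feature_names)))
# Assumption for the demo: the label depends only on the first two columns.
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Impurity-based importances are non-negative and sum to 1 across features.
ranked = sorted(zip(feature_names, forest.feature_importances_),
                key=lambda pair: pair[1], reverse=True)
for name, importance in ranked:
    print(f"{name:16s} {importance:.3f}")
```

Impurity-based importances tend to favor features with many distinct values, so permutation importance on a held-out split is a common cross-check before drawing conclusions from such a ranking.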
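# Credits / Acknowledgments
This project was developed for academic coursework and experimentation, using publicly available machine learning libraries such as scikit-learn.
# Citation
If you use this repository, please cite the Zenodo record: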
```
@software{guadalupe_ransomware_ml_static_features_2026,
author = {Guadalupe, Miguel},
title = {ransomware-ml-static-features},
year = {2026},
publisher = {Zenodo},
doi = {10.5281/zenodo.18209938},
url = {https://doi.org/10.5281/zenodo.18209938}
}
```