Recon53/ransomware-ml-static-features

GitHub: Recon53/ransomware-ml-static-features

Detects ransomware without executing files, using static PE features and supervised machine learning models.



# ransomware-ml-static-features

[![Python](https://img.shields.io/badge/Python-3.x-blue.svg)](https://www.python.org/)
[![scikit-learn](https://img.shields.io/badge/scikit--learn-ML-orange.svg)](https://scikit-learn.org/stable/)
[![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](LICENSE)
[![Release](https://img.shields.io/badge/Release-v1.0-blueviolet.svg)](https://github.com/Recon53/ransomware-ml-static-features/releases/tag/v1.0)
[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.18209938.svg)](https://doi.org/10.5281/zenodo.18209938)
![Stars](https://img.shields.io/github/stars/Recon53/ransomware-ml-static-features)
![Forks](https://img.shields.io/github/forks/Recon53/ransomware-ml-static-features)
![Issues](https://img.shields.io/github/issues/Recon53/ransomware-ml-static-features)

# Overview

This project uses supervised machine learning models to detect ransomware from static features extracted from Windows Portable Executable (PE) files. The goal is to evaluate how effectively static indicators distinguish ransomware from benign software without executing the file.

The static features used include:

- PE header values
- Section metadata
- Registry activity counts
- API/DLL import counts
- Network-related statistics

These properties make detection safe, fast, and scalable.

## Developed as part of the Machine Learning course CAP 5610

## Results preview

### SVM (RBF) performance

The SVM (RBF) model showed the strongest overall performance (ROC-AUC: 0.9738), achieving excellent separation between benign and ransomware samples using static PE features.

ROC curve (left) and confusion matrix (right) for the SVM (RBF) model
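
A ROC-AUC figure like the one reported for the SVM (RBF) model can be computed along the following lines. This is a minimal sketch, not the repository's `src/train_models.py`: the data is synthetic (generated with `make_classification`), and the feature scaling step and all hyperparameters are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data: NOT the project's dataset.
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=5, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

# Scaling before an RBF SVM is an assumption; the README does not say
# whether the project scales its features.
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
svm.fit(X_train, y_train)

# ROC analysis needs continuous scores, so use the signed distance to the
# decision boundary rather than hard 0/1 predictions.
scores = svm.decision_function(X_test)
auc = roc_auc_score(y_test, scores)
fpr, tpr, thresholds = roc_curve(y_test, scores)

print(f"ROC-AUC on synthetic data: {auc:.4f}")
```

The `fpr` and `tpr` arrays are the points of the ROC curve, so they can be passed directly to a plotting library to draw it.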

## Quick start

```
pip install -r requirements.txt
python src/train_models.py
```

### Run with your own dataset (CSV)

```
python src/train_models.py --data data/your_dataset.csv --label-col label
```

---

# Dataset

**Ransomware Dataset 2024**

- **21,752 samples**
  - 10,876 benign
  - 10,876 ransomware
- Numeric PE-based features only
- Preprocessing removed:
  - Hashes
  - Filenames
  - Non-numeric identifiers

> “These values describe file behavior without needing to run the malware.”

---

# Implemented models

Four supervised learning models were trained and evaluated:

- Logistic Regression
- Random Forest
- Support Vector Machine (RBF kernel)
- K-Nearest Neighbors (k = 5)

Evaluation metrics:

- Accuracy
- Precision
- Recall
- F1-score
- Confusion Matrix
- ROC-AUC

---

## Results

### Model performance comparison

| Model               | Accuracy  | ROC-AUC |
| ------------------- | --------- | ------- |
| Logistic Regression | 0.74–0.83 | N/A     |
| Random Forest       | 0.95      | N/A     |
| SVM (RBF)           | 0.94      | 0.9738  |
| K-Nearest Neighbors | ~0.93     | N/A     |

### Key findings

* **SVM (RBF)** achieved the best overall performance (ROC-AUC: 0.9738)
* **Random Forest** achieved the highest accuracy (0.95) and strong generalization
* Static features demonstrated strong effectiveness for ransomware detection without requiring file execution

### ROC curve highlights

The ROC analysis showed that **SVM (RBF)** delivered the strongest overall class-separation performance, achieving a **ROC-AUC of 0.9738**. This indicates excellent discrimination between benign and ransomware samples using static PE-based features alone.

### Key features (Random Forest)

- `registry_total`
- `registry_read`
- `total_processes`
- `network_dns`
- `EntryPoint`

> These results confirm that both ensemble and kernel-based models are highly effective for ransomware detection using static PE features.
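
The preprocessing and four-model comparison described above might look roughly like this. It is a hedged sketch rather than the project's actual script: the data is synthetic, the `sha256` and `filename` columns are hypothetical stand-ins for the removed identifiers, and the scaling step, train/test split, and hyperparameters (other than k = 5 and the RBF kernel) are assumptions.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the dataset (NOT the Ransomware Dataset 2024).
X_raw, y_raw = make_classification(
    n_samples=600, n_features=12, n_informative=6, random_state=0
)
df = pd.DataFrame(X_raw, columns=[f"feat_{i}" for i in range(X_raw.shape[1])])
df["label"] = y_raw
df["sha256"] = [f"hash_{i}" for i in range(len(df))]          # hypothetical column
df["filename"] = [f"sample_{i}.exe" for i in range(len(df))]  # hypothetical column

# Keep numeric columns only, which drops the hash and filename identifiers.
numeric = df.select_dtypes(include="number")
X = numeric.drop(columns=["label"])
y = numeric["label"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# The four model families listed in the README. Scaling is an assumption.
models = {
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "SVM (RBF)": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "K-Nearest Neighbors": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, pred),
        "precision": precision_score(y_test, pred),
        "recall": recall_score(y_test, pred),
        "f1": f1_score(y_test, pred),
    }
    print(f"{name}: " + ", ".join(f"{k}={v:.3f}" for k, v in results[name].items()))
```

Scores printed by this sketch come from synthetic data and say nothing about the figures in the table above.
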

---

# Repository structure

```
ransomware-ml-static-features/
├── src/              # Training + evaluation scripts
├── report/           # Final report (DOCX/PDF)
├── results/          # Confusion matrices, ROC curves, feature importance
├── presentation/     # Final slides
├── assets/           # Images, charts
├── data/             # Dataset placeholder
├── requirements.txt
├── LICENSE
└── README.md
```

---

# Presentation (CAP 5610)

This repository includes the final course presentation and written report for the Machine Learning project **“Detection of Ransomware Using Static Features.”**

### Included files

- **Slide Deck (PowerPoint):** `presentation/Ransomware_ML_Presentation_Miguel.pptx`
- **Final Report (Word/PDF):** `report/Ransomware_Static_Features_Report.docx`

### Key takeaways

**Random Forest consistently outperformed Logistic Regression**, supporting ensemble-based approaches for ransomware detection using static features.

---

# Installation

### 1) Clone the repository

```bash
git clone https://github.com/Recon53/ransomware-ml-static-features.git
cd ransomware-ml-static-features
```

### 2) Install dependencies

```
pip install -r requirements.txt
```

Dependencies include:

- numpy
- pandas
- scikit-learn
- matplotlib

# How to run

### 1) Demo mode (no dataset required)

```
python src/train_models.py
```

### 2) Run with your dataset (CSV)

```
python src/train_models.py --data path/to/your_dataset.csv --label-col label
```

# Expected output

The script prints evaluation metrics such as:

- Accuracy
- Precision
- Recall
- F1-score

It also saves result images to the `results/` folder, including:

- `results/confusion_matrix_random_forest.png`
- `results/model_accuracy_random_forest.png`
- `results/feature_importance_random_forest.png`

# Results (screenshots)

### Confusion matrix (Logistic Regression)
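
Every metric the script prints can be derived from the four cells of a binary confusion matrix. The sketch below shows that arithmetic with made-up counts, which are not this project's results. The function name `binary_metrics` is hypothetical, and ransomware is assumed to be the positive class.

```python
def binary_metrics(tn: int, fp: int, fn: int, tp: int) -> dict:
    """Derive accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tn + fp + fn + tp
    accuracy = (tp + tn) / total
    # Guard against empty denominators (no predicted or no actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Illustrative counts only: 100 benign and 100 ransomware samples.
metrics = binary_metrics(tn=90, fp=10, fn=5, tp=95)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

In a ransomware setting, recall is the share of ransomware samples caught, while precision is the share of alerts that were actually ransomware.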

### Confusion matrix (Random Forest)

### Model accuracy (Random Forest)

### Key features (Random Forest)
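
A feature ranking of this kind can be read from a fitted random forest's impurity-based importances. The following is a minimal sketch under stated assumptions: the column names are borrowed from the README's feature list, but the values and labels are random synthetic data, so the resulting ranking is illustrative and does not reproduce the project's findings.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Column names taken from the README; the values below are synthetic.
feature_names = [
    "registry_total",
    "registry_read",
    "total_processes",
    "network_dns",
    "EntryPoint",
]

rng = np.random.default_rng(0)
X = rng.normal(size=(400, len(feature_names)))
# Synthetic label driven mostly by the first column, slightly by the second.
y = (X[:, 0] + 0.3 * X[:, 1] + 0.1 * rng.normal(size=400) > 0).astype(int)

forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X, y)

# feature_importances_ holds one impurity-based score per column, summing to 1.
ranking = sorted(
    zip(feature_names, forest.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, score in ranking:
    print(f"{name:>16}: {score:.3f}")
```

Impurity-based importances can favor high-cardinality features, so permutation importance on held-out data is a common cross-check.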

# Credits / Acknowledgments

This project was developed for academic coursework and experimentation, using publicly available machine learning libraries such as scikit-learn.

# Citation

If you use this repository, please cite the Zenodo record:

```
@software{guadalupe_ransomware_ml_static_features_2026,
  author    = {Guadalupe, Miguel},
  title     = {ransomware-ml-static-features},
  year      = {2026},
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.18209938},
  url       = {https://doi.org/10.5281/zenodo.18209938}
}
```

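Tags: Apex, API imports, CAP 5610, DLL imports, K-nearest neighbors, PE header, PE files, ROC-AUC, scikit-learn, SVM, Windows executables, cloud security monitoring, classification models, ransomware detection, executable features, scalable detection, fast detection, support vector machine, execution-free detection, machine learning, registry activity counts, confusion matrix, feature extraction, cybersecurity, network statistics, section metadata, reverse-engineering tools, logistic regression, random forest, privacy protection, static analysis, static indicators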