PedroSct/malware-detection-honeypot
# Honeypot-IDS: Malware Detection with Machine Learning
## Overview
This project integrates a **Honeypot** with a **Machine Learning classifier** to automatically detect and quarantine malware in a controlled environment. Files entering the honeypot trap are analyzed in real time by a trained Random Forest model, and the honeypot routes each file to either a safe folder or quarantine based on the verdict, with no human intervention required.
The research was approved by the academic evaluation board at FATEC Ourinhos and is part of the undergraduate program in Information Security Technology.
## Architecture
The system runs across **3 virtual machines** in an isolated network:

    ┌─────────────────┐     (1) sends files      ┌─────────────────────┐
    │ VM Windows      │ ───────────────────────► │ VM Debian           │
    │ Client/Attacker │                          │ Honeypot Observer   │
    └─────────────────┘                          │ (watcher.py)        │
                                                 └──────────┬──────────┘
                                                            │ (2) forwards for analysis
                                                            ▼
                                                 ┌─────────────────────┐
                                                 │ VM Ubuntu           │
                                                 │ Analyzer API        │
                                                 │ (Random Forest)     │
                                                 └──────────┬──────────┘
                                                            │ (3) returns verdict
                                                            ▼
                                            ┌──────────────────────────────┐
                                            │ VM Debian routes file to:    │
                                            │ ✅ /safe (benign)            │
                                            │ 🔴 /quarantine (malware)     │
                                            └───────────────┬──────────────┘
                                                            │ (5)
                                                            ▼
                                           📊 Performance Report (FP/FN/TP/TN)

**Flow:**
1. Windows VM sends files continuously to the honeypot trap folder
2. `watcher.py` detects new files and forwards them to the analyzer API
3. Ubuntu VM classifies each file using the trained Random Forest model
4. Debian VM routes files to `/safe` or `/quarantine` based on the verdict
5. A performance report is generated with full TP/TN/FP/FN metrics
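
The detect, classify, and route steps can be sketched with the standard library alone. This is a minimal illustration and not the project's actual `watcher.py`: the function names `route_file` and `scan_once`, the directory arguments, and the `classify` callable are all assumptions. In the real system `classify` would presumably wrap an HTTP request to the analyzer API on the Ubuntu VM, whose endpoint is not documented here.

```python
import shutil
from pathlib import Path
from typing import Callable, Dict


def route_file(path: Path, is_malware: bool, safe_dir: Path, quarantine_dir: Path) -> Path:
    """Move one file to the quarantine or safe folder, depending on the verdict."""
    target_dir = quarantine_dir if is_malware else safe_dir
    target_dir.mkdir(parents=True, exist_ok=True)
    destination = target_dir / path.name
    # Note: a file with the same name already in target_dir is not handled here.
    shutil.move(str(path), str(destination))
    return destination


def scan_once(
    trap_dir: Path,
    classify: Callable[[Path], bool],  # hypothetical stand-in for the analyzer API call
    safe_dir: Path,
    quarantine_dir: Path,
) -> Dict[str, str]:
    """Classify and route every regular file currently in the trap folder.

    Returns a mapping of file name -> name of the folder it was moved to.
    """
    results: Dict[str, str] = {}
    # sorted() materializes the listing first, so moving files is safe while looping.
    for path in sorted(trap_dir.iterdir()):
        if not path.is_file():
            continue
        is_malware = classify(path)
        destination = route_file(path, is_malware, safe_dir, quarantine_dir)
        results[path.name] = destination.parent.name
    return results
```

A continuous watcher could call `scan_once` in a loop with a short sleep between passes. Passing the classifier in as a callable keeps the routing logic testable without the analyzer VM running.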
## Dataset
After benchmarking three datasets, **CIC-MalMem-2022** was selected as the primary dataset due to its superior performance across all classifiers.
| Dataset | Best Algorithm | Accuracy |
|---|---|---|
| **CIC-MalMem-2022** | Random Forest | **99.99% (binary)** |
| DikeDataset | Random Forest | 96.00% |
| Malware Datasets (Adep) | Random Forest | ~95–97% |
The CIC-MalMem-2022 dataset covers memory analysis of malware families including **Ransomware, Spyware, and Trojans** in Windows environments.
## Algorithm Comparison
All classifiers were tested on the CIC-MalMem-2022 dataset for binary classification; the last row reports Random Forest on the multi-class task for comparison:
| Algorithm | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|
| **Random Forest** | 99.99% | 100% | 100% | 100% |
| Decision Tree | 99.99% | 100% | 100% | 100% |
| SVM | 99.95% | ~99.9% | ~99.9% | ~99.9% |
| KNN | 99.95% | ~99.9% | ~99.9% | ~99.9% |
| Random Forest (multi-class) | ~88.70% | ~88% | ~88% | ~88% |
Random Forest and Decision Tree report identical scores on the binary task; Random Forest was selected for the practical implementation for its robustness and interpretability in multi-class scenarios.
## Final Results (Practical Simulation — 1 Week)
A total of **1,542 files** were processed during the simulation:
| Metric | Count | Description |
|---|---|---|
| Total files analyzed | 1,542 | Full sample volume |
| Real benign files | 1,215 | Legitimate files sent to the trap |
| Real malware files | 327 | Malware samples from multiple families |
| True Positives (TP) | 318 | Malware correctly quarantined |
| True Negatives (TN) | 1,198 | Benign files correctly passed |
| False Positives (FP) | 17 | Benign files incorrectly quarantined |
| False Negatives (FN) | 9 | Malware incorrectly passed as safe |
### Performance Metrics
| Metric | Result |
|---|---|
| **Overall Accuracy** | **98.31%** |
| Precision | 94.93% |
| **Recall (Detection Rate)** | **97.25%** |
| F1-Score | 96.07% |
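
These metrics follow directly from the confusion-matrix counts reported in the simulation table. The sketch below shows the standard formulas. The function name `confusion_metrics` is illustrative and not part of the project's code, and it assumes every denominator is non-zero.

```python
def confusion_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total   # share of all files classified correctly
    precision = tp / (tp + fp)     # share of quarantined files that were malware
    recall = tp / (tp + fn)        # share of malware that was quarantined
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Counts from the one-week simulation reported above.
metrics = confusion_metrics(tp=318, tn=1198, fp=17, fn=9)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

F1 can equivalently be written as `2*TP / (2*TP + FP + FN)`, which avoids compounding rounding error from already-rounded precision and recall values.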
## Failure Analysis
**False Positives (17 files):** Mostly software installers that use packers similar to those found in malware, and system administration tools whose behavior patterns overlap with spyware signatures. This is expected behavior for a static analysis model.
**False Negatives (9 files):** All 9 undetected threats were either **polymorphic malware** or **zero-day variants** specifically designed to evade static analysis by altering their structure. This highlights the known limitation of static-only approaches and reinforces the need for multi-layer defense strategies (e.g., dynamic/memory analysis as a second layer).
## Tech Stack
| Layer | Technology |
|---|---|
| Virtualization | VirtualBox — 3 VMs (Debian, Ubuntu, Windows) |
| Honeypot Observer | Python (`watcher.py`) |
| Analyzer API | Python + Flask (`analyzer_api.py`) |
| ML Model | Scikit-learn — Random Forest |
| Dataset | CIC-MalMem-2022 |
| Traffic Monitoring | Wireshark |
| ML Benchmarking | RapidMiner, Jupyter Notebook |
## Source Code
The paper (in Portuguese) is available in [`TG_Honeypot.pdf`](./TG_Honeypot.pdf).
## Authors
| Name | Contact |
|---|---|
| Pedro Augusto Scoton Alves | [linkedin.com/in/pedroscoton](https://linkedin.com/in/pedroscoton) |
| Pedro Lucas de Souza | pedro.souza92@fatec.sp.gov.br |
| Gian Luca Monticeli | gian.monticeli@fatec.sp.gov.br |
**Advisor:** Prof. Dr. Thiago José Lucas — thiago@fatecourinhos.edu.br
**Institution:** FATEC Ourinhos — Faculdade de Tecnologia de Ourinhos
**Program:** Tecnólogo em Segurança da Informação
**Year:** 2025
## Related Work
This research builds on 10 peer-reviewed papers (2022–2025) from IEEE Xplore, Wiley, and ACM, comparing approaches including ML-IDHIF, reinforcement learning honeypots (DQN), generative honeypots (GPT-3.5), and IoT-focused detection systems.
The Random Forest algorithm appeared in the majority of surveyed works as the most consistent performer across different datasets and attack scenarios.