ThisPlatypus/The-intrinsic-convenience-of-FD

# The intrinsic convenience of federated learning in malware IoT detection

## Overview

This repository explores machine learning approaches for malware detection across heterogeneous IoT devices. The goal is to improve detection accuracy under the privacy and communication constraints typical of distributed and edge environments.

## Research Context

How the work relates to IoT, cybersecurity, edge AI, and communication constraints:

- IoT: Many resource-constrained endpoints generate telemetry of varying quality and distribution.
- Cybersecurity: The focus is on detecting malicious behaviors and artifacts while preserving data privacy.
- Edge AI: Training and inference run close to the data sources to reduce latency and bandwidth usage.
- Communication constraints: The methods minimize model and data transfer, including distillation and federated-style aggregation.

## Methodology

Models, data, and evaluation approach:

- Baselines: Classical ML baselines (e.g., `Baseline/SVM.ipynb`, `Baseline/NAive_B.ipynb`).
- Representation learning: Autoencoder-based anomaly modeling (`Autoencoder/AE_main.py`).
- Knowledge distillation / aggregation: Distillation-based training and client/server orchestration (`Distillation/main.py`, `Distillation/client.py`, `Distillation/server.py`).
- Imbalanced sampling: Utilities for handling class imbalance (`*/torchsampler`).
- Evaluation: AUROC and recall; plots and metrics are stored under `PLOT/` and `results/`.

## System Architecture

- Clients: `Autoencoder/client.py` and `Distillation/client.py` train models on local data splits.
- Server: `Autoencoder/server.py` and `Distillation/server.py` coordinate aggregation and distillation.
- Utilities: Shared helpers live in `Autoencoder/utilities.py` and `Distillation/utilities.py`.
- Data prep: Scripts under `Util/` (e.g., `Clean_Data_script.R`).

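
To make the client/server split concrete, the sketch below shows one common form of federated-style aggregation: averaging client parameters weighted by local sample count (often called FedAvg). It is an illustration only, not the code in `Autoencoder/server.py` or `Distillation/server.py`, and this README does not state which aggregation rule those scripts use. The function name `federated_average` and the representation of a model as a dictionary of flat float lists are assumptions made for this example.

```python
from typing import Dict, List, Sequence

# Hypothetical model representation: parameter name -> flat list of values.
Weights = Dict[str, List[float]]


def federated_average(client_weights: Sequence[Weights],
                      client_sizes: Sequence[int]) -> Weights:
    """Average client parameters, weighting each client by its sample count."""
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    averaged: Weights = {}
    # Assumes every client reports the same parameter names and shapes.
    for name in client_weights[0]:
        acc = [0.0] * len(client_weights[0][name])
        for weights, size in zip(client_weights, client_sizes):
            share = size / total  # this client's share of all samples
            for i, value in enumerate(weights[name]):
                acc[i] += value * share
        averaged[name] = acc
    return averaged


# Two illustrative clients: the second holds three times as much data.
clients = [
    {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]},
    {"layer.weight": [3.0, 4.0], "layer.bias": [1.0]},
]
global_weights = federated_average(clients, client_sizes=[1, 3])
print(global_weights)
```

With these inputs the second client contributes 75% of each averaged value, so the result is `{'layer.weight': [2.5, 3.5], 'layer.bias': [0.75]}`. Only parameters cross the network in this scheme, so raw device telemetry stays on the client.
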
## Results

Summary of experiments and findings:

- Aggregated metrics and checkpoints are saved in `results/` (e.g., files containing global loss and recall).
- Visualizations and correlation analyses live in `PLOT/`.
- Text logs from runs are available in `runs_prove_txt/`.

## Limitations

Explicit constraints and assumptions:

- Data availability and exact preprocessing steps may vary; check that your data matches what the preprocessing scripts expect.
- Reproducibility depends on random seeds, data splits, and environment versions.
- Communication and system constraints may be simulated rather than measured on real networks.
- Hardware resource constraints (e.g., IoT-grade devices) are approximated.

## References

- Camerota, Chiara; Pecorella, Tommaso; Bagdanov, Andrew D. (2024). "The intrinsic convenience of federated learning in malware IoT detection." 2024 20th International Conference on Network and Service Management (CNSM), pp. 1–7. IEEE. [IEEE abstract](https://ieeexplore.ieee.org/abstract/document/10814605/) | [PDF](https://flore.unifi.it/bitstream/2158/1406137/1/The_intrinsic_convenience_of_federated_learning_in_malware_IoT_detection.pdf)