lehasaS/Malware-Analysis-and-Detection
# Malware-Analysis-and-Detection
## Introduction
Malware research is a very dynamic field, given the ever-changing security landscape. Protecting against malicious software such as viruses, worms, and Trojan horses requires continual improvement of existing detection methods, or entirely new ones. Several detection mechanisms have been proposed and implemented, but many of them lack automation. This has motivated researchers to investigate approaches based on machine learning, in particular deep learning. In this project, two convolutional neural networks (CNNs) were implemented to study how differences in their depths and hyperparameters affect detection accuracy.
### Preliminary remarks
This repository contains live Windows Portable Executable (PE) malware samples in the password-protected archive `samples.7z`; the password is `infected`. I will not be held liable for any damage resulting from mishandling the samples. You have been warned! There are 4000 samples in the archive. To extract them, use the following command:

```bash
7z x samples.7z -pinfected
```
### Thesis
The thesis write-up for this project can be found here. It begins by introducing the concepts explored throughout the project and builds up to the experiments performed in this repository.
### Data processing
Bash and Python scripts in the `scripts` directory convert the malware binaries into images and split the images into training, validation, and testing datasets. [imauto.sh](https://github.com/lehasaS/Malware-Analysis-and-Detection/blob/master/scripts/imauto.sh) automates the conversion, and [split.sh](https://github.com/lehasaS/Malware-Analysis-and-Detection/blob/master/scripts/split.sh) automates splitting the dataset.
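The two processing steps can be illustrated with the minimal, dependency-free sketch below. It is not a translation of `imauto.sh` or `split.sh`: the size-based width table, the PGM output format, the 70/15/15 split, and the function names `choose_width`, `binary_to_pgm`, and `split_dataset` are all assumptions made for illustration. Each byte of the binary becomes one grayscale pixel, and the image width grows with the file size.

```python
import math
import random

# Assumed size-based width table: (upper size bound in KB, image width in px).
# This mirrors a scheme commonly used for malware byte-plot images and is
# NOT necessarily what imauto.sh implements.
WIDTH_TABLE = [(10, 32), (30, 64), (60, 128), (100, 256),
               (200, 384), (500, 512), (1000, 768)]
DEFAULT_WIDTH = 1024  # used for files of 1000 KB or more


def choose_width(num_bytes: int) -> int:
    """Pick an image width from the file size in kilobytes."""
    size_kb = num_bytes / 1024
    for limit_kb, width in WIDTH_TABLE:
        if size_kb < limit_kb:
            return width
    return DEFAULT_WIDTH


def binary_to_pgm(data: bytes) -> bytes:
    """Encode raw bytes as a binary (P5) PGM grayscale image, one byte per pixel."""
    width = choose_width(len(data))
    height = max(1, math.ceil(len(data) / width))
    # Zero-pad the last row so the pixel buffer is exactly width * height bytes.
    pixels = data.ljust(width * height, b"\x00")
    header = f"P5\n{width} {height}\n255\n".encode("ascii")
    return header + pixels


def split_dataset(paths, train_pct=70, val_pct=15, seed=0):
    """Shuffle paths reproducibly and split them into train/val/test lists.

    The 70/15/15 default split is a hypothetical choice.
    """
    items = sorted(paths)  # sort first so the result does not depend on input order
    random.Random(seed).shuffle(items)
    n_train = len(items) * train_pct // 100
    n_val = len(items) * val_pct // 100
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])
```

For example, `Path("sample.pgm").write_bytes(binary_to_pgm(Path("sample.exe").read_bytes()))` (with `from pathlib import Path`; both file names are placeholders) would convert a single file. PGM is used only to keep the sketch free of third-party dependencies; an imaging library could write PNG files instead. Integer percentages avoid floating-point rounding when computing split sizes.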
### Running the program
#### Using make
A [Makefile](https://github.com/lehasaS/Malware-Analysis-and-Detection/blob/master/Makefile) is provided should you wish to use it to run the program. Note that calling `make` creates a Python environment for you if you do not already have one, and installs the packages listed in the `requirements.txt` file. The Makefile provides training, testing, and clean commands, which can be executed with:

```bash
make
```
#### Using command line
First install the packages needed to run the program:

```bash
pip install -r requirements.txt
```
You can then train or test a model with the following command:

```bash
python CNN_Malware_Train_Test.py <flag> <model_name>
```
Where `flag` must be one of:
- `--train`: trains a model. Requires a `train_output` directory in the root directory, where all generated files, including the model's state dict, are saved.
- `--test`: tests a model. Requires a `test_output` directory in the root directory, where all generated files are saved.
Where `model_name` must be one of:
- `Model_One`
- `Model_Two`
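The flag and model name documented above could be handled with Python's standard `argparse` module, as in the hypothetical sketch below. It is not taken from `CNN_Malware_Train_Test.py`; the names `MODEL_NAMES`, `parse_args`, and `output_dir_for` are illustrative assumptions. Declaring `--train` and `--test` as a required, mutually exclusive group ensures exactly one mode is chosen, and `choices` rejects any model name other than the two listed.

```python
import argparse
from pathlib import Path

# Hypothetical constant listing the two documented model names.
MODEL_NAMES = ("Model_One", "Model_Two")


def parse_args(argv=None):
    """Parse exactly one of --train/--test plus a model name."""
    parser = argparse.ArgumentParser(description="Train or test a malware-detection CNN.")
    mode = parser.add_mutually_exclusive_group(required=True)
    mode.add_argument("--train", action="store_true", help="train a model")
    mode.add_argument("--test", action="store_true", help="test a model")
    parser.add_argument("model_name", choices=MODEL_NAMES)
    return parser.parse_args(argv)


def output_dir_for(args, root="."):
    """Return the output directory the chosen mode expects, failing if it is missing."""
    name = "train_output" if args.train else "test_output"
    path = Path(root) / name
    if not path.is_dir():
        raise FileNotFoundError(f"expected directory {path} to exist")
    return path
```

Because `argparse` accepts optional flags before or after positional arguments, both `--train Model_One` and `Model_One --train` parse to the same result in this sketch. Invalid input (both flags, no flag, or an unknown model name) makes `argparse` print a usage message and exit.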
### Hyperparameter Tuning
Hyperparameter tuning is done on the platform weights and biases, if you wish to do this yourself. The notebook [CNN_Malware_Hyperparameter_Study.ipynb](https://github.com/lehasaS/Malware-Analysis-and-Detection/blob/master/CNN_Malware_Hyperparameter_Study.ipynb) is provided.
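A Weights & Biases sweep is driven by a configuration dictionary that names a search `method`, the `metric` to optimise, and the `parameters` to vary. The sketch below builds such a dictionary; the metric name, the `build_sweep_config` helper, and every hyperparameter range are hypothetical placeholders, not the values used in the notebook.

```python
def build_sweep_config(metric_name="val_accuracy", goal="maximize"):
    """Build a Weights & Biases sweep configuration dictionary.

    The metric name and all hyperparameter ranges are hypothetical
    placeholders, not the project's actual search space.
    """
    return {
        "method": "bayes",  # W&B also supports "grid" and "random"
        "metric": {"name": metric_name, "goal": goal},
        "parameters": {
            # Continuous range: sampled between min and max.
            "learning_rate": {"min": 1e-4, "max": 1e-2},
            # Discrete choices: one value picked per run.
            "batch_size": {"values": [16, 32, 64]},
            "epochs": {"values": [10, 20]},
            "dropout": {"values": [0.25, 0.5]},
        },
    }
```

With the `wandb` package installed and logged in, the dictionary could be registered with `sweep_id = wandb.sweep(build_sweep_config(), project="malware-cnn")` and run with `wandb.agent(sweep_id, function=train, count=20)`. Here the project name is a placeholder, and `train` is a hypothetical function that reads hyperparameters from `wandb.config` and reports the chosen metric with `wandb.log`. Bayesian search requires a metric to be defined, which is why the sketch always sets one.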