aparnna-c/malware-detection

# 🛡️ MalDetect: Android Malware Detection

![Python](https://img.shields.io/badge/Python-3.10-blue?style=flat-square&logo=python&logoColor=white) ![Streamlit](https://img.shields.io/badge/Streamlit-App-FF4B4B?style=flat-square&logo=streamlit&logoColor=white) ![PyTorch](https://img.shields.io/badge/PyTorch-GNN-EE4C2C?style=flat-square&logo=pytorch&logoColor=white) ![scikit-learn](https://img.shields.io/badge/scikit--learn-Random%20Forest-F7931E?style=flat-square&logo=scikit-learn&logoColor=white) ![License](https://img.shields.io/badge/License-MIT-green?style=flat-square)

## 📸 What It Looks Like

The app runs in the browser with a dark-themed dashboard. You can browse the dataset, upload a `.edgelist` graph file, or upload a real `.apk` file for live analysis.

## 🧠 How It Works

Android apps are converted into **call graphs**: maps of how the functions inside an app call each other. Malware tends to show different graph patterns than safe apps. MalDetect classifies these graphs with two models, plus an ensemble that combines them:

| Model | Approach |
|---|---|
| **GIN** (Graph Isomorphism Network) | Deep learning: learns structural patterns directly from the graph |
| **Random Forest** | Traditional ML: uses hand-crafted graph features (nodes, edges, degree, density) |
| **Ensemble** | Combines both by averaging their probability outputs for a stronger verdict |

### 5 Classes Detected

`Addisplay` · `Adware` · `Benign` · `Downloader` · `Trojan`

## ✨ Features

- **Browse Dataset**: pick any graph file from the test/train split and analyse it instantly
- **Upload .edgelist**: drag and drop your own call graph file
- **Upload APK**: upload a real Android APK; the app extracts the call graph using Androguard and runs the full ensemble analysis
- **Side-by-side comparison**: GIN and Random Forest results shown together with probability charts
- **Model Evaluation**: run accuracy, precision, recall, and F1-score on the full dataset

## 🗂️ Project Structure

```
malware-detection/
├── app.py                    # Main Streamlit app
├── main.py                   # Training entry point
├── rf_model.py               # Random Forest training
├── apk_to_graph.py           # APK → call graph (Androguard)
├── requirements.txt          # Dependencies
├── models/
│   ├── best_gin_model.pth    # Trained GIN model
│   ├── random_forest.pkl     # Trained RF model
│   └── rf_scaler.pkl         # Feature scaler
├── src/
│   ├── gin_model.py          # GIN architecture
│   ├── data_loader.py        # Dataset loading
│   ├── feature_extractor.py  # Graph feature extraction
│   └── train.py              # GIN training loop
├── dataset/                  # MalNet-Tiny graph dataset
├── split_info/               # Train/test file lists
└── graphs/                   # Processed graph files
```

## 🚀 Run Locally

**1. Clone the repo**

```bash
git clone https://github.com/aparnna-c/malware-detection.git
cd malware-detection
```

**2. Create a virtual environment**

```bash
python -m venv venv
source venv/bin/activate
```

**3. Install dependencies**

```bash
pip install -r requirements.txt
```

**4. Run the app**

```bash
streamlit run app.py
```

Open `http://localhost:8501` in your browser.

## 🛠️ Tech Stack

- **Python 3.10**
- **Streamlit**: web app framework
- **PyTorch + PyTorch Geometric**: GIN model
- **scikit-learn**: Random Forest classifier
- **Androguard**: APK static analysis and call graph extraction
- **Plotly**: interactive probability charts
- **MalNet-Tiny**: dataset of Android app call graphs

## 📊 Dataset

This project uses the [MalNet-Tiny](https://mal-net.org/) dataset, a collection of Android application call graphs labelled across multiple malware families.

## 👩‍💻 Author

**Aparnna C**
MCA Student, Government Engineering College, Thrissur

[LinkedIn](https://linkedin.com/in/aparnna-c) · [GitHub](https://github.com/aparnna-c)

## 📄 License

This project is licensed under the MIT License.
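
## 🧮 Sketch: Ensemble Averaging

The Ensemble row under How It Works says the two models are combined by averaging their probability outputs. The snippet below is a minimal sketch of that idea in plain Python, not the code from `app.py`. The function name `ensemble_average`, the class ordering in `CLASSES`, and the example probability vectors are all assumptions made for illustration; the real label order depends on how the models were trained.

```python
# Assumed class order (taken from the README listing; the trained models
# may index their labels differently).
CLASSES = ["Addisplay", "Adware", "Benign", "Downloader", "Trojan"]


def ensemble_average(gin_probs, rf_probs):
    """Average two per-class probability vectors.

    Returns the predicted label and the averaged probabilities.
    """
    if len(gin_probs) != len(CLASSES) or len(rf_probs) != len(CLASSES):
        raise ValueError("expected one probability per class")
    # Element-wise mean of the two probability vectors.
    avg = [(g + r) / 2 for g, r in zip(gin_probs, rf_probs)]
    # Index of the highest averaged probability.
    best = max(range(len(avg)), key=avg.__getitem__)
    return CLASSES[best], avg


# Hypothetical outputs: GIN leans towards Trojan, Random Forest towards Benign.
gin_probs = [0.05, 0.10, 0.20, 0.05, 0.60]
rf_probs = [0.05, 0.15, 0.50, 0.10, 0.20]

label, avg = ensemble_average(gin_probs, rf_probs)
print(label, [round(p, 3) for p in avg])
```

With these made-up inputs the two models disagree, and the averaged vector settles the tie in favour of the class with the larger combined probability. Because each input sums to 1, the average does too, so it can be plotted like any other probability chart.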
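
## 📐 Sketch: Hand-Crafted Graph Features

The Random Forest row under How It Works mentions hand-crafted graph features (nodes, edges, degree, density). The snippet below sketches how such features could be computed from `.edgelist` text using only the standard library. It is an illustration, not the contents of `src/feature_extractor.py`: the function name `graph_features`, the whitespace-separated `source target` line format, the treatment of the graph as directed, and the sample graph are all assumptions.

```python
def graph_features(edgelist_text):
    """Compute node count, edge count, mean degree, and density.

    Assumes one 'source target' pair per line and a directed graph.
    Blank lines and lines starting with '#' are skipped.
    """
    nodes, edges = set(), set()
    for line in edgelist_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        if len(parts) < 2:
            continue  # ignore malformed lines
        u, v = parts[0], parts[1]
        nodes.update((u, v))
        edges.add((u, v))  # a set drops duplicate edges

    n, m = len(nodes), len(edges)
    # Each edge adds one out-degree and one in-degree, so total degree is 2m.
    mean_degree = 2 * m / n if n else 0.0
    # Directed density: edges present over the n * (n - 1) possible edges.
    density = m / (n * (n - 1)) if n > 1 else 0.0
    return {"nodes": n, "edges": m, "mean_degree": mean_degree, "density": density}


# Hypothetical four-function call graph.
sample = """# hypothetical call graph
0 1
0 2
1 2
2 3
"""

features = graph_features(sample)
print(features)
```

A feature dictionary like this could be turned into a fixed-length vector, scaled, and passed to a classifier; the exact feature set and scaling used by this project are not specified here.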