# 🛡️ MalDetect — Android Malware Detection





## 📸 What It Looks Like
The app runs in the browser with a dark-themed dashboard. You can browse the dataset, upload a `.edgelist` graph file, or upload a real `.apk` file for live analysis.
## 🧠 How It Works
Android apps are converted into **call graphs** — a map of how functions inside the app call each other. Malware tends to have different graph patterns than safe apps.
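As a rough sketch of that conversion, the snippet below uses Androguard's `AnalyzeAPK` and `Analysis.get_call_graph()` to obtain a call graph and writes it out as integer-labelled edge pairs. The helper names `edges_to_edgelist` and `apk_to_edgelist`, the output format (one `src dst` pair per line), and the integer relabelling are assumptions made for illustration; the repository's `apk_to_graph.py` may work differently.

```python
from typing import Hashable, Iterable, List, Tuple


def edges_to_edgelist(edges: Iterable[Tuple[Hashable, Hashable]]) -> List[str]:
    """Map arbitrary node objects to integer ids and return 'src dst' lines."""
    ids = {}
    lines = []
    for src, dst in edges:
        # setdefault hands out the next free id the first time a node is seen.
        s = ids.setdefault(src, len(ids))
        d = ids.setdefault(dst, len(ids))
        lines.append(f"{s} {d}")
    return lines


def apk_to_edgelist(apk_path: str, out_path: str) -> int:
    """Extract an APK's call graph and save it as an edge list (hypothetical helper)."""
    # Imported lazily so edges_to_edgelist works without Androguard installed.
    from androguard.misc import AnalyzeAPK

    _apk, _dex_files, analysis = AnalyzeAPK(apk_path)
    call_graph = analysis.get_call_graph()  # a networkx directed graph of methods
    lines = edges_to_edgelist(call_graph.edges())
    with open(out_path, "w", encoding="utf-8") as fh:
        fh.write("\n".join(lines) + "\n")
    return len(lines)  # number of call edges written
```

Calling `apk_to_edgelist("sample.apk", "sample.edgelist")` (both paths are placeholders) would then produce a file that can be fed to the classifiers described next.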
MalDetect classifies these graphs with two different models, plus an ensemble that combines them:
| Model | Approach |
|---|---|
| **GIN** (Graph Isomorphism Network) | Deep learning — learns structural patterns directly from the graph |
| **Random Forest** | Traditional ML — uses hand-crafted graph features (nodes, edges, degree, density) |
| **Ensemble** | Combines both by averaging their probability outputs for a stronger verdict |
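
The hand-crafted features listed for the Random Forest can be illustrated with a small, dependency-free sketch. It assumes the `.edgelist` file holds one whitespace-separated `src dst` pair per line with optional `#` comment lines, treats the graph as directed, and uses a hypothetical feature set (node count, edge count, average degree, maximum degree, density). The actual features in `src/feature_extractor.py` may differ.

```python
from collections import Counter
from typing import Dict, Iterable


def graph_features(lines: Iterable[str]) -> Dict[str, float]:
    """Compute simple structural features from edge-list lines."""
    edges = set()
    degree = Counter()  # in-degree + out-degree per node
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comment/header lines
        src, dst = line.split()[:2]
        if (src, dst) in edges:
            continue  # ignore duplicate edges
        edges.add((src, dst))
        degree[src] += 1
        degree[dst] += 1
    n, m = len(degree), len(edges)
    return {
        "num_nodes": n,
        "num_edges": m,
        "avg_degree": 2 * m / n if n else 0.0,
        "max_degree": max(degree.values(), default=0),
        # Directed density: edges divided by the number of possible ordered pairs.
        "density": m / (n * (n - 1)) if n > 1 else 0.0,
    }


sample = ["# toy call graph", "0 1", "1 2", "2 0"]
print(graph_features(sample))
```

A vector like this, one per app, is the kind of input a Random Forest can consume after scaling.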
### 5 Classes Detected
`Addisplay` · `Adware` · `Benign` · `Downloader` · `Trojan`
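
Averaging the two models' probability outputs, as the Ensemble row describes, can be sketched as follows. The class order, the function name `ensemble_predict`, and the example probabilities are made up for illustration, and equal weighting of both models is an assumption based on the word "averaging".

```python
from typing import List, Sequence, Tuple

# Assumed class order; the trained models may use a different index mapping.
CLASSES = ["Addisplay", "Adware", "Benign", "Downloader", "Trojan"]


def ensemble_predict(
    gin_probs: Sequence[float], rf_probs: Sequence[float]
) -> Tuple[str, List[float]]:
    """Average per-class probabilities from both models and pick the top class."""
    if not (len(gin_probs) == len(rf_probs) == len(CLASSES)):
        raise ValueError("expected one probability per class from each model")
    avg = [(g + r) / 2 for g, r in zip(gin_probs, rf_probs)]
    best = max(range(len(avg)), key=avg.__getitem__)
    return CLASSES[best], avg


# Illustrative numbers only, not real model outputs.
gin = [0.10, 0.20, 0.15, 0.05, 0.50]
rf = [0.05, 0.45, 0.20, 0.10, 0.20]
label, probs = ensemble_predict(gin, rf)
print(label, probs)
```

In this toy case the Random Forest alone would favour `Adware`, but the averaged scores favour `Trojan`, which shows how the ensemble can overrule a single model.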
## ✨ Features
- **Browse Dataset** — pick any graph file from the test/train split and analyse it instantly
- **Upload .edgelist** — drag and drop your own call graph file
- **Upload APK** — upload a real Android APK; the app extracts the call graph using Androguard and runs the full ensemble analysis
- **Side-by-side comparison** — GIN and Random Forest results shown together with probability charts
- **Model Evaluation** — compute accuracy, precision, recall, and F1-score on the full dataset
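
For reference, the four metrics named under Model Evaluation can be computed from true and predicted labels as shown below. This is a plain-Python sketch using macro averaging (an unweighted mean over classes), which is an assumption; the app may rely on scikit-learn and a different averaging mode.

```python
from typing import Dict, Sequence


def evaluate(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """Return accuracy plus macro-averaged precision, recall and F1."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    pairs = list(zip(y_true, y_pred))
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    k = len(labels)
    return {
        "accuracy": sum(t == p for t, p in pairs) / len(pairs),
        "precision": sum(precisions) / k,
        "recall": sum(recalls) / k,
        "f1": sum(f1s) / k,
    }


# Toy labels for illustration only.
print(evaluate(["Benign", "Benign", "Trojan", "Trojan"],
               ["Benign", "Trojan", "Trojan", "Trojan"]))
```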
## 🗂️ Project Structure

    malware-detection/
    ├── app.py                      # Main Streamlit app
    ├── main.py                     # Training entry point
    ├── rf_model.py                 # Random Forest training
    ├── apk_to_graph.py             # APK → call graph (Androguard)
    ├── requirements.txt            # Dependencies
    ├── models/
    │   ├── best_gin_model.pth      # Trained GIN model
    │   ├── random_forest.pkl       # Trained RF model
    │   └── rf_scaler.pkl           # Feature scaler
    ├── src/
    │   ├── gin_model.py            # GIN architecture
    │   ├── data_loader.py          # Dataset loading
    │   ├── feature_extractor.py    # Graph feature extraction
    │   └── train.py                # GIN training loop
    ├── dataset/                    # MalNet-Tiny graph dataset
    ├── split_info/                 # Train/test file lists
    └── graphs/                     # Processed graph files

## 🚀 Run Locally
**1. Clone the repo**

    git clone https://github.com/aparnna-c/malware-detection.git
    cd malware-detection

**2. Create a virtual environment**

    python -m venv venv
    source venv/bin/activate

**3. Install dependencies**

    pip install -r requirements.txt

**4. Run the app**

    streamlit run app.py

Open `http://localhost:8501` in your browser.
## 🛠️ Tech Stack
- **Python 3.10**
- **Streamlit** — web app framework
- **PyTorch + PyTorch Geometric** — GIN model
- **scikit-learn** — Random Forest classifier
- **Androguard** — APK static analysis and call graph extraction
- **Plotly** — interactive probability charts
- **MalNet-Tiny** — dataset of Android app call graphs
## 📊 Dataset
This project uses the [MalNet-Tiny](https://mal-net.org/) dataset — a collection of Android application call graphs labelled across multiple malware families.
## 👩‍💻 Author
**Aparnna C**
MCA Student — Government Engineering College, Thrissur
[LinkedIn](https://linkedin.com/in/aparnna-c) · [GitHub](https://github.com/aparnna-c)
## 📄 License
This project is licensed under the MIT License.