# 🔍 PDF Malware Analysis Toolkit
- **Author:** Adebamigbe Dorcas Adeyemi
- **Facilitator:** Mr Gbaminiyi Elijah
- **Date:** May 2026
- **Environment:** Python 3.13 | Termux on Android
## 📌 Project Overview
PDF is one of the document formats most commonly abused by attackers, typically for:
- Phishing campaigns
- JavaScript-based exploits
- Embedded malware delivery
- Social engineering attacks
This toolkit performs **static analysis** of PDF files: the PDF is never rendered in a viewer and none of its embedded code is executed. Instead, the tool inspects the file's internal structure for indicators of malicious content.
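
As a rough illustration of what static inspection means in practice, the sketch below counts a few risky PDF names in a file's raw bytes without rendering it. It is not how `pdfid.py` is implemented; the function name `count_keywords` and the short keyword list are assumptions made for this example, and names hidden inside compressed streams or written with hex escapes (for example `/J#61vaScript`) are not handled.

```python
import re

# Hypothetical keyword list for this sketch; pdfid.py tracks a longer set.
SUSPICIOUS_NAMES = (b"/JavaScript", b"/JS", b"/OpenAction", b"/Launch", b"/EmbeddedFile")


def count_keywords(data):
    """Count occurrences of each suspicious PDF name in raw bytes.

    The bytes are only searched, never rendered or executed.
    """
    counts = {}
    for name in SUSPICIOUS_NAMES:
        # Reject matches that continue with a name character, so "/JS"
        # is not counted inside a longer name such as "/JSomething".
        pattern = re.escape(name) + rb"(?![A-Za-z0-9#])"
        counts[name.decode("ascii")] = len(re.findall(pattern, data))
    return counts


if __name__ == "__main__":
    sample = (
        b"1 0 obj << /OpenAction 2 0 R >> endobj "
        b"2 0 obj << /S /JavaScript /JS (app.alert(1);) >> endobj"
    )
    for name, count in count_keywords(sample).items():
        print(f"{name:15} {count}")
```

To try it on a real file, read the file in binary mode (`open(path, "rb").read()`) and pass the bytes to `count_keywords`.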
## 🛠️ Tools Used
| Tool | Purpose |
|------|---------|
| `pdfid.py` | Keyword-based PDF scanning (detects `/JavaScript`, `/OpenAction`, `/Launch`) |
| `pdf-parser.py` | Object-level parsing, used to trace the attack chain from one object to the next |
| `strings` | Raw string extraction from the binary file, used to surface URLs and other IOCs |
| `pdf_analyze.py` | **Custom automated toolkit** that runs the full analysis in one command |
## 📁 Repository Structure

    pdf-malware-analysis-toolkit/
    │
    ├── pdf_analyze.py              # Main automated analysis script
    ├── pdfid.py                    # Didier Stevens PDF keyword scanner
    ├── pdf-parser.py               # Didier Stevens PDF deep parser
    │
    ├── samples/
    │   ├── test.pdf                # Sample 1: Basic JS + OpenAction
    │   ├── test2.pdf               # Sample 2: JS + Malicious URL
    │   ├── test3.pdf               # Sample 3: Launch action (malware.exe)
    │   ├── test4.pdf               # Sample 4: URL encoding obfuscation
    │   └── real1.pdf               # Sample 5: Real Maxis phishing PDF
    │
    ├── reports/
    │   ├── report_test.pdf.txt
    │   ├── report_test2.pdf.txt
    │   ├── report_test3.pdf.txt
    │   ├── report_test4.pdf.txt
    │   └── report_real1.pdf.txt
    │
    └── README.md

## ⚙️ Installation
### Requirements
- Python 3.x
- Termux (Android) or any Linux environment
### Step 1: Clone the repository

    git clone https://github.com/dorcasjames/pdf-malware-analysis-toolkit.git
    cd pdf-malware-analysis-toolkit

### Step 2: Install dependencies

    pip install pdfminer.six requests colorama

### Step 3: Download analysis tools

    curl -O https://raw.githubusercontent.com/DidierStevens/DidierStevensSuite/master/pdfid.py
    curl -O https://raw.githubusercontent.com/DidierStevens/DidierStevensSuite/master/pdf-parser.py

## 🚀 Usage
### Run full automated analysis

    python pdf_analyze.py <path-to-pdf>

**Example:**

    python pdf_analyze.py samples/real1.pdf

### Run keyword scan only

    python pdfid.py samples/test.pdf

### Run deep object parser

    python pdf-parser.py samples/test.pdf

### Search for JavaScript objects

    python pdf-parser.py --search javascript samples/test.pdf

### Extract IOCs manually

    strings samples/test.pdf | grep -Ei "http|ftp|eval|unescape|exe|powershell"

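
Where `strings` or `grep` is not available, the same idea can be approximated in Python. The sketch below is an assumption-laden stand-in, not part of `pdf_analyze.py`: the function name `extract_iocs` is hypothetical, and "printable" is simplified to ASCII bytes 0x20 to 0x7E in runs of at least four (the default minimum length in GNU `strings`). Like the shell pipeline, it only sees text that is stored uncompressed.

```python
import re

# Same alternation as the grep pattern above, matched case-insensitively.
IOC_PATTERN = re.compile(rb"http|ftp|eval|unescape|exe|powershell", re.IGNORECASE)
# Runs of 4 or more printable ASCII bytes, a simplified version of `strings`.
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{4,}")


def extract_iocs(data):
    """Return the printable strings in raw bytes that match the IOC pattern."""
    hits = []
    for match in PRINTABLE_RUN.finditer(data):
        text = match.group()
        if IOC_PATTERN.search(text):
            hits.append(text.decode("ascii"))
    return hits


if __name__ == "__main__":
    # In-memory stand-in for file contents; real use would read a PDF in "rb" mode.
    blob = b"\x00\x01/URI (http://malware-c2.ru/payload.exe)\x00\xffab\x00plain words\x00"
    for hit in extract_iocs(blob):
        print("-", hit)
```

Because the pattern is a loose substring match, it produces false positives just as the `grep` version does, so each hit still needs manual review.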
## 📊 Sample Output

    ==================================================
    ANALYZING: samples/real1.pdf
    Date: 2026-05-21 12:35:59
    ==================================================
    [PDFID SCAN]
    PDF Header: %PDF-1.6
    obj 94
    stream 39
    /Page 3
    [IOCs FOUND]
    - /URI (https://www.maxis.com.my/en/payment/)
    - /URI (https://care.maxis.com.my/)
    - >ftp:Z:0d
    [RISK SCORE]: 125/100 — HIGH
    [REASONS]:
    - JavaScript detected
    - OpenAction detected
    - Embedded file found
    - Launch action found
    - 3 IOCs found
    [REPORT SAVED]: report_real1.pdf.txt

## 🔴 IOCs Detected Across All Samples
| Sample | IOC Type | Value | Severity |
|--------|----------|-------|----------|
| test.pdf | JavaScript | `app.alert("Hello");` | Medium |
| test2.pdf | Malicious URL | `http://malware-c2.ru/payload.exe` | High |
| test3.pdf | Executable | `malware.exe via /Launch` | High |
| test4.pdf | Obfuscated URL | `http://evil.com (encoded)` | High |
| real1.pdf | Phishing URL | `maxis.com.my/en/payment/` | High |
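
The `test4.pdf` row depends on decoding an obfuscated URL before it can be matched. How `pdf_analyze.py` does this is not shown here, so the following is only a sketch under the assumption that the obfuscation is standard percent-encoding (`%XX`); the function name `find_urls` and the URL pattern are hypothetical.

```python
import re
from urllib.parse import unquote

# Simple URL matcher for this sketch; stops at whitespace, brackets, and quotes.
URL_PATTERN = re.compile(r"(?:https?|ftp)://[^\s()<>\"']+", re.IGNORECASE)


def find_urls(text, decode=True):
    """Find URLs in text, optionally decoding %XX escapes first."""
    candidate = unquote(text) if decode else text
    return URL_PATTERN.findall(candidate)


if __name__ == "__main__":
    encoded = "/URI (%68%74%74%70%3A%2F%2F%65%76%69%6C%2E%63%6F%6D)"
    print(find_urls(encoded, decode=False))  # nothing matches the raw form
    print(find_urls(encoded))                # the decoded form exposes the URL
```

Running the matcher on both the raw and the decoded text is a cheap way to catch URLs that a plain `grep` for `http` would miss.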
## 📈 Risk Scoring System
| Score | Level | Meaning |
|-------|-------|---------|
| 0–24 | 🟢 LOW | Minimal threat indicators |
| 25–49 | 🟡 MEDIUM | Some suspicious elements |
| 50+ | 🔴 HIGH | Multiple active threats detected |
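**Scoring breakdown** (points are additive; the five weights sum to 125, so a file that triggers every check scores above 100, as in the sample output):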
- `/JavaScript` detected → +30 points
- `/OpenAction` detected → +25 points
- `/EmbeddedFile` detected → +20 points
- `/Launch` action found → +30 points
- IOCs found → +20 points
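
The weights and bands above can be expressed as a small lookup plus a sum. This is a minimal sketch rather than the code in `pdf_analyze.py`: the dictionary keys and function names are hypothetical, and it assumes the IOC bonus is a flat +20 regardless of how many IOCs were found, which is consistent with the 125 total in the sample output.

```python
# Weights copied from the scoring breakdown above; key names are illustrative.
WEIGHTS = {
    "javascript": 30,
    "open_action": 25,
    "embedded_file": 20,
    "launch": 30,
    "iocs": 20,
}


def risk_score(findings):
    """Sum the weight of every indicator whose key is present in `findings`."""
    return sum(points for key, points in WEIGHTS.items() if key in findings)


def risk_level(score):
    """Map a score onto the LOW / MEDIUM / HIGH bands from the table above."""
    if score >= 50:
        return "HIGH"
    if score >= 25:
        return "MEDIUM"
    return "LOW"


if __name__ == "__main__":
    everything = set(WEIGHTS)
    score = risk_score(everything)
    print(score, risk_level(score))  # prints "125 HIGH"
```

Capping the result with `min(score, 100)` would keep the score within the `/100` scale used in the report, if that is the intended range.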
## 🛡️ Mitigation Recommendations
1. **Disable JavaScript** in PDF reader settings
2. **Never open PDFs** from unknown or untrusted senders
3. **Upload suspicious PDFs** to VirusTotal before opening
4. **Block /Launch and /EmbeddedFile** via email gateway policies
5. **Use sandbox analysis** (Any.run, Cuckoo) for behavioral analysis
6. **Run pdfid.py** on all received PDF attachments
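
Recommendation 6 can be automated with a short wrapper. The sketch below assumes `pdfid.py` sits in the current directory and is invoked exactly as in the Usage section (`python pdfid.py <file>`); the helper names `pdfid_commands` and `scan_folder` are hypothetical, as is any folder name passed to them.

```python
import subprocess
import sys
from pathlib import Path


def pdfid_commands(folder, pdfid_path="pdfid.py"):
    """Build one `python pdfid.py <file>` command per PDF found in `folder`."""
    pdfs = sorted(p for p in Path(folder).iterdir() if p.suffix.lower() == ".pdf")
    return [[sys.executable, pdfid_path, str(pdf)] for pdf in pdfs]


def scan_folder(folder, pdfid_path="pdfid.py"):
    """Run each command and return a dict of {pdf path: pdfid text output}."""
    results = {}
    for cmd in pdfid_commands(folder, pdfid_path):
        # check=False: one unreadable file should not abort the whole batch.
        done = subprocess.run(cmd, capture_output=True, text=True, check=False)
        results[cmd[-1]] = done.stdout
    return results
```

For example, `scan_folder("attachments")` would scan every `.pdf` in a folder named `attachments` and leave the output for review, without opening any of the files in a viewer.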
## ⚠️ Disclaimer
This toolkit is developed strictly for **educational and ethical security research purposes**.
All malware samples used are analyzed in a **controlled environment**.
Real malware samples were obtained from **MalwareBazaar** for academic analysis only.
**Never use these tools on systems you do not own or have permission to test.**
## 📚 References
- [Didier Stevens PDF Tools](https://blog.didierstevens.com/programs/pdf-tools/)
- [MalwareBazaar](https://bazaar.abuse.ch)
- [VirusTotal](https://www.virustotal.com)
- [PDF Specification](https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf)
*PDF Malware Analysis Toolkit — Adebamigbe Dorcas Adeyemi | May 2026*