dorcasjames/pdf-malware-analysis-toolkit

# 🔍 PDF Malware Analysis Toolkit

- **Author:** Adebamigbe Dorcas Adeyemi
- **Facilitator:** Mr Gbaminiyi Elijah
- **Date:** May 2026
- **Environment:** Python 3.13 | Termux on Android

## 📌 Project Overview

PDF is one of the document formats most commonly abused by attackers, typically for:

- Phishing campaigns
- JavaScript-based exploits
- Embedded malware delivery
- Social engineering attacks

This toolkit performs **static analysis** of PDF files: the PDF is never rendered in a viewer and none of its embedded code is executed. Instead, the tool reads the file's internal structure and flags indicators of malicious content.

## 🛠️ Tools Used

| Tool | Purpose |
|------|---------|
| `pdfid.py` | Keyword-based PDF scanning (detects /JavaScript, /OpenAction, /Launch) |
| `pdf-parser.py` | Deep object-level parsing; traces the full attack chain |
| `strings` | Raw binary string extraction; finds hidden URLs and IOCs |
| `pdf_analyze.py` | **Custom automated toolkit**; runs the full analysis in one command |

## 📁 Repository Structure

    pdf-malware-analysis-toolkit/
    │
    ├── pdf_analyze.py          # Main automated analysis script
    ├── pdfid.py                # Didier Stevens PDF keyword scanner
    ├── pdf-parser.py           # Didier Stevens PDF deep parser
    │
    ├── samples/
    │   ├── test.pdf            # Sample 1: Basic JS + OpenAction
    │   ├── test2.pdf           # Sample 2: JS + Malicious URL
    │   ├── test3.pdf           # Sample 3: Launch action (malware.exe)
    │   ├── test4.pdf           # Sample 4: URL encoding obfuscation
    │   └── real1.pdf           # Sample 5: Real Maxis phishing PDF
    │
    ├── reports/
    │   ├── report_test.pdf.txt
    │   ├── report_test2.pdf.txt
    │   ├── report_test3.pdf.txt
    │   ├── report_test4.pdf.txt
    │   └── report_real1.pdf.txt
    │
    └── README.md

## ⚙️ Installation

### Requirements

- Python 3.x
- Termux (Android) or any Linux environment

### Step 1: Clone the repository

    git clone https://github.com/dorcasjames/pdf-malware-analysis-toolkit.git
    cd pdf-malware-analysis-toolkit

### Step 2: Install dependencies

    pip install pdfminer.six requests colorama

### Step 3: Download analysis tools

    curl -O https://raw.githubusercontent.com/DidierStevens/DidierStevensSuite/master/pdfid.py
    curl -O https://raw.githubusercontent.com/DidierStevens/DidierStevensSuite/master/pdf-parser.py

## 🚀 Usage

### Run full automated analysis

Pass the path of the PDF to analyze as the argument:

    python pdf_analyze.py path/to/file.pdf

**Example:**

    python pdf_analyze.py samples/real1.pdf

### Run keyword scan only

    python pdfid.py samples/test.pdf

### Run deep object parser

    python pdf-parser.py samples/test.pdf

### Search for JavaScript objects

    python pdf-parser.py --search javascript samples/test.pdf

### Extract IOCs manually

    strings samples/test.pdf | grep -Ei "http|ftp|eval|unescape|exe|powershell"

## 📊 Sample Output

    ==================================================
    ANALYZING: samples/real1.pdf
    Date: 2026-05-21 12:35:59
    ==================================================
    [PDFID SCAN]
    PDF Header: %PDF-1.6
    obj 94
    stream 39
    /Page 3
    [IOCs FOUND]
    - /URI (https://www.maxis.com.my/en/payment/)
    - /URI (https://care.maxis.com.my/)
    - >ftp:Z:0d
    [RISK SCORE]: 125/100 — HIGH
    [REASONS]:
    - JavaScript detected
    - OpenAction detected
    - Embedded file found
    - Launch action found
    - 3 IOCs found
    [REPORT SAVED]: report_real1.pdf.txt

## 🔴 IOCs Detected Across All Samples

| Sample | IOC Type | Value | Severity |
|--------|----------|-------|----------|
| test.pdf | JavaScript | `app.alert("Hello");` | Medium |
| test2.pdf | Malicious URL | `http://malware-c2.ru/payload.exe` | High |
| test3.pdf | Executable | `malware.exe via /Launch` | High |
| test4.pdf | Obfuscated URL | `http://evil.com (encoded)` | High |
| real1.pdf | Phishing URL | `maxis.com.my/en/payment/` | High |

## 📈 Risk Scoring System

| Score | Level | Meaning |
|-------|-------|---------|
| 0–24 | 🟢 LOW | Minimal threat indicators |
| 25–49 | 🟡 MEDIUM | Some suspicious elements |
| 50+ | 🔴 HIGH | Multiple active threats detected |

**Scoring breakdown:**

- `/JavaScript` detected → +30 points
- `/OpenAction` detected → +25 points
- `/EmbeddedFile` detected → +20 points
- `/Launch` action found → +30 points
- IOCs found → +20 points

## 🛡️ Mitigation Recommendations

1. **Disable JavaScript** in PDF reader settings
2. **Never open PDFs** from unknown or untrusted senders
3. **Upload suspicious PDFs** to VirusTotal before opening
4. **Block /Launch and /EmbeddedFile** via email gateway policies
5. **Use sandbox analysis** (Any.run, Cuckoo) for behavioral analysis
6. **Run pdfid.py** on all received PDF attachments

## ⚠️ Disclaimer

This toolkit was developed strictly for **educational and ethical security research purposes**. All malware samples are analyzed in a **controlled environment**. Real malware samples were obtained from **MalwareBazaar** for academic analysis only.

**Never use these tools on systems you do not own or lack permission to test.**

## 📚 References

- [Didier Stevens PDF Tools](https://blog.didierstevens.com/programs/pdf-tools/)
- [MalwareBazaar](https://bazaar.abuse.ch)
- [VirusTotal](https://www.virustotal.com)
- [PDF Specification](https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf)

*PDF Malware Analysis Toolkit | Adebamigbe Dorcas Adeyemi | May 2026*
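
As a rough illustration of the Risk Scoring System section, the sketch below adds up the listed weights and maps the total to a level. It is not taken from `pdf_analyze.py`: the names `WEIGHTS`, `IOC_POINTS`, `risk_level`, and `score_pdf` are hypothetical, and it assumes the IOC bonus is added once regardless of how many IOCs were found, which is consistent with the sample output (all four keywords plus three IOCs give 125).

```python
# Weights copied from the "Scoring breakdown" list; all names here are illustrative.
WEIGHTS = {
    "javascript": ("JavaScript detected", 30),
    "open_action": ("OpenAction detected", 25),
    "embedded_file": ("Embedded file found", 20),
    "launch": ("Launch action found", 30),
}
IOC_POINTS = 20  # assumption: awarded once if at least one IOC is present


def risk_level(score):
    """Map a numeric score to the level names used in the scoring table."""
    if score >= 50:
        return "HIGH"
    if score >= 25:
        return "MEDIUM"
    return "LOW"


def score_pdf(findings, ioc_count):
    """Return (score, level, reasons) for a set of detected keyword keys."""
    score = 0
    reasons = []
    for key, (reason, points) in WEIGHTS.items():
        if key in findings:
            score += points
            reasons.append(reason)
    if ioc_count > 0:
        score += IOC_POINTS
        reasons.append(f"{ioc_count} IOCs found")
    return score, risk_level(score), reasons


if __name__ == "__main__":
    # Mirrors the sample output: every keyword present and three IOCs.
    print(score_pdf({"javascript", "open_action", "embedded_file", "launch"}, 3))
```

Because the five weights sum to 125, a file that triggers every indicator exceeds 100 unless the total is capped, for example with `min(score, 100)`.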
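
The manual `strings | grep` step under "Extract IOCs manually" can also be approximated in pure Python, which may be convenient where `strings` is not installed. This is a minimal sketch under stated assumptions, not the logic of `pdf_analyze.py`: `extract_iocs` and `extract_iocs_from_file` are hypothetical names, and "printable run of at least four ASCII bytes" only roughly mirrors the default behavior of `strings`.

```python
import re

# Same alternation as the grep pattern in "Extract IOCs manually", case-insensitive.
IOC_PATTERN = re.compile(rb"http|ftp|eval|unescape|exe|powershell", re.IGNORECASE)
# Runs of 4 or more printable ASCII bytes, a rough stand-in for `strings`.
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{4,}")


def extract_iocs(data):
    """Return the printable strings in `data` (bytes) that match IOC_PATTERN."""
    hits = []
    for match in PRINTABLE_RUN.finditer(data):
        text = match.group()
        if IOC_PATTERN.search(text):
            hits.append(text.decode("ascii"))
    return hits


def extract_iocs_from_file(path):
    """Read a file as raw bytes (never parsing or rendering it) and scan it."""
    with open(path, "rb") as handle:
        return extract_iocs(handle.read())


if __name__ == "__main__":
    import sys

    for hit in extract_iocs_from_file(sys.argv[1]):
        print("-", hit)
```

Like the grep pattern it copies, this matches short substrings such as `ftp` or `exe` anywhere in a string, so hits like the `>ftp:Z:0d` entry in the sample output are noise that needs manual review.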