# Unpacker
Packer detection and unpacking workflow for malware analysts: detect UPX, ASPack, Themida, VMProtect, and related packing patterns so static analysis can reach real code and strings.
## CTI Use
Use this before strings, imports, YARA review, or deeper reverse engineering when a sample appears packed. The output supports malware-family triage, detection engineering, and analyst notes, but it must be validated before claims are made.
## Defender Outputs
| Output | Use |
|---|---|
| Packer detection | Triage and analyst routing |
| Unpacked sample | Follow-on static analysis |
| Entropy/validation notes | Confidence review |
| Multi-layer workflow | Packed malware handling |
| Integration path | Feed String Analyzer, PE Import Analyzer, AIDebug |
**Modular malware packer detection and unpacking (UPX, ASPack, Themida, VMProtect). PE and ELF. One command: detect → unpack → validate.**
## What it does
Packed malware hides real code behind compression or encryption. Unpacker:
1. **Detects** the packer using section names, entropy, heuristics, and optional path/content hints (PE and ELF).
2. **Dispatches** to the matching unpacker (UPX native; ASPack/Themida/VMProtect via [Unipacker](https://github.com/unipacker/unipacker) for 32-bit, or [Qiling](https://github.com/qilingframework/qiling) for 64-bit VMProtect).
3. **Outputs** an unpacked file you can analyze or validate with tools like [String Analyzer](https://github.com/anpa1200/String-Analyzer) and [Basic File Information Gathering Script](https://github.com/anpa1200/Basic-File-Information-Gathering-Script).
One command, one pipeline; supports **multi-layer** unpacking (e.g. several VMProtect layers).
## Features
| Feature | Description |
|--------|-------------|
| **Multi-method detection** | Section names (UPX0/UPX1, .aspack, .vmp0, Themida, …), entropy, heuristics; PE + ELF. |
| **Pluggable unpackers** | UPX (native), ASPack/Themida/VMProtect (Unipacker for PE32; Qiling for PE32+ VMProtect), MPRESS/generic (stub). |
| **Path/content hints** | Samples in `.../vmprotect/` or `.../themida/` get the right unpacker even without section match. |
| **Multi-layer** | Re-detect and unpack up to N layers (configurable). |
| **Validation-friendly** | Output is static dumps; prove unpack with entropy/size/strings (see [Real-life example](#real-life-example-with-proof) below). |
## Repository
- **GitHub:** [https://github.com/anpa1200/Unpacker](https://github.com/anpa1200/Unpacker)
- **Clone:**

      git clone https://github.com/anpa1200/Unpacker.git
      cd Unpacker
## Install
**Requirements:** Python 3.10+, optional system UPX and Unipacker for full unpacker coverage.

    cd Unpacker
    pip install -e .
    # Or: pip install -r requirements.txt

- **UPX (for UPX unpacking):** install system UPX, e.g. `apt install upx-ucl` or [upx.github.io](https://upx.github.io/).
- **ASPack / Themida / VMProtect (32-bit):** `pip install unipacker`. On Python 3.12+ you may need `pip install 'setuptools<70'` for `pkg_resources`.
- **VMProtect (64-bit):** `pip install qiling` and set **QILING_ROOTFS** to a directory containing a Windows x64 rootfs (e.g. `x8664_windows` with DLLs). See [Qiling rootfs](https://github.com/qilingframework/rootfs). Optional: `pip install -e ".[emulation]"` to pull in Qiling.
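Before a first run it can help to check which optional backends are actually available. The following is a minimal sketch, not part of the repository: the function name `check_optional_backends` is hypothetical, and it only assumes what the list above states (a system `upx` binary, the `unipacker` and `qiling` Python packages, and a `QILING_ROOTFS` directory).

```python
import importlib.util
import os
import shutil


def check_optional_backends(env=None):
    """Report which optional unpacking backends look usable on this machine.

    Hypothetical helper for illustration; the repository may check this differently.
    """
    env = os.environ if env is None else env
    rootfs = env.get("QILING_ROOTFS", "")
    return {
        # UPX unpacking shells out to the system `upx` binary.
        "upx": shutil.which("upx") is not None,
        # 32-bit ASPack/Themida/VMProtect go through Unipacker.
        "unipacker": importlib.util.find_spec("unipacker") is not None,
        # 64-bit VMProtect needs Qiling plus a rootfs directory.
        "qiling": importlib.util.find_spec("qiling") is not None,
        "qiling_rootfs": bool(rootfs) and os.path.isdir(rootfs),
    }


if __name__ == "__main__":
    for name, ok in check_optional_backends().items():
        print(f"{name:14s} {'ok' if ok else 'missing'}")
```

A missing entry is not an error by itself: it only means the corresponding packer family cannot be unpacked on this machine.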
## Usage
    # Unpack one sample (output under ./unpacked by default)
    python scripts/run_unpacker.py /path/to/sample.exe -o ./unpacked
    # With timeout (recommended for Themida/VMProtect)
    python scripts/run_unpacker.py /path/to/sample.exe -o ./unpacked --timeout 180
    # After pip install -e . you can use:
    unpacker /path/to/sample.exe -o ./unpacked

**Options:** `--max-layers` (maximum number of layers to unpack; default 5), `--confidence`, `--timeout`.
**Example output:**

    Detected: aspack (confidence=0.9, method=sections)
    Layer 1: packer=aspack -> ok
    Final output: /path/to/unpacked/aspack/NotePad_aspack.unpacked.aspack.exe
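If the CLI is driven from another script, the three line types in the example output can be parsed into a small record. This is a sketch that assumes the output keeps exactly the format shown above; the function name `parse_unpacker_output` and the regular expressions are illustrative, not part of the tool.

```python
import re

# Patterns mirror the example output lines; the exact format is an assumption.
DETECTED_RE = re.compile(
    r"^Detected: (?P<packer>\S+) "
    r"\(confidence=(?P<confidence>\d+(?:\.\d+)?), method=(?P<method>\w+)\)$"
)
LAYER_RE = re.compile(r"^Layer (?P<n>\d+): packer=(?P<packer>\S+) -> (?P<status>.+)$")
FINAL_RE = re.compile(r"^Final output: (?P<path>.+)$")


def parse_unpacker_output(text):
    """Turn the CLI's text output into a dict (hypothetical helper)."""
    result = {"detected": None, "layers": [], "final_output": None}
    for line in text.splitlines():
        line = line.strip()
        if m := DETECTED_RE.match(line):
            result["detected"] = {
                "packer": m["packer"],
                "confidence": float(m["confidence"]),
                "method": m["method"],
            }
        elif m := LAYER_RE.match(line):
            result["layers"].append(
                {"layer": int(m["n"]), "packer": m["packer"], "status": m["status"]}
            )
        elif m := FINAL_RE.match(line):
            result["final_output"] = m["path"]
    return result
```

Lines that match none of the patterns are ignored, so extra log output does not break the parse.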
## Unpacking techniques (how this tool works)
The tool uses different **unpacking techniques** depending on the packer and binary format.
### Detection (before unpacking)
- **Section names** — Known section name patterns map to packers (e.g. `UPX0`/`UPX1` → UPX, `.aspack`/`.adata` → ASPack, `.vmp0`/`.vmp1` → VMProtect, `Themida` → Themida). Case-insensitive.
- **File content fallback** — If sections don’t match, the file is scanned for packer-related strings (e.g. `ASPack`, `VMProtect`, `.vmp0`) to still assign a packer.
- **Path hint** — If the sample path contains `vmprotect` or `themida`, that packer is preferred so samples in known folders get the right unpacker.
- **Entropy** — High section entropy suggests packed/compressed data; can yield a generic “packed” or “unknown” result.
- **Heuristics** — Entry point in last section, few imports, etc., often reported as “unknown.”
Detection supports **PE and ELF**; the best matching packer (by confidence) is chosen and the corresponding unpacker is run.
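The first three detection methods above (section names, content fallback, path hint) can be sketched as a single function that collects candidates and returns the one with the highest confidence. This is an illustration only: the name `detect_packer`, the marker tables, and the confidence weights for content and path matches are assumptions (only the 0.9 for a section match appears in the example output), and entropy and heuristics are left out.

```python
# Section-name patterns taken from the list above; lookups are case-insensitive.
SECTION_PATTERNS = {
    "upx0": "upx", "upx1": "upx",
    ".aspack": "aspack", ".adata": "aspack",
    ".vmp0": "vmprotect", ".vmp1": "vmprotect",
    "themida": "themida",
}
CONTENT_MARKERS = {b"ASPack": "aspack", b"VMProtect": "vmprotect", b".vmp0": "vmprotect"}
PATH_HINTS = ("vmprotect", "themida")


def detect_packer(section_names, data=b"", path=""):
    """Return the best candidate as a dict, or None if nothing matches."""
    candidates = []
    for name in section_names:
        packer = SECTION_PATTERNS.get(name.strip("\x00 ").lower())
        if packer:
            candidates.append((0.9, packer, "sections"))
    for marker, packer in CONTENT_MARKERS.items():
        if marker in data:
            candidates.append((0.6, packer, "content"))  # assumed weight
    lowered = path.lower()
    for hint in PATH_HINTS:
        if hint in lowered:
            candidates.append((0.5, hint, "path"))  # assumed weight
    if not candidates:
        return None
    confidence, packer, method = max(candidates, key=lambda c: c[0])
    return {"packer": packer, "confidence": confidence, "method": method}
```

For example, `detect_packer(["UPX0", "UPX1", ".rsrc"])` reports `upx` via the `sections` method, while a file with only a `.text` section under a `.../themida/` directory falls through to the path hint.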
### Unpacking by technique
| Technique | Used for | How it works |
|-----------|----------|--------------|
| **Native decompression** | **UPX** (PE & ELF) | Calls system `upx -d`. The packer format is known; UPX decodes in place and writes the decompressed image. No emulation. |
| **Emulation + dump (Unipacker)** | **ASPack, Themida, VMProtect** (PE32 only) | Loads the PE in [Unicorn](https://www.unicorn-engine.org/) via [Unipacker](https://github.com/unipacker/unipacker). Emulates from the **entry point**; the engine runs until it detects an “unpacking done” condition (e.g. section hop, write+execute region, or packer-specific logic). Then it **dumps** the process memory (image base + size) to a new PE file. Unipacker knows ASPack; for Themida/VMProtect it uses a **generic “unknown”** strategy (emulate until heuristic trigger, then dump). The tool applies **patches** to Unipacker: safe page-by-page memory read (avoids crashes on unmapped regions) and robust dump (if import fix fails, zero IAT and still write the dump). |
| **Emulation + dump (Qiling)** | **VMProtect** (PE32+ / 64-bit only) | Used when the sample is 64-bit (Unipacker is 32-bit only). Loads the PE in [Qiling](https://github.com/qilingframework/qiling) with a Windows **rootfs** (emulated DLLs). Runs emulation with a **timeout**. After run (or timeout), reads the **loaded image** from emulated memory (base + `SizeOfImage`) and writes it to disk. No packer-specific logic—generic “run then dump” so heavy protectors may only partially unpack. |
| **Stub** | **MPRESS, generic** | Detection may identify the packer, but the unpacker module is not implemented; the pipeline returns an error or “generic unpacker stub.” |
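#### 1. Native decompression (UPX)
UPX samples (PE and ELF) are unpacked by calling the system `upx -d`. Because the packer format is known, UPX decompresses the image directly and no emulation is involved.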
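The `upx -d` call listed in the table above could be wrapped roughly as follows. This is a sketch under assumptions: the function names are hypothetical, and writing to a separate file with `-o` (so the packed sample is preserved) is a choice made here, not necessarily what the repository does.

```python
import shutil
import subprocess
from pathlib import Path


def build_upx_command(packed, output, upx_binary="upx"):
    """Build the argument list for decompressing `packed` into `output`."""
    # -d decompresses; -o writes to a separate file so the packed sample is kept.
    return [upx_binary, "-d", "-o", str(output), str(packed)]


def unpack_upx(packed, output, timeout=60):
    """Run system UPX; return True if it exited cleanly and wrote the output."""
    upx = shutil.which("upx")
    if upx is None:
        raise FileNotFoundError("system 'upx' binary not found on PATH")
    Path(output).parent.mkdir(parents=True, exist_ok=True)
    proc = subprocess.run(
        build_upx_command(packed, output, upx),
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.returncode == 0 and Path(output).is_file()
```

Passing the command as a list avoids shell quoting problems with unusual sample file names.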
#### 2. Emulation + dump — Unipacker (ASPack, Themida, VMProtect, PE32 only)
Many packers place a **stub** at the entry point that allocates memory, decompresses/decrypts the real code into it, then jumps to it (the **original entry point**, OEP). We **run** that stub in a CPU emulator until the real code is in memory, then **dump** that memory to a file.
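The "run the stub until real code is in memory" idea can be illustrated with a toy model that needs no emulator: given an ordered trace of memory writes and executed addresses, the first executed address that lies outside the stub and inside a previously written region is a candidate OEP. This is a simplified illustration of the write-then-execute heuristic, not Unipacker's implementation; the event format and the name `find_oep_candidate` are made up for the example.

```python
def find_oep_candidate(trace, stub_range):
    """Return the first executed address that was written earlier, outside the stub.

    trace: ordered events, either ("write", addr, size) or ("exec", addr).
    stub_range: (start, end) half-open address range of the unpacking stub.
    """
    written = []  # half-open (start, end) ranges the stub has written so far
    stub_start, stub_end = stub_range
    for event in trace:
        if event[0] == "write":
            _, addr, size = event
            written.append((addr, addr + size))
        elif event[0] == "exec":
            addr = event[1]
            outside_stub = not (stub_start <= addr < stub_end)
            if outside_stub and any(start <= addr < end for start, end in written):
                return addr  # execution hopped into freshly written memory
    return None


# Stub at 0x405000 writes 0x200 bytes to 0x401000, then jumps there.
trace = [
    ("exec", 0x405000),
    ("write", 0x401000, 0x200),
    ("exec", 0x405010),
    ("exec", 0x401000),
]
print(hex(find_oep_candidate(trace, (0x405000, 0x406000))))
```

A real emulator would collect these events through memory-write and code hooks, and would dump the image once such a hop is seen.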
#### 3. Emulation + dump — Qiling (VMProtect 64-bit only)
For 64-bit samples, which Unipacker cannot handle, the PE is loaded in Qiling with a Windows rootfs and run under a timeout. After the run (or the timeout), the loaded image (base + `SizeOfImage`) is read from emulated memory and written to disk. There is no packer-specific logic and no IAT fix, so heavily protected samples may be only partially unpacked.
#### 4. Stub (MPRESS, generic)
No unpacker implemented; the pipeline returns a clear error.
### Summary
- **UPX:** Direct decompression; fast and deterministic.
- **ASPack / Themida / VMProtect (32-bit):** Emulation in Unicorn via Unipacker; dump on section hop / W+X or packer logic; patches for safe read and robust dump.
- **VMProtect (64-bit):** Emulation in Qiling with rootfs; timed run then dump; no IAT fix.
- **Multi-layer:** Re-detect and repeat up to `max_layers` (default 5).
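The multi-layer behaviour summarized above amounts to a bounded detect/unpack loop. Here is a minimal sketch, assuming hypothetical `detect` and `unpack` callables and a `min_confidence` threshold (the real orchestrator's interfaces and the meaning of `--confidence` may differ):

```python
def unpack_layers(sample, detect, unpack, max_layers=5, min_confidence=0.5):
    """Repeat detect -> unpack until nothing is detected or max_layers is hit.

    detect(path) returns {"packer": str, "confidence": float} or None.
    unpack(path, packer) returns the output path, or None on failure.
    Both callables are placeholders for illustration.
    """
    history = []
    current = sample
    for layer in range(1, max_layers + 1):
        detection = detect(current)
        if detection is None or detection["confidence"] < min_confidence:
            break  # nothing (confidently) packed: stop peeling layers
        output = unpack(current, detection["packer"])
        history.append((layer, detection["packer"], output is not None))
        if output is None:
            break  # keep the last good file as the result
        current = output
    return current, history
```

Bounding the loop matters because a failed or partial unpack can still look packed, which would otherwise cause endless re-detection.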
### Is it safe to run real (packed) code in the emulator?
**Short answer:** The packed code runs **inside the emulator**, not natively on your CPU, so it is much safer than executing the sample on the host—but you should still run the tool in an isolated environment (e.g. a VM or a dedicated analysis machine).
**Why emulation is relatively safe:**
- **Unipacker (Unicorn):** The sample’s instructions are **interpreted** by the emulator. They do not run on the host processor. When the code “calls” Windows APIs (e.g. `VirtualAlloc`, `CreateFileA`), Unipacker’s **stubs** run instead of the real OS: they typically only update the emulator’s internal state (e.g. allocate emulated memory, return a fake handle). So the packed code cannot directly access your real filesystem, network, or hardware unless a stub explicitly forwards to the host—and in Unipacker’s design, stubs are meant to simulate behavior, not to perform real dangerous operations.
- **Qiling:** Same idea (emulated CPU + emulated APIs), but Qiling emulates more of the operating-system environment and can be configured to map host paths into it. If you map a host directory into the rootfs or the emulated “C:\”, writes could affect the host. **Best practice:** use a self-contained rootfs (e.g. only DLLs and a minimal layout) and do not map sensitive host directories. Run in a VM so that even a misconfiguration has limited impact.
**Recommendations:**
- Treat all samples as hostile. Run the unpacker in a **VM**, **sandbox**, or **dedicated analysis machine**, not on a production or personal system.
- Do not rely on emulation as a perfect sandbox: stub bugs or design choices could, in theory, expose the host. Isolation (VM + no sensitive mounts for Qiling) keeps risk low.
- **UPX** does not run the sample at all; it only decompresses. So UPX unpacking is safe from a “running code” perspective (apart from trusting the `upx` binary and the decompressed output).
## Real-life example with proof
Using an **ASPack-packed** sample (`NotePad_aspack.exe`), we check that unpacking worked by comparing **entropy** and **file size** before and after.
### 1. Run the unpacker
    python scripts/run_unpacker.py samples_by_packer/aspack/NotePad_aspack.exe -o unpacked/aspack

Result: `unpacked/aspack/NotePad_aspack.unpacked.aspack.exe`.
### 2. Proof: entropy and size
| Metric | Packed (`NotePad_aspack.exe`) | Unpacked (`NotePad_aspack.unpacked.aspack.exe`) |
|--------|-------------------------------|-------------------------------------------------|
| **File size** | 33,792 bytes (33 KB) | 180,224 bytes (176 KB) |
| **Entropy** | 6.25 | 2.38 |
The unpacked file is **larger** (compression removed) and has **lower entropy** (real code/data instead of a compressed blob). That is the expected signature of successful unpacking.
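The two metrics in the table can also be computed directly. The sketch below uses whole-file Shannon entropy in bits per byte; the helper names are illustrative, and the values it prints may differ slightly from those reported by other tools depending on how they compute or round entropy.

```python
import math
from collections import Counter
from pathlib import Path


def shannon_entropy(data):
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in Counter(data).values())


def compare_files(packed_path, unpacked_path):
    """Return size and entropy for both files plus the two expected signals."""
    packed = Path(packed_path).read_bytes()
    unpacked = Path(unpacked_path).read_bytes()
    packed_entropy = shannon_entropy(packed)
    unpacked_entropy = shannon_entropy(unpacked)
    return {
        "packed_size": len(packed),
        "unpacked_size": len(unpacked),
        "packed_entropy": round(packed_entropy, 2),
        "unpacked_entropy": round(unpacked_entropy, 2),
        "size_grew": len(unpacked) > len(packed),
        "entropy_dropped": unpacked_entropy < packed_entropy,
    }
```

Both signals are indicators rather than proof: a memory dump can be larger and less random than the packed file while still containing unresolved imports or a partially decrypted image.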
### 3. How to reproduce the proof
**String Analyzer** (categorized strings + entropy):

    # From String Analyzer project
    string-analyzer /path/to/NotePad_aspack.exe -o packed_report.txt
    string-analyzer /path/to/NotePad_aspack.unpacked.aspack.exe -o unpacked_report.txt

Compare the reports: the packed file shows **File Entropy: 6.25**, the unpacked file **File Entropy: 2.38**.
**Basic File Information Gathering Script** (hashes, size, entropy):

    # From Basic-File-Information-Gathering-Script project
    python3 fileinfo.py /path/to/NotePad_aspack.exe
    python3 fileinfo.py /path/to/NotePad_aspack.unpacked.aspack.exe

You get `file_size` and `entropy` for both files; the unpacked file is larger and has lower entropy. With `--full` or `--json` you can compare sections, imports, and entropy blocks.
These tools are **read-only** (no execution); see the [Article](#article--validation-guide) for full validation workflow and links to their Medium guides.
## Project layout
    Unpacker/
    ├── README.md # This file
    ├── PROJECT_SCENARIO.md # Research and design
    ├── pyproject.toml
    ├── requirements.txt
    ├── config/config.yaml # Detector and orchestrator settings
    ├── data/signatures/ # Optional signature DB (empty by default)
    ├── docs/
    │ └── MEDIUM_ARTICLE_UNPACKER_GUIDE.md # Full guide (Medium-style)
    ├── scripts/
    │ ├── run_unpacker.py # Main CLI
    │ ├── step0_find_and_download_samples.py # Malware Bazaar download by packer
    │ └── verify_unpacking.py # Check unpacked format/size/detection
    ├── src/unpacker/
    │ ├── orchestrator.py # detect → unpack → optional rebuild
    │ ├── detector/ # Signatures, sections, entropy, heuristics
    │ ├── unpackers/ # UPX, ASPack, Themida, VMProtect, MPRESS, generic
    │ └── pe_rebuilder/ # Optional IAT fix (stub)
    └── tests/

Samples and unpacked output (`samples_by_packer/`, `unpacked/`) are **not** in the repo; use your own or the download script (see below).
## Getting samples
Use the provided script to fetch samples by packer tag from [Malware Bazaar](https://bazaar.abuse.ch/) (requires API key):

    export MALWARE_BAZAAR_API_KEY='your-key'
    python scripts/step0_find_and_download_samples.py

Samples are saved under `samples_by_packer//` and named like `{name}_{packer}.exe` or `{hash}_{packer}.bin`.
## Validation and verification
- **Manual:** Compare packed vs unpacked with [String Analyzer](https://medium.com/@1200km/a-practical-guide-to-string-analyzer-extract-and-analyze-strings-from-binaries-without-the-875dc74e4868) (entropy, string categories) and [Basic File Information Gathering Script](https://medium.com/@1200km/one-tool-to-rule-them-all-file-metadata-static-analysis-for-malware-analysts-and-soc-teams-c6dba1f5b7de) (size, entropy, PE metadata).
- **In-repo:** For UPX outputs, `python scripts/verify_unpacking.py` checks format, size growth, and that the unpacked file is no longer detected as packed.
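The three checks named for the in-repo script (format, size growth, no longer detected as packed) can be approximated as below. This is a sketch of the idea, not the contents of `scripts/verify_unpacking.py`: the function names are hypothetical, and the optional `still_packed` callback stands in for whatever detector you use.

```python
def identify_format(data):
    """Classify a file by its magic bytes: 'pe', 'elf', or None."""
    if data[:2] == b"MZ":
        return "pe"
    if data[:4] == b"\x7fELF":
        return "elf"
    return None


def verify_unpacked(packed, unpacked, still_packed=None):
    """Run basic sanity checks on an unpacked file (both arguments are bytes)."""
    fmt = identify_format(unpacked)
    checks = {
        # Output should be a recognizable executable of the same kind as the input.
        "format_ok": fmt is not None and fmt == identify_format(packed),
        # Decompressed output is expected to be larger than the packed input.
        "size_grew": len(unpacked) > len(packed),
    }
    if still_packed is not None:
        # still_packed(data) -> bool is a placeholder for a real packer detector.
        checks["not_detected_as_packed"] = not still_packed(unpacked)
    checks["passed"] = all(checks.values())
    return checks
```

For UPX output, a simple `still_packed` could test for the `UPX!` marker, for example `lambda data: b"UPX!" in data`.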
## Article & validation guide
**📖 [Unpacker: A Practical Guide to Modular Malware Packer Detection and Unpacking](https://medium.com/@1200km/unpacker-a-practical-guide-to-modular-malware-packer-detection-and-unpacking-cf8ba924f25b)** — Published on Medium.
The same content is in the repo as **[docs/MEDIUM_ARTICLE_UNPACKER_GUIDE.md](docs/MEDIUM_ARTICLE_UNPACKER_GUIDE.md)** (Markdown). The article covers:
- Git repository and clone/install from GitHub
- Each unpacker (UPX, ASPack, MPRESS, Themida, VMProtect, generic) with real usage
- Validation with String Analyzer and fileinfo, with **real output** (entropy 6.25 → 2.38, 33 KB → 176 KB)
- End-to-end workflow and limitations
## Status
| Component | Status |
|-----------|--------|
| Orchestrator, detector (sections, entropy, heuristics), dispatcher | Done |
| UPX (native) | Done |
| ASPack, Themida, VMProtect (Unipacker / Qiling) | Done (PE32 via Unipacker; PE32+ VMProtect via Qiling when rootfs set) |
| MPRESS, generic unpacker | Stub (detection only / error) |
| PE rebuilder (IAT) | Stub |
| Signature DB | Empty (optional) |
## License
MIT License. See [LICENSE](LICENSE).
## Related repositories & articles
| Resource | Link |
|----------|------|
| **Unpacker (this repo)** | [GitHub](https://github.com/anpa1200/Unpacker) · [Medium: Unpacker Guide](https://medium.com/@1200km/unpacker-a-practical-guide-to-modular-malware-packer-detection-and-unpacking-cf8ba924f25b) |
| **Static-malware-Analysis-Orchestrator** | [GitHub](https://github.com/anpa1200/Static-malware-Analysis-Orchestrator) — runs triage, strings, PE imports, and Unpacker in one pipeline · [Medium: Full workflow](https://medium.com/@1200km/basic-static-malware-analysis-from-triage-to-unpacking-explained-and-automated-9442ef3b11b8) |
| **PE-Import-Analyzer** | [GitHub](https://github.com/anpa1200/PE-Import-Analyzer) · [Medium: PE Import Analyzer Guide](https://medium.com/@1200km/pe-import-analyzer-a-practical-guide-for-malware-analysts-and-reverse-engineers-29b8b98aeaf3) |
| **String-Analyzer** | [GitHub](https://github.com/anpa1200/String-Analyzer-) · [Medium: String Analyzer Guide](https://medium.com/@1200km/a-practical-guide-to-string-analyzer-extract-and-analyze-strings-from-binaries-without-the-875dc74e4868) |
| **Basic-File-Information-Gathering-Script** | [GitHub](https://github.com/anpa1200/Basic-File-Information-Gathering-Script) · [Medium: File Metadata & Static Analysis](https://medium.com/@1200km/one-tool-to-rule-them-all-file-metadata-static-analysis-for-malware-analysts-and-soc-teams-c6dba1f5b7de) |
| **PROJECT_SCENARIO.md** | Research, design, and links to PackHero, PISEP, Qiling, CAPE, PE-sieve, packing-box, etc. |
| **Author** | [Medium @1200km](https://medium.com/@1200km) |