# RDP Cache Parser
Every pixel tells a story.
A Windows DFIR tool for parsing and reconstructing screen images from
Remote Desktop Protocol (RDP) bitmap cache files
(Cache####.bin, bcache##.bmc).
## Features
| Category | Capability |
|---|---|
| **Parsing** | `.bin` (Win 7+, 32-bit BGRA) and `.bmc` (Vista/2008, 8/16/24/32-bit) |
| **Reconstruction** | Manual edge-match stitch canvas (authoritative) + automatic edge-matched triage hypothesis |
| **GUI** | Dark-theme tile viewer, drag-and-drop, manual stitch canvas |
| **OCR** | Optional text extraction from tiles + forensic keyword detection |
| **Reporting** | HTML + JSON forensic report with SHA-256 chain-of-custody |
| **Export** | Individual tiles or full collage (PNG/BMP) |
| **Session** | Save / load manual stitch canvas state |
### Reconstruction — what it can and cannot do
**RDP cache files store no screen coordinates.** Each 64×64 tile is saved with
only an 8-byte content hash (for deduplication); the screen x/y positions exist
only in transient `MemBlt` drawing orders during the live session and are never
written to disk. Fully-automatic, accurate screen reconstruction is therefore
**not achievable** — it is, by consensus in the DFIR community, an
analyst-assisted task.
This tool reflects that reality:
- The **manual stitch canvas** is the authoritative reconstruction path. It
uses pixel edge-matching to rank candidate tiles for each cell, and every
placement is the analyst's documented choice.
- **Automatic reconstruction** is provided for *triage only*. It is a best-effort
hypothesis: every output image carries an "INFERRED — NOT TIMESTAMPED" caveat
banner and a confidence grade, and must never be presented as a literal
screenshot.
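As a rough illustration of the edge-matching idea used to rank candidates, the sketch below scores how well one tile's border lines up with the facing border of another. It is a minimal example, not the tool's actual matcher: the function names `edge_distance` and `rank_candidates`, the mean-absolute-difference metric, and the `H x W x C` uint8 array layout are all assumptions made for illustration.

```python
import numpy as np


def edge_distance(a: np.ndarray, b: np.ndarray, side: str = "bottom") -> float:
    """Mean absolute difference between an edge of `a` and the facing edge of `b`.

    Tiles are H x W x C uint8 arrays. Lower values mean a better match.
    """
    if side == "bottom":    # hypothesis: b sits directly below a
        edge_a, edge_b = a[-1, :, :], b[0, :, :]
    elif side == "right":   # hypothesis: b sits directly to the right of a
        edge_a, edge_b = a[:, -1, :], b[:, 0, :]
    else:
        raise ValueError("side must be 'bottom' or 'right'")
    if edge_a.shape != edge_b.shape:
        return float("inf")  # tiles of different size cannot share this edge
    # Widen to a signed type first so the subtraction cannot wrap around.
    diff = edge_a.astype(np.int16) - edge_b.astype(np.int16)
    return float(np.abs(diff).mean())


def rank_candidates(anchor: np.ndarray, candidates: list, side: str = "bottom") -> list:
    """Return candidate indices ordered from best to worst edge match."""
    scores = [edge_distance(anchor, c, side) for c in candidates]
    return sorted(range(len(candidates)), key=scores.__getitem__)
```

A score like this only ranks suggestions. Flat edges (a solid-colour desktop or window background) match many tiles equally well, which is one reason the final placement stays with the analyst.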
The automatic path works as follows:
- **Strip detection** — consecutive tiles in `.bin` files often correspond to
adjacent screen positions; tile order within a strip is verified by
edge-matching
- **Temporal grouping** — separates tiles from different screen states using
file-index proximity (gap > 150 → new snapshot)
- **Edge-matched block placement** — blocks are positioned by matching their
border pixels to one another (A's bottom edge ↔ B's top edge); blocks with no
confident edge match fall back to file-order estimation
- **Auto-resolution** — detects actual screen width (1280 / 1366 / 1440 / 1920 /
2560 / 3840 px) from block widths; no manual configuration needed
- **OCR-driven slides** — when OCR text is available, screen states are ordered
into a numbered slide sequence with a `slides_manifest.json` index
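The temporal-grouping rule above can be sketched in a few lines. This is an illustrative version only: the function name `split_snapshots` and the plain-integer representation of tile file indices are assumptions, and the real implementation may weigh other signals as well.

```python
def split_snapshots(indices, max_gap=150):
    """Group tile file indices into snapshots.

    A jump of more than `max_gap` between consecutive indices starts a new group.
    """
    groups = []
    for idx in sorted(indices):
        if groups and idx - groups[-1][-1] <= max_gap:
            groups[-1].append(idx)   # close enough: same screen state
        else:
            groups.append([idx])     # first tile, or gap too large: new snapshot
    return groups
```

For example, `split_snapshots([0, 1, 2, 200, 201, 500])` yields three groups, because both jumps (2 to 200 and 201 to 500) exceed 150.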
### OCR Scanning (optional)
Runs optical character recognition on every non-blank tile and reports:
- All detected text with source file, tile index, and byte offset
- IOC hits — tiles matching a built-in forensic keyword list
(mimikatz, certutil, psexec, powershell, password, ntds.dit, and 40+ more)
- Live keyword filter in the GUI for rapid triage
OCR requires the **optional** `easyocr` library (see Installation).
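To make the keyword step concrete, here is a small sketch of matching OCR text against a keyword list. It does not call easyocr. It assumes the OCR results are already available as dictionaries with hypothetical keys (`file`, `tile_index`, `offset`, `text`), and the short `KEYWORDS` tuple is just the subset named above, not the tool's full built-in list.

```python
import re

# Subset of the keywords named in this README; the built-in list is longer.
KEYWORDS = ("mimikatz", "certutil", "psexec", "powershell", "password", "ntds.dit")

# One case-insensitive pattern; re.escape keeps the dot in "ntds.dit" literal.
_PATTERN = re.compile("|".join(re.escape(k) for k in KEYWORDS), re.IGNORECASE)


def find_keyword_hits(records):
    """Return (record, matched_keywords) pairs for records whose text has a hit."""
    hits = []
    for rec in records:
        found = sorted({m.group(0).lower() for m in _PATTERN.finditer(rec["text"])})
        if found:
            hits.append((rec, found))
    return hits
```

Substring matching is deliberately loose here, since OCR output on 64×64 tiles often contains partial words. A stricter matcher would reduce false positives at the cost of missing clipped text.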
### Forensic Report
The *Generate Report* toolbar button produces two files:
- **`report.html`** — human-readable report with case metadata, source file hashes,
parse statistics, OCR IOC hits, and reconstructed screen thumbnails
- **`report.json`** — machine-readable structured output for SIEM / SOAR integration
Chain-of-custody fields included: source file path, size, SHA-256 hash, tool
version and run timestamp, tile counts, OCR findings with exact byte offsets.
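The source-file part of such a record can be produced with the standard library alone. The sketch below is an assumption about shape, not the tool's actual report schema: the function name `custody_record` and the field names `path`, `size`, `sha256`, and `hashed_at` are hypothetical.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def custody_record(path, chunk_size=1 << 20):
    """Hash a source file in chunks and return a JSON-serialisable record."""
    p = Path(path)
    digest = hashlib.sha256()
    with p.open("rb") as fh:
        # Read in fixed-size chunks so large cache files are not loaded whole.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return {
        "path": str(p.resolve()),
        "size": p.stat().st_size,
        "sha256": digest.hexdigest(),
        "hashed_at": datetime.now(timezone.utc).isoformat(),
    }
```

The returned dictionary can be passed straight to `json.dumps`. Hashing before any parsing takes place lets the report show that the evidence file was not altered by the analysis.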
## Requirements
- Python 3.10+
- PySide6 >= 6.5.0
- Pillow >= 9.0.0
- NumPy >= 1.21.0
- *(Optional)* easyocr >= 1.7.0 — required only for OCR scanning
## Installation
    # Core tool (no OCR)
    pip install -r requirements.txt

    # With OCR support
    pip install easyocr

Or install as a package:

    pip install .          # core only
    pip install ".[ocr]"   # with OCR
    pip install ".[dev]"   # with test suite

## Usage
    python main.py

1. Drag-and-drop the `Cache` folder onto the window, or use **Open File(s)** /
**Open Folder** / **Auto-Detect**
2. The tool parses all files and immediately runs smart reconstruction
3. Browse extracted tiles in the grid (zoom, filter blanks, click to inspect)
4. Click **Reconstruct Screen** to open the manual stitch canvas
5. Click **OCR Scan** to extract text from all tiles (easyocr required)
6. Click **Generate Report** to export an HTML + JSON forensic report
7. Use **Export Tiles** / **Export Collage** to save tile images
Reconstructed screens are written to `smart_reconstruction/` next to the cache
files automatically after each parse.
## Supported File Types
| File | Windows version | Colour depth |
|---|---|---|
| `Cache0000.bin` – `Cache0005.bin` | Windows 7+ | 32-bit BGRA |
| `bcache2.bmc` | Vista / Server 2008 | 8-bit indexed |
| `bcache22.bmc` | Vista / Server 2008 | 16-bit RGB565 |
| `bcache24.bmc` | Vista / Server 2008 | 24-bit or 32-bit |
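For orientation, the sketch below walks the tiles in a `Cache####.bin` file. The layout it relies on is an assumption based on how this format is commonly described by open-source parsers, not something this README specifies: a 12-byte file header (the signature `RDP8bmp\0` plus a 4-byte version), then for each tile an 8-byte hash key, a 16-bit width, a 16-bit height, and `width * height * 4` bytes of BGRA pixels. The name `iter_bin_tiles` is hypothetical.

```python
import struct

BIN_MAGIC = b"RDP8bmp\x00"            # assumed file signature
FILE_HEADER_SIZE = 12                 # assumed: signature + 4-byte version
TILE_HEADER = struct.Struct("<8sHH")  # assumed: hash key, width, height


def iter_bin_tiles(data: bytes):
    """Yield (offset, key_hex, width, height, bgra_bytes) for each complete tile."""
    if not data.startswith(BIN_MAGIC):
        raise ValueError("signature mismatch: not a Cache####.bin file")
    pos = FILE_HEADER_SIZE
    while pos + TILE_HEADER.size <= len(data):
        key, width, height = TILE_HEADER.unpack_from(data, pos)
        size = width * height * 4          # 32-bit BGRA, 4 bytes per pixel
        start = pos + TILE_HEADER.size
        if size == 0 or start + size > len(data):
            break                          # truncated or malformed tail
        yield pos, key.hex(), width, height, data[start:start + size]
        pos = start + size
```

Yielding the byte offset alongside each tile keeps every later finding traceable to an exact position in the evidence file. The older `.bmc` variants use a different tile header and may be compressed, so they are not covered by this sketch.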
Cache files are located at:
C:\Users\
\AppData\Local\Microsoft\Terminal Server Client\Cache
## Important Limitation — EGFX / H.264 Sessions
RDP sessions using **H.264/AVC encoding** (the EGFX pipeline, enabled by
RemoteFX or modern "Experience" quality settings) **do not produce traditional
bitmap cache tiles**. Azure Virtual Desktop and many cloud-hosted desktops also
disable bitmap caching.
An empty cache file (or zero tiles parsed) does **not** rule out RDP activity —
it may simply mean the session used the EGFX pipeline.
Traditional bitmap caching is used by `mstsc.exe` with *Legacy* graphics settings
(uncheck "Use hardware graphics acceleration" and lower visual quality settings).
## How It Works
## Running the Tests
    pip install ".[dev]"
    pytest

The test suite (155 passing, 1 skipped, no easyocr or GUI required) covers
parsers, tile model, RLE decoder, edge matcher, cluster, smart reconstruct,
OCR engine, OCR confidence filter, audit log, packaging contract, GUI
pipeline helpers, run-state reset, sidecar JSON, and forensic report.
The skipped test only runs when easyocr is *not* installed (confirms the
auto-install fallback path); it is automatically skipped once easyocr is
present.
## License
**Apache License 2.0** — see [LICENSE](LICENSE) and [NOTICE](NOTICE).
The tool is **free to use for everyone, including enterprises**. But under
Section 4(d) of the Apache License, if you redistribute it — or integrate its
code or its reconstruction / edge-matching / OCR logic into your own tool —
you **must keep the attribution** from the [NOTICE](NOTICE) file.
In short: **use it freely, but credit the author.** Apache 2.0 also grants you
an explicit patent license.
### Third-Party Libraries
| Library | License | Usage |
|---|---|---|
| [PySide6](https://www.qt.io/qt-for-python) | LGPL v3 | GUI framework |
| [Pillow](https://python-pillow.org/) | HPND | Image processing |
| [NumPy](https://numpy.org/) | BSD 3-Clause | Vectorised edge matching |
| [easyocr](https://github.com/JaidedAI/EasyOCR) *(optional)* | Apache 2.0 | Tile text extraction |