whitedevil1026/Rdp-Cache-parser

GitHub: whitedevil1026/Rdp-Cache-parser

# RDP Cache Parser

Every pixel tells a story.
A Windows DFIR tool for parsing and reconstructing screen images from Remote Desktop Protocol (RDP) bitmap cache files (Cache####.bin, bcache##.bmc).

## Features

| Category | Capability |
|---|---|
| **Parsing** | `.bin` (Win 7+, 32-bit BGRA) and `.bmc` (Vista/2008, 8/16/24-bit) |
| **Reconstruction** | Manual edge-match stitch canvas (authoritative) + automatic edge-matched triage hypothesis |
| **GUI** | Dark-theme tile viewer, drag-and-drop, manual stitch canvas |
| **OCR** | Optional text extraction from tiles + forensic keyword detection |
| **Reporting** | HTML + JSON forensic report with SHA-256 chain-of-custody |
| **Export** | Individual tiles or full collage (PNG/BMP) |
| **Session** | Save / load manual stitch canvas state |

### Reconstruction: what it can and cannot do

**RDP cache files store no screen coordinates.** Each 64×64 tile is saved with only an 8-byte content hash (for deduplication); the screen x/y positions exist only in transient `MemBlt` drawing orders during the live session and are never written to disk. Fully automatic, accurate screen reconstruction is therefore **not achievable**; it is, by consensus in the DFIR community, an analyst-assisted task.

This tool reflects that reality:

- The **manual stitch canvas** is the authoritative reconstruction path. It uses pixel edge-matching to rank candidate tiles for each cell, and every placement is the analyst's documented choice.
- **Automatic reconstruction** is provided for *triage only*. It is a best-effort hypothesis: every output image carries an "INFERRED — NOT TIMESTAMPED" caveat banner and a confidence grade, and must never be presented as a literal screenshot.

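
The ranking idea behind the manual stitch canvas can be illustrated with a minimal sketch. This is not the tool's implementation (which, per the library table, uses NumPy for vectorised edge matching); it is a pure-Python illustration that assumes tiles are held as rows of `(R, G, B)` tuples. The names `vertical_edge_cost`, `rank_below`, and `solid` are hypothetical.

```python
from typing import Sequence

Pixel = tuple[int, int, int]        # (R, G, B); assumed in-memory format
Tile = Sequence[Sequence[Pixel]]    # rows of pixels, e.g. 64 rows x 64 columns


def vertical_edge_cost(upper: Tile, lower: Tile) -> float:
    """Mean absolute channel difference between upper's bottom row and lower's top row."""
    bottom_row, top_row = upper[-1], lower[0]
    if len(bottom_row) != len(top_row):
        raise ValueError("tiles must have the same width")
    total = sum(abs(a - b) for p, q in zip(bottom_row, top_row) for a, b in zip(p, q))
    return total / (len(bottom_row) * 3)


def rank_below(anchor: Tile, candidates: Sequence[Tile]) -> list[tuple[int, float]]:
    """Return (candidate index, cost) pairs for the cell below anchor, best match first."""
    scored = [(i, vertical_edge_cost(anchor, c)) for i, c in enumerate(candidates)]
    return sorted(scored, key=lambda pair: pair[1])


def solid(colour: Pixel, size: int = 64) -> list[list[Pixel]]:
    """Build a uniform tile, used here only as synthetic test data."""
    return [[colour] * size for _ in range(size)]


anchor = solid((200, 200, 200))
candidates = [solid((0, 0, 0)), solid((198, 201, 200)), solid((90, 90, 90))]
ranking = rank_below(anchor, candidates)
print(ranking)  # lowest cost first; the near-grey tile (index 1) ranks best
```

A low cost only suggests adjacency. Uniform regions such as blank backgrounds match many tiles equally well, which is one reason the final placement stays with the analyst.
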
The automatic path works as follows:

- **Strip detection**: consecutive tiles in `.bin` files often correspond to adjacent screen positions; tile order within a strip is verified by edge-matching
- **Temporal grouping**: separates tiles from different screen states using file-index proximity (gap > 150 → new snapshot)
- **Edge-matched block placement**: blocks are positioned by matching their border pixels to one another (A's bottom edge ↔ B's top edge); blocks with no confident edge match fall back to file-order estimation
- **Auto-resolution**: detects actual screen width (1280 / 1366 / 1440 / 1920 / 2560 / 3840 px) from block widths; no manual configuration needed
- **OCR-driven slides**: when OCR text is available, screen states are ordered into a numbered slide sequence with a `slides_manifest.json` index

### OCR Scanning (optional)

Runs optical character recognition on every non-blank tile and reports:

- All detected text with source file, tile index, and byte offset
- IOC hits: tiles matching a built-in forensic keyword list (mimikatz, certutil, psexec, powershell, password, ntds.dit, and 40+ more)
- Live keyword filter in the GUI for rapid triage

OCR requires the **optional** `easyocr` library (see Installation).

### Forensic Report

The *Generate Report* toolbar button produces two files:

- **`report.html`**: human-readable report with case metadata, source file hashes, parse statistics, OCR IOC hits, and reconstructed screen thumbnails
- **`report.json`**: machine-readable structured output for SIEM / SOAR integration

Chain-of-custody fields included: source file path, size, SHA-256 hash, tool version and run timestamp, tile counts, OCR findings with exact byte offsets.

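
To make the chain-of-custody fields concrete, here is a minimal sketch of how such a record could be built with the standard library. The field names, the `custody_record` helper, and the placeholder version string are assumptions for illustration; the actual `report.json` schema is not documented here and may differ.

```python
import hashlib
import json
import os
import tempfile
from datetime import datetime, timezone


def custody_record(path: str, tool_version: str = "0.0.0-example") -> dict:
    """Hash a source file in chunks and return a chain-of-custody style record."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in 1 MiB chunks so large cache files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return {
        "source_path": os.path.abspath(path),
        "size_bytes": os.path.getsize(path),
        "sha256": digest.hexdigest(),
        "tool_version": tool_version,  # placeholder value, not the real tool version
        "run_timestamp_utc": datetime.now(timezone.utc).isoformat(),
    }


# Demonstrate on a throwaway file standing in for a real cache file.
with tempfile.TemporaryDirectory() as workdir:
    sample = os.path.join(workdir, "Cache0000.bin")
    with open(sample, "wb") as handle:
        handle.write(b"abc")
    record = custody_record(sample)

print(json.dumps(record, indent=2))
```

Hashing the source before any parsing, and recording the hash alongside the tool version and timestamp, lets a reviewer later confirm that the reported findings came from an unmodified copy of the evidence file.
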
## Requirements

- Python 3.10+
- PySide6 >= 6.5.0
- Pillow >= 9.0.0
- NumPy >= 1.21.0
- *(Optional)* easyocr >= 1.7.0, required only for OCR scanning

## Installation

    # Core tool (no OCR)
    pip install -r requirements.txt

    # With OCR support
    pip install easyocr

Or install as a package:

    pip install .            # core only
    pip install ".[ocr]"     # with OCR
    pip install ".[dev]"     # with test suite

## Usage

    python main.py

1. Drag-and-drop the `Cache` folder onto the window, or use **Open File(s)** / **Open Folder** / **Auto-Detect**
2. The tool parses all files and immediately runs smart reconstruction
3. Browse extracted tiles in the grid (zoom, filter blanks, click to inspect)
4. Click **Reconstruct Screen** to open the manual stitch canvas
5. Click **OCR Scan** to extract text from all tiles (easyocr required)
6. Click **Generate Report** to export an HTML + JSON forensic report
7. Use **Export Tiles** / **Export Collage** to save tile images

Reconstructed screens are written to `smart_reconstruction/` next to the cache files automatically after each parse.

## Supported File Types

| File | Windows version | Colour depth |
|---|---|---|
| `Cache0000.bin` – `Cache0005.bin` | Windows 7+ | 32-bit BGRA |
| `bcache2.bmc` | Vista / Server 2008 | 8-bit indexed |
| `bcache22.bmc` | Vista / Server 2008 | 16-bit RGB565 |
| `bcache24.bmc` | Vista / Server 2008 | 24-bit or 32-bit |

Cache files are located at:

    C:\Users\\AppData\Local\Microsoft\Terminal Server Client\Cache

## Important Limitation: EGFX / H.264 Sessions

RDP sessions using **H.264/AVC encoding** (the EGFX pipeline, enabled by RemoteFX or modern "Experience" quality settings) **do not produce traditional bitmap cache tiles**. Azure Virtual Desktop and many cloud-hosted desktops also disable bitmap caching. An empty cache file (or zero tiles parsed) does **not** rule out RDP activity; it may simply mean the session used the EGFX pipeline.

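
As a quick triage aid for the point about empty caches, the sketch below inventories a cache folder by the filename patterns listed under Supported File Types and flags zero-byte files. It is an illustration only, not part of the tool: the `inventory` helper and its output fields are hypothetical, and a file that contains a header but no tiles would not be flagged by a size check alone.

```python
import re
import tempfile
from pathlib import Path

# Filename patterns derived from the Supported File Types table.
BIN_PATTERN = re.compile(r"^Cache\d{4}\.bin$", re.IGNORECASE)
BMC_PATTERN = re.compile(r"^bcache\d+\.bmc$", re.IGNORECASE)


def inventory(cache_dir: Path) -> list[dict]:
    """List recognised cache files with their format, size, and an empty flag."""
    rows = []
    for entry in sorted(cache_dir.iterdir()):
        if not entry.is_file():
            continue
        if BIN_PATTERN.match(entry.name):
            kind = "bin"
        elif BMC_PATTERN.match(entry.name):
            kind = "bmc"
        else:
            continue  # ignore anything that is not a recognised cache file
        size = entry.stat().st_size
        rows.append({"name": entry.name, "kind": kind, "size": size, "empty": size == 0})
    return rows


# Demonstrate on a temporary folder with synthetic files.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "Cache0000.bin").write_bytes(b"\x00" * 16)
    (folder / "bcache24.bmc").write_bytes(b"")
    (folder / "notes.txt").write_text("ignored")
    rows = inventory(folder)

for row in rows:
    print(row)
```

An `empty` flag here should be read as "no bitmap cache evidence in this file", not as "no RDP session took place".
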
Traditional bitmap caching is used by `mstsc.exe` with *Legacy* graphics settings (uncheck "Use hardware graphics acceleration" and lower visual quality settings).

## How It Works

## Running the Tests

    pip install ".[dev]"
    pytest

The test suite (155 passing, 1 skipped, no easyocr or GUI required) covers parsers, tile model, RLE decoder, edge matcher, cluster, smart reconstruct, OCR engine, OCR confidence filter, audit log, packaging contract, GUI pipeline helpers, run-state reset, sidecar JSON, and forensic report.

The skipped test only runs when easyocr is *not* installed (confirms the auto-install fallback path); it is automatically skipped once easyocr is present.

## License

**Apache License 2.0**: see [LICENSE](LICENSE) and [NOTICE](NOTICE).

The tool is **free to use for everyone, including enterprises**. But under Section 4(d) of the Apache License, if you redistribute it, or integrate its code or its reconstruction / edge-matching / OCR logic into your own tool, you **must keep the attribution** from the [NOTICE](NOTICE) file:

In short: **use it freely, but credit the author.** Apache 2.0 also grants you an explicit patent license.

### Third-Party Libraries

| Library | License | Usage |
|---|---|---|
| [PySide6](https://www.qt.io/qt-for-python) | LGPL v3 | GUI framework |
| [Pillow](https://python-pillow.org/) | HPND | Image processing |
| [NumPy](https://numpy.org/) | BSD 3-Clause | Vectorised edge matching |
| [easyocr](https://github.com/JaidedAI/EasyOCR) *(optional)* | Apache 2.0 | Tile text extraction |