zrckr/xcompress
# `xcompress`
A reverse-engineering project for the `xcompress` dynamic library, Microsoft's
Xbox LZX compression codec.
## Status
Both decompression and compression have been decompiled and implemented.
## Byte-matching behavior
XNB assets from FEZ 1.12 (`Other.pak`) were used as test samples, extracted with
[FEZRepacker](https://github.com/FEZModding/FEZRepacker) 1.3.0. Files were
tested in pairs - compressed and uncompressed versions.
| Test | Matching files | Mismatched files |
| ---------------------- | --------- | ---- |
| Decompression | 2028/2028 | 0 |
| Compression | 1816/2028 | 212 |
| Compress -> Decompress | 2028/2028 | 0 |
| Decompress -> Compress | 1816/2028 | 212 |
## Compression issues
The code can decompress both the original data and the data produced by the
decompiled version of `xcompress`, but the compressed bytes it writes differ
from the original library's output for 212 of the 2028 test files. The encoder
is functionally correct (round-trip decompression always succeeds), but its
output is not yet byte-for-byte identical to that of the original library.
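The gap between "round-trips correctly" and "matches byte-for-byte" can be illustrated with a short sketch. It uses Python's `zlib` purely as a stand-in codec (it is not LZX and is unrelated to `xcompress`), and `check_pair` is a hypothetical helper, not part of the test suite.

```python
import zlib


def check_pair(original: bytes, reference: bytes, compress, decompress) -> dict:
    """Run both checks for one sample (hypothetical helper)."""
    produced = compress(original)
    return {
        # Functional correctness: the produced stream decodes to the input.
        "round_trip": decompress(produced) == original,
        # Byte-matching: the produced stream equals the reference stream.
        "byte_match": produced == reference,
    }


data = b"xcompress " * 1000

# Stand-in for the original library's output (assumption: zlib level 9).
reference = zlib.compress(data, 9)

# Stand-in for the reimplemented encoder (assumption: zlib level 1).
result = check_pair(data, reference, lambda d: zlib.compress(d, 1), zlib.decompress)
print(result)
```

Both streams decode to the same input, yet they differ as byte sequences, which is the situation described above.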
## Test suite
### Arguments
| Argument | Description |
| -------------------------- | ------ |
| `--compressed` | Path to compressed XNB file(s) |
| `--decompressed` | Path to uncompressed XNB file(s) |
| `--verbose`, `-v` | Print detailed mismatch info (byte diffs, chunk analysis) |
| `--output-failed-compressed` | Save failed compression attempts as `.fail` files |
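
As a rough idea of the kind of byte diff that `--verbose` reports, the sketch below locates the first differing byte between two buffers. `first_mismatch` is a hypothetical helper written for illustration; it is not taken from the test suite, whose actual output format is not shown here.

```python
from typing import Optional


def first_mismatch(expected: bytes, actual: bytes) -> Optional[int]:
    """Return the offset of the first differing byte, or None if identical."""
    for offset, (e, a) in enumerate(zip(expected, actual)):
        if e != a:
            return offset
    # Same common prefix: a length difference is still a mismatch,
    # located at the end of the shorter buffer.
    if len(expected) != len(actual):
        return min(len(expected), len(actual))
    return None


offset = first_mismatch(b"\x00\x01\x02\x03", b"\x00\x01\xff\x03")
print("first mismatch at offset", offset)
```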
### Building
Build the `xcompress` DLL with Visual Studio (solution: `xcompress.sln`). The
post-build step automatically copies `xcompress.dll` into
`testsuite/bin//net10.0/`. Then run the test suite:
`dotnet run --project testsuite -- --compressed --decompressed`
## Implementation details
**Codec**: LZX (Lempel-Ziv-X)
**Parameters used by the test suite:**
| Parameter | Value |
| -------------------------- | ------ |
| Window size | 64 KB |
| Compression partition size | 256 KB |
| Block size | 32 KB |
**Huffman trees per block:**
- Main tree - 256 literals + 8 symbols per position slot
- Length tree - 249 symbols
- Aligned offset tree - 8 symbols
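
The main tree size depends on the window size through the number of position slots. The sketch below follows the publicly documented LZX format (position slots with up to 17 extra bits, 8 main-tree symbols per slot). It assumes `xcompress` uses the same layout, which is not confirmed here, and the function names are hypothetical.

```python
NUM_LITERALS = 256
SYMBOLS_PER_SLOT = 8  # one symbol per 3-bit length header


def position_slot_count(window_size: int) -> int:
    """Number of position slots needed to cover offsets below window_size."""
    base = 0  # smallest formatted offset covered by the current slot
    slot = 0
    while base < window_size:
        # Slots 0-3 carry no extra bits; after that the count grows by one
        # every two slots and is capped at 17.
        extra_bits = 0 if slot < 2 else min(17, (slot >> 1) - 1)
        base += 1 << extra_bits
        slot += 1
    return slot


def main_tree_symbols(window_size: int) -> int:
    return NUM_LITERALS + SYMBOLS_PER_SLOT * position_slot_count(window_size)


window = 64 * 1024
print(position_slot_count(window), main_tree_symbols(window))  # 32 512
```

Under these assumptions, the 64 KB window used by the test suite gives 32 position slots and a 512-symbol main tree.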
**Block types:**
1. Verbatim - Huffman-coded literals and matches
2. Aligned - Huffman + aligned offset encoding
3. Uncompressed - raw bytes stored as-is for incompressible data
**Other features:** repeated match offset caching (3 slots), E8 translation (x86
preprocessing), sliding window pattern matching via binary search trees.
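
The repeated match offset cache can be sketched as follows. It follows the public LZX description: three slots R0, R1 and R2, all initialized to 1, with formatted offsets 0 to 2 selecting a cached slot and any other offset stored as `offset + 2`. The class and method names are hypothetical, and a real encoder decides in its match parser when to use a repeated offset, so this only shows the bookkeeping.

```python
class RepeatedOffsets:
    """Three most recently used match offsets (R0, R1, R2)."""

    def __init__(self) -> None:
        self.r = [1, 1, 1]

    def encode(self, offset: int) -> int:
        """Return the formatted offset for a match and update the cache."""
        for slot in range(3):
            if self.r[slot] == offset:
                # Repeated match: swap the hit slot with R0 (no-op for R0).
                self.r[0], self.r[slot] = self.r[slot], self.r[0]
                return slot
        # New offset: shift R0 -> R1 -> R2 and store the offset in R0.
        self.r = [offset, self.r[0], self.r[1]]
        return offset + 2

    def decode(self, formatted: int) -> int:
        """Return the match offset for a formatted offset and update the cache."""
        if formatted < 3:
            self.r[0], self.r[formatted] = self.r[formatted], self.r[0]
            return self.r[0]
        offset = formatted - 2
        self.r = [offset, self.r[0], self.r[1]]
        return offset


offsets = [100, 200, 100, 100, 300, 200, 1]
enc, dec = RepeatedOffsets(), RepeatedOffsets()
formatted = [enc.encode(o) for o in offsets]
decoded = [dec.decode(f) for f in formatted]
print(formatted)  # [102, 202, 1, 0, 302, 2, 3]
```

Encoder and decoder apply the same updates, so both caches stay in sync without any extra bits in the stream.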
## License
See `LICENSE.txt`.