GitHub: Ravenz16/dow-de-replay-parser
# DoW Replay Analyzer
*Read this in other languages: [Русский](README.ru.md)*
A Java tool that parses **Dawn of War** replay files (`.rec`) and reconstructs what each player did — build orders, point captures, unit orders, abilities, and a per-player strategic summary — directly from the binary command stream.
The replay format is undocumented, so the command stream was reverse-engineered byte by byte. This project is the result of that work.
## What it does
Given a `.rec` file, the analyzer prints a structured report:
- **Header** — map, duration (ticks → seconds), and the player/observer list.
- **Command stream summary** — events parsed, packets read, sub-commands decoded.
- **CMD_IDs found** — every command opcode encountered, with counts.
- **Timeline** — chronological list of decoded events (tick, time, command, entity, position).
- **Build / capture / generator events** — filtered views of structures and point captures.
- **Per-player timelines** — events split between the two human players.
- **Build order** — the key economic and strategic actions per player, in order.
- **Strategic analysis** — economy (barracks, generators, LP posts, captures, tech) and army (orders, attacks, abilities) per player.
## How it works
A `.rec` file is a Relic Chunky container. The pipeline is:
1. `BinaryReader` — low-level little-endian reader over the raw bytes.
2. `RecursiveRelicChunkParser` + `RelicChunkNode` + `ChunkSearch` — walk the chunk tree and locate the metadata (map, players) and the **`FOLDINFO`** chunk.
3. `CommandStreamParser` — the command stream begins right after `FOLDINFO`. This class decodes it into a list of `GameEvent`s and attributes each to a player.
4. `test.Main` — entry point; runs the pipeline and prints the report.
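The first pipeline stage can be illustrated with a minimal sketch of a little-endian reader built on `java.nio.ByteBuffer`. The class name `LittleEndianSketch` and its method names are hypothetical and are not the project's actual `BinaryReader` API; the sketch only shows the kind of primitive reads the later stages rely on.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical stand-in for a little-endian reader; NOT the project's BinaryReader. */
final class LittleEndianSketch {
    private final ByteBuffer buf;

    LittleEndianSketch(byte[] data) {
        // All multi-byte reads below are interpreted as little-endian.
        this.buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
    }

    /** Reads an unsigned 32-bit value at an absolute offset, widened to long. */
    long u32(int offset) {
        return Integer.toUnsignedLong(buf.getInt(offset));
    }

    /** Reads a signed 16-bit value at an absolute offset. */
    short i16(int offset) {
        return buf.getShort(offset);
    }

    /** Reads an unsigned byte at an absolute offset. */
    int u8(int offset) {
        return Byte.toUnsignedInt(buf.get(offset));
    }
}
```

Absolute-offset reads are used here because the layout in this README is described in terms of fixed offsets; a cursor-based reader would work equally well.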
### Command stream layout (reverse-engineered)
The stream is a flat sequence of **packets**, each representing one game tick for one player:
| Offset | Field | Notes |
|-------:|-------|-------|
| `-4` | packet size | 4-byte LE prefix before the payload |
| `0` | `0x50` marker | identifies a command packet |
| `1..4` | tick | uint32 LE; `tick / 8.0` = seconds |
| `13` | command count | number of sub-commands in the packet |
| `25..28` | inner size | covers only the **first** sub-command block |
| `30+` | sub-commands | first block here; later blocks recovered by tail-scan |
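As a rough illustration of the table above, the following sketch decodes the fixed header fields of one packet payload (the bytes after the 4-byte size prefix). The class `PacketHeaderSketch` is hypothetical and is not the project's `CommandStreamParser`. It assumes the command count is a single unsigned byte and that the inner size is an unsigned 32-bit little-endian value; the table gives only the offsets, so both widths are guesses.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical header decoder following the offsets in the layout table. */
final class PacketHeaderSketch {
    static final int MARKER = 0x50;

    final long tick;         // uint32 LE at offsets 1..4
    final int commandCount;  // assumed: one unsigned byte at offset 13
    final long innerSize;    // assumed: uint32 LE at offsets 25..28

    private PacketHeaderSketch(long tick, int commandCount, long innerSize) {
        this.tick = tick;
        this.commandCount = commandCount;
        this.innerSize = innerSize;
    }

    /** Parses a payload that starts at the 0x50 marker (size prefix already consumed). */
    static PacketHeaderSketch parse(byte[] payload) {
        if (payload.length < 29) {
            throw new IllegalArgumentException("payload too short for the header fields");
        }
        if ((payload[0] & 0xFF) != MARKER) {
            throw new IllegalArgumentException("not a command packet");
        }
        ByteBuffer b = ByteBuffer.wrap(payload).order(ByteOrder.LITTLE_ENDIAN);
        long tick = Integer.toUnsignedLong(b.getInt(1));
        int count = payload[13] & 0xFF;
        long inner = Integer.toUnsignedLong(b.getInt(25));
        return new PacketHeaderSketch(tick, count, inner);
    }

    /** Converts the tick to seconds using the 8 ticks-per-second rate from the table. */
    double seconds() {
        return tick / 8.0;
    }
}
```

For example, a packet whose tick field holds 80 corresponds to 10.0 seconds of game time.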
Each **sub-command** carries the entity id (`+2..5`), a phase byte at `+27`, and — for the 40-byte form — a position (`x@+34`, `y@+36`, `z@+38`, int16 LE). Sub-command sizes:
- `28 / 33 / 35` — progress / completion markers (no coordinates).
- `40` — a real placement / order with coordinates.
- `45` — a compound command embedding several opcodes.
The opcode (`cmd_id`) is read at `+29` for 40-byte commands and `+23` otherwise; the 40-byte build form with phase `3` encodes the type as `0xC300 | byte[28]`.
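The sub-command rules above can be sketched as a handful of helpers. `SubCommandSketch` is a hypothetical class, not the project's code. It assumes the entity id is an unsigned 32-bit little-endian value and that the opcode is a single unsigned byte; the text gives the offsets but not these widths. Deciding when a 40-byte command counts as a "build form" is left to the caller.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical helpers for one sub-command block; offsets are relative to its start. */
final class SubCommandSketch {

    /** Opcode offset: +29 for the 40-byte form, +23 for every other size. */
    static int opcodeOffset(int size) {
        return size == 40 ? 29 : 23;
    }

    /** Assumed: the opcode is one unsigned byte at the offset chosen above. */
    static int opcode(byte[] sub) {
        return sub[opcodeOffset(sub.length)] & 0xFF;
    }

    /** Assumed: the entity id at +2..5 is an unsigned 32-bit little-endian value. */
    static long entityId(byte[] sub) {
        return Integer.toUnsignedLong(le(sub).getInt(2));
    }

    /** Phase byte at +27. */
    static int phase(byte[] sub) {
        return sub[27] & 0xFF;
    }

    /** Position (x, y, z) as int16 LE at +34, +36, +38; only the 40-byte form has one. */
    static short[] position(byte[] sub) {
        if (sub.length != 40) {
            throw new IllegalArgumentException("only 40-byte sub-commands carry a position");
        }
        ByteBuffer b = le(sub);
        return new short[] { b.getShort(34), b.getShort(36), b.getShort(38) };
    }

    /** Build type for the 40-byte build form with phase 3: 0xC300 | byte[28]. */
    static int buildType(byte[] sub) {
        return 0xC300 | (sub[28] & 0xFF);
    }

    private static ByteBuffer le(byte[] bytes) {
        return ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN);
    }
}
```

A caller would typically check `sub.length == 40 && phase(sub) == 3` before using `buildType`, and fall back to `opcode` otherwise.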
## Project structure
| Class | Role |
|-------|------|
| `com.dow.replay.parser.BinaryReader` | low-level binary reader |
| `com.dow.replay.chunk.RelicChunkNode` | chunk tree node |
| `com.dow.replay.chunk.ChunkSearch` | chunk lookup helpers |
| `com.dow.replay.chunk.RecursiveRelicChunkParser` | Relic Chunky parser |
| `com.dow.replay.parser.CommandStreamParser` | core: command stream → events + attribution |
| `test.Main` | entry point / report printer |
## Build & run
Requires **JDK 17** (developed on Amazon Corretto 17). Source is UTF-8.
In IntelliJ IDEA: open the project, set the project SDK to 17, and run `test.Main` with the replay path as the program argument:
`test.Main scr\replays\.rec`
From the command line:
`javac -encoding UTF-8 -d out $(find src -name "*.java")`
`java -cp out test.Main scr/replays/.rec`
## Known limitations
This is honest reverse-engineering of an undocumented format, and some things are not recoverable from the command stream alone:
- **Only state-changing commands are recorded.** Dawn of War logs builds, moves, attacks, abilities, captures, and reinforcements — but **not** selections, camera moves, control-group clicks, or idle orders (the bulk of raw APM). The command count is therefore far lower than a player's real APM, by design.
- **There is no per-command player field.** Investigated and ruled out: the phase byte tracks build *side of the map*, the entity id is a global time-ordered counter (player ranges overlap), and the packet header bytes are sync hashes. Player attribution is therefore a **heuristic**: race-exclusive command families + home-base position + nearest-neighbour propagation. It is reliable for abilities, race-specific orders, and base structures, but **shared commands (capture / move / attack) can be misattributed** in windows where both players act in alternation.
- **`cmd_id` meanings are matchup-specific.** Opcodes are race/build-menu blueprints rather than universal action codes — the same opcode can mean different things in different match-ups. Labels are generic where the command is shared and `TAU_*` / `ORK_*` only where confidently race-exclusive.
- **Opening base buildings can be invisible.** Some early structures are issued via 28-byte commands that merge into completion noise and are not individually resolved.
- **AI / bot commands are not recorded** in the replay command stream.
- **Observers** appear as extra player slots (a 1v1 can carry up to 6 spectators) but issue no commands.
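To make the home-base part of the attribution heuristic concrete, here is a deliberately simplified sketch. It is not the analyzer's actual logic: the class `HomeBaseAttributionSketch`, the two-player indexing, and the `margin` parameter are all assumptions for illustration. An event is assigned to the player whose home base is closer and is left unattributed when the two distances are too similar to call.

```java
/** Hypothetical illustration of position-based attribution; NOT the project's heuristic. */
final class HomeBaseAttributionSketch {
    static final int UNKNOWN = -1;

    /** Euclidean distance between two (x, y, z) points. */
    static double distance(int[] a, int[] b) {
        double dx = a[0] - b[0];
        double dy = a[1] - b[1];
        double dz = a[2] - b[2];
        return Math.sqrt(dx * dx + dy * dy + dz * dz);
    }

    /**
     * Returns 0 or 1 for the player whose home base is closer to the event,
     * or UNKNOWN when the distances differ by no more than `margin`
     * (an assumed tuning parameter, in map units).
     */
    static int attribute(int[] event, int[] base0, int[] base1, double margin) {
        double d0 = distance(event, base0);
        double d1 = distance(event, base1);
        if (Math.abs(d0 - d1) <= margin) {
            return UNKNOWN; // too close to call; leave for other heuristics
        }
        return d0 < d1 ? 0 : 1;
    }
}
```

Events left as `UNKNOWN` are exactly the shared mid-map commands that the limitation above describes as prone to misattribution.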
## Status
Actively reverse-engineered across several match-ups (Tau vs Dark Eldar, Ork vs Tau). The command-stream layout and event decoding are stable; player attribution is a best-effort heuristic and continues to improve as more ground-truth replays are analysed.