lordmilko/PESpy
GitHub: lordmilko/PESpy
# PESpy
PESpy is a C#/PowerShell library for reverse engineering, analyzing and visualizing Microsoft compiler generated file formats.
Given a file, PESpy aims to
* Understand the meaning of *every single byte within that file*
* Support parsing *all known entities*, no matter how obscure
* Minimize abstractions, and mirror native type names wherever possible
* Be highly performant while still being ergonomic. Allocations need to be as low as possible!
* Support *all known symbol formats*; COFF, OMF, CodeView, SYM, DBG, PDB files, DNDRB, NB00-NB10, RSDS - if symbols exist, PESpy will read and show them to you
* Unironically provide *information at your fingertips*. The whole entire file hierarchy is exposed via properties; simply open a file, and then poke around in the Locals window
* Support reading PE Files out of a remote debug target where the size of the PE File isn't known upfront
* Provide tools for performing various file operations, including
    * Detecting file types
    * Locating symbol files (no more `symsrv.dll`!)
    * Resolving RPC Servers
    * Manipulating Symbol Keys
    * Parsing vftables
    * Undecorating symbol names
    * Reading and decompressing files contained in Windows installation media
* Be highly NativeAOT friendly
PESpy is capable of interfacing with the following file types
| Name | Description |
|--------|---------------|
| PE | Portable Executable files, first seen in Windows NT 3.1 |
| PDB | "Old Style" (JG 1.0), MSF (JG 2.0, DS 7.0) and Portable PDB files |
| OBJ | Principally we are interested in `*.obj` files, but strictly speaking anything that uses COFF (such as `*.exp`, `*.iobj`, etc) can be opened |
| DOS | Simple DOS files with an `IMAGE_DOS_HEADER` and possible trailing CodeView data |
| NE | 16-bit New Executable files, as seen in 16-bit Windows and to a lesser extent in Windows 9x |
| LE | 32-bit Linear Executable files; specifically, the format used by VxD driver files |
| DBG | COFF based files containing debug metadata that has been split out of the main executable file |
| LIB | COFF based Archive libraries used by the linker, that potentially contain object files embedded within them |
| OMF | `*.obj` files emitted by older compiler toolchains from the DOS era that use the Object Module Format, a precursor to COFF |
| OMFLIB | `*.lib` files emitted and consumed by older compiler toolchains from the DOS era that use OMF |
| OMFDBG | An older style `*.dbg` file whose entire contents is the raw OMF style CodeView section |
| SYM | `*.sym` files generated by `mapsym.exe` or by the compiler from parsing a `*.map` file |
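
Several of the formats in the table above share the same `MZ` stub and differ only in the signature that `e_lfanew` (at offset `0x3C`) points to. The following is a minimal, standalone sketch of that distinction; `FormatSketch` is a hypothetical helper written for illustration and is not PESpy's file type detection, which handles far more cases.

```csharp
using System;
using System.Buffers.Binary;
using System.Text;

// Build a tiny fake image in memory: an "MZ" header whose e_lfanew points at "PE\0\0"
var image = new byte[0x84];
image[0] = (byte)'M';
image[1] = (byte)'Z';
BinaryPrimitives.WriteInt32LittleEndian(image.AsSpan(0x3C), 0x80);
Encoding.ASCII.GetBytes("PE\0\0").CopyTo(image, 0x80);

Console.WriteLine(FormatSketch.Detect(image));

// Hypothetical helper for illustration only; not part of PESpy
static class FormatSketch
{
    public static string Detect(ReadOnlySpan<byte> data)
    {
        // Every format handled here starts with an IMAGE_DOS_HEADER ("MZ")
        if (data.Length < 0x40 || data[0] != 'M' || data[1] != 'Z')
            return "Unknown";

        // e_lfanew is the offset of the "new" executable header, if there is one
        int lfanew = BinaryPrimitives.ReadInt32LittleEndian(data.Slice(0x3C, 4));

        if (lfanew <= 0 || lfanew > data.Length - 4)
            return "DOS";

        var sig = data.Slice(lfanew, 4);

        if (sig[0] == 'P' && sig[1] == 'E' && sig[2] == 0 && sig[3] == 0)
            return "PE";

        if (sig[0] == 'N' && sig[1] == 'E')
            return "NE";

        if (sig[0] == 'L' && sig[1] == 'E')
            return "LE";

        return "DOS";
    }
}
```

Taking a `ReadOnlySpan<byte>` keeps the sketch allocation free, in the same spirit as the goals listed above.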
## Installation
```powershell
Install-Package PESpy
```
PESpy is available on both [nuget.org](https://www.nuget.org/packages/PESpy/) and [PowerShell Gallery](https://www.powershellgallery.com/packages/PESpy/). PESpy provides targets for both .NET 9.0 and .NET Standard, and is SourceLink compatible. In order to install PESpy from the PowerShell Gallery you must be running PowerShell 5.1+. PESpy is compatible with both Windows PowerShell and PowerShell Core.
## Getting Started
PESpy's major selling point is that, wherever it can, it tries to show you the true shape of the data that resides within a file. The following snippets show the various entry points to PESpy's key functionality. For extremely thorough documentation on all that PESpy has to offer, please see the [wiki](https://github.com/lordmilko/PESpy/wiki).
### Enumerate All Imports
```csharp
/* Retrieving imports in native code involves traversing the IMAGE_IMPORT_DESCRIPTOR entities, resolving various RVAs,
 * traversing the list of IMAGE_THUNK_DATA entities that follows, checking various bit fields, and resolving
 * even more RVAs, before finally retrieving the strings you're after. That is what the data looks like. PESpy provides
 * many mechanisms to simplify complex lookups, but it will never hide the underlying shape of the data to "make it easy" */
using var peFile = PEFile.FromFile("C:\\Windows\\system32\\kernel32.dll");

ImageImportDescriptor[]? importTable = peFile.ImportTable;

if (importTable != null)
{
    foreach (var imageImportDescriptor in importTable)
    {
        /* Any field that is an RVA to another entity is modelled as a field of type RVA. This type
         * provides access to the original RVA that was listed in the field, whether the RVA could actually
         * be resolved to a valid address, and the actual value that was read from that address */
        RVA dllName = imageImportDescriptor.Name;

        if (!dllName.IsValid)
            continue;

        RVA originalFirstThunk = imageImportDescriptor.OriginalFirstThunk;

        if (!originalFirstThunk.IsValid)
            continue;

        /* A custom collection type prevents us from having to allocate a large array to access all
         * of the thunks in the section. Note that the trailing "null" IMAGE_THUNK_DATA is also included
         * as the last item in this list */
        foreach (ImageThunkData entry in originalFirstThunk.Value)
        {
            //IMAGE_THUNK_DATA is defined as a union of four possible fields. PESpy tries to figure out
            //which logical type the thunk represents, and stores this in an added Kind field
            if (entry.Value == 0)
                continue; //This is the trailing "null" entry which marks the end of this import's thunks

            if (entry.Kind == ImageThunkData.DataKind.Name)
            {
                RVA thunkName = entry.Name;

                if (!thunkName.IsValid)
                    Console.WriteLine($"{dllName}: Invalid Name (0x{thunkName.ListedOffset})");
                else
                    Console.WriteLine($"{dllName}: {thunkName}");
            }
        }
    }
}
```
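
To make the "checking various bit fields" step concrete, here is a minimal standalone sketch of how a raw `IMAGE_THUNK_DATA` value is conventionally interpreted: the top bit (`IMAGE_ORDINAL_FLAG32` / `IMAGE_ORDINAL_FLAG64`) selects import by ordinal, otherwise the value is an RVA to an `IMAGE_IMPORT_BY_NAME`. `ThunkSketch` and the sample values are hypothetical and are not how PESpy computes its `Kind` field.

```csharp
using System;

// Classify a few raw 64-bit thunk values (made-up sample values)
foreach (ulong raw in new ulong[] { 0x0000000000012345, 0x8000000000000010, 0 })
    Console.WriteLine(ThunkSketch.Describe(raw, is64Bit: true));

// Hypothetical helper for illustration only; not part of PESpy
static class ThunkSketch
{
    const ulong OrdinalFlag64 = 0x8000000000000000; // IMAGE_ORDINAL_FLAG64
    const ulong OrdinalFlag32 = 0x80000000;         // IMAGE_ORDINAL_FLAG32

    public static string Describe(ulong raw, bool is64Bit)
    {
        // An all-zero thunk terminates the list
        if (raw == 0)
            return "Terminator";

        ulong flag = is64Bit ? OrdinalFlag64 : OrdinalFlag32;

        // Top bit set: import by ordinal, stored in the low 16 bits
        if ((raw & flag) != 0)
            return $"Ordinal {raw & 0xFFFF}";

        // Otherwise the low 31 bits are an RVA to an IMAGE_IMPORT_BY_NAME
        return $"Name RVA 0x{raw & 0x7FFFFFFF:X}";
    }
}
```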
### Locate Symbol Files
PESpy's `Locator` class provides a managed implementation of the `LOCATOR` class found in mspdbcore, which also powers DIA.
* `Locator` can locate all kinds of symbols; PDBs (be they regular, Portable, Embedded or NGEN), `*.dbg` files (that may in turn point to `*.pdb` files) and even legacy `*.sym` files
* It knows how to read your symbol path; if `_NT_SYMBOL_PATH` isn't set, it'll automatically use a symbol path that includes `msdl.microsoft.com`
* It can download symbols from remote HTTP servers and cascade them down your symbol path
* Provides various entry points for all kinds of different scenarios, with both synchronous and asynchronous modes available
* Allows specifying a callback to receive progress notifications
* Jumps through various hoops to be as low allocation as possible
* Fully portable, with zero reliance on `symsrv.dll`
```csharp
var pdbPath = Locator.LocatePDB("C:\\Windows\\system32\\ntdll.dll");
```
`Locator` is such a small part of PESpy's surface area, but I'm amazed how often I use this; this has surprisingly become one of PESpy's best features for me!
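
For readers unfamiliar with the symbol path syntax mentioned above, the sketch below splits an `_NT_SYMBOL_PATH` style string into its elements: entries are separated by `;`, and a `srv*` entry lists one or more stores separated by `*`, cascading from local cache to remote server. `SymbolPathSketch`, the `C:\symbols` cache directory and the fallback path are assumptions made for illustration; this is not `Locator`'s actual parser, and other element types (such as `cache*` and `symsrv*`) are deliberately ignored.

```csharp
using System;
using System.Collections.Generic;

// Assumed fallback for illustration; PESpy's real default path may differ
string path = Environment.GetEnvironmentVariable("_NT_SYMBOL_PATH")
    ?? @"srv*C:\symbols*https://msdl.microsoft.com/download/symbols";

foreach (var (kind, stores) in SymbolPathSketch.Parse(path))
    Console.WriteLine($"{kind}: {string.Join(" -> ", stores)}");

// Hypothetical helper for illustration only; not part of PESpy
static class SymbolPathSketch
{
    public static List<(string Kind, string[] Stores)> Parse(string symbolPath)
    {
        var result = new List<(string Kind, string[] Stores)>();

        foreach (var element in symbolPath.Split(';', StringSplitOptions.RemoveEmptyEntries))
        {
            var trimmed = element.Trim();
            var parts = trimmed.Split('*');

            // "srv*<store1>*<store2>...": symbols found in a later store cascade down to earlier ones
            if (parts.Length > 1 && parts[0].Equals("srv", StringComparison.OrdinalIgnoreCase))
                result.Add(("srv", parts[1..]));
            else
                result.Add(("dir", new[] { trimmed })); // anything else is treated as a plain directory
        }

        return result;
    }
}
```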
### Enumerate All Symbols
```csharp
/* PEFile provides various members (SymStoreKeys, GetSymStoreKey()) that provide identifiers for files
 * that you can look up on a symbol server. If you're writing unit tests for a diagnostic application that analyzes
 * a certain DLL, you can potentially "bookmark" that DLL by hardcoding its SymStoreKey, and then have your test re-download
 * that file as needed so your test always produces the same result! */
var key = new SymStoreKey("coreclr.pdb/75099299D3D948A68B594FC4439DFA521/coreclr.pdb");

var pdbPath = Locator.LocatePDB(key);

/* The PDBFile class provides access to every single piece of functionality you might see in an MSF based PDB File.
 * Every hash, every lookup, every struct since the introduction of MSF in Visual C++ 2.0 (1994) */
using var pdbFile = PDBFile.FromFile(pdbPath);

/* The native representation of a symbol is a SYMTYPE*. SymType is a zero cost abstraction over a pointer, but unlike
 * a native SYMTYPE*, SymType uses insane debugger magic to show you all of the symbol's fields in the Locals window
 * without you having to write any code */
foreach (SymType symType in pdbFile.EnumerateSymbols())
{
    /* A SymType can be cast to a more specific symbol type (e.g. ProcSym32) based on the `SYM_ENUM_e` of its `rectyp`,
     * or you can use extension methods that replicate the behavior of the various getters seen on `IDiaSymbol` */
    if (symType.TryGetFramePointerPresent(out var framePointerPresent))
    {
        if (symType.rectyp == SYM_ENUM_E.S_GPROC32)
        {
            //S_GPROC32 records are procedure symbols, so cast to ProcSym32
            var procSym32 = (ProcSym32) symType;

            /* Modern PDBs contain UTF-8 null terminated strings. But older PDBs use length prefixed "ST" strings.
             * PESpy can use magic to figure out what the expected string format is, or you can just provide the PDBFile.
             * ProcSym32's "name" property provides easy access to the symbol's name, but for high performance access
             * you'll want to use the GetName method */
            SymString name = procSym32.GetName(pdbFile);
        }
    }
}
```
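
The key string used above follows the usual symbol server layout for PDBs: the file name, then the PDB's GUID signature as 32 hex digits immediately followed by its age in hex, then the file name again. As a minimal sketch of that layout, the hypothetical `SymKeySketch` helper below rebuilds the same string from its parts; it is not PESpy's `SymStoreKey` implementation, and the GUID/age split of the example key is inferred from that layout.

```csharp
using System;

var guid = Guid.Parse("75099299-D3D9-48A6-8B59-4FC4439DFA52");

Console.WriteLine(SymKeySketch.ForPdb("coreclr.pdb", guid, age: 1));

// Hypothetical helper for illustration only; not part of PESpy
static class SymKeySketch
{
    public static string ForPdb(string fileName, Guid signature, uint age)
    {
        // "N" formats the GUID as 32 hex digits with no dashes or braces
        string sig = signature.ToString("N").ToUpperInvariant();

        // The age is appended directly after the signature, in hex, without padding
        string ageHex = age.ToString("X");

        return $"{fileName}/{sig}{ageHex}/{fileName}";
    }
}
```

Appending the resulting key to a symbol server's base URL gives the location the file would be requested from.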
### Visualize A File
```csharp
/* In two lines of code, you can visualize the entire contents of a file: view all sections, the regions
 * within those sections, how code and data intertwine, and the xrefs between everything. Explore
 * the entire structure of a file right from within your debugger. Query offsets, RVAs and VAs to find
 * exactly what is located at that address. Strings are automatically detected, and an interface is provided
 * to facilitate tagging disassembled code */
using var peFile = PEFile.FromFile("C:\\Windows\\system32\\kernel32.dll");

/* Unless you say otherwise, GetView will automatically attempt to download symbols,
 * so the first time you call this you may need to wait while symbols are downloaded.
 * Specify a progress callback to receive notice of what is going on. See the wiki for
 * more information on interfacing with views */
var view = peFile.GetView();
```
For much more information on the usage of PESpy, please see the [wiki](https://github.com/lordmilko/PESpy/wiki).