lordmilko/PESpy


# PESpy [![Build status](https://ci.appveyor.com/api/projects/status/n3o1pfy22ki6saoj?svg=true)](https://ci.appveyor.com/project/lordmilko/pespy) [![NuGet](https://img.shields.io/nuget/v/PESpy.svg)](https://www.nuget.org/packages/PESpy/) [![Donate](http://img.shields.io/liberapay/patrons/lordmilko.svg?logo=liberapay)](https://liberapay.com/lordmilko/donate)

PESpy is a C#/PowerShell library for reverse engineering, analyzing and visualizing Microsoft compiler-generated file formats. Given a file, PESpy aims to

* Understand the meaning of *every single byte within that file*
* Support parsing *all known entities*, no matter how obscure
* Minimize abstractions, and mirror native type names wherever possible
* Be highly performant while still being ergonomic. Allocations need to be as low as possible!
* Support *all known symbol formats*: COFF, OMF, CodeView, SYM, DBG, PDB files, DNDRB, NB00-NB10, RSDS. If symbols exist, PESpy will read them and show them to you
* Unironically provide *information at your fingertips*. The entire file hierarchy is exposed via properties; simply open a file and then poke around in the Locals window
* Support reading PE files out of a remote debug target where the size of the PE file isn't known upfront
* Provide tools for performing various file operations, including
  * Detecting file types
  * Locating symbol files (no more `symsrv.dll`!)
  * Resolving RPC servers
  * Manipulating symbol keys
  * Parsing vftables
  * Undecorating symbol names
  * Reading and decompressing files contained in Windows installation media
* Be highly NativeAOT friendly

PESpy is capable of interfacing with the following file types:

| Name   | Description |
|--------|-------------|
| PE     | Portable Executable files, first seen in Windows NT 3.1 |
| PDB    | "Old Style" (JG 1.0), MSF (JG 2.0, DS 7.0) and Portable PDB files |
| OBJ    | Principally we are interested in `*.obj` files, but strictly speaking anything that uses COFF (such as `*.exp`, `*.iobj`, etc.) can be opened |
| DOS    | Simple DOS files with an `IMAGE_DOS_HEADER` and possible trailing CodeView data |
| NE     | 16-bit New Executable files, as seen in 16-bit Windows and, to a lesser extent, in Windows 9x |
| LE     | 32-bit Linear Executable files; specifically, the format used by VxD driver files |
| DBG    | COFF-based files containing debug metadata that has been split out of the main executable file |
| LIB    | COFF-based archive libraries used by the linker, which may contain embedded object files |
| OMF    | `*.obj` files emitted by older compiler toolchains from the DOS era that use the Object Module Format, a precursor to COFF |
| OMFLIB | `*.lib` files emitted and consumed by older compiler toolchains from the DOS era that use OMF |
| OMFDBG | An older style `*.dbg` file whose entire contents are the raw OMF style CodeView section |
| SYM    | `*.sym` files generated by `mapsym.exe` or by the compiler from parsing a `*.map` file |

## Installation

```powershell
Install-Package PESpy
```

PESpy is available on both [nuget.org](https://www.nuget.org/packages/PESpy/) and [PowerShell Gallery](https://www.powershellgallery.com/packages/PESpy/). PESpy provides targets for both .NET 9.0 and .NET Standard, and is SourceLink compatible.

In order to install PESpy from the PowerShell Gallery you must be running PowerShell 5.1+. PESpy is compatible with both Windows PowerShell and PowerShell Core.
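Several of the formats in the file type table above can be told apart by their leading bytes alone, which is the basic idea behind "detecting file types". As a rough illustration only (this is not PESpy's detection logic, and `GuessedFileType` and `FileTypeSniffer` are hypothetical names), a minimal sketch might check a handful of well-known signatures: `MZ` followed by a `PE\0\0`, `NE` or `LE` signature at `e_lfanew`, the MSF 2.00/7.00 PDB banners, the `BSJB` metadata signature that starts a Portable PDB, the `!<arch>\n` archive header of a LIB and the `DI` signature of a separate `*.dbg` file. COFF objects, OMF files and `*.sym` files have no comparably simple magic and are reported as `Unknown` here.

```csharp
using System;
using System.Buffers.Binary;
using System.Text;

// Hypothetical result type for this sketch; it is not part of PESpy's API.
enum GuessedFileType { Unknown, Dos, PE, NE, LE, MsfPdb, JgPdb, PortablePdb, Lib, Dbg }

// Hypothetical helper that guesses a file's type from its leading bytes only.
static class FileTypeSniffer
{
    // True if the ASCII bytes of 'ascii' appear at 'offset' (bounds checked, never throws).
    static bool HasAsciiAt(ReadOnlySpan<byte> data, int offset, string ascii)
    {
        byte[] expected = Encoding.ASCII.GetBytes(ascii);
        return offset >= 0 && data.Length - offset >= expected.Length &&
               data.Slice(offset, expected.Length).SequenceEqual(expected);
    }

    public static GuessedFileType Guess(ReadOnlySpan<byte> data)
    {
        // MSF 7.00 ("DS") and MSF 2.00 ("JG") PDB banners; \u001A is the 0x1A byte in the banner
        if (HasAsciiAt(data, 0, "Microsoft C/C++ MSF 7.00\r\n\u001ADS"))
            return GuessedFileType.MsfPdb;
        if (HasAsciiAt(data, 0, "Microsoft C/C++ program database 2.00\r\n\u001AJG"))
            return GuessedFileType.JgPdb;

        // Portable PDBs begin with the ECMA-335 metadata root signature
        if (HasAsciiAt(data, 0, "BSJB"))
            return GuessedFileType.PortablePdb;

        // COFF archive (LIB) header
        if (HasAsciiAt(data, 0, "!<arch>\n"))
            return GuessedFileType.Lib;

        if (HasAsciiAt(data, 0, "MZ"))
            return GuessFromDosHeader(data);

        // IMAGE_SEPARATE_DEBUG_SIGNATURE (0x4944) stored little endian reads as "DI"
        if (HasAsciiAt(data, 0, "DI"))
            return GuessedFileType.Dbg;

        return GuessedFileType.Unknown;
    }

    static GuessedFileType GuessFromDosHeader(ReadOnlySpan<byte> data)
    {
        // IMAGE_DOS_HEADER is 64 bytes; e_lfanew lives at offset 0x3C
        if (data.Length < 0x40)
            return GuessedFileType.Dos;

        int lfanew = BinaryPrimitives.ReadInt32LittleEndian(data.Slice(0x3C, 4));

        if (HasAsciiAt(data, lfanew, "PE\0\0")) return GuessedFileType.PE;
        if (HasAsciiAt(data, lfanew, "NE")) return GuessedFileType.NE;
        if (HasAsciiAt(data, lfanew, "LE")) return GuessedFileType.LE;

        // No recognisable new-style header: treat it as a plain DOS executable
        return GuessedFileType.Dos;
    }
}
```

Checks that compare only two bytes (`MZ`, `DI`) are prone to false positives, which is one reason a real detector has to validate far more of the file than this sketch does.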
## Getting Started

PESpy's major selling point is that, wherever it can, it tries to show you the true shape of the data that resides within a file. The following snippets show the various entry points to PESpy's key functionality. For extremely thorough documentation on all that PESpy has to offer, please see the [wiki](https://github.com/lordmilko/PESpy/wiki).

### Enumerate All Imports

```csharp
/* Retrieving imports in native code involves traversing the IMAGE_IMPORT_DESCRIPTOR entities, resolving various RVAs,
 * traversing a list of IMAGE_THUNK_DATA entities, checking various bit fields, resolving even more RVAs,
 * and finally retrieving the strings you're after. That is what the data looks like. PESpy provides
 * many mechanisms to simplify complex lookups, but it will never hide the underlying shape of the data to "make it easy" */
using var peFile = PEFile.FromFile("C:\\Windows\\system32\\kernel32.dll");

ImageImportDescriptor[]? importTable = peFile.ImportTable;

if (importTable != null)
{
    foreach (var imageImportDescriptor in importTable)
    {
        /* Any field that is an RVA to another entity is modelled as a field of type RVA. This type
         * provides access to the original RVA that was listed in the field, whether the RVA could actually
         * be resolved to a valid address, and the actual value that was read from that address */
        RVA dllName = imageImportDescriptor.Name;

        if (!dllName.IsValid)
            continue;

        RVA originalFirstThunk = imageImportDescriptor.OriginalFirstThunk;

        if (!originalFirstThunk.IsValid)
            continue;

        /* A custom collection type prevents us from having to allocate a large array to access all
         * of the thunks in the section. Note that the trailing "null" IMAGE_THUNK_DATA is also included
         * as the last item in this list */
        foreach (ImageThunkData entry in originalFirstThunk.Value)
        {
            //IMAGE_THUNK_DATA is defined as a union of four possible fields. PESpy tries to figure out
            //which logical type the thunk represents, and stores this in an added Kind field
            if (entry.Value == 0)
                continue; //This is the trailing "null" entry which marks the end of this import's thunks

            if (entry.Kind == ImageThunkData.DataKind.Name)
            {
                RVA thunkName = entry.Name;

                //":X" formats the listed offset as hex so that it matches the "0x" prefix
                if (!thunkName.IsValid)
                    Console.WriteLine($"{dllName}: Invalid Name (0x{thunkName.ListedOffset:X})");
                else
                    Console.WriteLine($"{dllName}: {thunkName}");
            }
        }
    }
}
```

### Locate Symbol Files

PESpy's `Locator` class provides a managed implementation of the `LOCATOR` class found in mspdbcore, which also powers DIA:

* `Locator` can locate all kinds of symbols: PDBs (be they regular, Portable, Embedded or NGEN), `*.dbg` files (which may in turn point to `*.pdb` files) and even legacy `*.sym` files
* It knows how to read your symbol path; if `_NT_SYMBOL_PATH` isn't set, it will automatically use a symbol path that includes `msdl.microsoft.com`
* It can download symbols from remote HTTP servers and cascade them down your symbol path
* It provides various entry points for all kinds of different scenarios, with both synchronous and asynchronous modes available
* It allows specifying a callback to receive progress notifications
* It jumps through various hoops to be as low allocation as possible
* It is fully portable, with zero reliance on `symsrv.dll`

```csharp
var pdbPath = Locator.LocatePDB("C:\\Windows\\system32\\ntdll.dll");
```

`Locator` is such a small part of PESpy's surface area, but I'm amazed how often I use it; it has surprisingly become one of PESpy's best features for me!

### Enumerate All Symbols

```csharp
/* PEFile provides various members (SymStoreKeys, GetSymStoreKey()) that provide identifiers for files
 * that you can look up on a symbol server. If you're writing unit tests for a diagnostic application that analyzes
 * a certain DLL, you can potentially "bookmark" that DLL by hardcoding its SymStoreKey, and then have your test re-download
 * that file as needed so your test always produces the same result! */
var key = new SymStoreKey("coreclr.pdb/75099299D3D948A68B594FC4439DFA521/coreclr.pdb");
var pdbPath = Locator.LocatePDB(key);

/* The PDBFile class provides access to every single piece of functionality you might see in an MSF based PDB file.
 * Every hash, every lookup, every struct since the introduction of MSF in Visual C++ 2.0 (1994) */
using var pdbFile = PDBFile.FromFile(pdbPath);

/* The native representation of a symbol is a SYMTYPE*. SymType is a zero cost abstraction over a pointer, but unlike
 * a native SYMTYPE*, SymType uses insane debugger magic to show you all of the symbol's fields in the Locals window
 * without you having to write any code */
foreach (SymType symType in pdbFile.EnumerateSymbols())
{
    /* A SymType can be cast to a more specific symbol type (e.g. ProcSym32) based on the `SYM_ENUM_e` of its `rectyp`,
     * or you can use extension methods that replicate the behavior of the various getters seen on `IDiaSymbol` */
    if (symType.TryGetFramePointerPresent(out var framePointerPresent))
    {
        //S_GPROC32 records are PROCSYM32 structures, so cast to ProcSym32
        if (symType.rectyp == SYM_ENUM_E.S_GPROC32)
        {
            var procSym32 = (ProcSym32) symType;

            /* Modern PDBs contain UTF-8 null terminated strings, but older PDBs use length prefixed "ST" strings.
             * PESpy can use magic to figure out what the expected string format is, or you can just provide the PDBFile.
             * ProcSym32's "name" property provides easy access to the symbol's name, but for high performance access
             * you'll want to use the GetName method */
            SymString name = procSym32.GetName(pdbFile);
        }
    }
}
```

### Visualize A File

```csharp
/* In two lines of code, you can visualize the entire contents of a file: view all sections, the regions
 * within those sections, how code and data intertwine, and the xrefs between everything. Explore
 * the entire structure of a file right from within your debugger. Query offsets, RVAs and VAs to find
 * exactly what is located at that address. Strings are automatically detected, and an interface is provided
 * to facilitate tagging disassembled code */
using var peFile = PEFile.FromFile("C:\\Windows\\system32\\kernel32.dll");

/* Unless you say otherwise, GetView will automatically attempt to download symbols,
 * so the first time you call this you may need to wait while symbols are downloaded.
 * Specify a progress callback to receive notice of what is going on. See the wiki for
 * more information on interfacing with views */
var view = peFile.GetView();
```

For much more information on the usage of PESpy, please see the [wiki](https://github.com/lordmilko/PESpy/wiki).
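Several of the snippets above revolve around RVAs: the `RVA` fields in the imports example and the offset/RVA/VA queries supported by views. In a PE file on disk, resolving an RVA generally means finding the section that contains it and translating it into a file offset. As a rough standalone illustration of that mapping (not PESpy's implementation; `SectionInfo` and `RvaTranslator` are hypothetical names), a minimal sketch might walk a simplified section table. It assumes well-formed section headers and deliberately ignores the header region, file alignment rounding of `PointerToRawData`, and the other loader quirks a complete parser has to handle.

```csharp
using System;

// Hypothetical, simplified stand-in for the IMAGE_SECTION_HEADER fields needed for RVA translation.
readonly record struct SectionInfo(
    string Name, uint VirtualAddress, uint VirtualSize, uint PointerToRawData, uint SizeOfRawData);

// Hypothetical helper; not PESpy's RVA implementation.
static class RvaTranslator
{
    // Returns the file offset that backs the given RVA, or null when no section's raw data covers it.
    public static uint? RvaToFileOffset(ReadOnlySpan<SectionInfo> sections, uint rva)
    {
        foreach (SectionInfo section in sections)
        {
            if (rva < section.VirtualAddress)
                continue;

            // Offset of the RVA within this section (cannot underflow thanks to the check above)
            uint delta = rva - section.VirtualAddress;

            // In memory a section spans the larger of its virtual and raw sizes (VirtualSize may be 0 in some files)
            if (delta >= Math.Max(section.VirtualSize, section.SizeOfRawData))
                continue;

            // Bytes past SizeOfRawData are zero-filled by the loader and have no backing bytes in the file
            if (delta >= section.SizeOfRawData)
                return null;

            return section.PointerToRawData + delta;
        }

        // Not inside any section (e.g. an RVA within the headers, or an invalid RVA); not handled in this sketch
        return null;
    }
}
```

For example, with a `.text` section at RVA 0x1000 whose raw data starts at file offset 0x400, RVA 0x1010 maps to file offset 0x410.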