Vith0r/StackSentry
GitHub: Vith0r/StackSentry
Stars: 25 | Forks: 2

## The Idea

StackSentry is primarily focused on fast triage of code running in memory. The reasoning behind it is practical: a C2, RAT, or fileless loader can hide the file on disk, encrypt itself while sleeping, and build a nice-looking call stack, but at some point it still needs to load or talk through a network DLL. On Windows, that usually means DLLs like `ws2_32.dll`, `wininet.dll`, `winhttp.dll`, `dnsapi.dll`, or APIs exported by them. If those DLLs are loaded or used from a strange origin, it is worth stopping and looking closer.

This pattern did not come out of nowhere. It lines up directly with ideas already used in behavioral detection, such as these Elastic rules:

- [`defense_evasion_library_loaded_via_a_callback_function.toml`](https://github.com/elastic/protections-artifacts/blob/6e9ee22c5a7f57b85b0cb063adba9a3c72eca348/behavior/rules/windows/defense_evasion_library_loaded_via_a_callback_function.toml): identifies a library load through a callback, possibly to hide the real origin of the `LoadLibrary` call from the call stack.
- [`defense_evasion_network_module_loaded_from_suspicious_unbacked_memory.toml`](https://github.com/elastic/protections-artifacts/blob/6e9ee22c5a7f57b85b0cb063adba9a3c72eca348/behavior/rules/windows/defense_evasion_network_module_loaded_from_suspicious_unbacked_memory.toml): identifies a network module load when the thread stack contains frames outside known executable images, a common pattern in in-memory execution.
- [`defense_evasion_library_loaded_from_a_spoofed_call_stack.toml`](https://github.com/elastic/protections-artifacts/blob/6e9ee22c5a7f57b85b0cb063adba9a3c72eca348/behavior/rules/windows/defense_evasion_library_loaded_from_a_spoofed_call_stack.toml): detects a library load from a potentially altered call stack used to conceal the real source of the call.

Because detections like these exist, some loaders now avoid the obvious path. Instead of calling `LoadLibrary` directly from private memory, they try to hide the origin with callbacks, gadgets inside legitimate modules, threadpool chains, cross-thread dispatch, execution from modified images, or even manipulated unwind metadata. StackSentry takes that idea and pushes it further in a lab setting: not only saying that a sensitive DLL was loaded, but also showing the probable origin, the memory involved, the stack state, useful dumps, and, when possible, the path the loader tried to hide.

## What It Looks For

## Detection Gallery

The sample commands and expected call-stack summaries are documented in [samples/README.md](samples/README.md).
Quick warning: the samples are rough lab builds I made for local testing, so do not expect polished showcase binaries.
Below are a few patterns StackSentry can show in the terminal without needing a kernel driver.

Some images still show output captured around the `v0.8` phase. Since then I improved console rendering, stack compaction, and noise reduction, but I did not think it was honest to call that `v0.9` just because the output got cleaner. If a formatting detail differs slightly from the current version, that is why.

### SilentMoonwalk With Synthetic Stack

This test uses a modified variant of [klezVirus/SilentMoonwalk](https://github.com/klezVirus/SilentMoonwalk) to load a network DLL with a synthetic stack. All visible frames look like legitimate modules, but callsite validation and origin tracing still tie the DLL load back to the code that prepared the call.

### BYOUD / Unwind Metadata Spoofing

### Threadpool Callback Chain

This sample is based on the idea behind [klezvirus/ThreadPoolExecChain](https://github.com/klezvirus/ThreadPoolExecChain): a threadpool/proxy chain makes the load happen in a context that looks more natural. The report preserves the chain context and marks the modified frames that appear along the path.

### Image `.text` Proxy

Because it is PIC, the code can live inside the loader `.text` and call `LoadLibrary` through a proxy, adding another stack element through an existing gadget in `nvwgf2umx.dll`. The final stack points to a seemingly clean place, but register/origin tracing connects the DLL load back to the `.text` region that started the flow. This proves the detection can hold up even in very specific patterns like this one.

### Code Cave / Modified Image

This pattern is also inspired by a simple experimental variant of my [RefinedPool](https://github.com/Vith0r/RefinedPool/tree/main/RefinedPool) project, converted into PIC shellcode. The sensitive load passes through *bytes written into an image-backed code cave*. The useful part is that StackSentry preserves both the modified module and the changed-byte map in the dump output, which I think is genuinely valuable during triage.

### SilentMoonwalk RDI With Synthetic Stack

Here, the [klezVirus/SilentMoonwalk](https://github.com/klezVirus/SilentMoonwalk) variant is packed as a [Donut](https://github.com/TheWover/donut)/RDI payload. The bootstrap may load DLLs such as `wininet.dll` and `mscoree.dll`, but the relevant stage is the `ws2_32.dll` load with a synthetic stack. The first image shows the DLL-load alerts; the second shows the probable origin leading back to the "hidden" executable region.

### MassDriver-Style Dispatch

Inspired by the dispatch pattern from [Sizeable-Bingus/MassDriver](https://github.com/Sizeable-Bingus/MassDriver), a seemingly clean worker thread executes `LoadLibraryA`. `/dispatch-trace` ties the load back to the requester that posted the dispatch structure. It is a very specific detection, but I thought it was interesting enough to include.

### Network Use Trace In A C2 Payload

This example uses `/network-use-trace` to show the case where loading the network DLL is not the only interesting part. The payload also needs to use network APIs, and StackSentry tries to attribute who called `connect`, `WSAConnect`, `send`, `recv`, WinHTTP/WinINet, and related APIs. This can be useful because the output may reveal the real destination, such as domain, IP/port, and even third-party services used in the flow, like `pastebin.com` in this test.

The stack in the image is compacted on purpose so it does not become a giant block of repeated `system.ni.dll` frames.
If you prefer the full stack in one line, like in the older screenshots, use `/inline-stack`; if you want frame-by-frame output with offsets, use `/full-stack`.

### Stack View Modes

Beyond the detections themselves, the current StackSentry console tries to make analysis less exhausting. The same stack can be shown in different ways depending on what you want to inspect:

#### Compact Stack

#### Full Stack With Offsets

With `/full-stack`, each frame is printed on its own line with the module offset. This is useful when you want to audit exactly where every return frame landed.

#### Inline Stack Without Compression

With `/inline-stack`, the stack goes back to a single-line format without repeated-frame compression.

#### Clean Events With Verbose

#### BackedModified And Memory Audit

When a return frame lands inside a real DLL, but the bytes in that region no longer match the file on disk, StackSentry does not treat it as a clean frame. It marks the stack as `BackedModified`/`captured-modified`, and Memory Audit records the module, region, and temporal detail of the change.

## Build

To keep the build simple, I included `build.ps1`. It uses Microsoft Visual Studio/MSVC to compile. If your environment is not exactly the same, it should not be hard to adapt: the script is short, and reading it makes the required compiler calls pretty clear.

    .\build.ps1

When using the script, build output is written to `build\`:

- `StackSentry64.exe`
- `CallstackMonitor.dll`

## Test Commands

Recommended commands and exact sample commands live in [samples/README.md](samples/README.md).
If you are unsure where to start, begin there; it includes the first-pass command, the stronger profile, stack output modes, and the examples I use to validate the gallery screenshots.

## Master Profiles

- `/quick`: stable DLL-load triage profile. Good for first runs, benign baselines, and lower-noise output.
- `/deep`: DLL-load hunting profile. Enables stable callback/thread-start hooks, unwind table hooks, LDR integrity checks, dumps, carving, and a longer correlation window. Memory/API telemetry stays off unless `/mem` is passed.
- `/max`: strongest practical DLL-load profile. Enables deep telemetry, stack audit, LDR checks, and `/auto-enter` by default, but leaves memory/API, wait, and threadpool hooks disabled unless explicitly requested.
- `/profile`: selects a profile by name.

If you just want to test a loader and do not want to think too much about flags, I would start with `/max`. After that, add `/hunt`, `/network-use-trace`, or specific flags depending on the result.
Explicit hook flags are additive, so `/quick /mem`, `/deep /mem`, and `/max /mem` are valid.
Use `/max /tp /wait` only when you want the most aggressive experimental hook set.
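
If you drive StackSentry from a script, the "profile first, explicit flags on top" rule is easy to encode. The following is a minimal sketch in Python, not part of the project: `build_command` and `describe_exit_code` are hypothetical helper names, the flags are the ones listed in this README, and the exit-code meanings are the ones listed under Outputs.

```python
from typing import List, Sequence

# Master profiles named in this README; anything else is rejected here.
PROFILES = ("/quick", "/deep", "/max")


def build_command(exe: str, target: str, profile: str = "/max",
                  extra_flags: Sequence[str] = (),
                  timeout_ms: int = 10000) -> List[str]:
    """Return an argv list: one profile, then explicit flags added on top."""
    if profile not in PROFILES:
        raise ValueError(f"unknown profile: {profile}")
    # Drop repeated flags while keeping the order the caller asked for.
    flags = list(dict.fromkeys(extra_flags))
    return [exe, "/run", target, profile, *flags, "/timeout", str(timeout_ms)]


def describe_exit_code(code: int) -> str:
    """Map the exit codes documented in this README to a short label."""
    if code == 0:
        return "no alerts"
    if code == 10:
        return "alert"
    if code in (1, 2):
        return "error"
    return "undocumented"
```

The list returned by `build_command` can be passed straight to `subprocess.run`, and `describe_exit_code(result.returncode)` then gives a wrapper script a simple way to tell an alert from a failed run.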
## Command Groups
Use `/features` if you want to see all available arguments.
The full `/features` output is grouped by intent so new users do not have to treat every flag as equally important:

- `Common options`: output directory, timeout, keep-alive, stdin automation, and verbosity.
- `Output style`: quiet/plain/live/color controls, target output suppression, `/inline-stack`, and `/full-stack`.
- `Origin / proxy analysis`: `/regtrace`, `/dispatch-trace`, and `/threadpool-chain-trace` for hidden-caller and proxy-loading cases.
- `Network use analysis`: `/network-use-trace` and `/net-use-trace` for already-loaded network DLL reuse.
- `Remote / multi-process`: `/follow-remote` and `/net-reset` for loaders that move execution into another process.
- `Extra telemetry / integrity`: `/etw`, `/ldr-integrity`, `/unwind`, `/stack-audit`, `/memory-audit`, `/byoud-trace`, and `/shadow-stack`.
- `Aggressive / low-level hooks`: `/mem`, `/tp`, `/wait`, and direct `/xhooks`.
- `Advanced config`: custom rules/configuration.

## Origin Tracing

Proxy DLL-load techniques can make the final call stack look clean, including by placing an existing image gadget between the real loader code and `LoadLibrary`. StackSentry keeps the default profiles focused and low-noise, but adds opt-in modes for those cases:

    .\build\StackSentry64.exe /run .\samples\sample_03_text_section_proxy.exe /max /origin-trace /no-target-output /timeout 9000
    .\build\StackSentry64.exe /run .\samples\sample_03_text_section_proxy.exe /max /regtrace /no-target-output /timeout 9000

Technical note: `/regtrace` intentionally avoids full register tracing over non-main image modules larger than 32 MB. Private executable memory, thread starts, dynamic executable transitions, and origin correlation are still traced; this only avoids expensive full instrumentation of huge gadget-carrier images. I had to do this because tracing those modules was hurting normal analysis more than it helped. If you really need a higher ceiling for a lab case, it is just a small constant in the source.
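
The size gate described in the technical note can be pictured as a one-line predicate. This is only an illustrative sketch in Python, not the project's code: the constant and function names are hypothetical, and it assumes "32 MB" means 32 * 1024 * 1024 bytes.

```python
# Hypothetical name; assumes the README's "32 MB" is 32 * 1024 * 1024 bytes.
REGTRACE_FULL_TRACE_LIMIT = 32 * 1024 * 1024


def wants_full_register_trace(image_size: int, is_main_image: bool) -> bool:
    """Decide whether an image module gets full register tracing.

    The main image is always traced; other image modules are traced
    only when they are not larger than the limit.
    """
    if is_main_image:
        return True
    return image_size <= REGTRACE_FULL_TRACE_LIMIT
```

Under this reading, a large gadget-carrier DLL is skipped for full instrumentation, while private executable memory and thread starts are handled by the other tracing paths the note mentions.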
Threadpool hooks remain explicit. Add `/tp` only when you want `TpAllocWork/TpPostWork` telemetry.

When `/regtrace` correlates a clean gadget load back to executable `MEM_IMAGE` code, StackSentry writes an artifact under `origin_regions\`. This is separate from `dumps\`: it is not an unbacked allocation, but a focused memory window around the traced `.text` origin, with a sidecar JSON containing origin VA/RVA, visible gadget caller, SHA256, entropy, and selected strings.

The `LdrLoadDll` hook also records origin evidence from live `UNICODE_STRING` arguments and proxy parameter blocks. This covers simple bypasses that enter `LoadLibrary*` after the instrumented prologue but still reach `ntdll!LdrLoadDll`.

## LDR Integrity

StackSentry also detects loader entrypoint hijacking. This covers techniques such as LdrShuffle/EPI, where a module still looks legitimate in the PEB loader lists, but its `LDR_DATA_TABLE_ENTRY.EntryPoint` is changed to attacker-controlled code.

    .\build\StackSentry64.exe /run target.exe /ldr-integrity /timeout 10000

`/deep` and `/max` enable this check. `/quick` keeps it disabled. The analyzer reports the hijacked module, current entrypoint, expected PE entrypoint, memory type/protection for the current entrypoint, and context such as suspicious `OriginalBase` when an entrypoint anomaly already exists.

## Network Use Trace

    .\build\StackSentry64.exe /run target.exe /max /network-use-trace /timeout 10000
    .\build\StackSentry64.exe /run target.exe /max /hunt /network-use-trace /timeout 15000

The monitor installs hooks on a focused set of APIs such as `connect`, `WSAConnect`, `send`, `recv`, `getaddrinfo`, `DnsQuery_*`, `InternetOpenUrl*`, `InternetConnect*`, `HttpOpenRequest*`, `HttpSendRequest*`, and common WinHTTP request/read/write calls. The analyzer then classifies the caller address and stack like a DLL-load event.
High-signal cases include network APIs called from executable `MEM_PRIVATE`, modified image-backed code, tampered unwind metadata, or spoofed/unusual stacks. Findings appear under `== Network Use Details ==` and are written to `network_trace.json`, `memory.json`, and `summary.json`. This mode is not part of `/hunt` by default: it is strong, but can be verbose, so I prefer making the analyst enable it explicitly when they want to prove real network API use.

## Memory Audit

`/memory-audit` is an opt-in scan of live process memory inspired by Forrest Orr's excellent Moneta research on malicious memory artifacts. It complements StackSentry's event/call-stack model by asking what the process looks like in memory before StackSentry terminates or detaches from it:

    .\build\StackSentry64.exe /run target.exe /max /memory-audit /timeout 10000
    .\build\StackSentry64.exe /run target.exe /max /regtrace /memory-audit /timeout 10000

High-confidence findings appear under `== Memory Audit ==` and may create focused dumps under `memory_audit\`. Lower-confidence hunting context stays in `memory_audit.json` without raising an alert by default, because modern Windows components and resident tools can legitimately create private or modified pages. This mode is intentionally not enabled by `/quick`, `/deep`, or `/max`.

## BYOUD And Shadow Stack Research

`/byoud-trace` is a lab mode for DLL-load cases that manipulate Windows x64 unwind metadata instead of obvious return addresses. It observes unwind table APIs, memory protection changes around `.pdata`/`.xdata`/`.rdata`, and temporal metadata divergence before sensitive loader calls:

    .\build\StackSentry64.exe /run target.exe /max /byoud-trace /regtrace /timeout 12000

`/hunt` includes `/byoud-trace` because the current BYOUD test corpus gives repeatable proof through temporal unwind metadata divergence, and I found that too useful to hide behind a separate flag.

`/shadow-stack` is different. It is exposed only as a research/testing switch for systems where Windows exposes user-mode CET/HSP shadow-stack state. It captures CET return frames, compares them against the classic stack as an ordered sequence, and reports hidden, missing, or out-of-order frames. It is not included in `/hunt`, is not counted as mature coverage, and may produce no findings when `XSTATE_CET_U` is unavailable. To be fully honest, I only had limited testing time with this on a friend's laptop, so expect possible rough edges:

    .\build\StackSentry64.exe /run target.exe /max /shadow-stack /stack-audit /regtrace /timeout 12000

When it works, findings appear under `== Shadow Stack Trace ==` and are written to `shadow_stack_trace.json`. When the platform does not expose the required CET state, the normal `/regtrace`, `/stack-audit`, `/memory-audit`, and `/byoud-trace` layers still carry the detection work.

## ETW Timeline

`/etw` is an opt-in lab mode that starts a krabsetw kernel trace before the target's main thread resumes. It records process, thread, and image-load events for the primary PID and child PIDs whose parent is already being tracked. Realistically, it is not the most useful feature in the project, but I thought it was interesting to add:

    .\build\StackSentry64.exe /run target.exe /max /etw /timeout 10000
    .\build\StackSentry64.exe /run loader.exe /max /follow-remote /etw /timeout 15000

This does not replace the monitor DLL detections. It provides a kernel-backed timeline to answer questions like which child process appeared, which DLL was mapped at that moment, and whether remote payload execution lines up with a suspicious loader stage. The timeline is written to `etw_timeline.json` and summarized in the final console output. Kernel ETW collection may require elevation; if Windows refuses the trace, StackSentry reports `/etw` as unavailable and continues normal analysis.

## Individual Hooks

    .\build\StackSentry64.exe /run target.exe /xhooks
    .\build\StackSentry64.exe /run target.exe /mem /unwind

`/xhooks` enables the most stable callback/thread-start hooks (`CreateThread` and `QueueUserAPC`). `/origin-trace` adds origin correlation, and `/regtrace` adds the heavier target `.text` thread-start tracing path. `/mem` enables noisy memory/API stack telemetry (`NtAllocateVirtualMemory`, `NtProtectVirtualMemory`, `NtMapViewOfSection`, writes, thread creation, and APC queueing) and is intentionally not enabled by any main profile. `/etw` adds kernel process/thread/image-load timeline telemetry through krabsetw. `/tp` and `/wait` are separated because `Tp*` and `WaitFor*` hooks can destabilize some targets. And yes, they really can destabilize things, so use them according to the target.

Legacy arguments (`-e`, `--out`, `--rules`, `--timeout-ms`, and `--experimental-hooks`) still work.

## Console Output

Console output is grouped by analysis block (`DLL LOAD ANALYSIS`, `MEMORY API TELEMETRY`, `CALLBACK/THREAD ANALYSIS`, and related sections). By default, the console shows only alerts so DLL-load findings do not get buried under routine telemetry. Use `/verbose` when you also want non-alert events. `events.jsonl` still receives the complete event stream.

In practice, the console tries not to become a novel. The summary stays readable, and raw detail remains in JSON files for anyone who wants to dig later.

Useful output flags:

- `/no-target-output`: does not mix target stdout/stderr into the StackSentry console.
- `/inline-stack`: prints the full stack in one line, without compacting repeated frames.
- `/full-stack`: prints frames one per line with module offsets and disables `[module xN]` compaction.
- `/quiet`: writes artifacts and reduces console UI.
- `/plain`, `/live`, and `/no-color`: tune animation/color because terminals have opinions.

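
Since `events.jsonl` receives the complete event stream, it can be tallied outside the console. Here is a minimal sketch in Python, assuming only that the file is JSON Lines (one JSON object per line). The README does not document the event schema, so the field name you pass as `key` is your own choice after inspecting a real file, and `iter_events` and `count_by` are hypothetical helper names.

```python
import json
from collections import Counter
from typing import Iterable, Iterator


def iter_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per JSON Lines record, skipping blank or broken lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # e.g. a last line cut off when the target crashed
        if isinstance(record, dict):
            yield record


def count_by(lines: Iterable[str], key: str) -> Counter:
    """Count events per value of `key`; records without it go to '<missing>'."""
    return Counter(str(event.get(key, "<missing>")) for event in iter_events(lines))


# Usage idea (the field name is a placeholder, check your own events.jsonl):
#   with open("events.jsonl", encoding="utf-8") as fh:
#       print(count_by(fh, "some_field").most_common(10))
```

Skipping unparsable lines instead of raising is a deliberate choice here: a crashed target may leave a truncated final record, and the earlier events are usually still worth reading.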
At the end, the `Memory` block lists each watched DLL load with the loaded module base address and the selected caller/origin address from the stack. Those addresses are also written to `memory.json` and `summary.json`.

    .\build\StackSentry64.exe
    .\build\StackSentry64.exe /help
    .\build\StackSentry64.exe /version
    .\build\StackSentry64.exe /run target.exe /live /verbose
    .\build\StackSentry64.exe /run target.exe /plain /no-color
    .\build\StackSentry64.exe /run target.exe /quiet

## Outputs

Each run creates a per-process directory inside the selected `/out` path:

    out\loader\loader_binary.exe - 24216\

The console tries to show the important parts first, so you do not need to open every JSON file to understand a simple run. But when you want to validate a detection or compare StackSentry with another tool, the artifacts are worth checking:

- `summary.json`, `memory.json`, and `events.jsonl` store the summary, events, and decisions that supported the alert.
- `origin_regions\` stores focused memory windows around regions origin tracing linked to a hidden load.
- `dumps\`, `memory_audit\`, `modified_modules\`, and `modified_network_modules\` store preserved bytes for later analysis.
- `network_trace.json`, `byoud_trace.json`, `shadow_stack_trace.json`, and `etw_timeline.json` appear when their corresponding modes are used.
- `children\` stores per-PID artifacts when `/follow-remote` follows execution into another process.

One detail worth calling out: when StackSentry preserves a modified module, it also writes a `.tag` file next to the dump. That `.tag` is a simple diff map with offsets/bytes changed relative to the file on disk. For code caves, module stomping, or temporarily modified images, this is often more useful than only having the full module dump.

Exit codes:

- `0`: no alerts.
- `10`: at least one alert was generated.
- `1`/`2`: runtime error, target crash, argument error, or configuration error.

## Rules

`config\rules.json` shows the supported format:

    {
      "schema_version": 3,
      "network_modules": ["ws2_32.dll", "wininet.dll", "winhttp.dll", "dnsapi.dll", "iphlpapi.dll"],
      "dotnet_modules": ["clr.dll", "coreclr.dll", "mscoree.dll", "System.Management.Automation.dll"],
      "alert_on_unbacked_executable": true,
      "alert_on_backed_modified": true,
      "dump_suspicious_regions": true,
      "analyze_dumps": true,
      "carve_embedded_pe": true,
      "module_integrity_enabled": true,
      "enable_sleep_hooks": true,
      "enable_msgwait_hook": true,
      "enable_wait_object_hooks": false,
      "enable_thread_start_hooks": false,
      "enable_threadpool_hooks": false,
      "experimental_hooks": false,
      "memory_api_hooks": false,
      "unwind_integrity": true,
      "unwind_table_hooks": false,
      "origin_trace": false,
      "register_trace": false,
      "follow_remote": false,
      "net_reset": false,
      "etw_telemetry": false,
      "ldr_integrity": false,
      "dispatch_trace": false,
      "threadpool_chain_trace": false,
      "stack_audit": false,
      "memory_audit": false,
      "byoud_trace": false,
      "shadow_stack": false,
      "network_use_trace": false,
      "callsite_validation": true,
      "alert_on_unwind_tamper": true,
      "correlation_window_ms": 5000,
      "max_dump_bytes": 16777216,
      "long_sleep_ms": 1000
    }

Use another file with `--rules path\to\rules.json`. The old `watch_dlls` field is still accepted as a compatibility alias. You probably will not see it in use, because basically only I tested that path, and why not remove it? I am tired.

## Current Limits

I could try to sell this project as if it solved everything, but that would be dishonest.
After hundreds of tests, it is clearly strong at what it was built to do, but it still depends on the target. Like every tool in this area, it has real limitations; some obvious, some less so:

- This is still user-mode instrumentation. A strong target can obviously detect/remove hooks or use execution paths the monitor does not observe.
- `WaitForSingleObject/WaitForMultipleObjects` and `Tp*` hooks exist, but stay outside `/deep` and `/max` because they destabilized some test targets.
- Memory API hooks can be noisy in environments with benign injection/hooking engines, so they are enabled only when `/mem` is passed.
- `Register tracing` is opt-in lab telemetry. It was designed to reveal origins hidden by proxy/gadget loaders, not to produce a complete instruction trace.
- `/follow-remote` depends on user-mode hooks observing remote-execution setup. Complete direct/indirect syscall chains can still bypass those hooks.
- `/net-reset` can fail when a DLL is statically imported, held by refcount, or actively in use. When that happens, StackSentry reports the reset failure and continues normally.
- `/etw` is context telemetry, not detection proof by itself. It may require elevation and can miss events that happened before the trace started. There is only so much to do there.
- `/memory-audit` runs while the target is still alive, usually at timeout/keep-alive. If the target exits before that, there may be no live address space left to scan.
- LDR integrity treats an `EntryPoint` outside the module image or outside `MEM_IMAGE` as the primary proof. `OriginalBase` is reported as context because that field is layout-sensitive across Windows builds and should not be trusted alone. I noticed that while testing on another laptop, so I kept this conservative.
- Zydis improves instruction decoding, but callsite validation is still heuristic because a return address alone does not prove real control-flow history.
- Module comparison is "PE-sieve-inspired", but clearly simplified: it compares executable and unwind sections against disk and does not try to model every legitimate relocation/hook case.
- The memory audit is inspired by Moneta-style artifact classes, but conservative in the console: weak private-page evidence is written as hunting context unless tied to a stronger anomaly.
- `/shadow-stack` is a research switch for CET/HSP experiments and may stay silent when the target or platform does not expose user-mode shadow-stack state.
- Advanced stack spoofing is not impossible to bypass in ring3. The tool raises the cost by correlating callbacks, callsites, module integrity, captured callsite bytes, unwind metadata, and optional memory/API telemetry when `/mem` is enabled.

## License

This project is distributed under the MIT License (Modify It Tonight). Use it, modify it, break it in a lab, fix it, compare it, publish results, do what you need.
If it saves you a few hours of analysis, I am already happy. Coffee is accepted, though.

StackSentry started from a simple idea that kept getting more stubborn: if a loader tries to hide where a `LoadLibrary` came from, the call stack probably still leaves a clue somewhere. This project is an x64 user-mode research tool for memory triage, loader analysis, and sensitive DLL-load detection. It starts a target process, injects a lightweight monitor DLL, and watches important events as they happen, trying to answer one direct question: **who actually caused this DLL load or network use?**
StackSentry started from a simple idea that kept getting more stubborn: if a loader tries to hide where a `LoadLibrary` came from, the call stack probably still leaves a clue somewhere. This project is an x64 user-mode research tool for memory triage, loader analysis, and sensitive DLL-load detection. It starts a target process, injects a lightweight monitor DLL, and watches important events as they happen, trying to answer one direct question: **who actually caused this DLL load or network use?**
## The Idea StackSentry is primarily focused on fast triage of code running in memory. The reasoning behind it is practical: a C2, RAT, or fileless loader can hide the file on disk, encrypt itself while sleeping, and build a nice-looking call stack, but at some point it still needs to load or talk through a network DLL. On Windows, that usually means DLLs like `ws2_32.dll`, `wininet.dll`, `winhttp.dll`, `dnsapi.dll`, or APIs exported by them. If those DLLs are loaded or used from a strange origin, it is worth stopping and looking closer. This pattern did not come out of nowhere. It lines up directly with ideas already used in behavioral detection, such as these Elastic rules: - [`defense_evasion_library_loaded_via_a_callback_function.toml`](https://github.com/elastic/protections-artifacts/blob/6e9ee22c5a7f57b85b0cb063adba9a3c72eca348/behavior/rules/windows/defense_evasion_library_loaded_via_a_callback_function.toml): identifies a library load through a callback, possibly to hide the real origin of the `LoadLibrary` call from the call stack. - [`defense_evasion_network_module_loaded_from_suspicious_unbacked_memory.toml`](https://github.com/elastic/protections-artifacts/blob/6e9ee22c5a7f57b85b0cb063adba9a3c72eca348/behavior/rules/windows/defense_evasion_network_module_loaded_from_suspicious_unbacked_memory.toml): identifies a network module load when the thread stack contains frames outside known executable images, a common pattern in in-memory execution. - [`defense_evasion_library_loaded_from_a_spoofed_call_stack.toml`](https://github.com/elastic/protections-artifacts/blob/6e9ee22c5a7f57b85b0cb063adba9a3c72eca348/behavior/rules/windows/defense_evasion_library_loaded_from_a_spoofed_call_stack.toml): detects a library load from a potentially altered call stack used to conceal the real source of the call. Because detections like these exist, some loaders now avoid the obvious path. 
Instead of calling `LoadLibrary` directly from private memory, they try to hide the origin with callbacks, gadgets inside legitimate modules, threadpool chains, cross-thread dispatch, execution from modified images, or even manipulated unwind metadata. StackSentry takes that idea and pushes it further in a lab setting: not only saying that a sensitive DLL was loaded, but also showing the probable origin, the memory involved, the stack state, useful dumps, and, when possible, the path the loader tried to hide. ## What It Looks For ## Detection Gallery The sample commands and expected call-stack summaries are documented in [samples/README.md](samples/README.md).
Quick warning: the samples are rough lab builds I made for local testing, so do not expect polished showcase binaries.
Below are a few patterns StackSentry can show in the terminal without needing a kernel driver. Some images still show output captured around the `v0.8` phase. Since then I improved console rendering, stack compaction, and noise reduction, but I did not think it was honest to call that `v0.9` just because the output got cleaner. If a formatting detail differs slightly from the current version, that is why. ### SilentMoonwalk With Synthetic Stack This test uses a modified variant of [klezVirus/SilentMoonwalk](https://github.com/klezVirus/SilentMoonwalk) to load a network DLL with a synthetic stack. All visible frames look like legitimate modules, but callsite validation and origin tracing still tie the DLL load back to the code that prepared the call.  ### BYOUD / Unwind Metadata Spoofing   ### Threadpool Callback Chain This sample is based on the idea behind [klezvirus/ThreadPoolExecChain](https://github.com/klezvirus/ThreadPoolExecChain): a threadpool/proxy chain makes the load happen in a context that looks more natural. The report preserves the chain context and marks the modified frames that appear along the path.  ### Image `.text` Proxy Because it is PIC, the code can live inside the loader `.text` and call `LoadLibrary` through a proxy, adding another stack element through an existing gadget in `nvwgf2umx.dll`. The final stack points to a seemingly clean place, but register/origin tracing connects the DLL load back to the `.text` region that started the flow. This proves the detection can hold up even in very specific patterns like this one.  ### Code Cave / Modified Image This pattern is also inspired by a simple experimental variant of my [RefinedPool](https://github.com/Vith0r/RefinedPool/tree/main/RefinedPool) project, converted into PIC shellcode. The sensitive load passes through *bytes written into an image-backed code cave*. 
The useful part is that StackSentry preserves both the modified module and the changed-byte map in the dump output, which I think is genuinely valuable during triage.  ### SilentMoonwalk RDI With Synthetic Stack Here, the [klezVirus/SilentMoonwalk](https://github.com/klezVirus/SilentMoonwalk) variant is packed as a [Donut](https://github.com/TheWover/donut)/RDI payload. The bootstrap may load DLLs such as `wininet.dll` and `mscoree.dll`, but the relevant stage is the `ws2_32.dll` load with a synthetic stack. The first image shows the DLL-load alerts; the second shows the probable origin leading back to the "hidden" executable region.   ### MassDriver-Style Dispatch Inspired by the dispatch pattern from [Sizeable-Bingus/MassDriver](https://github.com/Sizeable-Bingus/MassDriver), a seemingly clean worker thread executes `LoadLibraryA`. `/dispatch-trace` ties the load back to the requester that posted the dispatch structure. It is a very specific detection, but I thought it was interesting enough to include.  ### Network Use Trace In A C2 Payload This example uses `/network-use-trace` to show the case where loading the network DLL is not the only interesting part. The payload also needs to use network APIs, and StackSentry tries to attribute who called `connect`, `WSAConnect`, `send`, `recv`, WinHTTP/WinINet, and related APIs. This can be useful because the output may reveal the real destination, such as domain, IP/port, and even third-party services used in the flow, like `pastebin.com` in this test. The stack in the image is compacted on purpose so it does not become a giant block of repeated `system.ni.dll` frames.
If you prefer the full stack in one line, like in the older screenshots, use `/inline-stack`; if you want frame-by-frame output with offsets, use `/full-stack`.

### Stack View Modes

Beyond the detections themselves, the current StackSentry console tries to make analysis less exhausting. The same stack can be shown in different ways depending on what you want to inspect:

#### Compact Stack

#### Full Stack With Offsets

With `/full-stack`, each frame is printed on its own line with the module offset. This is useful when you want to audit exactly where every return frame landed.

#### Inline Stack Without Compression

With `/inline-stack`, the stack goes back to a single-line format without repeated-frame compression.

#### Clean Events With Verbose

#### BackedModified And Memory Audit

When a return frame lands inside a real DLL, but the bytes in that region no longer match the file on disk, StackSentry does not treat it as a clean frame. It marks the stack as `BackedModified`/`captured-modified`, and Memory Audit records the module, region, and temporal detail of the change.

## Build

To keep the build simple, I included `build.ps1`. It uses Microsoft Visual Studio/MSVC to compile. If your environment is not exactly the same, it should not be hard to adapt: the script is short, and reading it makes the required compiler calls pretty clear.

    .\build.ps1

When using the script, build output is written to `build\`:

- `StackSentry64.exe`
- `CallstackMonitor.dll`

## Test Commands

Recommended commands and exact sample commands live in [samples/README.md](samples/README.md).
If you are unsure where to start, begin there; it includes the first-pass command, the stronger profile, stack output modes, and the examples I use to validate the gallery screenshots.

## Master Profiles

- `/quick`: stable DLL-load triage profile. Good for first runs, benign baselines, and lower-noise output.
- `/deep`: DLL-load hunting profile. Enables stable callback/thread-start hooks, unwind table hooks, LDR integrity checks, dumps, carving, and a longer correlation window. Memory/API telemetry stays off unless `/mem` is passed.
- `/max`: strongest practical DLL-load profile. Enables deep telemetry, stack audit, LDR checks, and `/auto-enter` by default, but leaves memory/API, wait, and threadpool hooks disabled unless explicitly requested.
- `/profile
The full `/features` output is grouped by intent so new users do not have to treat every flag as equally important:

- `Common options`: output directory, timeout, keep-alive, stdin automation, and verbosity.
- `Output style`: quiet/plain/live/color controls, target output suppression, `/inline-stack`, and `/full-stack`.
- `Origin / proxy analysis`: `/regtrace`, `/dispatch-trace`, and `/threadpool-chain-trace` for hidden-caller and proxy-loading cases.
- `Network use analysis`: `/network-use-trace` and `/net-use-trace` for already-loaded network DLL reuse.
- `Remote / multi-process`: `/follow-remote` and `/net-reset` for loaders that move execution into another process.
- `Extra telemetry / integrity`: `/etw`, `/ldr-integrity`, `/unwind`, `/stack-audit`, `/memory-audit`, `/byoud-trace`, and `/shadow-stack`.
- `Aggressive / low-level hooks`: `/mem`, `/tp`, `/wait`, and direct `/xhooks`.
- `Advanced config`: custom rules/configuration.

## Origin Tracing

Proxy DLL-load techniques can make the final call stack look clean, including by placing an existing image gadget between the real loader code and `LoadLibrary`. StackSentry keeps the default profiles focused and low-noise, but adds opt-in modes for those cases:

    .\build\StackSentry64.exe /run .\samples\sample_03_text_section_proxy.exe /max /origin-trace /no-target-output /timeout 9000
    .\build\StackSentry64.exe /run .\samples\sample_03_text_section_proxy.exe /max /regtrace /no-target-output /timeout 9000

Technical note: `/regtrace` intentionally avoids full register tracing over non-main image modules larger than 32 MB. Private executable memory, thread starts, dynamic executable transitions, and origin correlation are still traced; this only avoids expensive full instrumentation of huge gadget-carrier images. I had to do this because tracing those modules was hurting normal analysis more than it helped. If you really need a higher ceiling for a lab case, it is just a small constant in the source.
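
The size ceiling from the technical note can be sketched as a small policy check. This is an illustration in Python, not the project's source: `ModuleInfo`, `REGTRACE_MAX_IMAGE_BYTES`, `wants_full_regtrace`, and the sample module names and sizes are all hypothetical, and the behavior at exactly 32 MB is an assumption. Only the 32 MB limit and the main-image exception come from the note.

```python
from dataclasses import dataclass

# Assumed value: the note states a 32 MB ceiling; the real constant name is not given.
REGTRACE_MAX_IMAGE_BYTES = 32 * 1024 * 1024


@dataclass(frozen=True)
class ModuleInfo:
    """Hypothetical description of a loaded image."""
    name: str
    size_of_image: int
    is_main_image: bool


def wants_full_regtrace(module: ModuleInfo) -> bool:
    """Return True when full register tracing would apply to this image."""
    if module.is_main_image:
        # The ceiling only applies to non-main image modules.
        return True
    return module.size_of_image <= REGTRACE_MAX_IMAGE_BYTES


# Illustrative inputs only; names and sizes are made up for the example.
modules = [
    ModuleInfo("loader.exe", 48 * 1024 * 1024, True),
    ModuleInfo("small_helper.dll", 2 * 1024 * 1024, False),
    ModuleInfo("huge_gadget_carrier.dll", 60 * 1024 * 1024, False),
]
for m in modules:
    print(f"{m.name}: full regtrace = {wants_full_regtrace(m)}")
```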
Threadpool hooks remain explicit. Add `/tp` only when you want `TpAllocWork/TpPostWork` telemetry.

When `/regtrace` correlates a clean gadget load back to executable `MEM_IMAGE` code, StackSentry writes an artifact under `origin_regions\`. This is separate from `dumps\`: it is not an unbacked allocation, but a focused memory window around the traced `.text` origin, with a sidecar JSON containing origin VA/RVA, visible gadget caller, SHA256, entropy, and selected strings.

The `LdrLoadDll` hook also records origin evidence from live `UNICODE_STRING` arguments and proxy parameter blocks. This covers simple bypasses that enter `LoadLibrary*` after the instrumented prologue but still reach `ntdll!LdrLoadDll`.

## LDR Integrity

StackSentry also detects loader entrypoint hijacking. This covers techniques such as LdrShuffle/EPI, where a module still looks legitimate in the PEB loader lists, but its `LDR_DATA_TABLE_ENTRY.EntryPoint` is changed to attacker-controlled code.

    .\build\StackSentry64.exe /run target.exe /ldr-integrity /timeout 10000

`/deep` and `/max` enable this check. `/quick` keeps it disabled.

The analyzer reports the hijacked module, current entrypoint, expected PE entrypoint, memory type/protection for the current entrypoint, and context such as suspicious `OriginalBase` when an entrypoint anomaly already exists.

## Network Use Trace

    .\build\StackSentry64.exe /run target.exe /max /network-use-trace /timeout 10000
    .\build\StackSentry64.exe /run target.exe /max /hunt /network-use-trace /timeout 15000

The monitor installs hooks on a focused set of APIs such as `connect`, `WSAConnect`, `send`, `recv`, `getaddrinfo`, `DnsQuery_*`, `InternetOpenUrl*`, `InternetConnect*`, `HttpOpenRequest*`, `HttpSendRequest*`, and common WinHTTP request/read/write calls. The analyzer then classifies the caller address and stack like a DLL-load event.
High-signal cases include network APIs called from executable `MEM_PRIVATE`, modified image-backed code, tampered unwind metadata, or spoofed/unusual stacks. Findings appear under `== Network Use Details ==` and are written to `network_trace.json`, `memory.json`, and `summary.json`.

This mode is not part of `/hunt` by default: it is strong, but can be verbose, so I prefer making the analyst enable it explicitly when they want to prove real network API use.

## Memory Audit

`/memory-audit` is an opt-in scan of live process memory inspired by Forrest Orr's excellent Moneta research on malicious memory artifacts. It complements StackSentry's event/call-stack model by asking what the process looks like in memory before StackSentry terminates or detaches from it:

    .\build\StackSentry64.exe /run target.exe /max /memory-audit /timeout 10000
    .\build\StackSentry64.exe /run target.exe /max /regtrace /memory-audit /timeout 10000

High-confidence findings appear under `== Memory Audit ==` and may create focused dumps under `memory_audit\`. Lower-confidence hunting context stays in `memory_audit.json` without raising an alert by default, because modern Windows components and resident tools can legitimately create private or modified pages. This mode is intentionally not enabled by `/quick`, `/deep`, or `/max`.

## BYOUD And Shadow Stack Research

`/byoud-trace` is a lab mode for DLL-load cases that manipulate Windows x64 unwind metadata instead of obvious return addresses. It observes unwind table APIs, memory protection changes around `.pdata`/`.xdata`/`.rdata`, and temporal metadata divergence before sensitive loader calls:

    .\build\StackSentry64.exe /run target.exe /max /byoud-trace /regtrace /timeout 12000

`/hunt` includes `/byoud-trace` because the current BYOUD test corpus gives repeatable proof through temporal unwind metadata divergence, and I found that too useful to hide behind a separate flag.

`/shadow-stack` is different.
It is exposed only as a research/testing switch for systems where Windows exposes user-mode CET/HSP shadow-stack state. It captures CET return frames, compares them against the classic stack as an ordered sequence, and reports hidden, missing, or out-of-order frames. It is not included in `/hunt`, is not counted as mature coverage, and may produce no findings when `XSTATE_CET_U` is unavailable. To be fully honest, I only had limited testing time with this on a friend's laptop, so expect possible rough edges:

    .\build\StackSentry64.exe /run target.exe /max /shadow-stack /stack-audit /regtrace /timeout 12000

When it works, findings appear under `== Shadow Stack Trace ==` and are written to `shadow_stack_trace.json`. When the platform does not expose the required CET state, the normal `/regtrace`, `/stack-audit`, `/memory-audit`, and `/byoud-trace` layers still carry the detection work.

## ETW Timeline

`/etw` is an opt-in lab mode that starts a krabsetw kernel trace before the target's main thread resumes. It records process, thread, and image-load events for the primary PID and child PIDs whose parent is already being tracked. Realistically, it is not the most useful feature in the project, but I thought it was interesting to add:

    .\build\StackSentry64.exe /run target.exe /max /etw /timeout 10000
    .\build\StackSentry64.exe /run loader.exe /max /follow-remote /etw /timeout 15000

This does not replace the monitor DLL detections. It provides a kernel-backed timeline to answer questions like which child process appeared, which DLL was mapped at that moment, and whether remote payload execution lines up with a suspicious loader stage. The timeline is written to `etw_timeline.json` and summarized in the final console output. Kernel ETW collection may require elevation; if Windows refuses the trace, StackSentry reports `/etw` as unavailable and continues normal analysis.
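
The PID-following rule in this section (the primary PID plus children whose parent is already tracked) can be sketched in a few lines of Python. This is an illustration only: the event tuples and the `follow_children` helper are hypothetical and do not reflect the krabsetw callback shape or the `etw_timeline.json` schema.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical event shape: (kind, pid, parent_pid). Not the real ETW record layout.
Event = Tuple[str, int, int]


def follow_children(primary_pid: int, events: Iterable[Event]) -> List[Event]:
    """Keep events for the primary PID and for children of already-tracked PIDs."""
    tracked: Set[int] = {primary_pid}
    kept: List[Event] = []
    for kind, pid, parent_pid in events:
        if kind == "process_start" and parent_pid in tracked:
            # A child of a tracked process becomes tracked itself.
            tracked.add(pid)
        if pid in tracked:
            kept.append((kind, pid, parent_pid))
    return kept


events = [
    ("image_load", 100, 4),       # primary process maps a DLL
    ("process_start", 200, 100),  # child of the primary: followed
    ("process_start", 300, 999),  # unrelated process: ignored
    ("image_load", 200, 100),     # child maps a DLL: kept
    ("process_start", 400, 200),  # grandchild: followed because 200 is tracked
    ("image_load", 300, 999),     # still unrelated: ignored
]
for event in follow_children(100, events):
    print(event)
```

Order matters in this sketch: a child is only followed if its parent was already tracked when the start event arrived.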
## Individual Hooks

    .\build\StackSentry64.exe /run target.exe /xhooks
    .\build\StackSentry64.exe /run target.exe /mem /unwind

`/xhooks` enables the most stable callback/thread-start hooks (`CreateThread` and `QueueUserAPC`). `/origin-trace` adds origin correlation, and `/regtrace` adds the heavier target `.text` thread-start tracing path. `/mem` enables noisy memory/API stack telemetry (`NtAllocateVirtualMemory`, `NtProtectVirtualMemory`, `NtMapViewOfSection`, writes, thread creation, and APC queueing) and is intentionally not enabled by any main profile. `/etw` adds kernel process/thread/image-load timeline telemetry through krabsetw. `/tp` and `/wait` are separated because `Tp*` and `WaitFor*` hooks can destabilize some targets. And yes, they really can destabilize things, so use them according to the target.

Legacy arguments (`-e`, `--out`, `--rules`, `--timeout-ms`, and `--experimental-hooks`) still work.

## Console Output

Console output is grouped by analysis block (`DLL LOAD ANALYSIS`, `MEMORY API TELEMETRY`, `CALLBACK/THREAD ANALYSIS`, and related sections). By default, the console shows only alerts so DLL-load findings do not get buried under routine telemetry. Use `/verbose` when you also want non-alert events. `events.jsonl` still receives the complete event stream.

In practice, the console tries not to become a novel. The summary stays readable, and raw detail remains in JSON files for anyone who wants to dig later.

Useful output flags:

- `/no-target-output`: does not mix target stdout/stderr into the StackSentry console.
- `/inline-stack`: prints the full stack in one line, without compacting repeated frames.
- `/full-stack`: prints frames one per line with module offsets and disables `[module xN]` compaction.
- `/quiet`: writes artifacts and reduces console UI.
- `/plain`, `/live`, and `/no-color`: tune animation/color because terminals have opinions.
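
To illustrate what the default compacted view does and what `/inline-stack` and `/full-stack` turn off, here is a minimal Python sketch of run-length compaction over consecutive frames from the same module. The `module_of` and `compact_stack` helpers, the frame strings, and the exact rendering are assumptions; only the `[module xN]` idea comes from the flag descriptions above.

```python
from itertools import groupby
from typing import List


def module_of(frame: str) -> str:
    """Extract the module name from a 'module!symbol' or 'module+0xoffset' frame."""
    for sep in ("!", "+"):
        if sep in frame:
            return frame.split(sep, 1)[0]
    return frame


def compact_stack(frames: List[str]) -> List[str]:
    """Collapse runs of consecutive frames from one module into '[module xN]'."""
    out: List[str] = []
    for module, run in groupby(frames, key=module_of):
        run_list = list(run)
        if len(run_list) == 1:
            # A single frame keeps its full symbol or offset.
            out.append(run_list[0])
        else:
            out.append(f"[{module} x{len(run_list)}]")
    return out


# Illustrative frames only; offsets are made up for the example.
frames = [
    "ntdll.dll!LdrLoadDll",
    "kernelbase.dll!LoadLibraryExW",
    "system.ni.dll+0x1a2b",
    "system.ni.dll+0x1c00",
    "system.ni.dll+0x2f10",
    "clr.dll+0x4410",
]
print(" <- ".join(compact_stack(frames)))
```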
At the end, the `Memory` block lists each watched DLL load with the loaded module base address and the selected caller/origin address from the stack. Those addresses are also written to `memory.json` and `summary.json`.

    .\build\StackSentry64.exe
    .\build\StackSentry64.exe /help
    .\build\StackSentry64.exe /version
    .\build\StackSentry64.exe /run target.exe /live /verbose
    .\build\StackSentry64.exe /run target.exe /plain /no-color
    .\build\StackSentry64.exe /run target.exe /quiet

## Outputs

Each run creates a per-process directory inside the selected `/out` path:

    out\loader\loader_binary.exe - 24216\

The console tries to show the important parts first, so you do not need to open every JSON file to understand a simple run. But when you want to validate a detection or compare StackSentry with another tool, the artifacts are worth checking:

- `summary.json`, `memory.json`, and `events.jsonl` store the summary, events, and decisions that supported the alert.
- `origin_regions\` stores focused memory windows around regions origin tracing linked to a hidden load.
- `dumps\`, `memory_audit\`, `modified_modules\`, and `modified_network_modules\` store preserved bytes for later analysis.
- `network_trace.json`, `byoud_trace.json`, `shadow_stack_trace.json`, and `etw_timeline.json` appear when their corresponding modes are used.
- `children\` stores per-PID artifacts when `/follow-remote` follows execution into another process.

One detail worth calling out: when StackSentry preserves a modified module, it also writes a `.tag` file next to the dump.
That `.tag` is a simple diff map with offsets/bytes changed relative to the file on disk. For code caves, module stomping, or temporarily modified images, this is often more useful than only having the full module dump.

Exit codes:

- `0`: no alerts.
- `10`: at least one alert was generated.
- `1`/`2`: runtime error, target crash, argument error, or configuration error.

## Rules

`config\rules.json` shows the supported format:

    {
      "schema_version": 3,
      "network_modules": ["ws2_32.dll", "wininet.dll", "winhttp.dll", "dnsapi.dll", "iphlpapi.dll"],
      "dotnet_modules": ["clr.dll", "coreclr.dll", "mscoree.dll", "System.Management.Automation.dll"],
      "alert_on_unbacked_executable": true,
      "alert_on_backed_modified": true,
      "dump_suspicious_regions": true,
      "analyze_dumps": true,
      "carve_embedded_pe": true,
      "module_integrity_enabled": true,
      "enable_sleep_hooks": true,
      "enable_msgwait_hook": true,
      "enable_wait_object_hooks": false,
      "enable_thread_start_hooks": false,
      "enable_threadpool_hooks": false,
      "experimental_hooks": false,
      "memory_api_hooks": false,
      "unwind_integrity": true,
      "unwind_table_hooks": false,
      "origin_trace": false,
      "register_trace": false,
      "follow_remote": false,
      "net_reset": false,
      "etw_telemetry": false,
      "ldr_integrity": false,
      "dispatch_trace": false,
      "threadpool_chain_trace": false,
      "stack_audit": false,
      "memory_audit": false,
      "byoud_trace": false,
      "shadow_stack": false,
      "network_use_trace": false,
      "callsite_validation": true,
      "alert_on_unwind_tamper": true,
      "correlation_window_ms": 5000,
      "max_dump_bytes": 16777216,
      "long_sleep_ms": 1000
    }

Use another file with `--rules path\to\rules.json`. The old `watch_dlls` field is still accepted as a compatibility alias. You probably will not see it in use, because basically only I tested that path, and why not remove it? I am tired.

## Current Limits

I could try to sell this project as if it solved everything, but that would be dishonest.
After hundreds of tests, it is clearly strong at what it was built to do, but it still depends on the target. Like every tool in this area, it has real limitations; some obvious, some less so:

- This is still user-mode instrumentation. A strong target can obviously detect/remove hooks or use execution paths the monitor does not observe.
- `WaitForSingleObject/WaitForMultipleObjects` and `Tp*` hooks exist, but stay outside `/deep` and `/max` because they destabilized some test targets.
- Memory API hooks can be noisy in environments with benign injection/hooking engines, so they are enabled only when `/mem` is passed.
- `Register tracing` is opt-in lab telemetry. It was designed to reveal origins hidden by proxy/gadget loaders, not to produce a complete instruction trace.
- `/follow-remote` depends on user-mode hooks observing remote-execution setup. Complete direct/indirect syscall chains can still bypass those hooks.
- `/net-reset` can fail when a DLL is statically imported, held by refcount, or actively in use. When that happens, StackSentry reports the reset failure and continues normally.
- `/etw` is context telemetry, not detection proof by itself. It may require elevation and can miss events that happened before the trace started. There is only so much to do there.
- `/memory-audit` runs while the target is still alive, usually at timeout/keep-alive. If the target exits before that, there may be no live address space left to scan.
- LDR integrity treats an `EntryPoint` outside the module image or outside `MEM_IMAGE` as the primary proof. `OriginalBase` is reported as context because that field is layout-sensitive across Windows builds and should not be trusted alone. I noticed that while testing on another laptop, so I kept this conservative.
- Zydis improves instruction decoding, but callsite validation is still heuristic because a return address alone does not prove real control-flow history.
- Module comparison is "PE-sieve-inspired", but clearly simplified: it compares executable and unwind sections against disk and does not try to model every legitimate relocation/hook case.
- The memory audit is inspired by Moneta-style artifact classes, but conservative in the console: weak private-page evidence is written as hunting context unless tied to a stronger anomaly.
- `/shadow-stack` is a research switch for CET/HSP experiments and may stay silent when the target or platform does not expose user-mode shadow-stack state.
- Detection of advanced stack spoofing can still be bypassed in ring 3. The tool raises the cost by correlating callbacks, callsites, module integrity, captured callsite bytes, unwind metadata, and optional memory/API telemetry when `/mem` is enabled.

## License

This project is distributed under the MIT License (Modify It Tonight). Use it, modify it, break it in a lab, fix it, compare it, publish results, do what you need.
If it saves you a few hours of analysis, I am already happy. Coffee is accepted, though.