Fun with Exception Handlers
Intro:
This small blog post discusses and demonstrates various VEH-related abuse primitives and how they can be used against EDRs that employ VEH hooks.
This blog IS NOT a direct attack against SentinelOne/CrowdStrike, even though it might seem like one. Rather, it showcases different reasons why VEH hooking should not be considered a "silver bullet", and how using VEH hooks to gather telemetry can backfire.
Basics
Vectored Exception Handling is a built-in Windows mechanism that allows an application to catch and handle exceptions with a custom handler function before SEH is called. VEHs are global to an application, unlike SEH, which is coupled with a thread's stack.
When an exception occurs in usermode, the program quickly transitions to ring-0 and then back to ring-3, eventually calling RtlpCallVectoredHandlers. RtlpCallVectoredHandlers will check if there are any entries in the Exception Handler list and, if there are, will start walking the linked list, calling each handler function until one returns EXCEPTION_CONTINUE_EXECUTION. If a registered VEH returns EXCEPTION_CONTINUE_EXECUTION, RtlpCallVectoredHandlers will walk the Continue Handler list and call each registered VCH next.
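The dispatch order described above can be modelled without touching ntdll at all. The following is only a sketch under stated assumptions: handler_node, handler_fn, dispatch and run_demo are hypothetical names, the handler signature is reduced to a bare exception code, and the continue handlers' return values are ignored. The real RtlpCallVectoredHandlers also deals with locking, pointer decoding and reference counting, none of which is modelled here.

```c
#include <assert.h>
#include <stddef.h>

/* Values match the Windows constants of the same names. */
#define EXCEPTION_CONTINUE_EXECUTION (-1)
#define EXCEPTION_CONTINUE_SEARCH    0

/* Hypothetical, simplified handler signature: only the exception code. */
typedef int (*handler_fn)(unsigned code);

/* Hypothetical node type standing in for a handler list entry. */
typedef struct handler_node {
    const struct handler_node *next;
    handler_fn fn;
} handler_node;

/* Records which handlers ran, in order, so the walk can be inspected. */
static int trace[16];
static int trace_len = 0;

static int veh_pass(unsigned code) {             /* never claims anything */
    (void)code;
    trace[trace_len++] = 1;
    return EXCEPTION_CONTINUE_SEARCH;
}

static int veh_access_violation(unsigned code) { /* claims 0xC0000005 only */
    trace[trace_len++] = 2;
    return code == 0xC0000005u ? EXCEPTION_CONTINUE_EXECUTION
                               : EXCEPTION_CONTINUE_SEARCH;
}

static int veh_last(unsigned code) {             /* reached only if nobody claimed */
    (void)code;
    trace[trace_len++] = 3;
    return EXCEPTION_CONTINUE_SEARCH;
}

static int vch_observer(unsigned code) {         /* continue handler */
    (void)code;
    trace[trace_len++] = 4;
    return EXCEPTION_CONTINUE_SEARCH;
}

/* Walk the exception handlers until one claims the exception; if one does,
   call every continue handler afterwards. Returns 1 if claimed, else 0. */
static int dispatch(const handler_node *veh, const handler_node *vch,
                    unsigned code) {
    for (const handler_node *n = veh; n != NULL; n = n->next) {
        if (n->fn(code) == EXCEPTION_CONTINUE_EXECUTION) {
            for (const handler_node *c = vch; c != NULL; c = c->next)
                c->fn(code);
            return 1;
        }
    }
    return 0; /* nobody claimed it */
}

/* Builds three exception handlers and one continue handler, then dispatches. */
static int run_demo(unsigned code) {
    const handler_node v3 = { NULL, veh_last };
    const handler_node v2 = { &v3, veh_access_violation };
    const handler_node v1 = { &v2, veh_pass };
    const handler_node c1 = { NULL, vch_observer };
    trace_len = 0;
    return dispatch(&v1, &c1, code);
}
```

Calling run_demo with an access violation code (0xC0000005) stops at the second handler and then runs the continue handler, while a code nobody claims walks all three exception handlers and never reaches the continue list.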
Exception Handler List Location
The handler list itself is located at a static offset from the base address of ntdll:
Win10: ntdll + 0x1813f0
Win11: ntdll + 0x199400
Considering the differences in offsets between Windows 10 and 11, we can search for a specific common instruction using the following function:
Searches for the second "lea rcx ntdll!..." instruction and gets the list location from it
Handler Entries & Encoded Pointers
Each entry is located on the heap. The main member we are after is VectoredHandler, but others will also be used when needed.
The pointers to the actual handler functions registered, be it a VEH or a VCH, are encoded using a process cookie in combination with ROL.
The APIs that are used and their respective lower-level calls are as follows:
EncodePointer -> RtlEncodePointer -> NtQueryInformationProcess + ROL
DecodePointer -> RtlDecodePointer -> NtQueryInformationProcess + ROR
EncodeRemotePointer -> RtlEncodeRemotePointer -> NtQueryInformationProcess + ROL
DecodeRemotePointer -> RtlDecodeRemotePointer -> NtQueryInformationProcess + ROR
Since NtQueryInformationProcess gets called every time to fetch the process cookie, it is also possible to manually encode/decode pointers:
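To make the arithmetic concrete, here is a minimal, portable sketch of the cookie-and-rotate scheme. It is not the author's code. The function names are hypothetical, the cookie is passed in as a plain parameter instead of being queried from the process, and the rotation direction (XOR, then rotate right by the low six bits of the cookie to encode, with the inverse to decode) is an assumption based on how the x64 scheme is commonly described. Check it against your own ntdll build before relying on it; the property that matters here is that the two functions are inverses.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 64-bit value right by n bits (n is reduced modulo 64). */
static uint64_t ror64(uint64_t v, unsigned n) {
    n &= 63u;
    return n ? (v >> n) | (v << (64u - n)) : v;
}

/* Rotate a 64-bit value left by n bits (n is reduced modulo 64). */
static uint64_t rol64(uint64_t v, unsigned n) {
    n &= 63u;
    return n ? (v << n) | (v >> (64u - n)) : v;
}

/* Assumed scheme: XOR with the cookie, then rotate by (cookie & 0x3F). */
static uint64_t encode_pointer(uint64_t ptr, uint32_t cookie) {
    return ror64(ptr ^ (uint64_t)cookie, cookie & 0x3Fu);
}

/* Inverse: undo the rotation first, then XOR with the same cookie. */
static uint64_t decode_pointer(uint64_t enc, uint32_t cookie) {
    return rol64(enc, cookie & 0x3Fu) ^ (uint64_t)cookie;
}
```

Whatever cookie is used, decode_pointer(encode_pointer(p, c), c) gives back p, which is a quick sanity check when comparing against values read in a debugger.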
Abusing an already present VEH
With the basic info out of the way, we can start abusing VEHs.
All of the following abuse techniques were tested with SentinelOne and rely on the fact that a VEH has already been registered in our process. When dealing with CrowdStrike, it is important to modify the overall exception handling (dealing with HWBPs, multiple VEHs etc). For debugging and playing around, just register an empty function as a handler.
Free VEH anyone?
One of the easiest and most powerful things to do is "hijacking" a VEH. By simply replacing the pointer to the actual handler function with a pointer to our own function, we can "handle" exceptions ourselves.
We can then easily:
Ignore guard page violations
Manually trigger exceptions and install HWBPs
Handle HWBPs
Do vectored syscalls
First off, we need to define our VEH function itself. We won't be tripping any guard pages for now, but we can manually trigger an exception and then handle it from our new VEH as a demonstration.
The overall logic of our OverWrite function is straightforward:
Get the handler list location
Copy the handler list contents
Overwrite the existing pointer in the first registered VEH
We can verify that the pointer to the actual handler function was changed, and that our manually initiated exception was successfully handled.
Since we overwrote the first VEH, we can basically do anything we want with exceptions. In theory, we could even feed the EDR fake info by modifying register values and returning EXCEPTION_CONTINUE_SEARCH (when there are other EDR VEHs behind us). It is also possible to edit the FirstExceptionHandler in ntdll, but there are some caveats to this that are briefly discussed at the end.
Exceptional Local Exec
For smaller payloads, we can simply make the handler function point to our shellcode instead. The only problem is that, since we are running shellcode in an exception, we would need to handle the specific exception accordingly if we wanted to continue normal execution afterwards. There's also the likely possibility of causing another exception while IN an exception, which would simply mean that RtlpCallVectoredHandlers would try to execute our "handler" again.
Manual Addition
Structs related to added VEHs are stored on the heap. This simply means that we have the privilege of adding our own handler into the list by allocating our own VEH_HANDLER_ENTRY struct and modifying the Flink/Blinks of an existing entry.
First we allocate the struct on the heap:
Then update the members of our new entry:
It is important to copy the SyncRef from an existing entry (or just provide a pointer to your own int). This value gets used later on for synchronization/reference counting (SyncRef gets loaded into RAX):
And finally we need to modify the Flink of the existing entry so it points to our struct instead of the list start:
We can see our manually added VEH in the list, and like before, we handle the exception accordingly.
This demonstrates that even if an EDR, let's say, adds integrity checks for its VEH function pointers, we can silently add our own VEH into the list. One obvious downside is that the VEH registered by an EDR is located in front of our VEH, and can therefore check register values before passing it on to our VEH (it might even handle some exceptions, so it is important to choose what sort of exceptions to trigger for HWBP installation etc).
Combining the pieces
If we do not want to unhook the hooking dll/modify our C2, we need to combine the techniques together to successfully pop a C2 payload. We will modify our VEH function so we can ignore GUARD_PAGE exceptions and do vectored syscalls by causing an ACCESS_VIOLATION.
It should be clearly noted that the first VEH needs to handle every exception that is caused:
By the hooking dll
By us doing vectored syscalls
The second "VEH", which we will manually add, will point to our C2 shellcode instead. We should only reach it once, by causing an exception the first VEH will not handle.
Here's a flowchart to better visualize this:
Now when we trigger an INT_DIVIDE_BY_ZERO, the first VEH will not handle it. When RtlpCallVectoredHandlers walks the linked list and calls our second "VEH" to check if it's programmed to handle the exception, it will execute our C2 shellcode instead.
Now running in an exception, the first VEH will handle all GUARD_PAGE exceptions by returning EXCEPTION_CONTINUE_EXECUTION. This is critical, as executing our second VEH again would be fatal.
The binary in the video is just a less shitty private version of the certified hood classic SentinelBruh with some modifications. Pretty much default Havoc was used.
It should also be noted that payloads should be tested thoroughly before executing them in an exception. When shellcode is run in an exception, the stack would be something like this:
However if you are using stack spoof or similar with CobaltStrike, your stack could look like this: (this is after overwriting the FIRST VEH function pointer and causing an exception):
(Obviously different configuration will play a big role, this is just an example what could happen)
This is the stack after overwriting the SECOND VEH function pointer (with the same config):
Stack after executing a demon payload as can be seen in the video:
(Ekko using "jmp rbx" and stack duplication)
The biggest drawback of running a C2 agent in an exception is the stability: inline-ExecuteAssembly/dotnet inline-execute can easily crash your beacon/demon. Same thing can happen when executing BOFs.
When I did CRTL, I only used stageless runners based on the VEH overwrite abuse primitive (I hated that webproxy). The biggest downside to it was that with my config I couldn't inline execute C# tooling, as that would've simply killed my beacon. However, running a beacon in an exception seemed to have its benefits - I was able to use execute-assembly, inject etc without ever needing an inject kit.
Ghetto Syscalls
One additional thing to note about Vectored Exception Handling is the possibility of registering "callbacks".
When a Vectored EXCEPTION Handler is set up, it is also possible to register a Vectored CONTINUE Handler. A VCH is somewhat similar to a VEH: function prototypes are identical and we have the privilege of updating our CONTEXT. The only difference is that all VCHs are called by RtlpCallVectoredHandlers AFTER a VEH has returned EXCEPTION_CONTINUE_EXECUTION (RtlpCallVectoredHandlers returned TRUE).
When RtlpCallVectoredHandlers is called again, this time with the value of 1 in RBX, RAX and R8, we can see it indirectly reference the VCH list:
For reference:
We can define our continue handler like so:
And then register it:
We can safely use AddVectoredContinueHandler since, unlike AddVectoredExceptionHandler, it isn't hooked. By default, both RtlAddVectoredExceptionHandler and RtlAddVectoredContinueHandler simply execute a single instruction and then jmp to RtlpAddVectoredHandler.
We can quickly confirm our VCH function is getting called by trying to execute some Havoc shellcode.
Given the fact that a GUARD_PAGE exception is only thrown on some specific memory access, it can be challenging, if not impossible, to get all the parameters needed for a syscall passed to us within registers/stack. Because of this, all of the needed parameters are placed on the heap and accessed via a global pointer. A simple global counter is also used to determine if we need to update the registers/stack, and if so, with what parameters.
This updated ghetto VCH will simply allocate RWX memory with NtAllocateVirtualMemory and create a thread using NtCreateThreadEx.
This is obviously shitpost tier code, please don't do something like this unironically
Now when triggering a GUARD_PAGE exception on purpose, our VCH gets called. It is important to tease the EDR just enough but at the same time not too much so we don't get nuked.
The EDR's VEH will go over all of our stuff and return EXCEPTION_CONTINUE_EXECUTION as it didn't find anything suspicious enough. However, little does the EDR know that it just handed us the privilege of modifying the CONTEXT of our thread in a VCH.
In this video, PEB and Havoc were slightly modified as to not trip any additional page guards
So if you have some ultra suspicious syscall to make, you can manually trigger an exception an EDR's VEH checks for and then just hope it returns EXCEPTION_CONTINUE_EXECUTION.
After you are done, remove the continue handler and continue like nothing happened.
Process Injection - VEHxPool
The following is a small sample from one of my disbanded projects called VEHxPool. The whole idea of VEHxPool was to target msedge (since it registers a VEH by default), and to combine the VEH overwrite abuse primitive with various PoolParty related process injection techniques.
Both of these examples are rather "cursed" - After doing mapping, the region is clearly visible, but after stealing a worker thread with the help of an exception, the region simply disappears and a new suspicious private region appears in our target instead.
This only gets annoying when you need to make use of your allocated VEH later on, but this can be easily fixed by allocating separate memory only meant for the VEH.
Dubious Completion - I/O Ports
We start off by getting all of the open handles on the system, and then duplicating the first IoCompletion handle belonging to our target msedge instance. We can quickly filter by ObjectTypeIndex and UniqueProcessId:
Most ObjectTypeIndex values depend on Windows version
After we have successfully duped the handle, we can do some mapping. The shellcode we throw into mapped memory should hold both the VEH function and C2 payload for now:
We can use the following VEH function for now:
Mapping was chosen as we can then easily update our placeholder address in memory.
After our initial setup, we can overwrite the VEH in msedge:
Now the only thing left is to actually use our duped handle.
Instead of allocating a TP_DIRECT struct in our target, setting its Callback member to our C2 payload location and sending a pointer to said struct, we can instead send an arbitrary address pointing towards, for example, some function in ntdll.
Now when we send the completion packet, an EDR will only see addresses backed to disk/random values (if they even check them). When a worker thread starts to deal with our "packet", it will instead throw an exception, causing the execution flow to be redirected to our C2 shellcode.
This method is rather unreliable overall; it seems to require somewhat specific delays and hitting timings. I didn't test this against S1, but here's the memory stuff I mentioned earlier:
HLT! You're Coming with Me
Instead of overwriting the StartRoutine of a worker factory with our C2 payload, we can overwrite a single instruction to redirect a newly spawned thread to our code instead (single byte trampoline).
By default, the StartRoutine of worker factories is TppWorkerThread, so depending on which option you choose (discussed shortly), duplicating a TpWorkerFactory handle might not be needed, as we can get the aforementioned StartRoutine address in our local process.
We can simply overwrite the push rdi instruction with something that will cause an exception in the newly spawned thread. For this, the hlt instruction was chosen, as the opcode for it is a single byte (0xf4). We will also update our VEH to redirect execution flow to our code when an EXCEPTION_PRIV_INSTRUCTION is thrown:
As this example is more stable compared to the previous one, it was also tested against S1. The only thing we have to do differently is to allocate a separate memory region for the VEH to ignore PAGE_GUARD exceptions our payload will trigger later on. Here's another schizo flowchart (MEM #2 gets moved to private memory, but you know that already):
And now, we have two options:
Wait until a new thread naturally spawns, and use synchronization objects so we can revert the startroutine when it occurs
Duplicate a TpWorkerFactory handle, change memory permissions to RWX, increment the TotalWorkerCount of our target to force the immediate creation of a new worker thread, and then revert the startroutine (we'll go for this)
After stealing a thread, we can easily revert the startroutine as the modified opcode acts like a trampoline in our case.
One big thing to note when hijacking a worker thread by forcing it to raise an exception is the cursed stack you will get:
MISC
Detecting a VEH Hooking EDR
One easy way of figuring out if you are against S1/CS is to simply check the Handler List. If FirstExceptionHandler is not pointing back to itself, you know that there are VEH hooks in play.
If you have the base address of ntdll, you can simply use static offsets so you don't need to use any APIs
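The "points back to itself" check is just the usual emptiness test for a circular doubly linked list. The sketch below models it on a local list instead of reading ntdll memory. The list_entry type mirrors the shape of the Windows LIST_ENTRY, and every function name here is hypothetical; treat it as an illustration of the check, not as code for locating the real list.

```c
#include <assert.h>
#include <stddef.h>

/* Same shape as the Windows LIST_ENTRY: a circular doubly linked list node. */
typedef struct list_entry {
    struct list_entry *flink;
    struct list_entry *blink;
} list_entry;

/* An empty list is a head whose links point back at the head itself. */
static void list_init(list_entry *head) {
    head->flink = head;
    head->blink = head;
}

/* Textbook tail insertion, only used here to populate the model list. */
static void list_insert_tail(list_entry *head, list_entry *entry) {
    entry->flink = head;
    entry->blink = head->blink;
    head->blink->flink = entry;
    head->blink = entry;
}

/* The detection check: non-zero when at least one entry is registered. */
static int handlers_present(const list_entry *head) {
    return head->flink != head;
}

/* Counts entries by following flink until we arrive back at the head. */
static size_t list_count(const list_entry *head) {
    size_t n = 0;
    for (const list_entry *e = head->flink; e != head; e = e->flink)
        n++;
    return n;
}

/* Demo helpers: build a list with up to four entries and inspect it. */
static int demo_handlers_present(size_t n) {
    list_entry head, nodes[4];
    list_init(&head);
    for (size_t i = 0; i < n && i < 4; i++)
        list_insert_tail(&head, &nodes[i]);
    return handlers_present(&head);
}

static size_t demo_count(size_t n) {
    list_entry head, nodes[4];
    list_init(&head);
    for (size_t i = 0; i < n && i < 4; i++)
        list_insert_tail(&head, &nodes[i]);
    return list_count(&head);
}
```

With zero entries the head's flink equals the head and handlers_present reports nothing registered; as soon as one entry is linked in, the check flips.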
After you have determined that there are hooks present, highly specific checks can be made to determine the exact EDR vendor.
Anti Anti-Debug Club
When dynamically analyzing malware that employs VEHs to detect HWBPs etc, it is possible to simply overwrite its VEH as demonstrated.
Stuff
In all of the examples that dealt with VEH overwriting or similar, ntdll itself wasn't touched, as the location of the list is R/O by default. It is possible to adjust the pointer to the first handler by changing permissions etc, but this can have funky side effects: after simply changing permissions from R -> RW -> R, doing vectored syscalls later on by causing an ACCESS_VIOLATION would force our sillyware to execute NtTerminateProcess instead.
But something like this will work (just don't forget to flip the ProcessUsingVEH bit in PEB):
C#
When you write anything useful in C#, the 64bit version of clr.dll is loaded, which in turn will give you access to a free VEH to abuse. Wanted to include some C# but it got rather cursed. Just don't try to get vectored syscalls working in C#.
Outro
Hope this small blog helped you understand VEHs better. As stated in the intro, this should not be considered an attack on S1/CS.
The main reason why I wrote this is that there aren't many VEH-related articles around.
The second reason is that when developing my own EDR driver, I quickly realized just how bad VEH hooking is. In addition to the "normal" performance degradation that comes with VEH hooking, adding VEH tampering checks would cause even further degradation, making it an unviable option from an actual EDR standpoint. I get it from an EDR dev perspective; it's not hard to implement, and it gets the job done. I understand that detection engineering is a pain in the ass and that Windows itself is malware, but please, ditch VEH hooks, come up with better ideas, and move to the kernel instead.
If you made it this far, hope you learned something new. Hope you can now write cool sillyware by combining VEH stuff with other things.
Further Reading
Exploiting Windows Thread Pools
Shoutout to the Homies
l1inear, 0xTriboulet, 5pider, VirtualAllocEx, Urien, mrd0x, NULL