Search
Blog article

Loader Dev. 3 – Evading userspace hooks

Introduction

Evading userspace hooks (Loader Dev. 3)

In this post, we will go over techniques to avoid hooks placed into memory by an EDR.

Disclaimer

These posts are written to provide information to other professionals of the discussed topics.

The techniques used here are not novel and were documented by other people before. Therefore, the benefits of these posts for threat actors will likely be minimal. Nonetheless, we decided against releasing a full PoC implementation and will instead only provide code snippets as part of the posts. All credit should go to the people who did the original research on the techniques used.

There will also be an accompanying blog post on detecting or hunting for malware using the discussed techniques to enable readers to protect their environment.

Hooks

We will start by discussing how hooks work. This information will serve as background knowledge for the later sections in which we will explain how to circumvent these hooks.

IAT Hooks

One way to hook functions is to replace the address of the function in the Import Address Table (IAT) with the address of the hooking logic. If our executable then uses the IAT to resolve a function, it will invoke the hooking logic instead. However, this is not that relevant for our purposes here, as we are not using the IAT for our function resolution. Our understanding is that most EDRs are preferring trampoline hooks over IAT hooks. We will discuss trampoline hooks in the next section.

Inline/Trampoline hooks

Instead of hooking the IAT, we can also overwrite the start of a function with a jump to our hooking logic. If the function is then called, the jump to our logic is executed first and we can analyze the function call before handing over the execution to the original logic. This is shown in the following figure:

Kolja Grassmann

Consultant

Category

Date

Navigation

Hooks in the real world

So let’s have a short look at how this looks in the real world. For this, we will use an unnamed EDR, which hooks certain functions. One hooked function is NtCreateThread. Let us first look at this function without the hook in place:

With the hook in place the function looks as follows:

As we can see, there is an unconditional jump added at the start of the functions. In this case, the remaining space is filled with int3 instructions. Note that there are more instructions in the second screenshot, as instructions can differ in size on Intel architectures. The jump will allow the EDR to analyze our function call and its argument before doing the actual NtCreateThread syscall.

Avoiding hooks

Now that we have covered how EDR products might hook certain functions, we will start discussing how to avoid these hooks.

Moving to lower-level functions

In many cases, the functions documented by Microsoft and used by most programmers like VirtualProtect are a wrapper to lower-level functions and provide a more convenient and stable interface for them. With VirtualProtect, the actual call stack when interacting with the kernel, which changes the protections, is as follows:

As we can see, we could also use the NtProtectVirtualMemory function directly instead of VirtualProtect. Note that we are sacrificing some convenience here and that undocumented functions might be changed by Microsoft at any point.

In the past, some EDRs did only hook functions at a higher level. Therefore, it was possible to avoid hooks by calling lower-level functions directly. By now, in most cases, hooks are also placed in ntdll.dll, which is the interface between user space and the kernel. Therefore, it is generally no longer possible to avoid hooks by moving to lower-level functions.

Loading a second copy

Another strategy to avoid hooks is to load another copy of the used DLL into memory and then use the second copy for our calls. However, this has the disadvantage that loading the copy into memory alone could be detected and deemed malicious. Therefore, if we do not have a way to do it without encountering hooks, this might not work. It also leaves the obvious indicator of compromise, that there are two loaded copies of ntdll.dll.

Direct syscalls

Functions in ntdll.dll are mostly just a wrapper to syscall instructions, which place the appropriate syscall number into the rax register and hand over control to the kernel. We can see this in the following screenshot, which shows the NtProtectVirtualMemory function:

We can replicate this by using our syscall instruction with the right syscall number. Therefore, our next step will be finding the syscall number for the syscall we want to do.

1.1.1.1. Hardcoding the syscall numbers

One way to gather the syscall numbers is to look at the ntdll.dll file and create a mapping between the function we want to invoke and the syscall number. Unfortunately, the syscall numbers depend on the build of Windows and we can therefore only hardcode them if we know the specific version of Windows we are targeting. This approach is used e.g. by SysWhispers.

1.1.1.2. Parsing ntdll.dll dynamically

We could also parse the syscall numbers from the ntdll.dll present on the system, which would then be the correct numbers for the targeted build version. One way to do this is to use the copy already mapped into memory to retrieve the syscall number, as is done by HellsGate. Here, however, we again face the problem that this copy might be hooked and therefore might no longer contain our syscall numbers.

We could also retrieve a clean copy of ntdll.dll from disk. However, opening a fresh copy of ntdll.dll might be suspicious and could be detected by the EDR, as we are using the hooked logic to open the file.

As with unhooking, at which we will take a closer look later in this post, an alternative way here could be to create a suspended process and read the clean copy of e.g. ntdll.dll from its memory before an EDR had the opportunity to place its hooks. Again, the main issue here is that the functions we would use are potentially hooked, which could lead to us being detected.

1.1.1.3. Using function order in memory

Fortunately, we can also find the syscall numbers dynamically by relying on the order of the syscalls in memory. The syscall numbers are sequential in memory, as can be seen in the following screenshot:

As you can see, the functions following each other in memory also have sequential syscall numbers (0x4c-0x50).

There are two strategies, that we are aware of, that use this order to retrieve the syscall numbers. The first one is Halo’s Gate, which we learned about in the course material from Sektor7. This strategy is basically the same as with HellsGate, but instead of parsing the syscall number from the copy in memory and stopping if a hook overwrote the syscall number, we are continuing our search in the function above and below the function for which we want to retrieve the syscall number. The offset to these functions is always 32 bytes and if we find their syscall numbers, we can use the current overall offset used in our search to calculate the syscall number we are searching for.

One disadvantage of Halo’s Gate is that we still need to find a syscall number in memory. While this is likely possible as not all functions will be hooked, it could still be prevented by an EDR that hooks all functions in ntdll.dll. Instead, we can use the method used by FreshyCalls (this is a fork as we did not find the original repo). The basic idea here is that we sort all function names by their address. Afterward, we can search this list for our function name and will be able to use the index into our list of sorted function as the syscall number. As we are not relying on reading the syscall address from memory, this should work even if they have all been removed from memory as long as the order does not change (which is not a given, as Microsoft could change this with every update). As this is one method we decided to port to C, we will cover this in more detail here.

Like FreshyCalls, I defined a struct that contains the mapping between the syscall name and address:

// Struct holding the syscall name and its address
struct SYSCALL_ENTRY {
    char* name;
    DWORD address;
};
// Struct holding the number of found syscalls, as well as the ntdll.dll base address and an array of SYSCALL_ENTRY structs
struct SYSCALL_LIST {
    DWORD size;
    char* pBaseAddress;
    struct SYSCALL_ENTRY entries[MAX_SYSCALL_ENTRIES];
};

We initially fill this with all functions in ntdll.dll (see part 2 for a more detailed description) that start with nt, but not with ntdll (ignoring case):

DWORD* Functions = (DWORD*)(pBaseAddr + pExportDirAddr->AddressOfFunctions);
DWORD* Names = (DWORD*)(pBaseAddr + pExportDirAddr->AddressOfNames);
WORD* Ordinals = (WORD*)(pBaseAddr + pExportDirAddr->AddressOfNameOrdinals);
DWORD j = 0;
for (DWORD i=0; i < pExportDirAddr->NumberOfNames; i++) {
    char* FunctionName = pBaseAddr + Names[i];
    if([...]) { // Starts with nt, but not ntdll
        syscall_list.entries[j].name = FunctionName;
        syscall_list.entries[j].address = Functions[Ordinals[i]];
        j++;
    }
}
syscall_list.size = j;
syscall_list.pBaseAddress = pBaseAddr;

Finally, we will sort all the entries by their address:

for (unsigned long i = 0; i < syscall_list.size - 1; i++) {
    for (unsigned long j = 0; j < syscall_list.size - i - 1; j++) {
        if (syscall_list.entries[j].address > syscall_list.entries[j + 1].address) {
            // Swap entries.
            struct SYSCALL_ENTRY TempEntry = {};
            TempEntry.name = syscall_list.entries[j].name;
            TempEntry.address = syscall_list.entries[j].address;
            syscall_list.entries[j].name = syscall_list.entries[j + 1].name;
            syscall_list.entries[j].address = syscall_list.entries[j + 1].address;
            syscall_list.entries[j + 1].name = TempEntry.name;
            syscall_list.entries[j + 1].address = TempEntry.address;
        }
    }

}

The index at which our function is located is the syscall number that we are searching.  Therefore, we can iterate over our structure as follows and return the syscall number when we find our function:

for (DWORD i=0; i < syscall_list.size; i++) {
    if ( strcmp(syscall_name, syscall_list.entries[i].name)== 0) {
        return i;
    }
}

1.1.1.4. Using Vectored Exception Handling

Another option would be to call our syscall using non-malicious arguments with the hooks in place so that no detection is triggered. Before doing the call, we set a breakpoint at the syscall instruction and use Vectored Exception Handling to handle this breakpoint. Even if the EDR has removed the syscall number from the ntdll.dll memory, it will be placed in EAX before the syscall. So, when our exception is triggered the right syscall number will be in EAX and we can retrieve it in our exception logic. This is described by rad98 in this blog post.

1.1.1.5. Doing the syscall

Using the syscall number, we can replicate the behavior of the function present in ntdll.dll. For this, SysWhispers does ship a syscall instruction. This, however, seems like an easy pattern that AV software could check for, as there is no reason to use this instruction in an executable. In our understanding, it should only be present in ntdll.dll. Instead, we can use a gadget from ntdll.dll, which our code jumps to when performing the actual syscall as done by FreshyCalls. This has the additional advantage that the call originates from ntdll.dll, which could be beneficial if the call stack is checked by an EDR in the kernel.

For this purpose, we implemented logic that searches ntdll.dll for a syscall instruction. We can start at the address of our target function and then search for a syscall instruction as follows:

for(int i = 0; i < 200; i++) {
    if(*( function_base_address + i) == 0x0F && *(function_base_address + i +1) == 0x05) {
        return (unsigned char*) (function_base_address + i);
    }
}

This is not a clean solution, as we are relying on the fact that the instructions are present either in our function or in one of the functions located directly afterwards. It would be cleaner to search specifically within our function and then start at the beginning of the .text segment in order to find a syscall instruction if there is one present. Changing this is still on our TODO list.

As the loader uses MinGW, we can use the following code to store our syscall gadget and the syscall number in the required registries:

register unsigned char* syscall_gadget asm("r11") = tmp_syscall_gadget;
register unsigned int syscall_number asm("rax") = tmp_syscall_number;

Afterward, we can use the following assembly stub to execute the syscall:

// At the beginning of our function we ensure, that all arguments are saved on the stack (assuming stdcall calling convention)
// Here we put them into registers again, as our logic will likely have clobbered the original values
movq 0x10(%rbp), %rcx // restore first argument
movq 0x18(%rbp), %rdx // restore second argument
movq 0x20(%rbp), %r8 // restore third argument
movq 0x28(%rbp), %r9 // restore fourth argument. Everything after this is passed on the stack anyway.

mov %rcx, %r10\n\t” // replicate normal syscall stub behaviour
mov %rbp,%rsp \n\t” // get rid of local variables, which we no longer need
pop %rbp \n\t” // restore base pointer
jmp %r11 \n\t” // jmp to our gadget

This logic makes some assumptions on how our compiler implements the function (e.g. that rbp is stored on the stack). We verified that this is indeed the case in our implementation. However, future versions of the compiler or different implementations might need some adjustments here.

As we are directly calling the syscall from our code, which will not have been hooked by the EDR, this avoids any hooks that might have been placed in user space.

Unhooking

Using direct syscalls is often inconvenient and might lead to a lot of maintenance, as these interfaces might change at any time. Therefore, we should keep our usage of direct syscalls to a minimum. Furthermore, the payload we load will likely use the Windows APIs, which an EDR will still have hooked at this point.

The hooks will likely be placed by the EDR during the initialization of our process or when a new library is loaded. As discussed before, these hooks are most likely trampoline hooks, which are placed at the beginning of the targeted functions. As the functions reside in userspace, we can overwrite them ourselves, too. This means, that we can revert the changes made by the EDR to the function instruction, which is basically what we will be doing when unhooking our process.

IAT Unhooking

As discussed initially, one way to hook functions is by overwriting function addresses in the IAT. Because this seemed less relevant, we decided against integrating this for now. If you are searching for inspiration, have a look at this project, which implements IAT unhooking. There is also an accompanying blog post, which we highly recommend, that explains what we are doing. To summarize the post: We would iterate over the IAT and recalculate the function addresses by looking at the Export Address Table (EAT) of the DLL implementing the function. If the function address differs, we then overwrite the presumably hooked address with our newly calculated one.

Removing inline hooks

To remove inline hooks, we first need access to a clean version of the DLL. We can retrieve a clean version of the DLL from the original file on disk, as the DLLs are only hooked during runtime. Another option would be to start a suspended process and retrieve a clean version of the loaded DLLs before the EDR had the opportunity to hook them. For DLLs included as \KnownDlls\, it is also an option to call NtOpenSection to get a section handle, which can then be used to map the DLL into our process. The \KnownDlls\ entries are a caching mechanism for the more important DLLs used by the system, this technique works e.g. for ntdll.dll.

After we have a clean copy of our target DLL, we then use it to remove any hooks from the .text section of the DLL loaded by our process. The simplest way to do this is to overwrite the complete text section with the clean version. This works well for ntdll.dll; however, I am not sure if it is the best approach for other DLLs. A more fine-grained approach is to check if a hook is in place for each function and then only overwrite the hook if that is the case.

In our case, the implementation was heavily inspired by this code, as this seemed to be the simplest way to archieve the unhooking using only direct syscalls. It uses the \KnownDlls\ path and checks for a jmp at the beginning of each function to evaluate if a certain function is hooked. If this is the case, the start of the function is overwritten with the instructions from the clean version of the DLL. I decided to only unhook kernel32.dll, kernelbase.dll and ntdll.dll. In a future version of the loader, it might be nice to unhook all loaded DLLs. However, we suspect that with these three DLLs, most of the hooks encountered in practice should be covered.

After we have executed this, we should no longer have hooks in our loaded DLLs and should therefore be harder to detect even when using functions provided by the loaded DLLs instead of direct syscalls. Note, however, that the removal of hooks itself might be an indicator of malicious intent and therefore we need to evaluate whether unhooking makes sense in our use case.

Dynamic unhooking

While looking into this topic, we found an implementation of dynamic unhooking by @mgeeeky. The idea here is that instead of unhooking the DLLs we consider relevant at the beginning of our execution, we integrate the unhooking logic into our dynamic function resolution logic (see part 2 of this series). This way we can dynamically unhook only the functions we use, which should be much stealthier. This will make it harder to check if the hooks are still in place, as most hooks will indeed be. Therefore, this seems like a great idea; however, as our payload is not aware of our dynamic function resolution logic, this seems less relevant for developing a loader than e.g. for a custom C2 framework. To make this work within a loader, we would need to ensure that the payload uses this dynamic function resolution logic, which does not seem trivial and which we therefore decided against.

Kernel level detections

The hooks placed by the EDR, which we discussed previously, are located in user space. There is, however, also the possibility that the logic detecting us resides in kernel space. This logic could then e.g. detect our direct syscalls or our function call after we removed the hooks.

Kernel Callbacks

Drivers can register callbacks for some events in the kernel, like the creation of a new process. An EDR that ships with a kernel driver could register such a callback and react to the event. There was a non-comprehesive list of kernel callbacks linked in this awesome series on C2 development.

ETW TI

Another component in the kernel that might still lead to detection is ETW TI. This is a component implemented by Microsoft and therefore heavily used by their EDR, while other EDRs are, to the best of my knowledge, just starting to use it. It is a version of ETW that is implemented in the kernel and logs information about events triggered by a process. I found this blog post helpful for gaining a bit more insight into ETW TI.

Call Stack Spoofing

One thing an EDR could look at to detect direct syscalls or the malicious use of functions is the call stack of the call. If the call stack does not contain the expected calls or contains suspicious addresses that, for example, are not backed by a file, this could lead to detection.

There are multiple projects to avoid this: There is, for example, an implementation by mgeeky, that places a 0 into the call stack of a sleeping thread to stop the unwinding process. There is also this blog post, which discusses spoofing a call stack using a new thread to make an unsuspicious syscall. An other implementation is a part of AceLdr, which uses a jmp gadget to avoid calls from a suspicious location.

EDR Sandblast

EDR Sandblast is a tool that uses a vulnerable driver to execute code in the kernel. It can then remove any kernel callbacks and also deactivate ETW TI. This tool is quite powerful and has other features as well. However, Microsoft is starting to lock down the loading of drivers by requiring them to be signed and by introducing a blacklist for vulnerable drivers. Therefore, if the target system is sufficiently hardened, we might need a custom signed driver or an exploitable zero-day in another driver to use a similar approach.

Summary

In this post, we took a quick look at how hooks work. We then discussed how to evade them using direct syscalls. Here we covered different options for resolving syscall numbers. Afterwards, we discussed unhooking, which will be useful to ensure that our payload stays undetected during execution, as these hooks would likely allow an EDR solution to recognize some of our payloads by their call patterns. In the next post, we will discuss evading AMSI and ETW to ensure that our payload is even harder to detect during runtime.

Further blog articles

Blog

Loader Dev. 5 – Loading our payload

May 10, 2024 – In this post, we will finally cover loading our actual payload. As discussed at the beginning of this series, our loader should be able to load shellcode and C# assemblies as well as PEs. The actual mode will be chosen using an argument to the python script used for compilation.

Author: Kolja Grassmann

Mehr Infos »
Blog

Loader Dev. 4 – AMSI and ETW

April 30, 2024 – In the last post, we discussed how we can get rid of any hooks placed into our process by an EDR solution. However, there are also other mechanisms provided by Windows, which could help to detect our payload. Two of these are ETW and AMSI.

Author: Kolja Grassmann

Mehr Infos »
Blog

Loader Dev. 1 – Basics

February 10, 2024 – This is the first post in a series of posts that will cover the development of a loader for evading AV and EDR solutions.

Author: Kolja Grassmann

Mehr Infos »

Do you want to protect your systems? Get in touch with us.

Search
Search