Setting up a basic IC
Now that we have partially reversed KiSetupForInstrumentationReturn and NtSetInformationProcess we know the following things:
- An IC can be set from user mode with NtSetInformationProcess.
- ProcessInformationClass needs to be set to 0x28.
- If we want to set an IC on another process, we need to have the SeDebugPrivilege.
- When the IC is executed, r10 will hold the original rip.
For a successful NtSetInformationProcess call, the following struct needs to be passed as ProcessInformation parameter. We will also need the type definition of NtSetInformationProcess.
    struct PROCESS_INSTRUMENTATION_CALLBACK_INFORMATION {
        ULONG Version;
        ULONG Reserved;
        PVOID Callback;
    };
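The type definition of NtSetInformationProcess is referred to above but not shown. A minimal sketch of what it could look like follows. Every name ending in Sketch, the constant name and the fixed-width stand-in types are assumptions chosen so that the snippet compiles without Windows.h; in a real project you would use HANDLE, PROCESS_INFORMATION_CLASS, PVOID and ULONG from the Windows headers instead.

```cpp
#include <cstddef>
#include <cstdint>

// Stand-in for PROCESS_INSTRUMENTATION_CALLBACK_INFORMATION.
struct InstrumentationCallbackInfoSketch {
    std::uint32_t Version;   // set to 0
    std::uint32_t Reserved;  // set to 0
    void*         Callback;  // address of the assembly bridge
};

// Information class used for instrumentation callbacks (0x28 == 40).
constexpr std::uint32_t kProcessInstrumentationCallback = 0x28;

// Assumed prototype: NTSTATUS(HANDLE, PROCESS_INFORMATION_CLASS, PVOID, ULONG).
// x64 has a single calling convention, so NTAPI is omitted in this sketch.
using NtSetInformationProcessSketch_t = std::int32_t (*)(
    void* process_handle,
    std::uint32_t information_class,
    void* information,
    std::uint32_t information_length);

// On a 64-bit target the struct is 16 bytes with Callback at offset 8.
static_assert(sizeof(void*) != 8 || sizeof(InstrumentationCallbackInfoSketch) == 16);
static_assert(sizeof(void*) != 8 || offsetof(InstrumentationCallbackInfoSketch, Callback) == 8);
```

The static_asserts only document the expected x64 layout; they are not required for the call itself.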
Only the Callback member matters to us, the other two need to be set to 0. You can try setting Callback to a function pointer; however, you will not be very successful as the stack was not set up for a function call. The Callback member should instead point to some assembly code. This assembly code, which we will call the bridge, needs to do the following:
- Save the registers
- Set up a function call
- Restore stack and registers after function call
- Jump to r10 as that holds the actual address the code should resume at.
Depending on what you want to use your IC for, you will most likely trigger syscalls from within the IC itself. This would cause an infinite recursion, as the IC would be called again when the syscall is triggered; thus, we will also need an option to disable the IC for the current thread.
Let’s try setting up a very simple IC that triggers a breakpoint on a kernel-to-user-mode return.
Setting the IC
The following is our example code to set an IC. You will of course need to have a function definition for NtSetInformationProcess.
    #include <print>
    #include <Windows.h>

    // Assembly bridge, defined in the NASM file further below.
    extern "C" void instrumentation_adapter();

    // Called by the bridge on every kernel-to-user-mode return.
    extern "C" void instrumentation_callback() {
        __debugbreak();
    }

    int main() {
        PROCESS_INSTRUMENTATION_CALLBACK_INFORMATION instrumentation_info{};
        instrumentation_info.Callback = reinterpret_cast<void*>(&instrumentation_adapter);

        const auto nt_set_info_proc = reinterpret_cast<NtSetInformationProcess_t>(
            GetProcAddress(GetModuleHandleA("ntdll.dll"), "NtSetInformationProcess"));
        if (!nt_set_info_proc) {
            std::println("Could not resolve NtSetInformationProcess");
            return -1;
        }

        // 0x28 is the information class for instrumentation callbacks.
        const auto status = nt_set_info_proc(GetCurrentProcess(),
            static_cast<_PROCESS_INFORMATION_CLASS>(0x28),
            &instrumentation_info, sizeof(instrumentation_info));
        if (status) {
            // Cast so that failure codes print as plain hex instead of a negative number.
            std::println("NtSetInformationProcess returned {:x}", static_cast<unsigned long>(status));
        } else {
            std::println("Successfully installed instrumentation callback");
        }
        return 0;
    }
extern "C" is used to disable C++ name mangling and instead use C-style linkage.
With the line extern "C" void instrumentation_adapter(); we declare our not-yet-written assembly bridge so that the linker can resolve it.
instrumentation_callback is the function we want to call through our assembly bridge. For now, we just set a breakpoint there, as we will not be implementing a flag to avoid recursion just yet.
Writing the assembly bridge
For writing the assembly bridge, we’ll be using NASM. If you are using MASM or another assembler, you will of course need to adjust the assembly accordingly.
We will start by pushing the registers, setting up the function call, calling it and then undoing our changes. After that, we will jump to r10 to continue the execution flow. There are multiple ways to save the current registers: you can push them to the stack, save them to a structure or call Windows functions that do it for you. Please note that the following snippets do not save, for example, the floating-point registers.
    extern instrumentation_callback

    section .code
    global instrumentation_adapter

    instrumentation_adapter:
        ; save flags and general-purpose registers
        pushfq
        push rax
        push rbx
        push rcx
        push rdx
        push rdi
        push rsi
        push r8
        push r9
        push r10
        push r11
        push r12
        push r13
        push r14
        push r15

        ; set up the call (0x20 bytes of shadow space)
        push rbp
        mov rbp, rsp
        sub rsp, 0x20
        call instrumentation_callback
        add rsp, 0x20
        pop rbp

        ; restore registers in reverse order
        pop r15
        pop r14
        pop r13
        pop r12
        pop r11
        pop r10
        pop r9
        pop r8
        pop rsi
        pop rdi
        pop rdx
        pop rcx
        pop rbx
        pop rax
        popfq

        ; resume at the original rip
        jmp r10
By running the program with a debugger attached, you should now hit the breakpoint in the C++ code. This means our function is called correctly. However, we obviously want to do more with our callback than trigger a breakpoint. For that, we need a check that avoids infinite recursion, as the IC is executed for every syscall, even if the syscall was made by the IC itself.
This flag should be thread-local, as otherwise we would not catch syscall executions in other threads while our IC in one thread is executing.
For this purpose, we’ll be misusing the legacy member InstrumentationCallbackDisabled of the Thread Environment Block (TEB). This is, at least in x64 versions, no longer used. There are smarter ways of implementing such a check, for example with Thread Local Storage, as using the InstrumentationCallbackDisabled member is an obvious giveaway to EDRs/ACs that something weird is going on.
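The Thread Local Storage alternative mentioned above could look roughly like the following C++ sketch. It is not the approach used in the rest of this post: the names g_in_callback, ReentrancyGuard, fake_syscall and guarded_callback are made up for illustration, and fake_syscall merely simulates the re-entry that a real syscall would cause with an IC installed. Whether thread_local is usable in your setup (for example in a manually mapped DLL) is something you would need to verify yourself.

```cpp
#include <cstdint>

// Hypothetical per-thread flag replacing TEB->InstrumentationCallbackDisabled.
thread_local bool g_in_callback = false;

// Counts how often the guarded body actually ran (for demonstration only).
thread_local int g_body_runs = 0;

// RAII helper: sets the flag on entry and clears it on every exit path.
struct ReentrancyGuard {
    bool entered;
    ReentrancyGuard() : entered(!g_in_callback) {
        if (entered) g_in_callback = true;
    }
    ~ReentrancyGuard() {
        if (entered) g_in_callback = false;
    }
};

std::uint64_t guarded_callback(std::uint64_t return_val, int nested_syscalls);

// Stands in for a syscall made inside the callback: with an IC installed,
// returning from it would re-enter the callback on the same thread.
void fake_syscall(int nested_syscalls) {
    guarded_callback(0, nested_syscalls);
}

std::uint64_t guarded_callback(std::uint64_t return_val, int nested_syscalls) {
    ReentrancyGuard guard;
    if (!guard.entered)
        return return_val;  // already inside the callback: do nothing
    ++g_body_runs;
    for (int i = 0; i < nested_syscalls; ++i)
        fake_syscall(nested_syscalls);
    return return_val;
}
```

Because the flag is thread-local, a callback running on one thread does not suppress callbacks on other threads, which is the same property the TEB member provides.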
If you look at the structure of the TEB, you will see InstrumentationCallbackDisabled is located at 0x1b8. The idea is that once the IC is triggered, InstrumentationCallbackDisabled gets set to 1 (true) and then our C++ function is executed. If that function triggers syscalls, they will not call the function again, because our assembly bridge first checks whether InstrumentationCallbackDisabled is set to 1 (true). If it is, the bridge simply continues execution. Once our C++ function is over and the assembly bridge has restored the registers, the flag is cleared.
To do this, the following assembly can be used. The first part before the dots is meant to be added right after the pushfq, and the bottom part is meant to replace everything after pop rax.
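        mov rcx, [gs:0x30]      ; TEB (NASM puts the segment override inside the brackets)
        add rcx, 0x1b8          ; TEB->InstrumentationCallbackDisabled
        cmp byte [rcx], 1
        je _ret                 ; flag already set: we are inside the IC, skip the callback
        mov byte [rcx], 1       ; mark the IC as running on this thread
        […]
        mov rcx, [gs:0x30]      ; TEB
        add rcx, 0x1b8          ; TEB->InstrumentationCallbackDisabled
        mov byte [rcx], 0       ; allow the IC to run again
    _ret:
        popfq
        jmp r10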
The careful eye might’ve noticed something: with this code we are no longer backing up and restoring rcx. Why’s that?
If you attach a debugger to a program, place a breakpoint on the instruction after a syscall and trigger it, you will see that rcx holds the address of the instruction after the syscall. If you do the same with an IC set, you will see that rcx holds the address of the IC instead. If you wanted to hide the existence of your IC, this would obviously be counterproductive. Fixing this is not part of this article and will not be covered here.
We would also recommend checking the value of r10 with and without an IC set.
Logging and spoofing syscalls
Let’s recap: by now we can execute our own C/C++ function after every syscall and make syscalls from within it. This is cool; however, we can’t react to specific syscalls, as our C++ function does not have access to the address of the executed syscall. Let’s fix this and, while we are at it, pass even more parameters that will be useful to us. In total, we are planning to add three parameters: the original stack pointer, the address of the syscall that was executed and the return value. Why the original stack pointer is interesting will be explained shortly.
As mentioned before, there are different ways of saving the registers and different ways of passing information to your function. If you saved the registers in, for example, a CONTEXT structure, you could just pass that to your IC.
Let’s first change our function definition to add the three parameters. Additionally, it would be nice to change the return value of syscalls.
As specified in the Windows x64 calling convention, return values are passed in the rax register. When a syscall is made and the IC is triggered, rax holds the return value of the syscall. By changing the return type of the instrumentation_callback function from void to uint64_t, we can overwrite the return value of the syscall by returning another value from our C++ code, because our return value replaces the one in rax. For this to work, the bridge must not restore the saved rax after the call.
After implementing those changes, the instrumentation_callback function looks as follows:
    extern "C" uint64_t instrumentation_callback(uint64_t original_rsp, uint64_t return_addr, uint64_t return_val) {
        __debugbreak();
        return return_val; // pass the original syscall result through unchanged
    }
Now we need to adjust the assembly bridge. We can use rcx to store the original stack pointer, as we do not need to back up rcx because of the reasons mentioned before.
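    extern instrumentation_callback

    section .code
    global instrumentation_adapter

    instrumentation_adapter:
        mov rcx, rsp            ; original stack pointer, first argument
        pushfq
        push rcx
        mov rcx, [gs:0x30]      ; TEB (NASM puts the segment override inside the brackets)
        add rcx, 0x1b8          ; TEB->InstrumentationCallbackDisabled
        cmp byte [rcx], 1
        mov byte [rcx], 1       ; set the flag; mov leaves the flags from cmp untouched
        pop rcx
        je _ret                 ; flag was already set: skip the callback
        […]
        push rbp
        mov rbp, rsp
        sub rsp, 0x20
        ; rcx already contains the stack pointer
        mov rdx, r10            ; second argument: return address
        mov r8, rax             ; third argument: syscall return value
        call instrumentation_callback
        add rsp, 0x20
        pop rbp
        […]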
This should trigger the breakpoint placed in our C++ code and show that the parameters contain the correct values.
Logging syscalls
To log syscalls with their function name, we will use the dbghelp library, which you need to link against.
Additionally, the following code needs to get added to the start of main to allocate a console and initialize the symbol handler.
    […]
    if (!AllocConsole())
        return -1;

    FILE* fp;
    freopen_s(&fp, "CONOUT$", "w", stdout);
    freopen_s(&fp, "CONIN$", "r", stdin);
    freopen_s(&fp, "CONOUT$", "w", stderr); // there is no CONERR$ device; stderr shares CONOUT$

    SymSetOptions(SYMOPT_UNDNAME);
    if (!SymInitialize(reinterpret_cast<HANDLE>(-1), nullptr, TRUE)) {
        std::println("SymInitialize failed");
        return -1;
    }
    […]
The following instrumentation_callback function then prints out all the called function names, their address, the displacement from the function start and the return value.
    extern "C" uint64_t instrumentation_callback(uint64_t original_rsp, uint64_t return_addr, uint64_t return_val) {
        // SYMBOL_INFO is followed by a variable-length name buffer.
        std::array<byte, sizeof(SYMBOL_INFO) + MAX_SYM_NAME> buffer{ 0 };
        const auto symbol_info = reinterpret_cast<SYMBOL_INFO*>(buffer.data());
        symbol_info->SizeOfStruct = sizeof(SYMBOL_INFO);
        symbol_info->MaxNameLen = MAX_SYM_NAME;

        // Resolve the return address to the nearest symbol.
        uint64_t displacement = 0;
        if (!SymFromAddr(reinterpret_cast<HANDLE>(-1), return_addr, &displacement, symbol_info)) {
            printf("[-] SymFromAddr failed: %lu\n", GetLastError());
            return return_val;
        }

        printf("[+] %s+%llu \n\t- Returns: %llu\n\t- Return address: %llu\n",
            symbol_info->Name, displacement, return_val, return_addr);
        return return_val;
    }
This functionality is obviously the most useful if the project is a DLL and not an EXE, as it can then be injected into a process to see which syscalls the program triggers.
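The logging callback prints the displacement, return value and return address in decimal. NTSTATUS values and addresses are usually easier to compare against a debugger or documentation in hexadecimal, so a small formatting helper along the following lines might be handy. The helper name format_syscall_log is hypothetical and not part of the original code; it only builds the string and leaves the symbol lookup to the callback.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: builds one log line with all numbers in hexadecimal.
std::string format_syscall_log(const char* name, std::uint64_t displacement,
                               std::uint64_t return_val, std::uint64_t return_addr) {
    char line[512];
    std::snprintf(line, sizeof(line),
                  "[+] %s+0x%llx\n\t- Returns: 0x%llx\n\t- Return address: 0x%llx\n",
                  name,
                  static_cast<unsigned long long>(displacement),
                  static_cast<unsigned long long>(return_val),
                  static_cast<unsigned long long>(return_addr));
    return line;
}
```

Inside the callback, the result could be passed to printf("%s", ...) in place of the decimal format string.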
Spoofing syscalls
Let’s now start doing cool stuff with our IC: as ICs are the first code being executed in user mode after a syscall, we can spoof its return values from our IC.
For this example, our test program will be using OpenProcess to open a handle to another process. Our IC will then retrieve the opened handle from the stack, close it and then return ACCESS_DENIED.
Our IC only gets a callback to NtOpenProcess, which is called by OpenProcess, not to OpenProcess itself. Let’s look at the function definitions for both functions:
    HANDLE OpenProcess(
        [in] DWORD dwDesiredAccess,
        [in] BOOL  bInheritHandle,
        [in] DWORD dwProcessId
    );

    NTSTATUS NtOpenProcess(
        [out]          PHANDLE            ProcessHandle,
        [in]           ACCESS_MASK        DesiredAccess,
        [in]           POBJECT_ATTRIBUTES ObjectAttributes,
        [in, optional] PCLIENT_ID         ClientId
    );
As we can see, rax, the register containing the return value of the syscall, will hold an NTSTATUS value and not the handle. First, we need to check whether NtOpenProcess executed without an error. Then we need to retrieve the handle from the stack, for which we need a stack offset.
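The error check can be expressed as a tiny helper. The sketch below follows the usual NTSTATUS convention that a status is a failure when its highest bit is set, which is a slightly more lenient test than comparing against zero; the function name nt_success is an assumption made for this example.

```cpp
#include <cstdint>

// A status counts as success when, read as a signed 32-bit value, it is not
// negative. NTSTATUS is a 32-bit value, so only the low half of rax is inspected.
constexpr bool nt_success(std::uint64_t rax) {
    return static_cast<std::int32_t>(static_cast<std::uint32_t>(rax)) >= 0;
}
```

With this helper, 0 (STATUS_SUCCESS) passes, while 0xC0000022 (STATUS_ACCESS_DENIED) is treated as a failure.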
As OpenProcess returns a HANDLE, we know the required logic to retrieve the handle is already implemented in OpenProcess after the NtOpenProcess function call.
Let’s reverse OpenProcess in kernelbase to retrieve the offset:
    […]
    call    qword [rel NtOpenProcess]
    nop     dword [rax+rax]
    test    eax, eax
    js      0x1800338c5
    mov     rax, qword [rsp+0x88]
    add     rsp, 0x68
    retn
Most of the function is not important for us; we just need to check how the handle gets loaded into rax. This is done through the operation mov rax, qword [rsp+0x88], so we know that if we have the stack pointer of the OpenProcess function, the handle is at an offset of 0x88. Our original_rsp parameter holds the stack pointer of NtOpenProcess, not OpenProcess. This means that the top of the stack holds the address NtOpenProcess should return to in OpenProcess. Therefore, we need to add eight to that value of 0x88 to access the handle.
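The offset arithmetic described above can be checked with a small, self-contained sketch that works on a fake stack instead of a real one. All names and the example values (the made-up return address and handle) are assumptions for illustration only; the 0x88 offset is the one read from the disassembly.

```cpp
#include <cstdint>
#include <cstring>

// Offset of the handle relative to OpenProcess' stack pointer, plus the
// 8-byte return address that the call into NtOpenProcess pushed on top.
constexpr std::uint64_t kHandleOffsetInCaller = 0x88;
constexpr std::uint64_t kReturnAddressSize    = 8;

// original_rsp is the stack pointer at the time the syscall stub returns.
constexpr std::uint64_t handle_slot_address(std::uint64_t original_rsp) {
    return original_rsp + kReturnAddressSize + kHandleOffsetInCaller;
}

// Demonstration on a fake stack: returns the value found in the handle slot.
std::uint64_t read_handle_from_fake_stack() {
    alignas(8) unsigned char fake_stack[0x100] = {};
    const std::uint64_t return_address = 0x1122334455667788ull; // made-up value
    const std::uint64_t handle_value   = 0x1a4;                 // made-up handle
    // [rsp] holds the return address into OpenProcess ...
    std::memcpy(fake_stack, &return_address, sizeof(return_address));
    // ... so OpenProcess' own rsp+0x88 sits at rsp+0x90 from our point of view.
    std::memcpy(fake_stack + 0x90, &handle_value, sizeof(handle_value));

    const auto rsp  = reinterpret_cast<std::uintptr_t>(fake_stack);
    const auto slot = static_cast<std::uintptr_t>(handle_slot_address(rsp));
    std::uint64_t result = 0;
    std::memcpy(&result, reinterpret_cast<const void*>(slot), sizeof(result));
    return result;
}
```

Keep in mind that 0x88 comes from one particular kernelbase build and would need to be re-checked for other Windows versions.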
You might understand now why we added an original_rsp parameter to our C++ function. We could still access the handle from the function with inline assembly; however, every time we add, for example, a local variable in our C++ function, we would need to recalculate our offset to the handle, as a bigger stack frame would be allocated for our function.
Let’s recap what we require to spoof the handle access:
- We need to calculate the address of the ret instruction of NtOpenProcess.
- We need to check if the return address passed to our IC is that of the ret instruction of NtOpenProcess.
- We should check the value of rax. If it contains a non-zero value, NtOpenProcess failed and there is no handle to close.
- We need to change the handle at the offset of 0x90 of the original stack pointer to INVALID_HANDLE_VALUE.
- We need to change the return value to STATUS_ACCESS_DENIED (0xC0000022).
As we can now do this in C++, this is very easy and can be done with the following code:
    extern "C" uint64_t instrumentation_callback(uint64_t original_rsp, uint64_t return_addr, uint64_t return_val) {
        static uint64_t nt_open_proc;
        if (!nt_open_proc) {
            nt_open_proc = reinterpret_cast<uint64_t>(
                GetProcAddress(GetModuleHandleA("ntdll.dll"), "NtOpenProcess"));
            if (!nt_open_proc)
                return return_val;
            nt_open_proc += 20; // address of the ret instruction following the syscall
        }

        // Only react to returns from NtOpenProcess.
        if (return_addr != nt_open_proc)
            return return_val;

        // A non-zero status means no handle was opened.
        if (return_val != 0)
            return return_val;

        auto handle_ptr = reinterpret_cast<HANDLE*>(original_rsp + 0x90);
        if (*handle_ptr == INVALID_HANDLE_VALUE)
            return return_val;

        std::println("[+] IC: Detected program NtOpenProcess call: {}", *handle_ptr);
        CloseHandle(*handle_ptr);
        std::println("[+] IC: Closed opened handle and spoofing Access denied");

        *handle_ptr = INVALID_HANDLE_VALUE;
        return 0xC0000022; // STATUS_ACCESS_DENIED
    }
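The constant 20 added to the address of NtOpenProcess is the offset of the ret instruction that directly follows the syscall instruction in the stub, which is where execution resumes after the kernel returns. Instead of hardcoding it, the offset could be found by scanning the stub for the byte sequence of syscall; ret. The sketch below does this on an example byte array; the stub layout shown is the one commonly seen in x64 ntdll and is an assumption here, as are the function and array names and the example syscall number 0x26.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>

// Scans the first `size` bytes of a syscall stub for `syscall; ret`
// (0F 05 C3) and returns the offset of the ret instruction.
std::optional<std::size_t> find_syscall_ret_offset(const std::uint8_t* stub, std::size_t size) {
    for (std::size_t i = 0; i + 2 < size; ++i) {
        if (stub[i] == 0x0F && stub[i + 1] == 0x05 && stub[i + 2] == 0xC3)
            return i + 2;
    }
    return std::nullopt;
}

// Assumed layout of an x64 ntdll stub (the syscall number is only an example).
constexpr std::uint8_t kExampleStub[] = {
    0x4C, 0x8B, 0xD1,                               // mov r10, rcx
    0xB8, 0x26, 0x00, 0x00, 0x00,                   // mov eax, 26h
    0xF6, 0x04, 0x25, 0x08, 0x03, 0xFE, 0x7F, 0x01, // test byte [7FFE0308h], 1
    0x75, 0x03,                                     // jne +3 (int 2Eh path)
    0x0F, 0x05,                                     // syscall
    0xC3,                                           // ret  <- offset 20
    0xCD, 0x2E,                                     // int 2Eh
    0xC3                                            // ret
};
```

A plain byte scan can in principle match inside an immediate operand, so a real implementation would want to limit the search to the first few dozen bytes of the stub or use a disassembler.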
To test this, let’s open a handle to a process with and without an IC set. For this example, we’ll be using notepad.exe as a test program. As OpenProcess requires a process ID, we have also added a basic process ID enumeration function.
    #include <tlhelp32.h>
    […]
    uint32_t get_process_id(std::string_view process_name) {
        PROCESSENTRY32 proc_entry{ .dwSize = sizeof(PROCESSENTRY32) };
        HANDLE snapshot = CreateToolhelp32Snapshot(TH32CS_SNAPPROCESS, 0);
        if (snapshot == INVALID_HANDLE_VALUE)
            return 0;
        if (!Process32First(snapshot, &proc_entry)) {
            CloseHandle(snapshot); // do not leak the snapshot on failure
            return 0;
        }
        do {
            if (std::string{ proc_entry.szExeFile } != process_name)
                continue;
            CloseHandle(snapshot);
            return proc_entry.th32ProcessID;
        } while (Process32Next(snapshot, &proc_entry));
        CloseHandle(snapshot);
        return 0;
    }

    // Opens a handle to the process and reports the result.
    // OpenProcess returns NULL on failure, not INVALID_HANDLE_VALUE.
    void try_open_process(uint32_t pid) {
        HANDLE handle = OpenProcess(GENERIC_ALL, 0, pid);
        if (handle) {
            std::println("Successfully opened process handle: {}", handle);
            CloseHandle(handle);
        } else {
            std::println("Failed opening process handle: {}", handle);
        }
    }

    int main() {
        PROCESS_INSTRUMENTATION_CALLBACK_INFORMATION instrumentation_info{};
        instrumentation_info.Callback = reinterpret_cast<void*>(&instrumentation_adapter);

        const auto nt_set_info_proc = reinterpret_cast<NtSetInformationProcess_t>(
            GetProcAddress(GetModuleHandleA("ntdll.dll"), "NtSetInformationProcess"));
        if (!nt_set_info_proc) {
            std::println("Could not resolve NtSetInformationProcess");
            return -1;
        }

        const auto pid = get_process_id("notepad.exe");
        if (pid == 0) {
            std::println("Could not find notepad.exe");
            return -1;
        }

        // First attempt: no IC installed yet, so this should succeed.
        try_open_process(pid);

        const auto status = nt_set_info_proc(GetCurrentProcess(),
            static_cast<_PROCESS_INFORMATION_CLASS>(0x28),
            &instrumentation_info, sizeof(instrumentation_info));
        if (status) {
            std::println("NtSetInformationProcess returned {:x}", static_cast<unsigned long>(status));
        } else {
            std::println("Successfully installed instrumentation callback");
        }

        // Second attempt: the IC closes the handle and spoofs an access denied status.
        try_open_process(pid);
        return 0;
    }
Executing the code with a working IC should result in one successful and one failed OpenProcess call if notepad.exe is running.
Of course, OpenProcess was just used as an example. This can be done with every syscall.
Closing words
In this blog post you learnt how ICs work and how they can be used to log and spoof syscalls from user mode. ICs can be utilized for much more; in the upcoming posts you will learn how to inject shellcode into other processes and how you can hook function calls with ICs to, for example, prevent users from overwriting your own IC. In a more theoretical part of the series, we will discuss other use cases of ICs and possible countermeasures.