RunPE
When loading a PE into memory, we need to mimic the load process that Windows performs internally and resolve all dependencies. Several existing implementations inspired ours: one in C# by Nettitude, which comes with an accompanying white paper, as well as a C++ implementation and a Nim implementation.
Our aim here was to create an implementation that receives a decrypted PE as an argument, loads it, and then runs it. As this is intended as a loader and not as part of a C2 framework, we are less concerned about the cleanup process: the process simply exits after executing the PE. This differs from the C# implementation by Nettitude. We also do not need to load existing PEs from disk, as we are primarily trying to hide binaries like mimikatz from AV/EDR products and will therefore not put them on disk.
As the loader is implemented in C, we based it on the C++ implementation mentioned above. For our version, we made a few adjustments:
- Fine-grained permissions instead of RWX
- Direct syscalls where feasible
- Support for 64-bit relocations
Let us go over these changes one by one. The original implementation allocated the memory for the image with RWX permissions, which is quite suspicious in our opinion. The permissions each section actually needs are recorded in the Characteristics field of its section header, so we can look them up per section and change the protections accordingly. Our implementation initially allocates the memory with RW permissions and then adjusts the permissions of each section before executing the PE. The following code is ported from the Nettitude implementation:
// The section headers follow the optional header; IMAGE_FIRST_SECTION accounts for its actual size
IMAGE_SECTION_HEADER *SectionHeaderArr = IMAGE_FIRST_SECTION(ntHeader);
for (int i = 0; i < ntHeader->FileHeader.NumberOfSections; i++) {
    // Name is a fixed 8-byte field that is not necessarily null-terminated
    printf(" [+] Changing the protections for Section %.8s\n", (const char *)SectionHeaderArr[i].Name);
    bool execute = (SectionHeaderArr[i].Characteristics & IMAGE_SCN_MEM_EXECUTE) != 0;
    bool read = (SectionHeaderArr[i].Characteristics & IMAGE_SCN_MEM_READ) != 0;
    bool write = (SectionHeaderArr[i].Characteristics & IMAGE_SCN_MEM_WRITE) != 0;
    // Fallback for flag combinations not handled below (e.g. write without read)
    DWORD32 protection = PAGE_EXECUTE_READWRITE;
    if (execute && read && write) {
        protection = PAGE_EXECUTE_READWRITE;
    }
    else if (!execute && read && write) {
        protection = PAGE_READWRITE;
    }
    else if (!write && execute && read) {
        protection = PAGE_EXECUTE_READ;
    }
    else if (!execute && !write && read) {
        protection = PAGE_READONLY;
    }
    else if (execute && !read && !write) {
        protection = PAGE_EXECUTE;
    }
    else if (!execute && !read && !write) {
        protection = PAGE_NOACCESS;
    }
    printf(" [+] Setting protection: 0x%x\n", protection);
    DWORD old_protect = 0;
    LPVOID base_address = (LPVOID)(pImageBase + SectionHeaderArr[i].VirtualAddress);
    size_t data_size = SectionHeaderArr[i].SizeOfRawData;
    NTSTATUS status = pNtProtectVirtualMemory((HANDLE)-1, &base_address, &data_size, protection, &old_protect);
    if (status < 0) { // negative NTSTATUS values indicate failure
        printf(" [-] Changing the protection failed: 0x%lx\n", (unsigned long)status);
    }
}
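The if/else chain above can also be expressed as a lookup table, which makes it easier to see that every flag combination is covered. The following is only a minimal sketch, not part of our loader: the function name `section_protection` and the `SCN_`/`PG_` macro names are hypothetical, and the constants are redefined with their `winnt.h` values so that the snippet compiles without the Windows headers. Note one deliberate difference: where the code above falls back to RWX for unhandled combinations, this sketch maps a write-only section to RW, since Windows has no write-only page protection.

```c
#include <stdint.h>

/* Section characteristic flags (same values as IMAGE_SCN_MEM_* in winnt.h). */
#define SCN_MEM_EXECUTE 0x20000000u
#define SCN_MEM_READ    0x40000000u
#define SCN_MEM_WRITE   0x80000000u

/* Page protection constants (same values as PAGE_* in winnt.h). */
#define PG_NOACCESS          0x01u
#define PG_READONLY          0x02u
#define PG_READWRITE         0x04u
#define PG_EXECUTE           0x10u
#define PG_EXECUTE_READ      0x20u
#define PG_EXECUTE_READWRITE 0x40u

/* Hypothetical helper: maps a section's Characteristics to a page protection. */
uint32_t section_protection(uint32_t characteristics)
{
    /* Index bits: 4 = execute, 2 = read, 1 = write. */
    static const uint32_t table[8] = {
        PG_NOACCESS,          /* ---                                   */
        PG_READWRITE,         /* --W (assumption: no write-only pages) */
        PG_READONLY,          /* -R-                                   */
        PG_READWRITE,         /* -RW                                   */
        PG_EXECUTE,           /* X--                                   */
        PG_EXECUTE_READWRITE, /* X-W (closest available protection)    */
        PG_EXECUTE_READ,      /* XR-                                   */
        PG_EXECUTE_READWRITE  /* XRW                                   */
    };
    unsigned idx = ((characteristics & SCN_MEM_EXECUTE) ? 4u : 0u)
                 | ((characteristics & SCN_MEM_READ)    ? 2u : 0u)
                 | ((characteristics & SCN_MEM_WRITE)   ? 1u : 0u);
    return table[idx];
}
```

With such a helper, the loop body would reduce to a single call like `section_protection(SectionHeaderArr[i].Characteristics)` before changing the protection.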
Furthermore, we also replaced the function calls with direct syscalls to give fewer detection opportunities to EDRs. As we already discussed direct syscalls before, we will not go into details here.
Then we added support for 64-bit relocations. This was not a huge change, as they work like their 32-bit counterparts, only with an 8-byte field instead of a 4-byte one (this is the IMAGE_REL_BASED_DIR64 relocation type of the PE format). However, the original version fails to apply relocations when facing a 64-bit binary. The case that we added to the original code looks as follows:
else if (type == RELOC_64BIT_FIELD) {
    // Get relocation address location (an 8-byte field in the mapped image)
    UINT64 *relocateAddr = (UINT64 *)((size_t)modulePtr + reloc_field);
    printf(" [V] Apply Reloc Field at %p\n", (void *)relocateAddr);
    // Rebase the stored absolute address from the old image base to the new one
    *relocateAddr = (*relocateAddr) - oldBase + newBase;
}
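To show where this case fits in, here is a minimal, portable sketch of how a single base relocation block is processed, following the layout in the PE format specification: a 4-byte page RVA, a 4-byte block size, and then 16-bit entries whose upper 4 bits hold the type and whose lower 12 bits hold the offset within the page. This is not our loader code: the function name `apply_reloc_block`, its parameters, and the `REL_BASED_` macro names are hypothetical, and the sketch assumes a little-endian host such as x86/x64.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Relocation types with their values from the PE format specification. */
#define REL_BASED_ABSOLUTE 0   /* padding entry, skipped */
#define REL_BASED_HIGHLOW  3   /* 32-bit field           */
#define REL_BASED_DIR64    10  /* 64-bit field           */

/*
 * Hypothetical helper: applies one relocation block to an image buffer.
 * `delta` is newBase - oldBase. Returns the number of patched fields,
 * or -1 for a malformed block, an out-of-bounds field, or an unknown type.
 */
int apply_reloc_block(uint8_t *image, size_t image_size,
                      const uint8_t *block, uint64_t delta)
{
    uint32_t page_rva, block_size;
    memcpy(&page_rva, block, 4);
    memcpy(&block_size, block + 4, 4);
    if (block_size < 8 || (block_size - 8) % 2 != 0)
        return -1;

    int patched = 0;
    size_t count = (block_size - 8) / 2;
    for (size_t i = 0; i < count; i++) {
        uint16_t entry;
        memcpy(&entry, block + 8 + 2 * i, 2);
        unsigned type = entry >> 12;
        uint64_t rva = (uint64_t)page_rva + (entry & 0x0FFFu);

        if (type == REL_BASED_ABSOLUTE)
            continue; /* only used to pad the block */

        size_t width = (type == REL_BASED_DIR64)   ? 8
                     : (type == REL_BASED_HIGHLOW) ? 4 : 0;
        if (width == 0 || rva > image_size || image_size - rva < width)
            return -1;

        if (width == 8) {
            uint64_t value;
            memcpy(&value, image + rva, 8);   /* memcpy avoids unaligned access */
            value += delta;
            memcpy(image + rva, &value, 8);
        } else {
            uint32_t value;
            memcpy(&value, image + rva, 4);
            value += (uint32_t)delta;
            memcpy(image + rva, &value, 4);
        }
        patched++;
    }
    return patched;
}
```

Adding `delta` is equivalent to the `- oldBase + newBase` computation in the snippet above; the bounds check is an extra safety measure in this sketch.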
Apart from this, we did the usual modifications like dynamically resolving functions and encrypting strings.
One other change that we thought about was to implement a custom version of LoadLibrary, as done here. This should help to get around kernel-level logic that might recognize our loading of DLLs as suspicious. This will become more important as EDRs move towards processing ETW TI logs and registering more kernel callbacks.
The following screenshot shows the loader running mimikatz: