John Uhlmann

Call Stacks: No More Free Passes For Malware

We explore the immense value that call stacks bring to malware detection and why Elastic considers them to be vital Windows endpoint telemetry despite the architectural limitations.

Call stacks provide the who

One of Elastic’s key Windows endpoint telemetry differentiators is call stacks.

Most detections rely on what is happening — and this is often insufficient as most behaviours are dual purpose. With call stacks, we add the fine-grained ability to also determine who is performing the activity. This combination gives us an unparalleled ability to uncover malicious activity. By feeding this deep telemetry to Elastic Defend’s on-host rule engine, we can quickly respond to emerging threats.

Call stacks are a beautiful lie

In computer science, a stack is a last-in, first-out data structure. Similar to a stack of physical items, it is only possible to add or remove the top element. A call stack is a stack that contains information about the currently active subroutine calls.

On x64 hosts, this call stack can only be accurately generated using execution tracing features on the CPU, such as Intel LBR, Intel BTS, Intel AET, Intel IPT, and x64 Architectural LBR. These tracing features were designed for performance profiling and debugging purposes, but can be used in some security scenarios as well. However, what is more generally available is an approximate call stack that is recovered from a thread’s data stack via a mechanism called stack walking.

In the x64 architecture, the “stack pointer register” (rsp) unsurprisingly points to a stack data structure, and there are efficient instructions to read and write the data on this stack. Additionally, the call instruction transfers control to a new subroutine but also saves a return address at the memory address referenced by the stack pointer. A ret instruction will later retrieve this saved address so that execution can return to where it left off. Functions in most programming languages are typically implemented using these two instructions, and both function parameters and local function variables will typically be allocated on this stack for performance. The portion of the stack related to a single function is called a stack frame.

Stack walking is the recovery of just the return addresses from the heterogeneous data stored on the thread stack. Return addresses need to be stored somewhere for control flow — so stack walking co-opts this existing data to approximate a call stack. This is entirely suitable for most debugging and performance profiling scenarios, but slightly less helpful for security auditing. The main issue is that you can’t disassemble backwards. You can always determine the return address for a given call site, but not the converse. The best approach you can take is to check each of the 15 possible preceding instruction lengths and see which disassembles to exactly one call instruction. Even then, all you have recovered is a previous call site — not necessarily the exact preceding call site. This is because most compilers use tail call optimisation to omit unnecessary stack frames. This creates annoying scenarios for security like there being no guarantee that the Win32StartAddress function will be on the stack even though it was called.

So what we usually refer to as a call stack is actually a return address stack.
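To make the tail call problem concrete, here is a toy Python model of a thread stack. The ToyThread class and the addresses are hypothetical and exist only for illustration; a real x64 stack holds untagged bytes, and a real stack walker relies on unwind metadata rather than labelled entries.

```python
class ToyThread:
    """Toy model of a thread stack; the last list element is the top."""

    def __init__(self):
        self.stack = []

    def call(self, return_address, local_slots=2):
        # `call` saves the return address, then the callee reserves its locals.
        self.stack.append(("ret", return_address))
        self.stack.extend([("local", 0)] * local_slots)

    def tail_call(self, local_slots=2):
        # A tail call is a jump: the current function releases its locals and
        # no new return address is saved, so the callee reuses the frame.
        while self.stack and self.stack[-1][0] == "local":
            self.stack.pop()
        self.stack.extend([("local", 0)] * local_slots)

    def ret(self):
        # Release the locals, then pop the saved return address.
        while self.stack[-1][0] == "local":
            self.stack.pop()
        return self.stack.pop()[1]

    def walk(self):
        # Stack walking keeps only the return addresses, newest first.
        return [value for kind, value in reversed(self.stack) if kind == "ret"]


thread = ToyThread()
thread.call(0x1000)  # the thread start routine calls A (0x1000 is in the start routine)
thread.call(0x2000)  # A calls B (0x2000 is inside A)
thread.call(0x3000)  # B calls C (0x3000 is inside B)
with_call = thread.walk()

thread.ret()         # C returns to B
thread.tail_call()   # this time B jumps to C instead of calling it
with_tail_call = thread.walk()

print([hex(a) for a in with_call])       # B is visible via 0x3000
print([hex(a) for a in with_tail_call])  # no address inside B remains
```

In the second walk, C is running on behalf of B, yet nothing on the stack points into B. This is the same omission that lets a thread's Win32StartAddress function go missing from a stack walk.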

Malware authors use this ambiguity to lie. They either craft trampoline stack frames through legitimate modules to hide calls originating from malicious code, or they coerce stack walking into predicting different return addresses than those the CPU will execute. Of course, malware has always just been an attempt to lie, and antimalware is just the process of exposing that lie.

“... but at the length truth will out.”

  • William Shakespeare, The Merchant of Venice, Act 2, Scene 2

Making call stacks beautiful

So far, a stack walk is just a list of numeric memory addresses. To make them useful for analysis we need to enrich them with context. (Note: we don’t currently include kernel stack frames.)

The minimum useful enrichment is to convert these addresses into offsets within modules (e.g. ntdll.dll+0x15c9c4). This would only catch the most egregious malware though — we can go deeper. The most important modules on Windows are those that implement the Native and Win32 APIs. The application binary interface for these APIs requires that the name of each function be included in the Export Directory of the containing module. This is the information that Elastic currently uses to enrich its endpoint call stacks.

A more accurate enrichment could be achieved by using the public symbols (if available) hosted on the vendor’s infrastructure (especially Microsoft’s). While this method offers deeper fidelity, it comes with higher operational costs and isn’t feasible for our air-gapped customers.

A rule of thumb for Microsoft kernel and native symbols is that the exported interface of each component has a capitalised prefix such as Ldr, Tp or Rtl. Private functions extend this prefix with a p. By default, private functions with external linkage are included in the public symbol table. A very large offset might indicate a very large function, but it could also just indicate an unnamed function that you don’t have symbols for. A general guideline would be to consider any triple-digit and larger offsets in an exported function as likely belonging to another function.
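The two enrichment steps described above (address to module offset, then module offset to nearest preceding export) can be sketched in a few lines of Python. This is a simplified illustration, not Elastic's implementation. The module bases and export offsets are back-computed from the addresses in this section's comparison example; the module sizes, the tiny export tables, and the enrich helper are assumptions.

```python
from bisect import bisect_right

# (base, size, name). Bases are derived from the example; sizes are assumed.
MODULES = [
    (0x7FFB8C300000, 0x300000, "kernelbase.dll"),
    (0x7FFB8DA30000, 0x100000, "kernel32.dll"),
    (0x7FFB8EA40000, 0x200000, "ntdll.dll"),
]

# Export RVAs sorted ascending. A real export directory has thousands of entries.
EXPORTS = {
    "kernelbase.dll": [(0xC71A0, "VirtualProtect")],
    "kernel32.dll": [(0x2E8C0, "BaseThreadInitThunk")],
    "ntdll.dll": [
        (0x12C90, "RtlAcquireSRWLockExclusive"),
        (0xB14D0, "RtlUserThreadStart"),
        (0xDA5E0, "RtlAddRefActivationContext"),
        (0x15C9B0, "NtProtectVirtualMemory"),
    ],
}

SUSPECT_OFFSET = 0x100  # "triple-digit and larger" hexadecimal offsets


def enrich(address):
    """Return (description, suspect_offset) for a return address."""
    for base, size, name in MODULES:
        if base <= address < base + size:
            rva = address - base
            exports = EXPORTS.get(name, [])
            # Find the last export that starts at or before this RVA.
            index = bisect_right([export_rva for export_rva, _ in exports], rva) - 1
            if index < 0:
                return f"{name}+{rva:#x}", False
            export_rva, symbol = exports[index]
            offset = rva - export_rva
            return f"{name}!{symbol}+{offset:#x}", offset >= SUSPECT_OFFSET
    return f"{address:#x}", False  # not inside any known module


for address in (0x7FFB8EB9C9C4, 0x7FFB8EB1A9ED, 0x7FFB8EA53604):
    print(enrich(address))
```

The second and third addresses resolve to exports with offsets of 0x40d and 0x974, so the sketch flags them: the nearest export is probably not the function that actually contains the address.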

| Call Stack | Stack Walk | Stack Walk Modules | Stack Walk Exports (Elastic approach) | Stack Walk Public Symbols |
| --- | --- | --- | --- | --- |
| 0x7ffb8eb9c9c2 | 0x7ffb8eb9c9c4 | ntdll.dll+0x15c9c4 | ntdll.dll!NtProtectVirtualMemory+0x14 | ntdll.dll!NtProtectVirtualMemory+0x14 |
| 0x12d383f0046 | 0x7ffb8c3c71d6 | kernelbase.dll+0xc71d6 | kernelbase.dll!VirtualProtect+0x36 | kernelbase.dll!VirtualProtect+0x36 |
| 0x7ffb8eb1a9d8 | 0x7ffb8eb1a9ed | ntdll.dll+0xda9ed | ntdll.dll!RtlAddRefActivationContext+0x40d | ntdll.dll!RtlTpTimerCallback+0x7d |
| 0x7ffb8eb1aaf4 | 0x7ffb8eb1aaf9 | ntdll.dll+0xdaaf9 | ntdll.dll!RtlAddRefActivationContext+0x519 | ntdll.dll!TppTimerpExecuteCallback+0xa9 |
| 0x7ffb8ea535ff | 0x7ffb8ea53604 | ntdll.dll+0x13604 | ntdll.dll!RtlAcquireSRWLockExclusive+0x974 | ntdll.dll!TppWorkerThread+0x644 |
| 0x7ffb8da5e8cf | 0x7ffb8da5e8d4 | kernel32.dll+0x2e8d4 | kernel32.dll!BaseThreadInitThunk+0x14 | kernel32.dll!BaseThreadInitThunk+0x14 |
| 0x7ffb8eaf14eb | 0x7ffb8eaf14f1 | ntdll.dll+0xb14f1 | ntdll.dll!RtlUserThreadStart+0x21 | ntdll.dll!RtlUserThreadStart+0x21 |

Comparison of Call Stack Enrichment Levels

In the above example, the shellcode at 0x12d383f0000 deliberately used a tail call so that its address wouldn’t appear in the stack walk. This lie-by-omission is apparent even with only the stack walk. Elastic reports this with the proxy_call heuristic as the malware registered a timer callback function to proxy the call to VirtualProtect from a different thread.

Making call stacks powerful

The call stacks of the system calls that we monitor with Event Tracing for Windows (ETW) have an expected structure. At the bottom of the stack is the thread StartAddress - typically ntdll.dll!RtlUserThreadStart. This is followed by the Win32 API thread entry - kernel32.dll!BaseThreadInitThunk and then the first user module. A user module is application code that is not part of the Win32 (or Native) API. This first user module should match the thread’s Win32StartAddress (unless that function used a tail call). More user modules will follow until the final user module makes a call into a Win32 API that makes a Native API call, which finally results in a system call to the kernel.

From a detection standpoint, the most important module in this call stack is the final user module. Elastic shows this module, including its hash and any code signatures. These details aid in alert triage, but more importantly, they drastically improve the granularity at which we can baseline the behaviours of legitimate software that sometimes behaves like malware. The more accurately we can baseline normal, the harder it is for malware to blend in.

{
  "process.thread.Ext": {
    "call_stack_summary": "ntdll.dll|kernelbase.dll|file.dll|rundll32.exe|kernel32.dll|ntdll.dll",
    "call_stack": [
      { "symbol_info": "c:\\windows\\system32\\ntdll.dll!NtAllocateVirtualMemory+0x14" }, /* Native API */
      { "symbol_info": "c:\\windows\\system32\\kernelbase.dll!VirtualAllocExNuma+0x62" }, /* Win32 API */
      { "symbol_info": "c:\\windows\\system32\\kernelbase.dll!VirtualAllocEx+0x16" }, /* Win32 API */
      {
        "symbol_info": "c:\\users\\user\\desktop\\file.dll+0x160d8b", /* final user module */
        "callsite_trailing_bytes": "488bf0488d4d88e8197ee2ff488bc64883c4685b5e5f415c415d415e415f5dc390909090905541574156415541545756534883ec58488dac2490000000488b71",
        "callsite_leading_bytes": "088b4d38894c2420488bca48894db8498bd0488955b0458bc1448945c4448b4d3044894dc0488d4d88e8e77de2ff488b4db8488b55b0448b45c4448b4dc0ffd6"
      },
      { "symbol_info": "c:\\users\\user\\desktop\\file.dll+0x7b429" },
      { "symbol_info": "c:\\users\\user\\desktop\\file.dll+0x44a9" },
      { "symbol_info": "c:\\users\\user\\desktop\\file.dll+0x5f58" },
      { "symbol_info": "c:\\windows\\system32\\rundll32.exe+0x3bcf" },
      { "symbol_info": "c:\\windows\\system32\\rundll32.exe+0x6309" }, /* first user module - typically the ETHREAD.Win32StartAddress module */
      { "symbol_info": "c:\\windows\\system32\\kernel32.dll!BaseThreadInitThunk+0x14" }, /* Win32 API */
      { "symbol_info": "c:\\windows\\system32\\ntdll.dll!RtlUserThreadStart+0x21" } /* Native API - the ETHREAD.StartAddress module */
    ],
    "call_stack_final_user_module": {
      "path": "c:\\users\\user\\desktop\\file.dll",
      "code_signature": [ { "exists": false } ],
      "name": "file.dll",
      "hash": { "sha256": "0240cc89d4a76bafa9dcdccd831a263bf715af53e46cac0b0abca8116122d242" }
    }
  }
}

Sample enriched call stack
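The relationship between the frames, the call_stack_summary, and the final user module can be approximated with a short Python sketch. It is a simplification under stated assumptions: the API_MODULES set is a hypothetical, incomplete stand-in for "Win32 and Native API modules", the helper names are invented, and the real enrichment also handles unbacked memory and suspicious stacks.

```python
import ntpath
from itertools import groupby

# Assumed, incomplete list of modules treated as Win32 or Native API.
API_MODULES = {"ntdll.dll", "kernelbase.dll", "kernel32.dll"}


def module_name(symbol_info):
    """Extract the module file name from 'path!Symbol+0x..' or 'path+0x..'."""
    path = symbol_info.split("!")[0].rsplit("+", 1)[0]
    return ntpath.basename(path)  # ntpath parses backslashes on any platform


def summarise(frames):
    modules = [module_name(frame) for frame in frames]
    # Collapse runs of consecutive frames from the same module.
    summary = "|".join(name for name, _ in groupby(modules))
    # Walking from the top of the stack, the first non-API module is the
    # last user module to run before control entered the API layers.
    final_user_module = next((m for m in modules if m not in API_MODULES), None)
    return summary, final_user_module


frames = [
    r"c:\windows\system32\ntdll.dll!NtAllocateVirtualMemory+0x14",
    r"c:\windows\system32\kernelbase.dll!VirtualAllocExNuma+0x62",
    r"c:\windows\system32\kernelbase.dll!VirtualAllocEx+0x16",
    r"c:\users\user\desktop\file.dll+0x160d8b",
    r"c:\users\user\desktop\file.dll+0x7b429",
    r"c:\users\user\desktop\file.dll+0x44a9",
    r"c:\users\user\desktop\file.dll+0x5f58",
    r"c:\windows\system32\rundll32.exe+0x3bcf",
    r"c:\windows\system32\rundll32.exe+0x6309",
    r"c:\windows\system32\kernel32.dll!BaseThreadInitThunk+0x14",
    r"c:\windows\system32\ntdll.dll!RtlUserThreadStart+0x21",
]
summary, final_user_module = summarise(frames)
print(summary)
print(final_user_module)
```

Run against the frames of the sample above, this reproduces its call_stack_summary and identifies file.dll as the final user module.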

Call stack final user module enrichments:

name: The file name of the call_stack_final_user_module. Can also be "Unbacked" indicating private executable memory, or "Undetermined" indicating a suspicious call stack.
path: The file path of the call_stack_final_user_module.
hash.sha256: The sha256 of the call_stack_final_user_module, or the protection_provenance module if any.
code_signature: Code signature of the call_stack_final_user_module, or the protection_provenance module if any.
allocation_private_bytes: The number of bytes in this memory region that are both +X and non-shareable. Non-zero values can indicate code hooking, patching, or hollowing.
protection: The memory protection for the acting region of pages is included if it is not RX. Corresponds to MEMORY_BASIC_INFORMATION.Protect.
protection_provenance: The name of the memory region that caused the last modification of the protection of this page. "Unbacked" may indicate shellcode.
protection_provenance_path: The path of the module that caused the last modification of the protection of this page.
reason: The anomalous call_stack_summary that led to an "Undetermined" protection_provenance.

A quick call stack glossary

When examining call stacks, there are some Native API functions that are helpful to be familiar with. Ken Johnson, now of Microsoft, has provided us with a catalog of NTDLL kernel mode to user mode callbacks to get us started. Seriously, you should pause here and go read that first.

We met RtlUserThreadStart earlier. Both it and its sibling RtlUserFiberStart should only ever appear at the bottom of a call stack. These are the entrypoints for user threads and fibers, respectively. The first instruction on every thread, however, is actually LdrInitializeThunk. After performing the user-mode component of thread initialisation (and process initialisation, if required), this function transfers control to the entrypoint via NtContinue, which updates the instruction pointer directly. As a result, LdrInitializeThunk does not appear in any later stack walks.

So if you see a call stack that includes LdrInitializeThunk then this means you are at the very start of a thread’s execution. This is where the application compatibility Shim Engine operates, where hook-based security products prefer to install themselves, and where malware tries to gain execution before those other security products. Marcus Hutchins and Guido Miggelenbrink have both written excellent blogs on this topic. This startup race does not exist for security products that utilise kernel ETW for telemetry.

{
  "process.thread.Ext": {
    "call_stack_summary": "ntdll.dll|file.exe|ntdll.dll",
    "call_stack": [
      { "symbol_info": "c:\\windows\\system32\\ntdll.dll!ZwProtectVirtualMemory+0x14" },
      { "symbol_info": "c:\\users\\user\\desktop\\file.exe+0x1bac8" },
      { "symbol_info": "c:\\windows\\system32\\ntdll.dll!RtlAnsiStringToUnicodeString+0x3cb" },
      { "symbol_info": "c:\\windows\\system32\\ntdll.dll!LdrInitShimEngineDynamic+0x394d" },
      { "symbol_info": "c:\\windows\\system32\\ntdll.dll!LdrInitializeThunk+0x1db" },
      { "symbol_info": "c:\\windows\\system32\\ntdll.dll!LdrInitializeThunk+0x63" },
      { "symbol_info": "c:\\windows\\system32\\ntdll.dll!LdrInitializeThunk+0xe" }
    ],
    "call_stack_final_user_module": {
      "path": "c:\\users\\user\\desktop\\file.exe",
      "code_signature": [ { "exists": false } ],
      "name": "file.exe",
      "hash": { "sha256": "a59a7b56f695845ce185ddc5210bcabce1fff909bac3842c2fb325c60db15df7" }
    }
  }
}

Pre-entrypoint execution example

The next pair is KiUserExceptionDispatcher and KiRaiseUserExceptionDispatcher. The kernel uses the former to pass execution to a registered user-mode structured exception handler after a user-mode exception condition has occurred. The latter also raises an exception, but on behalf of the kernel instead. This second variant is usually only caught by debuggers, including Application Verifier, and helps identify when user-mode code is not sufficiently checking return codes from syscalls. These functions will usually be seen in call stacks related to application-specific crash handling or Windows Error Reporting. However, malware sometimes uses an exception handler as a pseudo-breakpoint, for example to fluctuate memory protections and rehide its shellcode immediately after making a system call.

{
  "process.thread.Ext": {
    "call_stack_summary": "ntdll.dll|file.exe|ntdll.dll|file.exe|kernel32.dll|ntdll.dll",
    "call_stack": [
      {
        "symbol_info": "c:\\windows\\system32\\ntdll.dll!ZwProtectVirtualMemory+0x14",
        "protection_provenance": "file.exe", /* another vendor's hooks were unhooked */
        "allocation_private_bytes": 8192
      },
      { "symbol_info": "c:\\users\\user\\desktop\\file.exe+0xd99c" },
      { "symbol_info": "c:\\windows\\system32\\ntdll.dll!RtlInitializeCriticalSectionAndSpinCount+0x1c6" },
      { "symbol_info": "c:\\windows\\system32\\ntdll.dll!RtlWalkFrameChain+0x1119" },
      { "symbol_info": "c:\\windows\\system32\\ntdll.dll!KiUserExceptionDispatcher+0x2e" },
      { "symbol_info": "c:\\users\\user\\desktop\\file.exe+0x12612" },
      { "symbol_info": "c:\\windows\\system32\\kernel32.dll!BaseThreadInitThunk+0x14" },
      { "symbol_info": "c:\\windows\\system32\\ntdll.dll!RtlUserThreadStart+0x21" }
    ],
    "call_stack_final_user_module": {
      "name": "file.exe",
      "path": "c:\\users\\user\\desktop\\file.exe",
      "code_signature": [ { "exists": false }],
      "hash":   { "sha256": "0e5a62c0bd9f4596501032700bb528646d6810b16d785498f23ef81c18683c74" }
    }
  }
}

Protection fluctuation via exception handler example

Next is KiUserApcDispatcher, which is used to deliver user APCs. User APCs are a favourite tool of malware authors, as Microsoft provides only limited visibility into their use.

{
  "process.thread.Ext": {
    "call_stack_summary": "ntdll.dll|kernelbase.dll|ntdll.dll|kernelbase.dll|file.exe",
    "call_stack": [
      { "symbol_info": "c:\\windows\\system32\\ntdll.dll!NtProtectVirtualMemory+0x14" },
      { "symbol_info": "c:\\windows\\system32\\kernelbase.dll!VirtualProtect+0x36" }, /* tail call */
      { "symbol_info": "c:\\windows\\system32\\ntdll.dll!KiUserApcDispatcher+0x2e" },
      { "symbol_info": "c:\\windows\\system32\\ntdll.dll!ZwDelayExecution+0x14" },
      { "symbol_info": "c:\\windows\\system32\\kernelbase.dll!SleepEx+0x9e" },
      {
        "symbol_info": "c:\\users\\user\\desktop\\file.exe+0x107d",
        "allocation_private_bytes": 147456, /* stomped */
        "protection": "RW-", /* fluctuation */
        "protection_provenance": "Undetermined", /* proxied call */
        "callsite_leading_bytes": "010000004152524c8d520141524883ec284150415141baffffffff41525141ba010000004152524c8d520141524883ec284150b9ffffffffba0100000041ffe1",
        "callsite_trailing_bytes": "4883c428c3cccccccccccccccccccccccccccc894c240857b820190000e8a10c0000482be0488b052fd101004833c44889842410190000488d84243014000048"
      }
    ],
    "call_stack_final_user_module": {
      "name": "Undetermined",
      "reason": "ntdll.dll|kernelbase.dll|ntdll.dll|kernelbase.dll|file.exe"
    }
  }
}

Protection fluctuation via APC example

The Windows window manager is implemented in a kernel-mode device driver (win32k.sys). Mostly. Sometimes the window manager needs to do something from user-mode, and KiUserCallbackDispatcher is the mechanism to achieve that. It’s basically a reverse syscall that targets user32.dll functions. Overwriting an entry in a process’s KernelCallbackTable is an easy way to hijack a GUI thread, so any other module following this call is suspicious.

Knowledge of the purpose of each of these kernel-mode to user-mode entry points greatly assists in determining if a given call stack is natural or if it has been misappropriated to achieve alternative goals.
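As a small triage aid, the glossary above can be turned into a lookup that annotates a call stack with the entry points it contains. This is only a sketch: the ENTRY_POINTS descriptions paraphrase this section, and the entry_points helper is a hypothetical name rather than part of any Elastic tooling.

```python
ENTRY_POINTS = {
    "RtlUserThreadStart": "user thread entrypoint (bottom of stack only)",
    "RtlUserFiberStart": "fiber entrypoint (bottom of stack only)",
    "LdrInitializeThunk": "thread initialisation, before the thread entrypoint",
    "KiUserExceptionDispatcher": "dispatch to a user-mode exception handler",
    "KiRaiseUserExceptionDispatcher": "exception raised on behalf of the kernel",
    "KiUserApcDispatcher": "user APC delivery",
    "KiUserCallbackDispatcher": "window manager callback into user32.dll",
}


def entry_points(frames):
    """Return (symbol, description) for each known entry point, top frame first."""
    found = []
    for frame in frames:
        if "!" not in frame:
            continue  # frame has no export name, e.g. a user module offset
        symbol = frame.split("!", 1)[1].rsplit("+", 1)[0]
        if symbol in ENTRY_POINTS and all(symbol != seen for seen, _ in found):
            found.append((symbol, ENTRY_POINTS[symbol]))
    return found


pre_entrypoint_frames = [
    r"c:\windows\system32\ntdll.dll!ZwProtectVirtualMemory+0x14",
    r"c:\users\user\desktop\file.exe+0x1bac8",
    r"c:\windows\system32\ntdll.dll!RtlAnsiStringToUnicodeString+0x3cb",
    r"c:\windows\system32\ntdll.dll!LdrInitShimEngineDynamic+0x394d",
    r"c:\windows\system32\ntdll.dll!LdrInitializeThunk+0x1db",
    r"c:\windows\system32\ntdll.dll!LdrInitializeThunk+0x63",
    r"c:\windows\system32\ntdll.dll!LdrInitializeThunk+0xe",
]
for symbol, description in entry_points(pre_entrypoint_frames):
    print(f"{symbol}: {description}")
```

For the pre-entrypoint example earlier in this section, the only match is LdrInitializeThunk, which tells an analyst that file.exe was executing before the thread reached its entrypoint.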

Making call stacks understandable

To aid understandability, we also tag the event with various process.Ext.api.behaviors that we identify. These behaviours aren’t necessarily malicious, but they highlight aspects that are relevant to alert triage or threat hunting. For call stacks, these include:

native_api: A call was made directly to the Native API rather than the Win32 API.
direct_syscall: A syscall instruction originated outside of the Native API layer.
proxy_call: The call stack may indicate a proxied API call to mask the true source.
shellcode: Second generation executable non-image memory called a sensitive API.
image_indirect_call: An entry in the call stack was preceded by a call to a dynamically resolved function.
image_rop: No call instruction preceded an entry in the call stack.
image_rwx: An entry in the call stack is writable. Code should be read-only.
unbacked_rwx: An entry in the call stack is non-image and writable. Even JIT code should be read-only.
truncated_stack: The call stack seems to be unexpectedly truncated. This may be due to malicious tampering.

In some contexts, these behaviours alone may be sufficient to detect malware.
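The idea behind image_rop, checking whether a return address is actually preceded by a call instruction, can be illustrated against the callsite_leading_bytes values shown in the samples in this article. The Python sketch below is a rough heuristic and not Elastic's implementation: it recognises only the two common near-call encodings (E8 rel32 and FF /2) rather than disassembling, so stray data bytes can produce false positives. The function names are hypothetical.

```python
from typing import Optional


def call_length_at(code: bytes, i: int) -> Optional[int]:
    """Length of a near call starting at code[i], or None if it is not one."""
    if code[i] == 0xE8:  # call rel32
        return 5
    if code[i] != 0xFF or i + 1 >= len(code):
        return None
    modrm = code[i + 1]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if reg != 2:  # FF /2 is the near indirect call; FF /4 is jmp
        return None
    length = 2
    if mod != 3 and rm == 4:  # a SIB byte follows the ModRM byte
        if i + 2 >= len(code):
            return None
        length += 1
        if mod == 0 and (code[i + 2] & 7) == 5:
            length += 4  # SIB with no base register and a disp32
    if mod == 1:
        length += 1  # disp8
    elif mod == 2 or (mod == 0 and rm == 5):
        length += 4  # disp32 or RIP-relative disp32
    return length


def preceded_by_call(leading_bytes_hex: str) -> bool:
    """True if the bytes end with something that encodes a near call."""
    code = bytes.fromhex(leading_bytes_hex)
    # Without prefixes, these encodings are between 2 and 7 bytes long. An
    # optional REX prefix sits before the opcode and does not affect the check.
    for candidate in range(2, 8):
        start = len(code) - candidate
        if start < 0:
            break
        if call_length_at(code, start) == candidate:
            return True
    return False


# Tails of callsite_leading_bytes values from the samples in this article.
print(preceded_by_call("448b4dc0ffd6"))                # ends with call rsi
print(preceded_by_call("b9ffffffffba0100000041ffe1"))  # ends with jmp r9
print(preceded_by_call("0f8830300900b801000000"))      # ends with mov eax, 1
```

Only the first tail, from the sample enriched call stack, ends in a call. The APC and SilentMoonwalk samples end in a jump and a mov respectively, so their "return addresses" were never produced by a call at that site.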

Spoofing — bypass or liability?

Return address spoofing has been a staple game hacking and malware technique for many, many years. This simple trick allows injected code to borrow the reputation of a legitimate module with few consequences. The goal of deep call stack inspection and behaviour baselines is to stop giving malware this free pass.

Offensive researchers have been assisting this effort by looking into approaches for full call stack spoofing. Most notably:

SilentMoonwalk, in addition to being superb offensive research, is an excellent example of how lying can get you into twice the amount of trouble — but only if you get caught. Many Defense Evasion techniques rely on security-by-obscurity — and once exposed by researchers, they can become a liability. In this case, the research included advice on the detection opportunities introduced by the evasion attempt.

{
  "process.thread.Ext": {
    "call_stack_summary": "ntdll.dll|kernelbase.dll|kernel32.dll|ntdll.dll",
    "call_stack": [
      { "symbol_info": "c:\\windows\\system32\\ntdll.dll!NtAllocateVirtualMemory+0x14" },
      { "symbol_info": "c:\\windows\\system32\\kernelbase.dll!VirtualAlloc+0x48" },
      {
        "symbol_info": "c:\\windows\\system32\\kernelbase.dll!CreatePrivateObjectSecurity+0x31",
        /* 4883c438 stack desync gadget - add rsp 0x38 */
        "callsite_trailing_bytes": "4883c438c3cccccccccccccccccccc48895c241057498bd8448bd2488bf94885c90f84660609004885db0f845d060900418bd14585c97411418bc14803c383ea",
        "callsite_leading_bytes": "cccccccccccccccccccccccccccccc4883ec38488b4424684889442428488b442460488944242048ff15d9b21b000f1f44000085c00f8830300900b801000000"
      },
      { "symbol_info": "c:\\windows\\system32\\kernelbase.dll!Internal_EnumSystemLocales+0x406" },
      { "symbol_info": "c:\\windows\\system32\\kernelbase.dll!SystemTimeToTzSpecificLocalTimeEx+0x2d1" },
      { "symbol_info": "c:\\windows\\system32\\kernelbase.dll!WaitForMultipleObjectsEx+0x982" },
      { "symbol_info": "c:\\windows\\system32\\kernel32.dll!BaseThreadInitThunk+0x14" },
      { "symbol_info": "c:\\windows\\system32\\ntdll.dll!RtlUserThreadStart+0x21" }
    ],
    "call_stack_final_user_module": {
      "name": "Undetermined", /* gadget module resulted in suspicious call stack */
      "reason": "ntdll.dll|kernelbase.dll|kernel32.dll|ntdll.dll"
    }
  }
}

SilentMoonwalk call stack example

A standard technique for unearthing hidden artifacts is to enumerate them using multiple techniques and compare the results for discrepancies. This is how RootkitRevealer works. This approach was also used in Get-InjectedThreadEx.exe, which climbs up the thread stack as well as walking down it.

In certain circumstances, we may be able to recover a call stack in two ways. If there are discrepancies, then you will see the less reliable call stack emitted as call_stack_summary_original.

{
  "process.thread.Ext": {
    "call_stack_summary": "ntdll.dll",
    "call_stack_summary_original": "ntdll.dll|kernelbase.dll|version.dll|kernel32.dll|ntdll.dll",
    "call_stack": [
      { "symbol_info": "c:\\windows\\system32\\ntdll.dll!NtContinue+0x12" },
      { "symbol_info": "c:\\windows\\system32\\ntdll.dll!LdrInitializeThunk+0x13" }
    ],
    "call_stack_final_user_module": {
      "name": "Undetermined",
      "reason": "ntdll.dll"
    }
  }
}

Call Stack summary original example
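When both summaries are present, a hunt can focus on what differs between them. The following Python sketch assumes the event has already been parsed into a dictionary shaped like the sample above; modules_only_in_original is a hypothetical helper name.

```python
def modules_only_in_original(thread_ext):
    """Modules that appear only in the less reliable call stack summary."""
    original = thread_ext.get("call_stack_summary_original")
    if original is None:
        return []  # only one call stack was recovered, nothing to compare
    recovered = set(thread_ext.get("call_stack_summary", "").split("|"))
    return [module for module in original.split("|") if module not in recovered]


event = {
    "call_stack_summary": "ntdll.dll",
    "call_stack_summary_original": "ntdll.dll|kernelbase.dll|version.dll|kernel32.dll|ntdll.dll",
}
print(modules_only_in_original(event))
```

For the sample above, this reports kernelbase.dll, version.dll and kernel32.dll: modules that one recovery method saw and the other did not, which makes them the natural starting point for investigation.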

Call Stacks are for everyone

By default you will only find call stacks in our alerts, but this is configurable through advanced policy.

events.callstacks.emit_in_events: If set, call stacks will be included in regular events where they are collected. Otherwise, they are only included in events that trigger behavioral protection rules. Note that setting this may significantly increase data volumes. Default: false

Further insights into Windows call stacks are available in the following Elastic Security Labs articles: