Sunday, May 18, 2008

Rustock.C – Unpacking a Nested Doll

Unpacking Rustock.C is a challenging task. If you are tired of boring crosswords or Sudoku puzzles and feel like your brain needs a real exercise, think about reversing Rustock.C - satisfaction (or dissatisfaction, depending on the result) is guaranteed.

Rustock.C story began a week ago – when one AV vendor has publicly disclosed the new details about the latest variant of Rustock. As soon as the sample of Rustock.C has been obtained, many researchers started their journey into the center of the rootkit.

First quick look at the driver code reveals a simple decoder. In spite of being simple, it is still a good idea to debug it to see what exactly it produces on its output.

In order to debug a driver, different malware researchers prefer different tools – in our case let’s start from WinDbg configured to debug a VMWare session running in debug mode. For more details of this set up, please read this article.

The very first question one might ask is how to put a breakpoint into the very beginning of the driver code?

Some researchers would hook IopLoadDriver() in the kernel to intercept the code before it jumps into the driver, in order to step in it by slowly tracing single instructions.

A simple known trick however, is to build a small driver (and keep it handy) with the first instruction being “int 3”. Once such driver is loaded, the debugger will pop up with the Debug Breakpoint exception. Stepping out from that place leads back into the kernel’s IopLoadDriver() function – right into the point that follows the actual call to the driver. Now, the actual call instruction address is known - a new breakpoint needs to be placed in it.

With the new breakpoint in place, it is time to load Rustock.C driver in the virtual environment controlled by the debugger. Once loaded, the debugger breaks at the call instruction in kernel’s IopLoadDriver(). Stepping into the driver, placing a new breakpoint at the end of its decoder and letting it run until it hits that breakpoint allows to unpack the code that was hidden under that decoder.

The first-layer decoder reveals us a code with a myriad of fake instructions, blocks of code that do nothing, random jumps from one place to another – a huge maze created with only one purpose – to complicate threat analysis by obfuscating and hiding the truly malicious code.

Tracing that code within debugger might be easier with the disassembly listing of that code in the user mode.

One way to get that listing is to reconstruct the driver as a PE-executable by resetting the DLL bit in its PE-header characteristics and changing its subsystem from Native (0x01) to Windows GUI (0x02) to make debugger happy to load it. Another way is to reconstruct a normal PE-executable by building and compiling an Assembler program that includes the top-level Rustock’s decryptor followed by a large stub of encoded data simply copied from the original driver code.

Buidling a PE-executable equivalent of the Rustock.C driver helps to study the code behind the first-layer decoder. Such program can now be loaded into a user-mode debugger, such as OllyDbg, the first-layer decoder can now be debugged in the user mode to unpack the code behind it. Once unpacked, the entire process can be dumped and reloaded into the disassembler.

At this point of analysis, the code behind the first-layer decoder reveals interesting occurrences of DRx registers manipulations, IN/OUT instructions, “sidt/lidt” instructions, and some other interesting code pieces - for example a code that parses an MZ/PE header:

00011C0A cmp word ptr [eax], ‘ZM’
00011759 mov bx, [eax+3Ch]
00011E31 cmp dword ptr [eax+ebx], 'EP'

The code in general now looks like “spaghetti” – and still, it’s just a second-layer decryptor. The picture below shows you its execution flow – every grey “box” in it represents a stand-alone function:

Placing the breakpoints for all the “interesting” instructions in the driver code is a good idea. The addresses need to offset by a difference between the driver’s entry point reported with a kernel debugger and the entry point of the driver’s PE-executable equivalent, as reported by the user mode debugger.

With the new breakpoints in place, the code will firstly break on the instruction that searches for an MZ-header of the ntkrnlpa.exe:

cmp word ptr [eax], ‘ZM’

In order to find the image base of ntkrnlpa.exe, Rustock.C looks up the stack to find the return address inside ntkrnlpa.exe. It rounds that address up and starts sliding it backwards by the amount of the section alignment until it reaches the image base of ntkrnlpa.exe.

Once the start of ntkrnlpa.exe is found, the driver then parses its PE-header, locates and parses the export table.

Previous variants of Rustock contained explicit imports from ntkrnlpa.exe. This time, Rustock.C obtains kernel’s exports dynamically, by parsing its memory image – the same trick was widely used by the user-mode malware in the past, when the kernel32.dll’s exports were dynamically obtained during run-time by using the hash values of the export names.

The fragment of Rustock’s second-layer decryptor below parses kernel’s export table:

Now that it knows kernel exports, the driver calls ExAllocatePoolWithQuotaTag() to allocate 228,381 bytes in the non-paged pool ( tagged as “Info@”).

The rootkit code then copies itself into that pool and jumps in it to continue its execution from that place.

During the execution, Rustock.C repeats the same trick again – it allocates another 278,528 bytes in the non-paged pool, copies itself into it and transfers there control. This way, the code of the driver "migrates" from one memory location to another. While the "abandoned" areas preserve the severely permutated code, and thus, not easily suitable for scanning, the addresses of the newly allocated areas in the non-paged pool cannot be predicted. Thus, even if the infected driver and its address range in the kernel are established, it is still not clear where the final "detectable" form of Rustock.C code is located.

Following memory allocation tricks, Rustock employs “lidt/sidt” instructions to patch IDT. Executing “lidt” in WinDbg might crash the operating system in the virtual machine. Therefore, “lidt” instruction needs to be skipped (by patching EIP with the address of the next instruction).

Another set of instructions that are better to be skipped with the debugger, are DRx-registers manipulations. By zeroing the debug registers Dr0-Dr3 and the debug control register DR7, the rootkit might attempt to cause trouble for SoftIce – any suspicious instructions need to be skipped for safety reasons.

Following that, Rustock.C driver reads the configuration of devices on a PCI bus by using IN/OUT instructions with the PCI_CONFIG_ADDR and PCI_CONFIG_DATA constants. It then starts a few nested loops to read certain data from the devices attached to a PCI bus. The read data is then hashed with the purpose of creating a footprint that uniquely identifies hardware of the infected host.

Debugging the Rustock.C driver is easier if the successful code execution path is saved into a map (e.g. a hand-written one). Every successfully terminated loop should be reflected in that map. The relative virtual addresses recorded in it allow skipping long loops when the code is analysed again from the beginning – they should be considered “the milestones” of the code flow. If a wrong move crashes the system – the virtual machine needs to be reverted to a clean snapshot, debugger restarted, and the entire debugging process repeated by using the successful “milestones” from the map.

The map of the execution “milestones” should tell what to skip, when to break, what to patch, where to jump – in order to navigate the code successfully through all the traps that the authors of Rustock has set against emulators, debuggers, run-time code modifications, etc.

Whenever the driver attempts to access data at a non-existing address, the code needs to be unwound backwards to establish the reason why the address is wrong. In most cases, following the logics of the code helps to understand what values should replace the wrong addresses.

For example, at one point of execution, Rustock.C driver crashes the session under WinDbg by calling the following instruction while the contents of ESI is not a valid address:

mov esi, dword ptr [esi]

In order to “guide” the code through this crash, the driver needs to be re-analysed from the very beginning to check if this instruction is successfully called before the failure and if it does, what the valid contents of ESI is at that moment of time.

As stated above, the PE-executable equivalent of the driver loaded into the user-mode debugger and disassembler helps to navigate through the code, search instructions in it, search for the code byte sequences, place comments - a good helper for the kernel debugging.

The code of Rustock.C debugged at this stage is a 2nd-layer decryptor that will eventually allocate another buffer in the non-paged pool where it will decrypt the final, but still, ridiculously permutated “spaghetti” code of the driver – this time, with the well-recognizable strings, as shown in the following dumps:

PS: Special thanks to Frank Boldewin for exchanging his tips and ideas with me.