Tuesday, May 27, 2008

Flash Exploit Goes Wild

As predicted recently, the Flash exploit discovered by Mark Dowd (IBM X-Force) did not take long to start appearing in numerous "in-the-wild" web page infections. According to the initial SecurityFocus/Symantec assessment, between 20,000 and 250,000 web pages are currently affected by this exploit.

Let's have a look into the binary contents of an SWF file pulled from one of the infected web sites.

The output of the swfdump tool (part of the SWFTools toolset) suggests that the file has 2 streams in DefineSceneAndFrameLabelData tags.

The DefineSceneAndFrameLabelData tag is defined by the Adobe specification as a record header (type 86) followed by the number of scenes (N), followed by a frame offset and a scene name for each of the N scenes.

The binary contents of the DefineSceneAndFrameLabelData tags are:

[056]        40 SCENEDESCRIPTION
a6 e1 8a a0 08 20 20 20 20 20 20 20 20 20 20 20
20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 43

[056]        12 SCENEDESCRIPTION
01 00 e5 9c ba e6 99 af 20 31 00 00

where [056] is the tag ID of DefineSceneAndFrameLabelData (86 in decimal), and 40 and 12 are respective sizes of the streams.
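The tag ID and length that swfdump prints come from the SWF record header that precedes each tag body. As a minimal sketch of that decoding, assuming the short/long RECORDHEADER layout from the SWF specification (the function and type names are hypothetical, and the header bytes mentioned afterwards are constructed for illustration, not taken from the sample):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decoded form of an SWF RECORDHEADER (short or long). */
typedef struct {
    uint16_t tag_code;   /* upper 10 bits of the first 16-bit word */
    uint32_t length;     /* length of the tag body in bytes */
    size_t   header_len; /* 2 for a short header, 6 for a long one */
} SwfRecordHeader;

/* Parses a record header from buf.
   Returns 0 on success, -1 if buf is too short. */
int swf_parse_record_header(const unsigned char *buf, size_t size,
                            SwfRecordHeader *out)
{
    assert(buf != NULL && out != NULL);
    if (size < 2)
        return -1;

    /* SWF integers are stored little-endian. */
    uint16_t word = (uint16_t)(buf[0] | (buf[1] << 8));
    out->tag_code = (uint16_t)(word >> 6);
    out->length = word & 0x3Fu;
    out->header_len = 2;

    /* A short length of 0x3F means that a 32-bit length follows. */
    if (out->length == 0x3Fu) {
        if (size < 6)
            return -1;
        out->length = (uint32_t)buf[2] | ((uint32_t)buf[3] << 8) |
                      ((uint32_t)buf[4] << 16) | ((uint32_t)buf[5] << 24);
        out->header_len = 6;
    }
    return 0;
}
```

For instance, a short header made of the bytes a8 15 would decode to tag 86 with a 40-byte body, and a long header bf 01 50 01 00 00 would decode to tag 6 with a 336-byte body.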

According to the IBM X-Force research paper, a vulnerable function in the Adobe Flash DLL checks the "scene number" value with a "greater than zero" conditional jump. Such a comparison treats the checked value as signed, so a negative "scene number" value will always pass the check.

In order to be "negative", the "scene number" value must have its left-most bit set to 1. Thus, any 32-bit integer of 0x80000000 or greater passes the check, because it is interpreted as a value less than zero.

The "scene number" value is stored by Adobe in EncodedU32 format. The SWF file format specification describes the algorithm for "unpacking" an EncodedU32 value.

By building a tool with the same algorithm, we can check the actual unsigned integer values of "scene numbers" specified in the vulnerable SWF file:

unsigned char buf1[]= "\x01\x00\xe5\x9c\xba\xe6\x99\xaf\x20\x31\x00\x00";
unsigned char buf2[]= "\xa6\xe1\x8a\xa0\x08\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x43";

int iResult1 = GetEncodedU32(buf1);
int iResult2 = GetEncodedU32(buf2);

where the GetEncodedU32() function is identical to the one specified by Adobe.

The code above returns iResult1 = 1 and iResult2 = 0x8402b0a6, making iResult2 a "negative" value (with the left-most bit set to 1) that will always bypass the "greater than zero" conditional jump, regardless of the amount of data actually remaining in the buffer.

This will lead to a failure of a subsequent memory allocation function.

The malformed DefineSceneAndFrameLabelData tag with the negative "scene number" is also accompanied by a malformed DefineBits tag (0x06).

The swfdump tool displays the binary contents of the DefineBits stream as shown below:

[006]       336 DEFINEBITS

aa 02 34 d1 f5 25 13 90 00 90 90 90 90 90 20 cc cc cc cc cc cc cc cc cc cc cc cc 90 90 90 90 60 (etc.)

where [006] is the tag ID of DefineBits, and 336 is its length.

This tag contains the actual ShellCode that provides the payload for this exploit.

In order to analyse the ShellCode, the entire DefineBits stream (336 bytes) needs to be loaded into a disassembler. To start with, the ShellCode obtains the image base of kernel32.dll.

Then, it parses the export tables of kernel32.dll and urlmon.dll and retrieves the RVAs of the exports that match the hard-coded hash values.
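The idea of matching exports against hard-coded hashes can be sketched as follows. The hash used here is the rotate-right-by-13 additive hash that is common in shellcode of this kind; it is an assumption made for illustration, since the hash function of this particular sample is not given here. The export names are passed in as a plain array instead of being read from a real export table, and both function names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rotate-right-13 additive hash (assumed for illustration; the
   sample's actual hash function may differ). */
uint32_t ror13_hash(const char *name)
{
    assert(name != NULL);
    uint32_t h = 0;
    for (; *name != '\0'; name++) {
        h = (h >> 13) | (h << 19);      /* rotate right by 13 bits */
        h += (unsigned char)*name;
    }
    return h;
}

/* Returns the index of the first export whose name hashes to
   'wanted', or -1 if none matches. In real shellcode the index
   would then be used to look up the export's RVA. */
int find_export_by_hash(const char *const *names, size_t count,
                        uint32_t wanted)
{
    assert(names != NULL);
    for (size_t i = 0; i < count; i++) {
        if (ror13_hash(names[i]) == wanted)
            return (int)i;
    }
    return -1;
}
```

Storing only 32-bit hashes keeps the API names out of the shellcode body, which is why no readable strings such as "WinExec" need to appear in the DefineBits stream.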

As the analysis of the shellcode was static, it was assumed that the code would need to dynamically retrieve the RVAs of the following APIs: kernel32.dll->LoadLibrary(), kernel32.dll->WinExec(), urlmon.dll->URLDownloadToFile(), with the following purpose:

  • download an executable from the URL: http://user1.[blocked].net/bak.css
  • save it as C:\6123t.exe
  • execute it and (optionally) terminate the host process

When run, the downloaded file drops a text file into the newly created directory %Windir%\Nt_File_Temp\

The text file contains a list of URLs that the downloader will again download and execute.

By cross-referencing this behaviour with other threats reported by ThreatExpert automation, and analysing the contents of the downloaded files, it was established that such behaviour is common for the password stealing trojan OnlineGames/GamPass/LegMir, trojan Trojan.Drondog, and a rootkit Rootkit.Order.

In conclusion, here is some food for thought:

According to these, these and these stats, in the US alone, among its 212,080,135 active Internet users, 207,838,532 users have Flash Player installed (98%); of those, 187,054,679 users run Microsoft Windows (90%), of which 102,880,073 users (55%) do not run the latest Flash Player.

Sunday, May 18, 2008

Rustock.C – Unpacking a Nested Doll

Unpacking Rustock.C is a challenging task. If you are tired of boring crosswords or Sudoku puzzles and feel like your brain needs a real exercise, think about reversing Rustock.C - satisfaction (or dissatisfaction, depending on the result) is guaranteed.

The Rustock.C story began a week ago – when one AV vendor publicly disclosed new details about the latest variant of Rustock. As soon as a sample of Rustock.C was obtained, many researchers started their journey into the center of the rootkit.

A first quick look at the driver code reveals a simple decoder. Although it is simple, it is still a good idea to debug it to see exactly what it produces as output.

In order to debug a driver, different malware researchers prefer different tools – in our case let’s start with WinDbg configured to debug a VMware session running in debug mode. For more details of this setup, please read this article.

The very first question one might ask is how to put a breakpoint at the very beginning of the driver code.

Some researchers would hook IopLoadDriver() in the kernel to intercept the code before it jumps into the driver, in order to step in it by slowly tracing single instructions.

A simple, well-known trick, however, is to build a small driver (and keep it handy) whose first instruction is “int 3”. Once such a driver is loaded, the debugger will pop up with the Debug Breakpoint exception. Stepping out from that place leads back into the kernel’s IopLoadDriver() function – right to the point that follows the actual call to the driver. Now that the address of the actual call instruction is known, a new breakpoint needs to be placed on it.

With the new breakpoint in place, it is time to load the Rustock.C driver in the virtual environment controlled by the debugger. Once loaded, the debugger breaks at the call instruction in the kernel’s IopLoadDriver(). Stepping into the driver, placing a new breakpoint at the end of its decoder and letting it run until it hits that breakpoint makes it possible to unpack the code that was hidden under that decoder.

The first-layer decoder reveals code with a myriad of fake instructions, blocks of code that do nothing, random jumps from one place to another – a huge maze created with only one purpose – to complicate threat analysis by obfuscating and hiding the truly malicious code.

Tracing that code in the debugger might be easier with a disassembly listing of that code obtained in user mode.

One way to get that listing is to reconstruct the driver as a PE-executable by resetting the DLL bit in its PE-header characteristics and changing its subsystem from Native (0x01) to Windows GUI (0x02), so that a user-mode debugger is willing to load it. Another way is to reconstruct a normal PE-executable by building and compiling an assembly program that includes Rustock’s top-level decryptor followed by a large stub of encoded data simply copied from the original driver code.
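The first approach amounts to patching two fields of the PE header. A minimal sketch, assuming the whole file has been read into a buffer and using the standard PE32 field offsets (Characteristics at 22 bytes and Subsystem at 92 bytes from the "PE" signature); the function name is hypothetical, and a real driver may need further fixes, such as its imports, before it runs in user mode:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FILE_DLL_FLAG         0x2000u  /* IMAGE_FILE_DLL */
#define SUBSYSTEM_NATIVE      1u
#define SUBSYSTEM_WINDOWS_GUI 2u

static uint16_t rd16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static void wr16(unsigned char *p, uint16_t v)
{
    p[0] = (unsigned char)(v & 0xFFu);
    p[1] = (unsigned char)(v >> 8);
}

/* Clears the DLL flag and switches a Native subsystem to Windows GUI.
   Returns 0 on success, -1 if the buffer is not a plausible PE image. */
int make_user_mode_loadable(unsigned char *img, size_t size)
{
    assert(img != NULL);
    if (size < 0x40 || img[0] != 'M' || img[1] != 'Z')
        return -1;

    /* e_lfanew at offset 0x3C points to the "PE\0\0" signature. */
    uint32_t pe = (uint32_t)img[0x3C] | ((uint32_t)img[0x3D] << 8) |
                  ((uint32_t)img[0x3E] << 16) | ((uint32_t)img[0x3F] << 24);
    if (pe > size || size - pe < 94)
        return -1;
    if (memcmp(img + pe, "PE\0\0", 4) != 0)
        return -1;

    unsigned char *characteristics = img + pe + 22; /* file header */
    unsigned char *subsystem = img + pe + 92;       /* optional header */

    wr16(characteristics,
         (uint16_t)(rd16(characteristics) & ~FILE_DLL_FLAG));
    if (rd16(subsystem) == SUBSYSTEM_NATIVE)
        wr16(subsystem, SUBSYSTEM_WINDOWS_GUI);
    return 0;
}
```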

Building a PE-executable equivalent of the Rustock.C driver helps to study the code behind the first-layer decoder. Such a program can be loaded into a user-mode debugger, such as OllyDbg, so that the first-layer decoder can be debugged in user mode to unpack the code behind it. Once unpacked, the entire process can be dumped and reloaded into the disassembler.

At this point of the analysis, the code behind the first-layer decoder reveals interesting occurrences of DRx register manipulations, IN/OUT instructions, “sidt/lidt” instructions, and some other interesting code pieces - for example, code that parses an MZ/PE header:

00011C0A cmp word ptr [eax], 'ZM'
00011759 mov bx, [eax+3Ch]
00011E31 cmp dword ptr [eax+ebx], 'EP'

The code in general now looks like “spaghetti” – and still, it’s just a second-layer decryptor. The picture below shows you its execution flow – every grey “box” in it represents a stand-alone function:

Placing breakpoints on all the “interesting” instructions in the driver code is a good idea. The addresses need to be offset by the difference between the driver’s entry point as reported by the kernel debugger and the entry point of the driver’s PE-executable equivalent as reported by the user-mode debugger.
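The address translation is a single addition, sketched below. The function names are hypothetical, and any concrete entry-point values used with it are only examples, since the real ones depend on where the driver happens to be loaded:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Translates an address seen in the user-mode copy into the address
   of the same instruction in the loaded driver. Unsigned arithmetic
   wraps around, so this works whichever entry point is larger. */
uint32_t to_kernel_address(uint32_t user_addr, uint32_t user_entry,
                           uint32_t kernel_entry)
{
    return user_addr + (kernel_entry - user_entry);
}

/* Rebases a whole list of breakpoint addresses in one pass. */
void rebase_breakpoints(const uint32_t *user_addrs, uint32_t *kernel_addrs,
                        size_t count, uint32_t user_entry,
                        uint32_t kernel_entry)
{
    assert(user_addrs != NULL && kernel_addrs != NULL);
    for (size_t i = 0; i < count; i++)
        kernel_addrs[i] = to_kernel_address(user_addrs[i], user_entry,
                                            kernel_entry);
}
```

With a user-mode entry point of 0x00401000 and a kernel-mode entry point of 0xF8A51000 (both made up), the user-mode address 0x00401C0A maps to 0xF8A51C0A.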

With the new breakpoints in place, the code will firstly break on the instruction that searches for an MZ-header of the ntkrnlpa.exe:

cmp word ptr [eax], 'ZM'

In order to find the image base of ntkrnlpa.exe, Rustock.C looks up the stack to find the return address inside ntkrnlpa.exe. It rounds that address up and starts sliding it backwards by the amount of the section alignment until it reaches the image base of ntkrnlpa.exe.

Once the start of ntkrnlpa.exe is found, the driver then parses its PE-header, locates and parses the export table.
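The backward search for the image base can be sketched over a simulated memory buffer. This is an illustration under stated assumptions, not Rustock's actual code: the sketch aligns the starting address downward to a 4 KB boundary, steps back one page at a time, and accepts a candidate only if both the 'MZ' and 'PE' signatures check out. All names in it are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SCAN_STEP 0x1000u          /* assumed step: one 4 KB page */
#define NOT_FOUND ((size_t)-1)

static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Checks for an MZ header whose e_lfanew points at "PE\0\0". */
static int is_pe_image_start(const unsigned char *mem, size_t size,
                             size_t off)
{
    size_t left = size - off;
    if (left < 0x40 || mem[off] != 'M' || mem[off + 1] != 'Z')
        return 0;
    uint32_t e_lfanew = read_le32(mem + off + 0x3C);
    if (e_lfanew > left - 4)
        return 0;
    return memcmp(mem + off + e_lfanew, "PE\0\0", 4) == 0;
}

/* Starting from an offset known to lie inside the image (such as a
   return address), walks backwards until the image start is found.
   Returns the offset of the image base, or NOT_FOUND. */
size_t find_image_base(const unsigned char *mem, size_t size, size_t inside)
{
    assert(mem != NULL && inside < size);
    size_t off = inside & ~(size_t)(SCAN_STEP - 1);
    for (;;) {
        if (is_pe_image_start(mem, size, off))
            return off;
        if (off < SCAN_STEP)
            return NOT_FOUND;
        off -= SCAN_STEP;
    }
}
```

Checking the 'PE' signature in addition to 'MZ' reduces the chance of stopping early on a page that merely happens to begin with those two bytes.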

Previous variants of Rustock contained explicit imports from ntkrnlpa.exe. This time, Rustock.C obtains kernel’s exports dynamically, by parsing its memory image – the same trick was widely used by the user-mode malware in the past, when the kernel32.dll’s exports were dynamically obtained during run-time by using the hash values of the export names.

The fragment of Rustock’s second-layer decryptor below parses kernel’s export table:

Now that it knows the kernel exports, the driver calls ExAllocatePoolWithQuotaTag() to allocate 228,381 bytes in the non-paged pool (tagged as “Info@”).

The rootkit code then copies itself into that pool and jumps into it to continue its execution from that place.

During execution, Rustock.C repeats the same trick again – it allocates another 278,528 bytes in the non-paged pool, copies itself into it and transfers control there. This way, the code of the driver "migrates" from one memory location to another. The "abandoned" areas preserve only the severely permutated code, which is not easily suitable for scanning, while the addresses of the newly allocated areas in the non-paged pool cannot be predicted. Thus, even if the infected driver and its address range in the kernel are established, it is still not clear where the final "detectable" form of Rustock.C code is located.

Following the memory allocation tricks, Rustock employs “lidt/sidt” instructions to patch the IDT. Executing “lidt” in WinDbg might crash the operating system in the virtual machine. Therefore, the “lidt” instruction needs to be skipped (by patching EIP with the address of the next instruction).

Another set of instructions that is better skipped in the debugger is the DRx register manipulation. By zeroing the debug registers DR0-DR3 and the debug control register DR7, the rootkit might attempt to cause trouble for SoftIce – any suspicious instructions need to be skipped for safety reasons.

Following that, Rustock.C driver reads the configuration of devices on a PCI bus by using IN/OUT instructions with the PCI_CONFIG_ADDR and PCI_CONFIG_DATA constants. It then starts a few nested loops to read certain data from the devices attached to a PCI bus. The read data is then hashed with the purpose of creating a footprint that uniquely identifies hardware of the infected host.
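The kind of enumeration described above can be sketched as follows. The port values 0xCF8 and 0xCFC are the conventional x86 PCI configuration ports; everything else is an assumption made for illustration: the port I/O is replaced by a caller-supplied reader function so that the sketch runs in user mode, only one bus and the vendor/device ID register are visited, and FNV-1a stands in for the rootkit's own hash, which is not described here. All function names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PCI_CONFIG_ADDR 0x0CF8u   /* address port (conventional value) */
#define PCI_CONFIG_DATA 0x0CFCu   /* data port (conventional value) */

#define FNV_OFFSET_BASIS 2166136261u
#define FNV_PRIME        16777619u

/* Builds the dword written to PCI_CONFIG_ADDR to select a register. */
uint32_t pci_config_address(uint32_t bus, uint32_t dev, uint32_t func,
                            uint32_t reg)
{
    assert(bus < 256 && dev < 32 && func < 8 && reg < 256);
    return 0x80000000u | (bus << 16) | (dev << 11) | (func << 8) |
           (reg & 0xFCu);
}

/* Hypothetical reader: a real one would OUT the address to
   PCI_CONFIG_ADDR and IN the result from PCI_CONFIG_DATA. */
typedef uint32_t (*PciReadFn)(uint32_t config_address);

/* Stub reader that reports "no device" for every slot. */
uint32_t pci_read_no_devices(uint32_t config_address)
{
    (void)config_address;
    return 0xFFFFFFFFu;
}

/* Folds the vendor/device IDs found on one bus into a single value
   (FNV-1a is an assumed stand-in for the real hash). */
uint32_t pci_fingerprint(PciReadFn read_dword, uint32_t bus)
{
    assert(read_dword != NULL);
    uint32_t h = FNV_OFFSET_BASIS;
    for (uint32_t dev = 0; dev < 32; dev++) {
        for (uint32_t func = 0; func < 8; func++) {
            uint32_t id = read_dword(pci_config_address(bus, dev, func, 0));
            if (id == 0xFFFFFFFFu)
                continue;               /* empty slot */
            for (int i = 0; i < 4; i++) {
                h ^= (id >> (8 * i)) & 0xFFu;
                h *= FNV_PRIME;
            }
        }
    }
    return h;
}
```

Because the resulting value depends on which devices are present, the same binary would compute a different fingerprint on different hardware.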

Debugging the Rustock.C driver is easier if the successful code execution path is saved into a map (e.g. a hand-written one). Every successfully terminated loop should be reflected in that map. The relative virtual addresses recorded in it allow skipping long loops when the code is analysed again from the beginning – they should be considered “the milestones” of the code flow. If a wrong move crashes the system – the virtual machine needs to be reverted to a clean snapshot, debugger restarted, and the entire debugging process repeated by using the successful “milestones” from the map.

The map of the execution “milestones” should tell what to skip, when to break, what to patch, where to jump – in order to navigate the code successfully through all the traps that the authors of Rustock have set against emulators, debuggers, run-time code modifications, etc.

Whenever the driver attempts to access data at a non-existing address, the code needs to be unwound backwards to establish the reason why the address is wrong. In most cases, following the logic of the code helps to understand what values should replace the wrong addresses.

For example, at one point of execution, the Rustock.C driver crashes the session under WinDbg by executing the following instruction while ESI does not contain a valid address:

mov esi, dword ptr [esi]

In order to “guide” the code through this crash, the driver needs to be re-analysed from the very beginning to check whether this instruction is executed successfully before the failure and, if it is, what valid value ESI holds at that moment.

As stated above, the PE-executable equivalent of the driver loaded into the user-mode debugger and disassembler helps to navigate through the code, search instructions in it, search for the code byte sequences, place comments - a good helper for the kernel debugging.

The code of Rustock.C debugged at this stage is a 2nd-layer decryptor that will eventually allocate another buffer in the non-paged pool where it will decrypt the final, but still, ridiculously permutated “spaghetti” code of the driver – this time, with the well-recognizable strings, as shown in the following dumps:

PS: Special thanks to Frank Boldewin for exchanging his tips and ideas with me.

Wednesday, May 7, 2008

Memory Stealthiness of Kraken

A new variant of Kraken (v317) demonstrates extremely stealthy memory techniques.

This time, it dynamically decodes the chunks of code and data only when it needs them, leaving no traces behind that could be suitable for generic memory signatures.

The total amount of memory that Kraken consumes was measured with a tool specially built for this purpose. The tool simply checked Kraken's total memory consumption every 100ms, from the moment the executable started until it reached its active phase. With every check, the tool also scanned Kraken's entire address space (including all modules and heap), looking for the string "yi.org", which is known from a dynamic analysis of this bot.
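The scanning step of such a tool can be sketched as a byte search over one memory region. This is a simplified illustration with a hypothetical function name, not the tool itself: a real implementation would first enumerate and read the target's memory regions through OS APIs (on Windows, typically VirtualQueryEx and ReadProcessMemory) and call the function below on each region at every 100ms tick:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Counts non-overlapping occurrences of 'needle' in a raw memory
   region. The region may contain NUL bytes, so it is searched by
   length rather than as a C string. */
size_t count_occurrences(const unsigned char *region, size_t size,
                         const char *needle)
{
    assert(region != NULL && needle != NULL);
    size_t n = strlen(needle);
    size_t hits = 0;
    if (n == 0 || size < n)
        return 0;
    for (size_t i = 0; i + n <= size; i++) {
        if (memcmp(region + i, needle, n) == 0) {
            hits++;
            i += n - 1;         /* skip past this match */
        }
    }
    return hits;
}
```

Recording the hit count together with a timestamp at each tick is enough to reconstruct when, and for how long, the decoded string is present in memory.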

The tool produced an interesting result that is shown below:

As the graph suggests, the Kraken executable spends a considerable amount of time trying to "shake off" emulators from its tail. But even when it reaches its active payload phase, it still does not expose its original strings.

The vertical red lines on the chart represent the occurrences of the string "yi.org", which is part of a dynamic DNS name that it generates. Thus, it pulls the name, works with it, then destroys it, keeping the amount of data suitable for generic detection as low as possible. Well, if APIs could accept encrypted parameters, it would surely feed them encrypted, but they don't, so Kraken has no choice but to decrypt them only when it calls APIs, and only after it reaches its active payload phase. Pretty impressive stealthiness for memory.

The next image shows the contents of a small heap fragment at the same address, analysed every 100ms:

That narrow "window" is all that Kraken exposes for memory contents analysis, making generic memory signature-based detection unreliable.

Another aspect worth noting is that the new Kraken now has its cryptography based on "LibTomMath", an open-source library.

As for the system info it collects and reports - here is its format, including the Kraken version number:

  <windowsversion>5.1.2600 Pro</windowsversion>
  <cpu> Intel(R) Pentium(R) 4 CPU 3.20GHz (3193 MHz)</cpu>

Tuesday, May 6, 2008

New Storm on the horizon – now even Microsoft cannot detect it

The new version of Storm that was first seen over the last weekend now sends a clear message that the Storm group is not ready to give up, in spite of recent reports that Microsoft has used the power of its auto-updates to roll out the Storm bot killer.

Although very similar to its predecessors, the new variant can be distinguished by its deployment method: iframe injections.

An iframe with a link to a remote malicious script can be inserted into a blog post so that the browser of every reader of that post attempts to execute that script.

In order to do nasty things on a client computer, the remote script needs to elevate its privileges. It attempts to do so by relying on buggy code that is already running inside the client's browser – the buggy (and therefore, vulnerable) ActiveX applets.

The obfuscated script that attempts to install Storm on the client machines targets 8 different ActiveX vulnerabilities.

  • One vulnerability that the Storm script targets, exists in the MySpace ActiveX component that is used to upload images and files. When this vulnerability was discovered 3 months ago, the manufacturer of this component – company Aurigma - mentioned in their reply that their ActiveX uploader was used by hundreds of millions of users over the period of 5 years.

    What it means is that those MySpace users who are still running the older MySpace ActiveX component to upload their images and files are directly exposed to the risk of having their computers turned into zombies just by visiting legitimate sites that happen to have the injected iframes (e.g. via malformed blog posts).

  • Another vulnerability that the Storm deployment script attempts to exploit (CVE-2008-0647) is a stack-based buffer overflow in the HanGamePluginCn18 ActiveX control of Ourgame GLWorld (aka Lianzong Game Platform), caused by passing a long argument to its hgs_startNotify() method.

Other exploits the Storm script relies on are:

  • America Online SuperBuddy ActiveX Control Code Execution Vulnerability

  • Real Networks RealPlayer ActiveX Control Heap Corruption Exploit

  • IE 6/Microsoft Html Popup Window (mshtml.dll) DoS Exploit

  • DirectAnimation.PathControl COM object (daxctle.ocx) Exploit

  • Exploit that exists in 2 ActiveX HotBar components, by Zango Inc.

    (that must be the most unusual deployment method used by Storm)

  • MDAC ActiveX Code Execution Exploit

Since last weekend, only 5 unique samples of the new Storm have been seen in the wild. As mentioned above, the new variant is almost identical to the previous builds. As seen in this report, the new Storm now uses the filenames libor.exe and gogora.config.

VirusTotal results are low as usual (22%).