Friday, April 25, 2008

Universal CAPTCHA Cracker: a new Deep Blue or "The Turk"?

According to some recent reports, there are cases where even the toughest CAPTCHA puzzles are solved in a matter of tens of seconds.

The new automated bots were blamed for auto-registering Windows Live Hotmail, Windows Live Mail, Google's GMail, and Google's Blogger accounts for SPAM/malware distribution and SEO poisoning attacks.

But what CAPTCHA-cracking engine stands behind these automated bots - a new Deep Blue endowed with AI, or "The Turk"?


  • In 1997, Deep Blue managed to convince world champion Garry Kasparov that the machine had made a startling move only a human could conceive (he implied that the machine had cheated because the move seemed all too "human").

  • On the other hand, we all know "The Turk" - a legendary chess-playing machine of the late 18th century that appeared to play a strong game of chess against a human opponent, but was later exposed as an elaborate hoax.



One website - CaptchaBot.com - allows bot masters to log on and call its web service requesting it to crack CAPTCHA images "of any complexity" on-the-fly.

They charge 3 US cents for every CAPTCHA they crack and guarantee the response time to be less than 90 seconds.

CaptchaBot's "How it works" page contains this scheme:



As seen in the picture, the scheme implies that some mysterious brain stands behind the entire CAPTCHA cracking mechanism, and recognizes images by using OCR.

At the same time, there are some interesting web sites that allow subscribers to make some cash by solving CAPTCHA images ("in your spare time or while you work").

One such site - KolotiBablo.com - is particularly interesting because people share their own experience with it on many forums.

Some users complain that while KolotiBablo.com still advertises its service as an easy way to make $3 per hour, the real earnings are much lower because its load is now balanced between a growing number of users, who have to wait in a queue until they receive the next image to break.

Another site, Grand-Sale-5.com, challenges KolotiBablo.com by doubling the money it pays for every manually resolved CAPTCHA.

One user claims he made $15 in 2 months, by resolving around 250 CAPTCHA images every day.
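To put that claim next to the price quoted earlier, here is a rough back-of-the-envelope sketch. It assumes "2 months" means about 60 days, and it compares figures from two different sites: nothing in the reports confirms that CaptchaBot.com actually relies on such workers, so the ratio is only an illustration.

```python
from fractions import Fraction

captchas_per_day = 250
days = 60                          # assumption: "2 months" is about 60 days
total_earned = Fraction(15)        # US dollars, as claimed by the user
price_charged = Fraction(3, 100)   # CaptchaBot's advertised 3 US cents per CAPTCHA

total_solved = captchas_per_day * days
payout_per_captcha = total_earned / total_solved
ratio = price_charged / payout_per_captcha

print(total_solved)                # 15000
print(float(payout_per_captcha))   # 0.001
print(ratio)                       # 30
```

In other words, the claimed payout works out to a tenth of a cent per solved image, about one thirtieth of the advertised cracking price.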

Now try to imagine a kiddo who managed to crack 15,000 CAPTCHAs in 2 months:



Wednesday, April 23, 2008

"Bobax" the Sheep

Tuesday, April 22, 2008

Kraken is Finally Cracked

The previous post provided a snapshot of the Kraken code responsible for generating dynamic DNS names.

As mentioned, those names are pseudo-random because their original seed remains the same.

The ThreatExpert system reports a list of DNS names, but this list is not complete.

If Kraken is left “running” by ThreatExpert for a bit longer, it will eventually generate a couple of thousand unique host names.

What are those names? Are they important? Is it possible to predict them?

Once we have the disassembled code of Kraken and once we know the seed of its randomizer, it’s a trivial task to build our own tool around that code.

This source code is a quick-and-dirty port of Kraken’s DNS name randomizer.

It generates and prints on screen 10,000 dynamic DNS names that the latest Kraken variant uses to address its C&C servers; that limit can easily be changed.

The tool can be compiled with any version of VC++.

The text output of this tool is provided here.

Monday, April 21, 2008

Kraken changes tactics

A new variant of the Kraken/Bobax bot, first seen in the wild on 14th April 2008, seems to be gaining a bit of power: over the last weekend, our ThreatExpert system received around 50 unique samples of it, and we're still getting them at the same pace - 20-25 new samples a day.

In the new variant of Kraken, dumping its c.dll module from the heap of its own process is a bit trickier because its PE header is now wiped out. Thus, restoring the module's imports is not straightforward. You can still see its strings in the main process module, but to dump its code, look for a heap page that is 0x1B000 bytes in size. Otherwise, all you'll find in the main process module is the code of the packer itself.

For example, look for a page that the packer allocates on the heap and unpacks the code into, at the address range 0x1DF0000-0x1E0B000.
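Since the PE header is gone, the page size is the only handle left. A minimal sketch of that search is shown here, assuming you already have a list of (base address, size) pairs from whatever memory-map view your debugger or dumper provides. The region list is hypothetical; only the middle entry matches the example range above.

```python
# Hypothetical (base_address, size) pairs, as a memory-map listing of the
# process might report them; only the middle entry comes from the post.
regions = [
    (0x00400000, 0x00005000),
    (0x01DF0000, 0x0001B000),
    (0x7C900000, 0x000B0000),
]

KRAKEN_PAGE_SIZE = 0x1B000  # size quoted in the post


def find_candidate_pages(regions, wanted_size=KRAKEN_PAGE_SIZE):
    """Return (start, end) ranges of regions whose size matches exactly."""
    return [(base, base + size) for base, size in regions if size == wanted_size]


for start, end in find_candidate_pages(regions):
    print(f"candidate: 0x{start:X}-0x{end:X}")  # candidate: 0x1DF0000-0x1E0B000
```

Note that 0x1E0B000 minus 0x1DF0000 is exactly 0x1B000, so the example range and the quoted size agree.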

Once the code is located, let's see what it's doing.

To hook itself into the system, the previous variant registered itself as a service with the fixed display name "Print Spooler Service".

The new Kraken randomly chooses its service display name from the following list:


  • SolidWorks Licensing Service

  • LXCCCustomerConnect

  • Wireless Adapter Configurator

  • DeepSight Extractor Service for NP08

  • Dell Printer Status Watcher

  • DigiCtrl

  • CMG Shield

  • Cognos ReportNet

  • CommServer

  • Compaq DMI Web Agent

  • ActiveSMART Service

  • Advanced Networking Service

  • Amazon Unbox Video Service

  • Ati HotKey

  • Aventail VPN Client

  • Axon Service

  • BlueSoleilCS

  • BT Modem Lock

  • Creative Labs Licensing

  • DQLWinService

  • Electronic Arts Licensing Service

  • Electronic Arts Licensing



In order to evade host intrusion prevention systems (e.g. firewalls), the new Kraken "talks" to its command-and-control servers over HTTP, relying on pseudo-random URLs.

The URLs it builds consist of several parts:

- a host name that is a pseudo-random string with a variable length from 7 to 12 characters; the algorithm that constructs this string was altered in this variant - it is provided below:



- please note that the seed it starts from is always the same; thus, it is possible to use the same algorithm and predict the entire list of its pseudo-random domain names;

- the constructed domain name is then followed by a dot (".") and one of the domain suffixes below (each of the first three suffixes is twice as likely to be selected as each of the following four):


  • dyndns.org

  • yi.org

  • mooo.com

  • dynserv.com

  • com

  • cc

  • net



- the dynamically constructed host name is then followed by a random resource name; the first part of this name is built from two sets of characters, with the code hopping from one list to the other to produce a string with properly matched vowels and consonants; an internal rule dictates when to hop, that is, when to pick a random vowel and when to pick a random consonant:

"a", "o", "e", "i", "y", "u", "ou", "oo"

and

"w", "r", "t", "p", "s", "d", "f", "g", "h", "j", "k", "l", "z", "c", "v", "b", "n", "m", "qu"

- the second part of the random resource name is a string that the bot picks from the following list of 33 common English noun, verb, adjective, and adverb suffixes:


  • able

  • al

  • ance

  • ate

  • dom

  • en

  • ence

  • ency

  • er

  • ful

  • hood

  • ible

  • ify

  • ish

  • ism

  • ist

  • ity

  • ize

  • less

  • list

  • ly

  • ment

  • ness

  • or

  • ous

  • ship

  • sion

  • tial

  • tic

  • tion

  • tive

  • ulent



- the constructed random resource name is then followed by an extension that is randomly chosen from the following list:


  • shtml

  • asp

  • pl

  • cgi

  • jsp

  • php

  • ai
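The host-name part of this scheme can be sketched as follows. This is not Kraken's own randomizer: Python's generator, the lowercase-letter alphabet, and the seed value 2008 are all stand-in assumptions. Only the 7 to 12 character length, the suffix list, and the 2:1 weighting come from the description above. The point of the sketch is the consequence of a fixed seed: every run produces the same list.

```python
import random
import string

SUFFIXES = ["dyndns.org", "yi.org", "mooo.com", "dynserv.com", "com", "cc", "net"]
WEIGHTS = [2, 2, 2, 1, 1, 1, 1]  # each of the first three is twice as likely


def host_names(seed, count):
    """Yield `count` pseudo-random host names from a fixed seed."""
    rng = random.Random(seed)          # stand-in for Kraken's own randomizer
    for _ in range(count):
        length = rng.randint(7, 12)    # 7 to 12 characters, inclusive
        label = "".join(rng.choice(string.ascii_lowercase) for _ in range(length))
        suffix = rng.choices(SUFFIXES, weights=WEIGHTS, k=1)[0]
        yield f"{label}.{suffix}"


first_run = list(host_names(seed=2008, count=5))
second_run = list(host_names(seed=2008, count=5))
print(first_run == second_run)  # True: same seed, same list
```

Anyone who knows the seed and the algorithm can therefore enumerate the names ahead of time.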



As shown above, the random resource name used by the bot combines a random string of properly matched vowels and consonants with a valid suffix.

In a way, we may call this new feature of the bot an "Artificial English Word Generator": it follows English spelling patterns and produces words that look like ordinary English words. For example, compare "confusulent" or "pritation" with something like "ktjptrca".
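A toy version of such a generator might look like this. The bot's real hopping rule is not reproduced here; strict consonant-then-vowel alternation, the syllable count, and the seed are assumptions, and only a handful of the suffixes listed above are included to keep the sketch short.

```python
import random

VOWELS = ["a", "o", "e", "i", "y", "u", "ou", "oo"]
CONSONANTS = ["w", "r", "t", "p", "s", "d", "f", "g", "h", "j", "k", "l",
              "z", "c", "v", "b", "n", "m", "qu"]
SUFFIXES = ["able", "ance", "ism", "ment", "tion", "ulent"]  # subset of the list above
EXTENSIONS = ["shtml", "asp", "pl", "cgi", "jsp", "php", "ai"]


def resource_name(rng, syllables=3):
    """Build consonant+vowel syllables, then append a suffix and an extension."""
    stem = "".join(rng.choice(CONSONANTS) + rng.choice(VOWELS)
                   for _ in range(syllables))
    return f"{stem}{rng.choice(SUFFIXES)}.{rng.choice(EXTENSIONS)}"


rng = random.Random(1)  # arbitrary seed, for repeatable output only
for _ in range(3):
    print(resource_name(rng))
```

Even this crude alternation yields pronounceable names, which is what makes character-frequency heuristics less useful against them.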

What is it for? Probably to evade SPAM filters or any other algorithms that can spot a random word by locating weird or uncommon combinations of characters. If no rule or algorithm can be built to distinguish such a word, then it cannot be detected and, therefore, blocked.

The bot constructs an HTTP package whose encrypted contents are MIME-encoded and presented as a random MIME-type archive in the HTTP header.

Kraken/Bobax POSTs that HTTP package to its C&C servers (with the pseudo-random URLs), thus making it non-trivial to detect and block such traffic, as not much is left to "hook" in it.

The HTTP traffic it generates uses different MIME types that the bot randomly selects from the following list of 22 types:


  • ai: application/postscript

  • avi: video/x-msvideo

  • bin: application/octet-stream

  • bmp: image/x-ms-bmp

  • eps: application/postscript

  • gif: image/gif

  • gtar: application/x-gtar

  • hqx: application/mac-binhex40

  • jpeg: image/jpeg

  • jpg: image/jpeg

  • mpeg: video/mpeg

  • mpg: video/mpeg

  • pdf: application/pdf

  • png: image/x-png

  • ppt: application/ms-powerpoint

  • ps: application/postscript

  • sea: application/x-stuffit

  • sit: application/x-stuffit

  • tar: application/x-tar

  • uu: application/octet-stream

  • wav: audio/x-wav

  • zip: application/zip
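The table above is easy to load into a lookup structure, which also shows something the list does not make obvious: several extensions share a MIME type. The helper name `pick_content_type` and the seed are illustrative assumptions; how the bot actually makes its choice is not shown in this post.

```python
import random

MIME_TYPES = {
    "ai": "application/postscript", "avi": "video/x-msvideo",
    "bin": "application/octet-stream", "bmp": "image/x-ms-bmp",
    "eps": "application/postscript", "gif": "image/gif",
    "gtar": "application/x-gtar", "hqx": "application/mac-binhex40",
    "jpeg": "image/jpeg", "jpg": "image/jpeg",
    "mpeg": "video/mpeg", "mpg": "video/mpeg",
    "pdf": "application/pdf", "png": "image/x-png",
    "ppt": "application/ms-powerpoint", "ps": "application/postscript",
    "sea": "application/x-stuffit", "sit": "application/x-stuffit",
    "tar": "application/x-tar", "uu": "application/octet-stream",
    "wav": "audio/x-wav", "zip": "application/zip",
}


def pick_content_type(rng):
    """Pick one (extension, MIME type) pair uniformly at random."""
    ext = rng.choice(sorted(MIME_TYPES))
    return ext, MIME_TYPES[ext]


print(len(MIME_TYPES))                # 22
print(len(set(MIME_TYPES.values())))  # 16
print(pick_content_type(random.Random(0)))
```

So although there are 22 entries, only 16 distinct MIME type strings appear in the list.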



As demonstrated above, the new factor of "randomness" makes this bot extremely dangerous, considering how much effort it puts into concealing its traffic so that it passes through firewalls unobstructed.

The backdoor component is left intact in the new variant - its code was copy-and-pasted from the previous variant: the same commands, the same responses.

The SPAM engine and the email collector module are also identical to the previous variant.

Virustotal.com results are not very good: only 9 out of 32 AV scanners (28.12%) detect this threat, and only two of them identify it explicitly.

ThreatExpert was updated to generically detect the whole Kraken family, as seen in this report.

Monday, April 7, 2008

Crikey, you’ve been Kraken!

Kraken bot, also known as Bobax, Bobic, Oderoor, Cotmonger, Hacktool.Spammer, is a template-based SPAM mailbot that was recently reported by Paul Royal, principal researcher at Atlanta-based security company Damballa (please read the reports here and here).

On the ThreatExpert site, there has been a slight surge in the number of submissions of this threat.

As this bot unpacks its embedded stub onto its own heap (a kind of self-injection), the quickest way to analyze it is to locate a heap page in its own process that is approximately 200 KB in size and starts with an MZ header, dump that page, and load it into a disassembler.
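If you already have a raw memory dump of the process, the MZ search can be automated. The sketch below is one possible approach, not the method ThreatExpert uses: it looks for the "MZ" marker and then confirms that the DOS header's e_lfanew field (a 32-bit little-endian value at offset 0x3C) points at a "PE\0\0" signature. The dump it runs on is synthetic.

```python
import struct


def find_pe_images(buf):
    """Return offsets in `buf` where a plausible PE image starts."""
    hits = []
    pos = buf.find(b"MZ")
    while pos != -1:
        # e_lfanew, the offset of the PE header, is stored at offset 0x3C
        # of the DOS header as a 32-bit little-endian integer.
        if pos + 0x40 <= len(buf):
            (e_lfanew,) = struct.unpack_from("<I", buf, pos + 0x3C)
            pe = pos + e_lfanew
            if buf[pe:pe + 4] == b"PE\x00\x00":
                hits.append(pos)
        pos = buf.find(b"MZ", pos + 1)
    return hits


# Tiny synthetic "dump": padding, then a fake DOS header pointing at a PE signature.
image = bytearray(0x80)
image[0:2] = b"MZ"
struct.pack_into("<I", image, 0x3C, 0x40)
image[0x40:0x44] = b"PE\x00\x00"
dump = b"\x00" * 0x1000 + bytes(image)

print([hex(off) for off in find_pe_images(dump)])  # ['0x1000']
```

Checking e_lfanew filters out the many accidental "MZ" byte pairs that show up in any large dump.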

As seen from the code, the control channel for this bot is established over TCP port 447:



The backdoor component of this bot allows the following remote commands:

  • "info": report bot statistics

  • "version": report bot version

  • "windowsversion": report the Windows version of the compromised system (as reported by the GetVersionExA() API)

  • "hostname": report the local host name

  • "upspeed": report the upload speed of the bot

  • "countrycode": report the country the bot is running in (as reported by the GetLocaleInfoA() API)


Other commands instruct the bot to report various system information, such as the total amount of system memory, the amount of free memory, how long the bot has been running, etc.

For example, the bot may construct and send back the collected information in the form of the following XML file:

<info>
  <first>1</first>
  <userdata>nine</userdata>
  <version>315</version>
  <windowsversion>5.1.2600 Pro</windowsversion>
  <xpsp2>1</xpsp2>
  <connectionlimit>10</connectionlimit>
  <hostname>ComputerName</hostname>
  <upspeed>0</upspeed>
  <countrycode>1</countrycode>
  <language>en</language>
  <hostname>ComputerName</hostname>
  <cpu> Intel(R) Pentium(R) 4 CPU 3.20GHz (3193 MHz)</cpu>
  <memtotal>1024</memtotal>
  <memavailable>538</memavailable>
  <uptime>17024</uptime>
</info>
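For anyone collecting such reports from captured traffic, a report of this shape is easy to parse. Here is a minimal sketch using a shortened copy of the sample above. The helper name `parse_info` is hypothetical, and keeping the first occurrence of a repeated tag (such as hostname) is an arbitrary choice.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the sample report shown above.
report = """<info>
  <version>315</version>
  <windowsversion>5.1.2600 Pro</windowsversion>
  <connectionlimit>10</connectionlimit>
  <hostname>ComputerName</hostname>
  <hostname>ComputerName</hostname>
  <cpu> Intel(R) Pentium(R) 4 CPU 3.20GHz (3193 MHz)</cpu>
  <memtotal>1024</memtotal>
  <memavailable>538</memavailable>
</info>"""


def parse_info(xml_text):
    """Map each tag to its trimmed text; the first occurrence of a tag wins."""
    fields = {}
    for child in ET.fromstring(xml_text):
        fields.setdefault(child.tag, (child.text or "").strip())
    return fields


fields = parse_info(report)
print(fields["version"], fields["windowsversion"])
print(int(fields["memavailable"]) / int(fields["memtotal"]))  # share of memory free
```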

In order to retrieve the connection limit of the current system, the bot checks the contents of the file tcpip.sys (patching this file on a Windows XP system is known to raise the concurrent connection limit).

By the way, one of the encrypted strings it uses has the characters "odneRO0R" in it:



which explains why some vendors named it "Oderoor".

Another remote command instructs the bot to enumerate the file system and harvest email addresses from files with the following extensions:

• 123
• asm
• chm
• cpp
• csv
• dbf
• dif
• doc
• eps
• h
• htm
• html
• hwp
• inc
• info
• jtd
• nfo
• ott
• pdf
• php
• ps
• rtf
• sdc
• sdw
• slk
• sxw
• sys
• tmp
• txt
• wab
• wk1
• wks
• wpd
• wps
• xml

The bot has an internal SMTP client engine that it uses to send out SPAM.

Wednesday, April 2, 2008

New Little Feature


A new feature was added to ThreatExpert reports that some researchers might find useful.

Whenever ThreatExpert comes across a file name or a threat name in a report, it will check if that name was previously mentioned in other reports.

If it was, the name will be accompanied by a link to a page that lists any findings associated with that name:



All filenames and threat aliases are cross-referenced by MD5.

In a certain way it is similar to VGrep.

For example, searching for "Puper" and clicking its threat name inside any report will bring you to a page where you can see how other vendors detect the same threat (e.g. Zlob/Popuper/Vapsup), where this threat is likely to be coming from, and how many incidents were registered at threatexpert.com.
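The idea of cross-referencing names by MD5 can be illustrated with a small two-way index. This is only a sketch of the concept, not ThreatExpert's implementation: the sample bytes and the way names are grouped into reports are made up, and only the alias names themselves come from the example above.

```python
import hashlib
from collections import defaultdict

# Hypothetical reports: sample bytes plus the names seen in each report.
reports = [
    (b"sample-one", ["Puper", "Zlob"]),
    (b"sample-one", ["Popuper"]),
    (b"sample-two", ["Vapsup", "Puper"]),
]

names_by_md5 = defaultdict(set)
md5s_by_name = defaultdict(set)
for data, names in reports:
    digest = hashlib.md5(data).hexdigest()
    for name in names:
        names_by_md5[digest].add(name)
        md5s_by_name[name].add(digest)


def related_names(name):
    """All names that share at least one MD5 with `name`."""
    related = set()
    for digest in md5s_by_name[name]:
        related |= names_by_md5[digest]
    return sorted(related - {name})


print(related_names("Puper"))  # ['Popuper', 'Vapsup', 'Zlob']
```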

Tuesday, April 1, 2008

Piece-of-Cake Storm Detection


Storm (aka Peacomm/Nuwar/Zhelatin/Tibs) is known to be repacked every few minutes. Even if its sample is modified, recompiled and then re-packed with a different packer, it can still be recognized as Storm by looking at its memory contents.

Such “recognition” is possible due to the same patterns that repeat in its memory from generation to generation. As long as they can be recognized visually, the automation can recognize them too.

ThreatExpert recognizes these patterns during the static analysis of Storm's memory contents, which are obtained during the dynamic phase of the analysis.

To show these repeating patterns, let's consider two samples of Storm: an earlier one (circa January, 2008) and the latest one (the “April Fool’s” one).

What do these two samples have in common?

They are both droppers.

In the first case, the dropper injects the code into the process services.exe, as seen in the report:



ThreatExpert automation will inspect two newly created memory objects in this case: the dropper's process itself and the newly allocated pages on the heap of services.exe, where the injected code is running:



The memory content of the dropper’s process resembles an unpacked image of the original file. However, the stub that it carries inside for further injection into services.exe is still packed. Thus, it’s difficult to find any effective signatures in the dropper’s memory itself (they would not be generic enough).

On the other hand, the code injected into services.exe contains an unpacked image of the packed stub contained in the dropper. The injected code is a perfect place to choose generic Storm signatures:



The new (“April Fool’s”) Storm uses a different memory model: the dropper drops a DLL file, testdll_f.dll, and loads it into its own process - this can be clearly seen in the report:



ThreatExpert automation inspects this new model as shown below:



As in the first case, the new dropper’s process has no valuable signatures to select. The only more or less appropriate signature in it is:



But it’s too short and not generic: the next Storm variant will use a different name for the DLL that it drops and loads, so this signature won’t be detected. For this reason, we will ignore it and take a closer look at the memory contents of its DLL once it’s loaded into the dropper’s process:



If you look closer, the memory content of this DLL is very similar to the previously shown code injected into services.exe. It’s not identical, but the patterns are the same.

The picture below simply compares the final memory contents of both samples of Storm, side-by-side, to show what patterns are identical:



Recognizing the patterns that are constant across multiple generations of Storm guarantees a reliable generic detection of all Storm samples, plus any zero-day samples that the author of Storm will release in the future.

What kind of packer was used, how many encryption layers were added, what memory injection mechanism was chosen – all of this is irrelevant. In order to run, it needs to unpack itself at some point, and once it does so, ThreatExpert will inspect its memory contents: be it a process, a module, or a heap page, and recognize the common Storm patterns.
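The principle can be shown with a toy scanner that does not care what kind of memory object it is looking at. Everything in this sketch is invented for illustration: the byte patterns are not real Storm signatures, the region contents are made up, and the "at least two hits" rule is an assumption rather than ThreatExpert's logic.

```python
# Hypothetical byte patterns; real signatures would be taken from unpacked samples.
SIGNATURES = {
    "pattern-a": b"\xde\xad\xbe\xef\x01\x02",
    "pattern-b": b"peer-list",
}


def scan_regions(regions, signatures=SIGNATURES, min_hits=2):
    """Return names of regions containing at least `min_hits` signatures."""
    flagged = []
    for name, data in regions.items():
        hits = [sig for sig, pattern in signatures.items() if pattern in data]
        if len(hits) >= min_hits:
            flagged.append(name)
    return flagged


# Two made-up "generations": different containers, same unpacked patterns.
gen1 = {"dropper.exe": b"\x90" * 64,
        "services.exe heap page": b"junk" + b"\xde\xad\xbe\xef\x01\x02" + b"..peer-list.."}
gen2 = {"dropper.exe": b"testdll_f.dll",
        "testdll_f.dll module": b"peer-list\x00\x00" + b"\xde\xad\xbe\xef\x01\x02"}

print(scan_regions(gen1))  # ['services.exe heap page']
print(scan_regions(gen2))  # ['testdll_f.dll module']
```

The dropper is never flagged in either generation, while the unpacked code is caught wherever it ends up.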

The analysis of samples from two different generations clearly shows that Storm is not evolving at all, apart from trying various dumb tricks.

In other words, "And you, no matter what position, will never make a fine musician", as the Russian fable writer Krylov once said about the quartet that tried to play better music by switching seats.

New Storm, Old Song


The new Storm (the “April Fool’s” one), also known as CME-711/Peacomm/Nuwar/Zhelatin/Tibs, uses a cheap trick of dropping and loading a DLL named testdll_f.dll, where all of Storm’s functionality now resides.

Interestingly enough, ThreatExpert Memory Scanner detected and reported the new Storm with the stone-age memory signatures, as shown below:



ThreatExpert Automation was tweaked to report the new Storm in a more efficient way.

Now, the details of the peer-to-peer botnet used by this threat are listed, along with the file extensions it considers for harvesting email addresses and the email addresses it avoids touching.

For more information, please review the latest ThreatExpert report.