Wednesday, December 17, 2008

How to Defeat Koobface


As described in the previous blog post, analysis of the current version of Koobface uncovered a very interesting feature: its "ability" to solve the CAPTCHA protection on the Facebook web site. Put simply, if Koobface were unable to solve Facebook’s CAPTCHAs, it would be unable to replicate, because one needs to solve a CAPTCHA image before submitting a new message.

Every time Koobface runs into CAPTCHA protection on Facebook, it transfers the image to its command-and-control server. From there, the image is relayed to an army of CAPTCHA resolvers who work day and night, ready to pick up a new image from their profile, solve it, submit an answer, and get paid around 0.5 cent for it.

Wondering whether this is financially sustainable?

Think about it this way: according to the World Bank, at least 80% of humanity lives on less than $10 a day. At the same time, web resources like this one give their users an opportunity to make nearly that much ($9) in three hours by solving the CAPTCHA images relayed to them. Don’t you think the potential army of CAPTCHA resolvers has every reason to grow?
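To put that pay rate in perspective, here is the arithmetic behind the $9 figure as a small Python sketch. It assumes a flat rate of 0.5 cent per solved image, as mentioned above; actual rates on such sites may differ.

```python
# Back-of-the-envelope check of the figures quoted above.
# Assumption: a flat pay rate of 0.5 cent ($0.005) per solved CAPTCHA.
PAY_PER_CAPTCHA_USD = 0.005

def captchas_needed(target_usd: float, pay_per_captcha: float = PAY_PER_CAPTCHA_USD) -> int:
    """Number of CAPTCHAs that must be solved to earn target_usd."""
    return round(target_usd / pay_per_captcha)

def seconds_per_captcha(target_usd: float, hours: float) -> float:
    """Time budget per CAPTCHA if target_usd must be earned within `hours`."""
    return hours * 3600 / captchas_needed(target_usd)

needed = captchas_needed(9.0)         # 9 / 0.005 = 1800 images
pace = seconds_per_captcha(9.0, 3.0)  # 10800 s / 1800 images = 6 s per image
print(f"{needed} CAPTCHAs, one every {pace:.1f} s, {needed / 3:.0f} per hour")
# prints: 1800 CAPTCHAs, one every 6.0 s, 600 per hour
```

In other words, earning $9 in three hours at that rate means solving one image roughly every 6 seconds, without a break.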

Detailed analysis of the traffic between Koobface and its command-and-control server made it possible to tap into its communication channel and inject various CAPTCHA images into it to assess response time and accuracy. The results are astonishing: the remote site solved them all.
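The measurement side of such an experiment is simple bookkeeping. Below is a minimal Python sketch, assuming each injected image is logged with the text we know it contains, the answer returned through the channel, and the round-trip time. The `Probe` record and the case-insensitive comparison are illustrative assumptions, not the format used by the actual tool.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class Probe:
    """One injected CAPTCHA (hypothetical record): the text we know is in the
    image, the answer that came back, and the round-trip time in seconds."""
    expected: str
    returned: str
    latency_s: float

def normalize(text: str) -> str:
    # Assumption: answers are compared case-insensitively, ignoring extra spaces.
    return " ".join(text.lower().split())

def summarize(probes: list[Probe]) -> dict:
    """Accuracy and latency statistics over a batch of injected images."""
    if not probes:
        raise ValueError("no probes to summarize")
    correct = sum(normalize(p.expected) == normalize(p.returned) for p in probes)
    return {
        "accuracy": correct / len(probes),
        "mean_latency_s": mean(p.latency_s for p in probes),
        "max_latency_s": max(p.latency_s for p in probes),
    }
```

Keeping the expected text alongside each injected image is what makes accuracy measurable at all: the lab knows the answer, the remote side does not.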

But here is a twist: uploading a large number of random CAPTCHA images into its communication channel would strain its processing capacity, potentially to the point of denial of service. Even if it did not go that far, it could at least harm its business model, since the cost of solving all those injected images would ultimately be paid by the Koobface gang.
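The load-and-cost argument can be made concrete with a rough model. The Python sketch below assumes, purely for illustration, that a human solver needs about 6 seconds per image (roughly what earning $9 in three hours at 0.5 cent per image implies) and that the gang pays 0.5 cent for every image solved. The number of solvers kept busy is the offered load: arrival rate times service time.

```python
# Rough load-and-cost model for flooding the relay with injected CAPTCHAs.
# Assumptions (hypothetical): each human solver needs SOLVE_TIME_S seconds
# per image, and the gang pays PAY_PER_CAPTCHA_USD for every image solved,
# whether it came from a real infection or from the lab.
SOLVE_TIME_S = 6.0
PAY_PER_CAPTCHA_USD = 0.005

def solvers_needed(injected_per_s: float, solve_time_s: float = SOLVE_TIME_S) -> float:
    """Solvers kept busy full-time by the injected stream (arrival rate x service time)."""
    return injected_per_s * solve_time_s

def hourly_cost_usd(injected_per_s: float, pay: float = PAY_PER_CAPTCHA_USD) -> float:
    """What the gang pays per hour if every injected image gets solved."""
    return injected_per_s * 3600 * pay
```

For instance, injecting 10 images per second under these assumptions would keep about 60 solvers busy full-time and cost the gang about $180 per hour.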

The tapping mechanism is best illustrated with the following scheme:



A tool was built specifically to upload CAPTCHA images to the Koobface C&C server and receive the responses. It is available for download here (the ZIP file contains a few test images to upload).

The tool opens an interesting "dialog" with the back-end operators, one that leads to some revealing discoveries.

At first, the responses clearly look as if they were produced by automation:



As seen in this example, the automation tried to OCR the image (which contains a very specific Russian word); it’s very unlikely that a human would have provided such an answer.

Submitting images with provocative phrases had no luck either: the remote server solved them diligently, as if it were a bot, or perhaps a smart operator instructed to reply as if he or she were a bot:



But presuming that no automation can handle really complex images (ones that are difficult even for humans to solve), let’s use the tool to submit more complex ones. Here are the results:



As the picture shows, all of Facebook’s CAPTCHAs were solved quite well.

But here are a couple of bloopers – these images were resubmitted because the original answers were totally wrong:



Let’s see how it copes with Google’s CAPTCHAs. Here, another blooper is revealed:



Wrong answers like "edtgted rghf", "edrfb dfbn", "dfgd dfg", and "asdf df" mean it was not automation. Otherwise, it would have tried to solve the images at least partially, perhaps returning nonsense for the noise detected in the picture, or some other answer suggesting a bot. Finally, an automated system’s wrong answers would have been at least consistent across several attempts.

These wrong answers simply mean someone was mashing the keyboard (check where these keys are located), giving up on those pictures as puzzles that require too much time and attention, in order to move on to easier ones.
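The two clues above (answers that are inconsistent across resubmissions, and letters clustered in one area of the keyboard) can be turned into a crude check. The Python sketch below is hypothetical: it assumes a standard QWERTY layout, approximates the row stagger with fixed offsets, and treats answers of at least six letters that stay within a four-key-wide band as likely keyboard mashing. It will misfire on some real words (for example "street"), so it only illustrates the reasoning; it is not a reliable detector.

```python
# Hypothetical heuristic, assuming a standard QWERTY layout.
QWERTY_ROWS = ("qwertyuiop", "asdfghjkl", "zxcvbnm")
ROW_OFFSETS = (0.0, 0.25, 0.75)  # approximate horizontal stagger of each row

# Horizontal position of each letter key, in key widths.
KEY_COLUMN = {
    key: col + ROW_OFFSETS[row]
    for row, keys in enumerate(QWERTY_ROWS)
    for col, key in enumerate(keys)
}

def letters(answer: str) -> list[str]:
    """Lower-cased letters of an answer that appear on the QWERTY grid."""
    return [c for c in answer.lower() if c in KEY_COLUMN]

def column_span(answer: str) -> float:
    """Distance (in key widths) between the leftmost and rightmost keys used."""
    cols = [KEY_COLUMN[c] for c in letters(answer)]
    return max(cols) - min(cols) if cols else 0.0

def looks_like_keyboard_mash(answer: str, max_span: float = 4.0, min_letters: int = 6) -> bool:
    """True if a reasonably long answer was typed entirely within a narrow band of keys."""
    return len(letters(answer)) >= min_letters and column_span(answer) <= max_span

def answers_consistent(answers: list[str]) -> bool:
    """Assumes a deterministic OCR stage: the same image should always yield the same text."""
    normalized = {" ".join(a.lower().split()) for a in answers}
    return len(normalized) == 1

def likely_human_skip(answers: list[str]) -> bool:
    """Several attempts at one image, all different, all looking like mashing."""
    return (
        len(answers) > 1
        and not answers_consistent(answers)
        and all(looks_like_keyboard_mash(a) for a in answers)
    )
```

Fed the four Google CAPTCHA answers quoted above, and assuming they were returned for resubmissions of the same image, `likely_human_skip` returns True, while a single consistent answer such as "koobface" does not trigger it.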

These results could mean that the back-end CAPTCHA server keeps a queue of CAPTCHA images to be solved, and in front of that queue there must be an automated stage that first tries to solve each CAPTCHA using optical character recognition. If the automation fails, it passes the image down into the queue, where it is distributed and picked up by an operator to be processed manually. There is obviously no way to counter such relaying, as it defeats the very purpose of CAPTCHA: to distinguish a bot from a human. Since the images are eventually processed by humans, the only remaining value of CAPTCHA protection is that it makes solving cost something, in this case 0.5 cent per image.
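The suspected back-end layout can be sketched as a two-stage pipeline. The Python model below is hypothetical: the class and method names are invented for illustration, and the `ocr` callable stands in for whatever recognition engine, if any, the operators actually run. An image the OCR stage can read is answered immediately; anything else is escalated to a queue that operators drain by hand.

```python
from collections import deque
from typing import Callable, Optional

class CaptchaRelay:
    """Hypothetical model of the suspected back end: an automated OCR stage
    in front of a first-in, first-out queue that feeds human operators."""

    def __init__(self, ocr: Callable[[bytes], Optional[str]]):
        self.ocr = ocr  # returns None when it cannot read the image
        self.human_queue: deque[tuple[int, bytes]] = deque()
        self.answers: dict[int, str] = {}
        self._next_ticket = 0

    def submit(self, image: bytes) -> int:
        """Accept an image from a bot; return a ticket to collect the answer later."""
        ticket = self._next_ticket
        self._next_ticket += 1
        text = self.ocr(image)
        if text is not None:
            self.answers[ticket] = text  # solved automatically
        else:
            self.human_queue.append((ticket, image))  # escalate to operators
        return ticket

    def operator_take(self) -> Optional[tuple[int, bytes]]:
        """An operator picks up the oldest unsolved image, if any."""
        return self.human_queue.popleft() if self.human_queue else None

    def operator_answer(self, ticket: int, text: str) -> None:
        """An operator submits an answer (correct or not) for a ticket."""
        self.answers[ticket] = text
```

This ordering would fit the observations above: easy images come back with machine-like answers, while hard ones wait for a person, who may simply give up and mash the keyboard.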

The question is whether this is expensive enough to be worthwhile at all. It probably is for kids who write malware out of curiosity or self-assertion (compare it with a simple latch on your window). But it is nothing for those who build malware for profit (as with Koobface), since they can more than likely afford 0.5 cent per image.

Taking the C&C server down? Maybe, but it would most likely just pop up somewhere else the very next day.

Another way to destroy it is to poison its traffic with fake CAPTCHAs that look exactly like the ones passed on by a genuine Koobface worm. In this case, the Koobface authors would pay for every fake CAPTCHA solved: the ones generated in the lab, not the real-world ones.

Destroying it financially could be a better option in the end.