Covert channels in spam

In the aftermath of the 9/11 terror attacks, politicians in many democratic countries were eager to establish an Internet surveillance infrastructure of unprecedented proportions. The collective hysteria and the public demand for control, control, and more control were just too useful for our governments to leave unexploited.

Everyone who hasn’t been living under a rock for the last 10+ years is painfully aware of the sad truth that we’re heading towards an Orwellian 1984-ish global society, and there’s little one can do to prevent it. In the EU and other regions, data retention laws have been enacted to facilitate traffic analysis. The official aim of such laws is to discover terrorist networks, but once the technical infrastructure is in place, it can (and will) also be used to pinpoint political opponents, regime dissidents, or any group that needs a good spanking according to your lobbyist du jour. Right now, file sharers could be a prime target for traffic analysis. Tomorrow, it could be an ethnic minority, and soon after that, it could be everybody.

Since we’re talking about traffic analysis, one particular kind of traffic has been largely ignored so far: the gargantuan volume of spam. Sending spam is illegal in most countries, but it is one of those crimes that are not seriously prosecuted, if they are prosecuted at all! In some parts of the world, it is even officially semi-tolerated: the strongest proponent of Internet surveillance, the US government under President G. W. Bush, didn’t deem it necessary to curtail spam with enough energy and dedication. On the contrary: it is a much more serious crime in the US to share a couple of music files without a license (sic!) than to operate a spam-spewing company (thanks to the CAN-SPAM Act and the direct marketers who actively promoted it).

Spam and traffic analysis

How then does spam relate to traffic analysis? And, more importantly, how can spam be used as a hidden channel for terrorists, political opponents, file sharers…?

Let’s look at spam from the perspective of a traffic analyst:

  1. it is sent from a huge population of innocuous end users’ PCs (botnets),
  2. it is sent to a huge number of recipient mail servers,
  3. it is seldom rejected upfront with a 5xx SMTP error code,
  4. it is “always on”, i.e. the stream is more or less constant in time.

How does that help defeat traffic analysis when the stream carries real information alongside the spam? Assuming that the payload of the message is disguised (encrypted low-bandwidth steganography) and can’t be directly identified as non-spam, we have the following arguments:

1. Since nearly every personal PC runs Windows nowadays, any of them can be infected by a Trojan or virus and become part of a botnet. If Alice wants to send a hidden message to Bob, she could intentionally become part of a botnet and send out 100,000+ messages, but just one of them to Bob. If asked about her contact with Bob, Alice could deny any personal contact with the excuse: “What? Who is Bob? I didn’t communicate with Bob!” When shown logs of the surveillance infrastructure, she could argue that her PC was infected (these logs would indeed show those 100,000+ outbound messages and confirm her claim), and her lawyer could claim that she can’t be held responsible, at least not for contacting Bob directly and intentionally.

2. If Alice wants to send a hidden message to Bob, but doesn’t know where Bob is located (worst case: compartmentalized sleeper cells, average case: an anonymous file-sharing network), she can use her botnet client to spread her message all over the Net. No matter where Bob is sitting, he’ll eventually receive the message as part of the batch run. That’s even better than posting a message in a newsgroup on Usenet: reading news is geeky and requires access to an NNTP server (which is easily detectable, even if done over an encrypted channel), while getting spam is so common that it can easily be overlooked or filtered out as irrelevant by traffic analysts.

3. Most mail servers are poorly configured, and don’t reject spam outright with a clear 5xx error code. Instead, they first accept the messages, and process them through SpamAssassin or similar scanners. Probable spam is then dropped into a spam folder and can be read by end users. Assuming that an analyst observes Bob’s traffic, she can’t tell which spam messages Bob actually read, and which ones he or his spam filtering software simply deleted unread. From the point of view of the analyst, ALL spam messages can potentially carry a useful payload. And this is precisely why the analyst can’t find out who Alice was among the thousands of senders.

4. One particular aspect of traffic analysis is the temporal domain. Let’s assume that Alice and Bob communicate rarely, but one day, they start to communicate frequently. To an analyst, this is a telltale sign that something is brewing: in the worst case, a sleeper cell is being activated and is now planning some bad deeds; in the harmless case, file sharers are now super-seeding the newest not-yet-released blockbuster movie among themselves. But since spam is “always on”, it is easy to slip payload messages into the stream without modifying its temporal characteristics. Or, looking at it from the other side, it is not necessary to pad the channel by constantly sending dummy messages: those are already being sent routinely as part of the spam background noise. For a low-bandwidth hidden channel (useless for the fast movie-sharing case, but ideally suited for e-mails), adding to the noise remains undetectable.
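The fourth argument can be illustrated with a toy simulation. The sketch below is only a model under assumptions of my own: spam arrivals are drawn from a Poisson process, the analyst sees arrival times but cannot tell payload from spam, and payload messages take the place of ordinary spam in the stream rather than being added on top. All function names are hypothetical, and nothing here touches a network.

```python
import random

def cover_timestamps(n, rate_per_s, seed):
    """Toy model (an assumption): n spam arrivals as a Poisson process."""
    rng = random.Random(seed)
    t, out = 0.0, []
    for _ in range(n):
        t += rng.expovariate(rate_per_s)  # exponential gap between arrivals
        out.append(t)
    return out

def embed(timestamps, payload_slots):
    """Label chosen arrivals as payload; the arrival times are untouched."""
    chosen = set(payload_slots)
    return [(t, "payload" if i in chosen else "spam")
            for i, t in enumerate(timestamps)]

def analyst_view(stream):
    """The analyst is assumed to observe timing only, not the labels."""
    return [t for t, _ in stream]

times = cover_timestamps(1000, rate_per_s=2.0, seed=42)
quiet = embed(times, [])                         # Alice and Bob are silent
busy = embed(times, [17, 230, 231, 232, 901])    # a sudden burst of payloads

# Same arrival times in both cases, so timing alone reveals nothing.
print(analyst_view(quiet) == analyst_view(busy))
```

In this model the burst of payload messages leaves the observable timing untouched, which is the whole point: no padding traffic has to be generated, because the spam stream already is the padding.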

It should now be obvious that spam is a powerful medium for a hidden channel. But there’s more: one doesn’t even have to put any information in the payload of the spam message, as this could be detected. Indeed, the messages of a batch run are usually identical for economic reasons, and any hand-crafted “spamogram” would stick out like a sore thumb, no matter how elegant the forgery is. Here again, the hidden channel can simply be the temporal domain.

How? Let’s assume that Alice controls a botnet. She could modulate the sending towards Bob’s mail server(s) by carefully selecting the sending zombies. If Bob received spam from the IP addresses IP1, IP2, IP3, IP4, IP5 (in that order, in a special rhythm), no matter what’s in the payload of the spam messages, this alone could be a signal in and of itself. Think of a kind of distributed port knocking. Using this technique systematically and throwing in cryptographic techniques to hide any periodic patterns, one can establish an extremely effective covert channel (of low bandwidth) that no traffic analyst would be able to reliably detect. While acquiring and controlling a botnet is not for the faint of heart, it is possible. Furthermore, the Bad Guys(tm) do have more than enough criminal energy and funding to rent a botnet from your friendly Russian Business Network.
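To give an idea of how the order of sending zombies alone could carry a signal, here is a minimal Python sketch. It is one possible encoding, not a worked-out scheme: a number is mapped to an ordering of the zombie list via the factorial number system, and a keyed reshuffle of the list per burst (HMAC-SHA256 with a shared key, my choice for illustration) stands in for the “cryptographic techniques” mentioned above. The function names, the key, and the addresses (taken from the 192.0.2.0/24 documentation range) are all hypothetical, and the sending rhythm is not modeled at all.

```python
import hashlib
import hmac
from math import factorial

def keyed_pool(zombies, key, burst_no):
    """Reorder the zombie list per burst so that repeated values do not
    produce repeated orderings (illustrative, not a vetted construction)."""
    def tag(ip):
        msg = f"{burst_no}:{ip}".encode()
        return hmac.new(key, msg, hashlib.sha256).digest()
    return sorted(zombies, key=tag)

def encode(value, pool):
    """Map an integer in [0, n!) to an ordering of the n zombies."""
    n = len(pool)
    if not 0 <= value < factorial(n):
        raise ValueError("value does not fit in one burst")
    remaining, order = list(pool), []
    for i in range(n, 0, -1):
        idx, value = divmod(value, factorial(i - 1))
        order.append(remaining.pop(idx))
    return order

def decode(order, pool):
    """Recover the integer from the order in which the zombies knocked."""
    remaining, value = list(pool), 0
    for i, ip in zip(range(len(pool), 0, -1), order):
        idx = remaining.index(ip)
        value += idx * factorial(i - 1)
        remaining.pop(idx)
    return value

zombies = [f"192.0.2.{i}" for i in range(1, 6)]   # documentation addresses
key = b"shared secret (illustrative)"

pool = keyed_pool(zombies, key, burst_no=7)   # both sides derive the same pool
order = encode(42, pool)                      # Alice picks the sending order
print(decode(order, pool))                    # Bob reads the value back
```

With five zombies there are 5! = 120 possible orderings, so one burst carries a little under 7 bits: low bandwidth indeed, but that is all such a channel needs.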

Conclusion

Instead of embarking on a detailed analysis of the technical details of those hidden channels with or without payloads and using or not using the temporal domain (like, say, performing a mathematical analysis of their capacity, resilience, etc.) or programming a prototype proof of concept, let me return to one political aspect of those Orwellian surveillance laws and infrastructures, which I deem more important than the technical issues (however interesting those are):

If politicians really cared about security, they’d combat spam with the same eagerness they show against illegal file sharers… if not even more so. Spam is not only a huge drag on the productivity of zillions of computer users, it can also be used to hide suspicious activity, such as terrorists communicating securely and undetected. Spam hurts everyone, individuals and corporations alike, while copyright infringement hurts only a relatively small (though powerful) content industry. Unfortunately, the link between spam and terrorism hasn’t been highlighted enough in the past, though it deserves a lot more attention.

On second thought, perhaps it’s better this way: should government repression and meddling on the Internet ever become unbearable, we’d be thankful for spam as the last refuge of untrackable anonymous communications.