The Sunday Times Mirror

Recently we launched an investigation into spam email, and I promised that we would talk about tracing emails back to their senders.

Well, every email message consists of two parts, the body and the header. The header can be thought of as the envelope of the message, containing the address of the sender, the recipient, the subject and other information. The body contains the actual text and the attachments. Some header information usually displayed by your email programme includes:

From: - The sender's name and email address.
To: - The recipient's name and email address.
Date: - The date when the message was sent.
Subject: - The subject line.

The actual delivery of emails does not depend on any of these headers; they are just a convenience. Usually, the ‘From’ line, for example, will be set to the sender's address. This lets you know who the message is from and lets you reply easily. Spammers want to make sure you cannot reply easily, and certainly don't want you to know who they are. So they insert false email addresses in the ‘From’ lines of their junk messages.

So the ‘From’ line is useless if we want to determine the real source of an email. Fortunately, we need not rely on it. The headers of every email message also contain ‘Received’ lines. Email programmes do not usually display these by default, but they can be very helpful in tracing spam.

Just like a postal letter will go through a number of post offices on its way from sender to recipient, an email message is processed and forwarded by several mail servers.

Imagine every post office putting a special stamp on each letter. The stamp would say exactly when the letter was received, where it came from and where the post office forwarded it to. If you got the letter, you could determine the exact path it had taken. This is exactly what happens with email.

As a mail server processes a message, it adds a special line, the ‘Received’ line, to the message's header. Most interestingly, the ‘Received’ line contains:

The server name and IP address of the machine the server received the message from.
The name of the mail server itself.

The ‘Received’ line is always inserted at the top of the message headers. If we want to reconstruct an e-mail's journey from sender to recipient we also start at the topmost ‘Received’ line and work our way down until we have arrived at the last one, which is where the email originated.
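To make the top-to-bottom order concrete, here is a minimal sketch in Python that lists the ‘Received’ lines of a message in the order they appear. The sample message, its host names and its addresses are invented for illustration, and the helper name received_lines is my own; only the standard library's email parser is used.

```python
from email import message_from_string
from typing import List

# Hypothetical message: host names and addresses are made up for illustration.
RAW_MESSAGE = (
    "Received: from relay.example.net (relay.example.net [198.51.100.7])\n"
    "\tby mx.example.org with SMTP; Sun, 16 Nov 2003 19:50:37 -0000\n"
    "Received: from sender.example.com (sender.example.com [192.0.2.10])\n"
    "\tby relay.example.net with SMTP; Sun, 16 Nov 2003 19:50:30 -0000\n"
    "From: someone@example.com\n"
    "Subject: Hello\n"
    "\n"
    "Body text.\n"
)


def received_lines(raw_message: str) -> List[str]:
    """Return the 'Received' headers in the order they appear (newest first)."""
    msg = message_from_string(raw_message)
    # get_all() returns every header with this name, in order of appearance.
    headers = msg.get_all("Received") or []
    # Collapse folded (multi-line) headers into single lines.
    return [" ".join(h.split()) for h in headers]


if __name__ == "__main__":
    for number, line in enumerate(received_lines(RAW_MESSAGE), start=1):
        print(number, line)
```

The first entry in the returned list is the stamp added by the last server to handle the message, and the final entry is the one closest to the sender.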

Spammers know that we will apply exactly this procedure to uncover their whereabouts. So, to fool us, they may insert forged ‘Received’ lines that point to somebody else sending the message.

Since every mail server will always put its ‘Received’ line at the top, the spammer's forged headers can only be at the bottom of the ‘Received’ line chain. This is why we start our analysis at the top and don't just derive the point where an email originated from the first ‘Received’ line (at the bottom).

The forged ‘Received’ lines inserted by spammers to fool us will look like all the other ‘Received’ lines (unless they make an obvious mistake, of course). Looking at a single line on its own, you can't tell a forged ‘Received’ line from a genuine one. This is where one distinct feature of ‘Received’ lines comes into play. As we've noted above, every server will not only note who it is but also where it got the message from (in IP address form).

We simply compare who a server claims to be with what the server one notch up in the chain says it really is. If the two don't match, the earlier ‘Received’ line has been forged. In that case, the true origin of the email is the address that the server immediately above the forged line recorded as the machine it got the message from.
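The comparison can be sketched in a few lines of Python. This is a simplified illustration, not a complete tool: it assumes the ‘Received’ lines have already been parsed into (by, from_ip) pairs, that the topmost line comes from a server we trust, and that each lower server identifies itself by IP address so the two values can be compared directly. In real headers the server usually gives a host name, which would first have to be looked up. The names Hop and find_origin and all addresses are invented for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hop:
    by: str       # who the server writing this line claims to be
    from_ip: str  # address it recorded for the machine that handed it the message


def find_origin(hops: List[Hop]) -> str:
    """Walk the chain from the top (newest, trusted) line downwards.

    Returns the last sender address that is vouched for by an unbroken chain.
    """
    if not hops:
        raise ValueError("no Received lines to analyse")
    origin = hops[0].from_ip
    for newer, older in zip(hops, hops[1:]):
        # The older line must have been written by the machine the newer
        # server says it received the message from; otherwise it is forged.
        if older.by != newer.from_ip:
            break
        origin = older.from_ip
    return origin


# Hypothetical chains using documentation-only addresses.
genuine = [Hop("mx.example.org", "198.51.100.7"), Hop("198.51.100.7", "192.0.2.10")]
forged = [Hop("mx.example.org", "198.51.100.7"), Hop("203.0.113.99", "192.0.2.10")]
```

With the genuine chain the function follows both lines down to 192.0.2.10. With the forged chain it stops at the mismatch and reports 198.51.100.7, the address recorded by the trusted server.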

Now that we know how emails work in theory, let's see how analysing a junk email to identify its origin works in real life.

I've just received an exemplary piece of spam that we can use as an exercise. Here are the header lines:

Received: from unknown (HELO 38.118.132.100) (62.105.106.207) by mail1.infinology.com with SMTP; 16 Nov 2003 19:50:37 -0000
Received: from [235.16.47.37] by 38.118.132.100 id <5416176-86323>; Sun, 16 Nov 2003 13:38:22 -0600
Message-ID: <o7-89089$t--2-370--h6b1@y07l72.olpvl>
From: "Reinaldo Gilliam"<27knxeppzk@yahoo.com>
Reply-To: "Reinaldo Gilliam" <27knxeppzk@yahoo.com>
To: ladedu@ladedu.com
Subject: Category A Get the meds u need lgvkalfnqnh bbk
Date: Sun, 16 Nov 2003 13:38:22 GMT
X-Mailer: Internet Mail Service (5.5.2650.21)
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="9B_9.._C_2EA.0DD_23"
X-Priority: 3
X-MSMail-Priority: Normal

First, take a look at the - forged - ‘From’ line.

The spammer wants to make it look as if the message was sent from a Yahoo! Mail account. Together with the ‘Reply-To’ line, this ‘From’ address is aimed at directing all bounced messages and angry replies to a non-existent Yahoo! Mail account.

Next, the ‘Subject’ is a curious agglomeration of random characters. It is barely legible and obviously designed to fool spam filters (every message gets a slightly different set of random characters), but it is also quite skilfully crafted to get the message across in spite of this.

Finally, the ‘Received’ lines. Let's begin with the oldest, the one at the bottom: Received: from [235.16.47.37] by 38.118.132.100 id <5416176-86323>; Sun, 16 Nov 2003 13:38:22 -0600.

There are no host names in it, but two IP addresses: 38.118.132.100 claims to have received the message from 235.16.47.37. If this is correct, 235.16.47.37 is where the email originated, and we'd find out which ISP this IP address belongs to, then send an abuse report to them.

Let's see if the next (and in this case last) server in the chain confirms the first ‘Received’ line's claims: Received: from unknown (HELO 38.118.132.100) (62.105.106.207) by mail1.infinology.com with SMTP; 16 Nov 2003 19:50:37 -0000. Since mail1.infinology.com is the last server in the chain and indeed ‘my’ server, I know that I can trust it. It has received the message from an ‘unknown’ host that claimed to have the IP address 38.118.132.100 (using the SMTP HELO command). So far, this is in line with what the previous ‘Received’ line said.

Now let's see where my mail server did get the message from. To find out, we take a look at the IP address in brackets immediately before by mail1.infinology.com. This is the IP address the connection was established from, and it is not 38.118.132.100. No, 62.105.106.207 is where this piece of junk mail was sent from.
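For readers who like to automate this step, here is a small Python sketch that pulls both values out of the trusted ‘Received’ line quoted above: the name announced with HELO and the bracketed address the connection really came from. The two patterns are written for the layout of this particular line only, and that is an assumption; other mail servers format their ‘Received’ lines differently. The function name claimed_and_actual is my own.

```python
import re
from typing import Optional, Tuple

# The trusted line from the sample message above.
LINE = (
    "from unknown (HELO 38.118.132.100) (62.105.106.207) "
    "by mail1.infinology.com with SMTP; 16 Nov 2003 19:50:37 -0000"
)

# What the sending machine claimed to be in the SMTP HELO command.
HELO_RE = re.compile(r"\(HELO\s+([^)\s]+)\)")
# The address in brackets immediately before "by": where the connection came from.
PEER_RE = re.compile(r"\((\d{1,3}(?:\.\d{1,3}){3})\)\s+by\s")


def claimed_and_actual(line: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (HELO claim, connecting IP address); None where a part is missing."""
    helo = HELO_RE.search(line)
    peer = PEER_RE.search(line)
    return (
        helo.group(1) if helo else None,
        peer.group(1) if peer else None,
    )


if __name__ == "__main__":
    claimed, actual = claimed_and_actual(LINE)
    print("claimed:", claimed)
    print("actual: ", actual)
    if claimed != actual:
        print("The HELO claim does not match the connecting address.")
```

For the sample line the claimed value is 38.118.132.100 and the actual connecting address is 62.105.106.207, which is the mismatch described above.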

With this information, you can now identify the spammer's ISP and report the unsolicited email to them so they can kick the spammer off the net.