Spam-advertised Domain Names and the SURBL

By John Graham-Cumming

It's a pretty simple and obvious idea: most spam contains a URL leading you to the spammer's web site. If you could just build a list of those “spam advertised domains” then it would be trivial to build a spam filter. If a message contains a spammy URL then it's spam; if it doesn't, it is not.

The SURBL service (http://www.surbl.org/) does just that. And it does it in a way that makes it trivial to add to a spam filter; all that's needed is a DNS query.

SURBL stands for Spam URI Realtime BlockLists.

DNS is the system used to translate friendly domain names (such as www.clearswift.com) into IP addresses that a web browser can use to retrieve web pages. For example, when you type www.clearswift.com your web browser does a DNS query and is told to go to IP address 213.146.158.137.

The SURBL service uses the same mechanism to translate the name of a web site into either an error (meaning that the web site is not run by a spammer) or a valid IP address (meaning that the web site is used by a spammer). By checking web sites present in email using the SURBL it's possible to rapidly decide whether a message is likely to be spam, or not.

Anecdotal evidence is that SURBL catches around 80% of spam with a false positive rate of less than 0.1% (that is, it will incorrectly identify fewer than 1 in 1,000 genuine messages as spam). The SURBL is one of a number of layered spam fighting techniques that Clearswift uses in its SpamLogic™ technology.

The SURBL doesn't catch 100% of spam for a few reasons:

1. Not all spam contains a URL
For example, 419/Nigerian scams offering to give you 10% of some huge amount of money if you'll just help move it out of some foreign country usually contain phone numbers or email addresses for contact. It's also quite common for Russian language spam to contain full contact details for the spammer, including address and phone number, yet no URL.

2. Some spammers actively try to prevent a filter from realizing there's a URL present
They use, for example, “the tURLing test” trick in The Spammers' Compendium (http://www.jgc.org/tsc/) to obfuscate the URL. Instead of putting www.spammysite.com in their spam they'll give instructions to the user: “Type www.spammy followed by site.com in your browser's address bar”. Since not even the best spam filters can read English, the name of the spammer's web site can't be extracted and checked against the SURBL. Naturally spammers would prefer not to do this as it reduces the likelihood that you'll visit their site.

3. The SURBL itself is not updated in real time
To prevent false positives, SURBL entries are made with great care and examined by a person before a web site is blacklisted.


Where SURBL data comes from
The domain names in the SURBL come from six distinct sources (you can see full information about these sources here: http://www.surbl.org/lists.html):

1. SpamCop (http://www.spamcop.net/): domains reported to SpamCop from within spam messages. About 1.5% of SURBL entries come via SpamCop and are referred to as the [sc] data in the SURBL documentation.

2. SpamAssassin rule sets: a number of SpamAssassin rule sets have been turned into SURBL entries with additional domains from SARE (see http://www.rulesemporium.com/) to form the [ws] data set which accounts for about 36% of SURBL entries.

3. Phishing lists from MailSecurity (http://www.mailsecurity.net.au/) and MailPolice (http://rhs.mailpolice.com/#rhsfraud). This [ph] list accounts for about 1.5% of SURBL entries.

4. OutBlaze (http://www.outblaze.com/) data (the [ob] list), which makes up around 61% of the SURBL data, is collected from a large set of spam traps run by OutBlaze.

5. AbuseButler (http://www.abusebutler.com/) provides the [ab] list of domains that makes up about 0.5% of the SURBL list.

6. The [jp] list comes from jwSpamSpy (http://www.joewein.de/sw/jwSpamSpy/) and Prolocation (http://www.prolocation.net/) and makes up 55% of the domains in the SURBL list.

(Note that the percentages sum to more than 100% because some domains appear on more than one list.) There's also a [multi] list which combines all of the above into a single data source.

Using SURBL
To use the SURBL it's simply a matter of doing the following:

1. Spot a domain name in an email message (e.g. www.jgc.org). This might be simple if the spammer has just included the domain name, or might require a smart filter to undo some level of obfuscation to get to the actual name.

2. Strip the domain name down to the top level part (.com) and the first name before that (so www.clearswift.com becomes clearswift.com). This step needs to take into account the rules for domain names (e.g. .co.uk is the top level part for a UK company, or .co.tw for a company in Taiwan).

3. Build the DNS query name by appending the appropriate SURBL list name to the stripped domain name (to check clearswift.com on the [multi] list the DNS query would be for clearswift.com.multi.surbl.org). (See below for a list of possible suffixes if you want to check a name on a specific SURBL list.)

4. Perform the DNS query. If it comes back with an IP address in the form 127.0.0.X then the domain name is bad and the message is spam. If the DNS query fails then the domain name is not part of the SURBL and hasn't been seen in spam (perhaps, yet!).
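
Steps 2 to 4 above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular filter does it: the function names are invented for the example, the set of two-part suffixes is a deliberately tiny stand-in for a complete list of domain name rules, and it assumes the host name has already been extracted from the message (step 1) and that the service answers as described in this article.

```python
import socket

# Hypothetical, deliberately tiny list of two-label "top level parts".
# A real filter would need a complete list of such suffixes.
TWO_LABEL_SUFFIXES = {"co.uk", "co.tw"}


def registered_domain(hostname):
    """Step 2: reduce www.example.co.uk to example.co.uk."""
    labels = hostname.lower().rstrip(".").split(".")
    # Keep three labels when the last two form a known two-label suffix.
    keep = 3 if ".".join(labels[-2:]) in TWO_LABEL_SUFFIXES else 2
    return ".".join(labels[-keep:])


def surbl_query_name(hostname, zone="multi.surbl.org"):
    """Step 3: append the SURBL list name to the stripped domain."""
    return registered_domain(hostname) + "." + zone


def surbl_lookup(hostname, resolve=socket.gethostbyname):
    """Step 4: return the 127.0.0.X answer, or None if the query fails."""
    try:
        answer = resolve(surbl_query_name(hostname))
    except OSError:
        # socket.gaierror is a subclass of OSError: the name did not resolve.
        return None
    return answer if answer.startswith("127.0.0.") else None
```

Calling surbl_lookup("www.clearswift.com") performs a real DNS query. The resolve parameter exists only so that the lookup can be replaced with a stub when testing without network access.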

You can test the SURBL yourself using just the ping program, which is present on both Unix and Windows (on Unix bring up a shell; on Windows run cmd.exe). For example, pinging clearswift.com.multi.surbl.org tells me:

$ ping clearswift.com.multi.surbl.org
ping: unknown host clearswift.com.multi.surbl.org

Phew! clearswift.com is not a spammy web site. Here's what happens for a spammy site:

$ ping fakerolex.biz.multi.surbl.org
PING fakerolex.biz.multi.surbl.org (127.0.0.84)
64 bytes from 127.0.0.84: ttl=64 time=0.068 ms

fakerolex.biz is a spammy web site. You can see that the SURBL DNS query returned the IP address 127.0.0.84. Decoding the number 84 into binary digits reveals that that domain appears on the [jp], [ob] and [ws] lists (84 = 64 + 16 + 4). Here's the full list of lists, bits in the [multi] list and the domain suffix if you use an individual list:

Bit   List                         Domain Suffix
2     [sc] SpamCop                 sc.surbl.org
4     [ws] SpamAssassin            ws.surbl.org
8     [ph] Phishing Sites          ph.surbl.org
16    [ob] OutBlaze                ob.surbl.org
32    [ab] AbuseButler             ab.surbl.org
64    [jp] jwSpamSpy/Prolocation   jp.surbl.org
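
The decoding of the last number in the answer can be sketched as a simple bit test. The bit values and list names below are taken from the table above; the function name decode_multi is invented for this example.

```python
# Bit values and list names as given in the table above.
SURBL_BITS = {2: "sc", 4: "ws", 8: "ph", 16: "ob", 32: "ab", 64: "jp"}


def decode_multi(address):
    """Return the names of the lists encoded in a 127.0.0.X answer."""
    last_octet = int(address.split(".")[-1])
    # A list is present when its bit is set in the last octet.
    return [name for bit, name in SURBL_BITS.items() if last_octet & bit]


print(decode_multi("127.0.0.84"))  # ['ws', 'ob', 'jp']
```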

Checking an individual list is easy. For example, to see if fakerolex.biz appears on the [ws] list do another ping:

$ ping fakerolex.biz.ws.surbl.org
PING fakerolex.biz.ws.surbl.org (127.0.0.2)
64 bytes from 127.0.0.2: ttl=64 time=0.066 ms

Yes, it does.

The SURBL can also handle spammer web sites that are specified using just an IP address. Spammers sometimes use IP addresses instead of domain names to make the spammy sites harder to spot. The IP address has to be reversed before it is passed in a DNS query in the same way as a regular domain name.

For example, to check whether 136.31.160.202 is a spammy web site, the IP address is first reversed to 202.160.31.136 and then a SURBL list name is appended:

$ ping 202.160.31.136.multi.surbl.org
ping: unknown host 202.160.31.136.multi.surbl.org

Apparently not.
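
Building the query name for an IP address can be sketched as follows. The function name reversed_ip_query is invented for this example, and the sketch only handles plain dotted IPv4 addresses like the one shown above.

```python
def reversed_ip_query(ip_address, zone="multi.surbl.org"):
    """Reverse the four octets of an IPv4 address and append the list name."""
    octets = ip_address.split(".")
    if len(octets) != 4 or not all(o.isdigit() and int(o) <= 255 for o in octets):
        raise ValueError("not a dotted-quad IPv4 address: " + ip_address)
    return ".".join(reversed(octets)) + "." + zone


print(reversed_ip_query("136.31.160.202"))  # 202.160.31.136.multi.surbl.org
```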

How spammers get around the SURBL
There are four popular ways that spammers try to get around the SURBL:

1. Register, use and discard a web site fast. By registering a name, doing a spam run and then ditching the domain name, the spammer can make the web site come and go before the SURBL catches up.

2. Obfuscating the URL. To try to prevent a spam filter from seeing the URL and being able to check the SURBL the spammer can deploy a number of tricks. See, for example “the tURLing test”, “Enigma” and “Ultra” in The Spammers' Compendium (http://www.jgc.org/tsc/).

3. Use a redirector (for example, http://tinyurl.com/). Use of an “innocent” redirector means that the mail parser might miss the real URL (or be unable to determine it), thus hiding the spammy site from a SURBL check.

4. No URL at all! Some spam has an email address or a phone number for contact.

Conclusion
The SURBL is a very valuable resource for any anti-spam tool. It's accurate, easy to integrate with and deserves support because it's offered as a labour of love by its maintainer. All of the effort in using the SURBL is concentrated on email parsing and spammer trick avoidance; once you've extracted the URL it's easy to SURBL it and find out whether it's a spammer web site or not.

--------------------------------------------------------------------------------------------------------------
About the Author
John Graham-Cumming is an independent consultant specializing in email classification. He publishes a twice-monthly newsletter highlighting the latest in spam and anti-spam techniques. He can be reached at http://www.jgc.org/.

This newsletter is published quarterly and is edited by Isabelle Duarte, Director of Communications at Clearswift.

www.clearswift.com
© 2006 Clearswift Limited. All rights reserved.