Spam-advertised Domain Names and the SURBL
By John Graham-Cumming
It's a pretty simple and obvious idea: most spam contains a
URL leading you to the spammer's web site. If you could just
build a list of those “spam-advertised domains” then it would
be trivial to build a spam filter. If a message contains a
spammy URL then it's spam; if it doesn't, then it is not.
The SURBL
service (http://www.surbl.org/)
does just that. And it does it in a way that makes it trivial
to add to a spam filter; all that's needed is a DNS query.
SURBL stands for Spam URI Realtime BlockLists.
DNS is
the system used to translate friendly domain names (such as
www.clearswift.com)
into IP addresses that a web browser can use to retrieve web
pages. For example, when you type www.clearswift.com your
web browser does a DNS query and is told to go to IP address
213.146.158.137.
The SURBL
service uses the same mechanism to translate the name of a
web site into either an error (meaning that the web site is
not run by a spammer) or a valid IP address (meaning that
the web site is used by a spammer). By checking web sites
present in email using the SURBL it's possible to rapidly
decide whether a message is likely to be spam, or not.
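As a minimal sketch of this mechanism, the Python below treats a failed DNS lookup as "not listed" and a 127.0.0.X answer (the form the article describes later) as "listed". The function names and the injectable resolve parameter are assumptions made for illustration; they are not part of the SURBL service.

```python
import socket


def surbl_lookup(query_name, resolve=socket.gethostbyname):
    """Return the IPv4 address for query_name, or None if the lookup fails.

    `resolve` is injectable so the logic can be exercised without network
    access; by default it is the standard library resolver.
    """
    try:
        return resolve(query_name)
    except OSError:  # socket.gaierror (name not found) is a subclass of OSError
        return None


def is_listed(query_name, resolve=socket.gethostbyname):
    """True if the lookup succeeds with an address of the form 127.0.0.X."""
    address = surbl_lookup(query_name, resolve)
    return address is not None and address.startswith("127.0.0.")
```

Calling is_listed("clearswift.com.multi.surbl.org") would perform a real DNS query; passing a stand-in resolver keeps the check offline.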
Anecdotal evidence is that SURBL catches around 80% of spam
with a false positive rate of less than 0.1% (that is, it will
incorrectly identify fewer than 1 in 1,000 genuine messages as
spam). The SURBL is one of a number of layered spam fighting
techniques that Clearswift uses in its SpamLogic™ technology.
The SURBL
doesn't catch 100% of spam for a few reasons:
1. Not all spam contains a URL
For example, 419/Nigerian scams offering to give you 10% of
some huge amount of money if you'll just help move it out
of some foreign country usually contain phone numbers or email
addresses for contact. It's also quite common for Russian-language
spam to contain full contact details for the spammer,
including address and phone number, yet no URL.
2. Some spammers actively try to prevent a filter from realizing
there's a URL present
They use, for example, “the tURLing test” trick described
in The Spammers' Compendium (http://www.jgc.org/tsc/)
to obfuscate the URL. Instead of putting www.spammysite.com
in their spam they'll give instructions to the user: “Type
www.spammy followed by site.com in your browser's address
bar”. Since not even the best spam filters can read
English, the name of the spammer's web site can't be checked
against the SURBL. Naturally spammers would prefer not to
do this as it reduces the likelihood that you'll visit their
site.
3. The SURBL itself is not updated in real time
To prevent false positives, SURBL entries are made with great
care and examined by a person before a web site is blacklisted.
Where SURBL data comes from
The domain names in the SURBL come from six distinct sources
(you can see full information about these sources here: http://www.surbl.org/lists.html):
1. SpamCop
(http://www.spamcop.net/):
domains reported to SpamCop from within spam messages. About
1.5% of SURBL entries come via SpamCop and are referred to
as the [sc] data in the SURBL documentation.
2. SpamAssassin
rule sets: a number of SpamAssassin rule sets have been turned
into SURBL entries with additional domains from SARE (see
http://www.rulesemporium.com/)
to form the [ws] data set which accounts for about 36% of
SURBL entries.
3. Phishing
lists from MailSecurity (http://www.mailsecurity.net.au/)
and MailPolice (http://rhs.mailpolice.com/#rhsfraud).
This [ph] list accounts for about 1.5% of SURBL entries.
4. OutBlaze
(http://www.outblaze.com/)
data (the [ob] list), which makes up around 61% of the SURBL
data, is collected from a large set of spam traps run by OutBlaze.
5. AbuseButler
(http://www.abusebutler.com/)
provides the [ab] list of domains that makes up about 0.5%
of the SURBL list.
6. The
[jp] list comes from jwSpamSpy (http://www.joewein.de/sw/jwSpamSpy/)
and Prolocation (http://www.prolocation.net/)
and makes up 55% of the domains in the SURBL list.
(Note that the percentages sum to more than 100%
(1.5 + 36 + 1.5 + 61 + 0.5 + 55 = 155.5%) because some domains
appear on more than one list.) There's also a [multi] list
which combines all of the above into a single data source.
Using SURBL
To use the SURBL it's simply a matter of doing the following:
1. Spot a domain name in an email message (e.g. www.jgc.org).
This might be simple if the spammer has just included the
domain name, or might require a smart filter to undo some
level of obfuscation to get to the actual name.
2. Strip the domain name down to the top level part (.com) and
the first name before that (e.g. www.clearswift.com becomes
clearswift.com). This step needs to take into account the rules
for domain names (e.g. .co.uk is the top level part for a UK
company, or .co.tw for a company in Taiwan).
3. Build the DNS query name by appending the appropriate SURBL
list name (to check clearswift.com on the [multi] list the DNS
query would be for clearswift.com.multi.surbl.org). (See below
for a list of possible suffixes if you want to check a name on
a specific SURBL list.)
4. Perform the DNS query. If it comes back with an IP address in
the form 127.0.0.X then the domain name is bad and the message
is likely spam. If the DNS query fails then the domain name is
not part of the SURBL and hasn't been seen in spam (perhaps, yet!).
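The first three steps above can be sketched in Python as follows. This is a simplified illustration, not a production parser: the regular expression only finds plainly written hostnames (it undoes no obfuscation), and TWO_LEVEL_SUFFIXES holds just the two examples mentioned in step 2, whereas a real filter would need a complete list of such suffixes. All names in the block are hypothetical.

```python
import re

# Illustrative only: just the two suffixes mentioned in the text.
TWO_LEVEL_SUFFIXES = {"co.uk", "co.tw"}

# Step 1 (simplified): dotted names ending in an alphabetic label.
HOST_PATTERN = re.compile(r"\b(?:[a-z0-9-]+\.)+[a-z]{2,}\b", re.IGNORECASE)


def find_hostnames(text):
    """Return the plainly written hostnames found in a message body."""
    return [match.group(0).lower() for match in HOST_PATTERN.finditer(text)]


def registered_domain(hostname):
    """Step 2: keep the top level part plus the first name before it."""
    labels = hostname.lower().rstrip(".").split(".")
    if len(labels) >= 3 and ".".join(labels[-2:]) in TWO_LEVEL_SUFFIXES:
        return ".".join(labels[-3:])
    return ".".join(labels[-2:])


def surbl_query_name(hostname, list_suffix="multi.surbl.org"):
    """Step 3: append the SURBL list name to the stripped domain."""
    return registered_domain(hostname) + "." + list_suffix
```

For instance, surbl_query_name("www.clearswift.com") yields clearswift.com.multi.surbl.org, the name used in the ping examples in this article. Step 4 is then an ordinary DNS lookup of that name.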
You can test the SURBL yourself just using the ping program,
which is present on both Unix and Windows (on Unix bring up a
shell; on Windows run cmd.exe). For example, pinging
clearswift.com.multi.surbl.org tells me:
$ ping clearswift.com.multi.surbl.org
ping: unknown host clearswift.com.multi.surbl.org
Phew! clearswift.com is not a spammy web site. Here's what happens
for a spammy site:
$ ping fakerolex.biz.multi.surbl.org
PING fakerolex.biz.multi.surbl.org (127.0.0.84)
64 bytes from 127.0.0.84: ttl=64 time=0.068 ms
fakerolex.biz
is a spammy web site. You can see that the SURBL DNS query
returned the IP address 127.0.0.84. Decoding the number 84
into binary digits reveals that that domain appears on the
[jp], [ob] and [ws] lists (84 = 64 + 16 + 4). Here's the full
list of lists, bits in the [multi] list and the domain suffix
if you use an individual list:
Bit  List                         Domain Suffix
2    [sc] SpamCop                 sc.surbl.org
4    [ws] SpamAssassin            ws.surbl.org
8    [ph] Phishing Sites          ph.surbl.org
16   [ob] OutBlaze                ob.surbl.org
32   [ab] AbuseButler             ab.surbl.org
64   [jp] jwSpamSpy/Prolocation   jp.surbl.org
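The decoding of the last octet can be sketched in Python using the bit values from the table above. The function name decode_multi is hypothetical, and the sketch assumes the address came from a [multi] query.

```python
# Bit values of the [multi] list, as given in the table above.
SURBL_BITS = {2: "sc", 4: "ws", 8: "ph", 16: "ob", 32: "ab", 64: "jp"}


def decode_multi(address):
    """Return the list names encoded in a 127.0.0.X [multi] answer."""
    octets = address.split(".")
    if len(octets) != 4 or octets[:3] != ["127", "0", "0"]:
        raise ValueError("not a 127.0.0.X address: " + address)
    value = int(octets[3])
    # Test each bit in ascending order and keep the lists whose bit is set.
    return [name for bit, name in sorted(SURBL_BITS.items()) if value & bit]
```

With the fakerolex.biz answer, decode_multi("127.0.0.84") gives ['ws', 'ob', 'jp'], matching 84 = 64 + 16 + 4. The decoding applies only to [multi] answers; an individual list answers with its own fixed address.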
Checking
an individual list is easy. For example, to see if fakerolex.biz
appears on the [ws] list do another ping:
$ ping fakerolex.biz.ws.surbl.org
PING fakerolex.biz.ws.surbl.org (127.0.0.2)
64 bytes from 127.0.0.2: ttl=64 time=0.066 ms
Yes, it does.
The SURBL can also handle spammer web sites that are specified
using just an IP address. The octets of the IP address are
reversed and the result is then used in a DNS query in the same
way as a regular domain name. Spammers sometimes use IP addresses
instead of domain names to make the spammy sites harder to spot.
For example, to check whether 136.31.160.202 is a spammy web site
the IP address is first reversed to 202.160.31.136 and then the
SURBL list name is appended:
$ ping 202.160.31.136.multi.surbl.org
ping: unknown host 202.160.31.136.multi.surbl.org
Apparently not.
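The reversal can be sketched in a few lines of Python. The function name is hypothetical; the standard library ipaddress module is used only to reject malformed input before the octets are reversed.

```python
import ipaddress


def surbl_ip_query_name(ip_text, list_suffix="multi.surbl.org"):
    """Reverse the octets of an IPv4 address and append the SURBL list name."""
    # IPv4Address raises a ValueError subclass if ip_text is malformed.
    address = ipaddress.IPv4Address(ip_text)
    reversed_octets = ".".join(reversed(str(address).split(".")))
    return reversed_octets + "." + list_suffix
```

For the address in the example, surbl_ip_query_name("136.31.160.202") returns 202.160.31.136.multi.surbl.org, the name pinged above.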
How spammers get around the SURBL
There are four popular ways that spammers try to get around
the SURBL:
1. Register,
use and discard a web site fast. By registering a name, doing
a spam run and then ditching the domain name the web site
can come and go before the SURBL catches up.
2. Obfuscate the URL. To try to prevent a spam filter from seeing
the URL and checking it against the SURBL, the spammer can deploy
a number of tricks. See, for example, “the tURLing test”,
“Enigma” and “Ultra” in The Spammers'
Compendium (http://www.jgc.org/tsc/).
3. Use
a redirector (for example, http://tinyurl.com/). Use of an
“innocent” redirector means that the mail parser
might miss the real URL (or be unable to determine it), thus
hiding the spammy site from a SURBL check.
4. Use no URL at all! Some spam has an email address or a phone
number for contact.
Conclusion
The SURBL is a very valuable resource for any anti-spam tool.
It's accurate, easy to integrate with and deserves support
because it's offered as a labour of love by its maintainer.
All of the effort in using the SURBL is concentrated on email
parsing and spammer trick avoidance; once you've extracted
the URL it's easy to look it up in the SURBL and find out
whether it's a spammer web site or not.
--------------------------------------------------------------------------------------------------------------
About the Author
John Graham-Cumming is an independent consultant specializing
in email classification. He publishes a twice monthly newsletter
highlighting the latest in spam and anti-spam techniques.
He can be reached at http://www.jgc.org/.