PC Knowledge Base - Bayesian Filtering


Bayesian filtering is fairly complex to implement, but easy to understand in principle. It's effectively a self-learning filtering system. It uses probability calculations to decide whether the e-mail being scanned is spam or not, and it bases the final decision on the contents of all the legitimate e-mail and all of the spam you've received up to that point. This makes it an extremely effective filtering system, because it constantly learns from and adapts to the e-mail you receive.

As an example, imagine you're a keen tropical fish keeper. You've signed up for tropical fish information newsletters, which you receive weekly. When you install your Bayesian filter, you show it a set of spam and a set of ham so it can build its initial database. The set of ham includes a number of fish newsletters, each containing some information on a medicine called Methylene Blue.

The following week, you receive another newsletter. When the Bayesian filter scans it, it notices that Methylene Blue has shown up a couple of times in the ham (legitimate) e-mails, and not at all in the spam e-mails. On this basis, it lets the e-mail through.

Now imagine a friend of yours has also installed a Bayesian filter, but he doesn't like tropical fish. When he configured the filter, he supplied it with e-mails containing the phrase Methylene Blue that he considered to be spam. Whenever your friend receives the tropical fish newsletter, the Bayesian filter sees that Methylene Blue shows up in spam e-mails a number of times, so the e-mail is blocked. In practice, the decision to block or allow an e-mail is based on more than a single string of text in the message, but the same principle applies. By learning your e-mail habits, Bayesian filtering can effectively reduce the spam you receive by more than 99 percent.
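
The fish newsletter example can be sketched in a few lines of Python. This is a toy naive Bayes classifier with add-one smoothing, written only to illustrate the principle. The class name TinyBayesFilter, the crude whitespace tokenizer, and the training messages are all assumptions made for this sketch, and it is not the algorithm SpamBayes itself uses.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase the text and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


class TinyBayesFilter:
    """Toy naive Bayes spam filter; illustrative only."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        """Record the words of one message under 'spam' or 'ham'."""
        self.word_counts[label].update(tokenize(text))
        self.message_counts[label] += 1

    def spam_probability(self, text):
        """Return the estimated probability (0.0 to 1.0) that text is spam."""
        vocabulary = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_messages = sum(self.message_counts.values())
        log_score = {}
        for label in ("spam", "ham"):
            # Prior: smoothed share of training messages with this label.
            prior = (self.message_counts[label] + 1) / (total_messages + 2)
            log_score[label] = math.log(prior)
            total_words = sum(self.word_counts[label].values())
            for word in tokenize(text):
                # Add-one smoothing so an unseen word never gives probability zero.
                count = self.word_counts[label][word] + 1
                log_score[label] += math.log(count / (total_words + len(vocabulary) + 1))
        # Turn the two log scores into a probability, clamped to avoid overflow.
        diff = max(min(log_score["ham"] - log_score["spam"], 700.0), -700.0)
        return 1.0 / (1.0 + math.exp(diff))


newsletter = "weekly fish newsletter treating fin rot with methylene blue"

# You keep fish, so Methylene Blue appears only in your ham.
you = TinyBayesFilter()
you.train("fish newsletter methylene blue dosage for tropical tanks", "ham")
you.train("tropical fish care tips methylene blue", "ham")
you.train("cheap pills buy now limited offer", "spam")

# Your friend marked the same kind of mail as spam.
friend = TinyBayesFilter()
friend.train("meeting agenda for monday project review", "ham")
friend.train("methylene blue fish newsletter subscribe now", "spam")
friend.train("buy methylene blue cheap fish supplies offer", "spam")

print("you:    %.2f" % you.spam_probability(newsletter))
print("friend: %.2f" % friend.spam_probability(newsletter))
```

The same newsletter scores below 0.5 for you and above 0.5 for your friend, purely because the two filters were trained on different mail.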

Part of the reason it's so effective is that it relies on you, the end user, to teach it what to do. Much as you would with a small child, you tell the filter what's good and what's bad, and if it gets something wrong you let it know what it did wrong.
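
Telling the filter it got something wrong amounts to moving a message's words from one set of counts to the other. The sketch below assumes a hypothetical WordTally class that keeps per-word spam and ham counts. It illustrates the feedback loop only and is not how any particular product stores its data.

```python
from collections import Counter


class WordTally:
    """Hypothetical per-word spam/ham counts that record your feedback."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}

    def learn(self, text, label):
        """Add the words of one message to the given category."""
        self.counts[label].update(text.lower().split())

    def unlearn(self, text, label):
        """Remove the words of one message from the given category."""
        self.counts[label].subtract(text.lower().split())
        # Unary plus drops words whose count fell to zero or below.
        self.counts[label] = +self.counts[label]

    def correct(self, text, wrong_label, right_label):
        """Move a misfiled message from the wrong category to the right one."""
        self.unlearn(text, wrong_label)
        self.learn(text, right_label)

    def spam_ratio(self, word):
        """Fraction of this word's sightings that were in spam (0.5 if never seen)."""
        spam = self.counts["spam"][word]
        ham = self.counts["ham"][word]
        return spam / (spam + ham) if spam + ham else 0.5


tally = WordTally()
tally.learn("methylene blue fish newsletter", "spam")  # filed as spam by mistake
print(tally.spam_ratio("methylene"))  # 1.0

tally.correct("methylene blue fish newsletter", "spam", "ham")
print(tally.spam_ratio("methylene"))  # 0.0
```
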

One of the most effective desktop Bayesian filters available is SpamBayes. SpamBayes is an Outlook plug-in that monitors incoming e-mail and applies Bayesian filtering before it arrives in your inbox. It's extremely easy to configure, and very effective at filtering spam.

To begin using SpamBayes, simply download the Outlook add-in and follow the installation wizard. You're asked to supply two sets of e-mails: one set of spam and one set of ham. If you don't have enough spam to train SpamBayes at the moment, simply skip the configuration and wait a few days until enough has built up in your inbox. You can then restart the training process by opening the SpamBayes manager from the toolbar icon, selecting the Training tab, and then clicking Start.

Figure 4-5: SpamBayes training, showing SpamBayes processing the selected spam and ham folders.

One extremely interesting feature of SpamBayes is the ability to see the clues used to detect whether a message is spam or ham. When you select an e-mail (either spam or ham), and then select Show spam clues for current message from the SpamBayes menu, SpamBayes generates a statistical summary for you, similar to the one shown below:

SpamBayes statistical analysis.

It can be very interesting to see exactly which words and phrases trigger the Bayesian filter!
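
To get a feel for what such a summary contains, here is a rough sketch that ranks the words of a message by how strongly they point toward spam or ham. The function names, the simple rate-based estimate, and the word counts are all made up for illustration. The actual SpamBayes report uses its own scoring and layout.

```python
def word_spam_probability(spam_hits, ham_hits, n_spam, n_ham):
    """Estimate P(spam | word) from how often the word appeared in each training set."""
    spam_rate = spam_hits / n_spam
    ham_rate = ham_hits / n_ham
    if spam_rate + ham_rate == 0:
        return 0.5  # never seen: no evidence either way
    return spam_rate / (spam_rate + ham_rate)


def strongest_clues(message, stats, n_spam, n_ham, top=3):
    """Return the top words whose probability lies furthest from the neutral 0.5."""
    clues = []
    # dict.fromkeys removes repeated words while keeping their order.
    for word in dict.fromkeys(message.lower().split()):
        spam_hits, ham_hits = stats.get(word, (0, 0))
        probability = word_spam_probability(spam_hits, ham_hits, n_spam, n_ham)
        clues.append((word, probability))
    clues.sort(key=lambda clue: abs(clue[1] - 0.5), reverse=True)
    return clues[:top]


# Made-up counts: word -> (messages in spam, messages in ham), 50 of each trained.
stats = {
    "methylene": (0, 12),
    "blue": (1, 14),
    "free": (30, 10),
    "newsletter": (8, 9),
}

for word, probability in strongest_clues(
    "free newsletter about methylene blue", stats, n_spam=50, n_ham=50
):
    print("%-10s %.2f" % (word, probability))
```

Words near 0.0 are strong ham clues, words near 1.0 are strong spam clues, and words near 0.5 tell the filter very little.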

Network Bayesian Filtering

If you have more than one computer that downloads e-mail, you might find it practical to use a different type of filtering system. Because the Outlook Junk E-mail filter and SpamBayes are client-side filtering systems, they can only protect the computer on which they're installed. To provide more comprehensive spam filtering from a single computer, inline spam firewalls are used. Spam firewalls such as No Spam Today are similar to a proxy server in that they sit between your ISP's e-mail server and your home network, scanning all incoming e-mail regardless of which computer it's destined for.

Inline spam firewalls are favored by large organizations for a number of reasons. First, they're easier to manage than individual desktop filters. Although users lose individual control over their spam filtering, from a corporate point of view the overall result is far more effective.

Second, once an organization is sufficiently large, the licensing costs for desktop applications become enormous. It's far more economical to spend $2,000 on one spam firewall than to buy 1,000 desktop licenses at $5 or $10 each, which comes to between $5,000 and $10,000. The more popular corporate spam firewalls include the Barracuda and GFI Mail Essentials.
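
The cost comparison is simple arithmetic, sketched here using the figures quoted above. The function names are made up for this example, and real pricing will vary.

```python
def desktop_cost(seats, price_per_seat):
    """Total cost of licensing a desktop filter on every seat."""
    return seats * price_per_seat


def break_even_seats(firewall_cost, price_per_seat):
    """Number of seats at which desktop licenses cost as much as one firewall."""
    return firewall_cost / price_per_seat


FIREWALL_COST = 2000  # one spam firewall, figure from the text
SEATS = 1000

for price in (5, 10):
    print(
        "At $%d per seat: desktop licenses cost $%d, break-even at %d seats"
        % (price, desktop_cost(SEATS, price), break_even_seats(FIREWALL_COST, price))
    )
```

With these figures, the firewall becomes the cheaper option once an organization has more than 400 seats at $5 per license, or more than 200 seats at $10.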





© Copyright 1998-1999 GOOD2USE