O'Reilly: WindowsDevCenter.com - March 30, 2004
Kill Internet Ads with HOSTS and PAC Files

By Sheryl Canter

If you're looking to block online ads and offensive Web content, you don't need to buy special software. Instead, you can use two techniques that work with any browser. One technique uses the HOSTS file built into Windows, and the other uses PAC files, a feature of all modern browsers. Problems can crop up with both approaches. This article explains why the problems occur and how to solve them.

Few web sites host their own banner ads. Typically they sign up with ad servers that deliver content and track views and clicks. Thus you can block most web site ads by blocking a fairly limited number of ad servers. HOSTS and PAC files can block web ads by blocking access to these ad servers. You can also block other sites serving objectionable content.

What Is the HOSTS File?

Unless a computer is configured to use a proxy server, the HOSTS file is the first place a browser looks for an IP address when you type in a URL such as www.permutations.com. Only if the domain name is not found in the HOSTS file does the browser then query the DNS server. It is this fact that makes the HOSTS file an effective means for blocking web site ads.

The HOSTS file is stored in different places depending on your operating system:

Windows 95/98/Me             c:\windows\hosts
Windows NT/2000              c:\winnt\system32\drivers\etc\hosts
Windows XP Home/Pro          c:\windows\system32\drivers\etc\hosts

It's a text file you can open in Notepad. Comments at the top explain the simple syntax. Each line consists of an IP address, a domain name, and an optional comment placed after a pound sign. The one default entry in every HOSTS file looks like this:

127.0.0.1      localhost      # this is the IP address of your local computer

127.0.0.1 is a special IP address called the "loopback" because it refers to the local computer. The loopback address gives developers a way to test network software without being physically connected to a network. This prevents buggy network hardware or software from obscuring test results. The loopback address also can be used to prevent web ads from displaying.
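To make the line syntax concrete, here is a minimal JavaScript sketch that splits one HOSTS line into its address, names, and comment. The function name parseHostsLine is made up for this illustration and is not part of Windows or any browser.

```javascript
// Parse one HOSTS line into its parts.
// Returns null for blank lines and comment-only lines.
function parseHostsLine(line) {
    var hashAt = line.indexOf("#");
    var comment = hashAt >= 0 ? line.slice(hashAt + 1).trim() : "";
    var body = (hashAt >= 0 ? line.slice(0, hashAt) : line).trim();
    if (body === "") return null;
    var fields = body.split(/\s+/);
    if (fields.length < 2) return null; // need an address and at least one name
    return { address: fields[0], names: fields.slice(1), comment: comment };
}

var entry = parseHostsLine(
    "127.0.0.1      localhost      # this is the IP address of your local computer");
// entry.address holds the loopback address, entry.names holds ["localhost"]
```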

Figure 1. A site with flashing banner ads before and after ad blocking.

To use the HOSTS file to block web ads, you add a list of hosts serving objectionable content (such as ad servers), and associate these domains with the loopback address -- your own computer. Then when you navigate to a site that contains banner ads, the browser looks on your own machine for the ads and never visits the ad server. Thus the ads are never displayed, and the ad server has no opportunity to put tracking cookies on your computer.
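As a rough sketch of what those added entries look like, the JavaScript below turns a list of domains into HOSTS lines that point at the loopback address. The domain names and the function name buildBlockEntries are hypothetical, chosen only for illustration.

```javascript
var LOOPBACK = "127.0.0.1";

// Build one HOSTS line per domain, each pointing at the loopback address.
function buildBlockEntries(domains) {
    return domains.map(function (domain) {
        return LOOPBACK + "      " + domain;
    }).join("\n");
}

// Hypothetical ad-server names, for illustration only
var lines = buildBlockEntries(["ads.example.com", "banners.example.net"]);
console.log(lines);
```

You would paste the resulting lines below the localhost entry in your HOSTS file.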

Compiling a list of ad servers for an ad-blocking HOSTS file would take a lot of time, but happily you don't have to do it. There are numerous ad-blocking HOSTS files available for download on the Internet. Mike Skallas distributes one that is updated each month.

Regular updates are necessary because new ad servers pop up all the time. If you see an ad while running an ad-blocking HOSTS file, it means one of two things: (1) the ad is hosted on the site's own server, or (2) the ad server is new. To find out where the ad is coming from, right-click on it and select Copy Shortcut. If the ad is hosted on the site, you can't block it with a HOSTS file because HOSTS files can only block entire sites. (This is not true of PAC files, which I'll discuss later.) If it's a new ad server, paste the domain portion of this URL into your HOSTS file with a redirect to 127.0.0.1.
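If you'd rather not pick the domain out of a long URL by eye, a few lines of JavaScript can do it. This is only a sketch: the URL is a hypothetical copied shortcut, adHostFromUrl is a made-up name, and the standard URL object is assumed to be available (as it is in current browsers and Node.js).

```javascript
// Return just the host name from a copied ad URL.
function adHostFromUrl(adUrl) {
    return new URL(adUrl).hostname;
}

// Hypothetical shortcut copied from a banner ad
var host = adHostFromUrl("http://ads.example.com/banners/468x60.gif?id=42");

// The line to add to the HOSTS file
var hostsLine = "127.0.0.1      " + host;
console.log(hostsLine);
```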

HOSTS File Problems and Solutions

The HOSTS file trick is clever, but there are some potential problems with it. Ad-blocking HOSTS files can include sites that shouldn't be there, blocking access to sites you want to see. This occurs because some ad servers also provide other types of content. For example, the ad server akamai.com also provides streaming media for many web sites, including Microsoft, for whom they handle Windows Updates. If you block akamai.com, you won't be able to access Windows Updates.
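One way to deal with unwanted entries in a downloaded HOSTS file is to filter them out before you install it. The JavaScript below is a sketch under simple assumptions: allowDomains is a made-up name, the sample text is invented, and a line is dropped if any name on it equals an allowed domain or is a subdomain of one.

```javascript
// Remove HOSTS entries that would block a domain you want to keep.
function allowDomains(hostsText, allowedDomains) {
    return hostsText.split(/\r?\n/).filter(function (line) {
        var body = line.split("#")[0].trim();
        if (body === "") return true;            // keep blank and comment lines
        var names = body.split(/\s+/).slice(1);  // skip the IP address field
        return !names.some(function (name) {
            return allowedDomains.some(function (domain) {
                return name === domain || name.endsWith("." + domain);
            });
        });
    }).join("\n");
}

// Invented sample of a downloaded ad-blocking HOSTS file
var sample = [
    "127.0.0.1 localhost",
    "# ad servers",
    "127.0.0.1 ads.example.com",
    "127.0.0.1 akamai.com"
].join("\n");

var cleaned = allowDomains(sample, ["akamai.com"]);
```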

Then there's the aesthetic issue. Ideally, you'd see blank areas in place of ads, but in actual practice there are unattractive "Action canceled" error messages repeated wherever an ad would have been. There is a solution to this, as you'll see shortly.

And then there is the problem with delays. The idea behind the HOSTS file trick is to redirect ad-server requests to an IP address where there is no server. Internet Explorer will fail immediately if it can't find a server, but other browsers (notably, Opera) wait much longer before giving up.

Both these problems can be solved by installing a small, single purpose web server that does nothing but serve transparent bitmaps when requests are received on the loopback address. This replaces unsightly error messages with blank areas, and eliminates delays because the browser receives an immediate response. A free utility for this purpose will be described later in this article.

But there are other potential problems. If you are running a real web server on your computer such as Personal Web Server (PWS) or Internet Information Services (IIS), you'll get a dialog prompting for a network password each time you navigate to a site with redirected ads. This is because, by default, PWS and IIS are configured as the "default web site," responding to all IP addresses assigned to the computer that are not assigned to other sites. When the HOSTS file redirects your browser to the loopback address, an actual web server is there to answer. Since the request is for resources it can't find, it pops up an "Enter network password" dialog.

There are various things you can do to get around this, but all involve giving up something. If your computer is on a network, you can change the default IP setting of "(All Unassigned)" to the computer's network IP address, thus excluding 127.0.0.1. The PWS/IIS Help file warns against doing this because it can cause some server features to stop working. But if all you're using your web server for is testing sites before uploading, you may not care.

Another possibility is to redirect the ad servers in the HOSTS file to a non-existent IP address such as 0.0.0.0. This works with IE and Mozilla-based browsers, but Opera objects to the non-existent address and pops up an error message. Also, if you change the address from 127.0.0.1, you can't use a special-purpose web server to eliminate unsightly error messages and delays.

PWS and IIS are configured by default to use TCP port 80, which is standard for HTTP. Another way you can prevent the "Enter network password" popup is to change the port to something other than 80 (81, for example). But this will make your server invisible to anyone who doesn't know that the port must be specified in the URL.

The best solution if you're running a web server is to not use the HOSTS file for ad blocking at all, but instead to use a PAC file, which doesn't conflict with existing web servers. PAC files have other advantages as well. As mentioned earlier, HOSTS files can only block entire sites, and not specific URLs within a site. PAC files can block specific URLs within a site so, for example, you could block akamai.com ads without disabling Windows Update.
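Here is a small sketch of what blocking part of a site can look like. The host name, the /ads/ path, and the rule itself are hypothetical. Plain string tests are used so the function also runs outside a browser; a real PAC file would more likely use the helper functions described in the next section.

```javascript
function FindProxyForURL(url, host) {
    // Hypothetical rule: on this one host, block only URLs under /ads/
    if (host === "media.example.com" && url.indexOf("/ads/") !== -1)
        return "PROXY localhost:3421";

    // Everything else on the same host still loads normally
    return "DIRECT";
}
```

A HOSTS file can't express this rule, because it only sees the host name and never the rest of the URL.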

HOSTS files have to be large to block all the major ad servers because wildcards are not supported; you have to list the exact domain names. Very large HOSTS files slow your browser because of the time it takes to search a large, unindexed text file. PAC files are based on JavaScript and can specify URLs using shell expressions (Unix-style wildcard patterns, a simpler cousin of regular expressions), so one pattern can cover many domain names and this problem is eliminated.

Finally, ad-blocking HOSTS files cannot be used on systems using proxy servers because the HOSTS file is bypassed. Proxy servers are not a problem with PAC files.

What Are PAC Files?

Proxy Automatic Configuration (PAC) files were introduced by Netscape with the release of JavaScript back in 1995, and all modern browsers support them, including Internet Explorer and Opera. PAC files consist of JavaScript defining the function FindProxyForURL(url, host), and are saved with the file type .pac. The return value for this function says whether to use a proxy for this URL, a SOCKS proxy, or connect directly. If your browser is configured to use a PAC file, the FindProxyForURL function is called every time your browser attempts to access a URL, even if JavaScript is turned off in your browser.
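To illustrate the three kinds of return value, here is a minimal FindProxyForURL. All host names and port numbers are hypothetical; only the "DIRECT", "PROXY host:port", and "SOCKS host:port" string formats come from the PAC convention.

```javascript
function FindProxyForURL(url, host) {
    // Connect directly, without any proxy
    if (host === "intranet.example.com")
        return "DIRECT";

    // Use a SOCKS proxy
    if (host === "legacy.example.com")
        return "SOCKS socks.example.com:1080";

    // Use an ordinary HTTP proxy for everything else
    return "PROXY proxy.example.com:8080";
}
```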

The idea of using PAC files to block Web site ads was conceived by John R. LoVerso in 1996, while he was immersed in finding and documenting security flaws in JavaScript. PAC files support some special functions, two of which are useful for blocking ad sites:

  • This detects whether the URL host name belongs to a given DNS domain:
    dnsDomainIs(host, domain)
  • This checks whether str (could be the URL or host name) matches a shell expression:
    shExpMatch(str, shexp)
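The browser supplies these two functions, so a PAC file never defines them. If you want to try out rules outside a browser, rough stand-ins like the ones below can help. They are approximations written for this illustration, not the browsers' actual code, and the shExpMatch stand-in handles only the * and ? wildcards.

```javascript
// Stand-in: true if host ends with the given domain string.
function dnsDomainIs(host, domain) {
    return host.length >= domain.length &&
        host.substring(host.length - domain.length) === domain;
}

// Stand-in: translate a shell expression into a regular expression.
function shExpMatch(str, shexp) {
    var pattern = shexp
        .replace(/[.+^${}()|[\]\\]/g, "\\$&") // escape regex metacharacters
        .replace(/\*/g, ".*")                 // * matches any run of characters
        .replace(/\?/g, ".");                 // ? matches a single character
    return new RegExp("^" + pattern + "$").test(str);
}

// Hypothetical URLs, for illustration only
var blocked = shExpMatch("http://www.ad.example.com/banner.gif", "*.ad.*");
var allowed = dnsDomainIs("www.sprintpcs.com", ".sprintpcs.com");
```

Note how the single pattern "*.ad.*" covers any URL containing ".ad.", something that would take one HOSTS entry per server.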

To block ads, your FindProxyForURL function will contain an if statement that describes all the ad sites, and perhaps another if statement with a white list. The basic structure looks like this:

function FindProxyForURL(url, host)
{
    // Allowed sites (white list)
    if (0
        || dnsDomainIs(host, ".sprintpcs.com")
        // || {{continue with exception list}}
    )
        return "DIRECT";

    // Blocked sites (ad servers or others)
    if (0
        || shExpMatch(url, "*.ad.*")
        // || {{continue with disallowed list}}
    )
        return "PROXY localhost:3421";

    // Everything else connects normally
    return "DIRECT";
}

(The zero is only there to line up the JavaScript statements.)

Note that the blocked sites are redirected to port 3421 of localhost. If you're not running a web server on your computer, you can specify port 80 here, and that will work with all browsers. But if you are running a web server, specifying port 80 will trigger the "Enter network password" dialog.

Redirecting to an unused port like 3421 causes no problems for IE or Mozilla, but Opera will pop up an error message complaining that there is no proxy at that address. The solution to this problem is the special purpose web server mentioned earlier.

It's good to understand how PAC files work so you can modify them if necessary, but you don't have to start from scratch. John R. LoVerso provides a very good ad-blocking PAC file with detailed comments. Open the file in WordPad for editing; Notepad won't show the line breaks.

Once you have the PAC file, you have to tell your browser to use it. The location of the setting is a little different in each browser, but in general you'll find it among the network or connection settings. You specify the file using a syntax like this:

file://C:/PacFiles/ads-proxy.pac

If you are using Internet Explorer, you have to change two other settings. Open the Internet Options dialog and click on the Security tab. Select "Local intranet" and click the "Sites…" button. Uncheck the box labeled "Include all sites that bypass the proxy server."

One other change is necessary. You must turn off the auto-proxy caching mechanism, because it prevents the browser from blocking some content on a server while allowing other content from the same server. Unfortunately, there is no interface to this setting in the Internet Options dialog, but you can use a clever .REG file that not only changes the option, but also adds a checkbox for it on the Advanced page of the Internet Options dialog. This .REG file was written by Bill Talcott. Open Notepad, copy and paste these lines, save the file with the file type .reg, then double-click on the file to load the settings into the registry:

REGEDIT4

[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Internet Explorer\AdvancedOptions\PAC]
"Text"="Automatic Proxy Configuration"
"Type"="group"

[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Internet Explorer\AdvancedOptions\PAC\PROXYCACHE]
"CheckedValue"=dword:00000001
"DefaultValue"=dword:00000000
"HelpID"="iexplore.hlp#00000"
"HKeyRoot"=dword:80000001
"RegPath"="Software\\Policies\\Microsoft\\Windows\\CurrentVersion\\Internet Settings"
"Text"="Use automatic proxy result cache (UNCHECK for no-ads)"
"Type"="checkbox"
"UncheckedValue"=dword:00000000
"ValueName"="EnableAutoProxyResultCache"
 
[HKEY_CURRENT_USER\Software\Policies\Microsoft\Windows\CurrentVersion\Internet Settings]
"EnableAutoProxyResultCache"=dword:00000000

The BlackHoleProxy Utility

As mentioned earlier, when using the HOSTS file or a PAC file to redirect ad servers, it's a good idea to run a small, single-purpose web server on the loopback address that responds to requests with a transparent bitmap. This is what BlackHoleProxy does, and it can be used with HOSTS files, PAC files, or both. You can download BlackHoleProxy for free, with source code.
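To show the idea, here is a sketch of such a server written for Node.js. It is not BlackHoleProxy's code. The name createBlackHole is made up, a 1x1 transparent GIF stands in for the transparent bitmap, and the server answers every request the same way regardless of the URL.

```javascript
var http = require("http");

// A 1x1 transparent GIF, base64-encoded
var TRANSPARENT_GIF = Buffer.from(
    "R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7", "base64");

// Create a server that answers every request with the transparent image.
function createBlackHole() {
    return http.createServer(function (request, response) {
        response.writeHead(200, {
            "Content-Type": "image/gif",
            "Content-Length": TRANSPARENT_GIF.length
        });
        response.end(TRANSPARENT_GIF);
    });
}

// To run it, listen on the loopback address only, for example:
// createBlackHole().listen(3421, "127.0.0.1");
```

Because the browser gets an immediate, valid response, it shows a blank area instead of an error message and doesn't wait for a timeout.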

You may have heard of a similar utility called eDexter that is free for personal use. BlackHoleProxy has some important options that eDexter lacks. It allows you to configure the port to use, which is crucial if you're running a web server on your computer. Another option lets BlackHoleProxy respond to computers other than localhost so it can be shared on a network. eDexter only offers this option in the paid version. Also, eDexter handles PAC files in a nonstandard way. You have to enter settings into an eDexter data file using a proprietary syntax, and the actual PAC file is generated on the fly.

Although BlackHoleProxy has all the features you might need, the interface is bare bones. There is no install program and no user interface. Options are set through the command line. For easy access, you can create shortcuts for the command-line options you think you'll need, plus another shortcut pointing to the documentation, and then create a folder for these in your Start menu.

Figure 2. Shortcuts to BlackHoleProxy options in Start menu.

To use BlackHoleProxy with an ad-blocking HOSTS file, you must set it to port 80 by launching it with this command line:

BlackHoleProxy -port 80

If you're running a web server on your computer, you should use a PAC file rather than the HOSTS file to block ads so you can change to a port that doesn't conflict. By default, BlackHoleProxy uses port 3421 because it was designed to be used with the No-Ads PAC file.

Last but not least, don't forget to clear your browser's cache after setting up your ad-blocking HOSTS or PAC file, or the ads will be retrieved from your cache.

Sheryl Canter has been a Contributing Editor to PC Magazine since 1993 and a software developer since the early 1980s, and was the editor of PC Magazine's Utilities column from 1993 to 2002.

Click here to download Sheryl's ad-blocking PAC file. See comments in file for usage.

