yafla

Blocking Google Prefetching in .NET

Dennis W. Forbes - May 8th, 2005

Background

Google recently released a client-side web acceleration tool, which in some cases provides a faster web experience for end users. It works through a local proxy application that intercepts web requests and uses several techniques to try to speed them up.

Once the page is retrieved, the local proxy analyzes the HTML for possible links to prefetch, and then attempts to grab that data while the machine is "idle" (for instance while the user is reading a review, the next page might already be loading, instantly ready once the user clicks next).

This feature, prefetching, was originally added in Firefox/Mozilla, allowing web authors to supply hints in web pages indicating where the user will likely go next (for example, in a multiple-page game review the browser could have page 2 loaded by the time the user is done reading page 1). Apparently Google took this a step further by automatically guessing where the user will go next, prefetching even links that the author never explicitly marked for prefetch.

The Problem

There are several potential issues with this. Firstly, even with the explicit-prefetch system of Firefox, negligent or malicious web admins could add prefetches to their pages that cause a denial of service or other unpleasant effect on a third-party website (I could add prefetches on this page hitting a tremendously resource-intensive page on a site belonging to someone I don't like, and this distributed attack would come from every equipped visitor).

The automatic prefetch takes it a step further, however, in that it could follow links that should not be followed automatically. For instance, a link in a web application to delete a record might be a simple GET request link: /mywebapp/delete_record.aspx?recordid=15 (though it should be noted that the HTTP/1.1 spec indicates that GET and HEAD requests should only be used to retrieve information, and not to perform actions. The reality is that many webapp developers choose to use those methods to perform actions. I know I have). With prefetching, users suddenly find that actions they didn't explicitly take are occurring because their browser is "executing" them in the background.

Records are deleted. Employees are fired. Mass chaos ensues. Society degrades until we're all living in caves flinging dung at each other.

Thankfully, there are several trivial ways to block either kind of prefetching in .NET (there are easy methods in other environments as well, but in this outing I'm covering .NET). They are useful for those cases where you know that you don't want prefetching on your web app, or even on just specific parts of your web app.

A Solution

The way to block prefetch attempts is evident once you realize that both Firefox and the Google Web Accelerator politely pass the X-moz: prefetch header when making such requests. The Google documentation itself details a simple Apache technique for blocking prefetch attempts, encouraging you to return a 404 error instead.

In the same spirit, a simple solution for situations where you have just a few pages where you wish to block prefetch execution, for example a record delete page, is to simply check for the aforementioned header and end the request.


	using System.Web;

	public class WebForm1 : System.Web.UI.Page
	{
		public WebForm1()
		{
			// Prefetching clients identify themselves with an "X-moz: prefetch" request header.
			System.String xmozHeader = HttpContext.Current.Request.Headers["X-moz"];
			if ((xmozHeader != null) && (xmozHeader.IndexOf("prefetch") >= 0))
			{
				// Refuse the request before any page processing takes place.
				HttpContext.Current.Response.StatusCode = 404;
				HttpContext.Current.Response.StatusDescription = "Prefetching not allowed";
				HttpContext.Current.Response.End();
			}
		}
		...
	}

In this case we're checking for the banned header in the page constructor, returning a 404 error if it is set. To make this code reusable, you could create a class derived from System.Web.UI.Page containing this logic in its constructor, and then inherit from that class in the pages that you don't want prefetched. Of course you don't have to catch the prefetch in the constructor, and could do it elsewhere in the page and control lifecycle (such as the Load event), but the idea here is to catch it as soon as possible and avoid any unnecessary processing. The benefit of this per-page approach is that other pages can still be prefetched, potentially improving the user experience.
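
The reusable base class described above might look something like the following minimal sketch. The names NoPrefetchPage and IsPrefetch are hypothetical, introduced here only for illustration; the header check is the same one used in the constructor shown earlier, and the sketch assumes classic ASP.NET (System.Web).

```csharp
using System;
using System.Web;
using System.Web.UI;

namespace yafla.BlockPrefetch
{
	// Hypothetical base class: the name is an assumption, not part of the original article.
	public class NoPrefetchPage : Page
	{
		public NoPrefetchPage()
		{
			HttpContext context = HttpContext.Current;

			// No current request (for example, constructed outside ASP.NET): nothing to check.
			if (context == null)
				return;

			if (IsPrefetch(context.Request.Headers["X-moz"]))
			{
				// Same response as the single-page version: a 404 and an early exit.
				context.Response.StatusCode = 404;
				context.Response.StatusDescription = "Prefetching not allowed";
				context.Response.End();
			}
		}

		// True when the X-moz header value marks the request as a prefetch.
		public static bool IsPrefetch(string xmozHeader)
		{
			return xmozHeader != null && xmozHeader.IndexOf("prefetch") >= 0;
		}
	}
}
```

A page that should never be prefetched (say, a hypothetical DeleteRecord page) would then be declared as "public class DeleteRecord : yafla.BlockPrefetch.NoPrefetchPage" rather than deriving from System.Web.UI.Page directly.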

However, if you want this sort of blocking site-wide, the worst technique would be to copy and paste duplicate logic into each and every page. Instead, .NET offers a very simple way of catching prefetch attempts universally, or by directory: HttpModules. I won't go into detail about what an HttpModule is (check the MSDN documentation, or the numerous tutorial sites online), except to say that, in a nutshell, an HttpModule is like an ISAPI filter (I bet that really cleared it up for you), in that it gets a chance before and after each request to change or deny it, or to alter the output.


using System;
using System.Web; 
using System.Collections;

namespace yafla.BlockPrefetch
{
	public class BlockPrefetchModule : IHttpModule 
	{
		public String ModuleName 
		{ 
			get { return "BlockPrefetchModule"; } 
		}    
    
		public void Init(HttpApplication application) 
		{
			application.BeginRequest += (new EventHandler(this.Application_BeginRequest));
		}
    
		// Runs at the start of every request handled by the application.
		private void Application_BeginRequest(Object source, EventArgs e) 
		{
			HttpApplication application = (HttpApplication)source;
			HttpContext context = application.Context;

			// Prefetching clients identify themselves with an "X-moz: prefetch" request header.
			System.String xmozHeader = context.Request.Headers["X-moz"];
			if ((xmozHeader != null) && (xmozHeader.IndexOf("prefetch") >= 0))
			{
				// Answer with a 404 and stop before any page code runs.
				context.Response.StatusCode = 404;
				context.Response.StatusDescription = "Prefetching not allowed";
				context.Response.End();
			}
		}
    
		public void Dispose() 
		{
		}
	}
}

Save the above in a file, say yafla.blockprefetch.cs, and compile it at a .NET SDK or VS.NET command line as follows.

csc /target:library yafla.blockprefetch.cs

Copy the resulting yafla.blockprefetch.dll assembly into your web app's /bin directory, and then add the module to your web.config, for instance as follows.

<?xml version="1.0" encoding="utf-8" ?>
<configuration>
  <system.web>
    <httpModules>
      <add name="BlockPrefetchModule"
           type="yafla.BlockPrefetch.BlockPrefetchModule, yafla.BlockPrefetch" />
    </httpModules>
  </system.web>
</configuration>

Alternatively, the same simple logic can be implemented (in a less reusable form) in your global.asax, as follows (in this case in the code-behind file global.asax.cs).


		...
		protected void Application_BeginRequest(Object sender, EventArgs e)
		{
			// Prefetching clients identify themselves with an "X-moz: prefetch" request header.
			System.String xmozHeader = HttpContext.Current.Request.Headers["X-moz"];
			if ((xmozHeader != null) && (xmozHeader.IndexOf("prefetch") >= 0))
			{
				// Answer with a 404 and stop before any page code runs.
				HttpContext.Current.Response.StatusCode = 404;
				HttpContext.Current.Response.StatusDescription = "Prefetching not allowed";
				HttpContext.Current.Response.End();
			}
		}
		...

Voilà, prefetch requests are blocked before they make it to your page, and a 404 error is returned instead. If you want to test this easily, you can either hand-write HTTP requests in a telnet session, or use one of the various tools that allow you to specify the request headers. A good example would be the replay functionality of LiveHTTPHeaders.
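
If you would rather script the test, a small console program can send the header for you. The following is a rough sketch only: the class name PrefetchProbe and the method GetStatus are hypothetical, and the URL is whatever page on your own site you pass on the command line. It uses HttpWebRequest from System.Net, which reports 4xx and 5xx replies as a WebException, so the status code has to be read from the exception's Response.

```csharp
using System;
using System.Net;

// Hypothetical test client: sends a GET with "X-moz: prefetch" and reports the status.
public class PrefetchProbe
{
	// Returns the HTTP status code the server gives to a prefetch-marked request.
	public static int GetStatus(string url)
	{
		HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
		request.Headers.Add("X-moz", "prefetch");
		try
		{
			using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
			{
				return (int)response.StatusCode;
			}
		}
		catch (WebException ex)
		{
			// Error statuses such as 404 arrive here; the reply is attached to the exception.
			HttpWebResponse error = ex.Response as HttpWebResponse;
			if (error == null)
				throw; // no HTTP reply at all (DNS failure, refused connection, ...)
			using (error)
			{
				return (int)error.StatusCode;
			}
		}
	}

	public static int Main(string[] args)
	{
		if (args.Length != 1)
		{
			Console.Error.WriteLine("usage: PrefetchProbe <url>");
			return 2;
		}

		int status = GetStatus(args[0]);
		Console.WriteLine(status == 404
			? "Blocked: server answered 404 to the prefetch request"
			: "Not blocked: server answered " + status);
		return 0;
	}
}
```

A 404 on its own only proves the block if the same URL succeeds without the header, so it is worth requesting the page normally in a browser as a comparison.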

Caveats

Conclusion

While prefetching can be a powerful, experience-improving technology for users, in some cases web developers and admins might want to block it. Above I've provided a quick and dirty solution for .NET developers who are paranoid about prefetch eating their lunch. Hopefully this provides some useful food for thought.

Note: No warranty, express or implied, is provided with any code or advice given. Vet the code yourself, and benchmark and prove in your own environment before using.
