President Barack Obama debuted his new WhiteHouse.gov Web site during his inauguration yesterday. Beyond the slew of visible changes from the previous version, this one also has a behind-the-scenes change that makes the site far more accessible to search engines than its predecessor.
We're talking, of course, about the robots.txt file -- the text file that, when placed in a Web site's root directory, tells search engine robots which pages they may and may not crawl. It's commonly used to stop search engines from finding and listing certain pages within a site.
While we were all looking at the revamped design, blogger Jason Kottke thought to check out how the robots.txt file had changed. And change, it definitely did.
The Bush WhiteHouse.gov robots.txt file -- saved here -- has nearly 2400 lines of disallowed pages. At a glance, it looks like most of the site was designed not to be indexed by search engines. A small sampling:
Disallow: /stateoftheunion/2002/behindthescenes/print/text
Disallow: /stateoftheunion/2002/behindthescenes/text
Disallow: /stateoftheunion/2002/photos/print/text
Disallow: /stateoftheunion/2002/photos/text
Disallow: /stateoftheunion/2002/print/text
Disallow: /stateoftheunion/2002/text
Disallow: /stateoftheunion/2003/print/text
Disallow: /stateoftheunion/2003/text
Disallow: /vicepresident/news-speeches/speeches/images/text
Disallow: /vicepresident/news-speeches/speeches/print/text
Disallow: /vicepresident/news-speeches/speeches/text
Disallow: /vicepresident/news-speeches/text
Disallow: /vicepresident/photoessay/text
The new Obama WhiteHouse.gov robots.txt, in comparison, contains almost nothing. The entire file:
User-agent: *
Disallow: /includes/
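If you want to see for yourself what those two lines permit, Python's standard-library robots.txt parser can evaluate them directly. A quick sketch (the specific WhiteHouse.gov paths below are just illustrative examples, not pages confirmed to exist):

```python
from urllib.robotparser import RobotFileParser

# The entire new robots.txt, as quoted above.
rules = """\
User-agent: *
Disallow: /includes/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Ordinary pages are fair game for any crawler...
print(parser.can_fetch("*", "http://www.whitehouse.gov/blog/"))            # True
# ...but anything under /includes/ is off-limits.
print(parser.can_fetch("*", "http://www.whitehouse.gov/includes/nav.html"))  # False
```

Run the same check against any of the nearly 2,400 Bush-era Disallow lines and you'd get False for each of those paths -- which is the whole contrast in a nutshell.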
An interesting contrast -- and a perplexing one. Why did the previous administration want to keep so much of its site cloaked from external searchers? Any theories?