The New WhiteHouse.gov: More Open to Search Engines


President Barack Obama debuted his new WhiteHouse.gov Web site during his inauguration yesterday. Beyond the slew of visible changes from the previous version, the new site also has a behind-the-scenes change that makes it far more accessible to search engines than its predecessor.

We’re talking, of course, about the robots.txt file: the text file that, when placed in a Web site’s root directory, tells search engine robots which pages they may and may not crawl. It’s commonly used to keep search engines from finding and listing certain pages within a site.

While we were all looking at the revamped design, blogger Jason Kottke thought to check how the robots.txt file had changed. And change it definitely did.

The Bush WhiteHouse.gov robots.txt file (saved here) has nearly 2,400 lines of disallowed pages. At a glance, it looks like most of the site was designed not to be indexed by search engines. A small sampling:

Disallow: /stateoftheunion/2002/behindthescenes/print/text
Disallow: /stateoftheunion/2002/behindthescenes/text
Disallow: /stateoftheunion/2002/photos/print/text
Disallow: /stateoftheunion/2002/photos/text
Disallow: /stateoftheunion/2002/print/text
Disallow: /stateoftheunion/2002/text
Disallow: /stateoftheunion/2003/print/text
Disallow: /stateoftheunion/2003/text
Disallow: /vicepresident/news-speeches/speeches/images/text
Disallow: /vicepresident/news-speeches/speeches/print/text
Disallow: /vicepresident/news-speeches/speeches/text
Disallow: /vicepresident/news-speeches/text
Disallow: /vicepresident/photoessay/text
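
For the curious, here is a rough sketch of how a crawler would read rules like these, using Python's standard urllib.robotparser module. The sampling above doesn't include a User-agent line, so the "User-agent: *" header below is an assumption (the parser ignores Disallow lines that don't follow one), and the two test URLs are made up for the demonstration.

```python
from urllib.robotparser import RobotFileParser

# Three of the Disallow lines quoted above. The "User-agent: *" header is an
# assumption: the sampling omits it, but the parser needs one to apply rules.
BUSH_SAMPLE = """\
User-agent: *
Disallow: /stateoftheunion/2002/text
Disallow: /stateoftheunion/2003/print/text
Disallow: /vicepresident/photoessay/text
"""

# Hypothetical base URL, used only to build test addresses.
BASE = "http://www.whitehouse.gov"

bush = RobotFileParser()
bush.parse(BUSH_SAMPLE.splitlines())

# Rules match by path prefix, so only addresses starting with a listed
# path are off limits to a compliant robot.
text_ok = bush.can_fetch("*", BASE + "/stateoftheunion/2002/text")
main_ok = bush.can_fetch("*", BASE + "/stateoftheunion/2002/")

print("text version crawlable:", text_ok)
print("main page crawlable:", main_ok)
```

In this sketch the /text address is refused while the page one level up stays open to robots, since no quoted rule is a prefix of it.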

The new Obama WhiteHouse.gov robots.txt, in comparison, contains almost nothing. The entire file:

User-agent: *
Disallow: /includes/
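
To see what those two lines mean in practice, here is a minimal sketch that feeds the file to Python's standard urllib.robotparser module. The paths being checked are hypothetical, chosen only to exercise the rule; they aren't taken from the real site.

```python
from urllib.robotparser import RobotFileParser

# The entire new robots.txt, as quoted above.
OBAMA_ROBOTS = """\
User-agent: *
Disallow: /includes/
"""

# Hypothetical base URL and paths, used only for illustration.
BASE = "http://www.whitehouse.gov"
PATHS = ["/", "/blog/", "/includes/header.html"]

obama = RobotFileParser()
obama.parse(OBAMA_ROBOTS.splitlines())

# Map each path to whether a compliant robot may fetch it.
results = {path: obama.can_fetch("*", BASE + path) for path in PATHS}

for path, ok in results.items():
    print(path, "crawlable" if ok else "blocked")
```

Only addresses under /includes/ are refused; everything else is open to any robot that honors the file.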

Interesting contrast. Perplexing, too, why the previous administration wanted to keep its site so cloaked to external searchers. Any theories?

Comments


5 Archived Responses to “The New WhiteHouse.gov: More Open to Search Engines”

  1. Are you kidding? An empty robots.txt file on a brand new site with near zero content means the new administration is more open – or that the Bush Administration was 'cloaked'? You are right, I'm certain that the developers actually spoke directly with Obama on this one and it was one of his first policy decisions. Riiiight.

  2. I suspect that these were technical decisions, not political ones. Rather than speculating that Karl Rove wanted to suppress information, or Rahm Emanuel wanted to open it up, it's possible that some IT person wanted to enforce, or not enforce, these for some technical reason or another.

    Or I could be wrong, and Bush and Obama wrote their own robots.txt files and told their underlings to comply.

  3. If the robots don't see it, it can't be cached. Therefore you can remove it and it vanishes. That's classic Bush doctrine.

    On Obama: will wait and see…

    Tom

  4. FYI, Jake Kuramoto has addressed these issues in the Oracle AppsLab blog. In addition to whitehouse.gov and robots.txt, Kuramoto mentions some other issues, such as the “dark ages” problems that Obama's staff encountered after January 20. Good read.

  5. If you look at the old robots.txt file, all of the URLs ended in /text. The reason all those disallows were in there is that they didn't want the bots to crawl the text version of the site, which is there for accessibility for blind people, etc.

    This has nothing to do with Obama being more open; it's simply a matter of Obama's tech team being more efficient with how they handle the accessibility part of the website. You should do some research before you make a post like this; you're spreading false rumors.