In response to a spate of criticism about what is basically spam in Google search results, a post on Google's official blog last week suggests that the search engine giant is planning to take a harder-line stance against "content farms" and other spammy sites that aim to grab the top spots in search rankings with low-quality, keyword-heavy pieces. (Steve wrote an excellent post about why they're not so great for the web as a whole.)
If you've used Google for any purpose lately, you're undoubtedly familiar with the problems that come from polluting the web with this sort of thing. (For instance, my attempt last week to read up on a few Trader Joe's products was thwarted by the sheer volume of "Calories in Trader Joe's…" posts on Livestrong.com, information that is clearly printed on the box of any food you buy there.) The sites in question employ an army of low-paid, word-churning drones to put out an endless stream of content targeting any search that could ever be conceived of, ever.
In the post referenced above, Google addressed the impact of this low-quality content on the value of its search results:
As “pure webspam” has decreased over time, attention has shifted instead to “content farms,” which are sites with shallow or low-quality content. In 2010, we launched two major algorithmic changes focused on low-quality sites. Nonetheless, we hear the feedback from the web loud and clear: people are asking for even stronger action on content farms and sites that consist primarily of spammy or low-quality content. We take pride in Google search and strive to make each and every search perfect. The fact is that we’re not perfect, and combined with users’ skyrocketing expectations of Google, these imperfections get magnified in perception. However, we can and should do better.
In response, Google says, it is developing better ways to detect this human-generated spam and keep it out of the top spots:
As we’ve increased both our size and freshness in recent months, we’ve naturally indexed a lot of good content and some spam as well. To respond to that challenge, we recently launched a redesigned document-level classifier that makes it harder for spammy on-page content to rank highly. The new classifier is better at detecting spam on individual web pages, e.g., repeated spammy words—the sort of phrases you tend to see in junky, automated, self-promoting blog comments.
It's unlikely that Google will be able to stop policing its results any time soon; we have all seen Bender's Big Score and know how this kind of thing works. Are you finding Google's results drowned in a sea of how-to articles? Does Google stand a chance against millions of human word-vomiting machines?