Site owners often let search engines index their internal search results pages, hoping to gain more back links from search engine results pages (SERPs).

However, we should use robots.txt to disallow crawling of website search results pages that don't add much value for users coming from search engines. It's one of the quality guidelines Google mentions in its webmaster guidelines, and an important item to track on any SEO checklist.

Possible reasons:

  1. Duplicate content -
    A search results page holds a snippet for each article: its title, a short description, and a link to the article. The article page itself carries the same title and description. So if a search bot crawls both the search results and the individual articles, it can potentially lead to duplicate content.
  2. Doorway pages -
    Search results pages act like doorway pages to individual articles. Doorway pages, like cloaking, are among the techniques that Google's quality guidelines say to avoid.
  3. Search engine subsystem -
    A search engine as a whole strives to provide unique results. Clicking an indexed website search page in the search engine results page instead takes the user to the website's own search results, in effect running another search inside the website.

Examples -
Disallow: /search?p=* in health.yahoo.com/robots.txt
Disallow: /search in http://search.yahoo.com/robots.txt
Disallow: /results in http://youtube.com/robots.txt
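
To check that such a rule really blocks search results pages while leaving article pages crawlable, something like the following minimal sketch could be used. It relies on Python's standard urllib.robotparser module. The rules in ROBOTS_TXT, the www.example.com URLs, and the can_crawl helper are all hypothetical, made up for illustration and not taken from any of the sites listed above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example site (an assumption, not a real file).
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
"""


def can_crawl(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() accepts an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical URLs: one search results page and one article page.
    for url in (
        "http://www.example.com/search?q=robots",
        "http://www.example.com/articles/robots-txt-basics",
    ):
        status = "allowed" if can_crawl(ROBOTS_TXT, url) else "disallowed"
        print(url, "->", status)
```

Note that this standard-library parser matches rules by simple path prefix, so a rule such as Disallow: /search covers every URL whose path begins with /search, including query-string variants.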
