In my earlier post I wrote about robots.txt and the robots meta tag.

The following is a search engine optimization (SEO) checklist related to robots.txt:

1. robots.txt HTTP status code

Before crawling a website, a search bot (e.g. Googlebot) always requests robots.txt, reads the rules it defines, and crawls only the locations that are allowed. So it is important for a webmaster to check the HTTP status code of the site's robots.txt and make sure it returns either 200 or 404.

Why is it so important to check the HTTP status code?

Before crawling, the search bot requests robots.txt:

If HTTP status code 200 is returned, it reads the file and crawls the locations of the website that are allowed.

If HTTP status code 404 is returned, the search bot goes ahead with its job with no restrictions on crawling the website.

If the request takes a long time and no response code is returned, the search bot waits and after some time skips crawling, because it always respects robots.txt. This can adversely affect the crawling of our website.
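
A quick way to run this check yourself is a small script. The following is only a sketch in Python using the standard library: the function names (classify_robots_status, fetch_robots_status) and the 10 second timeout are my own assumptions, and status codes other than 200 and 404 are simply flagged for a closer look because this post does not cover them.

```python
import urllib.error
import urllib.request
from typing import Optional


def classify_robots_status(status: Optional[int]) -> str:
    """Map a robots.txt HTTP status to the three cases described above.

    A status of None stands for "no response at all" (timeout, connection error).
    """
    if status == 200:
        return "ok: rules are read and only allowed locations are crawled"
    if status == 404:
        return "ok: no robots.txt, crawling is unrestricted"
    if status is None:
        return "problem: no response, crawling may be skipped"
    # Any other code is outside what this post covers, so just flag it.
    return "check: status %d is not covered here, investigate" % status


def fetch_robots_status(site: str, timeout: float = 10.0) -> Optional[int]:
    """Request <site>/robots.txt and return its HTTP status, or None if no response."""
    url = site.rstrip("/") + "/robots.txt"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404.
        return err.code
    except OSError:
        # Covers URLError and timeouts: no usable response came back.
        return None


if __name__ == "__main__":
    # Hypothetical site URL; replace it with your own.
    status = fetch_robots_status("http://www.example.com")
    print(status, "->", classify_robots_status(status))
```

Keeping the classification separate from the network request means the three cases can be checked without touching the network.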

2. URLs restricted by robots.txt

Consider the impact of the following robots.txt definition:

User-agent: *
Disallow: /

It blocks all search bots from crawling the entire website. We should make sure we block only those areas that we intend to block.
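
To see which URLs a given set of rules would restrict, Python's standard urllib.robotparser module can be used. The following is a rough sketch, not a replacement for the reports a search engine gives you: the helper name blocked_urls and the example.com URLs are made up for illustration, the rules use "Disallow: /" (the standard form for blocking a whole site), and real search bots may interpret some rules differently from this parser.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt, urls, user_agent="*"):
    """Return the URLs from `urls` that the given robots.txt text disallows."""
    parser = RobotFileParser()
    # parse() takes the file content as a list of lines, so no network is needed.
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


if __name__ == "__main__":
    # Hypothetical URLs that we expect search bots to be able to crawl.
    important = [
        "http://www.example.com/",
        "http://www.example.com/products/index.html",
        "http://www.example.com/private/report.html",
    ]

    block_everything = "User-agent: *\nDisallow: /"
    block_one_area = "User-agent: *\nDisallow: /private/"

    print("Blocks everything:", blocked_urls(block_everything, important))
    print("Blocks one area:", blocked_urls(block_one_area, important))
```

Running it against a list of your important pages shows quickly whether a rule is blocking more than you intended.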
