18 Jan
Posted by Ganesh H S, Bangalore, India as Search Engine Optimization
In my earlier post I wrote about robots.txt and the robots meta tag.
The following is a search engine optimization (SEO) checklist for robots.txt:
1. robots.txt HTTP status code
Before crawling a website, a search bot (e.g. Googlebot) always requests robots.txt, reads its rules, and then crawls only the locations that are allowed. So it is important for the webmaster to check the HTTP status code of the site's robots.txt and make sure it returns either 200 or 404.
Why is it so important to check the HTTP status code?
When a search bot requests robots.txt before crawling:
If status code 200 is returned, it reads the file and crawls the locations of the website that are allowed.
If status code 404 is returned, the search bot goes ahead with its job with no restriction on crawling.
If the request takes a long time and no response code is returned, the search bot waits and, after some time, skips crawling, because it always respects robots.txt. This can adversely affect the crawling of our website.
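The check described above can be scripted. The following is only a minimal sketch using Python's standard library; the function names `fetch_robots_status` and `interpret_status`, the 10 second timeout, and the wording of the messages are my own assumptions for illustration, not something a search bot exposes.

```python
import urllib.error
import urllib.request
from typing import Optional


def fetch_robots_status(site: str, timeout: float = 10.0) -> Optional[int]:
    """Return the HTTP status code of <site>/robots.txt, or None if no response arrives."""
    url = site.rstrip("/") + "/robots.txt"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx answers still carry a status code (e.g. 404).
        return err.code
    except OSError:
        # Covers URLError and timeouts: the server gave no usable response.
        return None


def interpret_status(status: Optional[int]) -> str:
    """Map a status code to the three cases listed in the checklist."""
    if status == 200:
        return "ok: bot reads robots.txt and crawls the allowed locations"
    if status == 404:
        return "ok: no robots.txt, bot crawls without restriction"
    if status is None:
        return "problem: no response, bot may skip crawling"
    # Assumption: anything else is flagged for manual review.
    return "check: unexpected status code, review manually"


# Example usage (hypothetical site, performs a real network request):
# print(interpret_status(fetch_robots_status("https://www.example.com")))
```

Running this on a schedule gives an early warning if robots.txt starts timing out or returning an unexpected code.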
2. URLs restricted by robots.txt
Consider the impact of the following robots.txt definition:
User-agent: *
Disallow: /
It blocks all search bots from crawling the entire website. We should make sure we block only those areas that need to be blocked.
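To see which URLs a given robots.txt restricts, the rules can be tested offline. This is a rough sketch using Python's standard `urllib.robotparser`; the helper name `blocked_urls` and the example.com URLs are assumptions for illustration, and the block-all rule is written in the conventional `Disallow: /` form. Real search bots may interpret some rules (such as wildcards) differently from this parser.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt, urls, user_agent="*"):
    """Return the URLs from `urls` that the given robots.txt text disallows."""
    parser = RobotFileParser()
    # parse() takes the file content as a list of lines; no network access needed.
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


# Blocks the whole site for every bot.
block_all = "User-agent: *\nDisallow: /\n"
# Blocks only one area (hypothetical /admin/ section).
selective = "User-agent: *\nDisallow: /admin/\n"

urls = [
    "https://www.example.com/",
    "https://www.example.com/admin/login",
]

print(blocked_urls(block_all, urls))   # every URL is listed as blocked
print(blocked_urls(selective, urls))   # only the /admin/ URL is listed
```

Comparing the two outputs shows how a single broad rule shuts out the whole site, while a narrower path limits the restriction to the area that actually needs it.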