Web robots (also called crawlers or spiders) are programs that traverse the Web automatically. Search engines use them to index web content.
Before a search engine robot crawls a website, it looks for robots.txt, the file in which we write instructions telling robots/crawlers what they may crawl and what they may not. Some folders may be confidential, so we may not want them crawled or indexed by these robots. Note that robots.txt is only a request that well-behaved robots honor, and the file itself is publicly readable, so it should not be the only protection for confidential content. We can also specify the sitemap path in robots.txt.
robots.txt must always be placed in the top-level directory of your web server. For example, if the domain is http://www.example.com/, then robots.txt should be available at http://www.example.com/robots.txt.
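As a rough illustration of the placement rule, the location of robots.txt can be derived from any page URL by keeping the scheme and host and replacing the path. This is only a sketch using Python's standard library; the function name `robots_url` and the sample page URL are made up for this example.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the top-level robots.txt URL for the site that serves page_url.

    `robots_url` is a hypothetical helper name used only in this sketch.
    """
    parts = urlsplit(page_url)
    # Keep scheme and host (including any port); drop path, query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    # The page path below is an invented example on the domain used above.
    print(robots_url("http://www.example.com/blog/post.html?id=1"))
```

Because the file is looked up per host, a subdomain or a different port would need its own robots.txt.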
Let's start writing robots.txt:
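To block all robots from the entire site:
User-agent: *
Disallow: /
To block only Google's crawler (its user-agent token is Googlebot) from the entire site:
User-agent: Googlebot
Disallow: /
To block all robots from specific paths only:
User-agent: *
Disallow: /admin
Disallow: /account/index.html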
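One way to check how rules like these are interpreted is Python's standard `urllib.robotparser` module. The sketch below parses rule text directly instead of downloading it; the helper name `is_allowed`, the bot name `AnyBot`, and the test URLs are assumptions made for illustration, and real crawlers may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Report whether user_agent may fetch url under the given robots.txt text.

    `is_allowed` is a hypothetical helper written for this sketch.
    """
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Rules that block all robots from two specific paths.
RULES = """\
User-agent: *
Disallow: /admin
Disallow: /account/index.html
"""

if __name__ == "__main__":
    for path in ("/admin/users", "/account/index.html", "/blog/"):
        url = "http://www.example.com" + path
        print(path, is_allowed(RULES, "AnyBot", url))
```

Disallow values are matched as path prefixes here, so `/admin` also covers `/admin/users`, while `/blog/` stays crawlable.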
robots.txt can also be used to specify the sitemap path:
User-agent: *
Sitemap: http://ganeshhs.com/sitemap.xml
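To see how a crawler might pick up the Sitemap line, the same standard-library parser exposes the listed sitemap URLs through `site_maps()` (available in Python 3.8 and later). This is a sketch only; the helper name `sitemap_urls` is invented for this example.

```python
from urllib.robotparser import RobotFileParser


def sitemap_urls(robots_txt: str) -> list:
    """Return the sitemap URLs declared in the given robots.txt text.

    `sitemap_urls` is a hypothetical helper written for this sketch.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap line is present.
    return parser.site_maps() or []


# The same content as the example above.
RULES_WITH_SITEMAP = """\
User-agent: *
Sitemap: http://ganeshhs.com/sitemap.xml
"""

if __name__ == "__main__":
    print(sitemap_urls(RULES_WITH_SITEMAP))
```

An empty list is returned instead of `None` so callers can loop over the result without a separate check.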