How to Create and Configure robots.txt for the Apache Web Server


linuxnlenux

“Robots.txt” is a regular text file that, through its name alone, has special meaning to the majority of “honorable” robots on the web. By defining a few rules in this text file, you can instruct robots not to crawl and index certain files or directories within your site, or not to crawl your site at all. For example, you may not want Google to crawl the /images directory of your site, as it’s both meaningless to you and a waste of your site’s bandwidth. “Robots.txt” lets you tell Google just that.
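As a sketch of that /images case, the rule below targets Google’s main crawler by its user-agent name, Googlebot (the file itself lives at the top of your Apache DocumentRoot, e.g. /var/www/html/robots.txt on many distributions — check your own Apache config):

```
User-agent: Googlebot
Disallow: /images/
```

Any other robot that doesn’t match a more specific User-agent group falls through to whatever other rules you define (or none, meaning it may crawl everything).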

1) Here’s a basic “robots.txt”:

User-agent: *
Disallow: /

With the above declared, all robots (indicated by “*”) are instructed not to index any of your pages (indicated by “/”). That’s most likely not what you want, but you get the idea.
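The opposite of the rule above — explicitly allowing all robots to crawl everything — is an empty Disallow line:

```
User-agent: *
Disallow:
```

An empty value for Disallow matches nothing, so nothing is blocked; this is equivalent to having no robots.txt at all, but makes your intent explicit.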

2) You may not want Google’s Image bot crawling your site’s images and making them searchable online, if only to save bandwidth. The…
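Google’s image crawler identifies itself with the user-agent Googlebot-Image, so it can be blocked entirely while other bots keep ordinary access. You can sanity-check rules like this before deploying them with Python’s standard-library robots.txt parser; the rules and URLs below are illustrative:

```python
from urllib import robotparser

# Block Google's image crawler entirely; keep only /images/ off-limits
# for everyone else.
rules = """\
User-agent: Googlebot-Image
Disallow: /

User-agent: *
Disallow: /images/
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# Googlebot-Image matches its own group and is blocked everywhere.
print(rp.can_fetch("Googlebot-Image", "http://example.com/photo.jpg"))  # False
# Other bots fall through to the * group: /images/ is blocked...
print(rp.can_fetch("SomeOtherBot", "http://example.com/images/a.png"))  # False
# ...but the rest of the site is fine.
print(rp.can_fetch("SomeOtherBot", "http://example.com/about.html"))    # True
```

Note that a robot matches the most specific User-agent group only — Googlebot-Image uses its own group here and ignores the `*` rules.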
