How To Use a Robots.txt File
Using a Robots.txt File
Search engines look for a special file called robots.txt before crawling (spidering) your site. The robots.txt file exists specifically to give directions to web crawlers/robots. If you wish to allow search engines to crawl everything on your site, place the following two lines in your robots.txt file:
robots.txt example
User-agent: *
Disallow:
The * in the first line specifies that the directions are for all search engines. The second line indicates that nothing is disallowed.
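Conversely, if you wanted to keep all search engines out of your entire site, you would disallow the root path. A minimal sketch of that opposite case:
robots.txt example
User-agent: *
Disallow: /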
Once you have created your robots.txt file, upload it to your website's main (root) directory, where your homepage and other HTML files are located.
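For example, if your domain were www.example.com (a placeholder), crawlers would expect to find the file at http://www.example.com/robots.txt. They only look for it at that root location and will ignore a copy placed in a subdirectory.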
Robots.txt Usage Examples
Below are several common examples of how you can use a robots.txt file to set parameters and control how different crawlers/robots access your website.
The following example would allow all crawlers/robots to access all files except those in your images directory.
robots.txt example
User-agent: *
Disallow: /images/
The following example points Google and other search engines at your XML sitemap via the Sitemap directive, and directs crawlers/robots to crawl all files except those in the cgi-bin, logs, and images directories (www.example.com below is a placeholder for your own domain):
robots.txt example
User-agent: *
Disallow: /cgi-bin/
Disallow: /logs/
Disallow: /images/
Sitemap: http://www.example.com/sitemap.xml
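Most major crawlers, including Googlebot and Bingbot, also honor a nonstandard Allow directive, which lets you carve out an exception inside a blocked directory. A minimal sketch, assuming you want to keep /images/ blocked but expose a single logo file (logo.png is a hypothetical filename):
robots.txt example
User-agent: *
Disallow: /images/
Allow: /images/logo.png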
You can also use the "Crawl-delay" parameter in the robots.txt file. This parameter indicates the number of seconds a crawler/spider should wait between requests.
robots.txt with Crawl-delay
User-agent: Googlebot
Crawl-delay: 20
User-agent: Slurp
Crawl-delay: 20
User-agent: msnbot
Crawl-delay: 20
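Note that support for Crawl-delay varies: Yahoo's Slurp and Bing's msnbot have honored it, but Googlebot ignores the directive, so Google's crawl rate has to be managed through Google's own webmaster tools instead.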
Bad Robots and Email Harvesters
Below are several robots/crawlers that you might want to block. The Disallow: / line at the end of the example applies to every user-agent listed above it, blocking them from your entire site.
robots.txt example
User-agent: Titan
User-agent: EmailCollector
User-agent: EmailSiphon
User-agent: EmailWolf
User-agent: ExtractorPro
User-agent: WebZip
User-agent: larbin
User-agent: b2w/0.1
User-agent: htdig/3.1.5
User-agent: teleport
User-agent: NPBot
User-agent: TurnitinBot
User-agent: dloader(NaverRobot)
User-agent: dloader(Speedy Spider)
User-agent: FunWebProducts
User-agent: WebStripper
User-agent: WebSauger
User-agent: WebCopier
Disallow: /
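Keep in mind that robots.txt is purely advisory: well-behaved crawlers will obey it, but many email harvesters and scrapers simply ignore the file. Blocking those reliably requires server-side measures, such as the Apache techniques linked in the resources below.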
Robots Resources and Tools
- The Robots Exclusion Protocol
- Web Robots FAQ
- Using Apache To Stop Bad Robots
- List of Robots
- Database of Web Robots
- Types and Details of Robots
- Articles and Papers