Robots.txt is a text file that tells search engine bots whether or not they may crawl a website's content. Before crawling a website, a search engine robot checks whether it has permission to do so.
(When a search engine bot starts crawling your domain, it first looks for robots.txt. If the file is found, the bot follows the instructions in it. If robots.txt is not available, the bot will crawl the entire domain, including admin URLs.)
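The check described above can be sketched with Python's standard `urllib.robotparser` module. This is only an illustration of the decision a well-behaved bot makes, not how any particular search engine is implemented; the bot name `ExampleBot`, the `example.com` URLs, and the helper names are assumptions made for the example.

```python
from urllib.robotparser import RobotFileParser


def build_parser(robots_txt):
    """Return a parser for the given robots.txt text (None = no file found)."""
    parser = RobotFileParser()
    if robots_txt is None:
        # No robots.txt on the server: nothing is restricted.
        parser.parse([])
    else:
        parser.parse(robots_txt.splitlines())
    return parser


def may_crawl(parser, user_agent, url):
    """Ask the parsed rules whether this bot may fetch this URL."""
    return parser.can_fetch(user_agent, url)


# Hypothetical rules for a site that hides its admin area.
RULES = """\
User-agent: *
Disallow: /wp-admin
"""

with_file = build_parser(RULES)
without_file = build_parser(None)

# With the file, the admin URL is off limits; without it, everything is open.
print(may_crawl(with_file, "ExampleBot", "https://example.com/wp-admin/"))
print(may_crawl(without_file, "ExampleBot", "https://example.com/wp-admin/"))
print(may_crawl(with_file, "ExampleBot", "https://example.com/blog/"))
```

The first call prints `False` and the other two print `True`, which mirrors the behaviour described above: rules are followed when the file exists, and everything is crawlable when it does not.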
The code for these instructions is written in a plain text editor such as Notepad and saved as a .txt file. It looks something like this:
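A minimal sketch of such a file, assembled from the directives explained later in this article, is held in the string below. The exact rules are an assumption and should be adapted to your own site; the small Python wrapper (with the hypothetical helper `save_robots_txt`) only saves the text as robots.txt.

```python
import tempfile
from pathlib import Path

# Assumed example rules, based on the directives discussed in this article.
# The Disallow lines come first so that parsers applying the first matching
# rule and parsers applying the most specific rule reach the same result.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin
Disallow: /wp-admin
Allow: /
"""


def save_robots_txt(directory):
    """Write the rules to <directory>/robots.txt and return the file path."""
    path = Path(directory) / "robots.txt"
    path.write_text(ROBOTS_TXT, encoding="utf-8")
    return path


# Demo: write into a throwaway directory and show what was saved.
with tempfile.TemporaryDirectory() as folder:
    saved = save_robots_txt(folder)
    saved_text = saved.read_text(encoding="utf-8")
    print(saved_text, end="")
```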
To resolve the robots.txt test error, follow these two steps:
- Create a robots.txt file (test the code suggested below).
- Upload the file to the domain's public folder in cPanel (typically public_html), so that it is served from the root of the domain.
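The file goes in the public folder because crawlers request it from the root of the host, not from the folder of the page they are visiting. As a small sketch (the function name `robots_txt_url` and the example address are assumptions), the address a bot will ask for can be derived from any page URL like this:

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url):
    """Return the robots.txt address for the site that serves page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("https://example.com/blog/post?id=7"))
# prints https://example.com/robots.txt
```

After uploading, opening that address in a browser is a quick way to confirm the file landed in the right folder.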
How to create robots.txt using Google Webmaster?
- Open webmaster.google.com and log in.
- Open the old version -> Crawl -> robots.txt Tester, or go to https://www.google.com/webmasters/tools/robots-testing-tool
- Paste the code you created for your domain and click the ‘Test’ button. You can also enter a specific URL to check whether the code blocks it.
You can download the robots.txt file from the Google Webmaster panel itself by clicking ‘Submit’.
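The same kind of URL check can be approximated locally before pasting the code into the tester. The sketch below uses Python's standard `urllib.robotparser`; it is not Google's tester, and its matching is simpler (plain path prefixes, first matching rule wins), so treat the result as a rough pre-check. The helper `check_urls`, the rules, and the `example.com` URLs are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser


def check_urls(robots_txt, user_agent, urls):
    """Return a dict mapping each URL to 'allowed' or 'blocked'."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {
        url: "allowed" if parser.can_fetch(user_agent, url) else "blocked"
        for url in urls
    }


# Hypothetical rules to test.
RULES = """\
User-agent: *
Disallow: /cgi-bin
Disallow: /wp-admin
"""

report = check_urls(RULES, "Googlebot", [
    "https://example.com/",
    "https://example.com/wp-admin/options.php",
    "https://example.com/cgi-bin/form.cgi",
])

for url, verdict in report.items():
    print(verdict, url)
```

With these rules the home page is reported as allowed and the two other URLs as blocked.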
In this simple code, each word has a specific meaning:
- User-agent names the search engine bot that the rules apply to. The * sign means all search engines.
For a specific search engine, it can look like this:
# Only for Google: “User-agent: Googlebot”
# Only for Bing: “User-agent: Bingbot”
- Allow: / means that search engine robots are allowed to crawl all of the website's directories and pages (links).
- Disallow is used to stop bots from crawling the listed directories or pages. Disallow: /cgi-bin means the robots are not allowed to visit the cgi files. Disallow: /wp-admin means the robots are not allowed to crawl the WordPress admin URLs. Blocked URLs will normally not show in SERPs, although a blocked URL can still be listed if other sites link to it.
These URLs can be different in your case, depending on the platform you use to build your website.
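To see how the User-agent line selects which rules a bot follows, here is a short sketch using Python's standard `urllib.robotparser`. The paths `/private-google` and `/private-bing`, the bot name `OtherBot`, and the helper `allowed` are made up for the example; only Googlebot, Bingbot and `/wp-admin` come from the text above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical file with one group per bot and a catch-all group.
RULES = """\
User-agent: Googlebot
Disallow: /private-google

User-agent: Bingbot
Disallow: /private-bing

User-agent: *
Disallow: /wp-admin
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())


def allowed(agent, path):
    """Check a path on the example site for the given bot name."""
    return parser.can_fetch(agent, "https://example.com" + path)


print(allowed("Googlebot", "/private-google/page"))  # False: Googlebot group
print(allowed("Bingbot", "/private-google/page"))    # True: rule is not for Bing
print(allowed("Googlebot", "/wp-admin/"))            # True: only its own group applies
print(allowed("OtherBot", "/wp-admin/"))             # False: falls back to *
```

Note the third result: a bot that has its own group ignores the * group, so any rule meant for every bot has to be repeated inside each named group.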
Note: some search engine bots (the badly behaved ones) can ignore your robots.txt and crawl your entire domain. The most popular search engines, such as Google and Bing, respect it.