The robots.txt file is a small text file that resides in the root folder of your site. It tells the search engine bots, like Googlebot, which parts of the site to crawl and index and which not to.
If you make a mistake while editing or optimizing robots.txt, such as accidentally disallowing the whole site, search engine bots may stop crawling your website, and your pages can drop out of the search results.
Proper configuration of robots.txt on your site is very beneficial to SEO, and it’s also easy. By configuring the robots.txt file, you can manage how search crawlers crawl your site and index your pages.
In this guide, I’ll tell you how you can create a perfect robots.txt file to boost the SEO of your website or blog.
Why Do You Need Robots.txt Files for Your Website?
When search engine bots visit a website, they read the robots.txt file and crawl the content accordingly. But if your site does not have a robots.txt file, the bots will crawl and index all the content of your website, including content that you do not want indexed.
A wise webmaster or SEO professional never allows search engine bots to crawl all the pages and files on a website. A website normally has some thin and private pages that shouldn’t be crawled and indexed by search engines; otherwise, there can be many adverse effects, including worse search engine rankings. The robots.txt file is used to instruct search crawlers to avoid such pages.
Search engine bots look for the robots.txt file before crawling any website. When they do not find any instructions in it, they crawl and index the entire website. You can stop search crawlers from crawling any page (or file) on your website by setting rules in robots.txt. You can also keep a page out of the search index by adding a robots meta tag, such as ‘noindex’ or ‘nofollow’, to that page. But, sometimes, crawlers don’t obey the instruction provided by the meta tag. In such cases, robots.txt files are very useful.
Adding a robots.txt file to your website gives you the following advantages:
- You can tell the search engine bot to crawl only the allowed pages of your site.
- Some particular files, folders, images, etc. can be prevented from being indexed in the search engine.
- You can add a ‘Disallow’ rule for less important pages and attachments on your site. This reduces the number of requests crawlers make, which protects your server from being overloaded and can improve your site’s speed.
- You can prevent private and personal pages of your website from being visible on search results.
- You can improve your website’s SEO by blocking low-quality pages.
Do You Really Need a Robots.txt File?
You won’t always need a robots.txt file for your website. Depending upon your site’s purpose, you may or may not need to set one up. You may not have pages or files to hide from search engines. In such a case, you don’t need to bother about robots.txt. For example, simple blogs don’t need to block any of their pages from the search crawlers. However, complex eCommerce sites need to block the user login pages and user profiles from search engines. In such cases, they need to set multiple rules in robots.txt for different types of pages.
How Can You Find Robots.txt of Your Website?
The robots.txt file exists in the root of your website. You can check and edit it by accessing the root folder on your hosting account.
If you are a WordPress user, you can find your robots.txt file in your SEO plugin. You can add and edit robots.txt for your site through the plugin’s settings.
You can also view your robots.txt settings by entering your_site’s_domain_name/robots.txt into any browser’s address bar.
You can test your site’s robots.txt by using Google’s robots.txt Testing Tool.
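The /robots.txt address described above always sits at the root of the host, whatever page you start from. As a small sketch in Python, the standard library can build that address for you. The function name robots_txt_url and the example.com addresses are placeholders made up for illustration.

```python
from urllib.parse import urljoin


def robots_txt_url(site_url: str) -> str:
    """Return the robots.txt address for the host that serves `site_url`."""
    # An absolute path ("/robots.txt") replaces whatever path the page URL had.
    return urljoin(site_url, "/robots.txt")


# Both calls point at the same file in the site's root (hypothetical domain).
home = robots_txt_url("https://example.com")
deep_page = robots_txt_url("https://example.com/blog/2020/post.html")
print(home)
print(deep_page)
```

You can paste the printed address into the browser’s address bar to view the file.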
Basic Formats of Robots.txt
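The common formats of robots.txt cover the following four cases.
Allow full access: “User-agent: *” followed by an empty “Disallow:” line.
Block all access: “User-agent: *” followed by “Disallow: /”.
Block one file: “User-agent: *” followed by “Disallow:” and the path of that file.
Block one folder: “User-agent: *” followed by “Disallow:” and the path of that folder, ending in “/”.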
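To see how these four formats behave, here is a minimal sketch that feeds each one to the robots.txt parser in Python’s standard library (urllib.robotparser). The helper name can_crawl and the paths /private-page.html and /private-folder/ are hypothetical examples, not paths from any real site.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt: str, path: str, agent: str = "Googlebot") -> bool:
    """Parse robots.txt text and report whether `agent` may fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


# The four basic formats; file and folder names are made up for illustration.
ALLOW_ALL = "User-agent: *\nDisallow:\n"
BLOCK_ALL = "User-agent: *\nDisallow: /\n"
BLOCK_ONE_FILE = "User-agent: *\nDisallow: /private-page.html\n"
BLOCK_ONE_FOLDER = "User-agent: *\nDisallow: /private-folder/\n"

print(can_crawl(ALLOW_ALL, "/any/page.html"))               # allowed
print(can_crawl(BLOCK_ALL, "/any/page.html"))               # blocked
print(can_crawl(BLOCK_ONE_FILE, "/private-page.html"))      # blocked
print(can_crawl(BLOCK_ONE_FOLDER, "/private-folder/a.html"))  # blocked
```

Each string holds exactly what you would save in the robots.txt file itself, one directive per line.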
How to Create Perfect Robots.txt Files?
Creating and setting up a robots.txt file is very simple. Even if you don’t have any technical skills, you can add robots.txt to your website.
You can edit your robots.txt files either in the root folder of your website or through the SEO plugins you are using for your WordPress site.
To edit the robots.txt file directly at the root, you need to access your website’s folders on your hosting account. Depending upon the dashboard of your hosting service provider, the way to reach your root files can vary. However, in most cases, you can access them through the File Manager in cPanel. The process flow is generally: Setting/Advanced Setting>>Files>>File Manager>>Home/Root>>Add/Edit/Upload File.
If you’re a beginner and a WordPress user, it’s better to add or edit your robots.txt file through your SEO plugin. The process can be: SEO plugin (e.g., All in One SEO)>>Robots.txt>>Add Rules. It’s easy.
Before adding or editing robots.txt on your website, you need to understand the basic concepts and instructions of robots.txt. They are as follows:
A user agent is a search engine robot or crawler. Different search engines have different user agents, such as Googlebot or Bingbot. Google alone has seven types of robots.
A “*” mark refers to all user agents irrespective of search engine and type. If you want to instruct all bots, you can start the rule with: User-agent: *. Otherwise, you can address a bot separately, such as: User-agent: AdsBot-Google.
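The difference between “*” and a named user agent can be sketched with Python’s built-in urllib.robotparser. The folders /checkout/ and /drafts/ are hypothetical, and the result shows how this particular parser reads the file; individual search engines may treat edge cases differently.

```python
from urllib.robotparser import RobotFileParser

# One group for a named bot, one group for everyone else (hypothetical paths).
ROBOTS_TXT = """\
User-agent: AdsBot-Google
Disallow: /checkout/

User-agent: *
Disallow: /drafts/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The named bot follows only its own group, not the "*" group.
adsbot_checkout = parser.can_fetch("AdsBot-Google", "/checkout/cart.html")
adsbot_drafts = parser.can_fetch("AdsBot-Google", "/drafts/post.html")

# Any other bot falls back to the "*" group.
other_checkout = parser.can_fetch("Bingbot", "/checkout/cart.html")
other_drafts = parser.can_fetch("Bingbot", "/drafts/post.html")

print(adsbot_checkout, adsbot_drafts, other_checkout, other_drafts)
```

Note that a bot with its own group skips the “*” group entirely, so any rule it should also obey has to be repeated in its own group.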
Rules are the instructions to search engine bots. There can be two types of rules – either “Allow” or “Disallow”.
A directory path is a page, file, or folder of your website. You can pair a directory path with a user agent and an instruction of either “Allow” or “Disallow”. For example, you can allow Googlebot to crawl the image folder on your website except for one specific image.
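A rule set matching that description could look like the sketch below, checked with Python’s urllib.robotparser. The folder /images/ and the file private-photo.jpg are assumed names chosen for illustration; the original example is not reproduced here.

```python
from urllib.robotparser import RobotFileParser

# Allow the image folder but block one image inside it (hypothetical names).
# Python's parser applies the first rule that matches, so the more specific
# Disallow line comes first. Google's documentation describes picking the most
# specific (longest) matching rule, which gives the same result here.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /images/private-photo.jpg
Allow: /images/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

logo_allowed = parser.can_fetch("Googlebot", "/images/logo.png")
photo_allowed = parser.can_fetch("Googlebot", "/images/private-photo.jpg")

print("logo.png crawlable:", logo_allowed)
print("private-photo.jpg crawlable:", photo_allowed)
```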
How to Test Robots.txt of Your Website?
You can use Google’s robots.txt Tester to find out whether any of your webpages are blocked by robots.txt.
If any of your important pages are blocked by the robots.txt file, unblock them by deleting the “Disallow” rule that covers them.
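As a rough offline complement to that check, the sketch below lists which of your important pages a given robots.txt would block. It uses Python’s urllib.robotparser rather than Google’s tool, so treat the result as an approximation. The function name blocked_paths, the sample rules, and the page paths are all hypothetical.

```python
from urllib.robotparser import RobotFileParser


def blocked_paths(robots_txt, paths, agent="Googlebot"):
    """Return the subset of `paths` that `agent` is not allowed to fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [path for path in paths if not parser.can_fetch(agent, path)]


# Sample rules and pages, made up for illustration.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /login/
Disallow: /tmp.html
"""
IMPORTANT_PAGES = ["/", "/login/", "/tmp.html", "/about.html"]

blocked = blocked_paths(SAMPLE_ROBOTS_TXT, IMPORTANT_PAGES)
print(blocked)
```

To check a live site instead of a pasted string, RobotFileParser also offers set_url() and read(), which download the file before you call can_fetch().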
Warning: Erroneous use of robots.txt can ruin all your SEO efforts. If you still have any difficulty in understanding the use of robots.txt, ask me in the comments.