WordPress Robots.txt – What It Is and How to Use It

By JSC0d3 | February 24, 2019

To ensure that your site ranks highly in Search Engine Result Pages (SERPs), you’ll need to make it easy for search engine ‘bots’ to explore its most important pages. Having a well-structured robots.txt file in place will help direct those bots to the pages you want them to index (and avoid the rest).

Ever heard the term robots.txt and wondered how it applies to your website? Most websites have a robots.txt file, but that doesn’t mean most webmasters understand it. In this post, we hope to change that by offering a deep dive into the WordPress robots.txt file, as well as how it can control and limit access to your site.

What Is a WordPress Robots.txt?

Before we can talk about the WordPress robots.txt, it’s important to define what a “robot” is in this case. Robots are any type of “bot” that visits websites on the Internet. The most common example is search engine crawlers. These bots “crawl” around the web to help search engines like Google index and rank the billions of pages on the Internet.

So, bots are, in general, a good thing for the Internet…or at least a necessary thing. But that doesn’t necessarily mean that you, or other webmasters, want bots running around unfettered. The desire to control how web robots interact with websites led to the creation of the robots exclusion standard in the mid-1990s. Robots.txt is the practical implementation of that standard – it allows you to control how participating bots interact with your site. You can block bots entirely, restrict their access to certain areas of your site, and more.

What the robots.txt file does is provide a set of instructions for search engine bots. It tells them: “Hey, you can look here, but don’t go into those rooms over there!” This file can be as detailed as you want, and it’s rather easy to create, even if you’re not a technical wizard.

In practice, search engines will still crawl your website even if you don’t have a robots.txt file set up. However, not creating one is inefficient. Without this file, you’re leaving it up to the bots to index all your content, and they’re so thorough that they might end up indexing parts of your website you’d rather keep out of search results.

More importantly, without a robots.txt file, you’ll have a lot of bots crawling all over your website. This can negatively impact its performance. Even if the hit is negligible, page speed is something that should always be at the top of your priorities list. After all, there are few things people hate as much as slow websites (and that includes us!).

Why Should You Care About Your Robots.txt File?

For most webmasters, the benefits of a well-structured robots.txt file boil down to two categories:

  • Optimizing search engines’ crawl resources by telling them not to waste time on pages you don’t want to be indexed. This helps ensure that search engines focus on crawling the pages that you care about the most.
  • Optimizing your server’s resource usage by blocking bots that are wasting resources.

Where the WordPress robots.txt File Is Located

When you create a WordPress website, WordPress automatically sets up a virtual robots.txt file served from your site’s root. For example, if your site is located at yourfakewebsite.com, you should be able to visit yourfakewebsite.com/robots.txt and see a file like this come up:

User-agent: *
Allow: /wp-content/uploads/
Disallow: /wp-content/plugins/
Disallow: /wp-admin/
Sitemap: https://yourfakewebsite.com/sitemap_index.xml

How To Create And Edit Your WordPress Robots.txt File

By default, WordPress automatically creates a virtual robots.txt file for your site. So even if you don’t lift a finger, your site should already have the default robots.txt file. You can test if this is the case by appending “/robots.txt” to the end of your domain name. For example, “https://yourfakewebsite.com/robots.txt” brings up the robots.txt file.

Because this file is virtual, though, you can’t edit it. If you want to edit your robots.txt file, you’ll need to actually create a physical file on your server that you can manipulate as needed. Here are three simple ways to do that…

What To Put In Your Robots.txt File

Ok, now you have a physical robots.txt file on your server that you can edit as needed. But what do you actually do with that file? Well, as you learned in the first section, robots.txt lets you control how robots interact with your site. You do that with two core commands:

  • User-agent – this lets you target specific bots. User agents are what bots use to identify themselves. With them, you could, for example, create a rule that applies to Bing, but not to Google.
  • Disallow – this lets you tell robots not to access certain areas of your site.

There’s also an Allow command that you’ll use in niche situations. By default, everything on your site is marked with Allow, so it’s not necessary to use the Allow command in 99% of situations. But it does come in handy where you want to Disallow access to a folder and its child folders but Allow access to one specific child folder.
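
To make that concrete, here’s a minimal sketch using hypothetical folder names – blocking a /private/ folder and all of its children while still allowing one specific subfolder:

User-agent: *
Disallow: /private/
Allow: /private/public-reports/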

You add rules by first specifying which User-agent the rule should apply to and then listing out what rules to apply using Disallow and Allow. There are also some other commands like Crawl-delay and Sitemap, but these are either:

  • Ignored by most major crawlers, or interpreted in vastly different ways (in the case of crawl delay)
  • Made redundant by tools like Google Search Console (for sitemaps)
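
If you do decide to include them anyway, the syntax is straightforward. In this sketch, Crawl-delay asks crawlers that honor it (such as Bing) to wait ten seconds between requests, and the Sitemap line reuses the hypothetical URL from earlier:

User-agent: *
Crawl-delay: 10
Sitemap: https://yourfakewebsite.com/sitemap_index.xml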

Let’s go through some specific use cases to show you how this all comes together.

How To Use Robots.txt To Block Access To Your Entire Site

Let’s say you want to block all crawler access to your site. This is unlikely to occur on a live site, but it does come in handy for a development site. To do that, you would add this code to your WordPress robots.txt file:

User-agent: *
Disallow: /

What’s going on in that code?

The asterisk (*) next to User-agent means “all user agents” – it’s a wildcard, so the rule applies to every single bot. The slash (/) next to Disallow says you want to disallow access to all URLs that start with “yourfakewebsite.com/” (which is every single page on your site).

How To Use Robots.txt To Block A Single Bot From Accessing Your Site

Let’s change things up. In this example, we’ll pretend that you don’t like the fact that Bing crawls your pages. You’re Team Google all the way and don’t even want Bing to look at your site. To block only Bing from crawling your site, you would replace the wildcard asterisk (*) with Bingbot:

User-agent: Bingbot
Disallow: /

Essentially, the above code says to apply the Disallow rule only to bots with the User-agent “Bingbot”. Now, you’re unlikely to want to block access to Bing – but this scenario does come in handy if there’s a specific bot that you don’t want to access your site. Several online resources maintain listings of most services’ known user agent names.

How To Use Robots.txt To Block Access To A Specific Folder Or File

For this example, let’s say that you only want to block access to a specific file or folder (and all of that folder’s subfolders). To make this apply to WordPress, let’s say you want to block:

  • The entire wp-admin folder
  • wp-login.php

You could use the following commands:

User-agent: *
Disallow: /wp-admin/
Disallow: /wp-login.php
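
As a side note, major crawlers such as Google and Bing also support simple wildcards in Disallow paths, which lets you block by pattern rather than by location. Here’s a sketch that blocks all PDF files sitewide – the * matches any sequence of characters, and the $ anchors the match to the end of the URL:

User-agent: *
Disallow: /*.pdf$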

How To Use Robots.txt To Allow Access To A Specific File In A Disallowed Folder

Ok, now let’s say that you want to block an entire folder, but you still want to allow access to a specific file inside that folder. This is where the Allow command comes in handy. And it’s actually very applicable to WordPress. In fact, the WordPress virtual robots.txt file illustrates this example perfectly:

User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php

This snippet blocks access to the entire /wp-admin/ folder except for the /wp-admin/admin-ajax.php file. When Allow and Disallow rules conflict like this, Google follows the most specific (longest) matching rule, so the Allow wins for that one file.

How To Use Robots.txt To Stop Bots From Crawling WordPress Search Results

One WordPress-specific tweak you might want to make is to stop search crawlers from crawling your search results pages. By default, WordPress uses the query parameter “?s=”. So to block access, all you need to do is add the following rule:

User-agent: *
Disallow: /?s=
Disallow: /search/

This is also an effective way to stop soft 404 errors if you’re getting them.
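
One caveat: Disallow: /?s= only matches URLs that begin with /?s=. If search queries can be appended to other paths on your site, a wildcard version (again relying on the wildcard support in major crawlers) catches those too:

User-agent: *
Disallow: /*?s=
Disallow: /search/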

How To Create Different Rules For Different Bots In Robots.txt

Up until now, all the examples have dealt with one rule at a time. But what if you want to apply different rules to different bots? You simply need to add each set of rules under the User-agent declaration for each bot. For example, if you want to make one rule that applies to all bots and another rule that applies to just Bingbot, you could do it like this:

User-agent: *
Disallow: /wp-admin/

User-agent: Bingbot
Allow: /

In this example, all bots will be blocked from accessing /wp-admin/, but Bingbot will be allowed to crawl your entire site. That’s because a bot follows the most specific user agent group that matches it – here, the Bingbot group overrides the wildcard group.

Testing Your Robots.txt File

You can test your WordPress robots.txt file in Google Search Console to ensure it’s set up correctly. Simply click into your site, and under “Crawl” click on “robots.txt Tester.” You can then submit any URL, including your homepage. You should see a green Allowed if everything is crawlable. You can also test URLs you have blocked to ensure they are, in fact, disallowed.

You can also use Google Search Console’s “Fetch as Google” tool to see whether your content can be accessed despite your robots.txt rules.

The steps are simple: log in to Google Search Console, select your site, go to Diagnostics, and choose Fetch as Google.

Enter the URLs of your posts and check whether there are any issues accessing them.

You can also check for crawl errors caused by your robots.txt file under the Crawl Errors section of Search Console.

Under Crawl > Crawl Errors, select “Restricted by robots.txt” and you will see which links have been denied by the robots.txt file.

Be careful! Replytocom links may be among those rejected by robots.txt, along with other links that shouldn’t end up in Google. The robots.txt file is an essential element of SEO, and you can avoid many post-duplication issues by keeping yours up to date.

Wrap Up

As we wrap up our robots.txt guide, we want to remind you one more time that using a Disallow command in your robots.txt file is not the same as using a noindex tag. Robots.txt blocks crawling, but not necessarily indexing. You can use it to add specific rules to shape how search engines and other bots interact with your site, but it will not explicitly control whether your content is indexed or not.

A common myth among SEO experts is that blocking WordPress category, tag, and archive pages will improve crawl rate and result in faster indexing and higher rankings. Don’t fall for it – the benefits are unproven, and blocking those pages can even stop them from passing link value.

For most casual WordPress users, there’s not an urgent need to modify the default virtual robots.txt file. But if you’re having issues with a specific bot, or want to change how search engines interact with a certain plugin or theme that you’re using, you might want to add your own rules.

We hope you enjoyed this guide and be sure to leave a comment if you have any further questions about using your WordPress robots.txt file.

About JSC0d3

JSC0d3 is an entrepreneur, online marketer, and employee of an IT company. When he’s not building websites, creating content, or helping customers improve their online businesses, he spends time with his wife and two beautiful children. Although he still feels new to WordPress, he enjoys sharing what he has learned with all of you! If you want to get in touch with him, you can do so through this website.
