Have you received an email from Google Search Console that says you have a “New Index coverage issue detected” lately?
If the warning or error says “Indexed, though blocked by robots.txt,” here’s a little help with what’s going on, ways to fix it, and whether it’s okay to just ignore it.
What is Robots.txt?
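This file lives in the root folder of your site at your host, alongside your WordPress site files.
It is the first file that well-behaved bots request when they come to crawl your site. (The .htaccess file comes into play even earlier, but it is read by your web server, not by the bots.)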
Directives in the robots.txt file can include:
- The path to your XML sitemap
- Disallow directives for specific bots
- Disallow directives for specific file types
- And more
Basically, it’s a traffic cop and gatekeeper for every bot that crawls your site, telling them where things are and what to stay out of.
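To make those directives concrete, here is a minimal sketch that feeds a made-up robots.txt to Python's standard `urllib.robotparser` module and asks what a well-behaved bot would be allowed to fetch. The domain, the paths, and the bot name `ExampleBadBot` are all placeholders of mine, not anything from a real site. Python's parser does not handle wildcard patterns, so the file-type style of Disallow is left out here.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; the domain, paths, and bot name are placeholders.
ROBOTS_TXT = """\
Sitemap: https://example.com/sitemap.xml

User-agent: *
Disallow: /wp-admin/
Disallow: /wp-login.php

User-agent: ExampleBadBot
Disallow: /
"""


def load_rules(text: str) -> RobotFileParser:
    """Parse robots.txt text the way a rule-following bot would."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)

# Where the sitemap is (site_maps() needs Python 3.8 or later).
print(rules.site_maps())

# A normal post is open to Googlebot, the login page is not.
print(rules.can_fetch("Googlebot", "https://example.com/a-blog-post/"))
print(rules.can_fetch("Googlebot", "https://example.com/wp-login.php"))

# The bot singled out by name is told to stay out of everything.
print(rules.can_fetch("ExampleBadBot", "https://example.com/a-blog-post/"))
```

Keep in mind this only shows what a bot that follows the rules would do. Nothing in the file forces a bot to obey.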
Good Bot, Bad Bot
Google is not the only entity sending bots to poke around and see what’s on your site.
Every search engine, like Bing and DuckDuckGo, sends its own bots.
Those are the good bots, and are well behaved, meaning that they will honor the directives in your robots.txt file.
And then there are the bad bots – LOTS of them – to the tune of 10k – 40k a month on many sites (according to what I see in site audits).
These are ill-behaved bots and many will not honor the directives in your robots.txt file.
So, What’s the Point of a Robots.txt File?
That’s a good question!
Many SEOs, including famous and popular ones, say to put nothing in your robots.txt file.
I have yet to meet an SEO who understands site security and performance at the level of someone who specializes in those services.
All they care about is Google.
And that would be fine if Google were the only bot crawling your site.
But, it’s not – FAR from it.
Bots Chewing Up Everything
I don’t mean to be gross, but if you could see your hosting account the way I do, it would look like a roach-infested apartment with bots chewing on everything.
Many of the requests I receive for a site audit are from site owners who have repeatedly had their sites limited by their host for resource overages.
The two main causes of that are:
- Bots running wild
- Resource hog plugins
The two bot fixes for that are:
- Kick the bad bots to the curb before they ever reach your site
- Keep the good bots out of areas they have no business being
What’s In My Robots.txt File
Since 2013 I’ve been loading my robots.txt with directives that keep good bots out of areas where there is nothing for them to crawl or index.
You can see my robots.txt file here anytime. (Keep in mind that I test things all the time, so what you see may not stay that way.)
It also includes a delay so bots have to take a bite, chew, and swallow instead of just chomp, chomp, chomp.
My year-over-year site audits on multiple client sites show me that hosting resources are properly managed with this setup.
That means that the majority of the resources you are paying your host for are reserved for human site visitors.
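The delay mentioned above is normally written as a `Crawl-delay` line. Here is a small sketch of how a rule-following bot would read it, again using Python's `urllib.robotparser`. The 10-second value and the path are placeholders I picked for illustration, not the values from my own file. Not every crawler honors this directive; Googlebot, for one, ignores it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules; the 10-second delay is an illustrative value only.
ROBOTS_WITH_DELAY = """\
User-agent: *
Crawl-delay: 10
Disallow: /wp-admin/
"""

delay_rules = RobotFileParser()
delay_rules.parse(ROBOTS_WITH_DELAY.splitlines())

# Seconds a polite bot should wait between requests, or None if unset.
wait_seconds = delay_rules.crawl_delay("bingbot")
print(wait_seconds)
```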
What’s Google’s Problem?
That’s a good question!
Ever since they launched the new Search Console interface in early 2018, we have been receiving emails with the most insane warnings and errors.
I just received one that complained a “lost password” page was indexed, though blocked by robots.txt.
It’s an action string on a wp-login page, which is indeed blocked in my robots.txt.
I’ve had reports from clients that they are getting similar warnings for comments, which are also blocked, on purpose, in robots.txt.
Why all of a sudden is there a problem, Google?
FYI – The robots.txt file is the default thing blamed whenever Google is blocked from crawling something. You’ll get that message even if there is no robots.txt, or it’s empty, and something else is blocking the crawl.
What’s the Fix?
That’s a good question!
And you won’t like the answer, which is, it depends.
As I see it, at the time of this writing, there are three ways to go with this.
As insane as this sounds, Google has to be allowed to crawl a page to see that it should be noindexed.
So, you have to allow that crawl in robots.txt.
In other words, you have to waste your limited crawl budget on something that you never intended to show up in SERPs.
Like I said – insane.
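The catch described above can be sketched in a few lines: a noindex on a page only counts if the bot is allowed to crawl that page and see it. The helper name `noindex_is_visible`, the example domain, and both sets of rules are hypothetical. The lost password URL mirrors the kind of wp-login action string I mentioned earlier.

```python
from urllib.robotparser import RobotFileParser


def noindex_is_visible(robots_txt: str, url: str, page_has_noindex: bool,
                       bot: str = "Googlebot") -> bool:
    """Return True only if the bot may crawl the URL and the page carries noindex."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(bot, url) and page_has_noindex


# Hypothetical rules: one blocks the login page, one blocks nothing.
BLOCKED = "User-agent: *\nDisallow: /wp-login.php\n"
OPEN = "User-agent: *\nDisallow:\n"
LOST_PASSWORD = "https://example.com/wp-login.php?action=lostpassword"

# Blocked in robots.txt: the bot never gets far enough to read the noindex.
print(noindex_is_visible(BLOCKED, LOST_PASSWORD, page_has_noindex=True))

# Crawl allowed: the noindex can be seen and acted on.
print(noindex_is_visible(OPEN, LOST_PASSWORD, page_has_noindex=True))
```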
First, if you have Cloudflare and other measures in place to kick as many bad bots to the curb as possible before they ever hit your hosting account and site, then perhaps you can open up your robots.txt like so many SEOs recommend. Then monitor your bot hits via AWStats (in the cPanel of your hosting account).
AWStats tracks all hits to your site. Google Analytics only tracks humans – well, mostly. Some bots are good at cloaking as human.
But the point is, you’ll see a wild difference in those stats with AWStats being WAY higher. That’s normal.
That will get rid of this error message from Google Search Console.
Second, you can modify your robots.txt to allow crawling of whatever Search Console is complaining about.
And again, monitor your stats.
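If you want a rough feel for what monitoring bot hits involves, here is a sketch that counts bot-looking requests in raw access log lines. It is a stand-in, not how AWStats works inside. The log lines are made up, they assume the common "combined" log format, and the `BOT_HINTS` keyword list is a crude assumption of mine that will miss any bot cloaking as human.

```python
import re
from collections import Counter

# Matches the quoted user-agent field at the end of a combined-format log line.
UA_PATTERN = re.compile(r'"([^"]*)"\s*$')

# Crude, assumed hints that a user agent is a bot; real tools use far longer lists.
BOT_HINTS = ("bot", "crawl", "spider")


def count_hits(log_lines):
    """Tally hits as 'bot' or 'other' based on the user-agent string."""
    counts = Counter()
    for line in log_lines:
        match = UA_PATTERN.search(line)
        if not match:
            continue  # skip lines that do not end in a quoted user agent
        agent = match.group(1).lower()
        kind = "bot" if any(hint in agent for hint in BOT_HINTS) else "other"
        counts[kind] += 1
    return counts


# Made-up sample lines using documentation-only IP addresses.
SAMPLE_LOG = [
    '203.0.113.5 - - [01/Jan/2019:00:00:01 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.9 - - [01/Jan/2019:00:00:02 +0000] "GET /wp-login.php HTTP/1.1" 200 256 "-" "ExampleBadBot/1.0"',
    '198.51.100.7 - - [01/Jan/2019:00:00:03 +0000] "GET /a-blog-post/ HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"',
]

hit_counts = count_hits(SAMPLE_LOG)
print(hit_counts["bot"], hit_counts["other"])
```

Comparing counts like these before and after a robots.txt change is the kind of check I mean by monitoring your stats.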
Third, you can leave your robots.txt as it is and check each warning to see if it’s something you’re okay with being blocked.
Don’t just put all warnings on ignore!
See what it is first!!
FYI – Marking it as fixed in Search Console is only a temporary solution. It will return.
Test, Test, Test
I’m blessed to have so many site audit clients with enough daily traffic to let us test different fixes for different warnings/errors and monitor the effects quickly.
Be sure to follow me for the latest test results.