Have you received an email from Google Search Console that says you have a “New Index coverage issue detected” lately?
If the warning or error states, “Indexed, though blocked by robots.txt,” then here’s a little help with what’s going on, ways to fix it, and whether it’s okay to just ignore it.
What is Robots.txt?
This file resides on your host, in the root folder of your site, right alongside your WordPress files.
It is the second file that bots read when they come to crawl your site. (The first file is .htaccess.)
Directives in the robots.txt file can include:
- The path to your XML sitemap
- Disallow directives for specific bots
- Disallow directives for specific file types
- And more
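Here’s a minimal sketch of what those directives look like in a robots.txt file (the sitemap URL, bot name, and file type below are placeholders, not recommendations for your site):

```
# Point bots to your XML sitemap (placeholder URL)
Sitemap: https://example.com/sitemap_index.xml

# Keep one specific bot out of the whole site (placeholder bot name)
User-agent: ExampleBot
Disallow: /

# Keep all bots away from a specific file type
User-agent: *
Disallow: /*.pdf$
```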
Basically, it’s a traffic cop and gatekeeper for every bot that crawls your site, telling them where things are and what to stay out of.
Good Bot, Bad Bot
Google is not the only entity sending bots to poke around and see what’s on your site.
Every search engine, like Bing and DuckDuckGo, sends its own bots.
Those are the good bots, and they are well-behaved, meaning that they will honor the directives in your robots.txt file.
And then there are the bad bots – LOTS of them – to the tune of 10k – 40k a month on many sites (according to what I see in site audits).
These are ill-behaved bots and many will not honor the directives in your robots.txt file.
So, What’s the Point of a Robots.txt File?
That’s a good question!
Many SEOs, including famous and popular ones, say to put nothing in your robots.txt file.
I have yet to meet an SEO who knows site security and performance at the level of someone who specializes in those services.
All they care about is Google.
And that would be fine if Google were the only bot crawling your site.
But, it’s not – FAR from it.
Bots Chewing Up Everything
I don’t mean to be gross, but if you could see your hosting account the way I do, it would look like a roach infested apartment with bots chewing on everything.
Many of the requests I receive for a site audit are from site owners who have repeatedly had their sites limited by their host for resource overages.
The two main causes of that are:
- Bots running wild
- Resource hog plugins
The two bot fixes for that are:
- Kick the bad bots to the curb before they ever reach your site
- Keep the good bots out of areas they have no business being
READ: What Celebrity Homes and Secure Sites Have in Common
READ: How CloudFlare makes Your Site Faster and Safer
What’s In My Robots.txt File
Since 2013 I’ve been loading my robots.txt with directives that keep good bots out of areas where there is nothing for them to crawl or index.
You can see my robots.txt file here anytime. (Keep in mind that I test things all the time, so what you see may not stay that way.)
It also includes a delay so bots have to take a bite, chew, and swallow instead of just chomp, chomp, chomp.
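As a rough illustration only (not a copy of my actual file; the paths and delay value are generic examples), directives like these keep good bots out of areas with nothing worth indexing and ask them to pause between requests:

```
User-agent: *
# Example WordPress areas with nothing for search engines to index
Disallow: /wp-admin/
Disallow: /wp-login.php
Disallow: /trackback/
# Ask well-behaved bots to wait between requests (Googlebot ignores this directive)
Crawl-delay: 10
```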
My year-over-year site audits on multiple client sites show me that hosting resources are properly managed with this setup.
That means that the majority of the resources you are paying your host for are reserved for human site visitors.
What’s Google’s Problem?
That’s a good question!
Ever since they launched the new Search Console interface in early 2018, we have been receiving emails with the most insane warnings and errors.
I just received one that complained a “lost password” page was indexed, but blocked by robots.txt.
It’s an action string on a wp-login page, which is indeed blocked in my robots.txt.
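For reference, that URL follows the standard WordPress lost-password pattern, and one Disallow line covers it (the domain below is a placeholder):

```
# The kind of URL GSC flagged (standard WordPress lost-password link)
https://example.com/wp-login.php?action=lostpassword

# The robots.txt line that blocks it
Disallow: /wp-login.php
```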
Really, Google?
I’ve had reports from clients that they are getting similar warnings for comments, which are also blocked, on purpose, in robots.txt.
Why all of a sudden is there a problem, Google?
FYI – The robots.txt file is the default thing blamed whenever Google is blocked from crawling something. You’ll get that message even if there is no robots.txt, or it’s empty, and something else is blocking the crawl.
What’s the Fix?
That’s a good question!
And you won’t like the answer, which is, it depends.
As I see it, at the time of this writing, there are three ways to go with this.
Way 1
As insane as this sounds, Google has to be allowed to crawl a page to see that it should be noindexed.
So, you have to allow that crawl in robots.txt.
In other words, you have to waste your limited crawl budget on something that you never intended to show up in SERPs.
Like I said – insane.
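To make it concrete, here’s a hedged sketch of that approach for a hypothetical page you never want in SERPs: remove its Disallow (or add an explicit Allow) so Google can reach it, and let a noindex tag on the page itself do the blocking.

```
# robots.txt - let bots reach the page so they can see the noindex
User-agent: *
Allow: /lost-password/
```

And on the page itself (or sent as an X-Robots-Tag HTTP header):

```
<meta name="robots" content="noindex">
```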
If you have Cloudflare and other measures to kick as many bad bots to the curb as possible before they ever hit your hosting account and site, then perhaps you can open up your robots.txt like so many SEOs recommend and then monitor your bot hits via AWStats (in the cPanel of your hosting account).
AWStats tracks all hits to your site. Google Analytics only tracks humans – well, mostly. Some bots are good at cloaking as human.
But the point is, you’ll see a wild difference in those stats with AWStats being WAY higher. That’s normal.
That will get rid of this error message from Google Search Console.
Way 2
Modify your robots.txt to allow crawling of whatever Search Console is complaining about.
And again, monitor your stats.
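For example, if the warning is about a path under /comments/ and you decide you’re fine with Google crawling it, you can either delete the Disallow entirely or carve out an exception for just that path (the paths below are hypothetical):

```
User-agent: *
Disallow: /comments/
# The more specific Allow wins for Google, so just this path gets crawled
Allow: /comments/feed/
```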
Way 3
You can leave your robots.txt as it is and check the warning to see if it’s something you’re okay with being blocked.
Don’t just put all warnings on ignore!
See what it is first!!
FYI – Marking it as fixed in Search Console is only a temporary solution. It will return.
Test, Test, Test
I’m blessed to have so many site audit clients with enough daily traffic to allow us to test different fixes for different warnings/errors and quickly monitor the effects.
Be sure to follow me for the latest test results.
Tips Tuesday is my weekly roundup post/podcast/livestream of top site news and tips. It’s a 100% non-optional read for DIY site owners now, if you have any hope of calmly keeping ahead of all the site changes coming at us from everywhere.
A BlogAid News subscription will put all of my posts in your inbox, so you never miss anything, and get you exclusive news and discounts via my newsletter (I only send those about 1-2 times a year, unless there is a site emergency you have to know about, like a major security issue.)
BlogAid on Facebook is where I stream live with breaking news.
BlogAid on YouTube for replays and video tutorials.
I’m everywhere else you want to follow too! Join me where you like to hang out.
Thanks MaAnna. I don’t know what I’d do without you–panic probably. I’ve shared on social media.
I can’t believe that Google is wasting our time with such nonsense of alerting to the fact that we have chosen to block bots from places they have no business being. This is insane.
Thanks for getting to the bottom of this so quickly. I have just been verifying it is fixed and ignoring the messages. And it isn’t hurting me at all. Your site audits have been the best investment I can make. Thanks for keeping all of us ahead of the game.
Just discovered that marking as fixed will only be a temporary solution, unfortunately. Will be updating the post.
Dang. I also found out that when I click on the message in the old console about seeing the errors, it takes me to the new console. I can’t get anything on the errors from the old console.
Thank you for putting this post out MaAnna! I just received this email from search console while on vaca. ugh. They want one of my admin pages…ha. Your post has helped me understand this so much more! Thank you!
Keeping on top of the news and changes is what it’s all about!! Keeps us all calm, and that’s a good thing.
Doesn’t make any sense! But at least I understand it better now : )
Great article! I just fixed my issue. I just saw 42,000 warnings on my search console and went to investigate. It was an error message “Indexed, though blocked by robots.txt.” Google started indexing my contact forms and as they are auto generated that resulted in over 42K pages with thin content, all indexed – perfect for a Panda penalty perhaps?
I have blocked that file for many, many years using robots.txt, but now Google not only dislikes that, it goes ahead and indexes your page anyway. It was showing all variants of a single URL, resulting in thin content.
I unblocked the undesired pages in robots.txt and added a “noindex” tag to the HTML header of the unwanted page itself:
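For anyone wondering, that’s the standard robots meta tag, which goes in the page’s <head>:

```
<meta name="robots" content="noindex">
```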
Thank you!
Thanks for sharing this!! Google used to be a well-behaved bot, but it is now blowing through all directives in robots.txt. And since the ill-behaved bots don’t follow the directives either, there’s not much point in having it do disallows anymore.
Hi,
Can you please tell me how to modify the robots.txt file? I’m using a Blogger account.
Hi Ajay, I only cover self-hosted WordPress sites, so can’t help with Blogger, and there is no robots.txt file on there anyway. You won’t find most of what I cover on that platform.
Hi Sir,
Thanks for your reply. I’ve been searching but failing. Can you please suggest any site for getting this issue solved?
Thanks in advance.
Best Regards,
AJAY K
My DIY SEO course covers how to fix this.
Thank you for this explanation! I got this warning in my email this morning, so of course I immediately Googled it on my phone. When I looked at the affected page a few minutes later in Search Console, I found that it was a specific post’s Twitter share option. Definitely not something I’m going to stress over! Everyone else just tells you how to “fix” the warning without explaining what’s really going on.
Yeah, knowing what’s going on is empowering, right?
Glad it’s a minor issue. But you might want to fix it, as they will keep bugging you about it.
Interesting to read. I got this same message too and was wondering what it is; now I have an idea of it. Thanks for this.
Seeing the same message, but not on a WordPress site. In fact, this site is not using a CMS at all. It’s an old-fashioned, PHP-built website where we update the pages with a text editor and push the updates online via FTP.
So my question is how do I use Google’s search console to determine what the issue is? I’ve spent some time navigating it and have gotten nowhere. You’d think if there was a serious problem they would put it front and center on the dashboard when you click the link in the email.
I cover the new Search Console, and Coverage Report examples of issues, in the DIY SEO course.
Even though the GSC stuff is pretty much standalone, the course is geared toward WP sites using Yoast SEO.
So I’m not thinking the course is a good fit; doing a live session, or just getting a course on Coverage Report issues, would be the way to go.
Without looking at your exact links and issues, I can’t answer it generically.
I just got this email for my blogspot blog. I recently changed my comment settings to require word verification and comment moderation. (Anonymous spammers) Would that be the reason? I came across your website when I was googling to see if this was a legitimate email. I didn’t want to click the link in the email if it wasn’t.
I only work with WordPress sites, so can’t say what changes at Blogspot may have triggered the Google warning. You can skip the email link and just go straight to your Google Search Console account to see the Coverage Report. That will have any issues listed.
Hey, MaAnna! Nice post. Just wanted to make a slight correction. Bots do not read your .htaccess file – and actually it shouldn’t be visible to the public. The .htaccess provides instructions to your Apache web server. It’s basically a way for you to make per-website changes to both the settings of Apache and even PHP. Some examples of its use are to tweak your caching settings or allow users to upload much larger files.
Thanks for the correction, Doug. I’m well aware of the functions of .htaccess; I teach a class on it. While it’s technically correct that bots don’t “read” the file, they are most definitely governed by it, as is all traffic to the site. If you want to lock all bots out, .htaccess is the place for that directive.
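As a rough sketch (the bot names below are made up, so don’t copy this as-is), an .htaccess rule like this turns away matching user agents with a 403 before WordPress ever loads:

```
# .htaccess - block requests from specific user agents (example names only)
<IfModule mod_rewrite.c>
  RewriteEngine On
  RewriteCond %{HTTP_USER_AGENT} (BadBot|EvilScraper|NastyCrawler) [NC]
  RewriteRule .* - [F,L]
</IfModule>
```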
I have frequently gotten this message from Search Console: “Warning: Indexed, though blocked by robots.txt,” but I haven’t taken any action. Should I overlook it or take action in this regard? Please advise.
The main thing to keep in mind is that GSC is trying to help you get your site crawled properly. Any notification that it sends you needs to be checked to determine what the problem is so you can decide if you need to make changes on your site.
I have a Blogspot blog and this makes no sense to me. The point of Blogspot is that I shouldn’t need to worry about nonsense like this. Every time I get one of these, I Google it yet again, then check that I’m “visible to search engines” (yes), that custom robots.txt is disabled, and then that’s as far as I get. If they want me to actually take an action, they need to tell me what it is.
I think they’re using a default robots.txt that is poorly configured for their own search engine / crawlers. That’s totally not my problem to fix. Good thing I’m one of the rare unicorns that doesn’t monetize.
One of the bizarre effects of this is that Bing is nicer to my site than Google. I get actual search keyword lists from Bing and nothing from Google. Whatever. Yet another thing I need to send instantly to my spam folder.
Google has a whole lot of stuff that is “do as we say not as we do” for its own products.