Discover why it’s a waste of time to try to thwart image scraping of your site.
Why Scrapers Download Your Images
You work hard to make nice graphics for your site.
That’s especially true for bloggers who want to drive traffic from Pinterest.
There’s a LOT of money to be made from an eye-catching image that gets clicks to a site.
So, scrapers gather up all of your best images and pin them to Pinterest too. But they link to their own website.
I’ve also heard bloggers complain about “ideas” posts that get massive Saves and clicks from Pinterest even though every image in them was scraped from other blogs – some still carrying the original brand’s watermark.
Scrapers don’t care.
They got the click and they made money from the ads they display on their site.
So, there’s BIG money to be had from scraping.
Waste of Time to Take Scrapers Down
Scraper sites like that are super easy and fast to create.
So even if bloggers report the scraper account to Pinterest, or file a copyright (DMCA) takedown notice with the site’s host, the scraper can quickly spin up another site on a new domain and hosting account, create a new Pinterest account, and repeat the process.
What Hotlink Protection Does
Hotlink Protection won’t help solve this problem, and here’s why.
Hotlink Protection stops other sites from embedding your image files directly, and the right-click blockers often paired with it stop a casual visitor from right-clicking an image and saving it to their computer.
But that’s a very labor-intensive way of scraping images from a site anyway.
That’s why scrapers don’t bother manually downloading images one at a time like that.
It’s About the File Link, Not the Image
Scrapers use the direct link to the image file to quickly get every image on every post and page on your site.
Here’s how that works.
On WordPress, when you upload an image, it is stored in your wp-content/uploads folder.
That media is organized into sub-folders by year, and inside each year folder is another sub-folder for the month.
That creates a unique URL for your image that looks something like this:

https://yoursite.com/wp-content/uploads/2024/05/your-image-name.jpg
That file path is easily read in the HTML code for the page/post.
You can easily see that HTML code for yourself by right-clicking in an empty spot on one of your posts.
A pop-up menu will appear where you can click View Page Source.
A new browser tab will open with the HTML document for that post.
You’ll see all of the HTML code, including links to images, that it takes to display that post.
How Scrapers Download Images in Bulk
Scrapers have bots that “read” your HTML code and filter for all of the links that contain wp-content/uploads. They could just as easily filter for file extensions like .jpg or .png instead.
Those links are then compiled into a .csv file and can be displayed in a spreadsheet.
There are tools, even simple browser extensions, that can follow every link in that spreadsheet and download the image.
There are plenty of bot and download tools like this available for free online.
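The bulk-scraping flow described above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library – the HTML snippet and URLs are made up, and a real bot would fetch the page over the network first:

```python
# Minimal sketch of what a scraper bot does: pull image URLs out of a
# page's HTML and compile them into a CSV a download tool can follow.
import csv
import re

# Stand-in for a fetched blog post page (made-up URLs).
html = """
<html><body>
<img src="https://example.com/wp-content/uploads/2024/05/hero.jpg">
<img src="https://example.com/wp-content/uploads/2023/11/pin-graphic.png">
<img src="https://example.com/logo.svg">
</body></html>
"""

# Filter for links under wp-content/uploads (could also filter by
# extension, e.g. .jpg or .png).
pattern = re.compile(r'src="([^"]*wp-content/uploads[^"]*)"')
image_urls = pattern.findall(html)

# Compile the harvested links into a .csv file.
with open("images.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["image_url"])
    for url in image_urls:
        writer.writerow([url])

print(image_urls)  # the two wp-content/uploads links; logo.svg is skipped
```

A download tool then walks that spreadsheet and fetches each file, which is why the whole process takes minutes, not hours.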
So, it costs nothing to get started in the scraper business other than a little time.
In fact, I got a free scraper and spreadsheet download tool and it took me about 30 minutes to scrape BlogAid images – all of them.
And I scraped any images I wanted off a site that had Hotlink Protection.
Yep, it’s that easy.
And it’s even faster and easier with paid scraper tools.
They grab your XML sitemap, scan it for all of your post and page links, then set the scraper to crawl every one of those links and harvest the image files.
It’s an insanely efficient process.
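Here’s a rough sketch of that sitemap step in Python, using only the standard library. The sitemap content and URLs are invented for illustration – a real tool would fetch your live /sitemap.xml:

```python
# Read an XML sitemap and collect every post/page URL to crawl.
import xml.etree.ElementTree as ET

# Stand-in for a fetched sitemap (made-up URLs).
sitemap_xml = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/post-one/</loc></url>
  <url><loc>https://example.com/post-two/</loc></url>
</urlset>"""

# Sitemaps use a standard XML namespace, so register it for the query.
ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
root = ET.fromstring(sitemap_xml)
post_urls = [loc.text for loc in root.findall(".//sm:loc", ns)]

print(post_urls)
# The scraper then requests each of these URLs and harvests the image
# file links from its HTML.
```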
Why Hotlink Protection is a Waste of Time
As you can see, it’s all about gathering the link to the image file.
Hotlink Protection cannot stop direct access to the image file.
Nor would you want to use any method to deter access to the image file.
The image file path is used by legitimate bots and services, such as:
- Pinterest preview pop ups
- Special Pinterest tag directives, like which image to share when the Pin button is clicked
- Google image bots
- Facebook Open Graph (OG) tags to mark your featured image
Scraping Protection You Won’t Use
There are ways to harden your site against being scraped.
- Get a $500/mo service to screen visiting IPs without slowing down legit visitors
- Require a login to see your content
- Require a CAPTCHA to be checked to see your content
- Rate limit IP addresses
It’s highly unlikely that you will do the first method unless you have an enterprise level site with sensitive data.
And it’s highly unlikely that you’ll do the next two methods unless you are running a member site or have somewhat sensitive data that you don’t want scraped (not just images).
But, you can easily do the fourth method by using a CDN service that has extra security, like Cloudflare, and not just CDN delivery service.
(This is key, as not all CDN services offer extra security measures.)
It will auto block known bad bot IP addresses.
If the IP address is not known bad yet, it will notice rapid-fire hits to your site and inspect the IP address.
If it deems the attack malicious or overwhelming, it may block it.
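To make the rate-limiting idea concrete, here’s a toy Python sketch of the per-IP logic an edge service applies. Real services like Cloudflare do this at network scale with far smarter signals – the window and request limit here are arbitrary values for illustration:

```python
# Toy IP rate limiter: allow at most MAX_REQUESTS per IP per window;
# rapid-fire hits beyond that get blocked.
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 10   # arbitrary example values
MAX_REQUESTS = 5
hits = defaultdict(deque)  # ip -> timestamps of recent requests

def allow_request(ip, now=None):
    now = time.time() if now is None else now
    q = hits[ip]
    # Drop timestamps that have aged out of the window.
    while q and now - q[0] > WINDOW_SECONDS:
        q.popleft()
    if len(q) >= MAX_REQUESTS:
        return False  # looks like a rapid-fire bot; block it
    q.append(now)
    return True

# A burst of 6 requests in under a second: the 6th is rejected.
results = [allow_request("203.0.113.7", now=100.0 + i * 0.1) for i in range(6)]
print(results)  # [True, True, True, True, True, False]
```

The weakness the post describes falls straight out of this logic: plenty of legitimate bots also hit rapid-fire, and a blocked scraper just switches to a fresh IP with an empty history.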
You’ll have limited success with this method, though.
There are MANY rapid-fire bots that are legitimate, such as:
- All search engine bots, such as Google, Bing, Yandex, etc.
- SEO competitor bots, such as Ahrefs, SEMRush, etc.
No generic IP inspector will block all rapid-fire attempts.
You’ll also have limited success with this method if you don’t put your site on Cloudflare immediately after you point your DNS to your hosting and set up your site.
That’s because bots, both good and bad, crawl every IP on the web as fast as they can.
Once they get the IP address of your host on your unprotected site, they can then do an end run around any domain-based protection and access your site directly by its IP address.
And you’ll have limited success with this method against domain-based bot hits if you don’t fully configure Cloudflare to block bots coming from countries known for scraping and hacking.
Some hosts also have bad bot/IP protection, but they are no more effective than a CDN service that has extra security.
Manually Blocking IPs
Don’t even bother.
It’s a whack-a-mole game you can’t win.
Bad bot hackers/scrapers mask the IPs they are using anyway. And it’s super fast and easy for them to change to another IP.
Don’t Go Overboard with Protection
You also have to be careful of adding too much IP scrutiny on incoming traffic to your site.
Think of it like a checkpoint on the road.
The guard in the shack has a list of who can gain access to your site and who can’t.
Every visitor has to be checked against that list, even the good ones.
That takes time.
And it can cause a bottleneck and slow down visitor access, even to legitimate human traffic.
This method is especially problematic if you try to deny access to IPs in your .htaccess file at your host.
There are bad bot lists available that you can include in that file. Services like Cloudflare have the same list and process the request faster, and before it ever hits your hosting.
Plus, services like Cloudflare update their bad bot list constantly. You would not want to take on that job to do it manually – like every day.
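For illustration, here’s what a small manual blocklist looks like in an Apache 2.4 .htaccess file. The IPs are placeholder examples from documentation ranges – this is what you would have to maintain by hand, not a recommendation to do so:

```apache
# Example .htaccess fragment (Apache 2.4 syntax) denying specific IPs.
# Every request must be checked against this list at your host, which
# is why long lists here can slow down even legitimate visitors.
<RequireAll>
    Require all granted
    Require not ip 192.0.2.10
    Require not ip 198.51.100.0/24
</RequireAll>
```

An edge service like Cloudflare applies the same kind of list before traffic ever reaches your hosting, and keeps it updated for you.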
You can set hotlink protection on your site if it doesn’t interfere with Pinterest previews.
Just know that it is giving you a false sense of protection against image scraper bots.
The better thing to do is to get real security on your site that includes protection and screening before the bots ever hit your hosting account, much less your site.
There is NO behemoth security plugin that will do that for you. In fact, most of those give you a false sense of security across the board too.
Plus, site security is a combo of things, not just one thing. And it must be done at the hosting level and outside the site to be effective.
Need More Help?
Get a site audit.
Security and performance go hand-in-hand.
On average I find 26 security holes and performance drags that no tester can see.
You’ll get a 20-30 page report and a live chat to go over it in non-geek-speak and you’ll clearly see the issues for yourself too.