Bad Behavior, CloudFlare and Google Bot

This blog has several layers of protection from the evils of the outside world, but those layers don’t always get along. One problem that I had is pretty common among CloudFlare users, and the documentation provided by the relevant players has a hole in it – a key nugget of information that can make all the difference.

The nugget follows in due course.

My first line of defense from ne’er-do-wells and miscreants is CloudFlare. They stop most of the bad guys before they even reach my site. Still, for some sorts of attacks, when there’s doubt it’s better to let the bad guy through. It may turn out to be a good guy.

A program called Bad Behavior is my next line of defense. It sits on my server and quickly spots liars and weasels. For dangerous-looking attacks, that’s the limit. But, when there’s doubt and the site itself is not at risk, Bad Behavior will let the attack through.

At this point, ‘attack’ means ‘comment spam’. Everything else is stopped before it reaches this stage. Most of the comment spam has been stopped as well, but some has been given the benefit of the doubt. That’s where Akismet comes in. This layer catches the rest of the comment spam, and it can be much more aggressive because it doesn’t actually delete the spam; it puts it into a bin for later review. That way, an alert blog admin can rescue any legitimate comments that get caught.
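To make the layering concrete, here is a toy model of the idea. It is a minimal sketch, not how CloudFlare, Bad Behavior or Akismet actually work: the three filter functions and their trigger conditions (`known_attacker`, `headers_look_forged`, the "cheap pills" test) are hypothetical stand-ins. What it does show is the shape of the pipeline. Each layer either passes a request along or stops it, and the last layer quarantines instead of blocking, which is what lets it be aggressive.

```python
from enum import Enum
from typing import Callable, Iterable


class Verdict(Enum):
    PASS = "pass"              # hand the request on to the next layer
    BLOCK = "block"            # reject outright
    QUARANTINE = "quarantine"  # keep it, but hold it for human review


Layer = Callable[[dict], Verdict]


def run_layers(request: dict, layers: Iterable[Layer]) -> Verdict:
    """Apply each layer in order; the first verdict other than PASS wins."""
    for layer in layers:
        verdict = layer(request)
        if verdict is not Verdict.PASS:
            return verdict
    return Verdict.PASS


# Hypothetical stand-ins; the real products use far richer signals.
def edge_filter(req: dict) -> Verdict:     # plays the role of CloudFlare
    return Verdict.BLOCK if req.get("known_attacker") else Verdict.PASS


def server_filter(req: dict) -> Verdict:   # plays the role of Bad Behavior
    return Verdict.BLOCK if req.get("headers_look_forged") else Verdict.PASS


def content_filter(req: dict) -> Verdict:  # plays the role of Akismet: never deletes
    return Verdict.QUARANTINE if "cheap pills" in req.get("comment", "") else Verdict.PASS


LAYERS = [edge_filter, server_filter, content_filter]
```

With this setup, `run_layers({"comment": "Nice post!"}, LAYERS)` passes, while a comment mentioning cheap pills ends up in quarantine rather than being thrown away.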

It works pretty well. Three spams actually got through all the layers last week, the first time any have gotten through in quite some time. Somewhere, a spammer popped a bottle of bubbly.

So comment spam is pretty well thwarted. Hooray! Unfortunately, for a while I had a pretty big problem. Search engine robots were being denied. I fell off Google and Yahoo! and all the rest, and traffic to this site dwindled.

Note: according to this article, Bad Behavior has been updated to avoid the following problem. Yay! You should still install the CloudFlare plugin and the Apache module if you are able.

Here’s what was going on:

  1. Googlebot said ‘hey, muddledramblings.com, show me page x’.
  2. The request must get past CloudFlare. No problem. They see it’s the real Google bot and pass the request on to my server.
  3. Bad Behavior is next. They look at the incoming message and see something that claims to be a Google bot, but it’s not coming from Google. It’s coming through CloudFlare. Bad Behavior says, “You are a lying sack of dingo dung and a false Google bot. You are obviously evil and you may not pass.” Google is shut out. The other legitimate robots are cut off as well.
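Step 3 is easy to reproduce in a few lines. The sketch below is a simplified check in the spirit of Bad Behavior's, not its actual code: a request whose user agent says "Googlebot" but whose connecting address is not a Google address is treated as an impostor. The address ranges are assumptions used for illustration only: 66.249.64.0/19 has long been associated with Googlebot, and 173.245.48.0/20 is one of the edge ranges CloudFlare publishes. Neither list is complete or current.

```python
import ipaddress

# Illustrative ranges only (assumptions, not authoritative or complete lists).
GOOGLEBOT_RANGES = [ipaddress.ip_network("66.249.64.0/19")]
CLOUDFLARE_RANGES = [ipaddress.ip_network("173.245.48.0/20")]


def in_ranges(ip: str, ranges) -> bool:
    """True if the address falls inside any of the given networks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in ranges)


def claims_googlebot(user_agent: str) -> bool:
    return "Googlebot" in user_agent


def looks_forged(user_agent: str, remote_addr: str) -> bool:
    """Flag a self-declared Googlebot whose connecting address isn't Google's.

    remote_addr is whatever the server sees as the other end of the
    connection. Behind CloudFlare, that is a CloudFlare address."""
    return claims_googlebot(user_agent) and not in_ranges(remote_addr, GOOGLEBOT_RANGES)


UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
```

Called with Google's own address (`looks_forged(UA, "66.249.66.1")`) the check passes. Called with the CloudFlare address the server actually sees (`looks_forged(UA, "173.245.48.10")`) it flags the very same request as fake, which is exactly the lockout described above.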

This problem is pretty easy to fix, but not quite as easy as WordPress admins would like to hope. CloudFlare has code that you can install on your server that will straighten the whole problem out. Basically, it tweaks incoming messages so that the visitor’s original IP address appears instead of CloudFlare’s. This bit of fix-it code is available as a WordPress plugin, so you can install the plugin and rest easy.
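Here is roughly what that fix-it code has to do. CloudFlare passes the original visitor address along in the `CF-Connecting-IP` request header, so the fix amounts to "if this connection came from CloudFlare, believe that header." This is a minimal sketch, not CloudFlare's plugin or module. The single trusted range is an assumption for illustration (a real setup should load CloudFlare's full published list), and the `headers` dict is assumed to use the header's canonical capitalization.

```python
import ipaddress

# Assumption: one of CloudFlare's published edge ranges, for illustration only.
TRUSTED_CLOUDFLARE_RANGES = [ipaddress.ip_network("173.245.48.0/20")]


def restore_client_ip(remote_addr: str, headers: dict) -> str:
    """Return the visitor's original IP address.

    Only honour CF-Connecting-IP when the connection really comes from
    CloudFlare; otherwise anyone could send the header and pick their own IP."""
    peer = ipaddress.ip_address(remote_addr)
    if any(peer in net for net in TRUSTED_CLOUDFLARE_RANGES):
        forwarded = headers.get("CF-Connecting-IP")  # assumes canonical casing
        if forwarded:
            try:
                return str(ipaddress.ip_address(forwarded.strip()))
            except ValueError:
                pass  # malformed header: fall back to the connecting address
    return remote_addr
```

For a request arriving from `173.245.48.10` with `CF-Connecting-IP: 66.249.66.1`, this returns `66.249.66.1`, so anything that runs afterward sees Google's address. The range check matters: without it, any visitor could claim to be Google simply by sending the header.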

But that’s the thing that tripped me up, and it’s not explained in the docs: when you’re running Bad Behavior, the WordPress plugin is not enough.

The catch is that Bad Behavior does its magic before the CloudFlare plugin can do its magic. So, even with the CloudFlare plugin firmly installed, Bad Behavior will reject Google bot and all his pals.
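The whole problem comes down to ordering, which a toy request pipeline shows nicely. This is a sketch under stated assumptions: `restore_ip` and `reject_fake_googlebot` are hypothetical stand-ins for the CloudFlare fix-up and Bad Behavior's check, and the two address ranges are illustrative rather than complete lists. The same two steps, run in a different order, give opposite answers for the same Googlebot request.

```python
import ipaddress

# Illustrative ranges (assumptions, not complete or current lists).
GOOGLE = ipaddress.ip_network("66.249.64.0/19")
CLOUDFLARE = ipaddress.ip_network("173.245.48.0/20")


def restore_ip(req: dict) -> dict:
    """Stand-in for the CloudFlare fix-up: swap in the CF-Connecting-IP address."""
    from_cloudflare = ipaddress.ip_address(req["remote_addr"]) in CLOUDFLARE
    if from_cloudflare and "CF-Connecting-IP" in req["headers"]:
        req = {**req, "remote_addr": req["headers"]["CF-Connecting-IP"]}
    return req


def reject_fake_googlebot(req: dict) -> dict:
    """Stand-in for Bad Behavior: a 'Googlebot' not coming from Google is refused."""
    says_google = "Googlebot" in req["headers"].get("User-Agent", "")
    if says_google and ipaddress.ip_address(req["remote_addr"]) not in GOOGLE:
        raise PermissionError("403: fake Googlebot")
    return req


def handle(req: dict, steps) -> str:
    """Run each processing step in order, as the server would."""
    try:
        for step in steps:
            req = step(req)
    except PermissionError as err:
        return str(err)
    return "200: page served"


# Googlebot's request as it reaches the server, relayed by CloudFlare.
REQUEST = {
    "remote_addr": "173.245.48.10",
    "headers": {"User-Agent": "Googlebot/2.1", "CF-Connecting-IP": "66.249.66.1"},
}

# WordPress plugin: Bad Behavior gets there first, so the fix-up is too late.
PLUGIN_ORDER = [reject_fake_googlebot, restore_ip]
# Apache module: the fix-up happens before any WordPress code runs.
MODULE_ORDER = [restore_ip, reject_fake_googlebot]
```

`handle(REQUEST, PLUGIN_ORDER)` turns Googlebot away with a 403, while `handle(REQUEST, MODULE_ORDER)` serves the page. Nothing about the request changed, only which step got to look at it first.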

There are two simple solutions:

  1. Install the CloudFlare Apache module, which kicks in before anything else runs. This is preferable to the WordPress plugin anyway, because it’s a system-wide solution.
  2. If you don’t have that level of control over your server, turn off Bad Behavior. It’s a shame to lose that layer of protection, but it’s not devastating; there’s some overlap between what CloudFlare stops and what Bad Behavior stops. You still have two layers, plus your own alert management, to fall back on.

3 thoughts on “Bad Behavior, CloudFlare and Google Bot”

  1. This happened to me as well. So that’s the main reason why Google and Yahoo bots can’t enter my blog. I can’t even use the Fetch as Googlebot feature in Webmaster Tools. It always says “Unreachable”. But after disabling my firewall and other layers of security, the bots can now crawl my blog in peace. :)

  2. Pingback:

    Then there’s Incapsula « Muddled Ramblings and Half-Baked Ideas

  3. Pingback:

    CloudFlare + Bad Behavior + Akismet = WP SPAM triple wammy « B2B-TechCopy Technology Marketing Blog
