Then there’s Incapsula

I’ve written about CloudFlare in the past. I think it’s a no-brainer for small-time bloggers like me who control their own domain name registration. My writing has attracted the attention of another company, Incapsula, which offers a similar service.

Incapsula would love for me to give them a try, so I can write about them, too. They’re under the impression that I have some sort of influence in the world. Ha! They’ve even offered me a free upgrade to the ‘pro’ level of the service. One really cool thing about the upgrade: out-of-the-box SSL, which means you don’t have to get your own certificate to handle commerce. Certificates can be a real hassle, and a considerable expense.

The thing is, I’m pretty happy with CloudFlare. As of today, people on IPv6 can read these words. (Much like telephone numbers in some areas, the world is running out of IP addresses.) I’ve worked out one kink with the system and things are running smoothly. Does Incapsula have code to install on the server to make it play well with others? I don’t know.

Also, I don’t really need any of the advanced services of either system. I don’t do e-commerce, which could be a compelling reason to switch and grab my free upgrade.

I have a couple of terrifically minor quibbles about CloudFlare’s user interface and its flexibility in blocking IP ranges, but nothing worth even mentioning here. Logically, I should just stick with CloudFlare and leave it at that.

Except…

That guy they think I am? The one whose words can shift the balance of power in an emerging new market? I’m not that guy. I’ll never be that guy unless I devote myself to the task, and I’ve got other things to write about that are probably more interesting to most of you. But still I want to be the guy they think I am. I want to write the CloudFlare vs. Incapsula smackdown article to which all the pundits refer.

To do something like that, I’d have to set up a site to use Incapsula, but I don’t want to rock the Muddled Boat. I have jerryseeger.com, but what sort of test do I get out of a site that no one ever visits? It’s a site where acceleration hardly matters because the whole thing is so simple, and there’s no sign of e-commerce on the horizon. The thing barely even gets spammed.

Still, I have to think of something… the public demands it!

Bad Behavior, CloudFlare and Google Bot

This blog has several layers of protection from the evils of the outside world, but those layers don’t always get along. One problem that I had is pretty common among CloudFlare users, and the documentation provided by the relevant players has a hole in it – a key nugget of information that can make all the difference.

The nugget follows in due course.

My first line of defense from ne’er-do-wells and miscreants is CloudFlare. They stop most of the bad guys before they even reach my site. Still, for some sorts of attacks, when there’s doubt it’s better to let the bad guy through. It may turn out to be a good guy.

A program called Bad Behavior is my next line of defense. It sits on my server and quickly spots liars and weasels. For dangerous-looking attacks, that’s the limit. But, when there’s doubt and the site itself is not at risk, Bad Behavior will let the attack through.

At this point, ‘attack’ means ‘comment spam’. Everything else is stopped before it reaches this stage. Most of the comment spam has been stopped as well, but some has been given the benefit of the doubt. That’s where Akismet comes in. This layer spots the rest of the comment spam, and it can be much more aggressive because it doesn’t actually delete the spam; it puts it into a bin for future review. That way, legitimate comments can be rescued by an alert blog admin.

It works pretty well. Three spams actually got through all the layers last week, the first time any have gotten through in quite some time. Somewhere, a spammer popped a bottle of bubbly.

So comment spam is pretty well thwarted. Hooray! Unfortunately, for a while I had a pretty big problem. Search engine robots were being denied. I fell off Google and Yahoo! and all the rest, and traffic to this site dwindled.

Note: according to this article, Bad Behavior has been updated to avoid the following problem. Yay! You should still install the CloudFlare plugin and the Apache module if you are able.

Here’s what was going on:

  1. Googlebot said ‘hey, muddledramblings.com, show me page x’.
  2. The request must get past CloudFlare. No problem. They see it’s the real Googlebot and pass the request on to my server.
  3. Bad Behavior is next. It looks at the incoming request and sees something that claims to be Googlebot but isn’t coming from Google. It’s coming through CloudFlare. Bad Behavior says, “You are a lying sack of dingo dung and a false Googlebot. You are obviously evil and you may not pass.” Google is shut out. The other legitimate robots are cut off as well.

This problem is pretty easy to fix, but not quite as easy as WordPress admins would like to hope. CloudFlare has code that you can install on your server that will straighten the whole problem out. Basically it tweaks incoming messages so that the original source appears instead of CloudFlare. This bit of fix-it code is available as a WordPress plugin, so you can install the plugin and rest easy.
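To make the "tweak" a little more concrete, here is a minimal sketch of the idea in Python. It is not CloudFlare's actual code. It assumes the proxy passes the visitor's address along in the `CF-Connecting-IP` request header, and the trusted range below is a made-up documentation address standing in for whatever list of proxy addresses the service publishes. The function name `real_client_ip` is mine.

```python
import ipaddress

# Hypothetical proxy range for illustration only. A real setup would load
# the CDN's currently published address list instead of hard-coding one.
TRUSTED_PROXIES = [ipaddress.ip_network("203.0.113.0/24")]


def real_client_ip(remote_addr, headers):
    """Return the visitor's address, undoing the proxy hop when it is safe to."""
    peer = ipaddress.ip_address(remote_addr)
    came_through_proxy = any(peer in net for net in TRUSTED_PROXIES)
    forwarded = headers.get("CF-Connecting-IP")
    # Only believe the header when the connection really came from the proxy;
    # otherwise anyone could claim to be anyone.
    if came_through_proxy and forwarded:
        try:
            return str(ipaddress.ip_address(forwarded.strip()))
        except ValueError:
            pass  # malformed header: fall back to the socket address
    return remote_addr
```

The important design choice is the trust check. The header is only honored when the request arrives from a known proxy address, so a spammer connecting directly can't forge a friendlier identity.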

But here’s the thing that tripped me up and is not explained in the docs: when you’re working with Bad Behavior, the WordPress plugin is not enough.

The catch is that Bad Behavior does its magic before the CloudFlare plugin can do its magic. So, even with the CloudFlare plugin firmly installed, Bad Behavior will reject Googlebot and all his pals.

There are two simple solutions: 1) Install the CloudFlare Apache module, which kicks in before anything else is run. This is preferable to the WordPress plugin anyway, because it’s a system-wide solution. 2) If you don’t have that level of control over your server, turn off Bad Behavior. It’s a shame to lose that layer of protection, but not devastating; there’s some overlap between what CloudFlare stops and what Bad Behavior stops. You still have two layers and your own alert management to fall back on.
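The ordering problem is easier to see in a toy model. The sketch below is purely illustrative: `restore_ip` and `bot_check` are hypothetical stand-ins for the CloudFlare fix and for Bad Behavior, both address ranges are documentation addresses rather than real crawler or CDN ranges, and I'm not claiming this is how either program decides anything internally. It only shows why running the same two steps in a different order gives a different answer.

```python
import ipaddress

# Both ranges are made-up documentation addresses, not real crawler or CDN ranges.
CRAWLER_NET = ipaddress.ip_network("192.0.2.0/24")
PROXY_NET = ipaddress.ip_network("203.0.113.0/24")


def restore_ip(request):
    """Stand-in for the proxy fix: swap in the forwarded address."""
    if ipaddress.ip_address(request["ip"]) in PROXY_NET and "forwarded_ip" in request:
        return {**request, "ip": request["forwarded_ip"]}
    return request


def bot_check(request):
    """Stand-in for the screening layer: reject crawler claims from the wrong address."""
    claims_crawler = "Googlebot" in request["user_agent"]
    from_crawler = ipaddress.ip_address(request["ip"]) in CRAWLER_NET
    if claims_crawler and not from_crawler:
        return {**request, "blocked": True}
    return request


def run(stages, request):
    """Pass the request through each stage in order; report whether it was blocked."""
    for stage in stages:
        request = stage(request)
    return request.get("blocked", False)


crawler_via_proxy = {
    "ip": "203.0.113.7",           # the proxy's address, as the server sees it
    "forwarded_ip": "192.0.2.10",  # the crawler's own address
    "user_agent": "Googlebot/2.1",
}

# Screening first: the crawler looks like an impostor and is blocked.
blocked_when_check_runs_first = run([bot_check, restore_ip], crawler_via_proxy)
# Fix first: the crawler's own address is visible and it gets through.
blocked_when_fix_runs_first = run([restore_ip, bot_check], crawler_via_proxy)
```

In this model the Apache module plays the role of running `restore_ip` first, before any of the PHP code gets a look at the request.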

How This Blog Works

Over the years, the technology behind this blog has gone from cave-dwelling stone-knives-and-bearskin static pages to cloud-city jet-packs-and-lightsaber dynamic yumminess. That transformation starts with WordPress but does not end there. Not by a long shot.

I started the Muddled Media Empire using a tool called iBlog, because it was free and worked with Apple’s hosting service, which I was already paying for. iBlog’s claim to fame was that it didn’t require a database – every time you made a change it went through and regenerated all pages that were affected. Toward the end, that was getting to be thousands of pages in some cases, each of which had to be uploaded individually. When iBlog’s support and development faltered, it was already past time for me to move on.

WordPress is an enormously popular Web-publishing platform. It comes in two flavors: you can host your blog on their super-duper servers and accept their terms of service and the slightly limited customization options, or you can install the code on your own server and go nuts. I chose the latter, mainly because I wanted to be able to touch the code. I’m a tinkerer.

So I signed up for a cheap Web host and set to work building what you see now. At first things were great, but after a while the host started having issues, and the once-great customer service withered up and vanished. So much for LiveRack. I think they just didn’t want to be in the hosting business anymore. I moved to iPage.

iPage was cheap, but I was crammed onto a server with a bunch of other people and sometimes my blog would take an agonizing time to load. Like, almost a minute. Then there was the time a very popular Geek site linked to my CSS border-radius table and iPage shut me down because the demand on the server was too much. Ouch! My moment in the sun became my moment at the bottom of a well.

I set out to find ways to make this blog more server-friendly and more user-friendly at the same time. Step 1: caching. WordPress doesn’t store Web pages; it stores data and the instructions on how to build a Web page. So, every time you ask to load a page here, WordPress fires up a program that reads from the database and assembles all the parts of the page. The thing is, that takes longer than just finding the requested file and sending it back, the way iBlog did. Caching is a way for the server to say, “hey, wait a minute – I just did this page and nothing’s changed. I’ll just send the same thing I did last time.” That can lead to big savings, both in time and server load.
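For the programmers in the audience, the whole idea fits in a few lines. This is a bare-bones sketch of page caching in general, not how WordPress or any particular plugin does it. `PageCache` and `slow_render` are names I made up, and a real cache would also worry about expiry, logged-in users, and disk storage.

```python
class PageCache:
    """Remember each rendered page so it is only built once until invalidated."""

    def __init__(self, render):
        self._render = render   # the expensive "build the page" function
        self._pages = {}        # path -> finished HTML
        self.render_count = 0   # how many times we actually did the work

    def get(self, path):
        if path not in self._pages:
            self.render_count += 1
            self._pages[path] = self._render(path)
        return self._pages[path]

    def invalidate(self, path):
        """Forget a page, for example after the post was edited."""
        self._pages.pop(path, None)


def slow_render(path):
    # Stand-in for the database queries and template assembly.
    return f"<html><body>Page for {path}</body></html>"


cache = PageCache(slow_render)
first = cache.get("/about")    # built from scratch
second = cache.get("/about")   # served from memory, no rendering
```

The `invalidate` method is the "nothing’s changed" half of the bargain: when something does change, the stale copy has to go.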

I looked at a few WordPress caching programs and eventually chose W3 Total Cache, because it does far more than just cache data. For instance, it will minify scripts and CSS files (remove extra spaces and crunch them down) and combine the files together so the browser only has to make one request. It will zip the data, meaning fewer 1’s and 0’s moving down the pipe, and it does a few other things as well, one of which I will get to shortly.
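Here is roughly what minify, combine, and zip mean, as a toy Python sketch. This is not W3 Total Cache's code, and the minifier is deliberately naive: it would mangle whitespace inside quoted strings, for instance, which is one reason real minifiers are much more careful. The function names are my own.

```python
import gzip
import re


def minify_css(css):
    """Naive minifier: strip comments and squeeze out unneeded whitespace."""
    css = re.sub(r"/\*.*?\*/", "", css, flags=re.S)  # drop /* comments */
    css = re.sub(r"\s+", " ", css)                   # collapse runs of whitespace
    css = re.sub(r"\s*([{};:,])\s*", r"\1", css)     # trim around punctuation
    return css.strip()


def bundle(stylesheets):
    """Combine several stylesheets into one so the browser makes one request."""
    return "".join(minify_css(sheet) for sheet in stylesheets)


def compress(text):
    """Gzip the result, as a server would before sending it down the pipe."""
    return gzip.compress(text.encode("utf-8"))


sheets = [
    "body {\n  color: red;  /* note */\n}\n",
    "h1 ,  h2 {\n  margin : 0 ;\n}\n",
]
bundled = bundle(sheets)
zipped = compress(bundled)
```

On files this tiny, gzip's fixed overhead can outweigh the savings. The payoff shows up on real stylesheets and scripts, which are long and repetitive.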

I installed W3 Total Cache, and although some settings broke a couple of my JavaScript files (for reasons I have yet to figure out – I’ll get to that someday), the features I could turn on definitely made a difference. Hooray!

But Muddled Ramblings and Half-Baked Ideas was still way too slow. I continued my search for ways to speed things up. I also began a search for a host that sucked less than iPage. (iPage was also starting to have outages that lasted a day or more. Not acceptable.) I decided I was willing to pay extra to be sure I wasn’t on an overwhelmed machine.

I’m not sure which came first – new server or Amazon Simple Storage Service. S3 is a pretty basic concept – you put your stuff on their super-duper servers, and when people need it they will get it really quickly. Things that don’t change, like images and even some scripts, can live there and your server doesn’t have to worry about them.

This is where W3 Total Cache earned my donation to their cause. You see, you can sign up for Amazon S3, and then put your account info into the proper W3TC panel and Bob’s Your Uncle. W3TC goes through your site, finds images and whatnot, puts them in your S3 bucket, and automatically changes all the links in your Web pages to point to your bucket instead of your own server. (Sometimes I find I have to copy the image to my S3 bucket manually, but that’s a small price to pay.)
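The link-rewriting part is simple enough to sketch. What follows is my own illustration of the idea, not W3TC's implementation. The host names are hypothetical placeholders, the list of file extensions is an assumption, and a real version would have to cope with query strings, relative links, and files that haven't been uploaded yet.

```python
import re

SITE_HOST = "blog.example.com"                   # hypothetical site host
BUCKET_HOST = "example-bucket.s3.amazonaws.com"  # hypothetical bucket host
STATIC_EXTENSIONS = ("png", "jpg", "gif", "css", "js")

# Match absolute links to this site that end in a static-file extension.
_STATIC_LINK = re.compile(
    r"https?://" + re.escape(SITE_HOST)
    + r"(/[^\s\"'<>]+\.(?:" + "|".join(STATIC_EXTENSIONS) + r"))"
    + r"(?=[\"'\s<>]|$)"
)


def rewrite_static_links(html):
    """Point static-file links at the bucket; leave ordinary page links alone."""
    return _STATIC_LINK.sub(lambda m: f"https://{BUCKET_HOST}{m.group(1)}", html)


page = (
    '<img src="http://blog.example.com/wp-content/uploads/raptor.jpg"> '
    '<a href="http://blog.example.com/about/">About</a>'
)
rewritten = rewrite_static_links(page)
```

The image link ends up pointing at the bucket, while the link to the About page still goes to my server, which is the only place that can build it.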

Now a lot of the stuff on my blog, like the picture of me with the Utahraptors the other day, sits on a different, high-performance server out there somewhere, and no matter how overwhelmed my server happens to be at the moment those parts will arrive to you lickety-split. Amazon S3 is not free, however – each month I get an invoice for two or three cents. Should Muddled Ramblings suddenly become wildly popular, that number would increase.

About that server – the next stop on my quest for a good host was a place called GreenGeeks. I wanted to upgrade to a VPS, which means I get a dedicated slice of a server that acts just like it’s my very own machine. There is a lot to like about those, but my blog just wouldn’t run in the base level of RAM they offered. I upgraded and reorganized so that different requests would not take up more RAM than they needed. Still, I had outages. Sometimes the server would just stop freeing up memory and eventually choke and die. Since it was a virtual server in a standard configuration, logic says it was caused by something I was doing, but all my efforts to track it down were fruitless, and GreenGeeks ran out of patience trying to help me.

The server software itself is Apache. At this point I considered using nginx (rhymes with ‘bingin’ ex’) instead. It’s supposedly faster, lighter, and easier to configure. But, I already know Apache. I may move to nginx in the future, but it’s not urgent anymore.

During the GreenGeeks era I came across another service that improves the performance of Web sites while reducing the load on the servers. I recently wrote glowingly about CloudFlare, but I will repeat myself a bit here for completeness. CloudFlare is a service that has a network of servers all over the world, and they stand between you the viewer and my server. They stash bits of my site all around the world, and much of the time they will have a copy of what you need on hand, and won’t even need to trouble my server with a request. About half of all requests to muddledramblings.com are magically and speedily taken care of without troubling my server at all. They also block a couple thousand bogus requests to my server each day, so I don’t have to deal with them (or pay for the bandwidth). It’s sweet, and the base service is free.

Unfortunately, it was not enough to keep my GreenGeeks server from crashing. Once more I began a search for a new host. I found through word of mouth a place called macminicolo. Apple employees get a discount, but I wasn’t an Apple employee yet. It was still a bargain. For what turned out to be the same monthly cost of sharing part of a machine at GreenGeeks, I get an entire server, all to myself, with plenty of RAM. I’ve set up several servers on Mac using MacPorts, and I knew just how to get things up and running well. It costs less than half what a co-located server costs anywhere else I have found (Mac, Windows, or Linux). (Co-location has up-front costs, but in the long term saves money.) So I have that going for me.

The only thing missing is that at GreenGeeks I had a fancy control panel that made it much simpler to share the machine with my friends. I do miss that, but I’m ready now to host friend and family sites at a very reasonable cost.

So there you have it! This is just your typical Apache/WordPress/W3 Total Cache/Amazon S3/CloudFlare site run off a Mac mini located somewhere in Nevada. Load times are less than 5% of what they were a year ago. Five percent! Conservatively. Typically it’s more like 1/50th of the load time. Traffic is up. Life is good.

Now I have no incentive at all to learn more about optimization.
