How This Blog Works

Over the years, the technology behind this blog has gone from cave-dwelling stone-knives-and-bearskin static pages to cloud-city jet-packs-and-lightsaber dynamic yumminess. That transformation starts with WordPress but does not end there. Not by a long shot.

I started the Muddled Media Empire using a tool called iBlog, because it was free and worked with Apple’s hosting service, which I was already paying for. iBlog’s claim to fame was that it didn’t require a database – every time you made a change it went through and regenerated all pages that were affected. Toward the end, that was getting to be thousands of pages in some cases, each of which had to be uploaded individually. When iBlog’s support and development faltered, it was already past time for me to move on.

WordPress is an enormously popular Web-publishing platform. It comes in two flavors: you can host your blog on their super-duper servers and accept their terms of service and the slightly limited customization options, or you can install the code on your own server and go nuts. I chose the latter, mainly because I wanted to be able to touch the code. I’m a tinkerer.

So I signed up for a cheap Web host and set to work building what you see now. At first things were great, but after a while the host started having issues, and the once-great customer service withered up and vanished. So much for LiveRack. I think they just didn’t want to be in the hosting business anymore. I moved to iPage.

iPage was cheap, but I was crammed onto a server with a bunch of other people and sometimes my blog would take an agonizing time to load. Like, almost a minute. Then there was the time a very popular Geek site linked to my CSS border-radius table and iPage shut me down because the demand on the server was too much. Ouch! My moment in the sun became my moment at the bottom of a well.

I set out to find ways to make this blog more server-friendly and more user-friendly at the same time. Step 1: caching. WordPress doesn’t store Web pages, it stores data and the instructions on how to build a Web page. So, every time you ask to load a page here, WordPress fires up a program that reads from the database and assembles all the parts to the page. The thing is, that takes longer than just finding the requested file and sending it back, the way iBlog did. Caching is a way for the server to say, “hey, wait a minute – I just did this page and nothing’s changed. I’ll just send the same thing I did last time.” That can lead to big savings, both in time and server load.

I looked at a few WordPress cacheing programs and eventually chose W3 Total Cache, because it does far more than just cache data. For instance, it will minify scripts and css files (remove extra spaces and crunch them down) and combine the files together so the browser only has to make one request. It will zip the data, meaning fewer 1’s and 0’s moving down the pipe, and it does a few other things as well, one of which I will get to shortly.

I installed W3 Total Cache, and although some settings broke a couple of javascripts (for reasons I have yet to figure out – I’ll get to that someday), the features I could turn on definitely made a difference. Hooray!

But Muddled Ramblings and Half-Baked Ideas was still way too slow. I continued my search for ways to speed things up. I also began a search for a host that sucked less than iPage. (iPage was also starting to have outages that lasted a day or more. Not acceptable.) I decided I was willing to pay extra to be sure I wasn’t on an overwhelmed machine.

I’m not sure which came first – new server or Amazon Simple Storage Service. S3 is a pretty basic concept – you put your stuff on their super-duper servers, and when people need it they will get it really quickly. Things that don’t change, like images and even some scripts, can live there and your server doesn’t have to worry about them.

This is where W3 Total Cache earned my donation to their cause. You see, you can sign up for Amazon S3, and then put your account info into the proper W3TC panel and Bob’s Your Uncle. W3TC goes through your site, finds images and whatnot, puts them in your S3 bucket, and automatically changes all the links in your Web pages to point to your bucket instead of your own server. (Sometimes I find I have to copy the image to my S3 bucket manually, but that’s a small price to pay.)

Now a lot of the stuff on my blog, like the picture of me with the Utahraptors the other day, sits on a different, high-performance server out there somewhere, and no matter how overwhelmed my server happens to be at the moment those parts will arrive to you lickety-split. Amazon S3 is not free, however – each month I get an invoice for two or three cents. Should Muddled Ramblings suddenly become wildly popular, that number would increase.

About that server – the next stop on my quest for a good host was a place called Green Geeks. I wanted to upgrade to a VPS, which means I get a dedicated slice of a server that acted just like it was my very own machine. There is a lot to like about those, but my blog just wouldn’t run in the base level of RAM they offered. I upgraded and reorganized so that different requests would not take up more ram than they needed. Still, I had outages. Sometimes the server would just stop freeing up memory and eventually choke and die. Since it was a virtual server in a standard configuration, logic says it was caused by something I was doing, but all my efforts to figure it out were fruitless, and Green Geeks ran out of patience trying to help me figure it out.

The server software itself is Apache. At this point I considered using nginx (rhymes with ‘bingin’ ex’) instead. It’s supposedly faster, lighter, and easier to configure. But, I already know Apache. I may move to nginx in the future, but it’s not urgent anymore.

During the GreenGeeks era I came across another service that improves the performance of Web sites while reducing the load on the servers. I recently wrote glowingly about CloudFlare, but I will repeat myself a bit here for completeness. CloudFlare is a service that has a network of servers all over the world, and they stand between you the viewer and my server. They stash bits of my site all around the world, and much of the time they will have a copy of what you need on hand, and won’t even need to trouble my server with a request. About half of all requests to muddledramblings.com are magically and speedily taken care of without troubling my server at all. They also block a couple thousand bogus requests to my server each day, so I don’t have to deal with them (or pay for the bandwidth). It’s sweet, and the base service is free.

Unfortunately, it was not enough to keep my GreenGeeks server from crashing. Once more I began a search for a new host. I found through word of mouth a place called macminicolo. Apple employees get a discount, but I wasn’t an Apple employee yet. It was still a bargain. For what turned out to be the same monthly cost of sharing part of a machine at GreenGeeks, I get an entire server, all to myself, with plenty of RAM. I’ve set up several servers on Mac using MacPorts, and I knew just how to get things up and running well. It costs less than half what a co-located server costs anywhere else I have found (Mac, Windows, or Linux). (Co-location has up-front costs, but in the long term saves money.) So I have that going for me.

The only thing missing is that at GreenGeeks I had a fancy control panel that made it much simpler to share the machine with my friends. I do miss that, but I’m ready now to host friend and family sites at a very reasonable cost.

So there you have it! This is just your typical Apache/WordPress/W3 Total Cache/Amazon S3/CloudFlare site run off a Mac mini located somewhere in Nevada. Load times are less than 5% of what they were a year ago. Five percent! Conservatively. Typically it’s more like 1/50th of the load time. Traffic is up. Life is good.

Now I have no incentive at all to learn more about optimization.

2

Things I Learned while Moving to a new Web Host

You probably can’t tell, but this site is now being served by a different host. The reasons I switched were many, but once MMHosting got hacked I decided it was time to move. Then when a particular PHP library was not on their servers (one that allows WordPress to read the date of an uploaded image), I actually did the move.

After quite a bit of looking around in which all the dang hosts started to look the same, I chose iPage. They are not quite the cheapest, but they purchase carbon offsets for wind-generated power. They also had a stronger emphasis on security.

At this writing, I still don’t know if my new host has the needed php library. I’ll be finding out when the dust settles. If not, I can install it in my site myself, I suppose, but let’s keep fingers crossed.

So, I learned a few things, and remembered a few others.

  1. Among sftp clients, RBrowser may have my favorite interface but it’s glacially slow with multiple files.
  2. Without ssh access to my site (none of the big hosting companies allow that on their cheap plans), I had to use phpMyAdmin to copy my databases over. Here’s an interesting bit of trivia: if your data has the phrase ‘drop database’ anywhere in it, phpMyAdmin will stop executing the import right there and then. This is to protect you from SQL injection attacks, where people sneak malicious data into your database that later gets executed as an instruction. ‘Drop database’ can be pretty devastating, so the software simply refuses to complete the import, even if the phrase is safe in the text of a post.

    The way phpMyAdmin is configured at my new host, however, when it stops, it doesn’t say why. It doesn’t even admit that anything went wrong, or indicate in any way that not all the data was imported. This can be inconvenient when you have a bulletin board for a product that has a drag-and-drop database feature. (Now it has a drag-and-drop database.)

  3. You can’t tell Safari not to uncompress zip archives it downloads (that I could find), but the original zip files can be found in the trash.
  4. jerssoft_phpb5 and jerssoft_phpbb5 are not the same thing, no matter how many hours you spend banging your head on them.

So iPage has been great (although their control panel is not completely Safari-friendly when it comes to processing payments). I’ve interacted with them in three different ways now — phone, chat, and a support ticket. Two of the interactions were due to the afore-mentioned payment glitch, and once for technical support trying to get my files copied and the site ready to go before I switched the domain registry. Although I didn’t get the answers I was hoping for, the tech was competent and knew what she was doing.

If I continue to be pleased with iPage, I will provide a link for those looking for a Web host. Because we all need Web hosts these days, don’t we?

Appreciating Fonts

The look of this blog when viewed on a Windows machine has always subtly annoyed me. I’ve been using the default font setup for WordPress, which uses Lucida Grande first, and if that is not available it uses Verdana. Verdana to me looks, I don’t know, thin or stretched or something. Loose. Unfortunately most Windows boxes don’t come with Lucida Grande, so Verdana is what most people experience. Today I decided to do something about it.

It’s possible now to tell a broswer to load a font from the Web when displaying a particular page. I could quite easily put @font-face directives in my files, load copies of Lucida Grande onto the server, and I’d be done (except for Internet Explorer, and those people can get by with Verdana). Unfortunately, although technically pretty simple, that course of action would not be legal.

There’s a font on Windows called Lucida Sans Unicode (or something like that) which is very similar to Lucida, but is not nearly as good for italics and bold face. This will be my fall-back solution.

For a while today, however, I thought I might go look for a new font, something that caught the spirit of this blog, yet was easy to read on a screen and had a nice ink density. On top of that, it had to be free or at least reasonably priced, and it had to include good italic and bold versions, and it had to include the wacky Czech diacriticals for those few episodes where I use them, plus the full range of punctuation including a variety of dashes, copyright symbols, and stuff like that.

I came up empty. Making a good font is not at all simple, and the people who make the great ones quite understandably want to be paid for their work. If I found one that measured up to Lucida Grande in usefulness and that would give this site a unique feel, I might be tempted to pony up.

The closest thing I could find was a font called Liberation, which is a favorite in the Linux world. At this writing, those without Lucida Grande will see that font (unless you’re using Internet Explorer). It’s OK, but the text is actually a little smaller for the same font size. That certainly is annoying. I haven’t looked at the text on enough different screens to know for sure, but I think right now the lettering is too small.

How’s it looking for you, my windows-using readers? Do you have any favorite fonts? I think with screen resolutions improving, it’s even possible to consider a serifed font these days.

Drupal and WordPress

There is a lot of talk at Drupalcon about how they stack up against the competition. We are in the weeding-out phase of the Web Content Management System market, when most of the myriad contenders will fall by the wayside. Those who make a living building and using Drupal naturally want their platform to be among the survivors.

Drupal, according to their own assessment, powers about 1% of the Work Wide Web. The Drupalistas estimate that WordPress accounts for just north of 8%. There is another system called Joomla that is roughly even with Drupal. These three look to be the survivors in the Great-Web-site-in-a-box sweepstakes.

Honestly, I was a little surprised that Drupal considered WordPress to be a competitor. Sure, they both want to be used for more and more of the Web, but does Lego consider Tonka to be a competitor? Here’s the deal: Drupal says WordPress is the most popular Content Management System (CMS). I say WordPress is not a CMS at all.

That’s not to say WordPress isn’t a fine tool, in fact, this blog uses WordPress. But would I use WordPress for my current paying gig? No. Honestly I dread the day when WordPress becomes a big, fancy CMS like Drupal. That’s not what it’s for. There is a reason WordPress is the big dog, and it’s not because you can build sophisticated Web applications with it, it’s because you can install WordPress, find a nice skin, and get your stuff on the Web in an attractive and intuitive way. WordPress is a publishing platform, and a pretty good one at that.

Drupal, on the other hand, begins to shine at the next level up in terms of sophistication. It is the Lego to WordPress’ Tonka. There is considerably more design up front, and much critical functionality must be added as external modules that don’t come with the main (“core”) code. (Some of these things will be added to core in Drupal 7.) Maintenance of a Drupal site is more labor-intensive as well, as updating the parts is more complicated than with WordPress.

In exchange for the added complexity, you get a lot more flexibility. That’s not to say that WordPress can’t be used to make sophisticated Web sites, but generally speaking WordPress is optimized in putting a defined sort of information (like blog posts) on the screen in a very flexible way. There are hundreds of ways to add other pre-defined data types (for instance, there are shopping cart plugins), and all that works really well and most people are going to be happy with that.

Drupal is the tool you use when the data types in WordPress won’t do it for you. In Drupal you are building a Web application, where with WordPress you are using a Web application. Step one in building a site with Drupal is designing the data and the relationships between the various data types. Drupal allows you to design your data without having to design your database. Some of the ways Drupal implements your data design are pretty hokey, but it works. You can create pretty sophisticated data models without knowing a thing about how a database works – or even what kind of database you’re using. (In fact, you’re better off not knowing how the data is structured, because things can move unexpectedly as you tweak your design.) You are also presented with an almost dizzying set of options to decide who is allowed to see, edit, or create each little piece of each data type you define.

Once you get your content types defined, then you can move on to how to get actual content into the system (handled pretty much automatically), and how to present specific subsets of your data on the screen. To get at the data one often uses views, which are built using a tool that generates (frustratingly limited) database queries and then processes the results with a gratifying set of options tailored to each data type.

Then it comes time to put stuff on the screen. To control where things go and when, there are regions, blocks, panels, panes, pages, and so forth in a nonintuitive overlapping of roles. Blocks and regions and pages are built in, but the profusion of other options is a testament to the limited way they work together. For all the flexibility of Drupal, GUI building is still clumsy, though getting better (I’m told).

At last we come to the task of making the output pretty. For this purpose Drupal uses a maze of performance-sucking php template files that are invoked using a system of names that allows one to set up the display of information at just about any level of granularity. Many of these templates go hand-in-hand with specially-named preprocessor functions that allow you to customize how data is prepared for presentation.

Drupal separated the preparation of data and the presentation of data to allow people with different skill sets to do the different tasks. The template files can be done with only a minimal amount of php, while the preprocessors are where the real logic is implemented, unencumbered by HTML and CSS. This also has the effect of putting the risky code out of reach of those who aren’t expert in Web security. All good things.

I used the phrase “performance-sucking” above, and I meant it. The designers of Drupal made a conscious decision to emphasize good architecture and flexibility over fast execution. This was the same decision Google faced a few years back, as they developed ever-more-sophisticated pattern matching algorithms. While competitors kept things simple to reduce server load, the folks at Google decided that the cost of processing cycles and storage was tending toward free, and chose to emphasize the quality of the information they provided instead. Similarly, Drupal has decided to make things in a structurally sound way and spend the processor cycles and disk reads necessary to support that.

Drupal 7 will be even slower, but will be more scalable (they say). What that means is that although the software is not as fast as it could be, its behavior is predictable as demand increases, and it is easer to scale up your site as things go huge. Good structure pays greater and greater dividends as things get bigger.

All that stuff Drupal has makes it a more complicated to get up and running, and for a simple site (or even one of moderate complexity but with a relatively straightforward data model), WordPress is going to get you to the promised land with a lot less pain.

I am led to believe that the WordPress community feels it needs to compete with Drupal just as much as Drupal thinks they need to compete with WordPress. Toward this end WordPress 3.0 will have new features that answer some of Drupal’s flexibility advantages. All I can say is “PLEASE, WordPress, don’t try to be everything Drupal is.” That WordPress is not everything Drupal is constitutes its greatest advantage. Stay with your market, WordPress!

New Features here at MR&HBI!

First, allow me to call your attention to the episode immediately before this one. You might notice the little icon is a camera. “huh,” you might be saying to yourself, “I don’t remember seeing that one before.” Very observant, Buckaroo! It’s for a new category, Photography, that I added. “But,” the even more observant amongst you might say, “There are already a handful of episodes in that category.” Right again, Wisenstein! I recategorized a couple episodes that were under The Great Adventure and found a couple in Idle Chit-Chat that were better filed under the new category. I expect there are plenty more; the trick is finding them.

The icon is actually my camera sitting on an opened unabridged dictionary. That may seem staged, but that’s actually where we keep the camera these days. Yes, we have an unabridged dictionary open on a stand at all times. No, that does not make us geeks.

Second, way down at the bottom of the sidebar, there’s a section called Other Muddled Stats (or something like that). That’s a wordpress widget I made that counts all the words in all the episodes, and keeps a tally of how many comments there have been as well. I plan to add other stats as well next time I have the hood open. Perhaps the number of times I’ve said “You don’t have to thank me,” or the number of times I’ve blamed the Chinese for things. (Hm… haven’t done that in a while…) Anything you’d like to know? The number of letters typed? Words in comments? Most prolific commenters? If it’s on these pages, I can count it.

The WordPress plugin itself is hand-crafted by yours truly. I started by downloading a different word-counting plugin, but it counted the words on every page load and didn’t have a sidebar widget. All it was was a database query and a loop. My version only counts when the relevant value changes – it only counts words when a new episode is posted, for instance. Once I tidy it up I’ll be adding it to the WordPress repository, so others can also gather useless stats about their blogs. It’s all about sharing the love.

Haloscan comments to WordPress – the nitty gritty.

As I mentioned in the previous episode, I recently had to move more than 8000 comments from my old comment system, Haloscan, and import them into WordPress. Haloscan served me well back in the day, but they are going away, and all my more recent comments are in the WordPress system anyway. Nice to have them all in one place.

The process turned out to be pretty easy. I found a script for importing comments from a different system, modified it, modified it some more, found a fundamental problem with it, fixed that, and in the end not much of code remained from the example, except the part where the WordPress logo is displayed on the screen. I assume that part came from the code the guy copied to make the code that I copied.

Along the way I learned a couple of things. PHP is a pretty flexible language, but running a loop that sets up 8500 data structures and runs 25500 database queries exposes PHP’s primary weakness: memory management. The whiz kids who invented PHP designed it for a load/compile/execute/exit-and-clean-up flow. Memory allocated during execution is cleaned up when the program is done running (usually when the Web page is delivered). When you try to do heavy lifting with PHP, you have to start paying attention to getting your memory back before the traditional clean-up time.

The code I started with did a direct database query to add the comment to the comments table, but that got things out of sync with other tables. (The posts table keeps track of the number of comments that apply to it, presumably for performance reasons.) I dug into the core WordPress code and found the method they call to post comments, and I made my code call that function. I have no idea what all the bookkeeping chores are that function does, and really I don’t care as long as they get done.

I didn’t worry about performance too much at first (after all, it only has to run once), but one of the database queries I did was really expensive (scanning all the posts for a specific set of characters). Even running on my local server it was slow, and I knew that if I tried something like that on my actual Web host alarms would go off and they’d shut me down for a while. I did a little optimization on that front, and it was enough.

The following script has some Muddle-specific code in it, but it might come in handy for others who need to move Haloscan comments to a new system. The part that parses Haloscan XML is pretty generic and would work for anyone, the part that saves the comments might be useful as a guide as well. The main difference others will have to deal with is where to get proper post_id based on the thread field in the XML. In my case I had a link in each blog episode back to the Haloscan thread.

The HTML bit in the middle of the file is not essential; but it puts a nice WordPress logo on the screen when the script starts up. I inherited that from the script I started with.

NOTE: While this script has code in it specific to me, I am available to customize it for others who need to move their code from Haloscan into another environment, or, for that matter, from any structured source into WordPress. Drop me a line!

<?php
 
if (!file_exists('../wp-config.php')) die("There doesn't seem to be a wp-config.php file. You must install WordPress before you import any comments.");
require('../wp-config.php');
 
function saveCommentToWP($comment, $dbRef, &$postThreads) {
    //echo "here's where the comment save happens <br/><br />";
    $thread = $comment['thread'];
    $postID = $postThreads[$thread];
    if (!isset($postThreads[$thread])) {
        $query = "SELECT * FROM wp_posts WHERE post_content LIKE '%".$thread."%' AND post_status='publish'";
        $postID = $dbRef->get_var($query, 0);
        $postThreads[$thread] = $postID ? $postID : 0;
        if ($postThreads[$thread] == 0)
            echo ("<br />Thread $thread has no post!");
        else
            echo "<br />Thread $thread";
        flush();       // got to have real-time updates!
    }
 
    if ($postID && $postID != 0) {
        $userId = $comment['email'] == [email protected]' ? 1 : 0;
 
        //set up the data the way wp_insert_comment expects it.
        $wp_commentData = array();
        $wp_commentData['comment_post_ID'] = (int) $postID;
        $wp_commentData['user_id'] = (int) $userId;
        $wp_commentData['comment_parent'] = 0;
        $wp_commentData['comment_author_IP'] = $comment['ip'];
        $wp_commentData['comment_agent'] = 'Haloscan';
        $wp_commentData['comment_date'] = $comment['datetime'];
        $wp_commentData['comment_date_gmt'] = $comment['datetime'];
        $wp_commentData['comment_approved'] = '1';
        $wp_commentData['comment_content'] = $comment['text'];
        $wp_commentData['comment_author'] = $comment['name'];
        $wp_commentData['comment_author_email'] = $comment['email'];
        $wp_commentData = wp_filter_comment($wp_commentData);
 
        $comment_ID = wp_insert_comment($wp_commentData);
 
        //echo ("<strong>saved comment $comment_ID</strong>");
    }
 
    // try to reclaim some memory
    unset($wp_commentData);
    unset($comment);
}
 
header( 'Content-Type: text/html; charset=utf-8' );
?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<title>WordPress &rsaquo; Import Comments from RSS</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<style media="screen" type="text/css">
    body {
        font-family: Georgia, "Times New Roman", Times, serif;
        margin-left: 20%;
        margin-right: 20%;
    }
    #logo {
        margin: 0;
        padding: 0;
        background-image: url(http://wordpress.org/images/logo.png);
        background-repeat: no-repeat;
        height: 60px;
        border-bottom: 4px solid #333;
    }
    #logo a {
        display: block;
        text-decoration: none;
        text-indent: -100em;
        height: 60px;
    }
    p {
        line-height: 140%;
    }
    </style>
</head><body> 
<h1 id="logo"><a href="http://wordpress.org/">WordPress</a></h1> 
 
<?php
 
// Bring in the data
$reader = new XMLReader();
if ($reader->open('export-8.xml')) {
    $postThreads = array();
    $thread = '';
    while ($reader->read()) {
        //echo "<br />read node type: ".$reader->nodeType.';     '.$reader->name.': '.$reader->value;
        if ($reader->nodeType == XMLReader::ELEMENT && $reader->name == 'thread') {
            $thread = $reader->getAttribute('id');
        }
        if ($thread) {
            if ($reader->nodeType == XMLReader::ELEMENT && $reader->name == 'comment') {
                // begin building comment
                $comment = array('thread' => $thread);
                $reader->read();
                while ( !($reader->nodeType == XMLReader::END_ELEMENT && $reader->name == 'comment') ) {
                    if ($reader->nodeType == XMLReader::ELEMENT) {
                        $property = $reader->name;
                        $reader->read(); // assumes text element following element tag has the data
                        $comment[$property] = $reader->value;
                    }
                    $reader->read();
                }
                saveCommentToWP($comment, $wpdb, $postThreads);
            }
        }
    }
    $reader->close();
}
 
?>
 
 
</body>
</html>

In with the Old

I got a message today that Haloscan is closing down. That is the service that provided refreshingly spam-free comments on my old blog. A year ago I finally abandoned iBlog for WordPress, and I’m glad I did. At the time, however, I didn’t want to tackle moving the old comments over into the new system. In my conversion I embedded a link into each of the old episodes to the legacy comment system, and left it at that.

It is fortunate I found out about Haloscan when I did. Another week and 8500 comments would have been lost forever. That’s a big part of the underlayer of this blog, the part people sink gradually into as they hang around more, and they realize that this isn’t just about me. There are some pretty interesting conversations, observations, poems, and even stories in those comments. With the timer running I set to work to get the comments out of Haloscan and into WordPress.

The move turned out to be pretty straightforward. (Simpler, perhaps, than it had been to put the links into the posts.) I’ll go into the technical details in an episode tomorrow, but for now, why don’t you pop into the archives for 2004 or so and find an old episode with good comments? Maybe you’ll find something interesting someone said once. Maybe you’ll see the name of someone you haven’t thought of in a while. Maybe you’ll see something you want to comment on, even.