1,000,003 Words!

Screen Shot 2016-02-24 at 2.23.02 PM
It has happened. Muddled Ramblings and Half-Baked Ideas has rolled over the odometer and has blasted well beyond the 1,000,003-word line. I decided to celebrate by taking the day off work to throw out a bit of a redesign here; the old code simply did not support some of the cool new WordPress features I’ve been wanting to leverage. A ground-up rebuild is long overdue.

Even when you start with a fairly clean off-the-shelf theme, however, a great deal of fiddling and tweaking ensues. Some of the old widgets, like the colorful tag cloud and the sweet-o-meter, seem to be awol right now, and I’m not sure about the typography for reading my longer-winded treatises.

Also missing, and a little more difficult to bring back, is the poetry feed that was playing in the header. I’d like to bring it back, but at this moment I’m not sure where to put it.

What do you think? Too dark? Please leave comments here on the blog, while I work on getting the styling of the comments on the blog looking right.

Later tonight, after the celebratory single malt, I will compose the Inevitable Retrospective Episode.

1

wp-cli, Where have you been all my life?

WordPress updates can be pretty insecure. FTP was invented back before there was an Internet, and when when no one thought that bad people might be on the same network you’re using (why even have a password if you let everyone see it?). Ah, for those naïve and simple times!

Yet even today most of the Web-site-in-a-box products you can get to run on your GoDaddy account use FTP. I control my own server, and you can bet your boots that FTP is turned right the hell off.

It can be a hassle setting WordPress up to allow its update features to work in a very secure fashion, however. I was wrangling rsa certificates when I ran across another solution: rather than push a button on a web page to run an update, log into the server and run a command there. Simple, effective, secure, without file permission fiddling and authorized_keys files.

wp-cli does way more than updates, too. It is a tool I’ve been pining for for a long time, without even knowing it. Want to install a plugin? wp plugin install "xyz" and you’re done. Back up the ol’ database? They have you covered. Welcome to my tool belt, wp-cli!

If you’re not afraid to type three commands to update your site, rather than trying to maintain a hole in your security in such a way that only you can use it, then this is a great option for you. Check it out at wp-cli.org.

It’s Inside the Building!

You know in that horror movie where the girl is on the phone and there’s some crazy mofo who’s freaking her out but for some reason she doesn’t hang up and eventually it turns out the crazy mofo is already inside the house and really has no reason to call? I had a moment like that tonight. I’ve had a rash of spam lately, all using my Facebook identities. I waited for my spam-catchers to get a clue, but the comments kept coming. “Fine,” thought I, “I’ll just block the addresses they’re coming from.”

I fired up my diagnostics, and found the source. localhost. My server thought the comments were coming from itself! Double-plus ungood, to quote Orwell. Extra double-plus. My spam-detecting software, it turns out, recognized the evil of the comments, but was immediately overridden by the administrator. By me, or a vile piece of software pretending to be me.

I just changed a lot of passwords. I hope I can remember them later. I also set a switch that requires that all comments be approved before they go live. Alas, this is likely more an inconvenience to legit comment traffic, as the evil robot has already proven capable of emulating me and giving permission.

I also spastically updated all my wordpress plugins (I do this fairly often anyway) — including, perhaps significantly or not, the one that passes comments between here and Facebook. Later, going back, I see nothing in that plugin’s update info to the tune of “closed egregious spam hole.” But the attack vector seems to be through my Facebook identities. It may be that the conduit trusted the origin of the messages too much.

So now I wait and watch, and your comments will take a little longer to reach the page. Hopefully I can loosen things up soon.

Pardon the Dust – again


A warning sign I saw between Calgary and Edmonton.

I’m putting in a new comment system that hopefully will answer a couple of annoyances I found with the old. It may look wonky compared to everything else. I’ll probably just turn it on, see how things look, and turn it off again in a few hours once I know the effort it’s going to take to get it looking right.

In the meantime, leave a comment and tell me what you think!

Call me g2-587217eb4d0b8b1710372695336f2a58

The other day I got an email notification that someone had requested a password change for my wordpress.com account. WTF? I almost never even log into wordpress.com. But, if someone knew my member name, they could try to hack my account and I assumed that this message was a result of such shenanigans. I figured an actual change of my password was in order, just to be safe.

Then I noticed the user name on the account: g2-587217eb4d0b8b1710372695336f2a58

That’s actually not my user name at wordpress.com. Someone (a robot, obviously) created an account with my email address. Huh. I logged into the fake account, changed its password so whoever created it wouldn’t get access to it, and looked around for a way to delete the account entirely. I couldn’t find one.

I know you had a snafu that led to people’s passwords being stored less securely, and therefore a spate of “reset your password” messages issued forth, but this message was absolutely not the same as those. I am fanatic about protecting my email password (as I write about here), and I have changed it recently. There is no other sign that my email account has been compromised.

I logged in as the bogus user and checked to see if any comments or posts had been made; it appeared not. So, I set the password to something ridiculous and promptly forgot it.

The only problem is, when I leave comments on people’s wordpress.com blogs, after I put in my email it auto-fills the rest with data from the bogus account.

So, two things:
1) how did there come to be an account with an email address the bad guy almost certainly didn’t have access to?
2) how can I make that bogus account go away entirely, and never bother me again?

Damn Lies and Statistics

I read recently that WordPress “powers” more than 14% of the top 1,000,000 Web sites. (“Powers” in quotes because actually it’s electricity that powers them — lots of electricity.)

This site is also a WordPress site, and I started to wonder: Am I in the top million? A million, is, after all, a very big number, and this site does get regular traffic.

Which all begs the question, how the hell do you define “top Web site” and how does anyone know what they are? Presumably “top” sites are the ones that get the most visits, but even “visit” is tricky to pin down, and once you have a working definition there’s still the question of how the heck you measure it. Throw in game sites where a visit can last for hours — does that count for more than someone dropping in to see if there’s a new episode up in their favorite blog?

How about traffic from robots? When a robot tries to spam this site, does that count? How would the counting mechanism differentiate that from a legitimate visit?

For that matter, what’s a “site”? Does wordpress.org count as a single site, or is each blog hosted there counted individually? Is the difference whether the owner bothered to register their own domain?

All that aside, the slightly-depressing truth is that this is probably not one of the top million sites, no matter how you figure it, even counting spam-bot visits. Yep, there are probably more than one friggin’ million Web sites more popular than this one. Most of those sites will have a specific purpose — sites for businesses both local and international, political and news sites, comics, and so on (and of course porn).

I have a hard enough time sticking to a single topic in a given episode that the idea of staying on a subject for the whole damn blog is ridiculous. But I digress.

Most content? I’d probably be in the top million in that category. There’s a lot of stuff here. Oldest still-active sites? I might even crack the million line with that measure. How many sites have been continuously active since 2003? That’s like, a century in Internet time.

So I probably get the top-million most persistent award, if nothing else. Maybe I should make that a tagline for the site when I un-Flash the banner: “One of the million most persistent Web sites in the world!”

2

Bad Behavior, CloudFlare and Google Bot

This blog has several layers of protection from the evils of the outside world, but those layers don’t always get along. One problem that I had is pretty common among CloudFlare users, and the documentation provided by the relevant players has a hole in it – a key nugget of information that can make all the difference.

The nugget follows in due course.

My first line of defense from ne’er-do-wells and miscreants is CloudFlare. They stop most of the bad guys before they even reach my site. Still, for some sorts of attacks, when there’s doubt it’s better to let the bad guy through. It may turn out to be a good guy.

A program called Bad Behavior is my next line of defense. It sits on my server and quickly spots liars and weasels. For dangerous-looking attacks, that’s the limit. But, when there’s doubt and the site itself is not at risk, Bad Behavior will let the attack through.

At this point, ‘attack’ means ‘comment spam’. Everything else is stopped before it reaches this stage. Most of the comment spam has been stopped as well, but some has been given the benefit of the doubt. That’s where Akismet comes in. This layer spots the rest of the comment spam, and it can be much more aggressive since it doesn’t actually delete the spam, it puts it into a bin for future review. So, legitimate comments can be rescued by an alert blog admin.

It works pretty well. Three spams actually got through all the layers last week, the first time any have gotten through in quite some time. Somewhere, a spammer popped a bottle of bubbly.

So comment spam is pretty well thwarted. Hooray! Unfortunately, for a while I had a pretty big problem. Search engine robots were being denied. I fell off Google and Yahoo! and all the rest, and traffic to this site dwindled.

Note: according to this article, Bad Behavior has been updated to avoid the following problem. Yay! You should still install the CloudFlare plugin and the Apache module if you are able.

Here’s what was going on:

  1. Googlebot said ‘hey, muddledramblings.com, show me page x’.
  2. The request must get past CloudFlare. No problem. They see it’s the real Google bot and pass the request on to my server.
  3. Bad Behavior is next. They look at the incoming message and see something that claims to be a Google bot but It’s not coming from Google. It’s coming through CloudFlare. Bad Behavior says, “You are a lying sack of dingo dung and a false Google bot. You are obviously evil and you may not pass.” Google is shut out. The other legitimate robots are cut off as well.

This problem is pretty easy to fix, but not quite as easy as WordPress admins would like to hope. CloudFlare has code that you can install on your server that will straighten the whole problem out. Basically it tweaks incoming messages so that the original source appears instead of CloudFlare. This bit of fix-it code is available as a WordPress plugin, so you can install the plugin and rest easy.

But that’s the thing that tripped me up and is not explained in the docs. In the case of working with Bad Behavior, the WordPress Plugin is not enough.

The catch is that Bad Behavior does its magic before the CloudFlare plugin can do its magic. So, even with the CloudFlare plugin firmly installed, Bad Behavior will reject Google bot and all his pals.

There are two simple solutions: 1) Install the CloudFlare Apache module, which kicks in before anything else is run. This is preferable to the WordPress plugin anyway, because it’s a system-wide solution. 2) If you don’t have that level of control over your server, turn off Bad Behavior. It’s a shame to lose that layer of protection, but not devastating; there’s some overlap between what CloudFlare stops and what Bad Behavior stops. You still have two layers and your own alert management to fall back on.

How This Blog Works

Over the years, the technology behind this blog has gone from cave-dwelling stone-knives-and-bearskin static pages to cloud-city jet-packs-and-lightsaber dynamic yumminess. That transformation starts with WordPress but does not end there. Not by a long shot.

I started the Muddled Media Empire using a tool called iBlog, because it was free and worked with Apple’s hosting service, which I was already paying for. iBlog’s claim to fame was that it didn’t require a database – every time you made a change it went through and regenerated all pages that were affected. Toward the end, that was getting to be thousands of pages in some cases, each of which had to be uploaded individually. When iBlog’s support and development faltered, it was already past time for me to move on.

WordPress is an enormously popular Web-publishing platform. It comes in two flavors: you can host your blog on their super-duper servers and accept their terms of service and the slightly limited customization options, or you can install the code on your own server and go nuts. I chose the latter, mainly because I wanted to be able to touch the code. I’m a tinkerer.

So I signed up for a cheap Web host and set to work building what you see now. At first things were great, but after a while the host started having issues, and the once-great customer service withered up and vanished. So much for LiveRack. I think they just didn’t want to be in the hosting business anymore. I moved to iPage.

iPage was cheap, but I was crammed onto a server with a bunch of other people and sometimes my blog would take an agonizing time to load. Like, almost a minute. Then there was the time a very popular Geek site linked to my CSS border-radius table and iPage shut me down because the demand on the server was too much. Ouch! My moment in the sun became my moment at the bottom of a well.

I set out to find ways to make this blog more server-friendly and more user-friendly at the same time. Step 1: caching. WordPress doesn’t store Web pages, it stores data and the instructions on how to build a Web page. So, every time you ask to load a page here, WordPress fires up a program that reads from the database and assembles all the parts to the page. The thing is, that takes longer than just finding the requested file and sending it back, the way iBlog did. Caching is a way for the server to say, “hey, wait a minute – I just did this page and nothing’s changed. I’ll just send the same thing I did last time.” That can lead to big savings, both in time and server load.

I looked at a few WordPress cacheing programs and eventually chose W3 Total Cache, because it does far more than just cache data. For instance, it will minify scripts and css files (remove extra spaces and crunch them down) and combine the files together so the browser only has to make one request. It will zip the data, meaning fewer 1’s and 0’s moving down the pipe, and it does a few other things as well, one of which I will get to shortly.

I installed W3 Total Cache, and although some settings broke a couple of javascripts (for reasons I have yet to figure out – I’ll get to that someday), the features I could turn on definitely made a difference. Hooray!

But Muddled Ramblings and Half-Baked Ideas was still way too slow. I continued my search for ways to speed things up. I also began a search for a host that sucked less than iPage. (iPage was also starting to have outages that lasted a day or more. Not acceptable.) I decided I was willing to pay extra to be sure I wasn’t on an overwhelmed machine.

I’m not sure which came first – new server or Amazon Simple Storage Service. S3 is a pretty basic concept – you put your stuff on their super-duper servers, and when people need it they will get it really quickly. Things that don’t change, like images and even some scripts, can live there and your server doesn’t have to worry about them.

This is where W3 Total Cache earned my donation to their cause. You see, you can sign up for Amazon S3, and then put your account info into the proper W3TC panel and Bob’s Your Uncle. W3TC goes through your site, finds images and whatnot, puts them in your S3 bucket, and automatically changes all the links in your Web pages to point to your bucket instead of your own server. (Sometimes I find I have to copy the image to my S3 bucket manually, but that’s a small price to pay.)

Now a lot of the stuff on my blog, like the picture of me with the Utahraptors the other day, sits on a different, high-performance server out there somewhere, and no matter how overwhelmed my server happens to be at the moment those parts will arrive to you lickety-split. Amazon S3 is not free, however – each month I get an invoice for two or three cents. Should Muddled Ramblings suddenly become wildly popular, that number would increase.

About that server – the next stop on my quest for a good host was a place called Green Geeks. I wanted to upgrade to a VPS, which means I get a dedicated slice of a server that acted just like it was my very own machine. There is a lot to like about those, but my blog just wouldn’t run in the base level of RAM they offered. I upgraded and reorganized so that different requests would not take up more ram than they needed. Still, I had outages. Sometimes the server would just stop freeing up memory and eventually choke and die. Since it was a virtual server in a standard configuration, logic says it was caused by something I was doing, but all my efforts to figure it out were fruitless, and Green Geeks ran out of patience trying to help me figure it out.

The server software itself is Apache. At this point I considered using nginx (rhymes with ‘bingin’ ex’) instead. It’s supposedly faster, lighter, and easier to configure. But, I already know Apache. I may move to nginx in the future, but it’s not urgent anymore.

During the GreenGeeks era I came across another service that improves the performance of Web sites while reducing the load on the servers. I recently wrote glowingly about CloudFlare, but I will repeat myself a bit here for completeness. CloudFlare is a service that has a network of servers all over the world, and they stand between you the viewer and my server. They stash bits of my site all around the world, and much of the time they will have a copy of what you need on hand, and won’t even need to trouble my server with a request. About half of all requests to muddledramblings.com are magically and speedily taken care of without troubling my server at all. They also block a couple thousand bogus requests to my server each day, so I don’t have to deal with them (or pay for the bandwidth). It’s sweet, and the base service is free.

Unfortunately, it was not enough to keep my GreenGeeks server from crashing. Once more I began a search for a new host. I found through word of mouth a place called macminicolo. Apple employees get a discount, but I wasn’t an Apple employee yet. It was still a bargain. For what turned out to be the same monthly cost of sharing part of a machine at GreenGeeks, I get an entire server, all to myself, with plenty of RAM. I’ve set up several servers on Mac using MacPorts, and I knew just how to get things up and running well. It costs less than half what a co-located server costs anywhere else I have found (Mac, Windows, or Linux). (Co-location has up-front costs, but in the long term saves money.) So I have that going for me.

The only thing missing is that at GreenGeeks I had a fancy control panel that made it much simpler to share the machine with my friends. I do miss that, but I’m ready now to host friend and family sites at a very reasonable cost.

So there you have it! This is just your typical Apache/WordPress/W3 Total Cache/Amazon S3/CloudFlare site run off a Mac mini located somewhere in Nevada. Load times are less than 5% of what they were a year ago. Five percent! Conservatively. Typically it’s more like 1/50th of the load time. Traffic is up. Life is good.

Now I have no incentive at all to learn more about optimization.

2

Things I Learned while Moving to a new Web Host

You probably can’t tell, but this site is now being served by a different host. The reasons I switched were many, but once MMHosting got hacked I decided it was time to move. Then when a particular PHP library was not on their servers (one that allows WordPress to read the date of an uploaded image), I actually did the move.

After quite a bit of looking around in which all the dang hosts started to look the same, I chose iPage. They are not quite the cheapest, but they purchase carbon offsets for wind-generated power. They also had a stronger emphasis on security.

At this writing, I still don’t know if my new host has the needed php library. I’ll be finding out when the dust settles. If not, I can install it in my site myself, I suppose, but let’s keep fingers crossed.

So, I learned a few things, and remembered a few others.

  1. Among sftp clients, RBrowser may have my favorite interface but it’s glacially slow with multiple files.
  2. Without ssh access to my site (none of the big hosting companies allow that on their cheap plans), I had to use phpMyAdmin to copy my databases over. Here’s an interesting bit of trivia: if your data has the phrase ‘drop database’ anywhere in it, phpMyAdmin will stop executing the import right there and then. This is to protect you from SQL injection attacks, where people sneak malicious data into your database that later gets executed as an instruction. ‘Drop database’ can be pretty devastating, so the software simply refuses to complete the import, even if the phrase is safe in the text of a post.

    The way phpMyAdmin is configured at my new host, however, when it stops, it doesn’t say why. It doesn’t even admit that anything went wrong, or indicate in any way that not all the data was imported. This can be inconvenient when you have a bulletin board for a product that has a drag-and-drop database feature. (Now it has a drag-and-drop database.)

  3. You can’t tell Safari not to uncompress zip archives it downloads (that I could find), but the original zip files can be found in the trash.
  4. jerssoft_phpb5 and jerssoft_phpbb5 are not the same thing, no matter how many hours you spend banging your head on them.

So iPage has been great (although their control panel is not completely Safari-friendly when it comes to processing payments). I’ve interacted with them in three different ways now — phone, chat, and a support ticket. Two of the interactions were due to the afore-mentioned payment glitch, and once for technical support trying to get my files copied and the site ready to go before I switched the domain registry. Although I didn’t get the answers I was hoping for, the tech was competent and knew what she was doing.

If I continue to be pleased with iPage, I will provide a link for those looking for a Web host. Because we all need Web hosts these days, don’t we?

Appreciating Fonts

The look of this blog when viewed on a Windows machine has always subtly annoyed me. I’ve been using the default font setup for WordPress, which uses Lucida Grande first, and if that is not available it uses Verdana. Verdana to me looks, I don’t know, thin or stretched or something. Loose. Unfortunately most Windows boxes don’t come with Lucida Grande, so Verdana is what most people experience. Today I decided to do something about it.

It’s possible now to tell a broswer to load a font from the Web when displaying a particular page. I could quite easily put @font-face directives in my files, load copies of Lucida Grande onto the server, and I’d be done (except for Internet Explorer, and those people can get by with Verdana). Unfortunately, although technically pretty simple, that course of action would not be legal.

There’s a font on Windows called Lucida Sans Unicode (or something like that) which is very similar to Lucida, but is not nearly as good for italics and bold face. This will be my fall-back solution.

For a while today, however, I thought I might go look for a new font, something that caught the spirit of this blog, yet was easy to read on a screen and had a nice ink density. On top of that, it had to be free or at least reasonably priced, and it had to include good italic and bold versions, and it had to include the wacky Czech diacriticals for those few episodes where I use them, plus the full range of punctuation including a variety of dashes, copyright symbols, and stuff like that.

I came up empty. Making a good font is not at all simple, and the people who make the great ones quite understandably want to be paid for their work. If I found one that measured up to Lucida Grande in usefulness and that would give this site a unique feel, I might be tempted to pony up.

The closest thing I could find was a font called Liberation, which is a favorite in the Linux world. At this writing, those without Lucida Grande will see that font (unless you’re using Internet Explorer). It’s OK, but the text is actually a little smaller for the same font size. That certainly is annoying. I haven’t looked at the text on enough different screens to know for sure, but I think right now the lettering is too small.

How’s it looking for you, my windows-using readers? Do you have any favorite fonts? I think with screen resolutions improving, it’s even possible to consider a serifed font these days.

Drupal and WordPress

There is a lot of talk at Drupalcon about how they stack up against the competition. We are in the weeding-out phase of the Web Content Management System market, when most of the myriad contenders will fall by the wayside. Those who make a living building and using Drupal naturally want their platform to be among the survivors.

Drupal, according to their own assessment, powers about 1% of the Work Wide Web. The Drupalistas estimate that WordPress accounts for just north of 8%. There is another system called Joomla that is roughly even with Drupal. These three look to be the survivors in the Great-Web-site-in-a-box sweepstakes.

Honestly, I was a little surprised that Drupal considered WordPress to be a competitor. Sure, they both want to be used for more and more of the Web, but does Lego consider Tonka to be a competitor? Here’s the deal: Drupal says WordPress is the most popular Content Management System (CMS). I say WordPress is not a CMS at all.

That’s not to say WordPress isn’t a fine tool, in fact, this blog uses WordPress. But would I use WordPress for my current paying gig? No. Honestly I dread the day when WordPress becomes a big, fancy CMS like Drupal. That’s not what it’s for. There is a reason WordPress is the big dog, and it’s not because you can build sophisticated Web applications with it, it’s because you can install WordPress, find a nice skin, and get your stuff on the Web in an attractive and intuitive way. WordPress is a publishing platform, and a pretty good one at that.

Drupal, on the other hand, begins to shine at the next level up in terms of sophistication. It is the Lego to WordPress’ Tonka. There is considerably more design up front, and much critical functionality must be added as external modules that don’t come with the main (“core”) code. (Some of these things will be added to core in Drupal 7.) Maintenance of a Drupal site is more labor-intensive as well, as updating the parts is more complicated than with WordPress.

In exchange for the added complexity, you get a lot more flexibility. That’s not to say that WordPress can’t be used to make sophisticated Web sites, but generally speaking WordPress is optimized in putting a defined sort of information (like blog posts) on the screen in a very flexible way. There are hundreds of ways to add other pre-defined data types (for instance, there are shopping cart plugins), and all that works really well and most people are going to be happy with that.

Drupal is the tool you use when the data types in WordPress won’t do it for you. In Drupal you are building a Web application, where with WordPress you are using a Web application. Step one in building a site with Drupal is designing the data and the relationships between the various data types. Drupal allows you to design your data without having to design your database. Some of the ways Drupal implements your data design are pretty hokey, but it works. You can create pretty sophisticated data models without knowing a thing about how a database works – or even what kind of database you’re using. (In fact, you’re better off not knowing how the data is structured, because things can move unexpectedly as you tweak your design.) You are also presented with an almost dizzying set of options to decide who is allowed to see, edit, or create each little piece of each data type you define.

Once you get your content types defined, then you can move on to how to get actual content into the system (handled pretty much automatically), and how to present specific subsets of your data on the screen. To get at the data one often uses views, which are built using a tool that generates (frustratingly limited) database queries and then processes the results with a gratifying set of options tailored to each data type.

Then it comes time to put stuff on the screen. To control where things go and when, there are regions, blocks, panels, panes, pages, and so forth in a nonintuitive overlapping of roles. Blocks and regions and pages are built in, but the profusion of other options is a testament to the limited way they work together. For all the flexibility of Drupal, GUI building is still clumsy, though getting better (I’m told).

At last we come to the task of making the output pretty. For this purpose Drupal uses a maze of performance-sucking php template files that are invoked using a system of names that allows one to set up the display of information at just about any level of granularity. Many of these templates go hand-in-hand with specially-named preprocessor functions that allow you to customize how data is prepared for presentation.

Drupal separated the preparation of data and the presentation of data to allow people with different skill sets to do the different tasks. The template files can be done with only a minimal amount of php, while the preprocessors are where the real logic is implemented, unencumbered by HTML and CSS. This also has the effect of putting the risky code out of reach of those who aren’t expert in Web security. All good things.

I used the phrase “performance-sucking” above, and I meant it. The designers of Drupal made a conscious decision to emphasize good architecture and flexibility over fast execution. This was the same decision Google faced a few years back, as they developed ever-more-sophisticated pattern matching algorithms. While competitors kept things simple to reduce server load, the folks at Google decided that the cost of processing cycles and storage was tending toward free, and chose to emphasize the quality of the information they provided instead. Similarly, Drupal has decided to make things in a structurally sound way and spend the processor cycles and disk reads necessary to support that.

Drupal 7 will be even slower, but will be more scalable (they say). What that means is that although the software is not as fast as it could be, its behavior is predictable as demand increases, and it is easer to scale up your site as things go huge. Good structure pays greater and greater dividends as things get bigger.

All that stuff Drupal has makes it a more complicated to get up and running, and for a simple site (or even one of moderate complexity but with a relatively straightforward data model), WordPress is going to get you to the promised land with a lot less pain.

I am led to believe that the WordPress community feels it needs to compete with Drupal just as much as Drupal thinks they need to compete with WordPress. Toward this end WordPress 3.0 will have new features that answer some of Drupal’s flexibility advantages. All I can say is “PLEASE, WordPress, don’t try to be everything Drupal is.” That WordPress is not everything Drupal is constitutes its greatest advantage. Stay with your market, WordPress!

New Features here at MR&HBI!

First, allow me to call your attention to the episode immediately before this one. You might notice the little icon is a camera. “huh,” you might be saying to yourself, “I don’t remember seeing that one before.” Very observant, Buckaroo! It’s for a new category, Photography, that I added. “But,” the even more observant amongst you might say, “There are already a handful of episodes in that category.” Right again, Wisenstein! I recategorized a couple episodes that were under The Great Adventure and found a couple in Idle Chit-Chat that were better filed under the new category. I expect there are plenty more; the trick is finding them.

The icon is actually my camera sitting on an opened unabridged dictionary. That may seem staged, but that’s actually where we keep the camera these days. Yes, we have an unabridged dictionary open on a stand at all times. No, that does not make us geeks.

Second, way down at the bottom of the sidebar, there’s a section called Other Muddled Stats (or something like that). That’s a wordpress widget I made that counts all the words in all the episodes, and keeps a tally of how many comments there have been as well. I plan to add other stats as well next time I have the hood open. Perhaps the number of times I’ve said “You don’t have to thank me,” or the number of times I’ve blamed the Chinese for things. (Hm… haven’t done that in a while…) Anything you’d like to know? The number of letters typed? Words in comments? Most prolific commenters? If it’s on these pages, I can count it.

The WordPress plugin itself is hand-crafted by yours truly. I started by downloading a different word-counting plugin, but it counted the words on every page load and didn’t have a sidebar widget. All it was was a database query and a loop. My version only counts when the relevant value changes – it only counts words when a new episode is posted, for instance. Once I tidy it up I’ll be adding it to the WordPress repository, so others can also gather useless stats about their blogs. It’s all about sharing the love.

Haloscan comments to WordPress – the nitty gritty.

As I mentioned in the previous episode, I recently had to move more than 8000 comments from my old comment system, Haloscan, and import them into WordPress. Haloscan served me well back in the day, but they are going away, and all my more recent comments are in the WordPress system anyway. Nice to have them all in one place.

The process turned out to be pretty easy. I found a script for importing comments from a different system, modified it, modified it some more, found a fundamental problem with it, fixed that, and in the end not much of code remained from the example, except the part where the WordPress logo is displayed on the screen. I assume that part came from the code the guy copied to make the code that I copied.

Along the way I learned a couple of things. PHP is a pretty flexible language, but running a loop that sets up 8500 data structures and runs 25500 database queries exposes PHP’s primary weakness: memory management. The whiz kids who invented PHP designed it for a load/compile/execute/exit-and-clean-up flow. Memory allocated during execution is cleaned up when the program is done running (usually when the Web page is delivered). When you try to do heavy lifting with PHP, you have to start paying attention to getting your memory back before the traditional clean-up time.

The code I started with did a direct database query to add the comment to the comments table, but that got things out of sync with other tables. (The posts table keeps track of the number of comments that apply to it, presumably for performance reasons.) I dug into the core WordPress code and found the method they call to post comments, and I made my code call that function. I have no idea what all the bookkeeping chores are that function does, and really I don’t care as long as they get done.

I didn’t worry about performance too much at first (after all, it only has to run once), but one of the database queries I did was really expensive (scanning all the posts for a specific set of characters). Even running on my local server it was slow, and I knew that if I tried something like that on my actual Web host alarms would go off and they’d shut me down for a while. I did a little optimization on that front, and it was enough.

The following script has some Muddle-specific code in it, but it might come in handy for others who need to move Haloscan comments to a new system. The part that parses Haloscan XML is pretty generic and would work for anyone, the part that saves the comments might be useful as a guide as well. The main difference others will have to deal with is where to get proper post_id based on the thread field in the XML. In my case I had a link in each blog episode back to the Haloscan thread.

The HTML bit in the middle of the file is not essential; but it puts a nice WordPress logo on the screen when the script starts up. I inherited that from the script I started with.

NOTE: While this script has code in it specific to me, I am available to customize it for others who need to move their code from Haloscan into another environment, or, for that matter, from any structured source into WordPress. Drop me a line!

<?php
 
if (!file_exists('../wp-config.php')) die("There doesn't seem to be a wp-config.php file. You must install WordPress before you import any comments.");
require('../wp-config.php');
 
function saveCommentToWP($comment, $dbRef, &$postThreads) {
    //echo "here's where the comment save happens <br/><br />";
    $thread = $comment['thread'];
    $postID = $postThreads[$thread];
    if (!isset($postThreads[$thread])) {
        $query = "SELECT * FROM wp_posts WHERE post_content LIKE '%".$thread."%' AND post_status='publish'";
        $postID = $dbRef->get_var($query, 0);
        $postThreads[$thread] = $postID ? $postID : 0;
        if ($postThreads[$thread] == 0)
            echo ("<br />Thread $thread has no post!");
        else
            echo "<br />Thread $thread";
        flush();       // got to have real-time updates!
    }
 
    if ($postID && $postID != 0) {
        $userId = $comment['email'] == '[email protected]' ? 1 : 0;
 
        //set up the data the way wp_insert_comment expects it.
        $wp_commentData = array();
        $wp_commentData['comment_post_ID'] = (int) $postID;
        $wp_commentData['user_id'] = (int) $userId;
        $wp_commentData['comment_parent'] = 0;
        $wp_commentData['comment_author_IP'] = $comment['ip'];
        $wp_commentData['comment_agent'] = 'Haloscan';
        $wp_commentData['comment_date'] = $comment['datetime'];
        $wp_commentData['comment_date_gmt'] = $comment['datetime'];
        $wp_commentData['comment_approved'] = '1';
        $wp_commentData['comment_content'] = $comment['text'];
        $wp_commentData['comment_author'] = $comment['name'];
        $wp_commentData['comment_author_email'] = $comment['email'];
        $wp_commentData = wp_filter_comment($wp_commentData);
 
        $comment_ID = wp_insert_comment($wp_commentData);
 
        //echo ("<strong>saved comment $comment_ID</strong>");
    }
 
    // try to reclaim some memory
    unset($wp_commentData);
    unset($comment);
}
 
header( 'Content-Type: text/html; charset=utf-8' );
?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<title>WordPress &rsaquo; Import Comments from RSS</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<style media="screen" type="text/css">
    body {
        font-family: Georgia, "Times New Roman", Times, serif;
        margin-left: 20%;
        margin-right: 20%;
    }
    #logo {
        margin: 0;
        padding: 0;
        background-image: url(http://wordpress.org/images/logo.png);
        background-repeat: no-repeat;
        height: 60px;
        border-bottom: 4px solid #333;
    }
    #logo a {
        display: block;
        text-decoration: none;
        text-indent: -100em;
        height: 60px;
    }
    p {
        line-height: 140%;
    }
    </style>
</head><body> 
<h1 id="logo"><a href="http://wordpress.org/">WordPress</a></h1> 
 
<?php
 
// Bring in the data
$reader = new XMLReader();
if ($reader->open('export-8.xml')) {
    $postThreads = array();
    $thread = '';
    while ($reader->read()) {
        //echo "<br />read node type: ".$reader->nodeType.';     '.$reader->name.': '.$reader->value;
        if ($reader->nodeType == XMLReader::ELEMENT && $reader->name == 'thread') {
            $thread = $reader->getAttribute('id');
        }
        if ($thread) {
            if ($reader->nodeType == XMLReader::ELEMENT && $reader->name == 'comment') {
                // begin building comment
                $comment = array('thread' => $thread);
                $reader->read();
                while ( !($reader->nodeType == XMLReader::END_ELEMENT && $reader->name == 'comment') ) {
                    if ($reader->nodeType == XMLReader::ELEMENT) {
                        $property = $reader->name;
                        $reader->read(); // assumes text element following element tag has the data
                        $comment[$property] = $reader->value;
                    }
                    $reader->read();
                }
                saveCommentToWP($comment, $wpdb, $postThreads);
            }
        }
    }
    $reader->close();
}
 
?>
 
 
</body>
</html>

1

In with the Old

I got a message today that Haloscan is closing down. That is the service that provided refreshingly spam-free comments on my old blog. A year ago I finally abandoned iBlog for WordPress, and I’m glad I did. At the time, however, I didn’t want to tackle moving the old comments over into the new system. In my conversion I embedded a link into each of the old episodes to the legacy comment system, and left it at that.

It is fortunate I found out about Haloscan when I did. Another week and 8500 comments would have been lost forever. That’s a big part of the underlayer of this blog, the part people sink gradually into as they hang around more, and they realize that this isn’t just about me. There are some pretty interesting conversations, observations, poems, and even stories in those comments. With the timer running I set to work to get the comments out of Haloscan and into WordPress.

The move turned out to be pretty straightforward. (Simpler, perhaps, than it had been to put the links into the posts.) I’ll go into the technical details in an episode tomorrow, but for now, why don’t you pop into the archives for 2004 or so and find an old episode with good comments? Maybe you’ll find something interesting someone said once. Maybe you’ll see the name of someone you haven’t thought of in a while. Maybe you’ll see something you want to comment on, even.

Why Mazda Should Pay Me To Go On Road Trips

Actually, this episode is here to allow me to play with different gallery plugins for WordPress. There are quite a few and so what happens when you click one of the thumbnails below may change dramatically at any moment.

For test photos I went back through my archives and grabbed a few with a common theme, which turned out to be pictures of the Miata during my epic road trip. Hey, Mazda? If you’re watching, I can sell the Miata lifestyle for you, this time with a redhead in the passenger seat. The open road. The byways of North America. People. Adventure. Wind. Freedom. Marketing gold, baby.

Progress Update: A couple of the lightbox options look pretty sweet, but there are none that I found with an option to fit the images to the user’s browser window. Strange. I looked at the source code for one of them and it even uses the size of the page in some calculations. Still, I like letting people see the full-size versions of the images without leaving and having to click the back button, so some type of lightbox plugin will likely remain.

1