Haloscan comments to WordPress – the nitty gritty.

As I mentioned in the previous episode, I recently had to move more than 8000 comments from my old comment system, Haloscan, into WordPress. Haloscan served me well back in the day, but it is going away, and all my more recent comments are in the WordPress system anyway. It’s nice to have them all in one place.

The process turned out to be pretty easy. I found a script for importing comments from a different system, modified it, modified it some more, found a fundamental problem with it, fixed that, and in the end not much of the original code remained, except the part that displays the WordPress logo on the screen. I assume that part came from the code the guy copied to write the code that I copied.

Along the way I learned a couple of things. PHP is a pretty flexible language, but running a loop that sets up 8500 data structures and runs 25500 database queries exposes PHP’s primary weakness: memory management. The whiz kids who invented PHP designed it for a load/compile/execute/exit-and-clean-up flow. Memory allocated during execution is cleaned up when the program is done running (usually when the Web page is delivered). When you try to do heavy lifting with PHP, you have to start paying attention to getting your memory back before the traditional clean-up time.
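PHP frees a value as soon as nothing refers to it any more (reference counting, plus a cycle collector for circular references), so in a long loop the practical move is to stop holding on to data you are finished with. The sketch below compares two loops with `memory_get_usage()`: one keeps every record in an array, the other drops each record before building the next. The record shape and the `makeRecord()` helper are hypothetical stand-ins for a parsed comment, not code from the script.

```php
<?php
// Compare memory growth when a loop keeps every record vs. lets each one go.
// makeRecord() is a hypothetical stand-in for a parsed Haloscan comment.

function makeRecord(int $i): array
{
    return [
        'thread' => 'thread-' . ($i % 500),
        'name'   => "commenter $i",
        'text'   => str_repeat('x', 200), // some bulk, like a comment body
    ];
}

// Holds all $n records in one array, like building everything up front.
function growthWhenKeepingAll(int $n): int
{
    $before = memory_get_usage();
    $all = [];
    for ($i = 0; $i < $n; $i++) {
        $all[] = makeRecord($i);
    }
    $growth = memory_get_usage() - $before;
    unset($all); // release everything (also happens when the function returns)
    return $growth;
}

// Handles one record at a time and drops it before building the next.
function peakGrowthWhenStreaming(int $n): int
{
    $before = memory_get_usage();
    $peak = 0;
    for ($i = 0; $i < $n; $i++) {
        $record = makeRecord($i);
        // ... save $record here ...
        $peak = max($peak, memory_get_usage() - $before);
        unset($record); // nothing refers to the record now, so its memory is freed
    }
    return $peak;
}

$kept = growthWhenKeepingAll(8500);
$streamed = peakGrowthWhenStreaming(8500);
printf("keep all: %d bytes, one at a time: at most %d bytes\n", $kept, $streamed);
```

The exact byte counts depend on the PHP version, but the first figure grows with the number of records while the second stays roughly the size of one record.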

The code I started with used a direct database query to add each comment to the comments table, but that got things out of sync with other tables. (Each row in the posts table keeps track of how many comments the post has, presumably for performance reasons.) I dug into the core WordPress code, found the function WordPress itself calls to save a comment, wp_insert_comment(), and made my code call that instead. I don’t know every bookkeeping chore that function takes care of, and really I don’t care, as long as they get done.
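To make the out-of-sync problem concrete, here is a toy model in plain PHP. The `ToyBlog` class and its methods are made up for illustration and are not WordPress code: the posts side caches a comment count, a raw insert leaves that cache stale, and an API-style insert refreshes it. In WordPress itself, wp_insert_comment() does this by calling wp_update_comment_count() for approved comments, and wp_update_comment_count_now() can recount a post whose comments were inserted directly.

```php
<?php
// Toy model (not WordPress code): a posts "table" that caches a comment count,
// and a comments "table" that the count is supposed to describe.

final class ToyBlog
{
    /** @var array<int, int> post ID => cached comment_count */
    public array $commentCount = [];
    /** @var list<array{post_id: int, approved: bool}> */
    public array $comments = [];

    public function addPost(int $postId): void
    {
        $this->commentCount[$postId] = 0;
    }

    // Like a bare INSERT into the comments table: the cached count is never touched.
    public function rawInsertComment(int $postId, bool $approved = true): void
    {
        $this->comments[] = ['post_id' => $postId, 'approved' => $approved];
    }

    // Like going through an API function: insert, then refresh the cached count.
    public function insertComment(int $postId, bool $approved = true): void
    {
        $this->rawInsertComment($postId, $approved);
        $this->recount($postId);
    }

    // Recompute a post's count from the comments themselves (approved ones only).
    public function recount(int $postId): void
    {
        $n = 0;
        foreach ($this->comments as $c) {
            if ($c['post_id'] === $postId && $c['approved']) {
                $n++;
            }
        }
        $this->commentCount[$postId] = $n;
    }
}

$blog = new ToyBlog();
$blog->addPost(42);
$blog->rawInsertComment(42);
echo $blog->commentCount[42], "\n"; // 0: the cache no longer matches the comments
$blog->insertComment(42);
echo $blog->commentCount[42], "\n"; // 2: the recount also catches the earlier raw insert
```

The toy recount scans every comment, which is fine for illustration; the point is only that whoever inserts a comment also has to update the cached count.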

I didn’t worry much about performance at first (after all, the script only has to run once), but one of the database queries was really expensive: it searched the text of every post for a Haloscan thread ID. Even on my local server it was slow, and I knew that if I tried something like that on my actual Web host, alarms would go off and they’d shut me down for a while. A little optimization on that front (looking up each thread’s post only once and remembering the answer, instead of searching again for every comment) was enough.
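The idea behind the `$postThreads` array in the script can be sketched on its own as a small memoizing cache. In this sketch, the `ThreadPostCache` class and the lookup closure are hypothetical; the closure stands in for the LIKE query, and a counter shows how many times the expensive search actually runs.

```php
<?php
// Remember each thread's post ID so the expensive search runs once per thread,
// not once per comment. The lookup callable stands in for the LIKE query.

final class ThreadPostCache
{
    /** @var array<string, int> thread ID => post ID, 0 meaning "no post found" */
    private array $postIds = [];
    /** @var callable(string): ?int */
    private $lookup;
    /** Number of times the expensive lookup actually ran. */
    public int $lookups = 0;

    public function __construct(callable $lookup)
    {
        $this->lookup = $lookup;
    }

    public function postIdFor(string $thread): int
    {
        // Store 0 instead of null for a miss: isset() is false for null, so a
        // null would make every later comment in that thread search again.
        if (!isset($this->postIds[$thread])) {
            $this->lookups++;
            $this->postIds[$thread] = ($this->lookup)($thread) ?? 0;
        }
        return $this->postIds[$thread];
    }
}

// Hypothetical stand-in data: post ID => post text.
$posts = [101 => 'see Haloscan thread 7', 102 => 'Haloscan thread 9 is here'];
$cache = new ThreadPostCache(function (string $thread) use ($posts): ?int {
    foreach ($posts as $id => $text) {
        if (strpos($text, "thread $thread") !== false) {
            return $id;
        }
    }
    return null;
});

foreach (['7', '7', '9', '12', '12'] as $thread) {
    echo $thread, ' => ', $cache->postIdFor($thread), "\n";
}
echo "lookups: {$cache->lookups}\n"; // 3: threads 7, 9 and 12 are each searched once
```

Storing 0 for "no post" matches what the script does with `$postID ? $postID : 0`, and it is what keeps a thread without a post from triggering a new search for every one of its comments.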

The following script has some Muddle-specific code in it, but it might come in handy for others who need to move Haloscan comments to a new system. The part that parses the Haloscan XML is pretty generic and should work for anyone; the part that saves the comments might be useful as a guide as well. The main thing others will have to work out is how to find the right post_id for each thread ID in the XML. In my case, each blog episode had a link back to its Haloscan thread, so the script searches the post text for the thread ID.
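Here is the generic parsing part pulled out into a standalone function that can be tried without WordPress. It is a sketch, not the script itself: the element names (`thread` with an `id` attribute, `comment`, `datetime`, `name`, `email`, `ip`, `text`) follow the script's parsing loop, while the `<export>` root element and the sample values are assumptions. Using `readString()` and tracking `depth` also copes with an empty tag such as `<email/>`.

```php
<?php
// Parse Haloscan-style export XML into one flat array per comment.
// Element names follow the script's parsing loop; the <export> root
// and the sample values below are made up.

function parseHaloscanComments(string $xml): array
{
    $reader = new XMLReader();
    if (!$reader->XML($xml)) {
        throw new RuntimeException('Could not load the XML');
    }
    $comments = [];
    $thread = '';
    while ($reader->read()) {
        if ($reader->nodeType !== XMLReader::ELEMENT) {
            continue;
        }
        if ($reader->name === 'thread') {
            $thread = (string) $reader->getAttribute('id');
        } elseif ($reader->name === 'comment' && $thread !== '' && !$reader->isEmptyElement) {
            $comment = ['thread' => $thread];
            $depth = $reader->depth;
            // Read until we climb back out of <comment>; direct children become fields.
            while ($reader->read() && $reader->depth > $depth) {
                if ($reader->nodeType === XMLReader::ELEMENT && $reader->depth === $depth + 1) {
                    // readString() gives the element's text, or '' for an empty tag like <email/>
                    $comment[$reader->name] = $reader->readString();
                }
            }
            $comments[] = $comment;
        }
    }
    $reader->close();
    return $comments;
}

$sample = <<<XML
<export>
  <thread id="123">
    <comment>
      <datetime>2009-06-01 12:34:56</datetime>
      <name>Bonnie</name>
      <email/>
      <ip>10.0.0.1</ip>
      <text>Nice post!</text>
    </comment>
  </thread>
</export>
XML;

foreach (parseHaloscanComments($sample) as $c) {
    echo "{$c['thread']}: {$c['name']} says {$c['text']}\n";
}
```

For a real export, `$reader->open('export-8.xml')` reads from a file instead of a string, and the reader streams the document, so even a large export is never loaded into memory all at once.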

The HTML bit in the middle of the file is not essential, but it puts a nice WordPress logo on the screen when the script starts up. I inherited that from the script I started with.

NOTE: While this script has code in it specific to me, I am available to customize it for others who need to move their comments from Haloscan into another environment, or, for that matter, from any structured source into WordPress. Drop me a line!

<?php
 
if (!file_exists('../wp-config.php')) die("There doesn't seem to be a wp-config.php file. You must install WordPress before you import any comments.");
require('../wp-config.php');
 
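function saveCommentToWP($comment, $dbRef, &$postThreads) {
    //echo "here's where the comment save happens <br/><br />";
    $thread = $comment['thread'];

    // Search for a thread's post only the first time the thread comes up.
    // A miss is stored as 0 rather than null so isset() treats it as already looked up.
    if (!isset($postThreads[$thread])) {
        $query = $dbRef->prepare(
            "SELECT ID FROM {$dbRef->posts} WHERE post_content LIKE %s AND post_status = 'publish' LIMIT 1",
            '%' . $dbRef->esc_like($thread) . '%'
        );
        $postID = $dbRef->get_var($query);
        $postThreads[$thread] = $postID ? (int) $postID : 0;
        if ($postThreads[$thread] == 0)
            echo ("<br />Thread $thread has no post!");
        else
            echo "<br />Thread $thread";
        flush();       // got to have real-time updates!
    }
    $postID = $postThreads[$thread];

    if ($postID) {
        // Fill in fields a Haloscan comment may lack, to avoid undefined-index warnings.
        $comment += array('name' => '', 'email' => '', 'ip' => '');

        // Comments from this address (the blog owner's; replace with your own) are credited to WordPress user 1.
        $userId = $comment['email'] == 'you@example.com' ? 1 : 0;

        //set up the data the way wp_insert_comment expects it.
        $wp_commentData = array();
        $wp_commentData['comment_post_ID'] = (int) $postID;
        $wp_commentData['user_id'] = (int) $userId;
        $wp_commentData['comment_parent'] = 0;
        $wp_commentData['comment_author_IP'] = $comment['ip'];
        $wp_commentData['comment_agent'] = 'Haloscan';
        // The same timestamp is used for local and GMT time; convert one of them
        // if Haloscan's times are not in the blog's time zone.
        $wp_commentData['comment_date'] = $comment['datetime'];
        $wp_commentData['comment_date_gmt'] = $comment['datetime'];
        $wp_commentData['comment_approved'] = '1';
        $wp_commentData['comment_content'] = $comment['text'];
        $wp_commentData['comment_author'] = $comment['name'];
        $wp_commentData['comment_author_email'] = $comment['email'];
        $wp_commentData['comment_author_url'] = '';  // wp_filter_comment() reads this key
        $wp_commentData = wp_filter_comment($wp_commentData);

        // wp_insert_comment() also does WordPress's bookkeeping, such as the post's comment count.
        $comment_ID = wp_insert_comment($wp_commentData);

        //echo ("<strong>saved comment $comment_ID</strong>");
    }

    // Free the per-comment arrays (PHP would also release these locals on return).
    unset($wp_commentData);
    unset($comment);
}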
 
header( 'Content-Type: text/html; charset=utf-8' );
?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>WordPress &rsaquo; Import Comments from Haloscan</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<style media="screen" type="text/css">
    body {
        font-family: Georgia, "Times New Roman", Times, serif;
        margin-left: 20%;
        margin-right: 20%;
    }
    #logo {
        margin: 0;
        padding: 0;
        background-image: url(http://wordpress.org/images/logo.png);
        background-repeat: no-repeat;
        height: 60px;
        border-bottom: 4px solid #333;
    }
    #logo a {
        display: block;
        text-decoration: none;
        text-indent: -100em;
        height: 60px;
    }
    p {
        line-height: 140%;
    }
    </style>
</head><body> 
<h1 id="logo"><a href="http://wordpress.org/">WordPress</a></h1> 
 
<?php
 
// Bring in the data
$reader = new XMLReader();
if ($reader->open('export-8.xml')) {
    $postThreads = array();
    $thread = '';
    while ($reader->read()) {
        //echo "<br />read node type: ".$reader->nodeType.';     '.$reader->name.': '.$reader->value;
        if ($reader->nodeType == XMLReader::ELEMENT && $reader->name == 'thread') {
            $thread = $reader->getAttribute('id');
        }
        if ($thread) {
            if ($reader->nodeType == XMLReader::ELEMENT && $reader->name == 'comment') {
                // begin building comment
                $comment = array('thread' => $thread);
                $ok = $reader->read();
                // stop at </comment>, or at the end of the file if the XML is truncated
                while ($ok && !($reader->nodeType == XMLReader::END_ELEMENT && $reader->name == 'comment')) {
                    if ($reader->nodeType == XMLReader::ELEMENT) {
                        // readString() returns the element's text, or '' for an empty tag like <email/>
                        $comment[$reader->name] = $reader->readString();
                        $ok = $reader->next();   // skip to the element's next sibling
                    } else {
                        $ok = $reader->read();
                    }
                }
                saveCommentToWP($comment, $wpdb, $postThreads);
            }
        }
    }
    $reader->close();
} else {
    echo "<p>Could not open export-8.xml.</p>";
}
 
?>
 
 
</body>
</html>
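The script finds each thread’s post with a LIKE search, which costs one scan of the posts per thread and can also match a thread ID that sits inside a longer one (a search for 12 also finds 123). If your posts link to Haloscan in a consistent format, a different approach, not the one the script uses, is to read every post once (for example with a single `$wpdb->get_results()` call selecting ID and post_content) and pull the thread IDs out with a regular expression. The `HALOSCAN_LINK` pattern, the `buildThreadMap()` helper, and the sample posts below are hypothetical; match the pattern to the links your own posts contain.

```php
<?php
// Build the whole thread => post ID map in one pass over the posts,
// instead of running one LIKE search per thread.
// HALOSCAN_LINK is a hypothetical pattern; match it to your own posts' links.

const HALOSCAN_LINK = '#haloscan\.com/comments/[^/"\']+/(\w+)#i';

/**
 * @param array<int, string> $postContents post ID => post_content
 * @return array<string, int> thread ID => ID of the first post that links to it
 */
function buildThreadMap(array $postContents, string $pattern = HALOSCAN_LINK): array
{
    $map = [];
    foreach ($postContents as $postId => $content) {
        if (preg_match_all($pattern, $content, $matches)) {
            foreach ($matches[1] as $thread) {
                if (!isset($map[$thread])) { // first post wins
                    $map[$thread] = $postId;
                }
            }
        }
    }
    return $map;
}

// Made-up post bodies for illustration.
$posts = [
    10 => '<a href="http://www.haloscan.com/comments/muddle/t1001/">Comments</a>',
    11 => 'A post with no Haloscan link.',
    12 => '<a href="http://www.haloscan.com/comments/muddle/t1002/">Comments</a>',
];
$threadToPost = buildThreadMap($posts);
foreach ($threadToPost as $thread => $postId) {
    echo "$thread -> $postId\n";
}
```

Each comment then needs only an array lookup such as `$threadToPost[$thread] ?? 0`, and because the whole ID is captured, thread 12 can no longer be mistaken for thread 123.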


6 thoughts on “Haloscan comments to WordPress – the nitty gritty.”

  1. This is probably too geeky for Bonnie at Frogma, who is lamenting that she has been unable to move her five years’ worth of Haloscan comments over to Blogger. When you offered to assist people in implementing the transfer, did you mean anyone, or just anyone who already has somewhat of a handle on the geeky stuff?

    • I would be happy to help. The first thing is for her to go to Haloscan and export her comments before the server shuts down. Once those are safely on her system and backed up somewhere else as well, getting them into Blogger will be pretty straightforward (I think). As with mine, the tricky part is getting them connected to the right Blogger posts, but if her Blogger posts already have links to her Haloscan comments, that won’t be too hard.

  2. Hi Jerry,
    I have all the comments from (a category of) a WP.mu blog in an .xml file, and I have your script. The only things I could figure out to change were the file name and the email address in your script. I’m not exactly seeing anything like comments show up. Heh. Any tips?

    Thanks,
    Cathy

    • If your comments were exported from WordPress, I don’t think you need a script to import them into a different WP install. This script won’t do the trick in any case; the way it figures out which post gets which comment thread is pretty specific. It’s probably possible to alter it to work for your case, and I’d be happy to help, but there might be a better answer out there already.

    • I do the shadows in two different ways on this site. The main blog section and the sidebar use graphics, which look fine as long as the browser supports .png alpha channels, and just about all browsers do these days. The comments section uses CSS3 shadows and rounded corners. As long as one accepts that not all users will see the site in all its glory, there isn’t much downside to the CSS approach, though Safari and Firefox render the shadows subtly differently.
