Note: For those looking to move from iBlog 2 to WordPress, this article and some follow-up can be found at the iBlog survivors’ forum. The complete script is available there for download. You really don’t have to understand all this stuff.
I started using iBlog several years ago, when it was new and I was new to blogging. It had one advantage over other blogging packages: it came free with my .mac account back in the day and it worked on .mac servers, which are, to put it kindly, inflexible.
Two things have happened in the intervening years. First, all the blogging platforms have gotten much better, including the ability to work on the blog while offline. Second, iBlog made an abortive step forward to iBlog 2, which was a major improvement, but then the whole company stalled before that release was really finished (although by then I was fully committed to it). I will miss iBlog 2, but not as much as I will enjoy getting my stuff onto a faster, more versatile platform.
After a rather exhaustive search of blogging platforms and CMSs, I settled on WordPress. While it’s not perfect, it is a straightforward MySQL-Apache-PHP application that is easy to fiddle with, and some of the customizations I was looking for were much easier with WordPress than with others.
WordPress has a whole bunch of tools and instructions for importing your stuff from other blog systems. None of those did me much good at all, however, as iBlog was too obscure for anyone to worry about. After searching the Internet I found some helpful information, but it all applied to iBlog 1 – most people never made the move to the ill-fated upgrade. I was pretty much on my own.
WordPress can import data in a variety of formats, but it was up to me to get the data out of iBlog in a format WordPress could understand. The most versatile format was one created by the folks at WordPress, which could include information specific to WordPress. Cool! Decision made, I was on my way.
Except… the folks at WordPress have never bothered to document the structure of their files. Apparently it’s something they’ve been meaning to get around to eventually (though the people writing translation software for the other major blogging software have long since muddled through it). I did what everyone else has had to do to export data: copy one of WordPress’s files and fiddle with it until it works. Not only is this a pain in the patoot, there might be tags that don’t appear in my examples that could nonetheless be useful to me. Oh, well.
I needed my import file to include definitions of categories, and then each of the blog entries, with correct category associations. My example file had a lot of fields that seemed redundant for my purposes, but without documentation I wasn’t going to waste time trying to figure out which tags were required and which weren’t.
Here is a very small (one episode) export file. We’ll go into the details of things like nicename later:
<rss>
<channel>
<title>Muddled Ramblings and Half-Baked Ideas</title>
<link>http://jerssoftwarehut.com/muddled</link>
<description>blog!</description>
<pubDate>Thu, 28 Jun 2007 21:32:21 +0000</pubDate>
<generator>Jers Very Clever Script</generator>
<language>en</language>
<wp:wxr_version>1.0</wp:wxr_version>
<wp:base_site_url>http://jerssoftwarehut.com/muddled</wp:base_site_url>
<wp:base_blog_url>http://jerssoftwarehut.com/muddled</wp:base_blog_url>
<wp:category>
<wp:category_nicename>bars-of-the-world-tour</wp:category_nicename>
<wp:category_parent></wp:category_parent>
<wp:posts_private>0</wp:posts_private>
<wp:links_private>0</wp:links_private>
<wp:cat_name><![CDATA[Bars of the World Tour]]></wp:cat_name>
<wp:category_description><![CDATA[blah blah blah]]></wp:category_description>
</wp:category>
<item>
<title>Delayed by Weather</title>
<link></link>
<pubDate>2007-03-27 18:23:57</pubDate>
<dc:creator><![CDATA[Jerry]]></dc:creator>
<category><![CDATA[Bars of the World Tour]]></category>
<category domain="category" nicename="bars-of-the-world-tour"><![CDATA[Bars of the World Tour]]></category>
<content:encoded><![CDATA[<p>The Weather Channel is calling the roads around here "a big mess", so I'm going to take time out from driving and catch up on some writing. Unfortunately, TWC is also calling for dangerous surf and "rough bar conditions". I'd better leave the laptop in my room.</p>]]></content:encoded>
<excerpt:encoded><![CDATA[&nbsp;]]></excerpt:encoded>
<wp:post_id>1065</wp:post_id>
<wp:post_date>2007-03-27 18:23:57</wp:post_date>
<wp:post_date_gmt>2007-03-27 18:23:57</wp:post_date_gmt>
<wp:comment_status>open</wp:comment_status>
<wp:ping_status>open</wp:ping_status>
<wp:post_name>Delayed by Weather</wp:post_name>
<wp:status>publish</wp:status>
<wp:post_parent>0</wp:post_parent>
<wp:post_type>post</wp:post_type>
</item>
</channel>
</rss>
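For anyone who would rather prototype the file format outside AppleScript, here is a rough Python sketch of building one item like the one in the sample. It is not part of my script. The function names `cdata` and `build_item` are made up for illustration, it writes only a subset of the fields in the sample, and since the format is undocumented I can't promise the importer accepts a reduced set. The escaping of the title and the splitting of any "]]>" inside CDATA text are extra precautions the AppleScript version does not take.

```python
from xml.sax.saxutils import escape


def cdata(text: str) -> str:
    """Wrap text in a CDATA section.

    A literal ']]>' inside the text would end the section early, so it is
    split across two adjacent CDATA sections.
    """
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


def build_item(title: str, body: str, post_date: str,
               category: str, nicename: str) -> str:
    """Return one <item> block shaped like the sample export file (subset of fields)."""
    lines = [
        "<item>",
        f"<title>{escape(title)}</title>",  # escape &, <, > in plain-text fields
        "<link></link>",
        f"<pubDate>{post_date}</pubDate>",
        f"<dc:creator>{cdata('Jerry')}</dc:creator>",
        f'<category domain="category" nicename="{nicename}">{cdata(category)}</category>',
        f"<content:encoded>{cdata(body)}</content:encoded>",
        f"<wp:post_date>{post_date}</wp:post_date>",
        "<wp:status>publish</wp:status>",
        "<wp:post_type>post</wp:post_type>",
        "</item>",
    ]
    return "\n".join(lines)
```

Calling `build_item("Delayed by Weather", "<p>...</p>", "2007-03-27 18:23:57", "Bars of the World Tour", "bars-of-the-world-tour")` gives a block that can be dropped between the category definitions and the closing `</channel>`.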
But how to create the file? The data for iBlog 2 is distributed over (literally) thousands of files. Writing a program to track down all the information and make sense of it would be a major chore. That’s where AppleScript came in. iBlog’s programmer took the time to provide access to the iBlog data through the Apple Scripting system. I was able to let iBlog read all of its silly scattered files and make sense of them, then provide the data to me in a coherent fashion. So far, so good. All I needed to do was loop through all the episodes, pull out the data I needed, and shovel it into a text file that WordPress could read.
[IMPORTANT NOTE: I’ve tried to go back and reconstruct the scripts as they were at the appropriate stage in development, but the snippets are untested.]
[ALSO IMPORTANT: you don’t really have to understand the code. If you are in this boat, I will help you. You should understand the challenges, but I’m here for you.]
on run
    set exportFile to 0
    try
        -- one export file for everything at this stage; the file name is a placeholder
        set exportFile to open for access "Users:JerryTi:Documents:scripts:export.xml" with write permission
        set eof of exportFile to 0
        write rssHead to exportFile as «class utf8» -- xml/rss header stuff that's always the same
        tell application "iBlog" to set cats to the categories of the first blog
        repeat with cat in cats
            tell application "iBlog" to set catname to (the name of cat) as text
            set niceName to the first word of catname
            set catDescription to "blah blah blah"
            -- write out the category info
            set nextText to "<wp:category>" & newLine & tab & "<wp:category_nicename>" & niceName & "</wp:category_nicename>" & newLine & tab & "<wp:category_parent></wp:category_parent>" & newLine & tab & "<wp:posts_private>0</wp:posts_private>" & newLine & tab & "<wp:links_private>0</wp:links_private>" & newLine & tab & "<wp:cat_name><![CDATA[" & catname & "]]></wp:cat_name>" & newLine & tab & "<wp:category_description><![CDATA[" & catDescription & "]]></wp:category_description>" & newLine & "</wp:category>" & newLine & newLine
            write nextText to exportFile as «class utf8» -- have to coerce the text from 16-bit unicode
            tell application "iBlog" to set ents to the entries of cat
            repeat with ent in ents
                -- get the stuff in iBlog's world, work with it here
                tell application "iBlog"
                    set titl to (the title of ent)
                    set desc to (the summary of ent)
                    set bod to (the body of ent)
                    set postDate to the post date of ent
                end tell
                set nextText to ((("<item>" & newLine & tab & "<title>" & titl & "</title>" & newLine & tab & "<link></link>" & newLine & tab & "<pubDate>" & postDate) & "</pubDate>" & newLine & tab & "<dc:creator><![CDATA[Jerry]]></dc:creator>" & newLine & tab & "<category><![CDATA[" & catname & "]]></category>" & newLine & tab & "<category domain=\"category\" nicename=\"" & niceName & "\"><![CDATA[" & catname & "]]></category>" & newLine & tab & "<content:encoded><![CDATA[" & bod & "]]></content:encoded>" & newLine & tab & "<excerpt:encoded><![CDATA[" & desc & "]]></excerpt:encoded>" & newLine & tab & "<wp:post_id></wp:post_id>" & newLine & tab & "<wp:post_date>" & postDate) & "</wp:post_date>" & newLine & tab & "<wp:post_date_gmt>" & postDate) & "</wp:post_date_gmt>" & newLine & tab & "<wp:comment_status>open</wp:comment_status>" & newLine & tab & "<wp:ping_status>open</wp:ping_status>" & newLine & tab & "<wp:post_name>" & titl & "</wp:post_name>" & newLine & tab & "<wp:status>publish</wp:status>" & newLine & tab & "<wp:post_parent>0</wp:post_parent>" & newLine & tab & "<wp:post_type>post</wp:post_type>" & newLine & "</item>" & newLine & newLine
                write nextText to exportFile as «class utf8»
            end repeat
        end repeat
        write rssTail to exportFile as «class utf8» -- xml/rss file closing stuff
    on error errStr number errorNumber
        -- make sure the file gets closed before passing the error along
        if exportFile is not equal to 0 then
            close access exportFile
            set exportFile to 0
        end if
        error errStr number errorNumber
    end try
    if exportFile is not equal to 0 then
        close access exportFile
        set exportFile to 0
    end if
end run
So far things are pretty simple. The script loops through the categories, and in each category it pulls out all the episodes. Only it kept stalling. It turns out that sometimes iBlog took so long to respond that the script gave up waiting. I added
with timeout of 600 seconds
at the start to make the script wait a full ten minutes for iBlog to respond. Yes, iBlog certainly is no jackrabbit of a program.
Now the program ran! The only problem is, the resulting file doesn’t work. Hm. The first thing the importer reports is that it can’t read the dates the way AppleScript formats them. So, I added a function to reformat all the dates to match the example. Then it was importing categories, but not items. Why not?
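The date fix is easier to see in a few lines of code. Here is a Python sketch of the same idea (my real function is AppleScript and lives in the attached script). The input format is an assumption: AppleScript prints dates according to the system's locale settings, and this assumes the long US English form. The output matches the `wp:post_date` values in the sample file.

```python
from datetime import datetime


def format_date(applescript_date: str) -> str:
    """Convert a long-form US date string to the 'YYYY-MM-DD HH:MM:SS' form
    used in the sample file.

    Assumes input like 'Tuesday, March 27, 2007 6:23:57 PM' and an English
    locale for day and month names.
    """
    parsed = datetime.strptime(applescript_date, "%A, %B %d, %Y %I:%M:%S %p")
    return parsed.strftime("%Y-%m-%d %H:%M:%S")
```

If your system prints dates differently, only the pattern passed to `strptime` should need to change.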
Um… actually I don’t remember the answer to that one. Let’s just say that it took a lot of fiddling and testing to get it right. Eventually, hurrah! There in my WordPress installation were episodes from iBlog.
And they looked like crap. The thing is, iBlog included unnecessary HTML tags around the blog title, excerpt, and body. It’s going to be a lot easier to clean them up now, while we’re mucking with each bit of text anyway, so back to AppleScript’s lousy string functions we go to clean up iBlog’s mess. Now, after we get all the data from iBlog, we call a series of functions to clean it all up:
set titl to stripParagraphTags(titl)
set desc to stripParagraphTags(desc)
set postDate to formatDate(postDate)
set bod to fixBlogBodyText(bod, postDate)
The actual functions are available in the attached final script.
Things are looking better, but still not very good. Much of this is due to some junk iBlog did when converting my older episodes into iBlog 2 format. One thing it did was to insert hard line breaks in the text of the blog body. No idea why. Maybe they were there all along and I had no way to see them. WordPress helpfully assumes that if you have a line break in the data it imports, you want a line break when it shows on the screen. So, every line break is replaced by a <br /> tag when imported into WordPress. This will not do. Additionally, iBlog replaced paragraph breaks </p><p> with a pair of break tags: <br /><br />. Once again, the reason for this is a mystery. The latter issue is less important, but we may as well address it while the hood is up.
Back we go into the fixBlogBodyText function, to repair more silly iBlog formatting. The resulting function looks like this:
on fixBlogBodyText(s, postDate)
    -- this assumes that if an episode is supposed to start with a div, it will have a style or class
    if (the offset of "<div>" in s) is equal to 1 then
        -- drop the leading "<div>" (5 characters) and the trailing "</div>" (6 characters)
        set s to text 6 thru ((length of s) - 6) of s
        -- in some cases there was an extra line feed at the end of the text as well,
        -- which leaves a stray "<" from the closing tag behind
        if the last character of s is "<" then
            set s to text 1 thru ((length of s) - 1) of s
        end if
        set s to "<p>" & s & "</p>"
    end if
    -- clean up iBlog junk (lots of this stuff is the result of upgrading to iBlog 2; the conversion was not clean)
    -- replace all line breaks with spaces
    set s to replaceAll(s, "
", " ")
    -- replace all double-break tags with paragraph tags
    set s to replaceAll(s, "<br /><br />", "</p>" & newLine & "<p>")
    -- replace all old-fashioned double-break tags with paragraph tags
    set s to replaceAll(s, "<br><br>", "</p>" & newLine & "<p>")
    -- get rid of some pointless span class info
    set s to replaceAll(s, " class=\"Apple-style-span\"", "")
    return s
end fixBlogBodyText
Note: replaceAll is a utility function I wrote that does pretty much what it says. You will find it in the attached source file. newLine is a variable I defined because left to its own devices AppleScript uses the obsolete Mac OS 9 line endings. What’s up with that?
At this point the text is importing mostly nicely. But wait! I was running my tests just working with one category to save time. When I looked at Allison in Anime on WordPress, some really weird things started happening. It turns out that when importing the data, you need line breaks every now and then, otherwise the importer will insert them. That would be nice to put in the documentation somewhere! In one of my episodes, the newline was inserted right in the middle of a <div> tag, which led to all kinds of trouble. So, to the above script I added a line that inserts a line break between </p><p> tags. As long as any one paragraph isn’t too long, I’ll be all right.
set s to replaceAll(s, "</p><p>", "</p>" & newLine & "<p>")
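For comparison, here is roughly what the whole cleanup looks like in Python. Treat it as a sketch of the steps described above, not a port of my function: the name `fix_blog_body_text` is just the AppleScript name respelled, and the regular expressions are slightly more forgiving than my literal replacements (they also catch a pair of break tags separated by whitespace).

```python
import re


def fix_blog_body_text(s: str) -> str:
    """Clean up iBlog 2 body HTML before import (sketch of the steps above)."""
    # A bare <div> wrapper (no style or class) is replaced by a paragraph.
    if s.startswith("<div>"):
        s = s[len("<div>"):].rstrip()  # rstrip drops any trailing line feed
        if s.endswith("</div>"):
            s = s[: -len("</div>")]
        s = "<p>" + s + "</p>"

    # Hard line breaks become spaces so the importer does not turn them into <br />.
    s = re.sub(r"\r\n|\r|\n", " ", s)

    # Double break tags, old or new style, become paragraph boundaries.
    s = re.sub(r"<br\s*/?>\s*<br\s*/?>", "</p>\n<p>", s)

    # Drop the pointless span class.
    s = s.replace(' class="Apple-style-span"', "")

    # Give the importer a line break between paragraphs so it does not
    # insert one somewhere worse.
    s = s.replace("</p><p>", "</p>\n<p>")
    return s
```

The order matters: line breaks are flattened first, and only then are the deliberate breaks between paragraphs put back.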
And with that, we’ve done it! We’ve written a script that will export all the data from iBlog 2 and format it in a way that WordPress can accept. Time to run it on the whole blog, go take a little break, and come back and see how things went…
Dang. Didn’t work. There’s a maximum file size for import, and my blog is too damn big. Not a huge problem, just a bit of modification to make each category a separate file. Now, at last, the data is imported, the text looks nice, and we’re ready to make the move to our new home.
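The per-category split amounts to wrapping each category's items in its own header and footer. A minimal Python sketch, assuming the items have already been rendered to text and grouped by category nicename (the function name and the shape of the input are my inventions for this example):

```python
from typing import Dict, List


def build_category_documents(items_by_category: Dict[str, List[str]],
                             head: str, tail: str) -> Dict[str, str]:
    """Return a mapping of file name to complete export document,
    one document per category.

    head and tail are the fixed xml/rss opening and closing text.
    """
    documents = {}
    for nicename, items in items_by_category.items():
        documents[f"{nicename}.xml"] = head + "\n".join(items) + "\n" + tail
    return documents
```

Each value can then be written to disk as UTF-8 and imported on its own. I don't know the exact size limit, so this does not check for it; a very large category could still need splitting further.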
Except…
The images don’t show up, and links between episodes are broken. Also, it would be nice if people could still read the old Haloscan comments. I guess we’re not done yet.
Image links were the easiest to repair. In iBlog 2 the source code always looks for the image at path /https://muddledramblings.com/wp-content/uploads/iblog/. We just have to find those links and replace them with new info. I used Automator to find all the image files in the iBlog data folders, then I copied them all up to a directory on the WordPress server, and pointed all the links there. Worked like a charm! (Icerabbit goes into more detail on that process here. I used different tools, but the process is the same.)
Links between episodes turned out to be a lot trickier. It came down to this: How do I know what the URL of the episode is going to be when I load it into WordPress? I had to either know what the episode’s id was going to be, or I had to know what its nicename was going to be.
Nicename is a modified title that can be used in URLs – no spaces and whatnot. “Rumblings from the Secret Labs” becomes “rumblings-from-the-secret-labs”. If I set up WordPress to use the nicename to link to an episode rather than the ID number, it would have some advantages, but I can get long-winded (have you noticed?) and that applies to my episode titles as well. The URLs for my episodes could get really long. Therefore, I’d rather use the episode’s ID for its permalink. (If you try the icerabbit link above, you will see the nicename version of a link.)
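To make the nicename idea concrete, here is a small Python approximation. It is only a sketch: WordPress has its own rules for accented characters and odd punctuation, and I have not checked that this matches them in every case.

```python
import re


def nicename(title: str) -> str:
    """Approximate a URL-friendly nicename: lowercase, with each run of
    anything other than a-z or 0-9 collapsed into a single hyphen."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower())
    return slug.strip("-")
```

With this, "Rumblings from the Secret Labs" comes out as "rumblings-from-the-secret-labs", and you can see how a long-winded title turns into a long-winded URL.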
Happily, the import file format allows me to specify the id of episodes I upload. (I don’t know what it does if there’s already an episode with that ID.) After some fiddling I managed to specify reliably what ID to give each episode. Now in my script I make a big table with the iBlog paths to each episode and the ID I will assign it. Before the main loop I have another that builds the table:
-- first loop
set postID to firstPostID
set idTableRef to a reference to episodeIDTable
tell application "iBlog" to set cats to the categories of the first blog
repeat with cat in cats
    --set cat to item 1 of cats
    tell application "iBlog" to set catFolderName to the folder name of cat
    --display dialog catFolderName
    copy {catFolderName, -1} to the end of idTableRef
    tell application "iBlog" to set ents to the entries of cat
    repeat with ent in ents
        tell application "iBlog" to set episodeFolderName to the folder name of ent
        set episodePath to catFolderName & "/" & episodeFolderName
        copy {episodePath, postID} to the end of idTableRef
        set postID to postID + 1
    end repeat
end repeat
Now it’s possible to look up the ID of any episode, and build the new link. The lookup code is in the attached script, and also handles the special cases of linking to a category page and to the main page. For category pages, I just hand-built a table of the category IDs I needed based on previous import tests.
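Here is the table-and-lookup idea as a Python sketch. It is not the lookup code from the attached script, and the folder names in the usage note are invented. The `?p=` form is WordPress's plain ID-based permalink; the -1 entries mark category folders, which need their own handling (my hand-built table), so the lookup returns nothing for them.

```python
from typing import Dict, List, Optional, Tuple


def build_id_table(categories: List[Tuple[str, List[str]]],
                   first_post_id: int) -> Dict[str, int]:
    """Map 'categoryFolder/episodeFolder' to the post ID it will be given.

    Category folders themselves are stored with -1, as in the first loop.
    """
    table: Dict[str, int] = {}
    post_id = first_post_id
    for cat_folder, episode_folders in categories:
        table[cat_folder] = -1
        for episode_folder in episode_folders:
            table[f"{cat_folder}/{episode_folder}"] = post_id
            post_id += 1
    return table


def new_link(table: Dict[str, int], episode_path: str, site_url: str) -> Optional[str]:
    """Return the ID-based WordPress URL for an iBlog episode path,
    or None for unknown paths and category entries."""
    post_id = table.get(episode_path)
    if post_id is None or post_id < 0:
        return None
    return f"{site_url}/?p={post_id}"
```

For example, with `build_id_table([("C1", ["e1", "e2"])], 1000)`, the path "C1/e2" resolves to `<site>/?p=1001`. The IDs only hold if the export loop assigns them in the same order this table was built.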
Finally, there is the task of preserving the links to the old comment system. Happily, those Haloscan comments are also connected based on the file path of the episode. (Though it looks like really old comments are not accessible, anyway, which is a bummer.)
In the main loop, after the body text has been cleaned up, tack the link to Haloscan on the end, complete with hooks to allow CSS formatting:
set bod to bod & newLine & newLine & "<div class=\"jsOldCommentBlock\"><span>Legacy Comment System:</span> <a href=\"javascript:HaloScan('" & entFolder & "');\"><script type=\"text/javascript\">postCount('" & entFolder & "'); </script></a></div>"
Not mentioned above are functions for logging errors and a few other utilities that are in the main script file. They should be pretty obvious. The script includes code that is specific to issues I encountered, but it should be a good start for anyone who wants to export iBlog 2 data for import into another system. It SHOULD be safe to execute on your iBlog data; it doesn’t change anything on the iBlog side of things. I don’t know if there’s anyone else in the world even using iBlog 2 anymore, but if you would like help with this script, let me know.