Migrating a static, legacy site to WordPress
I have wanted for quite a while to move the Romsey Abbey Choir website to a web-based content management system. Using Dreamweaver to edit the pages on my computer before uploading them via FTP to my Web host had long since ceased to be an optimal solution. Having created new pages, I had to create links to them, which was rather a chore. I was also maintaining an RSS news feed by manually editing code each time – even more of a chore. It would be quicker, easier and safer, I felt, to leave navigation to the CMS.
My natural choice is WordPress. It feels intuitive to use and, having now reached version 3 with a huge installed base, conveys a sense of maturity.
At the start of this year, I tried – and failed – to get on with Silverstripe, a CMS hailing from New Zealand, which I intended to use for another site. It looked promising, initially; the user interface, reminiscent of Windows Explorer, seemed straightforward and I liked the way one could browse for related pages.
At the time, however, the release that would enable you to create hierarchical links (for example {site}/news/2010/{news story}) – essential for any site trying to publish news on a regular basis – was only in beta. More fundamentally still, I couldn’t get its blog module – the news publishing engine, if you will – to work. WordPress did both with ease.
My biggest issue on the choir site was what to do with the legacy content. Having accumulated a large archive of some 300 stories and music lists, I didn’t want to cull it unless really necessary. I could perhaps have used the website capture facility in Adobe Acrobat to create a large and unwieldy PDF. This would have preserved it for posterity but usability and accessibility would not have been great.
I could also have moved it to an ‘archive.romseyabbeychoir.org.uk’ sub-domain of the site with a big ‘this content is no longer being maintained’ message on the home page. In this way, it would still have been available, although the new, parallel site might not rank so well on Google, one of whose algorithm factors is said to be the length of time a site has been in existence. Creating links between articles on ‘new’ and ‘legacy’ sites would also have got complicated.
The breakthrough in this endeavour came when I came across the Import HTML Pages WordPress plugin. In the light of advice from other users on the developer’s website, I tried out the plugin on a new copy of WordPress installed on my laptop with XAMPP running. It did what the developer said it would, scraping all code that appeared within a specific <div> on my pages into my WordPress database. I encountered two specific issues, although neither was the fault of the plugin itself.
Most themes follow the default WordPress position of reserving the H1 element for the site’s title, which appears on every page. For every post or page other than the home page, this is surely wrong: giving every page the same primary heading makes each one that little bit more difficult to distinguish. Every page on my legacy site, however, assigned the H1 to the main heading, making it unique and semantically correct. Thus I had to undertake some retrofitting.
My original content was structured:
<div id="content">
<h1>Title</h1>
<p>Content starts here....
Importing this verbatim would have meant that the H1 title would have been imported to the body of the page. The <title> element could have been imported as the page’s new title (encoded as an H2 by WordPress). As mine were often verbose for SEO purposes, this wouldn’t have been ideal, either.
My workaround was to add an extra div to my legacy pages:
<div id="content">
<h1>Title</h1>
</div>
<div id="contentstart">
<p>Content starts here....
I then instructed the plugin to harvest <div id="content"> as the string that WordPress would use as the title of the page and <div id="contentstart"> as its body. It worked, giving me a database of sensibly structured articles.
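To show the idea behind that harvesting step, here is a minimal sketch in Python using only the standard library – this is not the plugin’s own code (which is PHP), and the `DivExtractor` class and sample page are my own illustrative inventions – but it demonstrates how text can be pulled out of the two divs by their ids:

```python
from html.parser import HTMLParser

class DivExtractor(HTMLParser):
    """Collect the text inside <div id="content"> (destined to become
    the post title) and <div id="contentstart"> (the post body)."""

    def __init__(self):
        super().__init__()
        self.capture = None   # id of the div we are currently inside
        self.depth = 0        # div-nesting depth within that div
        self.parts = {"content": [], "contentstart": []}

    def handle_starttag(self, tag, attrs):
        if self.capture:
            if tag == "div":
                self.depth += 1
        elif tag == "div":
            div_id = dict(attrs).get("id")
            if div_id in self.parts:
                self.capture = div_id
                self.depth = 1

    def handle_endtag(self, tag):
        if self.capture and tag == "div":
            self.depth -= 1
            if self.depth == 0:
                self.capture = None

    def handle_data(self, data):
        if self.capture:
            self.parts[self.capture].append(data)

# A legacy page in the retrofitted structure described above
page = """<div id="content"><h1>Carol Service</h1></div>
<div id="contentstart"><p>Content starts here...</p></div>"""

parser = DivExtractor()
parser.feed(page)
title = "".join(parser.parts["content"]).strip()
body = "".join(parser.parts["contentstart"]).strip()
```

Run against the sample page, `title` comes out as "Carol Service" and `body` as the paragraph text, which is essentially the title/body split the plugin performed for each of my 300-odd legacy files.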
Viewing the list, however, I noticed that the chronological order of my legacy stories was completely jumbled up. As I moved pages between computers over the years, they acquired new file timestamps, which the plugin recorded as the ‘post created’ date for each post during the import process. In an archive of news stories, time is the most important ordering criterion, so there was nothing for it but to spend several hours laboriously going through my archive to restore the ‘date modified’ timestamp on every single file back to when I first wrote the story. Bulk Rename Utility proved invaluable for this.
Having done so and imported my files, though, I can report that it worked for me. With the database live on the web server, I can now maintain the site in a much more flexible and efficient way and, thanks to the Import HTML Pages plugin, I have also been able to retain my legacy content, with all its SEO benefits. Now I just need to tag it all…
Amusingly (well, to me anyway), my archive now stretches back to 2000 – a full three years before WordPress was actually launched!