A blog about .NET, Graffiti, Community Server, and Kevin's life

New .Text to CS 2007 Blogs migration tool available

There seems to be a small resurgence in migrating .Text to Community Server after the CS 2007 release. For those of you who are still on .Text and want to upgrade to CS 2007, I've released a new migration tool to help you.

I've also been thinking lately about how to handle migration, or content import/export, in future versions of Community Server blogs. This is something I've been working on for a while, and I think I finally have a good plan. But first, a little recap on how I got there.

The Past

Back in the day of Community Server 1.0 there were many .Text bloggers who wanted to upgrade to CS. Several people worked on solutions for this, most of which did a direct database conversion using scripts or DTS. I was one of those bloggers running .Text who really wanted to try out this spiffy new Community Server platform if only I could find a way to migrate my content. Partly as a way to get familiar with it, I decided to write my own simple conversion utility to migrate content from .Text to CS 1.0 using the Community Server API and a few custom sprocs.

As more people used it and submitted feature requests and bug reports, my simple migration tool grew and grew into a fairly large wizard-style WinForm UI. I updated it to support CS 1.1, but that was the final supported version. During CS 2.0 development, I decided to throw that away and start over on a grand new suite of Import/Export WinForm utilities. The biggest reason was that I wanted to support migration from other blog platforms besides .Text, which would have been difficult to wedge into the original utility.

My goal was to create an extensible XML format (later dubbed Community Server eXtended Atom - CSXA) that would completely describe CS weblogs and weblog data. You would be able to export and import these CSXA files from/to Community Server, and export data from other platforms like .Text, Movable Type, WordPress, etc. to CSXA so you could then import it into CS. There would be a separate WinForm app for each step rather than one big monolithic one. I decided to base the XML format on the Atom 1.0 spec since that already contained common blog data fields, was well defined (unlike RSS), and provided an easy way to extend it using XML namespaces.
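To make the namespace-extension idea concrete, here is a minimal Python sketch of an Atom entry that carries one platform-specific field. It is not the actual CSXA schema: the `urn:example:csxa` namespace, the `cs:postId` element, and the `build_entry` / `read_post_id` helpers are all hypothetical names chosen for illustration. Only the Atom namespace URI is the real one, and the entry omits elements (such as an author) that a complete Atom document would normally carry.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"  # the real Atom 1.0 namespace
CS_NS = "urn:example:csxa"               # hypothetical; NOT the actual CSXA namespace

# Serialize Atom as the default namespace and the extension under a "cs" prefix.
ET.register_namespace("", ATOM_NS)
ET.register_namespace("cs", CS_NS)


def build_entry(post_id, title, body, updated):
    """Build an Atom entry that carries one platform-specific extension field."""
    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    ET.SubElement(entry, f"{{{ATOM_NS}}}id").text = f"urn:example:post:{post_id}"
    ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = title
    ET.SubElement(entry, f"{{{ATOM_NS}}}updated").text = updated
    content = ET.SubElement(entry, f"{{{ATOM_NS}}}content", {"type": "html"})
    content.text = body
    # Extension element: it lives in its own namespace, so a reader that only
    # looks for Atom elements never sees it.
    ET.SubElement(entry, f"{{{CS_NS}}}postId").text = str(post_id)
    return entry


def read_post_id(entry):
    """Return the extension post ID, or None if the entry is plain Atom."""
    node = entry.find(f"{{{CS_NS}}}postId")
    return int(node.text) if node is not None else None


entry = build_entry(42, "Hello", "<p>First post</p>", "2007-06-02T13:01:00Z")
xml_text = ET.tostring(entry, encoding="unicode")
round_tripped = ET.fromstring(xml_text)
```

An importer that understands the extension can call `read_post_id` to recover the original ID, while a tool that only knows Atom still gets the title, date, and content.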

Unfortunately, the ensuing project turned into a case study in how not to do a software project in your spare time. First, to "save time," I downloaded the incomplete source of the Atom.NET project on SourceForge (which had been abandoned for several years) and dove headfirst into finishing/fixing it to use as a base for my Atom file rendering and parsing. The deeper I got into that, the more I realized how bad it was, and how much still needed to be done to get it working with the 1.0 spec. Instead of throwing it away and writing my own, which in hindsight would have saved a huge amount of time and headaches, I kept plowing ahead.

By last spring I had three relatively reliable working pieces that I used for a few large conversion projects internally. These were the .Text to CSXA exporter, an MTImport file to CSXA exporter, and a CSXA to Community Server 2.1 importer. But I still had to create nice polished WinForm UIs for each piece before they could be released publicly, which is the part that I suck at the most. Unfortunately this also coincided with me getting an Xbox 360, so the UI work dragged on very, very slowly over the past year. :P

The Present

But to help those who are trying to migrate from .Text to Community Server right now, I decided to release most of my existing WinForms utilities "as is." They have been updated to work with Community Server 2007 SP1 (only) and have had a small amount of testing, but please keep in mind they are rough around the edges and may still have bugs, especially in the UI. Use at your own risk, and always test first with a backup database.

You can find the zip file in the downloads section of this site. It includes the .Text-to-CSXA exporter and CSXA-to-CS2007 importer together in the same folder with all the binaries and config files you need already there. A simple set of instructions is included as well, and you can email me if you have questions after reading them.

The Future

There are two big goals I have for implementing an import/export system in future versions of Community Server.

  1. Be web based. I now believe WinForm utils are a suboptimal choice when working with web application data, and they make it hard to update and expand the tools going forward.
  2. Be flexible and allow the community to write "plug-ins" so that CS blogs can import/export any file format or API.

I think #2 is especially relevant as there are now several good formats and APIs that can be used to migrate or backup blog data:

  • The CSXA format I mentioned earlier, which can fully describe CS blog data.
  • The BlogML project has grown quite popular and has been used by many people to migrate posts between platforms. The accomplishments of BlogML have been impressive so far. Its limitation is that it does not support every type of CS post data, and it lacks a way to preserve PostIDs.
  • Automattic recently decided to add very similar functionality to WordPress, but based their format on RSS 2.0 and called it WordPress eXtended RSS (WXR). I had previously been calling the CS Atom-based format "Atom+CS", but decided to change it to CSXA as I really like the eXtended nomenclature and it works much better as an extension name.
  • The Atom Publishing Protocol is used by Blogger to export content, and will likely be used by more and more blog platforms once the spec gets locked down.
  • The MTImport format is outdated and very limited, but is supported by several of the most popular blogging platforms.

With a dynamic plug-in based system, similar to the CS Spam Rules, you could add Import/Export plug-ins to your site as needed depending on what formats you are interested in using. I'll likely work on this at first as a prototype add-on for CS 2007, with the possibility of it being included in a future CS version.
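As a rough illustration of what such a plug-in system could look like, here is a minimal Python sketch of a format registry. Nothing here reflects an actual Community Server API: `Post`, `FormatPlugin`, `register`, `get_plugin`, and the toy `JsonPlugin` are all hypothetical names, and JSON merely stands in for a real format such as BlogML, WXR, MTImport, or CSXA.

```python
import json
from abc import ABC, abstractmethod
from dataclasses import dataclass, asdict


@dataclass
class Post:
    """Simplified stand-in for blog post data."""
    post_id: int
    title: str
    body: str


class FormatPlugin(ABC):
    """Hypothetical plug-in contract: one class per file format."""

    name: str

    @abstractmethod
    def export_posts(self, posts: list[Post]) -> str: ...

    @abstractmethod
    def import_posts(self, data: str) -> list[Post]: ...


_REGISTRY: dict[str, FormatPlugin] = {}


def register(plugin: FormatPlugin) -> None:
    """Make a plug-in available under its (case-insensitive) format name."""
    _REGISTRY[plugin.name.lower()] = plugin


def get_plugin(name: str) -> FormatPlugin:
    """Look up an installed plug-in, failing clearly if the format is unknown."""
    try:
        return _REGISTRY[name.lower()]
    except KeyError:
        raise LookupError(f"no import/export plug-in installed for {name!r}") from None


class JsonPlugin(FormatPlugin):
    """Toy format used only to exercise the registry."""

    name = "json"

    def export_posts(self, posts):
        return json.dumps([asdict(p) for p in posts])

    def import_posts(self, data):
        return [Post(**item) for item in json.loads(data)]


register(JsonPlugin())
posts = [Post(1, "Hello", "<p>First post</p>")]
dumped = get_plugin("JSON").export_posts(posts)
restored = get_plugin("json").import_posts(dumped)
```

The site itself only ever talks to `get_plugin`, so supporting another format means registering one more class rather than changing the import/export code.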

» Trackbacks & Pingbacks

  1. blog bits Bill "The Bruiser" Bosacker lists all of the cool features of his new CS2007 Event

    Dave Burke — June 3, 2007 10:34 PM
  2. Kevin Harder releases a new .Text to CS2007 blogs migration tool. Kevin also contemplates future import

    Dave Burke's Community Server Bits — June 3, 2007 10:47 PM
  3. Pingback from  New .Text to CS 2007 Blogs migration tool available - Kevin Harder

  4. The migration from .Text to CS 2007 is finally complete! All content, comments, trackbacks etc. have

    Kevin Gearing's Blog — July 10, 2007 1:56 PM

» Comments

  1. Robert McLaws

    Ugh, Kevin... why would you create ANOTHER format? That's just as bad as WXR. Why reinvent the wheen when BlogML already has everything you need?

    Robert McLaws — June 2, 2007 1:01 PM
  2. Robert McLaws

    I meant "why reinvent the wheel", but 'wheel' and 'when' somehow mashed together inside my brain.

    I was pretty certain that BlogML lets you describe post data. But seriously, you should have spent your time trying to make BlogML better, instead of inventing a new format.

    And the reason BlogML doesn't export full CS data is because there are few blog engines that have the full functionality of CS.

    Seriously though, if you're talking about bringing import/export abilities into the CS platform, you should consider bringing your work to BlogML instead of building your own format, because BlogML already has a lot more traction, and it's made up of dedicated CS users. Besides that, there are already enough export formats in the marketplace... do we really need one more?

    I'm not really going to expect you to though, because you did the same thing last time when you built your own .Text-to-CS converter instead of working with Jayson and me on making ours better.

    Robert McLaws — June 2, 2007 6:34 PM
  3. Mitch Denny

    Hi there,

    I successfully built a converter a while ago when I moved from .Text to WordPress. I exported directly from the DotText database in BlogML format, and then transformed that into WXR format.

    Interestingly, the main issues I had were dates (inconsistent handling by WordPress) and WordPress not being able to handle XML files with more than 20 posts in them.

    Mitch Denny — June 2, 2007 8:58 PM
  4. Kevin

    Hi Robert,

    I probably wasn’t clear enough in my long-winded ramblings. I am not just deciding to invent a new format instead of choosing to improve BlogML. The CSXA format that I had worked on already exists, and I’ve been using it internally to migrate blogs for some time. What is unfinished (and what I let slip for way too long) is the WinForm UI work around that. Back in the fall of 2005, when we were working on CS 2.0, I decided to go the route of extending the Atom format to fully describe CS blog data for importing and exporting rather than continue to improve upon the .Text-to-CS importer tool.

    At that time, BlogML was just getting started, and it was having a hard time getting traction with major blog platform developers for the exact same reason that you wrote about. Darren was asked "why are you reinventing the wheel" creating yet another blog syndication format, when there are already two very widely implemented and extensible formats: RSS and Atom.  Why not just extend one of them to meet your purposes?

    But BlogML continued and seems to be  under pretty active development. I'm glad because I think it is very useful as a "lowest common denominator" format that lets you migrate your core data between very different blog platforms.  Every blog platform has different types of data beyond the basic post/comment ones, and so it’s natural that there will be different formats/versions targeted at these different platforms. So I don’t view the WXR format as bad at all, I think it has a valid reason to exist (fully describing WP post data) and is based on a widely used format that many tools can already work with instead of creating one from scratch.  Similarly, the CSXA format could be used when moving your data from one CS site to another, or for backing up your data.

    I have accepted that there are multiple formats in use and that the numbers won’t diminish (and could certainly grow) in the future. Which is exactly why I now want to concentrate on making it easier for CS users to work with ANY of them, rather than pushing a single one. A plug-in type system would work great for this, as independent developers could create or improve on them as needed.  So a CS blogger could easily work with whatever format is optimal for what he/she wants to accomplish – which could be using BlogML to move to/from DasBlog/SubText, or WXR for WordPress, or MTImport from MovableType/TypePad, or AtomPP for Blogger, or CSXA for CS, etc.  

    Kevin — June 2, 2007 11:13 PM
  5. Robert McLaws

    But a pluggable system adds another layer of complexity to the problem of exporting your data, when people are looking to make it simpler, not harder. I have people complaining that WordPress.com have no way of using my WordPress exporter, so they have to WXR their posts down to a local install before they can run the conversion, then they have to download and run BlogML to get things imported to their new site. Your system only complicates that by making the destination have your system installed in addition to BlogML if they target BlogML.

    And actually, WXR doesn't "fully-describe" WP post data. There is actually a bunch of stuff in the WordPress database that it totally overlooks, which is why my WordPress-to-BlogML converter is a lot richer than just doing an XSLT transform on a WXR file.

    As a Telligent employee, it just would have been nice if you worked with the BlogML project, which is made up of CS MVPs and dedicated CS users, rather than competing against them by building your own. And why is "active development" a bad thing?

    Our project could have benefitted a lot with your help. The market has enough formats already, and your efforts to split the conversion market don't make *my* efforts to get WordPress to replace WXR with BlogML any easier.

    Robert McLaws — June 3, 2007 1:23 AM
  6. Kevin

    I think you have it backwards regarding the extra layer of complexity. I want to make it easier for bloggers to switch to CS regardless of what platform/host they are on now, not more difficult.

    Do you really think that Automattic would abandon their WXR format for BlogML anytime in the near future after just investing the time/resources to create and publicize WXR?  I have no doubt they will continue to improve their own format.  And I can't see MT or TypePad ditching MTImport for BlogML anytime soon either.  I mean, sure it would make things easier if everyone did that, but I'm trying to be realistic here and place the needs of CS users over politics.

    Let's say a blogger on WordPress.com wants to switch to CS. Which would be simpler for the new CS user: exporting to WXR and then importing that directly into his CS site, or having to install a new instance of WordPress, import WXR into that, and then export as BlogML?  THAT is the extra layer of complexity.

    If BlogML describes the data of other blog platforms better than their vendor-supported format, that's great!  So someone who is self-hosting a WP install could choose to use BlogML to migrate to CS instead of WXR.  But today, with current tools, most bloggers don't have the option of directly exporting to BlogML.

    I do want to work with you guys to make the best possible BlogML importer (and exporter) for CS.  But I don't want to pretend that BlogML is the only way to migrate blog data and make CS users jump through hoops to convert/transform other formats into BlogML.

    I also think this is standard industry practice. Most new products or web 2.0 sites let you import data from *multiple* sources/formats because they want it to be as easy as possible for people to join.

    Kevin — June 3, 2007 10:12 AM