DocTrain West 2009: Joe Gollner’s keynote “Knowledge Archaeology: Raiders of the Lost Art”

Joe Gollner opened DocTrain West with an insightful and concise keynote address about converting unstructured legacy documentation to structured content.

First, he reviewed types of content: “Opaque” content, such as paper copies, is not processable. “Annoying” content is tied to unwieldy proprietary formats. “Polluted” content is corrupted or mixes formats. And “tolerable” content comes as HTML, a Word document or something else that is more or less manageable.

For these types of content, there are various conversion strategies: You can convert manually, get a tool to do it, or outsource it. His best practice in a nutshell is to stay flexible and open so that you can find the best possible mix of tools, specialist help and automation.

Spelled out as a process, it looks something like this:

  1. Decide which legacy sources to convert and which target schema to convert them to.
  2. Analyze your sources carefully (and possibly clean them up where necessary).
  3. Map sources to the target schema.
  4. Establish conversion rules (and the gaps to fill by manual editing).
  5. Perform the actual conversion.
  6. If possible and desired, add necessary and useful metadata, links and connections to topics.
  7. Check the converted contents for accuracy, consistency and completeness (according to initial scope).
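
To make steps 3, 4, 5 and 7 more concrete, here is a minimal Python sketch. It is not from the keynote: the style names, the target element names, the `topic` root and the sample paragraphs are all hypothetical, and a real project would map to its own schema with its own tools.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from legacy paragraph styles to target elements (step 3).
STYLE_TO_ELEMENT = {
    "Heading 1": "title",
    "Body Text": "p",
    "List Bullet": "li",
}

def convert(paragraphs):
    """Apply the mapping rules to (style, text) pairs (steps 4 and 5).

    Returns the converted topic element and the paragraphs that no rule
    covers, which are the gaps left for manual editing.
    """
    topic = ET.Element("topic")
    gaps = []
    for style, text in paragraphs:
        element_name = STYLE_TO_ELEMENT.get(style)
        if element_name is None:
            gaps.append((style, text))
            continue
        ET.SubElement(topic, element_name).text = text.strip()
    return topic, gaps

def check(topic, paragraphs, gaps):
    """Completeness check (step 7): each paragraph is converted or flagged."""
    return len(topic) + len(gaps) == len(paragraphs)

# Made-up sample input standing in for an analyzed legacy source (step 2).
legacy = [
    ("Heading 1", "Installing the pump "),
    ("Body Text", "Switch off the power first."),
    ("Sidebar", "See also the safety manual."),
]
topic, gaps = convert(legacy)
print(ET.tostring(topic, encoding="unicode"))
print("Needs manual editing:", gaps)
print("Complete:", check(topic, legacy, gaps))
```

The sketch keeps unmapped content in an explicit list of gaps instead of dropping it, so that the final check can confirm nothing from the source was lost along the way.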

He also pointed out a few caveats:

  • Do not underestimate the complexity of the conversion process.
  • Focus on the conversion purpose and business case, because neither structured content nor conversion can be an end in itself.
