% fortune -ae paul murphy

Word Fulminations

The last two weeks have been unusual for me in that I've gone from gently brushing past the world of Microsoft Word users to actually having to face some of the effects people in that world live with.

In the first instance I submitted an essay to an editor limited to an all-Microsoft environment - basically he says that if it's not submitted in Word format, his organization can neither read it nor use it.

Word, I'm told, imports an HTML document more or less correctly, but he couldn't use that facility - apparently because it doesn't convert an HTML input document to the usual internal format and therefore the rest of his publishing pipeline can't work with it.

Retyping the tables in OpenOffice and saving the essay as a Word 97 file seemed to work for a Mac user I know who has the latest Microsoft Office product, but it didn't work for him. He got impatient and I got frustrated - and that's where matters would have been left except for my second encounter with Word that week.

What happened there was that a colleague (let's call him "John", below) charged with the job of bringing several hundred suggestions together into a single coherent document made a complete mess of it, saw the deadline looming, and ducked by passing the problem to me.

He's a Word 10 user and probably two-thirds of his input documents came from other Word users - but his strategy of reading them one at a time into a larger document had produced a mess containing every kind of typographical obscenity you can imagine, from bullets inside sentences to multiple font and format switches within sections. Worse, Word reported the resulting document as having over 300 pages, yet a lot of content seemed to be missing while other material was duplicated - often many times.

So I threw away the document he'd spent hours laboring over, used OpenOffice to convert each input document to text, and converted that to both FrameMaker and HTML. The whole thing took about three hours and produced both a web-ready version and a print-ready, 32-page, two-color PDF.
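The column doesn't say how the conversion to text was actually driven, so the following is only a minimal sketch of one way to batch that step. It assumes a LibreOffice-style `soffice` binary on the PATH that accepts the `--headless --convert-to` options; the OpenOffice release used at the time may not have offered them. The function names and the `*.doc` pattern are hypothetical.

```python
import subprocess
from pathlib import Path


def build_convert_command(source: Path, out_dir: Path, binary: str = "soffice") -> list[str]:
    """Build the command line that converts one document to plain text.

    Assumption: `binary` is a LibreOffice-style soffice that understands
    --headless, --convert-to and --outdir.
    """
    return [
        binary,
        "--headless",
        "--convert-to", "txt:Text",
        "--outdir", str(out_dir),
        str(source),
    ]


def convert_all(in_dir: Path, out_dir: Path) -> list[Path]:
    """Convert every .doc file in in_dir and return the expected .txt paths."""
    out_dir.mkdir(parents=True, exist_ok=True)
    converted = []
    for source in sorted(in_dir.glob("*.doc")):
        # check=True raises if the converter exits with an error status.
        subprocess.run(build_convert_command(source, out_dir), check=True)
        converted.append(out_dir / (source.stem + ".txt"))
    return converted
```

Converting each input on its own, rather than reading them one after another into a growing master document, keeps whatever baggage one file carries from contaminating the others.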

Since you'd think his approach sensible - i.e. since you'd assume Microsoft would make it easy for its users to upgrade documents as each new release gets rolled out - I spent some time trying to find out why his approach didn't work.

I still don't know the answer to that, but I do have a hypothesis. Using "strings" to look through key parts of the pre-docx input files showed that most (but not all) of the text either duplicated in, or missing from, his file came from originals saved using Word and containing licensing, labeling, reference files, and/or settings from at least two previous Word generations licensed to the same person or organization.
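For anyone without the "strings" utility at hand, here is a rough Python approximation of that kind of inspection. It is a sketch under stated assumptions, not the procedure used above: `extract_strings` is a hypothetical helper, the minimum run length of four mirrors the usual strings default, and the UTF-16LE pass is there on the assumption that some text in binary Word files is stored two bytes per character.

```python
import re


def extract_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Return printable runs found in data, ASCII first, then UTF-16LE."""
    # Runs of printable ASCII bytes (space through tilde).
    ascii_runs = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    # The same characters stored as UTF-16LE: printable byte, then a NUL.
    utf16_runs = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)

    found = [m.group().decode("ascii") for m in ascii_runs.finditer(data)]
    found += [m.group().decode("utf-16-le") for m in utf16_runs.finditer(data)]
    return found
```

Run over the raw bytes of each input file (for example `extract_strings(path.read_bytes())`), the output can be searched for user names, organization names, or template paths that recur across the troublesome documents.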

What this suggests to me is that the Word 10 release John was using did a good job of converting generation N Word files, but encountered problems if the particular generation N instance used to save a file had been installed to perpetuate documents, settings, and/or preferences from generation N-1.

With that guess in mind I went back and looked at my editor friend's public site: made with Movable Type, apparently last upgraded to 3.2 (released circa 2005), and integrated into a Windows/XP based workflow built around Office 2003. So now, of course, I think I can guess what's going on - but the implication is that sticking "to what works for us" in the face of change in the environment around him is costing him both access to better authors ( :) ) and many, many hours of pointless drudgery each week.

More generally, just how much of this stuff - like the hours John spent clicking and squinting to try to get that document together - goes on invisibly around us as people who mostly don't talk to us IT folks quietly put in time to cope with the temporally fractured technological environment we've given them?

Paul Murphy wrote and published The Unix Guide to Defenestration. Murphy is a 25-year veteran of the I.T. consulting industry, specializing in Unix and Unix-related management issues.