Converting Word Files From The Command Line

Imagine how convenient it would be if you could convert between different document formats from the command line. You could take that Word document and turn it into HTML. You could print out some HTML and have a link on your page saying "Download this as a Word document". You could have your email program automatically convert Word documents into OpenOffice documents.

One application you could see is an open-source replacement for Google Docs. Right now, our rich-text editors like TinyMCE and FCKeditor do a pretty good job of WYSIWYG editing of documents. The only problem with them is that they can only generate HTML. So if you want to download your document as a Word file or an OpenOffice file, then you'll have to get the HTML, fire up your heavy-duty OpenOffice application, then do "Open" and "Save As". Very labor intensive. Websites can't even count on people having this application around. Indeed, if they did, they might be using that instead of an online document manager.

So then I heard about something called eyeOS, which provides a webpage that acts like a computer desktop, with a TinyMCE-based word processor. Their pages indicated that you would be able to save your documents as Word docs.

"Oh boy," I thought. "Someone has finally solved the problem of bulk-converting between different document formats."

But I looked at it, and was disappointed. You know what it's doing? It is opening up OpenOffice and clicking on "Open" and "Save As". Seriously! It uses Xvfb to open up OpenOffice behind the scenes, then communicates with it, possibly using DCOP, to open your file and then save it.

You could not design a less efficient process if you tried.

The thing that drives me crazy about all this, and especially eyeOS's solution, is that we have this code, people. The secret to converting between these different document formats is not a secret at all. It is part of OpenOffice, which is under the LGPL.

So all it would take is somebody ripping out the conversion code from OpenOffice, and the internal model of documents, and jettisoning all the UI stuff that we don't need. I'm imagining a package like netpbm -- a bunch of little tools that convert from anything to OpenOffice format, and then from OpenOffice format to anything.
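To make the netpbm-style shape concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the `Document` hub model (just a list of paragraphs), the `importer`/`exporter` registries, and the toy `txt` and `html` formats are stand-ins I made up for illustration, not anything taken from OpenOffice. The point is only the structure: each format needs one reader into the hub model and one writer out of it, instead of a separate converter for every pair of formats.

```python
import html
from typing import Callable, Dict, List

# Hypothetical hub model: a document is just a list of paragraph strings.
# A real tool would need something far richer (styles, tables, images).
Document = List[str]

IMPORTERS: Dict[str, Callable[[str], Document]] = {}
EXPORTERS: Dict[str, Callable[[Document], str]] = {}


def importer(fmt: str):
    """Register a function that reads format `fmt` into the hub model."""
    def register(func):
        IMPORTERS[fmt] = func
        return func
    return register


def exporter(fmt: str):
    """Register a function that writes the hub model out as format `fmt`."""
    def register(func):
        EXPORTERS[fmt] = func
        return func
    return register


@importer("txt")
def txt_to_hub(text: str) -> Document:
    # Blank lines separate paragraphs; whitespace inside one is collapsed.
    return [" ".join(block.split()) for block in text.split("\n\n") if block.strip()]


@exporter("txt")
def hub_to_txt(doc: Document) -> str:
    return "\n\n".join(doc)


@exporter("html")
def hub_to_html(doc: Document) -> str:
    # Escape each paragraph so characters like & and < survive as text.
    return "\n".join(f"<p>{html.escape(p)}</p>" for p in doc)


def convert(data: str, src: str, dst: str) -> str:
    """Convert by going through the hub: src -> Document -> dst."""
    if src not in IMPORTERS:
        raise ValueError(f"no importer for {src!r}")
    if dst not in EXPORTERS:
        raise ValueError(f"no exporter for {dst!r}")
    return EXPORTERS[dst](IMPORTERS[src](data))


if __name__ == "__main__":
    print(convert("Hello\nworld\n\nA & B", "txt", "html"))
```

Adding a new format in this layout means writing at most two functions, and it immediately works with every format already registered. The hard part, of course, is the hub model itself, which is exactly the piece OpenOffice already has.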

It'll be great. We need this. I hope to do it soon.
