Back in my days working on a corporate portal, I occasionally had to take HTML that someone had generated with Microsoft Word and drop it into an entry in a CMS. Before I could just cut and paste, I had to run a series of regular expression replacements in vi or Allaire HomeSite to strip out the god-awful HTML that Word produced. As if to pay homage to the past, fate recently dropped a Word HTML file in my lap. I was surprised to see how much cleaner the HTML was, but it was still pretty awful and full of excessive markup cruft.

A search today for Word HTML cleanup turns up Textism's excellent Word Cleaner, a service that lets you upload a Word HTML file and spits out clean HTML ready to use in your blog or CMS. It worked out perfectly for me. I only wish it had been around 10 years ago!

http://textism.com/wordcleaner/