Did you know that when you save a file in Word 2007’s default .docx format, you are actually saving a compressed (zipped) package containing several different XML documents? Earlier versions of Word saved .doc files in a binary format. The .docx documents tend to be much smaller than their older counterparts.
Surprisingly, to get under the hood of a .docx document you don’t need Word 2007 at all. All you need to do is change the .docx extension to .zip. Make a copy of the file before you meddle with it, in case something goes wrong.
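If you would rather script the copy-and-rename step, here is a minimal Python sketch. The function name copy_as_zip is made up for illustration. It leaves the original .docx untouched and writes a copy with a .zip extension next to it.

```python
import shutil
from pathlib import Path


def copy_as_zip(docx_path):
    """Copy a .docx to a sibling file ending in .zip and return the new path.

    Hypothetical helper: the original file is never modified.
    """
    source = Path(docx_path)
    target = source.with_suffix(".zip")  # same name, different extension
    shutil.copyfile(source, target)
    return target
```

Calling copy_as_zip on a file named report.docx (an example name) would leave report.docx alone and create report.zip beside it.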
Now you can open the zip file by double-clicking it if you are using Windows XP, or you can unzip it with a tool like WinZip. You will find several .xml files inside the folder. The screenshot below shows the contents of a sample .docx file containing an image.
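The same listing can be produced without any unzip tool. The sketch below uses Python's standard zipfile module; list_parts is a name chosen here for illustration. Because zipfile looks at the file's contents rather than its extension, it should work on the .docx directly, with no renaming needed.

```python
import zipfile


def list_parts(docx_path):
    """Return (name, compressed size, uncompressed size) for every part in the package."""
    with zipfile.ZipFile(docx_path) as package:
        return [
            (info.filename, info.compress_size, info.file_size)
            for info in package.infolist()
        ]
```

In a typical Word 2007 file you would expect to see entries such as word/document.xml and word/styles.xml, plus something under word/media/ if the document contains an image.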
To send someone the document without the image, simply delete that part from the package, without even opening the file in Word, and then rename the file from .zip back to .docx. When you open a .docx file that has had one of its components removed, for example an image, Word will try to repair the file and, in the process, will put a placeholder where the deleted image used to be. You can remove the placeholder by merely double-clicking on it.
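Deleting a part can also be scripted. Python's zipfile module cannot remove an entry in place, so this sketch copies every part except the unwanted one into a new package. The function name remove_part is hypothetical, and a part name like word/media/image1.png is only a typical example, so check the actual name in your own file first.

```python
import zipfile


def remove_part(src_path, dst_path, part_name):
    """Write a copy of the package at src_path to dst_path, leaving out part_name.

    Returns True if the part was found (and therefore skipped), False otherwise.
    """
    removed = False
    with zipfile.ZipFile(src_path) as src, \
            zipfile.ZipFile(dst_path, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            if info.filename == part_name:
                removed = True  # skip this entry, so it never reaches the copy
                continue
            # Copy the entry's bytes across, keeping its name and metadata.
            dst.writestr(info, src.read(info.filename))
    return removed
```

This only drops the part itself. Any references to it elsewhere in the package are left as they were, which is presumably why Word offers to repair the file when you open it.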
If you want to actually edit the text of the document, open only the document.xml component in a text editor like Notepad or WordPad and make the changes. Because it is an XML file, it might take you some time to find the text and understand the structure. If the file contains comments that you want to remove, strip them out by deleting the comments.xml component. Other components are styles.xml, which holds the document’s style definitions; header1.xml, header2.xml and so on, which hold the page headers; and theme1.xml, which holds the theme (the colour and font schemes) used to style the document. Document.xml.rels records the relationships between document.xml and the other parts, which Word uses to reassemble the components into the complete document.
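To get a feel for where the text sits inside document.xml, here is a rough Python sketch that pulls out the plain text of each paragraph. It assumes the usual layout, in which the main part is stored as word/document.xml, paragraphs are w:p elements and text runs are w:t elements in the WordprocessingML namespace. The name paragraph_texts is made up for this example.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by the w: prefix in WordprocessingML.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def paragraph_texts(docx_path):
    """Return the plain text of each paragraph in word/document.xml."""
    with zipfile.ZipFile(docx_path) as package:
        root = ET.fromstring(package.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(f"{{{W_NS}}}p"):
        # A paragraph's text is split across one or more w:t elements.
        pieces = [t.text or "" for t in para.iter(f"{{{W_NS}}}t")]
        paragraphs.append("".join(pieces))
    return paragraphs
```

This is read-only and ignores formatting, tables and text boxes, but it shows which elements to look for when you edit the file by hand.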
So, as you can see, you can in theory edit quite a bit of a .docx Word file even without having Word 2007 installed.

March 2nd, 2008 at 11:41 am
Mr. Suresh’s contents are really good and technical and to the point.