A law firm sends a settlement proposal to opposing counsel. The document looks clean: no visible comments, no tracked changes, neatly formatted. But the opposing lawyer downloads the file, unzips it (because every DOCX file is just a ZIP archive), opens the XML inside, and finds the revision history nobody removed: deleted paragraphs, internal comments, and the names of the people who touched the document. This isn’t a hypothetical. Metadata leaks from legal documents are well documented and have led to bar disciplinary actions, blown negotiations, and malpractice claims.
A Microsoft Word document with the .docx extension isn’t a single file — it’s a ZIP archive containing a collection of XML files. You can verify this yourself: rename any .docx file to .zip, extract it, and browse the contents. Inside you’ll find the document content, style definitions, relationships, and critically, several metadata files that most users never think about.
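You can also look inside without renaming anything. The following is a minimal sketch using only Python’s standard library; the function name list_docx_parts and the filename in the usage comment are placeholders chosen for illustration.

```python
import zipfile


def list_docx_parts(docx):
    """Return the names of every part stored inside a .docx package.

    `docx` may be a filesystem path or a binary file-like object.
    """
    with zipfile.ZipFile(docx) as package:
        return sorted(package.namelist())


# Hypothetical usage; "proposal.docx" is a placeholder filename.
# for name in list_docx_parts("proposal.docx"):
#     print(name)
```

On a typical document the listing includes entries under docProps/ and word/, which is where the metadata discussed here usually lives.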
core.xml (in the archive’s docProps folder) contains the document’s core properties: creator name, last modified by, creation date, modification date, and revision number. Only two names are stored here, the original creator and the most recent editor. If three people collaborated on a document over two weeks, the third person’s name is more likely to surface in comments and tracked changes than in this file.
app.xml (also in docProps) contains application properties: the software that created the document (including version number), the template used, the company name (pulled from your Office installation or Active Directory), the total editing time in minutes, page count, word count, and paragraph count.
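As a rough illustration of how readable these properties are, the sketch below pulls them out with Python’s standard library. It assumes the conventional locations docProps/core.xml and docProps/app.xml (a package can technically point elsewhere through its relationships), and read_docx_properties is a name invented for this example.

```python
import zipfile
import xml.etree.ElementTree as ET

# Conventional part locations (an assumption; see the note above).
PROPERTY_PARTS = (("core", "docProps/core.xml"), ("app", "docProps/app.xml"))


def read_docx_properties(docx):
    """Collect the text of each top-level property element.

    Returns a dict such as {"core": {"creator": "..."}, "app": {"Company": "..."}}.
    Namespace prefixes are dropped from the keys for readability.
    """
    found = {}
    with zipfile.ZipFile(docx) as package:
        names = set(package.namelist())
        for label, part in PROPERTY_PARTS:
            if part not in names:
                continue  # both parts are optional in a package
            root = ET.fromstring(package.read(part))
            props = {}
            for element in root:
                # Tags look like "{namespace-uri}creator"; keep the local name.
                local_name = element.tag.rsplit("}", 1)[-1]
                if element.text and element.text.strip():
                    props[local_name] = element.text.strip()
            found[label] = props
    return found
```

Keys such as creator, lastModifiedBy, and revision come from the core part, while Company, Template, and TotalTime come from the application part, when the document contains them.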
custom.xml contains any custom document properties — classification levels, department names, project codes, or any other custom metadata your organization’s templates inject automatically.
comments.xml and people.xml contain every comment ever added to the document and a list of every person who commented, including their full names and in some cases email addresses. Even comments that appear deleted in Word may persist in the XML.
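Reading the comments back out takes very little code. The sketch below is a hedged example rather than a complete parser: it assumes the comments part sits at word/comments.xml and uses the transitional WordprocessingML namespace (documents saved in strict mode use a different one). The function name read_docx_comments is made up for this example.

```python
import zipfile
import xml.etree.ElementTree as ET

# Transitional WordprocessingML namespace (assumption: not a strict-mode file).
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def read_docx_comments(docx):
    """Return a list of {"author", "date", "text"} dicts, one per comment."""
    with zipfile.ZipFile(docx) as package:
        if "word/comments.xml" not in package.namelist():
            return []  # the document has no comments part
        root = ET.fromstring(package.read("word/comments.xml"))

    comments = []
    for comment in root.iter(f"{{{W_NS}}}comment"):
        # A comment body is made of paragraphs and runs; join all text runs.
        text = "".join(t.text or "" for t in comment.iter(f"{{{W_NS}}}t"))
        comments.append({
            "author": comment.get(f"{{{W_NS}}}author"),
            "date": comment.get(f"{{{W_NS}}}date"),
            "text": text,
        })
    return comments
```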
And then there’s the most dangerous one of all.
Word’s Track Changes feature is essential for collaborative editing. But while it is on, it records every edit, including deletions, and that record stays in the file until each change is accepted or rejected. When you delete a sentence with Track Changes on, Word doesn’t actually remove the text. It wraps the affected text in a <w:del> element and marks it as deleted, but the full original text remains in the file.
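To show how little effort recovery takes, here is a short Python sketch that lists tracked deletions still present in a document. It makes a few assumptions worth stating: it reads only the main body at word/document.xml (not headers, footers, or footnotes), it uses the transitional WordprocessingML namespace, and the function name find_tracked_deletions is invented for this example.

```python
import zipfile
import xml.etree.ElementTree as ET

# Transitional WordprocessingML namespace (assumption: not a strict-mode file).
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def find_tracked_deletions(docx):
    """Return {"author", "date", "text"} for each tracked deletion with text."""
    with zipfile.ZipFile(docx) as package:
        root = ET.fromstring(package.read("word/document.xml"))

    deletions = []
    for deletion in root.iter(f"{{{W_NS}}}del"):
        # Deleted text is stored in <w:delText> elements inside the deleted runs.
        text = "".join(t.text or "" for t in deletion.iter(f"{{{W_NS}}}delText"))
        if not text:
            continue  # e.g. a deleted paragraph mark, which carries no text
        deletions.append({
            "author": deletion.get(f"{{{W_NS}}}author"),
            "date": deletion.get(f"{{{W_NS}}}date"),
            "text": text,
        })
    return deletions
```

An empty result only means no tracked deletions were found in the main body. It says nothing about comments, document properties, or other parts of the package.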
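Here’s what most people miss: hiding tracked changes is not the same as removing them. Switching the review display to “No Markup” gives you a clean-looking page while every change stays in the file. And while clicking “Accept All Changes” in Word cleans the visible document, depending on how and when it was done, remnants of the revision history can persist in the XML. Third-party document inspection tools sometimes find revision data that Word’s own “Check for Issues” inspector missed.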
The classic scenario: a lawyer drafts a contract with an initial offer of $2.4 million. The client decides to negotiate down to $1.8 million. The lawyer edits the figure, accepts all changes, and sends the document. But the revision history — complete with the original $2.4 million figure — may still be recoverable from the file’s XML.
This isn’t limited to financial figures. Internal strategy notes (“should we disclose the Q3 shortfall?”), alternative language that was considered and rejected, and commentary from colleagues about the document’s content can all survive in tracked change data.
One of the most overlooked sources of metadata contamination is templates. When you create a document from a template, the new document inherits the template’s metadata — including the original author and company who created the template.
This creates bizarre situations where a document prepared by Firm A carries metadata identifying Firm B as the creator, because Firm A’s template was originally built by someone at Firm B years ago. It happens constantly in legal and consulting environments where templates get shared, copied, and repurposed across organizations.
The template name itself can also be revealing. A document created from Legal_Brief_Litigation_Template_v4.dotx tells the recipient something about your workflow and practice area before they’ve even read the content.
Legal professionals face the most acute risk. The American Bar Association has published multiple ethics opinions addressing lawyers’ duty to remove metadata before sharing documents with opposing parties. Many state bar associations have issued similar guidance. Several courts have held that inadvertently disclosed metadata can constitute waiver of privilege.
Corporate teams sharing proposals, contracts, and reports externally risk exposing internal author names, department structures, editing timelines, and the total time invested in preparing a document. A client receiving a proposal that shows 14 hours of total editing time and 47 revisions may form different expectations about pricing than one that appears freshly prepared.
HR departments sending employment documents, offer letters, or termination notices carry particular risk. A rejection letter that contains revision history showing the position was originally offered to the candidate before being rescinded tells a very different story than the final document alone.
Government agencies and regulated industries face compliance requirements around document metadata. GDPR treats author names and other personally identifying metadata as personal data, meaning it’s subject to data minimization requirements when documents are shared externally.
Word has a built-in Document Inspector (File → Check for Issues → Inspect Document) that can find and remove some metadata categories. It’s a reasonable first step for casual use, but it has limitations: it doesn’t always catch everything in the XML, and it requires you to remember to run it every time.
For reliable, comprehensive metadata removal, dedicated tools that operate on the underlying XML are more thorough. MetaStrip opens your DOCX file in the browser using JSZip, accesses the XML files directly, sanitizes or removes the metadata entries, and repackages the clean document — all without your file ever leaving your device.
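For readers who want to see the general shape of that open, sanitize, repackage cycle, here is a Python sketch. It is not MetaStrip’s code (which, as described above, runs in the browser with JSZip), and it is deliberately narrow: it only blanks the three docProps parts and leaves comments and tracked changes untouched. The parts are replaced with empty property documents rather than deleted so that the package’s relationships still resolve. The names strip_docx_properties and EMPTY_PARTS are invented here, and the output has not been validated against every version of Word.

```python
import zipfile

XML_HEADER = b'<?xml version="1.0" encoding="UTF-8" standalone="yes"?>\n'

# Empty replacements for the property parts (a sketch, not an exhaustive list).
EMPTY_PARTS = {
    "docProps/core.xml": XML_HEADER + (
        b'<cp:coreProperties xmlns:cp="http://schemas.openxmlformats.org/'
        b'package/2006/metadata/core-properties"/>'
    ),
    "docProps/app.xml": XML_HEADER + (
        b'<Properties xmlns="http://schemas.openxmlformats.org/'
        b'officeDocument/2006/extended-properties"/>'
    ),
    "docProps/custom.xml": XML_HEADER + (
        b'<Properties xmlns="http://schemas.openxmlformats.org/'
        b'officeDocument/2006/custom-properties"/>'
    ),
}


def strip_docx_properties(source, destination):
    """Copy a .docx, replacing its property parts with empty documents.

    `source` and `destination` may be paths or binary file-like objects.
    Every other part is copied byte for byte, in the original order.
    """
    with zipfile.ZipFile(source) as src, \
            zipfile.ZipFile(destination, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = EMPTY_PARTS.get(item.filename, src.read(item.filename))
            # Writing by name (not by ZipInfo) stamps each entry with the
            # current time, so the original entry timestamps are not copied.
            dst.writestr(item.filename, data)


# Hypothetical usage; both filenames are placeholders.
# strip_docx_properties("proposal.docx", "proposal_clean.docx")
```

Always open the cleaned copy and check it before sending, and keep the original: a sketch like this writes a new file precisely so nothing is lost if the result is not what you expected.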
The key principle: metadata removal should be a step in your document sharing workflow, not something you remember to do occasionally. For legal professionals, it should be as automatic as running spell-check before sending.
Before sending a Word document externally, consider whether you’ve addressed each of these: author and “last modified by” names, company name and template information, total editing time and revision count, comments and tracked changes (both visible and hidden), custom properties set by your organization’s templates, and embedded objects that might carry their own metadata.
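That checklist can be turned into a rough automated pre-send check. The Python sketch below flags which items appear to be present. It is an illustration under several assumptions: conventional part locations, the transitional WordprocessingML namespace, tracked changes checked in the main body only, and embedded objects detected purely by the word/embeddings/ folder. The function name audit_docx and the finding labels are invented for this example.

```python
import zipfile
import xml.etree.ElementTree as ET

# Transitional WordprocessingML namespace (assumption: not a strict-mode file).
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def _names_with_text(xml_bytes):
    """Local names of all elements in the part that carry non-empty text."""
    root = ET.fromstring(xml_bytes)
    return {
        el.tag.rsplit("}", 1)[-1]
        for el in root.iter()
        if el.text and el.text.strip()
    }


def audit_docx(docx):
    """Return a list of checklist items that appear to be present."""
    findings = []
    with zipfile.ZipFile(docx) as package:
        names = set(package.namelist())

        if "docProps/core.xml" in names:
            present = _names_with_text(package.read("docProps/core.xml"))
            if present & {"creator", "lastModifiedBy"}:
                findings.append("author or last-modified-by name")
            if "revision" in present:
                findings.append("revision count")

        if "docProps/app.xml" in names:
            present = _names_with_text(package.read("docProps/app.xml"))
            if present & {"Company", "Template"}:
                findings.append("company or template name")
            if "TotalTime" in present:
                findings.append("total editing time")

        # For these two, the mere presence of the part is treated as a finding.
        if "docProps/custom.xml" in names:
            findings.append("custom properties part")
        if "word/comments.xml" in names:
            findings.append("comments part")

        if "word/document.xml" in names:
            body = ET.fromstring(package.read("word/document.xml"))
            for tag in ("ins", "del"):
                if next(body.iter(f"{{{W_NS}}}{tag}"), None) is not None:
                    findings.append("tracked changes")
                    break

        if any(name.startswith("word/embeddings/") for name in names):
            findings.append("embedded objects")
    return findings
```

An empty list means none of these specific signals were found, not that the file is guaranteed clean.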
Or you can drop the file into MetaStrip and handle all of them in a few seconds.
Those few seconds are a lot less painful than the conversation you’ll have if opposing counsel finds your negotiation notes buried in the XML.