Microsoft Office 97 & 2000 Have A Dirty Little Secret

Posted 28 Feb 2002 at 17:59 UTC by jackshck

If you are using Microsoft Office 97 or 2000, every file you create is stamped with a Globally Unique Identifier (GUID). This number matches the MAC address of your Ethernet card. It is unique to each and every Ethernet card ever made. This means that every file you create can be traced back to you.

As you probably already know, I have done extensive research on the Excel 97 file format. Late last year, information regarding the GUID was brought to my attention.
At first I thought it was just an urban legend, but after doing some research I found out it was indeed true! Here are some links that talk about it in more detail.

This is scary stuff. If you are running Office 97 or 2000, I suggest an immediate switch to OpenOffice.

I have not confirmed if this behavior is present in Office XP. If anyone has information regarding XP, please post a reply to this article.

I have some more information, and will post it in a little bit.


Microsoft's Position on the GUID, posted 28 Feb 2002 at 18:23 UTC by jackshck » (Journeyer)

Hello Everyone

Here is an open letter which states Microsoft's position on the GUID: http://www.microsoft.com/presspass/features/1999/03-08custletter2.asp

Here is a portion of the letter:

The unique identifier number inserted into Office 97 documents was designed to help third parties build tools to work with, and reference, Office 97 documents.

Yeah whatever M$

The unique identifier generated for Office 97 documents contains information that is derived in part from a network card, not from an individual user's identity, and thus it is not possible to reliably determine the author of a document.

Not true. M$ files contain the user name of the person who created the document. This, in conjunction with the MAC address, can reliably determine the creator of the file.

GUID is well known, posted 28 Feb 2002 at 19:46 UTC by atai » (Journeyer)

It seems that Microsoft's use of GUIDs, and the fact that they are derived from Ethernet addresses, has been well known for some time.

without negating the conclusions..., posted 28 Feb 2002 at 20:19 UTC by gus3 » (Observer)

There are some factual errors to take into account.

Yes, GUID generation on a Microsoft platform takes the MAC address into account, but it also incorporates the time of day, the OS serial number, etc etc etc. There are several different ingredients that go into GUID generation.
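To make the "several ingredients" point concrete, here is a minimal sketch. It assumes the GUIDs in question follow the standard version 1 UUID layout (timestamp, clock sequence, and a 48-bit node field that usually holds a MAC address); the thread does not confirm the exact format Office used. The MAC value and the helper name mac_from_uuid1 are hypothetical, chosen only for illustration.

```python
import uuid

# Hypothetical MAC address, used only for illustration.
EXAMPLE_MAC = 0x001122334455


def mac_from_uuid1(guid: uuid.UUID) -> str:
    """Return the 48-bit node field of a version 1 UUID as a MAC-style string."""
    if guid.version != 1:
        raise ValueError("not a version 1 UUID")
    raw = guid.node.to_bytes(6, "big")
    return ":".join(f"{byte:02x}" for byte in raw)


# Passing node= keeps the node field fixed; the timestamp part still changes
# on every call, so two GUIDs from the same card are different values.
guid = uuid.uuid1(node=EXAMPLE_MAC)
recovered = mac_from_uuid1(guid)
print(guid, "->", recovered)
```

Under that assumption, the time component makes every GUID different, but the node field is stored in the clear, so anyone holding the GUID can read the hardware address back out.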

MAC addresses are not globally unique. Manufacturers know that there are duplicates, and so take steps in their distribution process to minimize the possibility that cards with the same MAC address end up on the same Ethernet. It does happen once in a while, though, and it gives net admins fits.

Microsoft isn't the only platform to use GUIDs. Even Linux can create GUIDs for disk partitions. Are these something to be afraid of? Not in and of themselves; the concern is how they are used.

One final point: The FAQ mentioned a "problem" of people who have no Ethernet card. This "problem" seemed to solve itself as high-speed access became popular. But now, it's appearing again, as more cable/DSL modem manufacturers are supporting USB.

GUIDs, posted 1 Mar 2002 at 02:59 UTC by neil » (Master)

Comparing a GUID on a filesystem with a GUID embedded in a document misses the point entirely.

This isn't about GUIDs themselves. This is about tagging disseminated documents with GUIDs.

Re: GUID's, posted 1 Mar 2002 at 15:35 UTC by jackshck » (Journeyer)

This isn't about GUIDs themselves. This is about tagging disseminated documents with GUIDs.

Exactly

To be fair..., posted 5 Mar 2002 at 16:17 UTC by julesh » (Master)

I don't think anyone really seriously suspects MS of doing this intentionally, do they? I mean, yes, if the Byte article is right they are collecting enough information to uniquely identify the author of any Word document.

They could use this to - for instance - harvest Word documents off the web, cross-reference them with Windows product registrations, and find people who haven't registered a copy of Office, as part of an anti-piracy policy. But it doesn't look as though they are doing this.

I strongly suspect it is just an accident. Your Ethernet card's MAC is possibly the most unique value that is easily accessible to the operating system. Of course it will be used in GUID creation (although personally I would have MD5 hashed it or something similar) because it will radically diminish the chances of two identical GUIDs being created, and for many purposes having two identical GUIDs would be catastrophic.
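The "MD5 hash it" idea could look roughly like the sketch below. This is not how any Microsoft product generates GUIDs; the function name hashed_id, the example MAC, and the choice of mixing in a timestamp are all assumptions made for illustration.

```python
import hashlib
import uuid

# Hypothetical MAC address, used only for illustration.
EXAMPLE_MAC = bytes.fromhex("001122334455")


def hashed_id(mac: bytes, timestamp: int) -> uuid.UUID:
    """Derive a 128-bit identifier from an MD5 digest of the MAC and a timestamp."""
    digest = hashlib.md5(mac + timestamp.to_bytes(8, "big")).digest()
    # version=3 marks the result as an MD5-derived UUID; the MAC itself
    # is not stored in any field of the identifier.
    return uuid.UUID(bytes=digest, version=3)


first = hashed_id(EXAMPLE_MAC, 1_014_919_140)
second = hashed_id(EXAMPLE_MAC, 1_014_919_141)
print(first)
print(second)
```

The MAC still contributes to uniqueness, but it can no longer be read straight out of the identifier. Hashing alone may not stop someone who can guess the other inputs and try candidate MAC addresses, so a real design would probably want a secret or random component as well.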

I can also see the sense in keeping a GUID in a document and changing it every time the document is saved: it makes it easy to look at two files and say 'these are identical files' without comparing the entirety byte for byte.

I can also see why MS might want a GUID on your registration details when you register the operating system, although if I were them I'd be generating the IDs myself when I received the registration... having a unique id is a reasonable thing to do, so of course they used the mechanism that their operating system provides for generating one.

So there are serious concerns here - Microsoft do have the information to identify the author of most Word documents. But don't get paranoid and attack them for it - it was almost certainly a coincidence!

I agree with the article's conclusions, posted 6 Mar 2002 at 02:56 UTC by gus3 » (Observer)

My only point is that GUIDs are not very useful by themselves. The problem lies in how they are used. Besides, there are other ways of finding out who created an Office 97 file, as jackshck pointed out. Even StarOffice and OpenOffice ask you for your personal information when you install them, and this information is embedded into their documents.

N.B.: I don't know if MS Office requires this information; I know that SO/OO do not.

Very Old Story, posted 11 Mar 2002 at 22:02 UTC by jimwelch » (Journeyer)

Mathews wrote it on March 8, 1999. Methvin & Smith wrote it on Sept 6, 1999. MS has a "patch" to stop doing most of this. As usual, MS closes the door after the horse escapes!
