> I just noticed in the 7-zip help that one of the formats it can open
> is what it calls "COMPOUND", and includes msi, doc, xls and ppt.
>
> It looks like the entry for msi files in archiver.bb2 (which I probably
> haven't even looked at for a decade) is the one that's triggering ztree
> to open xls files with 7-zip if available. Sure enough, it does the same
> thing with doc and ppt, but also works for pub files. The signature is
> those first three bytes.
>
> It looks like I installed 7-zip about a year ago, so I would never have
> seen this behaviour before then, and probably haven't tried to view one
> since then. The comments in archiver.bb2 say 7-zip msi support was
> introduced in 2009.
>
> It was just a misunderstanding waiting to happen.
>
> It looks like a useful feature. In one ppt file I examined, it showed
> me there was a 20MB "file" called Pictures. I can use this to quickly
> work out which ppt files are bloated with images.
>
> Word files aren't so straightforward to analyse. I looked at two
> image-bloated doc files, but the images were in different places in each.
> In one case they appeared to be in the DocumentSummary section!
Microsoft Office files are a serialisation of the objects that make up the document, wrapped in a header (and probably a footer). The same kind of container is placed around an embedded document, e.g. a spreadsheet in a Word document.
I used to work for a company whose brain-dead email client mangled documents sent to me by contacts in other companies. It was hard (in some cases impossible) to get them to send documents in a way that wouldn't get mangled (because they couldn't use their own email client), so I developed a way to extract Office documents from the amorphous binary blob.
I don't remember the magic numbers that let me pull the embedded documents out with a binary editor (this was almost 12 years ago), but I would have them stored somewhere.
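
For what it's worth, the documented signature for the compound file container used by the old doc, xls, ppt and msi formats is the eight bytes D0 CF 11 E0 A1 B1 1A E1, which would fit the three-byte match mentioned above. Below is a rough Python sketch of scanning a blob for that signature and cutting out candidate documents. It is not necessarily the method I used back then; the function names are made up for illustration, and it assumes the blob still holds the raw bytes (no transfer encoding left in place). Each slice simply runs to the next signature or the end of the blob, so it may carry trailing junk.

```python
# Signature that opens an OLE compound file (legacy .doc, .xls, .ppt, .msi).
CFB_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def find_compound_offsets(blob: bytes) -> list[int]:
    """Return every offset in blob where the compound-file signature occurs."""
    offsets = []
    pos = blob.find(CFB_MAGIC)
    while pos != -1:
        offsets.append(pos)
        pos = blob.find(CFB_MAGIC, pos + 1)
    return offsets


def carve_candidates(blob: bytes) -> list[bytes]:
    """Cut blob into candidate documents.

    Assumption: each candidate runs from one signature up to the next
    signature (or the end of the blob). The true end of the document is
    not worked out here, so a slice may include unrelated trailing bytes.
    """
    offsets = find_compound_offsets(blob)
    ends = offsets[1:] + [len(blob)]
    return [blob[start:end] for start, end in zip(offsets, ends)]
```

Something like `carve_candidates(open("blob.bin", "rb").read())` (the file name is just a placeholder) would give a list of byte strings to write out and try opening in Office or 7-zip.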