This document was prepared using Arachnophilia 4.0 (a Web page authoring tool written by Paul Lutus), and dot (a directed graph illustration tool, written by various authors). You can find the dot sources for the images here and here.
The command line I use for dot under Win32 is:
type filename.dot | dot -Tjpeg -o filename.jpg
Enjoy. (Update 2005: I've hacked these jpg files to be png instead, to save bandwidth.)
Introduction
What is structured storage?
Program overview
Class hierarchy design
Coding the interface
The COM interface to structured storage
A structured storage (read-only) wrapper
Front end
I suppose I could bang on a bit about the history of OLE and structured storage, but let's face it - you don't care and neither do I. All you want to know is how to use this stuff.
The idea of this page is to demonstrate how to read structured storage files using Visual C++ 6.0. To benefit maximally, you will find a good working knowledge of C++ useful, but I'll document any bits that I think some people might struggle with.
It's a way of sticking all your eggs in one basket. Think of a disk file system. Specifically, think of the FAT system used in early MS hacks. You have directories, and files. Directories can contain other (child) directories, and also files. There is a special directory with no parent, which we call the "root directory".
Structured storage is very similar, and deliberately so. It even mimics the FAT system's use of 512-byte and 4096-byte clusters. In a structured storage file, we have storages, which are analogous to directories, and streams, which are analogous to files. Streams are where the data is kept. Storages are how it is organised. Here's a picture:
Okay, I hope you have the basic idea. Now, what we need is some programmatic ways to navigate to a particular part of the "directory tree" (although we'd better start calling it the "storage tree", hadn't we?).
What we're going to do is create a kind of wrapper around the structured storage interface. It's not a complete wrapper, because we are only going to support reading operations. Nevertheless, I hope it will be sufficiently instructive that it will inspire you to produce your own write-enabled version.
The general idea is to make this an active wrapper. All you have to do is instantiate an object dynamically, handing it the name of a structured storage file as you do so, and - in the blink of an eye - it will load the entire structure of the file into memory (but not the data itself). The organisation of our own objects will reflect that of the structured storage file, but in a much more intuitive way (in my humble opinion!) than that provided by Microsoft. Each node of our own object tree will contain a Microsoft COM interface pointer (either an IStream or an IStorage - more about those later). The code will show how to use those pointers to get at the actual data of a structured storage file. And you don't have to worry about cleaning up afterwards - simply deleting the original object will unravel all the allocations automatically via the standard C++ destructor mechanism.
The usual problem with wrappers like this one is that they are presented all in one go, which means that you hit a steep learning curve straight away. Whilst I cannot promise to eliminate that problem, I've done my best to obviate it, by presenting the code little by little, and by explaining at each stage how it is built up and, to some extent, why each piece is there. I hope you find this approach refreshing. If you want the whole code in one go, you can get it here .
Before you get all excited about this, I must stress that this page is not intended to teach you how to design classes or class hierarchies. Although I have used C++ a fair old bit over the years, I do not claim to be an expert in the language. (Does such a creature as a C++ expert really exist?) I am certainly not the world's leading exponent of OOP. Nevertheless, it's rather easier to do COM in C++ than in C, and I have no wish to make life difficult for myself.
I'm not a great fan of OOP in general or inheritance in particular. Nevertheless, ahead of us there lies the problem of representing two disparate kinds of entity that nevertheless share a significant property - the property of "belonging to" a storage. Yes, a storage can be contained in a storage, and a stream can be contained in a storage. Thus, to represent the children of a storage, we would like a convenient way to represent the idea of "something that is either a storage or a stream" and inheritance appears to give us this rather easily. Enter the "RStorageMember" abstract base class. (It's abstract because I can't think of any good reason to instantiate such a class.)
From this base class, we derive two classes, RStorage and RStream, as this
diagram shows.
I tried to make that diagram kind of UMLish, but don't trust your pension to it. :-)
The whole point of doing this inheritance thing is to enable us to store objects of both the derived classes in a single vector. In case you don't know what a vector is, it's basically an array with attitude. It's one of the STL (Standard Template Library) templates that have been integral to C++ ever since ISO/IEC 14882:1998 standardised the language. It supports normal array indexing such as vec[index] , and also provides functionality such as "add an element to the end of the array" (rather conveniently expanding the array if necessary).
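If you haven't used a vector before, here is a minimal sketch of the two features mentioned above: adding to the end and ordinary indexing. The function names and the strings stored are made up purely for illustration.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Build a vector one element at a time; it grows by itself as needed.
std::vector<std::string> MakeNames()
{
    std::vector<std::string> names;   // starts out empty
    names.push_back("first");         // "add an element to the end"
    names.push_back("second");
    names.push_back("third");
    return names;
}

// Walk the vector with normal array indexing, totalling the string lengths.
std::size_t TotalLength(const std::vector<std::string> &names)
{
    std::size_t total = 0;
    for (std::size_t i = 0; i < names.size(); ++i)
    {
        total += names[i].size();     // names[i] reads just like an array
    }
    return total;
}
```

Calling TotalLength(MakeNames()) adds up the lengths of the three strings (5 + 6 + 5).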
Let's set up the inheritance code first:
class RStorageMember
{
};
class RStorage : public RStorageMember
{
};
class RStream : public RStorageMember
{
};
(No, it won't compile yet.)
In due course, we will want to display the contents of the structured storage file. We may well want to display storages and streams in different ways, so this might be a good candidate for a pure virtual function (which will enforce the abstract base class behaviour we want). So let's add some member function declarations. (For the purpose of this exercise, we'll just use iostreams for display.) Since we want a caller to be able to access these methods, we make them public. This means that, for the moment, the keyword "public" appears at the beginning of the class definition, which seems like a complete waste of time. But later on, we'll put in some more stuff above.
#include <iostream>
using namespace std; /* I don't like doing this, but Visual C++
sometimes messes up templates if
you specify them individually with
using std::vector or whatever */
class RStorageMember
{
public:
virtual void Report(ostream &) = 0;
};
class RStorage : public RStorageMember
{
public:
virtual void Report(ostream &);
};
class RStream : public RStorageMember
{
public:
virtual void Report(ostream &);
};
Well, this still won't compile! But here's what will happen in due course. We will have a vector of pointers to RStorageMember objects. Each of them will really point to either an RStream or an RStorage , but the vector doesn't need to worry about that. We can, therefore, iterate through the vector, calling the Report() method through each pointer, and each object will use the Report() method appropriate to its class. If you don't follow me just yet, put in some cout statements (later, please - stay with me for now) that say things like "I'm the RStorage Report method" and "I'm the RStream Report method" , and see what happens when you iterate through the vector (which we're coming to, by degrees).
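If you'd rather see that experiment now than later, here is a cut-down sketch of it. It reuses the class names from this page but strips out everything except Report(); the ReportAll() helper is hypothetical and exists only to collect the output in a string.

```cpp
#include <cstddef>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

class RStorageMember
{
public:
    virtual ~RStorageMember() {}
    virtual void Report(std::ostream &) = 0;   // pure virtual: no base version
};

class RStorage : public RStorageMember
{
public:
    virtual void Report(std::ostream &o) { o << "I'm the RStorage Report method\n"; }
};

class RStream : public RStorageMember
{
public:
    virtual void Report(std::ostream &o) { o << "I'm the RStream Report method\n"; }
};

// Put one object of each kind into a vector of base-class pointers, call
// Report() through each pointer, and return everything that was printed.
std::string ReportAll()
{
    RStorage storage;
    RStream stream;
    std::vector<RStorageMember *> members;
    members.push_back(&storage);
    members.push_back(&stream);

    std::ostringstream out;
    for (std::size_t i = 0; i < members.size(); ++i)
    {
        members[i]->Report(out);   // the object's real class decides which Report runs
    }
    return out.str();
}
```

The loop never asks what kind of object it is looking at, yet each object answers with its own message.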
Now that we have the basic inheritance mechanism in place, we can start to relax a little. But not too much, because we have a problem. We need to record, for any given storage or stream, which storage acts as its parent (think directories again - all files and directories, except the root, have a parent directory). Since a parent has to be a storage (remember, streams cannot contain storages or streams), it makes sense to put the parent member into the common base class. So the compiler needs to know that there is a class called RStorage whilst it is compiling RStorageMember . But RStorage inherits from RStorageMember , so the compiler needs to know about RStorageMember when compiling RStorage ! So we have to compile everything before we compile everything else, if you see what I mean. Fortunately, we can solve this with a forward reference, like so:
#include <iostream>
#include <vector> /* for the STL template, "vector" */
using namespace std;
class RStorage; /* Here is the forward reference */
class RStorageMember
{
protected:
RStorage *Parent; /* pointer to parent storage - NULL if this is the root storage */
public:
virtual void Report(ostream &) = 0;
};
class RStorage : public RStorageMember
{
vector <RStorageMember *> Child; /* Vector of child objects, which
are either storages or streams */
public:
virtual void Report(ostream &);
};
class RStream : public RStorageMember
{
public:
virtual void Report(ostream &);
};
Here we've added a protected section to the base class. This means that we can see the Parent member of the base class within derived class member functions, but not from outside them.
It would be quite useful to be able to display the level of nesting of a storage or stream, so we'll need a Depth counter (either that, or re-calculate depth each time, which seems absurd in comparison). We will also want a function to perform the nesting for us. This function, Tab() , can go in the base class, and can be protected (since only the derived classes will need access to it). We'll hand it an ostream reference, so that it knows where to do its indenting. We will use a file scope constant to control the "tab size", so that we can change it easily if we wish.
Since this program has no idea what kind of data it will be looking at, it will assume the worst - unformatted binary data. So we'll want a hex dumping routine.
Apart from depth, we will also want to track a storage's (or stream's) name, and of course we would like to know the location of its parent. This gives us our constructor syntax:
RStorageMember(RStorage *Parent_, const char *Name_, int Depth_);
RStorage(RStorage *Parent_, const char *Name_, int Depth_);
RStream(RStorage *Parent_, const char *Name_, int Depth_);
These constructors go in the "public" sections of their respective classes, as we'll see the next time I show the code (won't be long now). You will note that I use the convention of a trailing underscore on the parameter names, to keep them distinct from, and yet obviously related to, their class member counterparts.
A word about constructors if I may: there's a really important rule of good practice in C++, known as the "Rule of Three", which (in one popular form) says that "A class with any of {destructor, assignment operator, copy constructor} generally needs all 3". I have flagrantly ignored this rule, in the quest to keep the code even simpler than is actually possible or sensible. I am trying to focus on structured storage. Please feel free to add assignment operators and copy constructors yourself.
Since the process of constructing our description of the structured storage file will involve acquiring resources which will later need to be released, we need destructors. Since we will actually be destroying RStorageMember objects, we need to provide a virtual destructor in that class. This doesn't actually destroy anything itself, but it does allow the inheritance mechanism to cut in so that the destructors for RStorage and RStream objects are called properly.
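To see why the base class destructor must be virtual, here is a small self-contained sketch. Base, Derived and the g_log string are hypothetical names invented for this illustration; they are not part of the wrapper.

```cpp
#include <string>

// Records the order in which the destructors run, so we can inspect it.
std::string g_log;

class Base
{
public:
    virtual ~Base() { g_log += "~Base "; }   // virtual, but otherwise does nothing special
};

class Derived : public Base
{
public:
    ~Derived() { g_log += "~Derived "; }
};

// Destroy a Derived object through a Base pointer and return the log.
std::string DestroyThroughBasePointer()
{
    g_log.clear();
    Base *p = new Derived;
    delete p;          // runs ~Derived first, then ~Base, because ~Base is virtual
    return g_log;
}
```

Without the virtual keyword, deleting a derived object through a base pointer is undefined behaviour, and in practice the derived destructor is usually skipped. That is exactly the situation we would be in when deleting the contents of a vector of RStorageMember pointers.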
Since destructors take no arguments, we don't need to explain them, so let's now put in the bits and bobs of code that we've just been discussing:
#include <iostream>
#include <vector>
using namespace std;
const int TabSize = 4;
class RStorage; /* Here is the forward reference */
class RStorageMember
{
protected:
RStorage *Parent;
string Name;
int Depth;
void Tab(ostream &);
public:
RStorageMember(RStorage *Parent_, const char *Name_, int Depth_);
virtual ~RStorageMember();
virtual void Report(ostream &) = 0;
};
class RStorage : public RStorageMember
{
vector <RStorageMember *> Child; /* Vector of child objects, which
are either storages or streams */
public:
RStorage(RStorage *Parent_, const char *Name_, int Depth_);
~RStorage();
virtual void Report(ostream &);
};
class RStream : public RStorageMember
{
public:
RStream(RStorage *Parent_, const char *Name_, int Depth_);
~RStream();
virtual void Report(ostream &);
void HexDump(ostream &, unsigned char *base, size_t size);
};
We can't go very much further without introducing some COM interfaces. It is not my intention here to document the interfaces themselves. That is Microsoft's job, and you should have no major problems in uncovering that documentation . If you go to that page (and if Micros~1 haven't changed things around between my writing this and your reading it), you will see a treeview on the left side of the page. Select "Component Development" and then "Structured Storage" to get to the top level of the COM interface docs for structured storage.
(I studied those pages for a long time, but they didn't help me as much as I would have liked - hence this Web page!)
The COM interfaces we care about at present are IStorage and IStream . No prizes for guessing what they interface to.
It ought to come as no surprise that we will need to store IStorage and IStream interface information in our own classes, but we won't instantiate them ourselves. We have to use pointers instead. Nor will we initialise these pointers ourselves (except perhaps to NULL). Rather, we will use Microsoft API calls, passing the pointers' addresses to the appropriate routines. That way, the API can initialise them on our behalf. So we will add the following declarations to the "private" sections of RStorage and RStream respectively:
IStorage *Storage;
and
IStream *Stream;
(To pick up the appropriate definitions, we must add <windows.h> to the mix. While we're at it, we'll add <cctype>, because we'll be using isprint() later on.)
We will also need, for the RStream class, an instance of a STATSTG structure (more about that later), and a way to interpret failure codes (for storages and streams alike). And we'll need ways to tell an RStorage object to load its child objects into memory and to open a child stream.
Finally, we will need some way to report errors. I've implemented a very simplistic exception-handling scheme, which is really intended for debugging, and you should replace it with a more robust subsystem in your own good time. Adding those (and <string> , for the obvious reason that we'll be using STL strings later on) gives us our complete class interface and exception-handling mechanism:
#include <windows.h>
#include <iostream>
#include <string>
#include <vector>
#include <cctype>
using namespace std;
const int TabSize = 4;
struct Exception
{
string method;
int line;
string error;
Exception(const char *method_,
int line_,
const char *error_)
:
method(method_),
line(line_),
error(error_) /* see below */
{
}
};
/* In case you haven't encountered initialiser lists before, let me explain
* briefly. The colon after the function declarator means "initialiser
* list follows". The initialiser list is a comma-separated list of
* items of the form member_name(initial_value) - typically, the initial value
* is either one of the parameters to the constructor, or a numeric literal.
* In this case, we have "copy the value of method_ into method, the value
* of line_ into line, and the value of error_ into error". Note the trailing
* underscores, which is one way to let us use similar names to stress the
* similarity of a member and its initialiser, whilst noting the fact that they
* are in fact different objects.
*
* Since we've managed to do all of the necessary work for the constructor simply
* by using the initialiser list, the function body is of course empty. This
* confused the living daylights out of me the first time I saw it, which is why
* I've gone to some trouble to explain it here.
*/
class RStorage;
class RStorageMember
{
protected:
RStorage *Parent;
string Name;
int Depth;
void Tab(ostream &);
public:
RStorageMember(RStorage *Parent_, const char *Name_, int Depth_);
virtual ~RStorageMember();
virtual void Report(ostream &) = 0;
};
class RStorage : public RStorageMember
{
IStorage *Storage;
vector <RStorageMember *> Child;
public:
virtual void Report(ostream &);
RStorage(RStorage *Parent_, const char *Name_, int Depth_);
~RStorage();
int LoadChildObjects();
HRESULT OpenStream(IStream **str, string &Name_);
string GetReason(HRESULT Result);
};
class RStream : public RStorageMember
{
IStream *Stream;
STATSTG StreamStats;
public:
RStream(RStorage *Parent_, const char *Name_, int Depth_);
~RStream();
virtual void Report(ostream &);
void HexDump(ostream &, unsigned char *base, size_t size);
string GetReason(HRESULT Result);
};
Don't worry about the IStream and IStorage stuff just yet. All will become marginally clearer than it is at present.
Okay, we now have an interface, but no implementation. Let's have a bit of a think about how we want this to work. Firstly, we're not going to bother with any of that GUI stuff. Too much like hard work, right? So how about this for a front end:
void Usage(const char *progname)
{
cerr << "This program dumps an OLE compound file." << endl;
cerr << "Usage: " << progname << " exportfilename" << endl;
}
int main(int argc, char **argv)
{
if(argc > 1)
{
try
{
RStorage *Root = new RStorage(NULL, argv[1], 0);
Root->Report(cout);
delete Root;
}
catch(Exception &e)
{
cerr << e << endl;
}
}
else
{
Usage(argv[0]);
}
return 0;
}
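Note that main() writes the exception with cerr << e, which needs an operator<< for Exception that this page doesn't show. Here is a minimal sketch of one. The output layout is my own assumption, and ToString() is a hypothetical convenience helper; the Exception struct is repeated only so that the example stands alone.

```cpp
#include <iostream>
#include <sstream>
#include <string>

struct Exception
{
    std::string method;
    int line;
    std::string error;
    Exception(const char *method_, int line_, const char *error_)
    :
    method(method_),
    line(line_),
    error(error_)
    {
    }
};

// Lets an Exception be written to any ostream, e.g. cerr << e << endl;
std::ostream &operator<<(std::ostream &o, const Exception &e)
{
    o << e.method << " (line " << e.line << "): " << e.error;
    return o;
}

// Hypothetical helper: the same text as a string, handy for logging.
std::string ToString(const Exception &e)
{
    std::ostringstream s;
    s << e;
    return s.str();
}
```

With something like this in place ahead of main(), the catch block compiles as written.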
As you can see, this is about as simple as it gets. If a command-line argument is offered, treat it as the name of a structured storage file to be read and displayed. Otherwise, display an error and quit. Assuming we have a name, we use it to construct an RStorage object to which we get a pointer. This RStorage object represents our personal interface to the root of the structured storage file. Assuming that works, we will tell this object to load its child objects (and it will tell any child storages to load their child objects, and they... - this is a very natural recursive algorithm).
Once the loading is complete, we tell the root object to report on itself and its child objects. Another recursive algorithm. Once the reporting is done, we can quit. (Well, first we must destroy the object tree, which is another recursive algorithm!)
If recursion scares you, don't let it. Recursion basically means that a function calls itself. The idea is that the implementation "marks its place" when it encounters a recursive call (just as it would for any other kind of call), and re-enters the function. It can do this over and over if need be. When, eventually, the function encounters a state where the recursion is not required (the so-called "base case"), it will exit normally in due course, back to the previous invocation (just like in the case of a normal function call), and so on. (Think of pushing onto, and popping off, a stack.)
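Here is recursion in miniature, as a sketch on a toy tree rather than a real storage. Node and CountNodes are hypothetical names; the shape of the function is the same as the loading, reporting and destroying routines that follow.

```cpp
#include <cstddef>
#include <vector>

// A bare-bones tree node: nothing but a list of pointers to its children.
struct Node
{
    std::vector<Node *> Child;
};

// Count this node plus every node beneath it.
std::size_t CountNodes(const Node &node)
{
    std::size_t count = 1;                      // this node itself
    for (std::size_t i = 0; i < node.Child.size(); ++i)
    {
        count += CountNodes(*node.Child[i]);    // the recursive call, one level down
    }
    return count;   // a node with no children skips the loop: the base case
}
```

For a root with two children, one of which has a child of its own, CountNodes(root) visits four nodes in all.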
Now that we have a front end (and thus know where we're going), we can start to implement the class. And the first thing we need to do is look at the constructor for RStorageMember . Here it is:
RStorageMember::RStorageMember(RStorage *Parent_, const char *Name_, int Depth_)
:
Parent(Parent_),
Name(Name_),
Depth(Depth_) /* listed in declaration order, which is the order C++ actually initialises them in */
{
}
As you can see, we've used another initialiser list. This is suitable for initialising pointers and ints and stuff, but it doesn't work for arrays. Therefore, I had to copy the name of the root storage "by hand" using strcpy in the body of the constructor - and then I realised I could make it a C++ string, which doesn't count as an array, so I did, and the constructor is a little more elegant as a result.
This constructor isn't a whole lot of use on its own, because we can't instantiate RStorageMember objects. But it becomes useful when we add in the constructor for RStorage . Now is the time not to panic! Study the comments carefully in this next bit:
RStorage::RStorage(RStorage *Parent_,
const char *Name_,
int Depth_)
:
RStorageMember(Parent_, Name_, Depth_)
/* This is an initialiser list with a difference! What is happening here is that
* the parameters passed to the RStorage constructor are being handed on
* to the RStorageMember constructor. When the RStorageMember part of the RStorage
* object is complete, control returns to this constructor, and the function body
* is entered.
*/
{
Storage = NULL;
WCHAR tmpName[MAX_PATH];
/* Structured storage filenames use wide characters. If you use wide characters
* too, you should be able to thread them in quite easily. Here, though, we
* are just using a simple little console app and the normal Windows ANSI
* character set, so we have to convert the name of the file we want to open.
* We do that via a simple call to a WinAPI function.
*/
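/* No cast is needed (or wanted) on the source string: c_str() already yields
* the narrow char string that MultiByteToWideChar() expects.
*/
MultiByteToWideChar(CP_ACP, MB_PRECOMPOSED, Name.c_str(), -1, tmpName, MAX_PATH);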
HRESULT Result;
if(NULL == Parent)
{
/* We want the root. We now call the WinAPI function
* StgOpenStorage() to open the root node. As you can see, we pass
* to this function the address of the Storage pointer. This means
* that the StgOpenStorage() function can point it somewhere
* different and have that change stick when control is returned to us.
* And that's precisely what happens. As a result of this call, our
* pointer is set up correctly to allow us to interface with the underlying
* structured storage object.
*/
Result = StgOpenStorage(tmpName,
NULL,
STGM_READ | STGM_SHARE_EXCLUSIVE,
NULL,
0,
&Storage);
/* If that worked, let's load everything else. */
if(S_OK == Result)
{
LoadChildObjects();
}
}
else
{
/* This is not the root - it is a named child storage of the storage
* that is the parent (well, duh!), so we need to go up to the parent,
* and use its Storage member's OpenStorage() method to open this storage.
* (Think "need to go into subdirectory, so need to be in that
* subdirectory's parent directory".)
*/
Result = Parent->Storage->OpenStorage(tmpName,
NULL,
STGM_READ | STGM_SHARE_EXCLUSIVE,
NULL,
0,
&Storage);
}
if(S_OK != Result)
{
/* Whether this is a root node or not, something went wrong, so we
* need to report it. This code calls the GetReason() method, which
* we haven't written yet. Don't worry about it - it just returns
* an error string (e.g. "Couldn't open file") corresponding
* to the passed-in HRESULT value. (HRESULT is just an integer type.)
*/
string s = "Can't open ";
if(NULL == Parent)
{
s += "root ";
}
s += "storage ";
s += Name;
if(NULL != Parent)
{
s += ", Parent ";
s += Parent->Name;
}
s += ": ";
s += GetReason(Result);
/* Now that we've assembled the string, we can create and throw
* an exception object. Who catches it? Well, who cares? It's Not
* Our Problem. (Okay, okay, main() catches it...)
*/
Exception e("RStorage::RStorage", __LINE__, s.c_str());
throw e;
}
}
That's about it for the RStorage constructor. If we wanted, we could now write stubs for all the functions we haven't yet implemented, and run the program. This would enable us to open a root storage, but that's all. We don't yet have the code in place to read the contents of that root storage.
(Yes, it's true that we also have the code to construct a non-root RStorage correctly. But we can't do anything with that code just yet.)
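The constructor above leans on GetReason(), which isn't shown on this page. As a rough idea of what such a routine might look like, here is a sketch written as a free function. DescribeStorageError is a hypothetical name, the message texts are my own, and the numeric values are the STG_E_* codes as I understand them from the Windows SDK headers, so do check them against your own winerror.h. On Windows you would compare against the named constants instead of these literals.

```cpp
#include <sstream>
#include <string>

// Hypothetical stand-in for GetReason(): turn a storage error code into text.
// The code is taken as an unsigned value so that the hex literals compare
// cleanly; pass static_cast<unsigned long>(Result) for an HRESULT.
std::string DescribeStorageError(unsigned long code)
{
    switch (code)
    {
        case 0x80030002UL: return "File not found";                      // STG_E_FILENOTFOUND
        case 0x80030005UL: return "Access denied";                       // STG_E_ACCESSDENIED
        case 0x80030020UL: return "Sharing violation";                   // STG_E_SHAREVIOLATION
        case 0x80030050UL: return "File exists but is not a storage";    // STG_E_FILEALREADYEXISTS
        default: break;
    }
    /* Anything we don't recognise is reported as a raw hex value */
    std::ostringstream s;
    s << "Unknown error 0x" << std::hex << std::uppercase << code;
    return s.str();
}
```

A member function such as RStorage::GetReason could then simply forward to it. Only a handful of codes are covered here; a fuller version would list many more.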
What is left? Well, we need to supply a constructor for RStream , and we need to find a way to open a stream. Then we need to write some destructors, and then we need to write a reporting routine or three. Okay, let's get on with constructing an RStream object:
RStream::RStream(RStorage *Parent_,
const char *Name_,
int Depth_)
:
RStorageMember(Parent_, Name_, Depth_)
{
/* The StreamStats object (an instance of STATSTG, which is part of the Microsoft
* structured storage interface) will be populated with various bits of
* useful information after the call to Stat(). One of the things it will do
* is populate the pwcsName member with the stream name. It will dynamically
* allocate enough memory to store that name. Later on, we'll have to release
* that memory ourselves. If, for some reason, Stat() fails to allocate
* that memory, it would be foolish of us to try to release it, yes? By
* setting it to NULL, we give ourselves a chance to recover. (In fact, passing
* a null pointer to the releasing routine we will use is a well-defined no-op.)
*/
StreamStats.pwcsName = NULL;
/* OpenStream() is a method in our own RStorage class. We need to pass the
* address of our IStream pointer to that routine, so that it can pass it
* on to the appropriate Microsoft interface routine. We can't call the
* Microsoft routine here because we would need access to a non-public member
* of the RStorage class, which is obviously not going to happen; so we
* meekly pass over the address of our Stream member instead.
*/
HRESULT Result = Parent->OpenStream(&Stream, Name);
if(S_OK != Result)
{
string s = "Stream couldn't be opened. Reason: ";
s += GetReason(Result);
Exception e("RStream::RStream", __LINE__, s.c_str());
throw e;
}
/* Stat(), which is available through Microsoft's IStream interface, populates
* a STATSTG structure with various snippets of useful information - notably
* the number of bytes of data managed by the IStream. (Yes, that's a bit vital.)
*/
Result = Stream->Stat(&StreamStats, STATFLAG_DEFAULT);
if(S_OK != Result)
{
string s = "No stats for stream. Reason: ";
s += GetReason(Result);
Exception e("RStream::RStream", __LINE__, s.c_str());
throw e;
}
}
Now that we've got the RStream constructor, we're not far short of a working program. Let's take a look at the source for the RStorage::OpenStream method we just called. As you can see, for our meagre purposes we are not bothering with wide characters, but the Microsoft APIs rather irritatingly insist on using them, so we have to convert. Other than that, all we have to do is pass the IStream ** (the one handed to us by RStream::RStream) straight through to the IStorage::OpenStream method. (IStorage::OpenStream() is analogous to fopen().)
HRESULT RStorage::OpenStream(IStream **str, string &Name_)
{
if(NULL == Storage)
{
Exception e("RStorage::OpenStream", __LINE__, "Null Storage");
throw e;
}
WCHAR tmpName[MAX_PATH];
MultiByteToWideChar(CP_ACP,
MB_PRECOMPOSED,
Name_.c_str(), /* already a narrow string, so no cast */
-1,
tmpName,
MAX_PATH);
return Storage->OpenStream(tmpName,
NULL,
STGM_READ | STGM_SHARE_EXCLUSIVE,
0,
str);
}
I was going to deal with the destructors next, but it is probably wiser to
conclude the discussion on building up the picture before we start tearing it
down again. :-)
This next routine populates the Child vector of an RStorage object. It starts off by asking the storage for an enumerator over its child objects (both storages and streams). Hence the IEnumSTATSTG pointer. I don't think you have to worry about this too much - just think of it as a magic cookie that we need to capture so that we can use its Next method to iterate through the child objects. As we uncover each child object, we dynamically allocate some memory for an RStorage (or RStream ) shadow of it, and add that pointer into our Child vector. If it happens to be a storage that we've uncovered, we then recurse into it, telling it to load its own children. Finally, we release our magic cookie from its jar.
int RStorage::LoadChildObjects()
{
int Children = 0;
IEnumSTATSTG *estg;
if(NULL == Storage)
{
Exception e("RStorage::LoadChildObjects",
__LINE__,
"Storage is NULL");
throw e;
}
/* get an enumerator for the child objects */
if(S_OK != Storage->EnumElements(0, NULL, 0, &estg))
{
Exception e("RStorage::LoadChildObjects",
__LINE__,
"EnumElements failed");
throw e;
}
STATSTG stgstruct = {0};
unsigned long Fetched = 0;
/* iterate through the child objects */
while(S_OK == estg->Next(1, &stgstruct, &Fetched))
{
char *c = new char[MAX_PATH];
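WideCharToMultiByte(CP_ACP,
0,
stgstruct.pwcsName,
-1,
c,
MAX_PATH,
NULL,
NULL);
/* Next() allocated the name on our behalf. Now that we have our own
* copy, we must free the original, or we leak it on every iteration.
*/
CoTaskMemFree(stgstruct.pwcsName);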
/* And this is where we use the fact that an RStream is an RStorageMember
* and an RStorage is an RStorageMember, so that we can stick them all into
* the same container. Later on, you'll see how powerful this is (when we
* come to reporting).
*/
if(STGTY_STREAM == stgstruct.type)
{
/* Create a new stream object, and stick it at the end of the Child vector */
RStream *newStream = new RStream(this, c, Depth + 1);
Child.push_back(newStream);
}
else if(STGTY_STORAGE == stgstruct.type)
{
/* Create a new storage object, and stick it at the end of the Child vector */
RStorage *newStorage = new RStorage(this, c, Depth + 1);
Child.push_back(newStorage);
/* Recursively load this storage's children */
newStorage->LoadChildObjects();
}
delete [] c;
++Children;
}
/* Open the jar and let the magic cookie fly away */
estg->Release();
return Children;
}
We're nearly done. But just before we get onto the interesting bit - the
reporting - we ought to tidy up a loose end or three. It's time to write some
destructors.
The first of these is trivial:
RStorageMember::~RStorageMember()
{
}
This destructor has to exist (because otherwise the derived class destructors
don't work properly), but it doesn't actually have to do anything.
The RStream destructor is a tiny bit more involved. You will recall that we have a pointer that we need to release. To do this, we have to use a special COM routine:
RStream::~RStream()
{
CoTaskMemFree(StreamStats.pwcsName);
Stream->Release();
}
Again, from the point of view of getting the data we want, this is just
overhead. I don't plan to explain it to you. Just be sure to incorporate it
into your program.
The RStorage destructor is still more involved. We don't just have the object itself to worry about; we must also remember to delete all its child objects.
To do this, we could use an iterator, but it's just as easy to use a loop counter in this case:
RStorage::~RStorage()
{
size_t i = Child.size();
while(i-- > 0)
{
delete Child[i];
Child.pop_back();
}
Storage->Release();
}
As you can see, we just take each child in turn, blow it away with delete (thus invoking its destructor), and then remove the dead pointer from the vector.
We're nearly done. The next bit is cool. It's the actual reading and display of the data.
The first thing we'll want is an indenting routine. Here it is:
void RStorageMember::Tab(ostream &o)
{
int i;
for(i = 0; i < Depth * TabSize; i++)
{
o << ' ';
}
}
This is pretty straightforward. It uses the Depth member of the base class to establish how deeply this WhateverItIs is nested within the hierarchy, and prints out TabSize spaces for each level, on the ostream object passed to it (probably cout).
Here's a hex dumper we can use for the actual data:
void RStream::HexDump(ostream &o, unsigned char *base, size_t size)
{
DWORD i;
string Literal = "";
char Hex[] = "0123456789ABCDEF";
for(i = 0; i < size; i++)
{
if(0 == i % 16)
{
if(0 != i)
{
o << "| " << Literal;
Literal = "";
o << endl;
}
Tab(o);
}
if(isprint(base[i]))
{
Literal += base[i];
}
else
{
Literal += '.';
}
o << Hex[(base[i] & 0xF0) >> 4];
o << Hex[(base[i] & 0x0F)];
o << ' ';
}
i %= 16;
while(i % 16 != 0)
{
o << "   "; /* three spaces: each byte occupies two hex digits plus a space */
i++;
}
o << "| " << Literal << endl << endl;
}
We can use this routine in a generalised reporting facility for RStream , which I'll show you now. Please note that it's in two parts (which probably indicates a sloppy design, but never mind that for now) - first, it gets the information out of the IStream interface, and then it displays various bits of it. Feel free to remove any report bits that you don't care about, or even add more, but don't mess with the code that actually retrieves the data, or you're not going to see much output.
void RStream::Report(ostream &o)
{
const char *t[] =
{
"Invalid",
"Storage",
"Stream",
"LockBytes",
"Property"
};
char *c = new char[MAX_PATH];
WideCharToMultiByte(CP_ACP,
0,
StreamStats.pwcsName,
-1,
c,
MAX_PATH,
NULL,
NULL);
Tab(o);
o << "Stream: " << Name << endl;
++Depth;
Tab(o);
o << "pwcsName = " << c << endl;
delete [] c; /* we only needed this so that we could display the name */
Tab(o);
o << "type = " << t[StreamStats.type] << endl;
Tab(o);
o << "cbSize = " << StreamStats.cbSize.LowPart << endl;
HRESULT Result;
LARGE_INTEGER li = {0};
/* Seek() is analogous to fseek() */
Result = Stream->Seek(li, STREAM_SEEK_SET, NULL);
if(S_OK != Result)
{
string s = "Couldn't seek on stream. Reason: ";
s += GetReason(Result);
Exception e("RStream::Report", __LINE__, s.c_str());
throw e;
}
unsigned char *p = new unsigned char [StreamStats.cbSize.LowPart];
ULONG BytesRead = 0;
/* Read() is analogous to fread() */
Result = Stream->Read(p, StreamStats.cbSize.LowPart, &BytesRead);
if(S_OK != Result)
{
delete [] p;
string s = "Couldn't read from stream. Reason: ";
s += GetReason(Result);
Exception e("RStream::Report", __LINE__, s.c_str());
throw e;
}
HexDump(o, p, BytesRead);
delete [] p;
--Depth;
}
In contrast, reporting a storage is trivial. Having said that, if you haven't used an STL iterator before, you might learn something:
void RStorage::Report(ostream &o)
{
Tab(o);
o << "Storage: " << Name << endl;
vector <RStorageMember *> ::const_iterator it;
for(it = Child.begin(); it != Child.end(); ++it)
{
(*it)->Report(o);
}
}
As you can see, the code to display the storage name itself is trivial. Of slightly more interest is the iteration through the vector. Child.begin() returns an iterator value that indicates the first item in the vector. Child.end() returns an iterator value that is just past the last item in the vector. *it yields the object stored in the vector element that the iterator is tracking through (so first time round it returns the first object, second time through it returns the second, and so on), and the neat thing is that we can pretend that *it actually is the object, so in this case we can treat *it as an RStorageMember pointer, and dereference it accordingly.
All the elements of the vector are pointers to RStorageMember objects. RStorageMember declares a Report method, which we call through each pointer. Since it's a virtual function, the call is dispatched to the override in the derived class (and since it's pure virtual, each derived class has to supply one). Thus, if the object in question is really an RStream , the RStream::Report method will be called, whereas if it's really an RStorage , the RStorage::Report method will be called instead (in which case we have recursion, but that's okay because it turns out that we do actually need to recurse into child storages).
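If the iterator loop still looks odd, here is the same loop shape on something simpler: a sketch that adds up a vector of ints. Sum is a hypothetical function, not part of the wrapper.

```cpp
#include <vector>

// Add up a vector of ints using the same begin()/end() loop shape as
// RStorage::Report.
int Sum(const std::vector<int> &values)
{
    int total = 0;
    std::vector<int>::const_iterator it;
    for (it = values.begin(); it != values.end(); ++it)
    {
        total += *it;   // *it is the element the iterator currently refers to
    }
    return total;
}
```

For an empty vector, begin() equals end(), so the loop body never runs and the result is 0.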
Sorry to bang on about inheritance even though you may well know all about it, but like I said, I know how frustrating it can be to read this stuff and think "what on earth is he talking about?"
Now, if you'd like a copy of the complete program (including any bits I've missed), you can find the source right here . Unfortunately, it doesn't have all those lovely chunky comments that I've included in this HTML version, but I doubt whether your compiler will mind about that too much.
Final disclaimer: This was not intended to be a lesson in C++, but a lesson in how to use C++ to talk to Micros~1 structured storage files. Since I've completely and utterly refused to handle Unicode properly, you may well find that some files (e.g. Word docs) come out with rather strange stream and storage names. Well, that's fixable. Petzold should sort you out with Unicode (Chapter 1, I believe!).