[vtk-developers] VTK XML File Formats

Michael Halle mhalle at bwh.harvard.edu
Sat Dec 28 11:43:50 EST 2002


First, let me say thanks for all the work Brad's done on the XML file
formats, and to Kitware for making it possible.  Thanks!

I've looked over the documentation some, and here are a couple of
comments I have, mostly about data types and portability.

My sense is that the introduction of the new file formats is the time
to add some better options for portable interchange.  There are two
barriers to that goal at this point: portable data and portable
typing.  Right now, we don't have either in the binary case and only
portable data in the ascii case.  This is a somewhat difficult problem,
but it really should get fixed.

Let's deal with typing first.  Once a file is committed to disk, what an
"unsigned char," or worse an "unsigned long," means (in the C sense) is
undefined: we don't know how big it is.  That piece of information is
important in the file, which in general is an out-of-memory
representation.  And it's all because K&R decided not to provide
standard constructs or typedefs for fixed size types, a fairly
shocking omission when you look back on it.
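
To make the size problem concrete, here is a minimal sketch in Python
(used only for illustration; nothing here is VTK code).  It compares
the native size of a C "unsigned long" on whatever machine runs it
with the fixed "standard" size that Python's struct module defines.
The variable names are my own.

```python
import struct

# "@" asks for the native C size of "unsigned long" on this machine,
# which depends on the compiler and platform.
native_ulong_size = struct.calcsize("@L")

# "=" asks for Python's platform-independent "standard" size instead.
standard_ulong_size = struct.calcsize("=L")

print("native unsigned long:", native_ulong_size, "bytes")
print("standard unsigned long:", standard_ulong_size, "bytes")
```

The native figure can differ from one platform to the next, which is
exactly the information a file needs to record.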

There's now a reasonable standard for type names and definitions:  the
xmlschema schema.  Here's the link to the recommendation:
http://www.w3.org/TR/xmlschema-2/

In this standard, int is 32 bit signed, unsignedInt is 32 bit
unsigned, long is 64 bit signed, unsignedLong is 64 bit unsigned, byte
is 8 bit signed, unsignedByte is 8 bits unsigned.  New types (simple or
complex) can be defined later.  The schema itself provides exact
definitions for the text representations of the types, but I see no
reason not to standardize on the type names for both text *and* binary
representations of types in files.
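
As a rough illustration of what standardizing on these names could
look like, the sketch below maps the xmlschema type names listed above
to fixed byte widths using Python's struct format codes.  The table
XSD_TYPES and the helper size_in_bytes are hypothetical names of mine,
not part of VTK or of the schema.

```python
import struct

# Hypothetical table: xmlschema type name -> struct format code.
XSD_TYPES = {
    "byte": "b",           # 8 bit signed
    "unsignedByte": "B",   # 8 bit unsigned
    "int": "i",            # 32 bit signed
    "unsignedInt": "I",    # 32 bit unsigned
    "long": "q",           # 64 bit signed
    "unsignedLong": "Q",   # 64 bit unsigned
}

def size_in_bytes(xsd_name):
    # The "=" prefix selects standard sizes, independent of platform.
    return struct.calcsize("=" + XSD_TYPES[xsd_name])

for name in XSD_TYPES:
    print(name, size_in_bytes(name))
```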

Readers should be responsible for converting external types into internal
ones.  Similarly, writers should do the encoding, perhaps
intelligently.  If you have only five Id references, you don't need to
have a 64 bit type for them just because your internal IdType is 64 bit.
In the worst case, the encoding isn't general, and we're no
worse off (actually still better off) than we are now.  For most
machines in the world, the type encoding is essentially trivial.
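
One way a writer might encode "intelligently" is to pick the narrowest
unsigned type that holds every value it is about to write.  This is
only a sketch under my own assumptions: the table UNSIGNED_TYPES and
the function narrowest_unsigned_type are hypothetical, and it uses
only the unsigned xmlschema types named earlier.

```python
# Hypothetical limits for unsigned xmlschema types, narrowest first.
UNSIGNED_TYPES = [
    ("unsignedByte", 2**8 - 1),
    ("unsignedInt", 2**32 - 1),
    ("unsignedLong", 2**64 - 1),
]

def narrowest_unsigned_type(values):
    """Return the name of the smallest unsigned type that fits all values."""
    values = list(values)
    if any(v < 0 for v in values):
        raise ValueError("negative values need a signed type")
    largest = max(values, default=0)
    for name, limit in UNSIGNED_TYPES:
        if largest <= limit:
            return name
    raise ValueError("value does not fit in 64 bits")

# Five small Id references do not need a 64 bit type on disk.
print(narrowest_unsigned_type([0, 1, 2, 3, 4]))
```

The reader then widens the values back to whatever its internal IdType
happens to be.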

This brings us to data portability.  Getting the types right takes care
of some of the problems of data portability, but not all of them. In the
binary representation, we still need to worry about big/little endian
and floating point representations.  In the XML format,
machine-specific data is being encoded as base64.  (An aside: VTK calls
this format "binary"; the xmlschema name is "base64Binary",
which makes more sense to me.  I'd suggest changing the VTK name.
Similarly, "ascii" doesn't sound right to me either -- there must
be another standard name for text format. "text"?)

Since base64 encoding/decoding is already eating cycles, why not just
use Sun's XDR representation (big endian, IEEE floats/doubles) at
least as an option for writing to external files?  There's an RFC,
several mature open implementations, and most vendors implement it
natively as part of their NFS support.  This solves the entire problem
of portable reading/writing as I see it.  I believe it should at least
be an option to output datatypes in XDR format: if you want a
bitwise-exact representation of your in-memory data, you could set a
NonportableOn() flag on your writer (which would set an attribute in the
file to signal a non-portable encoding).  For the huge majority of
applications, any change in representation is minor or nonexistent.
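
A minimal sketch of the idea, assuming floats are written in the byte
layout XDR uses (big endian, IEEE single precision) and then base64
encoded.  It uses Python's struct and base64 modules rather than a
real XDR library, and the function names are hypothetical.

```python
import base64
import struct

def encode_floats_portable(values):
    # ">" = big endian with standard sizes; "f" = IEEE 754 single precision.
    raw = struct.pack(">%df" % len(values), *values)
    return base64.b64encode(raw).decode("ascii")

def decode_floats_portable(text):
    raw = base64.b64decode(text)
    count = len(raw) // 4  # four bytes per single-precision float
    return list(struct.unpack(">%df" % count, raw))

encoded = encode_floats_portable([1.0, 2.5, -0.25])
print(encoded)
print(decode_floats_portable(encoded))
```

Because the byte order and float format are fixed by the encoding, the
same text decodes to the same values on any machine.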

(One implementation note: XDR pads all messages out to four-byte
boundaries, which in general makes sense.  You'd have to be a little
careful when encoding/decoding.)
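
The padding rule can be sketched as follows.  The helper names are
hypothetical; the point is only that the writer rounds the byte count
up to a multiple of four with zero bytes, and the reader needs the
original length (stored separately) to strip them again.

```python
def xdr_pad(data):
    # Append zero bytes so the length is a multiple of four.
    padding = (-len(data)) % 4
    return data + b"\x00" * padding

def xdr_unpad(padded, original_length):
    # The original length must be recorded alongside the data.
    return padded[:original_length]

payload = b"abcde"               # 5 bytes
padded = xdr_pad(payload)        # 8 bytes after padding
print(len(padded), xdr_unpad(padded, len(payload)))
```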

With these relatively minor changes in types and encoding, 99% of the
XML VTK files in the world would be portable, and that's good.

I'd be happy to provide any other input on this topic I can.

--Mike

Michael Halle
mhalle at bwh.harvard.edu





