[vtk-developers] VTK XML File Formats

John Shalf jshalf at lbl.gov
Mon Dec 30 22:41:03 EST 2002


Hi Michael,
this sounds a lot like you would like to recreate the NetCDF or HDF 
file formats.  Both have robust file type definitions for binary data.  
In particular, HDF5 has very flexible type definitions, automatic
translation between many machine-specific formats, and even a robust 
XML representation for its type schema.
	http://hdf.ncsa.uiuc.edu/
	http://hdf.ncsa.uiuc.edu/HDF5/XML/
I think it would be very straightforward to do a 1-to-1 bidirectional
mapping between Brad's XML description of data and HDF5.  HDF5 also
supports parallel I/O, so the mapping would work well for both serial
and parallel I/O.

From the standpoint of encoding binary data using XDR, I think that
subject has been covered by the NetCDF and HDF people in the past.  The
original file formats in both cases were based on XDR encoding, but
over time it became clear that the XDR encoding mechanism is
extraordinarily inefficient.  Profiling tools showed that it was indeed
the rate-limiting step for file read/write performance.  HDF created
its own lighter-weight machine-independent binary encoding scheme that
improved performance.  HDF5 offers even more flexibility by allowing
you to write data in machine-native format and translate on read (or
the reverse: translate on write).  Rather than paying the translation
cost twice, once to encode into the platform-neutral format and once
to decode out of it, HDF5 lets you choose when to take the
type-translation hit.  It also supports complex datatypes (tensors, or
more heterogeneous structures).
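The translate-on-read idea can be sketched with Python's standard struct module. This is a toy, not the HDF5 API: the container layout (a one-byte byte-order tag, a 4-byte count, then float64 values) is a hypothetical format invented for illustration. The writer stores values in its own native byte order with no conversion; the reader converts from whatever order the tag records.

```python
import struct
import sys

# Hypothetical layout (not HDF5's): 1-byte order tag, uint32 count, float64s.
TAG = {"little": b"<", "big": b">"}


def write_native(values):
    """Write doubles in the writer's native byte order, with no conversion."""
    order = TAG[sys.byteorder]
    fmt = order.decode() + "I%dd" % len(values)
    return order + struct.pack(fmt, len(values), *values)


def read_translating(blob):
    """Read doubles, interpreting them in the byte order the writer recorded."""
    order = blob[:1].decode()  # '<' or '>' as stored by the writer
    (count,) = struct.unpack_from(order + "I", blob, 1)
    return list(struct.unpack_from(order + "%dd" % count, blob, 5))
```

When the reader's host order matches the tag, no bytes need reordering, so only mismatched readers pay for the translation.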

Anyway, one tack you could consider is to use HDF5 or NetCDF as your
platform-neutral binary format.  Both are very amenable to XML
representations, both have been around for many years, and both are
very familiar to the scientific community.

-john

On Saturday, December 28, 2002, at 08:43 AM, Michael Halle wrote:

> First, let me say thanks for all the work Brad's done on the XML file
> formats, and to Kitware for making it possible.  Thanks!
>
> I've looked over the documentation some, and here are a couple of
> comments I have, mostly about data types and portability.
>
> My sense is that the introduction of the new file formats is the time
> to add some better options for portable interchange.  There are two
> barriers to that goal at this point: portable data and portable
> typing.  Right now, we don't have either in the binary case and only
> portable data in the ascii case.  This is a somewhat difficult problem,
> but it really should get fixed.
>
> Let's deal with typing first.  Once a file is committed to disk, what
> an "unsigned char," or worse an "unsigned long," means (in the C
> sense) is undefined: we don't know how big it is.  That piece of
> information is important in the file, which in general is an
> out-of-memory representation.  And it's all because K&R decided not to
> provide standard constructs or typedefs for fixed-size types, a fairly
> shocking omission when you look back on it.
>
> There's now a reasonable standard for type names and definitions:  the
> xmlschema schema.  Here's the link to the recommendation:
> http://www.w3.org/TR/xmlschema-2/
>
> In this standard, int is 32 bit signed, unsignedInt is 32 bit
> unsigned, long is 64 bit signed, unsignedLong is 64 bit unsigned, byte
> is 8 bit signed, unsignedByte is 8 bit unsigned.  New types (simple or
> complex) can be defined later.  The schema itself provides exact
> definitions for the text representations of the types, but I see no
> reason not to standardize on the type names for both text *and* binary
> representations of types in files.
>
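One way to see what fixed type names buy is to map them to Python struct codes with the '<' prefix, which forces standard sizes instead of whatever the host C compiler uses. The mapping below is an illustrative assumption, not part of VTK; short, unsignedShort, float and double come from the same XML Schema recommendation but go beyond the list above.

```python
import struct

# xmlschema type names mapped to struct format codes. With the '<' prefix,
# struct uses standard sizes, so widths do not depend on the platform.
XSD_TYPES = {
    "byte": "b", "unsignedByte": "B",
    "short": "h", "unsignedShort": "H",
    "int": "i", "unsignedInt": "I",
    "long": "q", "unsignedLong": "Q",
    "float": "f", "double": "d",
}


def xsd_size(name):
    """Size in bytes of an xmlschema type, independent of the host."""
    return struct.calcsize("<" + XSD_TYPES[name])
```

By contrast, struct.calcsize("@l") reports the host's native C long, which is 4 bytes on some platforms and 8 on others: exactly the ambiguity described above.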
> Readers should be responsible for converting external types into
> internal ones.  Similarly, writers should do the encoding, perhaps
> intelligently.  If you have only five Id references, you don't need a
> 64 bit type for them just because your internal IdType is 64 bit.  In
> the worst case, the encoding isn't general, and we're no worse off
> (actually still better off) than we are now.  For most machines in the
> world, the type encoding is essentially trivial.
>
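The "intelligent writer" idea can be sketched as choosing the narrowest unsigned xmlschema type that holds every Id, with the reader widening values back to its internal integer type. The function names and the little-endian choice are hypothetical, for illustration only.

```python
import struct

# Unsigned xmlschema types, narrowest first: (name, struct code, bit width).
UNSIGNED_XSD = [("unsignedByte", "B", 8), ("unsignedShort", "H", 16),
                ("unsignedInt", "I", 32), ("unsignedLong", "Q", 64)]


def narrowest_unsigned(ids):
    """Pick the smallest unsigned xmlschema type that can hold every id."""
    if any(i < 0 for i in ids):
        raise ValueError("ids must be non-negative")
    largest = max(ids, default=0)
    for name, code, bits in UNSIGNED_XSD:
        if largest < (1 << bits):
            return name, code
    raise OverflowError("an id does not fit in 64 bits")


def encode_ids(ids):
    """Writer side: store ids in the narrowest type (little-endian, assumed)."""
    name, code = narrowest_unsigned(ids)
    return name, struct.pack("<%d%s" % (len(ids), code), *ids)


def decode_ids(name, raw):
    """Reader side: widen stored ids back to ordinary (internal) ints."""
    code = {n: c for n, c, _ in UNSIGNED_XSD}[name]
    count = len(raw) // struct.calcsize("<" + code)
    return list(struct.unpack("<%d%s" % (count, code), raw))
```

The type name written alongside the data (for example as an XML attribute) is what lets any reader decode it without knowing the writer's IdType.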
> This brings us to data portability.  Getting the types right takes care
> of some of the problems of data portability, but not all of them.  In
> the binary representation, we still need to worry about big/little
> endian and floating point representations.  In the XML format,
> machine-specific data is being encoded as base64.  (An aside: VTK calls
> this format "binary"; the xmlschema name is "base64Binary", which makes
> more sense to me.  I'd suggest changing the VTK name.  Similarly,
> "ascii" doesn't sound right to me either; there must be another
> standard name for text format.  "text"?)
>
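A short illustration of why base64 alone does not make data portable: base64 only makes raw bytes safe to embed in XML text, so the byte order of the packed values survives encoding unchanged. The helper names below are hypothetical.

```python
import base64
import struct


def to_base64_binary(values, order):
    """Pack doubles in byte order '<' or '>' and base64-encode the bytes."""
    raw = struct.pack(order + "%dd" % len(values), *values)
    return base64.b64encode(raw).decode("ascii")


def from_base64_binary(text, order):
    """Decode base64 text and unpack doubles assuming the given byte order."""
    raw = base64.b64decode(text)
    return list(struct.unpack(order + "%dd" % (len(raw) // 8), raw))
```

Text written from a little-endian machine and decoded with big-endian assumptions yields wrong numbers, so the file must also record (or fix) the byte order.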
> Since base64 encoding/decoding is already eating cycles, why not just
> use Sun's XDR representation (big endian, IEEE floats/doubles), at
> least as an option for writing to external files?  There's an RFC,
> several mature open implementations, and most vendors implement it
> natively as part of their NFS support.  As I see it, this solves the
> entire problem of portable reading/writing.  I believe it should at
> least be an option to output datatypes in XDR format.  If you want a
> bitwise-exact representation of your in-memory data, you could add a
> NonportableOn() switch to your writer (which would set an attribute in
> the file to signal a non-portable encoding).  For the huge majority of
> applications, any change in representation is minor or nonexistent.
>
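The XDR conventions mentioned above (big-endian, IEEE 754) can be sketched with Python's struct module, whose '>' prefix gives big-endian order, standard sizes and IEEE 754 floats regardless of the host. The helper names are hypothetical; a C implementation would normally use the xdr_int/xdr_double routines from the Sun RPC library instead.

```python
import struct

# XDR (RFC 4506) basic types: big-endian integers and IEEE 754 floats.


def xdr_int(v):
    return struct.pack(">i", v)  # 32-bit signed integer


def xdr_hyper(v):
    return struct.pack(">q", v)  # 64-bit signed "hyper" integer


def xdr_double(v):
    return struct.pack(">d", v)  # IEEE 754 binary64


def xdr_double_array(values):
    """Variable-length XDR array of doubles: unsigned count, then elements."""
    return struct.pack(">I%dd" % len(values), len(values), *values)


def xdr_read_double_array(buf):
    """Decode an array written by xdr_double_array on any host."""
    (n,) = struct.unpack_from(">I", buf, 0)
    return list(struct.unpack_from(">%dd" % n, buf, 4))
```

Because both sides always use the same byte layout, a file written this way reads back identically on little- and big-endian machines; the cost is a conversion on every little-endian host, the overhead John's reply warns about.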
> (One implementation note: XDR pads all messages out to four-byte
> boundaries, which in general makes sense.  You'd have to be a little
> careful when encoding/decoding.)
>
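The padding rule is where encoders and decoders most often disagree. A minimal sketch for XDR variable-length opaque data (4-byte length, the bytes, then zero padding to a multiple of four), with hypothetical helper names:

```python
import struct


def xdr_pad_len(n):
    """Number of zero bytes XDR appends after n bytes of opaque data."""
    return (4 - n % 4) % 4


def xdr_opaque(data):
    """Encode variable-length opaque data: length, bytes, zero padding."""
    return struct.pack(">I", len(data)) + data + b"\x00" * xdr_pad_len(len(data))


def xdr_read_opaque(buf, offset=0):
    """Return (data, next_offset); the reader must skip the padding too."""
    (n,) = struct.unpack_from(">I", buf, offset)
    start = offset + 4
    data = bytes(buf[start:start + n])
    return data, start + n + xdr_pad_len(n)
```

Forgetting to skip the padding on read misaligns every item that follows, which is the kind of care the note above calls for.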
> With these relatively minor changes in types and encoding, 99% of the
> XML VTK files in the world would be portable, and that's good.
>
> I'd be happy to provide any other input on this topic I can.
>
> --Mike
>
> Michael Halle
> mhalle at bwh.harvard.edu



