[vtkusers] parallel mesh reading and writing

John Biddiscombe biddisco at cscs.ch
Wed Jan 24 04:09:14 EST 2007


Karl,

I've been writing particle data from VTK using HDF5 in parallel and I
would highly recommend HDF5 as a starting point for meshes. Here are two
reasons (but keep reading for objections later):
1) HDF5 is going to be supported during our lifetimes, and probably
longer than that. The fact that the creators supply an API which enables
anyone to easily query and extract the data makes it a winner (much like
netCDF in the past).
2) Writing in parallel can produce a single HDF5 file (if you want)
rather than a collection of files indexed by processor. This makes
visualization of the data afterwards an order of magnitude easier when
the number of processors used for visualization differs from the number
used for generation. (VTK's XML collection format does allow you to do
this too, sometimes at the expense of some wasted memory, as points or
cells might be duplicated.)
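
To give a feel for point 2, the core of a parallel write into one shared
file looks roughly like this. This is only a minimal sketch using the
1.6-era HDF5 C API with the MPI-IO driver; the file name, the "Points"
dataset name, and the nLocal/nTotal/offset bookkeeping are made up for
illustration:

    #include <hdf5.h>
    #include <mpi.h>

    // Every rank writes its own slab of an nTotal x 3 particle array
    // into a SINGLE shared file - no per-processor files to stitch up
    void WriteParticles(double *xyz, hsize_t nLocal,
                        hsize_t nTotal, hsize_t offset)
    {
      // open the file collectively using the MPI-IO virtual file driver
      hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
      H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL);
      hid_t file = H5Fcreate("particles.h5", H5F_ACC_TRUNC,
                             H5P_DEFAULT, fapl);

      // one dataset sized for the points of ALL ranks
      hsize_t fdims[2] = { nTotal, 3 };
      hid_t fspace = H5Screate_simple(2, fdims, NULL);
      hid_t dset = H5Dcreate(file, "Points", H5T_NATIVE_DOUBLE,
                             fspace, H5P_DEFAULT);

      // select this rank's hyperslab in the file ...
      hsize_t start[2] = { offset, 0 }, count[2] = { nLocal, 3 };
      H5Sselect_hyperslab(fspace, H5S_SELECT_SET, start, NULL, count, NULL);
      // ... and a matching memory space describing the local buffer
      hid_t mspace = H5Screate_simple(2, count, NULL);

      // collective write - each rank lands in its own slab of the file
      hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
      H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_COLLECTIVE);
      H5Dwrite(dset, H5T_NATIVE_DOUBLE, mspace, fspace, dxpl, xyz);

      H5Pclose(dxpl); H5Sclose(mspace); H5Sclose(fspace);
      H5Dclose(dset); H5Fclose(file); H5Pclose(fapl);
    }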

The VTK XML-based multiblock/multifile collection format is very good,
and there's a lot to be said for using it. You can get a program up and
running in a day which will write N files out from N processes and a
single vtm file from process zero which references all the blocks.
ParaView will then be able to read the data in directly, and the amount
of work required from you will be minimal.

However, if you are going to do the job properly, you'll want to worry
about points being duplicated between processes and about having a single
master index of points, cells and the rest. What tends to happen with the
VTK XML collection files is that, unless you are careful, points may be
present in multiple blocks and be written out from multiple machines,
causing duplication. ParaView itself is pretty bad in this respect when
you extract datasets from multiblock data and write them out - you end
up with all the points from the entire collection of datasets being
written, when in fact only a small subset of them is used by the block
you are writing.

If you have distinct meshes/blocks which are present on each processor,
and they are completely self-contained and don't overlap significantly
with the meshes on neighbouring processors, then you will be fine just
writing an XML vtu file from each processor and then using a piece of
code like this (polydata example) from process zero. It simply creates
(one for each timestep) a vtm (multiblock) collection text file (XML)
which lists the blocks from all processors. This can then be loaded into
ParaView and all the blocks will be picked up correctly. For
unstructured data, simply use a vtkXMLUnstructuredGridWriter instead of
a vtkXMLPolyDataWriter on each node and change the *.vtp below into *.vtu.
    // each process builds the name of its own piece; process zero also
    // writes the vtm collection file that references all the pieces
    char name[256], name2[256], name3[256];
    sprintf(name, "/scratch/biddisco/results/b%02i/Particles.%04i.vtp",
      this->UpdatePiece, this->ActualTimeStep);
    if (this->UpdatePiece==0) {
      sprintf(name2, "/scratch/biddisco/results/Particles.%04i.vtm",
        this->ActualTimeStep);
      vtkstd::ofstream mainfile(name2);
      mainfile << "<?xml version=\"1.0\"?>" << vtkstd::endl;
      mainfile << "<VTKFile type=\"vtkMultiGroupDataSet\" version=\"0.1\" "
                  "byte_order=\"BigEndian\" "
                  "compressor=\"vtkZLibDataCompressor\">" << vtkstd::endl;
      mainfile << "  <vtkMultiGroupDataSet>" << vtkstd::endl;
      // one DataSet entry per piece, pointing at that process's file
      for (int p=0; p<this->UpdateNumPieces; p++) {
        sprintf(name3, "b%02i/Particles.%04i.vtp", p, this->ActualTimeStep);
        mainfile << "    <DataSet group=\"0\" dataset=\"" << p
                 << "\" file=\"" << name3 << "\"/>" << vtkstd::endl;
      }
      mainfile << "  </vtkMultiGroupDataSet>" << vtkstd::endl;
      mainfile << "</VTKFile>" << vtkstd::endl;
    }
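
For completeness, the matching per-node write that produces each of the
b%02i/Particles.%04i.vtp pieces is just the ordinary XML writer - a
sketch, assuming "polydata" is the local piece your pipeline produced
and "name" is the file name built above:

    vtkXMLPolyDataWriter *writer = vtkXMLPolyDataWriter::New();
    writer->SetFileName(name);        // b%02i/Particles.%04i.vtp
    writer->SetInput(polydata);       // this node's local piece
    writer->SetDataModeToAppended();  // compact binary appended layout
    writer->Write();
    writer->Delete();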

If all your processing is to be done using ParaView or VTK-like
applications, then this procedure will get you a long way and you'll
have it working before tonight. If you need more advanced processing in
other applications, then HDF5 might be the choice. For any VTK data type
I would do this: create one dataset for points, one for cells holding
the point ids, one for the cell types, etc. If the meshes are all
distinct and not overlapping/duplicating, it will be enough to just
gather the indexes from each process, create an HDF5 memory space on
each node and stuff the data into the file (I'm assuming you are already
familiar with HDF5 memory and data spaces etc). This would be analogous
to the case above where the simulation domain is broken into distinct
zones or blocks and each can be processed individually. You may end up
with some boundary cells duplicated, but you could easily arrange for it
all to go nicely into a single HDF5 file.
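
Working out where each process's slab lands in those shared datasets is
just a prefix sum of the local counts; roughly this (a sketch with
illustrative variable names - nLocalPoints, nLocalIds, localIds and
globalIds are assumptions, not VTK members):

    #include <mpi.h>

    long long nLocal = nLocalPoints;   // points owned by this rank
    long long offset = 0;
    // inclusive prefix sum, then subtract self = points on ranks below me
    MPI_Scan(&nLocal, &offset, 1, MPI_LONG_LONG, MPI_SUM, MPI_COMM_WORLD);
    offset -= nLocal;

    // the cell connectivity dataset is written the same way, except each
    // local point id must be shifted to index into the global Points array
    for (long long i = 0; i < nLocalIds; i++)
      globalIds[i] = localIds[i] + offset;
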
If the meshes are more complex in the way they overlap/interact, then we
may have trouble - because then you'll want to gather all the points to
one process, do some sort of MergePoints operation to remove duplicates,
then cross-reference all the cell point indexes etc - this could take a
lot of memory and time and not be worth the effort (as an aside,
vtkKdTree probably does much of this already).
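
If you do end up needing the merge, vtkMergePoints gives you the
duplicate removal and the old-to-new id map fairly cheaply - a rough
sketch, assuming "allPoints" holds the points already gathered onto one
process:

    // merge gathered points, recording an old->new id map so the
    // cell connectivity can be re-indexed through it afterwards
    vtkPoints *merged = vtkPoints::New();
    vtkMergePoints *locator = vtkMergePoints::New();
    locator->InitPointInsertion(merged, allPoints->GetBounds());
    vtkIdType *oldToNew = new vtkIdType[allPoints->GetNumberOfPoints()];
    for (vtkIdType i = 0; i < allPoints->GetNumberOfPoints(); i++) {
      vtkIdType newId;
      locator->InsertUniquePoint(allPoints->GetPoint(i), newId);
      oldToNew[i] = newId;
    }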

I started this email with the intention of recommending HDF5 - but I'm
ending it with VTK's XML collection as my preferred choice, as it will be
less work for the complex cases - and the readers and writers already
exist. If your data is already in VTK form, then the XML route will save
you a lot of work.

I welcome input from anyone else considering putting meshes into HDF5.

JB

-- 
John Biddiscombe,                            email:biddisco @ cscs.ch
http://www.cscs.ch/about/BJohn.php
CSCS, Swiss National Supercomputing Centre  | Tel:  +41 (91) 610.82.07
Via Cantonale, 6928 Manno, Switzerland      | Fax:  +41 (91) 610.82.82




