[vtkusers] parallel mesh reading and writing
John Biddiscombe
biddisco at cscs.ch
Wed Jan 24 04:09:14 EST 2007
Karl,
I've been writing particle data from VTK using HDF5 in parallel, and I
would highly recommend HDF5 as a starting point for meshes. Here are two
reasons (but keep reading for objections later):
1) HDF5 is going to be supported during our lifetimes and probably
longer than that. The fact that the creators supply an API which enables
anyone to easily query and extract the data makes it a winner (much like
netCDF in the past).
2) Writing in parallel can produce a single HDF5 file (if you want)
rather than a collection of files indexed by processor. This makes
visualization of the data afterwards an order of magnitude easier when
the number of processors used for visualization differs from the number
used for generation. (VTK's XML collection format does allow you to do
this too, at the expense of some wasted memory, since points or cells
might be duplicated.)
The VTK XML based multiblock/multifile collection format is very good,
and there's a lot to be said for using it. You can get a program up and
running in a day which will write N files out from N processes and a
single vtm file from process zero which references all the blocks.
ParaView will then be able to read the data in directly and the amount
of work required from you will be minimal.
However, if you are going to do the job properly you'll want to worry
about points being duplicated between processes and about keeping a
single master index of points, cells and the rest. What tends to happen
with the VTK XML collection files is that, unless you are careful,
points may be present in multiple blocks and be written out from
multiple machines, causing duplication. ParaView itself is pretty bad in
this respect when you extract datasets from multiblock data and write
them out - you end up with all the points from the entire collection of
datasets being written when in fact only a small subset of them is used
by the block you are writing.
If you have distinct meshes/blocks which are present on each processor,
and they are completely self contained and don't overlap significantly
with the meshes on neighbouring processors, then you will be fine just
writing an XML vtu file from each processor and then using a piece of
code like the following (polydata example) from process zero. It simply
creates (one for each timestep) a vtm (multiblock) collection text file
(xml) which lists the blocks written by each processor. This can then be
loaded into ParaView and all the blocks will be picked up correctly. For
unstructured data, simply use a vtkXMLUnstructuredGridWriter (or
whatever it is called) instead of a vtkXMLPolyDataWriter on each node
and change the *.vtp below into *.vtu.
  // name, name2 and name3 are char buffers (declared elsewhere) big enough
  // to hold the file paths. Every rank builds its own piece filename;
  // rank 0 also writes the .vtm index that references all of the pieces.
  sprintf(name, "/scratch/biddisco/results/b%02i/Particles.%04i.vtp",
    this->UpdatePiece, this->ActualTimeStep);
  if (this->UpdatePiece==0) {
    sprintf(name2, "/scratch/biddisco/results/Particles.%04i.vtm",
      this->ActualTimeStep);
    vtkstd::ofstream mainfile(name2);
    mainfile << "<?xml version=\"1.0\"?>" << vtkstd::endl;
    mainfile << "<VTKFile type=\"vtkMultiGroupDataSet\" version=\"0.1\" "
                "byte_order=\"BigEndian\" compressor=\"vtkZLibDataCompressor\">"
             << vtkstd::endl;
    mainfile << "  <vtkMultiGroupDataSet>" << vtkstd::endl;
    for (int p=0; p<this->UpdateNumPieces; p++) {
      sprintf(name3, "b%02i/Particles.%04i.vtp", p, this->ActualTimeStep);
      mainfile << "    <DataSet group=\"0\" dataset=\"" << p
               << "\" file=\"" << name3 << "\"/>" << vtkstd::endl;
    }
    mainfile << "  </vtkMultiGroupDataSet>" << vtkstd::endl;
    mainfile << "</VTKFile>" << vtkstd::endl;
  }
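For completeness, the per-process side that goes with that index is just
a vtkXMLPolyDataWriter call on each rank. A rough sketch (not lifted
from my writer; 'polydata' stands for whatever output your rank
produced, and on newer VTK the SetInput() call becomes SetInputData()):

  #include "vtkXMLPolyDataWriter.h"

  // Each rank writes its own piece to the .vtp path built in 'name' above.
  vtkXMLPolyDataWriter *writer = vtkXMLPolyDataWriter::New();
  writer->SetFileName(name);
  writer->SetInput(polydata);       // SetInputData() on newer VTK
  writer->SetDataModeToAppended();  // optional: compact binary appended data
  writer->Write();
  writer->Delete();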
If all your processing is to be done using ParaView or vtk-like
applications, then this procedure will get you a long way and you'll
have it working before tonight. If you need more advanced processing in
other applications, then HDF5 might be the better choice. For any VTK
data type I would do the following:
Create one dataset for points, one for cells holding the point Ids, one
for the cell types, and so on. If the meshes are all distinct and not
overlapping/duplicating, it will be enough just to gather the indexes
from each process, create an HDF5 memory space on each node, and stuff
the data into the file (I'm assuming you are already familiar with HDF5
memory and data spaces etc). This would be analogous to the case above
where the simulation domain is broken into distinct zones or blocks and
each can be processed individually. You may end up with some boundary
cells duplicated, but you could easily work it so that everything goes
nicely into a single HDF5 file.
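To make that concrete, here is a minimal sketch of the points part,
assuming HDF5 has been built with parallel (MPI-IO) support and MPI is
already initialized. The function and variable names (WritePointsParallel,
localPoints, localNumPts, totalNumPts, pointOffset) are made up for the
example - pointOffset would come from gathering the per-rank point counts -
and cells and cell types would be written the same way into their own
datasets:

  #include <hdf5.h>
  #include <mpi.h>

  void WritePointsParallel(MPI_Comm comm, const double *localPoints,
                           hsize_t localNumPts, hsize_t totalNumPts,
                           hsize_t pointOffset)
  {
    // Open one shared file collectively using the MPI-IO file driver.
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, comm, MPI_INFO_NULL);
    hid_t file = H5Fcreate("Particles.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);
    H5Pclose(fapl);

    // One dataset holds the points from all processes: totalNumPts x 3.
    // (with the old 1.6 API this was plain H5Dcreate with fewer arguments)
    hsize_t filedims[2] = { totalNumPts, 3 };
    hid_t filespace = H5Screate_simple(2, filedims, NULL);
    hid_t dset = H5Dcreate2(file, "Points", H5T_NATIVE_DOUBLE, filespace,
                            H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

    // Each process selects its own hyperslab: rows [offset, offset+local).
    hsize_t offset[2] = { pointOffset, 0 };
    hsize_t count[2]  = { localNumPts, 3 };
    H5Sselect_hyperslab(filespace, H5S_SELECT_SET, offset, NULL, count, NULL);
    hid_t memspace = H5Screate_simple(2, count, NULL);

    // Collective write: every rank writes its block of the same dataset.
    hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
    H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_COLLECTIVE);
    H5Dwrite(dset, H5T_NATIVE_DOUBLE, memspace, filespace, dxpl, localPoints);

    H5Pclose(dxpl);
    H5Sclose(memspace);
    H5Sclose(filespace);
    H5Dclose(dset);
    H5Fclose(file);
  }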
If the meshes were more complex in the way they overlap/interact then we
might have trouble - because then you'd want to gather all the points
onto one process, do some sort of MergePoints operation to remove
duplicates, then cross-reference all the cell point indexes etc. - and
this could take a lot of memory and time and might not be worth the
effort (the KdTree probably does much of this already, as an aside).
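If you did want to go down that road anyway, the core of it is a single
locator pass. A rough, untested sketch using vtkMergePoints might look
like this (the function and argument names are made up); afterwards
every cell's connectivity has to be remapped through the old-to-new id
list:

  #include "vtkMergePoints.h"
  #include "vtkPoints.h"
  #include "vtkIdList.h"

  // 'gathered' holds all points collected onto one rank, 'bounds' is the
  // overall bounding box; 'merged' receives the de-duplicated points and
  // 'oldToNew' maps each original point id to its id in 'merged'.
  void MergeGatheredPoints(vtkPoints *gathered, double bounds[6],
                           vtkPoints *merged, vtkIdList *oldToNew)
  {
    vtkMergePoints *locator = vtkMergePoints::New();
    locator->InitPointInsertion(merged, bounds);

    vtkIdType n = gathered->GetNumberOfPoints();
    oldToNew->SetNumberOfIds(n);
    for (vtkIdType i = 0; i < n; i++)
    {
      vtkIdType newId;
      // Returns the id of an existing coincident point, or inserts the
      // point and returns its newly assigned id.
      locator->InsertUniquePoint(gathered->GetPoint(i), newId);
      oldToNew->SetId(i, newId);
    }
    locator->Delete();
    // Every cell's point ids must then be rewritten through oldToNew.
  }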
I started this email with the intention of recommending HDF5 - but I'm
ending it with VTK's XML collection as my preferred choice, since it
will be less work for the complex stuff and the readers and writers
already exist. If your data is already in VTK form, then the XML route
will save you a lot of work.
I welcome input from anyone else considering putting meshes into HDF5.
JB
--
John Biddiscombe, email:biddisco @ cscs.ch
http://www.cscs.ch/about/BJohn.php
CSCS, Swiss National Supercomputing Centre | Tel: +41 (91) 610.82.07
Via Cantonale, 6928 Manno, Switzerland | Fax: +41 (91) 610.82.82