[Paraview] ParaView file formats
Jean Favre
jfavre at cscs.ch
Wed May 21 07:31:13 EDT 2008
The thread initiated yesterday on file formats gives me an opportunity
to share some results with the list.
On the question of which file format to use to store large CFD
simulation results, I ran the following tests with EnSight Gold, the
VTK-XML formats, and Xdmf. Here are my findings and my final
implementation for storing a time-varying rectilinear grid of over 1
billion cells.
A) The original format used was EnSight Gold Binary.
It worked well for small data; it does not work at all for data above
the 1-billion-cell threshold.
I had a lot of trouble reading the files because:
- each scalar field is stored in a file larger than 4 GB;
- the FileSize variable is stored in VTK as a 32-bit int. See
http://www.vtk.org/Bug/view.php?id=4687. Even after changing it to long
integers, I later ran out of memory (VTK does not check allocations) in
the following lines of code:
  if (component == 0)
    {
    scalars = vtkFloatArray::New();
    scalars->SetNumberOfComponents(numberOfComponents);
    scalars->SetNumberOfTuples(numPts);
    }
  scalarsRead = new float[numPts];
  this->ReadFloatArray(scalarsRead, numPts);
  // bad memory allocations go unchecked and the loop below crashes
  for (i = 0; i < numPts; i++)
    {
    scalars->SetComponent(i, component, scalarsRead[i]);
    }
I will also add that reading EnSight data in parallel actually duplicates
the whole mesh in memory before each pvserver can extract its own piece
(AFAIK). This also led to many crashes from running out of memory.
B) After removing the loop above, bypassing the second array
allocation, and doing more cleanup, I am able to read the data. I then
tried to save it in the VTK-XML format. This does not work either,
because it uses some 32-bit counters which overflow. This was also
reported by Brad; see http://www.vtk.org/Bug/view.php?id=6938
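For the single-component (scalar) case, the reading workaround amounts
to something like the following sketch; it reuses the variable names
from the snippet above, and error handling is omitted:

  // Read straight into the vtkFloatArray's own storage, avoiding the
  // temporary float[numPts] allocation and the billion-iteration
  // SetComponent() copy loop.
  scalars = vtkFloatArray::New();
  scalars->SetNumberOfComponents(1);
  scalars->SetNumberOfTuples(numPts);
  this->ReadFloatArray(scalars->GetPointer(0), numPts);

Multi-component fields would still need their values interleaved into
tuples, but for scalars this removes one numPts-sized allocation and
roughly a billion SetComponent() calls per field.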
C) The second option was to save the data with the vtkXdmfWriter. It
works fine, but takes 30 minutes per timestep for a billion-cell grid.
D) I took over writing my own HDF5 data and reused the Xdmf wrapper.
Writing now takes 30 seconds for two scalar fields. Each file is 9.4 GB.
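The hand-written part is nothing exotic. Here is a minimal sketch of
the kind of HDF5 calls involved (HDF5 1.8 API; the file name, dataset
name and dimensions are placeholders, not my actual setup):

  #include <hdf5.h>
  #include <vector>

  // Write one scalar field of a 1024^3-cell grid as a single 3D dataset.
  void WriteField(const std::vector<float>& field)
  {
    hsize_t dims[3] = { 1024, 1024, 1024 };
    hid_t file  = H5Fcreate("timestep_0000.h5", H5F_ACC_TRUNC,
                            H5P_DEFAULT, H5P_DEFAULT);
    hid_t space = H5Screate_simple(3, dims, NULL);
    hid_t dset  = H5Dcreate(file, "/pressure", H5T_NATIVE_FLOAT, space,
                            H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
    // One call moves the whole contiguous array to disk.
    H5Dwrite(dset, H5T_NATIVE_FLOAT, H5S_ALL, H5S_ALL, H5P_DEFAULT,
             &field[0]);
    H5Dclose(dset);
    H5Sclose(space);
    H5Fclose(file);
  }

The key point is that H5Dwrite gets the entire array in a single call,
with no per-value copying anywhere.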
E) My final solution now includes the time-dependent support recently
added by J. Clarke, J. Biddiscombe et al. in the Xdmf library. I write
my own HDF5 datasets, and I am happily reading the solution in parallel
on 24 pvservers. Reading (using the Xdmf reader) does not show the same
overhead as the writing part. However, we think that the vtk-Xdmf
writing is slow because data is copied from the vtk array into an Xdmf
array one value at a time; if this restriction were removed (and the
Xdmf writer used the vtk pointer directly), it would probably be as fast
as hand-written HDF5 code.
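To give an idea of the final layout, here is a sketch of what a
time-varying Xdmf file referencing hand-written HDF5 datasets can look
like (grid dimensions, file names, dataset paths and the attribute name
are made up for illustration, and only one timestep is shown):

  <?xml version="1.0" ?>
  <Xdmf Version="2.0">
    <Domain>
      <Grid Name="CFD" GridType="Collection" CollectionType="Temporal">
        <Grid Name="T0" GridType="Uniform">
          <Time Value="0.0" />
          <!-- 1025^3 nodes = 1024^3 (over 1 billion) cells -->
          <Topology TopologyType="3DRectMesh"
                    Dimensions="1025 1025 1025" />
          <Geometry GeometryType="VXVYVZ">
            <DataItem Dimensions="1025" Format="HDF">mesh.h5:/X</DataItem>
            <DataItem Dimensions="1025" Format="HDF">mesh.h5:/Y</DataItem>
            <DataItem Dimensions="1025" Format="HDF">mesh.h5:/Z</DataItem>
          </Geometry>
          <Attribute Name="pressure" Center="Cell">
            <DataItem Dimensions="1024 1024 1024" NumberType="Float"
                      Precision="4" Format="HDF">
              timestep_0000.h5:/pressure
            </DataItem>
          </Attribute>
        </Grid>
        <!-- ... one Uniform grid per timestep ... -->
      </Grid>
    </Domain>
  </Xdmf>

The heavy data never touches the XML; each DataItem just points at an
HDF5 file and dataset, so the lightweight Xdmf description can be
extended with a new grid per timestep at essentially no cost.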
All of my tests were done with the CVS version of ParaView3.
Jean
Swiss National Supercomputing Center