[Paraview] ParaView file formats

Jean Favre jfavre at cscs.ch
Wed May 21 07:31:13 EDT 2008


The thread initiated yesterday on file formats gives me an opportunity 
to share some results with the list.

On the question of which file format to use to store large CFD 
simulation results, I ran the following tests with EnSight Gold, the 
VTK-XML formats, and Xdmf. Here are my findings and my final 
implementation for storing a time-varying rectilinear grid of over 1 
billion cells.

A) The original format used was EnSight Gold Binary.
It worked well for small data; it does not work at all for data above 
the 1-billion-cell threshold.

I had a lot of trouble reading the files because each scalar field is 
stored in a file larger than 4 GB, while the FileSize variable is 
stored in VTK as a 32-bit int. See 
http://www.vtk.org/Bug/view.php?id=4687. Even after changing it to a 
long integer, I ran out of memory later (VTK does not check its 
allocations) in the following lines of code:

      if (component == 0)
        {
        scalars = vtkFloatArray::New();
        scalars->SetNumberOfComponents(numberOfComponents);
        scalars->SetNumberOfTuples(numPts);
        }
   
      scalarsRead = new float[numPts];
      this->ReadFloatArray(scalarsRead, numPts);

      // bad memory allocations go unchecked and the loop below crashes
      for (i = 0; i < numPts; i++)
        {
        scalars->SetComponent(i, component, scalarsRead[i]);
        }
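
For illustration, a minimal sketch of the kind of check that is missing 
(the helper is mine, not the actual VTK patch): with std::nothrow, a 
failed allocation returns NULL instead of leaving the copy loop to 
crash on a bad pointer.

      #include <cstddef>
      #include <new>   // std::nothrow

      // hypothetical helper, not VTK code: allocate the read buffer
      // and report failure instead of crashing later
      float* AllocateScalars(long numPts)
      {
        float* buffer = new (std::nothrow) float[numPts];
        if (buffer == NULL)
          {
          // out of memory: the caller must abort the read
          return NULL;
          }
        return buffer;
      }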

I will also add that reading EnSight data in parallel actually duplicates 
the whole mesh in memory before each pvserver can extract its own piece 
(AFAIK). This also led to many crashes from running out of memory.

B) After removing the loop above, bypassing the second array 
allocation, and doing more cleanup, I was able to read the data. I then 
tried to save it in the VTK-XML format. This does not work either, 
because the format uses some 32-bit counters which overflow. This was 
also reported by Brad; see http://www.vtk.org/Bug/view.php?id=6938
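
To make the overflow concrete, a back-of-the-envelope illustration 
(the variable names are mine):

      // the byte count of a single float field on a 1-billion-cell
      // grid does not fit in a signed 32-bit int (max 2147483647)
      int numCells = 1000000000;                   // fits in an int
      // numCells * 4 = 4e9 would overflow a 32-bit counter;
      // a 64-bit type is needed for sizes and offsets
      long long bytes64 = static_cast<long long>(numCells) * 4;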

C) The second option was to save the data with the vtkXdmfWriter. It 
works fine, but takes 30 minutes per timestep for a billion-cell grid.

D) I took over writing my own HDF5 data and re-used the Xdmf 
wrapper. Writing now takes 30 seconds for two scalar fields. Each file 
is 9.4 GB.
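
The core of such a writer is short. A minimal sketch along these lines 
(the file and dataset names are invented for the example, and it uses 
the HDF5 1.8 API; it is not my production code):

      #include <hdf5.h>

      /* sketch: write one scalar field of a rectilinear grid as a
         single contiguous 3D float dataset */
      int WriteScalarField(const char* fname, const float* data,
                           hsize_t nz, hsize_t ny, hsize_t nx)
      {
        hsize_t dims[3] = { nz, ny, nx };
        hid_t file  = H5Fcreate(fname, H5F_ACC_TRUNC,
                                H5P_DEFAULT, H5P_DEFAULT);
        hid_t space = H5Screate_simple(3, dims, NULL);
        hid_t dset  = H5Dcreate(file, "/pressure", H5T_NATIVE_FLOAT,
                                space, H5P_DEFAULT, H5P_DEFAULT,
                                H5P_DEFAULT);
        /* one write of the whole field: no per-value copying */
        herr_t status = H5Dwrite(dset, H5T_NATIVE_FLOAT, H5S_ALL,
                                 H5S_ALL, H5P_DEFAULT, data);
        H5Dclose(dset);
        H5Sclose(space);
        H5Fclose(file);
        return (status < 0) ? -1 : 0;
      }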

E) My final solution now includes the time-dependent support recently 
added by J. Clarke, J. Biddiscombe et al. in Xdmf. I write my own HDF5 
datasets, and I am happily reading the solution in parallel on 24 
pvservers. Reading (using the Xdmf reader) does not show the same 
overhead as the writing part. However, we think that the VTK-to-Xdmf 
writing is slow because data is copied from the VTK array into an Xdmf 
array one value at a time; if this restriction were removed (and the 
Xdmf writer used the VTK pointer directly), it would probably be as 
fast as hand-written HDF5 code.
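
To illustrate what I mean by using the pointer directly, a sketch (my 
guess at the fix, not the actual Xdmf writer code), assuming a 
single-component array:

      #include <cstring>
      #include <vtkFloatArray.h>

      // instead of n calls copying one value at a time, expose VTK's
      // contiguous storage with GetPointer(0) and move the whole
      // block in one memcpy (assumes one component per tuple)
      void CopyAllAtOnce(vtkFloatArray* scalars, float* xdmfBuffer)
      {
        vtkIdType n = scalars->GetNumberOfTuples();
        memcpy(xdmfBuffer, scalars->GetPointer(0), n * sizeof(float));
      }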

All of my tests were done with the CVS version of ParaView3.

Jean
Swiss National Supercomputing Center



