[Paraview] ParaView file formats

Dominik Szczerba domi at vision.ee.ethz.ch
Wed May 21 07:56:36 EDT 2008


My 2 cents on the discussion about how to store large (CFD) data.
I know nothing of the EnSight format, so I will comment only on HDF5 and Xdmf.

The idea behind Xdmf is great, but the actual implementation is very 
poor. Especially for newcomers, I suggest using self-made HDF5 files and 
implementing a custom reader for VTK/PV instead of using Xdmf. The very 
annoying reason is that Xdmf lives only in the C world, and if you generate 
some of your files from Matlab or Fortran you will run into serious 
compatibility problems. Reading/writing Xdmf from Fortran is no 
fun, to say the least, while HDF5 is easy. From Matlab, Xdmf is easy but 
very inefficient (matrix transposing). An elegant solution would be to 
add support for column-major ordering of matrices in Xdmf *AS WELL AS* 
an efficient implementation of the required conversions. Currently that 
is not the case, so you are safer doing things yourself to avoid storage 
lock-in.
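
To show what I mean by self-made HDF5 files, here is a minimal sketch 
using the plain HDF5 C API (1.8-style calls) from C++; the file and 
dataset names are invented for the example, and error checking is 
omitted. Note that HDF5 describes arrays in row-major (C) order, so 
column-major data coming from Matlab or Fortran must either be 
transposed or have its dimensions declared in reverse:

     #include <hdf5.h>

     int main()
     {
       // one scalar field on a small example grid,
       // slowest-varying dimension first (C order)
       const hsize_t dims[3] = { 64, 64, 64 };
       const size_t npts = (size_t)(dims[0] * dims[1] * dims[2]);

       float* pressure = new float[npts];   // fill from your solver
       for (size_t i = 0; i < npts; i++) pressure[i] = 0.0f;

       hid_t file  = H5Fcreate("solution.h5", H5F_ACC_TRUNC,
                               H5P_DEFAULT, H5P_DEFAULT);
       hid_t space = H5Screate_simple(3, dims, NULL);
       hid_t dset  = H5Dcreate(file, "/pressure", H5T_NATIVE_FLOAT, space,
                               H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

       // the whole field goes out in one contiguous write
       H5Dwrite(dset, H5T_NATIVE_FLOAT, H5S_ALL, H5S_ALL,
                H5P_DEFAULT, pressure);

       H5Dclose(dset);
       H5Sclose(space);
       H5Fclose(file);
       delete[] pressure;
       return 0;
     }

Reading it back in a custom VTK/PV reader is the mirror image with 
H5Dopen/H5Dread, and both ends see exactly the same bytes regardless 
of which language wrote them.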

regards,
Dominik


Jean Favre wrote:
> The thread initiated yesterday on file formats gives me an opportunity 
> to share some results with the list.
> 
> On the question of which file format to use to store large CFD 
> simulation results, I ran the following tests with EnSight Gold, the 
> VTK-XML formats, and Xdmf. Here are my findings and my final 
> implementation for storing a time-varying rectilinear grid of over 1 
> billion cells.
> 
> A) The original format used was EnSight Gold Binary.
> It worked well for small data, but it does not work at all for data 
> above the 1-billion-cell threshold.
> 
> I had a lot of trouble reading the files because:
> 
> each scalar field is stored in a file larger than 4 GB, while 
> the FileSize variable is stored in VTK as a 32-bit int (see 
> http://www.vtk.org/Bug/view.php?id=4687). Even after changing it to a 
> long integer, I had problems later running out of memory (VTK does not 
> check the allocations) in the following lines of code:
> 
>      if (component == 0)
>        {
>        scalars = vtkFloatArray::New();
>        scalars->SetNumberOfComponents(numberOfComponents);
>        scalars->SetNumberOfTuples(numPts);
>        }
>      scalarsRead = new float[numPts];  // unchecked allocation
>      this->ReadFloatArray(scalarsRead, numPts);
> 
>      // the bad allocation goes unchecked, so the loop below crashes
>      for (i = 0; i < numPts; i++)
>        {
>        scalars->SetComponent(i, component, scalarsRead[i]);
>        }
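> 
> A minimal guard along these lines (just a sketch, not actual VTK 
> code) would at least fail cleanly instead of crashing:
> 
>      #include <new>  // for std::nothrow, at the top of the file
> 
>      scalarsRead = new(std::nothrow) float[numPts];  // NULL on failure
>      if (scalarsRead == NULL)
>        {
>        vtkErrorMacro("Failed to allocate " << numPts << " floats");
>        return 0;
>        }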
> 
> I'll also add that reading EnSight data in parallel actually duplicates 
> the whole mesh in memory before each pvserver can extract its own piece 
> (AFAIK). This also led to many crashes from running out of memory.
> 
> B) After removing the loop above, bypassing the second array 
> allocation, and doing more cleanup, I am able to read the data. I tried 
> to save it in the VTK-XML format. This does not work either, because it 
> uses some 32-bit counters which overflow. This was also reported 
> by Brad; see http://www.vtk.org/Bug/view.php?id=6938
> 
> C) The second option was to save the data with the vtkXdmfWriter. It 
> works fine, but takes 30 minutes per timestep for a billion-cell grid.
> 
> D) I took over writing my own HDF5 data and re-used the Xdmf 
> wrapper. Writing now takes 30 seconds for two scalar fields. Each file 
> is 9.4 GB.
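> (For scale: 9.4 GB in 30 seconds is roughly 0.3 GB/s of sustained 
> write bandwidth.)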
> 
> E) My final solution now includes the time-dependent support recently 
> added by J. Clarke, J. Biddiscombe et al. to Xdmf. I write my 
> own HDF5 datasets, and I am happily reading the solution in parallel on 
> 24 pvservers. Reading (using the Xdmf reader) does not show the same 
> overhead as the writing part. However, we think that the vtk-Xdmf 
> writing is slow because data is copied from the vtk array into an Xdmf 
> array one value at a time; if this restriction were removed (and the 
> Xdmf writer used the vtk pointer directly), it would probably be as fast 
> as hand-written HDF5 code.
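> 
> Schematically, the difference is the following (a sketch only; 
> xdmfArray->SetValue stands in for whatever the writer does 
> internally, not the real Xdmf API):
> 
>      // slow: one call per value, a billion times per field
>      for (vtkIdType i = 0; i < numPts; i++)
>        {
>        xdmfArray->SetValue(i, vtkArray->GetValue(i));
>        }
> 
>      // fast: hand over the raw pointer in one shot
>      float* raw = vtkArray->GetPointer(0);  // vtkFloatArray::GetPointer
>      // ... then pass 'raw' to a single H5Dwrite call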
> 
> All of my tests were done with the CVS version of ParaView3.
> 
> Jean
> Swiss National Supercomputing Center
> 
> 
> _______________________________________________
> ParaView mailing list
> ParaView at paraview.org
> http://www.paraview.org/mailman/listinfo/paraview

-- 
Dominik Szczerba, Ph.D.
Computer Vision Lab CH-8092 Zurich
http://www.vision.ee.ethz.ch/~domi

