[Paraview] Need advice with parallel file format

Burlen Loring bloring at lbl.gov
Mon May 5 14:46:13 EDT 2014


Ah, right, I assumed you were using VTK's classes to write the data.
If you're writing the files yourself, you'll still want to emulate
the "unencoded appended" format to get the best performance. See the
SetDataModeTo* and EncodeAppendedData* functions
(http://www.vtk.org/doc/nightly/html/classvtkXMLWriter.html).
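For reference, a minimal sketch of those settings using the VTK Python
bindings (a sphere source stands in for your real data; swap in the
writer class that matches your data type):

import vtk

sphere = vtk.vtkSphereSource()

writer = vtk.vtkXMLPolyDataWriter()
writer.SetInputConnection(sphere.GetOutputPort())
writer.SetFileName("sphere.vtp")
writer.SetDataModeToAppended()   # binary blocks collected in an <AppendedData> section
writer.EncodeAppendedDataOff()   # keep the appended blocks as raw bytes, no base64
writer.Write()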

There's also some essential info in the file format document
(http://www.vtk.org/VTK/img/file-formats.pdf); search for "Appended".
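Since you're writing the files by hand, the layout to reproduce looks
roughly like the sketch below (Python with NumPy, a single tetrahedron
as a stand-in mesh, native little-endian byte order assumed; with
version="0.1" each appended array is preceded by a UInt32 byte count).
Treat it as a sketch of the format rather than a spec, and double-check
against the file-formats document and against a file written by VTK
itself.

import numpy as np

# Stand-in mesh: one tetrahedron (native little-endian byte order assumed).
points       = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=np.float64)
connectivity = np.array([0, 1, 2, 3], dtype=np.int64)
offsets      = np.array([4], dtype=np.int64)
types        = np.array([10], dtype=np.uint8)   # 10 == VTK_TETRA

# Each appended array is a 4-byte length header followed by its raw bytes;
# a DataArray's "offset" attribute is the byte offset of that header,
# counted from just after the '_' that opens the appended section.
raw, array_offsets = b"", []
for a in (points, connectivity, offsets, types):
    array_offsets.append(len(raw))
    raw += np.uint32(a.nbytes).tobytes() + a.tobytes()

header = """<?xml version="1.0"?>
<VTKFile type="UnstructuredGrid" version="0.1" byte_order="LittleEndian">
  <UnstructuredGrid>
    <Piece NumberOfPoints="{npts}" NumberOfCells="{ncells}">
      <Points>
        <DataArray type="Float64" NumberOfComponents="3" format="appended" offset="{o0}"/>
      </Points>
      <Cells>
        <DataArray type="Int64" Name="connectivity" format="appended" offset="{o1}"/>
        <DataArray type="Int64" Name="offsets" format="appended" offset="{o2}"/>
        <DataArray type="UInt8" Name="types" format="appended" offset="{o3}"/>
      </Cells>
    </Piece>
  </UnstructuredGrid>
  <AppendedData encoding="raw">
   _""".format(npts=len(points), ncells=len(types),
               o0=array_offsets[0], o1=array_offsets[1],
               o2=array_offsets[2], o3=array_offsets[3])

with open("tet.vtu", "wb") as f:
    f.write(header.encode("ascii"))
    f.write(raw)
    f.write(b"\n  </AppendedData>\n</VTKFile>\n")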

One way to get a handle on what's happening with the various modes and
options is to examine the files you can produce in ParaView itself. For
example, open ParaView, create a sphere source (or some unstructured
data if you prefer), save the data from the File menu, and choose one of
the pvt* options. Compare the files produced for the binary and appended
modes, and so on.
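If you'd rather script that comparison than click through the GUI,
something along these lines does the same with the VTK Python bindings
(the file names and sphere resolution are arbitrary):

import os
import vtk

sphere = vtk.vtkSphereSource()
sphere.SetThetaResolution(200)
sphere.SetPhiResolution(200)

# Write the same data in each mode and compare the resulting file sizes.
for tag, setup in [
    ("ascii",         lambda w: w.SetDataModeToAscii()),
    ("inline_binary", lambda w: w.SetDataModeToBinary()),       # base64 inside each DataArray
    ("appended_b64",  lambda w: (w.SetDataModeToAppended(), w.EncodeAppendedDataOn())),
    ("appended_raw",  lambda w: (w.SetDataModeToAppended(), w.EncodeAppendedDataOff())),
]:
    writer = vtk.vtkXMLPolyDataWriter()
    writer.SetInputConnection(sphere.GetOutputPort())
    writer.SetFileName("sphere_%s.vtp" % tag)
    setup(writer)
    writer.Write()
    print("%-14s %8d bytes" % (tag, os.path.getsize("sphere_%s.vtp" % tag)))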

On 05/05/2014 11:34 AM, Mohammad Mirzadeh wrote:
> Burlen,
>
> Thanks a lot for your comments.
>
>     This is not an answer to your question, but there is a usage caveat
>     with VTK XML files that I want to make sure you're aware of. When
>     you use that format, make sure you set the mode to "appended" and
>     turn "encode" off. This is the combination that produces binary
>     files, which are going to be faster and very likely smaller too.
>     You probably already know that, but just in case ...
>
> I write the data as binary inside the .vtu file itself. Is this 
> what you mean by appended mode? I cannot see any 'mode' keyword in the 
> XML file. The same goes for 'encode'; I don't have it in the XML file.
>
>     now to get to your question:
>
>>     1) Go with a single parallel HDF5 file that includes data for all
>>     time-steps. This makes it all nice and portable except there are
>>     two issues. i) It looks like doing MPI-IO might not be as
>>     efficient as separate POSIX IO, especially on a large number of
>>     processors. ii) ParaView does not seem to be able to read HDF5
>>     files in parallel.
>     comment: If I were you I'd avoid putting all time steps in a
>     single file, or any solution where files get too big. Once files
>     occupy more than ~80% of a tape drive you'll have a very hard time
>     getting them on and off archive systems. See this:
>     http://www.nersc.gov/users/data-and-file-systems/hpss/storing-and-retrieving-data/mistakes-to-avoid/
>     My comment assumes that you actually use such systems, but you
>     probably will need to if you generate large datasets at common HPC
>     centers.
>
>
> That's actually a very good point I was not thinking of! Thanks for 
> sharing.
>
>     I've seen some AMR codes get elaborate in their HDF5 formats and
>     run into serious performance issues as a result. So my comment
>     here is that if you go with HDF5, keep the format as simple as
>     possible, and of course keep file sizes small enough to be archived ;-)
>
>     Burlen
>
>
>
>     On 05/05/2014 10:48 AM, Mohammad Mirzadeh wrote:
>>     They are represented as an unstructured grid. As a sample run, a
>>     100M grid-point case on 256 processors produces an almost 8.5 GB
>>     file. We intend to push the limits to close to 1B points at most
>>     at this time, with the number of processors up to a few thousand.
>>     However, it would be good to have something that could scale to
>>     larger problems as well.
>>
>>
>>     On Sat, May 3, 2014 at 1:28 AM, Stephen Wornom
>>     <stephen.wornom at inria.fr> wrote:
>>
>>         Mohammad Mirzadeh wrote:
>>
>>             Hi, I am at a critical point in deciding the I/O format
>>             for my application. So far my conclusion is to use
>>             parallel HDF5 for restart files, as they are quite
>>             flexible and portable across systems.
>>
>>             When it comes to visualization, however, I'm not quite
>>             sure. Up until now I've been using pvtu along with vtu
>>             files, and although they generally work fine, one easily
>>             gets in trouble when running big simulations on a large
>>             number of processors, as the number of files can easily
>>             get out of control and even the simplest utility commands
>>             (e.g. ls) take minutes to finish!
>>
>>             After much thought, I've come to the point of deciding
>>             between two strategies:
>>
>>             1) Go with a single parallel HDF5 file that includes data
>>             for all time-steps. This makes it all nice and portable,
>>             except there are two issues. i) It looks like doing
>>             MPI-IO might not be as efficient as separate POSIX IO,
>>             especially on a large number of processors. ii) ParaView
>>             does not seem to be able to read HDF5 files in parallel.
>>
>>             2) Go with the same pvtu+vtu strategy, except take
>>             precautions to avoid a file explosion. I can think of two
>>             strategies here: i) use nested folders to separate the
>>             vtu files from the pvtu and also each time step; ii)
>>             create an I/O group communicator with far fewer
>>             processors that do the actual I/O (sketched below).
>>
>>             My questions are: 1) Is the second approach necessarily
>>             more efficient than the MPI-IO used in HDF5? and 2) Is
>>             there any plan to support parallel I/O for HDF5 files in
>>             ParaView?
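A minimal sketch of the aggregation idea in 2(ii), assuming mpi4py and
NumPy; the random points and the .npy output are only stand-ins for a
real mesh piece and a real .vtu write:

from mpi4py import MPI
import numpy as np

GROUP_SIZE = 64                          # ranks per I/O aggregator (tunable)
world = MPI.COMM_WORLD
color = world.Get_rank() // GROUP_SIZE   # which aggregation group this rank is in
io_comm = world.Split(color, world.Get_rank())

# Stand-in for this rank's piece of the unstructured grid.
local_points = np.random.rand(1000, 3)

# Every rank ships its piece to the group root; only the root touches the
# filesystem, so e.g. 4096 ranks with GROUP_SIZE=64 write 64 files per
# step instead of 4096.
pieces = io_comm.gather(local_points, root=0)
if io_comm.Get_rank() == 0:
    np.save("step_0000_group_%04d.npy" % color, np.concatenate(pieces))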
>>
>>
>>
>>         Are your meshes structured or unstructured? How many vertices
>>         in your meshes?
>>
>>         Stephen
>>
>>         -- 
>>         stephen.wornom at inria.fr
>>         2004 route des lucioles - BP93
>>         Sophia Antipolis
>>         06902 CEDEX
>>
>>         Tel: 04 92 38 50 54
>>         Fax: 04 97 15 53 51
>>
>>
>>
>>
