[Paraview] Need advice with parallel file format

Mohammad Mirzadeh mirzadeh at gmail.com
Mon May 5 14:34:24 EDT 2014


Burlen,

Thanks a lot for your comments.


>  This is not an answer to your question, but there is a usage caveat
> with VTK XML files that I want to make sure you're aware of. When you
> use that format, make sure you set the mode to "appended" and turn
> "encode" off. This is the combination that produces binary files, which
> are going to be faster and very likely smaller too. You probably already
> know that, but just in case ...
>
>
I already write the data as binary inside the .vtu file. Is this what you
mean by appended mode? I cannot see any 'mode' keyword in the XML file,
and the same goes for 'encode'; neither appears in the file.
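(For reference, 'mode' and 'encode' are properties of the VTK writer
rather than keywords in the XML file itself; with appended raw mode the
file ends up with an <AppendedData encoding="raw"> section after the XML
header. A minimal sketch, assuming the .vtu files are produced with VTK's
vtkXMLUnstructuredGridWriter:

#include <vtkSmartPointer.h>
#include <vtkUnstructuredGrid.h>
#include <vtkXMLUnstructuredGridWriter.h>

int main()
{
  auto grid = vtkSmartPointer<vtkUnstructuredGrid>::New();
  // ... populate points, cells, and data arrays here ...

  auto writer = vtkSmartPointer<vtkXMLUnstructuredGridWriter>::New();
  writer->SetFileName("output.vtu");
  writer->SetInputData(grid);
  writer->SetDataModeToAppended();  // one appended data block per file
  writer->EncodeAppendedDataOff();  // raw binary, no base64 encoding
  writer->Write();
  return 0;
}

With these two calls the arrays are written once, in raw binary, after
the XML header, which is the fast and small combination Burlen describes.)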

Now, to get to your question:
>
> 1) Go with a single parallel HDF5 file that includes data for all
> time-steps. This makes it all nice and portable, except there are two
> issues: i) It looks like doing MPI-IO might not be as efficient as
> separate POSIX IO, especially on a large number of processors. ii)
> ParaView does not seem to be able to read HDF5 files in parallel.
>
> comment: If I were you I'd avoid putting all time steps in a single
> file, or any solution where files get too big. Once files occupy more
> than ~80% of a tape drive you'll have a very hard time getting them on
> and off archive systems. See this:
> http://www.nersc.gov/users/data-and-file-systems/hpss/storing-and-retrieving-data/mistakes-to-avoid/
> My comment assumes that you actually use such systems. But you probably
> will need to if you generate large datasets at common HPC centers.
>
>
That's actually a very good point I hadn't thought of! Thanks for sharing.
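For concreteness on option 1 above (one shared file written collectively
via MPI-IO), here is a minimal sketch using the HDF5 C API. It assumes
HDF5 built with parallel support and a simple one-dataset-per-field
layout; the file and dataset names are illustrative:

#include <hdf5.h>
#include <mpi.h>
#include <vector>

int main(int argc, char** argv)
{
  MPI_Init(&argc, &argv);
  int rank, size;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);

  const hsize_t local_n = 1000;             // points owned by this rank
  std::vector<double> data(local_n, rank);  // dummy payload

  // Open one shared file with the MPI-IO driver.
  hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
  H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL);
  hid_t file = H5Fcreate("fields.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

  // One flat dataset; each rank selects its own contiguous hyperslab.
  hsize_t global_n = local_n * size;
  hid_t filespace = H5Screate_simple(1, &global_n, nullptr);
  hid_t dset = H5Dcreate(file, "pressure", H5T_NATIVE_DOUBLE, filespace,
                         H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
  hsize_t offset = local_n * rank;
  H5Sselect_hyperslab(filespace, H5S_SELECT_SET, &offset, nullptr,
                      &local_n, nullptr);
  hid_t memspace = H5Screate_simple(1, &local_n, nullptr);

  // Collective write: all ranks take part in one MPI-IO operation.
  hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
  H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_COLLECTIVE);
  H5Dwrite(dset, H5T_NATIVE_DOUBLE, memspace, filespace, dxpl,
           data.data());

  H5Pclose(dxpl); H5Sclose(memspace); H5Sclose(filespace);
  H5Dclose(dset); H5Pclose(fapl); H5Fclose(file);
  MPI_Finalize();
  return 0;
}

Whether the collective write beats per-rank POSIX IO depends heavily on
the file system and the MPI-IO hints in use, which is exactly the
uncertainty raised in issue i) above.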


>  I've seen some AMR codes get elaborate in their HDF5 formats and run
> into serious performance issues as a result. So my comment here is that
> if you go with HDF5, keep the format as simple as possible! And of
> course keep file sizes small enough to be archived ;-)
>
> Burlen
>
>
>
> On 05/05/2014 10:48 AM, Mohammad Mirzadeh wrote:
>
> They are represented as an unstructured grid. As a sample run, a 100M
> grid-point mesh on 256 processors produces an almost 8.5 GB file. We
> intend to push the limits to about 1B grid points at this time, with the
> number of processors up to a few thousand. However, it would be good to
> have something that could scale to larger problems as well.
>
>
> On Sat, May 3, 2014 at 1:28 AM, Stephen Wornom <stephen.wornom at inria.fr> wrote:
>
>> Mohammad Mirzadeh wrote:
>>
>>>  Hi, I am at a critical point in deciding the I/O format for my
>>> application. So far my conclusion is to use parallel HDF5 for restart
>>> files, as they are quite flexible and portable across systems.
>>>
>>> When it comes to visualization, however, I'm not quite sure. Up until
>>> now I've been using pvtu along with vtu files, and although they
>>> generally work fine, one easily gets in trouble when running big
>>> simulations on a large number of processors, as the number of files
>>> can easily get out of control and even the simplest utility commands
>>> (e.g. ls) take minutes to finish!
>>>
>>> After much thought, I've come to the point of deciding between two
>>> strategies:
>>>
>>> 1) Go with a single parallel HDF5 file that includes data for all
>>> time-steps. This makes it all nice and portable, except there are two
>>> issues: i) It looks like doing MPI-IO might not be as efficient as
>>> separate POSIX IO, especially on a large number of processors. ii)
>>> ParaView does not seem to be able to read HDF5 files in parallel.
>>>
>>> 2) Go with the same pvtu+vtu strategy, but take precautions to avoid a
>>> file explosion. I can think of two approaches here: i) use nested
>>> folders to separate the vtu files from the pvtu file, and also each
>>> time step; ii) create an IO group communicator with far fewer
>>> processors that do the actual IO (see the sketch below).
>>>
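A minimal sketch of approach ii): split MPI_COMM_WORLD so that only one
rank per group opens a file and writes data gathered from its peers. The
aggregation factor of 32 is illustrative, and the gather itself is
elided:

#include <mpi.h>
#include <cstdio>

int main(int argc, char** argv)
{
  MPI_Init(&argc, &argv);
  int rank, size;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);

  const int ranks_per_writer = 32;      // illustrative aggregation factor
  int group = rank / ranks_per_writer;  // peers sharing one writer
  bool is_writer = (rank % ranks_per_writer == 0);

  // All ranks in a group share one communicator; the first rank of each
  // group collects the group's arrays and performs the actual IO.
  MPI_Comm group_comm;
  MPI_Comm_split(MPI_COMM_WORLD, group, rank, &group_comm);

  // ... MPI_Gatherv the group's points/cells to the writer rank here ...

  if (is_writer)
    std::printf("group %d: rank %d writes one vtu file\n", group, rank);

  MPI_Comm_free(&group_comm);
  MPI_Finalize();
  return 0;
}

This caps the file count per time step at size / ranks_per_writer vtu
files plus one pvtu, independent of how many compute ranks are used.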
>>> My questions are: 1) Is the second approach necessarily more efficient
>>> than the MPI-IO used in HDF5? And 2) Is there any plan to support
>>> parallel IO for HDF5 files in ParaView?
>>>
>>>
>>>
>> Are your meshes structured or unstructured? How many vertices in your
>> meshes?
>>
>> Stephen
>>
>> --
>> stephen.wornom at inria.fr
>> 2004 route des lucioles - BP93
>> Sophia Antipolis
>> 06902 CEDEX
>>
>> Tel: 04 92 38 50 54
>> Fax: 04 97 15 53 51
>>
>>
>
>

