[Paraview] [EXTERNAL] Re: Need advice with parallel file format

Mohammad Mirzadeh mirzadeh at gmail.com
Mon May 5 13:50:40 EDT 2014


Well, the issue with HDF5 as a vis format is that ParaView does not seem to be
able to load it in parallel. Instead, it seems that ParaView loads the
whole file on rank 0 and then tries to broadcast it to the other processors. This
would significantly limit the size of the vis file ...
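
For illustration only (not code from this thread): loading in parallel
generally means each rank reads just its own contiguous slice of a shared
dataset, instead of rank 0 reading everything and broadcasting. Below is a
minimal sketch of that decomposition in plain Python. The function name
`hyperslab` and the sizes are hypothetical, and no HDF5 or ParaView call is
made; each rank would hand its (offset, count) pair to whatever
partial-read mechanism the I/O library provides.

```python
def hyperslab(n_items, n_ranks, rank):
    """Return (offset, count) of the contiguous slice owned by `rank`.

    Hypothetical helper: splits `n_items` as evenly as possible over
    `n_ranks`, so the same file can be read by any number of ranks.
    """
    if n_ranks <= 0 or not 0 <= rank < n_ranks:
        raise ValueError("rank must satisfy 0 <= rank < n_ranks")
    base, extra = divmod(n_items, n_ranks)
    # The first `extra` ranks take one additional item each.
    count = base + (1 if rank < extra else 0)
    offset = rank * base + min(rank, extra)
    return offset, count


if __name__ == "__main__":
    n_cells = 10  # hypothetical dataset length
    # e.g. the file was written by 4 ranks and is read back by 3
    for n_ranks in (4, 3):
        slices = [hyperslab(n_cells, n_ranks, r) for r in range(n_ranks)]
        print(n_ranks, slices)
```

Because the slices depend only on the dataset length and the reader's rank
count, nothing in the file has to record how many ranks wrote it.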


On Sat, May 3, 2014 at 7:06 AM, Moreland, Kenneth <kmorel at sandia.gov> wrote:

>  Reason 1 applies just as much to Vis as restart. More so as you usually
> do Vis on a different number of processors than the sim.
>
>  You may want to rethink reason 2. It may not seem like much now, but
> restarts can take a significant proportion of the run time. And ironically
> the longer they take, the more you have to run them (there is a whole
> theory behind that).
>
>  At any rate, lots of smart people have worked (and continue to work) on
> making io libraries like HDF5 fast. In general, I would expect HDF5 to be
> much faster than the VTK formats.
>
>
> -Ken
>
>  Sent from my iPad so blame autocorrect.
>
> On May 2, 2014, at 10:35 PM, "Mohammad Mirzadeh" <mirzadeh at gmail.com>
> wrote:
>
>   I don't have any specific plan for that but the rationale for using
> HDF5 for restart is twofold:
>
>  1) The restart file could be read later by a different set of processors
> and preferably include useful meta information about the run (date/time,
> SHA1 of git commit, run parameters, etc)
> 2) The restart file is assumed to be written less frequently compared to
> vis and thus performance loss should not be a big issue (hopefully)
>
>  Also, parallel loading of the vis file is necessary, as ParaView seems to
> default to loading everything on rank 0, which would severely limit the size
> of the vis file (I'd be happy to be proven wrong on this one). All that said,
> I'm willing to move away from HDF5 if it proves to be too costly for
> restart files as well. It just seems to me, after two days of searching
> online, that working with parallel HDF5 (and MPI-IO in general) is tricky
> and subject to performance loss on large numbers of processors. (Again, I'd
> be happy to learn from others' experience here.)
>
>
> On Fri, May 2, 2014 at 7:12 PM, Moreland, Kenneth <kmorel at sandia.gov> wrote:
>
>> What are you doing for your restart files? You said those are HDF5 and
>> they must be at least as large as anything you output for Vis. Presumably
>> you got that working pretty well (or are committed to getting it to work
>> well). Why not write the Vis output similarly?
>>
>> -Ken
>>
>> Sent from my iPad so blame autocorrect.
>>
>> > On May 2, 2014, at 6:50 PM, "Mohammad Mirzadeh" <mirzadeh at gmail.com>
>> wrote:
>> >
>> > Hi, I am at a critical point in deciding on an I/O format for my application.
>> So far my conclusion is to use parallel HDF5 for restart files, as they are
>> quite flexible and portable across systems.
>> >
>> > When it comes to visualization, however, I'm not quite sure. Up until
>> now I've been using pvtu along with vtu files and, although they generally
>> work fine, one easily gets into trouble when running big simulations on a large
>> number of processors, as the number of files can easily get out of control
>> and even the simplest utility commands (e.g. ls) take minutes to finish!
>> >
>> > After much thought, I've narrowed the decision down to two
>> strategies:
>> >
>> > 1) Go with a single parallel HDF5 file that includes data for all
>> time steps. This makes it all nice and portable, except there are two
>> issues: i) it looks like doing MPI-IO might not be as efficient as separate
>> POSIX IO, especially on a large number of processors; ii) ParaView does not
>> seem to be able to read HDF5 files in parallel.
>> >
>> > 2) Go with the same pvtu+vtu strategy, except take precautions to avoid
>> file explosions. I can think of two options here: i) use nested folders
>> to separate vtu files from pvtu files and from each other by time step; ii) create an IO
>> group communicator with far fewer processors that do the actual IO.
>> >
>> > My questions are: 1) Is the second approach necessarily more efficient
>> than the MPI-IO used in HDF5? and 2) Is there any plan to support parallel IO
>> for HDF5 files in ParaView?
>>
>
>
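
For illustration only (not code from this thread): the two precautions
listed under strategy 2 of the original message, nested folders per time
step and a smaller group of ranks doing the actual IO, can be sketched in
plain Python. All names here (`io_group`, `piece_path`, the `step_`/`pieces`
directory layout) are hypothetical. The `group_id` is the value one would
typically use as the color when splitting an MPI communicator; no MPI or
VTK call is made in this sketch.

```python
import posixpath


def io_group(rank, n_ranks, n_writers):
    """Return (group_id, is_writer) for `rank`.

    Hypothetical helper: ranks are split into at most `n_writers`
    contiguous groups, and the lowest rank of each group is the one
    that gathers the group's data and writes a single file.
    """
    if not 0 < n_writers <= n_ranks or not 0 <= rank < n_ranks:
        raise ValueError("need 0 < n_writers <= n_ranks and a valid rank")
    group_size = -(-n_ranks // n_writers)  # ceiling division
    group_id = rank // group_size
    is_writer = rank % group_size == 0
    return group_id, is_writer


def piece_path(root, step, group_id):
    """Path of one vtu piece, kept in a per-time-step subfolder."""
    return posixpath.join(
        root, f"step_{step:06d}", "pieces", f"piece_{group_id:05d}.vtu"
    )


if __name__ == "__main__":
    n_ranks, n_writers = 1024, 16  # hypothetical job size
    writers = [r for r in range(n_ranks) if io_group(r, n_ranks, n_writers)[1]]
    print(len(writers), "files per time step instead of", n_ranks)
    print(piece_path("vis", 12, io_group(65, n_ranks, n_writers)[0]))
```

With this layout the file count per time step is bounded by the number of
writers rather than the number of ranks, and each directory stays small.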