[Paraview] Need advice with parallel file format

Mohammad Mirzadeh mirzadeh at gmail.com
Fri May 2 22:34:49 EDT 2014


I don't have any specific plan for that but the rationale for using HDF5
for restart is twofold:

1) The restart file could be read later by a different set of processors
and should preferably include useful meta information about the run
(date/time, SHA1 of the git commit, run parameters, etc.)
2) The restart file is assumed to be written less frequently than vis
output, so the performance loss should not be a big issue (hopefully)
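
To make point 1 concrete, here is a rough sketch of collecting that meta
information before a restart dump. The function names and keys
(restart_metadata, git_sha1, "num_ranks_at_write", ...) are made up for
illustration; the idea is that each entry is a plain string or integer
that could then be stored as an attribute on the restart file's root group.

```python
import datetime
import json
import subprocess


def git_sha1(repo_dir="."):
    """Return the SHA1 of the current git commit, or "unknown" if unavailable."""
    try:
        out = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            cwd=repo_dir, capture_output=True, text=True, check=True,
        )
        return out.stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        # git is not installed, or repo_dir is not inside a repository.
        return "unknown"


def restart_metadata(num_ranks, run_params):
    """Collect run metadata as plain strings and integers (hypothetical keys)."""
    return {
        "written_at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "git_sha1": git_sha1(),
        # Recorded only for reference; a later run may use a different count.
        "num_ranks_at_write": num_ranks,
        # Serialize the parameters so they fit in a single string attribute.
        "run_params": json.dumps(run_params, sort_keys=True),
    }


# Example values are placeholders, not taken from an actual run.
meta = restart_metadata(num_ranks=1024, run_params={"dt": 1e-3, "max_level": 8})
```

Keeping the values as simple scalars and strings means they do not depend
on how many processors wrote the file.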

Also, parallel loading of the vis file is necessary, as ParaView seems to
default to loading everything on rank 0, which would severely limit the size
of the vis file (I'd be happy to be proven wrong on this one). All that said,
I'm willing to move away from HDF5 if it proves to be too costly for
restart files as well. It just seems to me, after two days of searching
online, that working with parallel HDF5 (and MPI-IO in general) is tricky
and subject to performance loss on large numbers of processors. (Again, I'd
be happy to learn from others' experience here.)
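
For comparison, the pvtu+vtu alternative from my original message (nested
folders plus a small I/O group) could be laid out roughly as below. This is
only a sketch under my own assumptions: the names io_group and piece_path,
the block partition of ranks, and the folder naming are all hypothetical.
In an actual MPI code the "color" would be the value passed to
MPI_Comm_split to build each group's communicator.

```python
import os


def io_group(rank, num_ranks, num_writers):
    """Map a rank to (group color, is_writer) using a block partition of ranks."""
    group_size = -(-num_ranks // num_writers)  # ceiling division
    color = rank // group_size
    is_writer = rank % group_size == 0  # first rank of each group does the I/O
    return color, is_writer


def piece_path(step, color):
    """Relative path of one .vtu piece, with one folder per time step."""
    return os.path.join(f"step_{step:06d}", f"piece_{color:05d}.vtu")


# Placeholder sizes: 1024 ranks funnel their data through 16 writer ranks.
num_ranks, num_writers = 1024, 16
writers = [r for r in range(num_ranks) if io_group(r, num_ranks, num_writers)[1]]
pieces = [piece_path(42, io_group(r, num_ranks, num_writers)[0]) for r in writers]
```

With these numbers each time step produces 16 vtu files in its own folder
instead of 1024 in one directory, and the pvtu file would reference them by
relative path.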


On Fri, May 2, 2014 at 7:12 PM, Moreland, Kenneth <kmorel at sandia.gov> wrote:

> What are you doing for your restart files? You said those are HDF5 and
> they must be at least as large as anything you output for Vis. Presumably
> you got that working pretty well (or are committed to getting it to work
> well). Why not write the Vis output similarly?
>
> -Ken
>
> Sent from my iPad so blame autocorrect.
>
> > On May 2, 2014, at 6:50 PM, "Mohammad Mirzadeh" <mirzadeh at gmail.com> wrote:
> >
> > Hi, I am at a critical point in deciding on the I/O format for my
> > application. So far my conclusion is to use parallel HDF5 for restart
> > files, as they are quite flexible and portable across systems.
> >
> > When it comes to visualization, however, I'm not quite sure. Up until
> > now I've been using pvtu along with vtu files, and although they
> > generally work fine, one easily gets into trouble when running big
> > simulations on large numbers of processors: the number of files can
> > easily get out of control, and even the simplest utility commands
> > (e.g. ls) take minutes to finish!
> >
> > After much thought, I've come down to deciding between two strategies:
> >
> > 1) Go with a single parallel HDF5 file that includes data for all
> > time-steps. This makes it all nice and portable, except there are two
> > issues: i) it looks like MPI-IO might not be as efficient as separate
> > POSIX I/O, especially on large numbers of processors; ii) ParaView does
> > not seem to be able to read HDF5 files in parallel.
> >
> > 2) Go with the same pvtu+vtu strategy, except take precautions to avoid
> > file explosion. I can think of two strategies here: i) use nested
> > folders to separate the vtu files from the pvtu file and also to
> > separate each time step; ii) create an I/O group communicator with far
> > fewer processors that do the actual I/O.
> >
> > My questions are: 1) Is the second approach necessarily more efficient
> > than the MPI-IO used in HDF5? and 2) Is there any plan to support
> > parallel I/O for HDF5 files in ParaView?
>