[Paraview] Need advice with parallel file format
Burlen Loring
burlen.loring at gmail.com
Mon May 5 18:04:03 EDT 2014
If you care at all about performance, you're going to want to switch to
appended, unencoded. This is the fastest and most efficient mode, and it
generally gives smaller files than the others.
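
A minimal sketch of that configuration with VTK's writer classes (just
illustrative, not tested; assumes VTK 6-style SetInputData and that
"grid" is the vtkUnstructuredGrid you built):

    #include <vtkNew.h>
    #include <vtkUnstructuredGrid.h>
    #include <vtkXMLUnstructuredGridWriter.h>

    // Illustrative helper, not a VTK API.
    void writeAppendedRaw(vtkUnstructuredGrid *grid)
    {
      vtkNew<vtkXMLUnstructuredGridWriter> writer;
      writer->SetFileName("output.vtu");   // placeholder name
      writer->SetInputData(grid);
      writer->SetDataModeToAppended();     // appended data section
      writer->EncodeAppendedDataOff();     // raw bytes, no base64
      writer->Write();
    }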
On 05/05/2014 02:41 PM, Mohammad Mirzadeh wrote:
> Thanks for the reference. What exactly is the benefit of having the
> data in appended mode? I guess right now I'm using binary mode.
>
>
> On Mon, May 5, 2014 at 11:46 AM, Burlen Loring <bloring at lbl.gov> wrote:
>
>     Ahh, right, I assumed you were using VTK's classes to write the
>     data. If you're writing the files yourself, you'll still want to
>     emulate the "unencoded appended" format to get the best
>     performance; see the SetDataModeTo* and EncodeAppendedData*
>     functions (http://www.vtk.org/doc/nightly/html/classvtkXMLWriter.html).
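>
>     In case it helps, a rough sketch of what emulating the raw
>     appended layout looks like, assuming the default UInt32 header
>     type (the writeBlock helper is just illustrative):
>
>         // Each DataArray declares format="appended" plus a byte
>         // offset; the bytes live after the '_' that opens
>         // <AppendedData encoding="raw">:
>         //
>         //   <DataArray type="Float64" ... format="appended" offset="0"/>
>         //   <AppendedData encoding="raw">
>         //    _[uint32 nbytes][raw bytes][uint32 nbytes][raw bytes]...
>         //   </AppendedData>
>         //
>         // Offsets count bytes from the first byte after the '_'.
>         #include <cstdint>
>         #include <fstream>
>
>         // Append one raw block; return the offset of the next block.
>         std::uint32_t writeBlock(std::ofstream &f, const void *data,
>                                  std::uint32_t nbytes,
>                                  std::uint32_t offset)
>         {
>           f.write(reinterpret_cast<const char *>(&nbytes), sizeof nbytes);
>           f.write(static_cast<const char *>(data), nbytes);
>           return offset + static_cast<std::uint32_t>(sizeof nbytes) + nbytes;
>         }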
>
>     There's also some essential info in the file format doc
>     (http://www.vtk.org/VTK/img/file-formats.pdf). Search for "Appended".
>
>     One way to get a handle on what's happening with the various modes
>     and options is to examine the files you can produce in PV. For
>     example, open up PV, create a sphere source (or, if you prefer,
>     some unstructured data), save the data under the File menu, and
>     choose one of the pvt* options. Compare the files produced for the
>     binary and appended modes, etc. A scripted version of the same
>     comparison is sketched below.
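>
>     Something along these lines (file names are arbitrary; this is
>     just a sketch, not tested) writes the same sphere in each mode so
>     you can compare sizes:
>
>         #include <vtkNew.h>
>         #include <vtkSphereSource.h>
>         #include <vtkXMLPolyDataWriter.h>
>
>         int main()
>         {
>           vtkNew<vtkSphereSource> sphere;
>           vtkNew<vtkXMLPolyDataWriter> writer;
>           writer->SetInputConnection(sphere->GetOutputPort());
>
>           writer->SetFileName("sphere_ascii.vtp");
>           writer->SetDataModeToAscii();        // human readable, largest
>           writer->Write();
>
>           writer->SetFileName("sphere_binary.vtp");
>           writer->SetDataModeToBinary();       // inline, base64 encoded
>           writer->Write();
>
>           writer->SetFileName("sphere_appended.vtp");
>           writer->SetDataModeToAppended();
>           writer->EncodeAppendedDataOff();     // raw appended
>           writer->Write();
>           return 0;
>         }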
>
>
> On 05/05/2014 11:34 AM, Mohammad Mirzadeh wrote:
>> Burlen,
>>
>> Thanks a lot for your comments.
>>
>>     This is not an answer to your question, but there is a usage
>>     caveat w/ VTK XML files that I want to make sure you're aware
>>     of. When you use that format, make sure you set the mode to
>>     "appended" and "encode" off. This is the combination that
>>     produces binary files, which are going to be faster and very
>>     likely smaller too. You probably already know that, but just
>>     in case ...
>>
>> I write the data as binary inside the .vtu file itself. Is this
>> what you mean by appended mode? I cannot see any 'mode' keyword in
>> the XML file; the same goes for 'encode'. I don't have it in the
>> XML file either.
>>
>> Now, to get to your question:
>>
>>>     1) Go with a single parallel HDF5 file that includes data
>>>     for all time-steps. This makes it all nice and portable,
>>>     except there are two issues: i) it looks like doing MPI-IO
>>>     might not be as efficient as separate POSIX IO, especially
>>>     on large numbers of processors; ii) ParaView does not seem
>>>     to be able to read HDF5 files in parallel.
>>     Comment: if I were you, I'd avoid putting all time steps in a
>>     single file, or any solution where files get too big. Once
>>     files occupy more than ~80% of a tape drive, you'll have a very
>>     hard time getting them on and off archive systems. See this:
>>     http://www.nersc.gov/users/data-and-file-systems/hpss/storing-and-retrieving-data/mistakes-to-avoid/
>>     My comment assumes that you actually use such systems, but
>>     you probably will need to if you generate large datasets at
>>     common HPC centers.
>>
>>
>> That's actually a very good point I hadn't thought of! Thanks
>> for sharing.
>>
>>     I've seen some AMR codes get elaborate in their HDF5 formats
>>     and run into serious performance issues as a result. So my
>>     comment here is that if you go with HDF5, keep the format as
>>     simple as possible! And, of course, keep file sizes small
>>     enough to be archived ;-)
>>
>> Burlen
>>
>>
>>
>> On 05/05/2014 10:48 AM, Mohammad Mirzadeh wrote:
>>> They are represented as unstructured grid. As a sample run,
>>> a 100M grid point on 256 proc produces almost 8.5G file. We
>>> intent to push the limits close to 1B at most at this time
>>> with # processors up to a few thousands. However, it would
>>> be good to have something that could scale to larger
>>> problems as well
>>>
>>>
>>> On Sat, May 3, 2014 at 1:28 AM, Stephen Wornom
>>> <stephen.wornom at inria.fr> wrote:
>>>
>>> Mohammad Mirzadeh wrote:
>>>
>>>         Hi, I am at a critical point in deciding the I/O format
>>>         for my application. So far my conclusion is to use
>>>         parallel HDF5 for restart files, as they are quite
>>>         flexible and portable across systems.
>>>
>>>         When it comes to visualization, however, I'm not
>>>         quite sure. Up until now I've been using pvtu along
>>>         with vtu files, and although they generally work
>>>         fine, one easily gets in trouble when running big
>>>         simulations on large numbers of processors, as the
>>>         number of files can easily get out of control and
>>>         even the simplest utility commands (e.g. ls) take
>>>         minutes to finish!
>>>
>>>         After much thought, I've come to the point of deciding
>>>         between two strategies:
>>>
>>>         1) Go with a single parallel HDF5 file that includes
>>>         data for all time-steps. This makes it all nice and
>>>         portable, except there are two issues: i) it looks
>>>         like doing MPI-IO might not be as efficient as
>>>         separate POSIX IO, especially on large numbers of
>>>         processors; ii) ParaView does not seem to be able to
>>>         read HDF5 files in parallel.
>>>
>>>         2) Go with the same pvtu+vtu strategy, except take
>>>         precautions to avoid the file explosion. I can think
>>>         of two strategies here: i) use nested folders to
>>>         separate vtu files from pvtu files and also each time
>>>         step; ii) create an IO group communicator with far
>>>         fewer processors that do the actual IO (a sketch
>>>         follows below).
>>>
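>>>         A sketch of what I have in mind for ii), assuming
>>>         plain MPI (the 32-ranks-per-writer ratio is just an
>>>         example):
>>>
>>>             #include <mpi.h>
>>>
>>>             int main(int argc, char **argv)
>>>             {
>>>               MPI_Init(&argc, &argv);
>>>               int rank;
>>>               MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>>
>>>               // Ranks sharing a color land in the same
>>>               // sub-communicator: one writer per 32 ranks.
>>>               const int ranksPerWriter = 32;
>>>               MPI_Comm ioComm;
>>>               MPI_Comm_split(MPI_COMM_WORLD, rank / ranksPerWriter,
>>>                              rank, &ioComm);
>>>
>>>               int ioRank;
>>>               MPI_Comm_rank(ioComm, &ioRank);
>>>               if (ioRank == 0) {
>>>                 // Gather the group's pieces (e.g. MPI_Gatherv)
>>>                 // and write one vtu file per group instead of
>>>                 // one per rank.
>>>               }
>>>
>>>               MPI_Comm_free(&ioComm);
>>>               MPI_Finalize();
>>>               return 0;
>>>             }
>>>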
>>>         My questions are: 1) Is the second approach
>>>         necessarily more efficient than the MPI-IO used in
>>>         HDF5? 2) Is there any plan to support parallel IO for
>>>         HDF5 files in ParaView?
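>>>
>>>         For question 1, the MPI-IO path I mean is HDF5's mpio
>>>         file driver, along these lines (a sketch only; the
>>>         file name is a placeholder):
>>>
>>>             #include <hdf5.h>
>>>             #include <mpi.h>
>>>
>>>             int main(int argc, char **argv)
>>>             {
>>>               MPI_Init(&argc, &argv);
>>>
>>>               // Route HDF5 through MPI-IO so all ranks share
>>>               // one file.
>>>               hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
>>>               H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL);
>>>               hid_t file = H5Fcreate("fields.h5", H5F_ACC_TRUNC,
>>>                                      H5P_DEFAULT, fapl);
>>>
>>>               // ... create dataspaces, select this rank's
>>>               // hyperslab, and write with a transfer property
>>>               // list set to collective via
>>>               // H5Pset_dxpl_mpio(xfer, H5FD_MPIO_COLLECTIVE) ...
>>>
>>>               H5Fclose(file);
>>>               H5Pclose(fapl);
>>>               MPI_Finalize();
>>>               return 0;
>>>             }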
>>>
>>>
>>>
>>> Are your meshes structured or unstructured? How many
>>> vertices are in your meshes?
>>>
>>> Stephen
>>>
>>> --
>>> stephen.wornom at inria.fr
>>> 2004 route des lucioles - BP93
>>> Sophia Antipolis
>>> 06902 CEDEX
>>>
>>> Tel: 04 92 38 50 54
>>> Fax: 04 97 15 53 51
>>>
>>>
>>>
>>>
>>
>>
>>
>>
>
>