[Paraview] Need advice with parallel file format

Mohammad Mirzadeh mirzadeh at gmail.com
Mon May 5 17:41:15 EDT 2014


Thanks for the reference. What exactly is the benefit of having the data in
appended mode? I guess right now I'm using binary mode.


On Mon, May 5, 2014 at 11:46 AM, Burlen Loring <bloring at lbl.gov> wrote:

>  Ah, right, I assumed you were using VTK's classes to write the data.
> If you're writing the files yourself, you'll still want to emulate the
> "unencoded appended" format to get the best performance; see the
> SetDataModeTo* and EncodeAppendedData* functions (
> http://www.vtk.org/doc/nightly/html/classvtkXMLWriter.html).
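>
> A minimal sketch of that configuration in VTK Python (assuming an
> existing vtkUnstructuredGrid called grid; the equivalent C++ calls have
> the same names):
>
>     import vtk
>
>     writer = vtk.vtkXMLUnstructuredGridWriter()
>     writer.SetFileName("output.vtu")
>     writer.SetInputData(grid)        # SetInput(grid) on VTK 5 and older
>     writer.SetDataModeToAppended()   # raw data block after the XML header
>     writer.EncodeAppendedDataOff()   # skip base64-encoding that block
>     writer.Write()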
>
> There's also some essential info in the file format doc (
> http://www.vtk.org/VTK/img/file-formats.pdf); search for "Appended".
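>
> Schematically, an appended-mode .vtu looks like this (names and offsets
> here are illustrative):
>
>     <VTKFile type="UnstructuredGrid" byte_order="LittleEndian">
>       <UnstructuredGrid>
>         ...
>         <DataArray type="Float64" Name="pressure" format="appended" offset="0"/>
>         ...
>       </UnstructuredGrid>
>       <AppendedData encoding="raw">
>         _<raw bytes of all arrays, back to back, in offset order>
>       </AppendedData>
>     </VTKFile>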
>
> One way to get a handle on what's happening with the various modes and
> options is to examine the files you can produce in PV. For example, open
> up PV, create a sphere source (or, if you prefer, some unstructured
> data), save the data from the File menu, and choose one of the pvt*
> options. Compare the files produced for binary and appended modes, etc.
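>
> If you'd rather script that experiment, here's a sketch of the same
> comparison in VTK Python (file names are arbitrary):
>
>     import vtk
>
>     sphere = vtk.vtkSphereSource()
>
>     for mode in ("Binary", "Appended"):
>         writer = vtk.vtkXMLPolyDataWriter()
>         writer.SetFileName("sphere_%s.vtp" % mode.lower())
>         writer.SetInputConnection(sphere.GetOutputPort())
>         getattr(writer, "SetDataModeTo" + mode)()   # select writer mode
>         if mode == "Appended":
>             writer.EncodeAppendedDataOff()          # raw, unencoded block
>         writer.Write()
>
> Then compare the two files to see how the layout and sizes differ.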
>
>
> On 05/05/2014 11:34 AM, Mohammad Mirzadeh wrote:
>
>   Burlen,
>
>  Thanks a lot for your comments.
>
>
>>  This is not an answer to your question, but there is a usage caveat with
>> VTK XML files that I want to make sure you're aware of. When you use that
>> format, make sure you set the mode to "appended" and turn "encode" off.
>> That combination produces truly binary files, which are going to be
>> faster and very likely smaller too: inline "binary" data is actually
>> base64-encoded, which adds roughly a third to the size plus
>> encode/decode overhead, whereas unencoded appended data is raw bytes.
>> You probably already know that, but just in case ...
>>
>>
> I write the data as binary inside the .vtu file itself. Is this what you
> mean by appended mode? I cannot see any 'mode' keyword in the XML file,
> and the same goes for 'encode'; neither appears in the file.
>
>   Now, to get to your question:
>>
>> 1) Go with a single parallel HDF5 file that includes data for all
>> time-steps. This makes it all nice and portable, except there are two
>> issues: i) it looks like MPI-IO might not be as efficient as separate
>> POSIX IO, especially on a large number of processors; ii) ParaView does
>> not seem to be able to read HDF5 files in parallel.
>>
>>  Comment: If I were you, I'd avoid putting all time steps in a single
>> file, or any solution where files get too big. Once files occupy more
>> than ~80% of a tape drive, you'll have a very hard time getting them on
>> and off archive systems. See this:
>> http://www.nersc.gov/users/data-and-file-systems/hpss/storing-and-retrieving-data/mistakes-to-avoid/
>> My comment assumes that you actually use such systems, but you probably
>> will need to if you generate large datasets at common HPC centers.
>>
>>
>  That's actually a very good point I hadn't thought of! Thanks for
> sharing.
>
>
>>  I've seen some AMR codes get elaborate in their HDF5 formats and run
>> into serious performance issues as a result. So my comment here is that
>> if you go with HDF5, keep the format as simple as possible! And, of
>> course, keep file sizes small enough to be archived ;-)
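>>
>> For example, a deliberately flat per-step layout (a sketch using h5py;
>> dataset names and sizes are illustrative):
>>
>>     import h5py
>>     import numpy as np
>>
>>     with h5py.File("step_0001.h5", "w") as f:
>>         f["coordinates"] = np.zeros((100, 3))  # one flat dataset per array
>>         f["pressure"] = np.zeros(100)
>>         f.attrs["time"] = 0.05                 # metadata as plain attributes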
>>
>> Burlen
>>
>>
>>
>> On 05/05/2014 10:48 AM, Mohammad Mirzadeh wrote:
>>
>> They are represented as an unstructured grid. As a sample run, a
>> 100M-point grid on 256 processors produces an almost 8.5 GB file. We
>> intend to push toward at most ~1B points at this time, with up to a few
>> thousand processors. However, it would be good to have something that
>> could scale to larger problems as well.
>>
>>
>> On Sat, May 3, 2014 at 1:28 AM, Stephen Wornom <stephen.wornom at inria.fr> wrote:
>>
>>> Mohammad Mirzadeh wrote:
>>>
>>>>  Hi, I am at a critical point in deciding on an I/O format for my
>>>> application. So far my conclusion is to use parallel HDF5 for restart
>>>> files, as they are quite flexible and portable across systems.
>>>>
>>>> When it comes to visualization, however, I'm not quite sure. Up until
>>>> now I've been using pvtu along with vtu files, and although they
>>>> generally work fine, one easily gets into trouble when running big
>>>> simulations on a large number of processors, as the number of files can
>>>> easily get out of control and even the simplest utility commands (e.g.
>>>> ls) take minutes to finish!
>>>>
>>>> After much thought, I've come to the point of deciding between two
>>>> strategies:
>>>>
>>>> 1) Go with a single parallel HDF5 file that includes data for all
>>>> time-steps. This makes it all nice and portable, except there are two
>>>> issues: i) it looks like MPI-IO might not be as efficient as separate
>>>> POSIX IO, especially on a large number of processors; ii) ParaView does
>>>> not seem to be able to read HDF5 files in parallel.
>>>>
>>>> 2) Go with the same pvtu+vtu strategy, except take precautions to
>>>> avoid a file-count explosion. I can think of two tactics here: i) use
>>>> nested folders to separate vtu files from pvtu files and to separate
>>>> time steps; ii) create an IO group communicator with far fewer
>>>> processors that do the actual IO (see the sketch below).
>>>>
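>>>> A minimal sketch of both options with mpi4py and h5py (all names and
>>>> sizes are illustrative; option 1 needs an MPI-enabled HDF5/h5py build,
>>>> and np.save stands in for a real per-group .vtu writer):
>>>>
>>>>     from mpi4py import MPI
>>>>     import h5py
>>>>     import numpy as np
>>>>
>>>>     world = MPI.COMM_WORLD
>>>>     local_pressure = np.random.rand(100)   # stand-in for real field data
>>>>
>>>>     # Option 1: every rank writes into one shared file via MPI-IO.
>>>>     with h5py.File("step_0001.h5", "w", driver="mpio", comm=world) as f:
>>>>         dset = f.create_dataset("pressure", (world.size, 100), dtype="f8")
>>>>         dset[world.rank, :] = local_pressure
>>>>
>>>>     # Option 2: split into small IO groups; only each group's root
>>>>     # rank touches the filesystem.
>>>>     ranks_per_writer = 32                  # made-up tuning knob
>>>>     group = world.rank // ranks_per_writer
>>>>     io_comm = world.Split(color=group, key=world.rank)
>>>>     chunks = io_comm.gather(local_pressure, root=0)
>>>>     if io_comm.rank == 0:
>>>>         np.save("piece_%04d.npy" % group, np.concatenate(chunks))
>>>>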
>>>> My questions are: 1) Is the second approach necessarily more efficient
>>>> than the MPI-IO used by HDF5? 2) Is there any plan to support parallel
>>>> IO for HDF5 files in ParaView?
>>>>
>>> Are your meshes structured or unstructured? How many vertices in your
>>> meshes?
>>>
>>> Stephen
>>>
>>> --
>>> stephen.wornom at inria.fr
>>> 2004 route des lucioles - BP93
>>> Sophia Antipolis
>>> 06902 CEDEX
>>>
>>> Tel: 04 92 38 50 54
>>> Fax: 04 97 15 53 51
>>>
>>>
>>
>