[Paraview] Need advice with parallel file format
Mohammad Mirzadeh
mirzadeh at gmail.com
Mon May 5 18:07:27 EDT 2014
I see. Thanks for the info.
On Mon, May 5, 2014 at 3:04 PM, Burlen Loring <burlen.loring at gmail.com> wrote:
> If you care at all about performance, you're going to want to switch to
> appended unencoded. This is the fastest, most efficient mode and generally
> gives smaller files than the others.
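
As a concrete illustration of the "appended, unencoded" setting through VTK's
Python bindings, here is a minimal sketch; the "grid" variable and the file
name are placeholders for illustration, not anything from the thread:

    # Minimal sketch: write a vtkUnstructuredGrid in appended, unencoded (raw) mode.
    # "grid" is a placeholder; fill it with your own points, cells, and arrays.
    import vtk

    grid = vtk.vtkUnstructuredGrid()

    writer = vtk.vtkXMLUnstructuredGridWriter()
    writer.SetFileName("output.vtu")
    writer.SetInputData(grid)          # VTK 6 API; older VTK uses SetInput()
    writer.SetDataModeToAppended()     # one appended data block after the XML header
    writer.EncodeAppendedDataOff()     # raw bytes instead of base64
    writer.Write()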
>
>
> On 05/05/2014 02:41 PM, Mohammad Mirzadeh wrote:
>
> Thanks for the reference. What exactly is the benefit of having the data
> in appended mode? I guess right now I'm using binary mode.
>
>
> On Mon, May 5, 2014 at 11:46 AM, Burlen Loring <bloring at lbl.gov> wrote:
>
>> Ahh, right, I assumed you were using VTK's classes to write the data.
>> If you're writing the files yourself, you'll still want to emulate the
>> "unencoded appended" format to get the best performance. See the SetModeTo*
>> and EncodeAppendedData* functions (
>> http://www.vtk.org/doc/nightly/html/classvtkXMLWriter.html).
>>
>> There's also some essential info in the file format doc (
>> http://www.vtk.org/VTK/img/file-formats.pdf). Search for "Appended".
>>
>> One way to get a handle on what's happening with the various modes and
>> options is to examine the files you can produce in PV. For example, open up
>> PV, create a sphere source (or, if you prefer, some unstructured data), save
>> the data from the File menu, and choose one of the pvt* options. Compare the
>> files produced for binary and appended modes, etc.
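
The same comparison can be scripted outside the GUI. Below is a minimal
sketch, assuming a VTK Python build is available; the resolutions and file
names are made up for the illustration:

    # Write the same sphere in the three XML data modes and compare file sizes.
    import os
    import vtk

    sphere = vtk.vtkSphereSource()
    sphere.SetThetaResolution(128)
    sphere.SetPhiResolution(128)
    sphere.Update()

    for mode in ("ascii", "binary", "appended"):
        writer = vtk.vtkXMLPolyDataWriter()
        writer.SetFileName("sphere_%s.vtp" % mode)
        writer.SetInputConnection(sphere.GetOutputPort())
        if mode == "ascii":
            writer.SetDataModeToAscii()
        elif mode == "binary":
            writer.SetDataModeToBinary()     # inline, base64-encoded
        else:
            writer.SetDataModeToAppended()   # appended block at the end of the file
            writer.EncodeAppendedDataOff()   # leave the appended block unencoded (raw)
        writer.Write()
        print(mode, os.path.getsize("sphere_%s.vtp" % mode), "bytes")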
>>
>>
>> On 05/05/2014 11:34 AM, Mohammad Mirzadeh wrote:
>>
>> Burlen,
>>
>> Thanks a lot for your comments.
>>
>>
>>> This is not an answer to your question, but there is a usage caveat with
>>> VTK XML files that I want to make sure you're aware of. When you use that
>>> format, make sure you set the mode to "appended" and "encode" off. This is
>>> the combination that produces binary files, which are going to be faster and
>>> very likely smaller too. You probably already know that, but just in case ...
>>>
>>>
>> I write the data as binary inside the .vtu file itself. Is this what you
>> mean by appended mode? I cannot see any 'mode' keyword in the XML file, and
>> the same goes for 'encode'; I don't have it in the XML file.
>>
>> Now, to get to your question:
>>>
>>> 1) Go with a single parallel HDF5 file that includes data for all
>>> time steps. This makes it all nice and portable, except that there are two
>>> issues: i) it looks like MPI-IO might not be as efficient as separate
>>> POSIX IO, especially on a large number of processors; ii) ParaView does not
>>> seem to be able to read HDF5 files in parallel.
>>>
>>> Comment: if I were you, I'd avoid putting all time steps in a single
>>> file, or any solution where files get too big. Once files occupy more than
>>> ~80% of a tape drive, you'll have a very hard time getting them on and off
>>> archive systems. See this:
>>> http://www.nersc.gov/users/data-and-file-systems/hpss/storing-and-retrieving-data/mistakes-to-avoid/
>>> My comment assumes that you actually use such systems, but you probably
>>> will need to if you generate large datasets at common HPC centers.
>>>
>>>
>> That's actually a very good point I was not thinking of! Thanks for
>> sharing.
>>
>>
>>> I've seen some AMR codes get elaborate in their HDF5 formats and run
>>> into serious performance issues as a result. So my comment here is that if
>>> you go with HDF5, keep the format as simple as possible! And, of course,
>>> keep the file sizes small enough to be archived ;-)
>>>
>>> Burlen
>>>
>>>
>>>
>>> On 05/05/2014 10:48 AM, Mohammad Mirzadeh wrote:
>>>
>>> They are represented as an unstructured grid. As a sample run, a 100M
>>> grid-point case on 256 processors produces an almost 8.5 GB file. We intend
>>> to push the limits to close to 1B grid points at most at this time, with the
>>> number of processors up to a few thousand. However, it would be good to have
>>> something that could scale to larger problems as well.
>>>
>>>
>>> On Sat, May 3, 2014 at 1:28 AM, Stephen Wornom <stephen.wornom at inria.fr> wrote:
>>>
>>>> Mohammad Mirzadeh wrote:
>>>>
>>>>> Hi, I am at a critical point in deciding the I/O format for my
>>>>> application. So far my conclusion is to use parallel HDF5 for restart files,
>>>>> as they are quite flexible and portable across systems.
>>>>>
>>>>> When it comes to visualization, however, I'm not quite sure. Up until
>>>>> now I've been using pvtu along with vtu files, and although they generally
>>>>> work fine, one easily gets in trouble when running big simulations on a large
>>>>> number of processors, as the number of files can easily get out of control
>>>>> and even the simplest utility commands (e.g. ls) take minutes to finish!
>>>>>
>>>>> After much thought, I've come to the point of deciding between two
>>>>> strategies:
>>>>>
>>>>> 1) Go with a single parallel HDF5 file that includes data for all
>>>>> time steps. This makes it all nice and portable, except that there are two
>>>>> issues: i) it looks like MPI-IO might not be as efficient as separate
>>>>> POSIX IO, especially on a large number of processors; ii) ParaView does not
>>>>> seem to be able to read HDF5 files in parallel.
>>>>>
>>>>> 2) Go with the same pvtu+vtu strategy, except take precautions to avoid
>>>>> a file-count explosion. I can think of two approaches here: i) use nested
>>>>> folders to separate the vtu files from the pvtu and also each time step;
>>>>> ii) create an IO group communicator with far fewer processors that do the
>>>>> actual IO.
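
A rough sketch of approach ii), the reduced IO group, using mpi4py; the group
size, the gather-to-one-writer pattern, and the file names are assumptions
made for illustration, not anything prescribed in the thread:

    # Sketch: only rank 0 of each IO group touches the file system; the other
    # ranks in the group hand their piece to it. Names and sizes are illustrative.
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    RANKS_PER_WRITER = 16                 # tuning knob: ranks sharing one writer
    color = rank // RANKS_PER_WRITER      # ranks with the same color form one IO group
    io_comm = comm.Split(color, rank)     # writer is rank 0 of each io_comm

    local_piece = b"..."                  # placeholder for this rank's serialized data
    pieces = io_comm.gather(local_piece, root=0)

    if io_comm.Get_rank() == 0:
        # In practice the gathered pieces would be merged into one grid (e.g. with
        # vtkAppendFilter) and written by a vtkXMLUnstructuredGridWriter; a plain
        # byte dump is shown only to keep the communicator pattern visible.
        with open("group_%04d.dat" % color, "wb") as f:
            for piece in pieces:
                f.write(piece)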
>>>>>
>>>>> My questions are: 1) Is the second approach necessarily more efficient
>>>>> than the MPI-IO used by HDF5? and 2) Is there any plan to support parallel
>>>>> IO for HDF5 files in ParaView?
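
On the MPI-IO side of question 1, here is a minimal sketch of what strategy 1
could look like with h5py's MPI driver; the group-per-time-step layout, the
dataset names, and the sizes are assumptions made up for this illustration:

    # Sketch: every rank writes its slice of one dataset per time step into a
    # single shared file. Assumes h5py built with parallel (MPI) support.
    from mpi4py import MPI
    import h5py
    import numpy as np

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()

    local_n = 1000                        # points owned by this rank (example)
    offset = rank * local_n               # simple contiguous partitioning
    local_vals = np.random.rand(local_n)

    with h5py.File("solution.h5", "w", driver="mpio", comm=comm) as f:
        for step in range(3):             # e.g. three time steps in one file
            grp = f.create_group("step_%04d" % step)       # collective call
            dset = grp.create_dataset("pressure", (size * local_n,), dtype="f8")
            dset[offset:offset + local_n] = local_vals     # each rank writes its slab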
>>>>>
>>>>>
>>>>>
>>>> Are your meshes structured or unstructured? How many vertices in your
>>>> meshes?
>>>>
>>>> Stephen
>>>>
>>>> --
>>>> stephen.wornom at inria.fr
>>>> 2004 route des lucioles - BP93
>>>> Sophia Antipolis
>>>> 06902 CEDEX
>>>>
>>>> Tel: 04 92 38 50 54
>>>> Fax: 04 97 15 53 51
>>>>
>>>>
>>>
>>>
>>>
>>>
>>>
>>
>>
>>
>>
>>
>
>