[Paraview] In-situ file/image output on Titan with 18k cores
Hong Yi
hongyi at renci.org
Mon Dec 2 10:29:05 EST 2013
Many thanks for the insight, Ken. Yes, we used a transparency of 0.23 to render the channel wall transparent so that we can see how the vorticity flow passes through the wall. It looks like the data redistribution needed for visibility reordering of transparent geometry was to blame for the performance overhead. We also rendered three views (top, side, and front) for each time point, with transparent walls in each view, which also contributes to the overhead. We are also planning to move the vorticity computation from ParaView into the solver, which should reduce the coprocessing tax a little.
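For context, this is roughly how the vorticity computation typically looks in a Catalyst Python pipeline, and it is the piece we would move into the solver. This is only a sketch using paraview.simple; the channel, filter, and array names below are assumptions, not copied from our actual script.

from paraview.simple import *

def CreatePipeline(coprocessor, datadescription):
    # "input" is the channel name the adaptor registers with Catalyst;
    # the channel and array names here are assumptions.
    grid = coprocessor.CreateProducer(datadescription, "input")

    # Derive vorticity from the velocity field on the unstructured mesh.
    # Moving this derivation into the solver would remove this filter
    # (and its cost) from the coprocessing pipeline.
    vort = GradientOfUnstructuredDataSet(Input=grid)
    vort.ScalarArray = ['POINTS', 'velocity']
    vort.ComputeVorticity = 1
    vort.VorticityArrayName = 'vorticity'
    return vort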
We will see whether we can get by without rendering transparent walls. In the meantime, let me know if you have any suggestions for working around the data redistribution step needed to render transparent surfaces.
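For example, something like the following in the pipeline script should keep everything opaque; this is a rough sketch, and the actual view and representation setup in our generated script differs.

from paraview.simple import GetRenderView

# Rough sketch: force every representation in the render view to be fully
# opaque so the translucent-geometry path (and its data redistribution for
# visibility reordering) is never taken.
view = GetRenderView()
for rep in view.Representations:
    if hasattr(rep, 'Opacity'):
        rep.Opacity = 1.0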
Thanks again and best regards,
Hong
________________________________
From: Moreland, Kenneth [kmorel at sandia.gov]
Sent: Wednesday, November 27, 2013 12:06 PM
To: Berk Geveci; Hong Yi
Cc: paraview at paraview.org
Subject: Re: [Paraview] In-situ file/image output on Titan with 18k cores
I also wonder if you are trying to render anything transparent (volume rendering, opacity < 1, or transparency in the color bars). If you do that, then ParaView will redistribute the data to create a visibility reordering. Although the rendering itself has been scaled past 18K cores, the data redistribution (using the D3 filter) has not. In fact, I would not be surprised if it took that long even with very little data.
In short, try rendering only opaque surfaces if you have not yet tried that.
-Ken
From: Berk Geveci <berk.geveci at kitware.com>
Date: Wednesday, November 27, 2013 5:46 AM
To: Hong Yi <hongyi at renci.org>
Cc: paraview at paraview.org
Subject: [EXTERNAL] Re: [Paraview] In-situ file/image output on Titan with 18k cores
Hi Hong,
> 1. It appears that IceT-based image compositing on 18k cores takes so long that it becomes impractical to output images in situ.
> Specifically, in our case, coprocessing for one time point that outputs a composited image takes about 14 minutes, while the simulation
> alone for one time point takes only about 7 seconds. I have also done a simulation run with in-situ visualization on Titan with 64 cores on a
> much lower-resolution mesh (a 10-million-element mesh as opposed to the 167-million-element mesh for the 18k-core run), in which case
> coprocessing with image output for 64 cores takes about 25 seconds. Question: is there any way to improve the performance of image
> compositing for 18k cores for in-situ visualization?
This doesn't make a lot of sense. Image compositing performance is not strongly tied to the number of polygons. It is much more related to the number of cores and the image size. So 64 cores with small data should not perform so much better than 18K cores with large data. Since IceT takes bounding boxes into account when compositing, there may be performance gains when rendering less geometry, but not to the extent that you are describing.
On the other hand, I can see Mesa rendering performance being an issue. The 18K run probably has significantly more polygons per MPI rank, especially if the polygons are not distributed somewhat evenly. This is definitely worth investigating. Do you have cycles to run a few more cases? We can instrument things a bit better to see what is taking this much time.
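Something along these lines in the generated coprocessing script would give a first breakdown. This is only a rough sketch: it assumes the paraview.coprocessing CoProcessor helper and the usual DoCoProcessing entry point, and that 'coprocessor' is the module-level instance your generated script already defines.

import time
from paraview import servermanager

def DoCoProcessing(datadescription):
    # Per-rank wall-clock timing to separate rendering + compositing
    # from the rest of coprocessing.
    rank = servermanager.vtkProcessModule.GetProcessModule().GetPartitionId()

    coprocessor.UpdateProducers(datadescription)

    t0 = time.time()
    coprocessor.WriteImages(datadescription)   # Mesa rendering + IceT compositing
    t1 = time.time()
    coprocessor.WriteData(datadescription)     # file extracts, if any
    t2 = time.time()

    if rank == 0:
        print "images: %.1f s, writers: %.1f s" % (t1 - t0, t2 - t1)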
> 2. I also tried to avoid image output and instead output polydata extracts using XMLPPolyDataWriter on 18k cores. In this case, in-situ
> coprocessing takes only about 20 seconds (compared to 14 minutes with image output). However, so many files are generated that they
> break the hard limit on the maximum number of files in a directory, since the parallel writer writes a vtp file from each of the 18k cores. So
> the output data files have to be split across different directories. However, I got a “cannot find file” error when I put a directory name as a
> parameter in the coprocessor.CreateWriter() call in my Python script. I initially tried “data/vorticity_%t.pvtp” as the parameter, but it fails
> with a “cannot find file” error. I am not sure whether this is a bug or whether I need to pass an absolute path rather than a path relative to
> the current directory. Another question is whether there is a way to combine the files generated by the different cores into a single file
> during coprocessing, so that only one composite file is generated rather than a huge number of files when running on a large number of cores.
We are working on ADIOS-based readers and writers that will allow writing to a single bp file. This should be ready sometime in January and should make things much better.
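On the relative path issue: I cannot say for sure this is the cause, but one common reason for a failure with a path like “data/vorticity_%t.pvtp” is that the subdirectory does not exist in the run's working directory when the writer opens its files. Here is a rough sketch of a workaround that creates it once from rank 0 before the writers run; the helper name is made up.

import os
from paraview import servermanager

def EnsureOutputDirectory(path="data"):
    # Rough sketch of a workaround (not confirmed as the actual cause of
    # the error): create the relative output directory before the parallel
    # writer tries to open files in it, from rank 0 only.
    pm = servermanager.vtkProcessModule.GetProcessModule()
    if pm.GetPartitionId() == 0 and not os.path.isdir(path):
        os.makedirs(path)
    # Without a barrier the other ranks can race ahead of rank 0; a simpler
    # (if heavier) alternative is to call os.makedirs() from every rank
    # inside a try/except that ignores "already exists" errors.

Trying an absolute path is also a quick way to check whether the relative path is the problem.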
-berk
On Tue, Nov 26, 2013 at 10:31 AM, Hong Yi <hongyi at renci.org> wrote:
>
> I have done several simulation runs linked with ParaView Catalyst for in-situ visualization on Titan with 18k cores, and I have the following observations and questions; I am hoping for input from this list.
>
>
>
> 1. It appears that IceT-based image compositing on 18k cores takes so long that it becomes impractical to output images in situ. Specifically, in our case, coprocessing for one time point that outputs a composited image takes about 14 minutes, while the simulation alone for one time point takes only about 7 seconds. I have also done a simulation run with in-situ visualization on Titan with 64 cores on a much lower-resolution mesh (a 10-million-element mesh as opposed to the 167-million-element mesh for the 18k-core run), in which case coprocessing with image output for 64 cores takes about 25 seconds. Question: is there any way to improve the performance of image compositing for 18k cores for in-situ visualization?
>
> 2. I also tried to avoid image output and instead output polydata extracts using XMLPPolyDataWriter on 18k cores. In this case, in-situ coprocessing takes only about 20 seconds (compared to 14 minutes with image output). However, so many files are generated that they break the hard limit on the maximum number of files in a directory, since the parallel writer writes a vtp file from each of the 18k cores. So the output data files have to be split across different directories. However, I got a “cannot find file” error when I put a directory name as a parameter in the coprocessor.CreateWriter() call in my Python script. I initially tried “data/vorticity_%t.pvtp” as the parameter, but it fails with a “cannot find file” error. I am not sure whether this is a bug or whether I need to pass an absolute path rather than a path relative to the current directory. Another question is whether there is a way to combine the files generated by the different cores into a single file during coprocessing, so that only one composite file is generated rather than a huge number of files when running on a large number of cores.
>
> Thanks for any input, suggestions, and comments!
>
>
>
> Regards,
>
> Hong
>
>