[Paraview] In-situ file/image output on Titan with 18k cores

Hong Yi hongyi at renci.org
Mon Dec 2 10:37:03 EST 2013


Hi Berk,

Many thanks for the helpful information and the interesting paper. Yes, we used transparency to render the channel walls (see details in the email I just sent out) overlaid with vorticity flows. I'd love to run some tests with more instrumentation to figure out the bottleneck. It'd be great if you could give me some pointers on this, such as instrumentation tools I could use.

Best regards,

Hong

________________________________
From: Berk Geveci [berk.geveci at kitware.com]
Sent: Wednesday, November 27, 2013 3:01 PM
To: Hong Yi
Cc: paraview at paraview.org
Subject: Re: [Paraview] In-situ file/image output on Titan with 18k cores

Someone just pointed out that I read Hong's original e-mail wrong. I read 64 as 64K. Sorry about that. There are still some questions about which part of rendering is not scaling, which we are investigating off-list. We still need:

- to know if transparency is involved because D3 most likely won't scale
- if we can run some tests with more instrumentation to figure out the bottleneck
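
One lightweight way to add such instrumentation is to wrap each stage of the coprocessing script in a wall-clock timer. The following is only a minimal sketch: the names timed, stage_seconds and report are hypothetical and are not part of ParaView or Catalyst, and in a real run each MPI rank would keep its own totals, which would still have to be gathered across ranks.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Accumulated wall-clock seconds per named stage (stage names are hypothetical).
stage_seconds = defaultdict(float)

@contextmanager
def timed(stage):
    """Add the wall-clock time spent inside the with-block to stage_seconds[stage]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        stage_seconds[stage] += time.perf_counter() - start

def report():
    """Return (stage, seconds) pairs, slowest stage first."""
    return sorted(stage_seconds.items(), key=lambda kv: kv[1], reverse=True)
```

Wrapping, for example, the pipeline update, the render call and the image write in separate `with timed("..."):` blocks would show which of them accounts for most of the time.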

Check out this paper:

http://users.soe.ucsc.edu/~pang/visweek/2013/ldav/papers/larendea.pdf

The EDF folks were getting pretty decent scalability up to 3.6K cores even with D3 redistribution of the entire unstructured grid. However, 18K is significantly larger so it is very likely that something will break...

Best,
-berk


On Wed, Nov 27, 2013 at 7:46 AM, Berk Geveci <berk.geveci at kitware.com> wrote:
Hi Hong,

> 1.       It appears IceT-based image compositing for 18k cores takes such a long time that it becomes impractical to output images in-situ.
> Specifically, in our case, it takes about 14 minutes for coprocessing for one time point that outputs a composited image while simulation
> alone for one time point only takes about 7 seconds. I have also done a simulation run with in-situ visualization on Titan with 64 cores on a
> much lower resolution mesh (10 million element mesh as opposed to 167 million element mesh for 18k core run), in which case
> coprocessing with image output for 64 cores takes about 25 seconds. Question: is there any way to improve performance of image
> compositing for 18k cores for in-situ visualization?

This doesn't make a lot of sense. Image compositing performance is not strongly tied to the number of polygons. It is much more related to the number of cores and the image size. So 64K cores with small data should not perform so much better than 18K cores with large data. Since Ice-T takes bounding boxes into account when compositing, there may be performance gains when rendering less geometry but not to the extent that you are describing.

On the other hand, I can see Mesa rendering performance being an issue. The 18K run probably has significantly more polygons per MPI rank, especially if the polygons are not distributed somewhat evenly. This is definitely worth investigating. Do you have cycles to run a few more cases? We can instrument things a bit better to see what is taking this much time.
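
As a rough check on the per-rank load, the element counts quoted in this thread can be divided by the core counts. The sketch below does that and also defines a simple imbalance ratio. It treats mesh elements as a stand-in for rendered polygons and reads "18k" as 18,000 cores; both are assumptions, and the function names are illustrative only.

```python
def elements_per_core(total_elements, cores):
    """Average number of mesh elements owned by each core."""
    return total_elements / cores

def imbalance(per_rank_counts):
    """Ratio of the busiest rank's count to the mean count (1.0 means perfectly even)."""
    mean = sum(per_rank_counts) / len(per_rank_counts)
    return max(per_rank_counts) / mean

# Figures from the thread; "18k" is taken to mean 18,000 cores (assumption).
small_run = elements_per_core(10_000_000, 64)        # 156250.0
large_run = elements_per_core(167_000_000, 18_000)   # roughly 9278
```

With 64 cores (rather than 64K), the small run averages about 156,000 elements per core and the 18k-core run about 9,300, so the average alone would not point to the larger run as the heavier one per rank. Feeding actual per-rank polygon counts to imbalance() would show whether a few ranks hold most of the geometry.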

> 2.       I also tried to avoid image output and instead output polydata extracts using XMLPPolyDataWriter on 18k cores. In this case, in-situ
> coprocessing only takes about 20 seconds (compared to 14 minutes with image output). However, so many files are generated that they
> exceed the hard limit on the maximum number of files in a directory, since the parallel writer writes a vtp file from each of the 18k cores. So the
> output data files have to be split across different directories. However, I got a “cannot find file” error when I put a directory name in the
> parameter to the coprocessor.CreateWriter() function call in my python script. I initially tried “data/vorticity_%t.pvtp” as the parameter, but it
> fails with a “cannot find file” error. I am not sure whether this is a bug or whether I need to use an absolute path rather than a path relative to the current
> directory. Another question is whether there is a way to combine the files generated by different cores into a single file during
> coprocessing, so that only one file is generated rather than a huge number of files when running on a large number of cores.

We are working on ADIOS-based readers and writers that will allow for writing to a single bp file. This should be ready sometime in January. This should make things much better.
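
Until then, one possible way to stay under a per-directory file limit is to give each time step its own directory and to create that directory before anything is written into it. The sketch below shows only the path handling. It assumes, without confirmation from this thread, that a missing output directory is one possible cause of the “cannot find file” error. The helper step_output_path and the step_NNNNNN naming are hypothetical, and how such a path would be passed to coprocessor.CreateWriter() is left open.

```python
import os

def step_output_path(base_dir, timestep, stem="vorticity"):
    """Create base_dir/step_NNNNNN if needed and return the .pvtp path inside it.

    os.makedirs(..., exist_ok=True) needs Python 3 and does not fail if the
    directory already exists, for example because another rank created it first.
    """
    step_dir = os.path.join(base_dir, "step_%06d" % timestep)
    os.makedirs(step_dir, exist_ok=True)
    return os.path.join(step_dir, "%s_%d.pvtp" % (stem, timestep))
```

For example, step_output_path("data", 10) creates data/step_000010 and returns the path data/step_000010/vorticity_10.pvtp, so each directory holds the files of a single time step.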

-berk

On Tue, Nov 26, 2013 at 10:31 AM, Hong Yi <hongyi at renci.org> wrote:
>
> I have done several simulation runs linked with ParaView Catalyst for in-situ visualization on Titan with 18k cores and have the following observations and questions, on which I hope to get input from this list.
>
>
>
> 1.       It appears IceT-based image compositing for 18k cores takes such a long time that it becomes impractical to output images in-situ. Specifically, in our case, it takes about 14 minutes for coprocessing for one time point that outputs a composited image while simulation alone for one time point only takes about 7 seconds. I have also done a simulation run with in-situ visualization on Titan with 64 cores on a much lower resolution mesh (10 million element mesh as opposed to 167 million element mesh for 18k core run), in which case coprocessing with image output for 64 cores takes about 25 seconds. Question: is there any way to improve performance of image compositing for 18k cores for in-situ visualization?
>
> 2.       I also tried to avoid image output and instead output polydata extracts using XMLPPolyDataWriter on 18k cores. In this case, in-situ coprocessing only takes about 20 seconds (compared to 14 minutes with image output). However, so many files are generated that they exceed the hard limit on the maximum number of files in a directory, since the parallel writer writes a vtp file from each of the 18k cores. So the output data files have to be split across different directories. However, I got a “cannot find file” error when I put a directory name in the parameter to the coprocessor.CreateWriter() function call in my python script. I initially tried “data/vorticity_%t.pvtp” as the parameter, but it fails with a “cannot find file” error. I am not sure whether this is a bug or whether I need to use an absolute path rather than a path relative to the current directory. Another question is whether there is a way to combine the files generated by different cores into a single file during coprocessing, so that only one file is generated rather than a huge number of files when running on a large number of cores.
>
> Thanks for any input, suggestions, and comments!
>
>
>
> Regards,
>
> Hong
>
>
