[Paraview] distributed stream tracer scalability issue

Tue Aug 25 14:42:46 EDT 2009

Hi John,

Thanks for the insight! We have image and rectilinear grids (the current 
size is 512^3, which is not that big, but these are growing as we get more 
CPU hours). We use seed points from a plane with a higher-than-grid 
resolution, which intersects a number of subdomains. There is some 
potential to integrate in parallel. At this point I am not sure it will 
help (and after reading your comments and others', even less so). There is 
a big serial component to the algorithm, and the load is imbalanced.

Great that you found some ways to boost the performance! Any speedup 
will be very helpful in this application.

Burlen

John Biddiscombe wrote:
> Burlen
>
> I have had performance issues with the Distributed Stream tracer, but 
> in fact I found that, in general, its not being very well 
> optimized for parallel operation was not the main trouble. If you are 
> using Unstructured Grids, and they are large (in my case 20 million 
> cells in a block), then most of the time is taken by building the 
> cell links which FindCell uses to locate the cell in which an 
> integration point lies. I modified the stream tracer interpolation to 
> use a BSP tree (or CellLocator) and found a huge improvement in 
> execution time (minutes instead of hours).
>
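To make the point-location cost concrete, here is a toy sketch in plain Python (not VTK code, and not John's BSP tree). It compares a brute-force scan over all cells with a lookup through a simple uniform-bin index, which stands in for a proper locator. The names make_cells, find_cell_linear and BinLocator are made up for illustration, and the cells are just axis-aligned 2D boxes.

```python
import math
import random
from collections import defaultdict

def make_cells(n):
    """Return n*n unit-square cells (xmin, ymin, xmax, ymax) covering [0, n)^2."""
    return [(i, j, i + 1, j + 1) for j in range(n) for i in range(n)]

def find_cell_linear(cells, p):
    """Brute force: test cells one by one. Cost grows with the number of cells."""
    x, y = p
    for cid, (x0, y0, x1, y1) in enumerate(cells):
        if x0 <= x < x1 and y0 <= y < y1:
            return cid
    return -1

class BinLocator:
    """Hypothetical uniform-bin index: each bin lists the cells overlapping it."""

    def __init__(self, cells, bin_size):
        self.cells = cells
        self.bin_size = bin_size
        self.bins = defaultdict(list)
        for cid, (x0, y0, x1, y1) in enumerate(cells):
            # Register the cell in every bin touched by its bounding box.
            for bx in range(math.floor(x0 / bin_size), math.floor(x1 / bin_size) + 1):
                for by in range(math.floor(y0 / bin_size), math.floor(y1 / bin_size) + 1):
                    self.bins[(bx, by)].append(cid)

    def find_cell(self, p):
        """Only test the few candidate cells stored in the bin containing p."""
        x, y = p
        key = (math.floor(x / self.bin_size), math.floor(y / self.bin_size))
        for cid in self.bins.get(key, ()):
            x0, y0, x1, y1 = self.cells[cid]
            if x0 <= x < x1 and y0 <= y < y1:
                return cid
        return -1

# Small demo: both methods agree, but the locator inspects far fewer cells.
cells = make_cells(50)
locator = BinLocator(cells, bin_size=5.0)
rng = random.Random(0)
point = (rng.uniform(0, 50), rng.uniform(0, 50))
assert locator.find_cell(point) == find_cell_linear(cells, point)
```

With 20 million cells and many integration steps per streamline, the per-query difference between these two strategies is the kind of thing that could plausibly turn hours into minutes, although the real VTK cost also includes building the search structure.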
> Secondly, parallelizing the stream tracer is an inherently hard 
> problem. One cannot integrate the streamline in block 2 until it has 
> reached a boundary in block 1: one must wait until the streamline 
> traverses one block before passing it to the next. That said, the 
> implementation could be improved with more intelligent seeding and 
> sending/receiving of streamline seeds etc. between iterations.
>
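The block-to-block dependency can be seen in a toy 1D model, sketched below in plain Python. This is only an illustration under simplifying assumptions (constant velocity, one block per process, seeds inside the domain). It is not how the distributed stream tracer is implemented, and trace_rounds is a made-up name.

```python
def trace_rounds(seeds, n_blocks, block_len=1.0, velocity=1.0, step=0.25):
    """Simulate block-by-block streamline integration in a 1D domain.

    Each block stands in for one process. In every round a block advances
    the streamlines it currently owns until they leave the block, then hands
    them to the next block for the following round. Seeds are assumed to lie
    in [0, n_blocks * block_len). Returns the number of busy blocks per round.
    """
    domain_end = n_blocks * block_len
    pending = {}  # block index -> positions of streamlines waiting there
    for s in seeds:
        pending.setdefault(int(s // block_len), []).append(s)

    busy_per_round = []
    while pending:
        busy_per_round.append(len(pending))
        handed_off = {}
        for block, positions in pending.items():
            block_end = (block + 1) * block_len
            for x in positions:
                while x < block_end:      # integrate inside this block only
                    x += velocity * step
                if x < domain_end:        # still in the domain: pass it on
                    handed_off.setdefault(int(x // block_len), []).append(x)
        pending = handed_off
    return busy_per_round

# One seed: exactly one block is busy in every round, so extra processes idle.
single = trace_rounds([0.0], n_blocks=4)
# One seed per block: all blocks start busy, but the work drains unevenly.
spread = trace_rounds([0.0, 1.0, 2.0, 3.0], n_blocks=4)
```

In this model a single streamline keeps only one block busy per round no matter how many blocks there are, and seeds clustered on a plane that cuts only a few subdomains would behave similarly at the start.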
> The Particle tracer code could be modified to produce streamlines in a 
> serial or distributed manner and ought to give a 'reasonably' optimal 
> solution to the problem. However, the chaps at Kitware are (they tell 
> me) currently in the process of revamping the streamline code 
> to make use of CellLocators, and for this reason I recently committed 
> my BSP tree code.
>
> Here's how to check your bottleneck.
> Find a large StructuredGrid dataset which is loaded in parallel. 
> Generate streamlines. Time it. Convert the grid to UnstructuredGrid 
> and do the same. If test 1 takes 1 minute and test 2 takes 1 hour, then 
> it isn't the parallelization that's the real issue, but the grid being 
> used.
>
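A small timing harness for this comparison might look like the sketch below. It is plain Python and deliberately leaves out the pipelines themselves: the two callables you would time (streamlines on the structured grid, then on the converted unstructured grid) are up to you, and the factor-of-10 threshold is an arbitrary assumption, not something from this thread.

```python
import time

def time_call(fn, *args, **kwargs):
    """Run fn once and return (elapsed_seconds, result)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return time.perf_counter() - start, result

def likely_bottleneck(t_structured, t_unstructured, ratio_threshold=10.0):
    """Crude reading of the two timings, following the test described above.

    ratio_threshold is an assumed cut-off for "much slower".
    """
    if t_unstructured >= ratio_threshold * t_structured:
        return "grid type (cell location)"
    return "something else, e.g. parallelization"

# Example with the numbers from the message: 1 minute versus 1 hour.
verdict = likely_bottleneck(60.0, 3600.0)
```

In practice each timing should be taken on the same number of processes with the same seeds, so that the grid type is the only thing that changes between the two runs.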
> JB
>
>> We've been using the distributed stream tracer to generate 100s-1000s 
>> of streamlines per time step. It's very slow, and it doesn't scale 
>> at all. The class comments say as much. I'm sure there is a reason 
>> why this implementation was chosen. Is there something that generally 
>> prevents a real parallel implementation? Is there a better 
>> implementation available out there?
>>
>> There was this post a while back:
>> http://www.paraview.org/pipermail/paraview/2009-July/012959.html
>>
>> What's the status?
>>
>> Thanks
>> Burlen