[Paraview] Parallel Streamtracer

Fri Jun 8 10:25:00 EDT 2012

Someone told me that you have to clear your build directory completely and
start a fresh PV build. 

Stephan

-----Original Message-----
From: burlen [mailto:burlen.loring at gmail.com]
Sent: Friday, June 8, 2012 16:21
To: Stephan Rogge
Cc: 'Yuanxin Liu'; paraview at paraview.org
Subject: Re: [Paraview] Parallel Streamtracer

Hi Stephan,

Oh, thanks for the update, I wasn't aware of these changes. I have been
working with 3.14.1.

Burlen

On 06/08/2012 01:47 AM, Stephan Rogge wrote:
> Hello Burlen,
>
> thank you very much for your post. I really would like to test your 
> plugin, so I've started to build it. Unfortunately, I got a lot of 
> compiler errors (e.g. vtkstd isn't used in PV master anymore). Which 
> PV version is your plugin based on?
>
> Regards,
> Stephan
>
> -----Original Message-----
> From: Burlen Loring [mailto:bloring at lbl.gov]
> Sent: Thursday, June 7, 2012 17:54
> To: Stephan Rogge
> Cc: 'Yuanxin Liu'; paraview at paraview.org
> Subject: Re: [Paraview] Parallel Streamtracer
>
> Hi Stephan,
>
> I've experienced the scaling behavior that you report when I was 
> working on a project that required generating millions of streamlines 
> for a topological mapping algorithm interactively in ParaView. To get 
> the required scaling I wrote a stream tracer that uses a load on 
> demand approach with tunable block cache so that all ranks could 
> integrate any streamline and stay busy throughout the entire 
> computation. It was very effective on our data and I've used it to 
> integrate 30 Million streamlines in about 10min on 256 cores. If you 
> really need better scalability than the distributed data tracing 
> approach implemented in PV, you might take a look at our work. The 
> down side of our approach is that in order to provide the demand 
> loading the reader has to implement a vtk object that provides an api 
> giving the integrator direct access to I/O functionality. In case 
> you're interested, the stream tracer class is vtkSQFieldTracer and 
> our reader is vtkSQBOVReader.
> The latest release can be found here
> https://github.com/burlen/SciberQuestToolKit/tarball/SQTK-20120531
>
> Burlen
>
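The load-on-demand idea with a tunable block cache can be sketched roughly as follows. This is only a minimal illustration and not the SciberQuestToolKit code: the names `BlockCache`, `block_of` and `trace`, the 1-D domain, the constant velocity and the LRU eviction policy are all assumptions made for the example.

```python
from collections import OrderedDict


class BlockCache:
    """LRU cache of data blocks that are loaded on demand (hypothetical sketch)."""

    def __init__(self, load_block, max_blocks):
        self.load_block = load_block    # callable: block_id -> block data
        self.max_blocks = max_blocks    # tunable cache size
        self.blocks = OrderedDict()     # block_id -> data, least recently used first
        self.loads = 0                  # number of reads issued so far

    def get(self, block_id):
        if block_id in self.blocks:
            self.blocks.move_to_end(block_id)   # mark as most recently used
            return self.blocks[block_id]
        data = self.load_block(block_id)        # cache miss: read on demand
        self.loads += 1
        self.blocks[block_id] = data
        if len(self.blocks) > self.max_blocks:
            self.blocks.popitem(last=False)     # evict the least recently used block
        return data


def block_of(x, block_size):
    """Index of the 1-D block that contains coordinate x."""
    return int(x // block_size)


def trace(x0, cache, block_size=1.0, step=0.25, n_steps=16):
    """Euler-integrate dx/dt = v(x); the velocity is looked up in the cached block."""
    x = x0
    path = [x]
    for _ in range(n_steps):
        velocity = cache.get(block_of(x, block_size))
        x += step * velocity
        path.append(x)
    return path


# Stand-in for file I/O: every block holds a constant velocity of 1.0.
cache = BlockCache(load_block=lambda block_id: 1.0, max_blocks=2)
path = trace(0.0, cache)
```

Because any rank can fetch any block through such a cache, any rank can integrate any streamline; the cache size trades memory against repeated reads.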
> On 06/04/2012 02:21 AM, Stephan Rogge wrote:
>> Hello Leo,
>>
>> ok, I took the "disk_out_ref.ex2" example data set and did some time 
>> measurements. Remember, my machine has 4 Cores + HyperThreading.
>>
>> My first observation is that PV seems to have a problem with 
>> distributing the data when the Multi-Core option (GUI) is enabled.
>> When PV is started with builtin Multi-Core I was not able to apply a 
>> stream tracer with more than 1000 seed points (PV freezes and never 
>> comes back). Otherwise, when the pvserver processes have been 
>> started manually, I was able to set up to 100,000 seed points. Is 
>> this a bug?
>>
>> Now let's have a look at the scaling performance. As you suggested, 
>> I've used the D3 filter to distribute the data across the processes.
>> The stream tracer execution time for 10,000 seed points:
>>
>> ##   Builtin: 10.063 seconds
>> ##   1 MPI-Process (no D3): 10.162 seconds
>> ##   4 MPI-Processes: 15.615 seconds
>> ##   8 MPI-Processes: 14.103 seconds
>>
>> and 100,000 seed points:
>>
>> ##   Builtin: 100.603 seconds
>> ##   1 MPI-Process (no D3): 100.967 seconds
>> ##   4 MPI-Processes: 168.1 seconds
>> ##   8 MPI-Processes: 171.325 seconds
>>
>> I cannot see any positive scaling behavior here. Maybe this example 
>> is not appropriate for scaling measurements?
>>
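To put numbers on the measurements above, the usual speedup S(p) = T(1) / T(p) and parallel efficiency E(p) = S(p) / p can be computed from the reported timings. This is a small sketch: the seed counts are read as ten thousand and one hundred thousand, the single-process run without D3 is taken as the baseline T(1), and the names `timings` and `speedup_table` are made up for the example.

```python
# Timings in seconds, copied from the measurements above.
timings = {
    10_000: {"1 process": 10.162, "4 processes": 15.615, "8 processes": 14.103},
    100_000: {"1 process": 100.967, "4 processes": 168.1, "8 processes": 171.325},
}
process_counts = {"1 process": 1, "4 processes": 4, "8 processes": 8}


def speedup_table(runs):
    """Map each run label to (speedup, efficiency) relative to the 1-process run."""
    base = runs["1 process"]
    table = {}
    for label, seconds in runs.items():
        speedup = base / seconds
        efficiency = speedup / process_counts[label]
        table[label] = (speedup, efficiency)
    return table


for seeds, runs in timings.items():
    for label, (speedup, efficiency) in speedup_table(runs).items():
        print(f"{seeds:>7} seeds, {label:<11}: "
              f"speedup {speedup:.2f}, efficiency {efficiency:.2f}")
```

A speedup below 1 for the 4- and 8-process runs means those runs are slower than the single-process baseline.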
>> One more thing: I've visualized the vtkProcessId and saw that the 
>> whole vector field is partitioned. I thought that each streamline is 
>> integrated in its own process, but it seems that this is not the case.
>> This could explain my scaling issues: in cases of small vector fields 
>> the overhead of synchronization becomes too large and decreases the 
>> overall performance.
>> My suggestion is to have a parallel StreamTracer which is built for a 
>> single machine with several threads. Could it be worth randomly 
>> distributing the seeds over all available (local) processes? Of 
>> course, each process would have access to the whole vector field.
>>
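The suggestion above (shuffle the seeds, hand each local worker its share, and let every worker see the whole vector field) can be sketched like this. It is only an illustration under assumptions: the analytic rotation field stands in for real data, the fixed-step RK4 integrator and the names `velocity`, `trace`, `split_seeds` and `trace_chunk` are invented for the example, and Python threads are used only to show the partitioning (because of the interpreter lock they would not give a real speedup for this pure-Python loop).

```python
import random
from concurrent.futures import ThreadPoolExecutor


def velocity(x, y):
    """Analytic stand-in for the vector field: rigid rotation about the origin."""
    return -y, x


def trace(seed, step=0.01, n_steps=100):
    """Integrate one streamline from `seed` with classic 4th-order Runge-Kutta."""
    x, y = seed
    for _ in range(n_steps):
        k1x, k1y = velocity(x, y)
        k2x, k2y = velocity(x + 0.5 * step * k1x, y + 0.5 * step * k1y)
        k3x, k3y = velocity(x + 0.5 * step * k2x, y + 0.5 * step * k2y)
        k4x, k4y = velocity(x + step * k3x, y + step * k3y)
        x += step * (k1x + 2 * k2x + 2 * k3x + k4x) / 6.0
        y += step * (k1y + 2 * k2y + 2 * k3y + k4y) / 6.0
    return x, y


def split_seeds(seeds, n_workers, rng):
    """Shuffle the seeds, then deal them out round-robin to the workers."""
    shuffled = list(seeds)
    rng.shuffle(shuffled)
    return [shuffled[i::n_workers] for i in range(n_workers)]


def trace_chunk(chunk):
    """Trace every seed of one worker's share; returns (seed, end point) pairs."""
    return [(seed, trace(seed)) for seed in chunk]


seeds = [(1.0 + 0.01 * i, 0.0) for i in range(100)]
chunks = split_seeds(seeds, n_workers=4, rng=random.Random(0))
with ThreadPoolExecutor(max_workers=4) as pool:
    results = [pair for part in pool.map(trace_chunk, chunks) for pair in part]
```

Since no streamline ever changes owner in this scheme, the workers need no communication during integration; the price is that each worker must be able to hold or read the complete field.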
>> Cheers,
>> Stephan
>>
>>
>>
>> From: Yuanxin Liu [mailto:leo.liu at kitware.com]
>> Sent: Friday, June 1, 2012 16:13
>> To: Stephan Rogge
>> Cc: Andy Bauer; paraview at paraview.org
>> Subject: Re: [Paraview] Parallel Streamtracer
>>
>> Hi, Stephan,
>>     I did measure the performance at some point and was able to get 
>> fairly decent speed up with more processors. So I am surprised you 
>> are seeing huge latency.
>>
>>     Of course, the performance is sensitive to the input.  It is also 
>> sensitive to how readers distribute data. So, one thing you might 
>> want to try is to attach the "D3" filter to the reader.
>>
>>     If that doesn't help, I will be happy to get your data and take 
>> a look.
>> Leo
>>
>> On Fri, Jun 1, 2012 at 1:54 AM, Stephan 
>> Rogge<Stephan.Rogge at tu-cottbus.de>
>> wrote:
>> Leo,
>>
>> As I mentioned in my initial post of this thread: I used the 
>> up-to-date master branch of ParaView. Which means I have already used 
>> your implementation.
>>
>> I can imagine that parallelizing this algorithm is very tough. And I 
>> can see that distributing the calculation over 8 processes does not 
>> lead to nice scaling.
>>
>> But I don't understand this huge amount of latency when using the 
>> StreamTracer in a Cave-Mode with two view ports and two pvserver 
>> processes on the same machine (extra machine for the client). I guess 
>> the tracer filter is applied for each viewport separately? This would 
>> be ok as long as both filter executions run in parallel. And I doubt 
>> that this is the case.
>> Can you help to clarify my problem?
>>
>> Regards,
>> Stephan
>>
>>
>> From: Yuanxin Liu [mailto:leo.liu at kitware.com]
>> Sent: Thursday, May 31, 2012 21:33
>> To: Stephan Rogge
>> Cc: Andy Bauer; paraview at paraview.org
>> Subject: Re: [Paraview] Parallel Streamtracer
>>
>> It is in the current VTK and ParaView master.  The class is 
>> vtkPStreamTracer.
>>
>> Leo
>> On Thu, May 31, 2012 at 3:31 PM, Stephan 
>> Rogge<stephan.rogge at tu-cottbus.de>
>> wrote:
>> Hi, Andy and Leo,
>>
>> thanks for your replies.
>>
>> Is it possible to get this new implementation? I would like to give 
>> it a try.
>>
>> Regards,
>> Stephan
>>
>> On May 31, 2012, at 17:48, Yuanxin Liu<leo.liu at kitware.com> wrote:
>> Hi, Stephan,
>>      The previous implementation only has serial performance:  It 
>> traces the streamlines one at a time and never starts a new 
>> streamline until the previous one finishes.  With communication 
>> overhead, it is not surprising it got slower.
>>
>>     My new implementation lets the processes work on different 
>> streamlines simultaneously and should scale much better.
>>
>> Leo
>>
>> On Thu, May 31, 2012 at 11:27 AM, Andy Bauer<andy.bauer at kitware.com>
>> wrote:
>> Hi Stephan,
>>
>> The parallel stream tracer uses the partitioning of the grid to 
>> determine which process does the integration. When the streamline 
>> exits the subdomain of a process there is a search to see if it 
>> enters a subdomain assigned to any other process before figuring 
>> out whether it has left the entire domain.
>>
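The search described above can be pictured with a small sketch. It assumes, purely for illustration, that every process owns one axis-aligned bounding box; the name `find_owner` and the box layout are hypothetical, and the actual VTK code is not shown in this thread.

```python
from typing import Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def find_owner(point: Tuple[float, float], boxes: Sequence[Box],
               skip: Optional[int] = None) -> Optional[int]:
    """Return the rank whose box contains `point`, or None if no box does."""
    x, y = point
    for rank, (xmin, ymin, xmax, ymax) in enumerate(boxes):
        if rank == skip:
            continue  # the streamline has just left this rank's subdomain
        if xmin <= x <= xmax and ymin <= y <= ymax:
            return rank
    return None  # in no subdomain at all: the streamline left the whole domain


# Four hypothetical subdomains tiling the square [0, 2] x [0, 2].
boxes = [(0, 0, 1, 1), (1, 0, 2, 1), (0, 1, 1, 2), (1, 1, 2, 2)]
next_rank = find_owner((1.2, 0.5), boxes, skip=0)   # left rank 0, now in rank 1
outside = find_owner((2.5, 0.5), boxes, skip=1)     # left rank 1 and the domain
```

Only when no other subdomain claims the point is the streamline treated as having left the domain.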
>> Leo, copied here, has been improving the streamline implementation 
>> inside of VTK so you may want to get his newer version. It is a 
>> pretty tough algorithm to parallelize efficiently without making any 
>> assumptions on the flow or partitioning.
>>
>> Andy
>>
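The difference between tracing one streamline at a time and letting every process work on its own streamlines, together with the partition-based hand-off, can be illustrated with a toy model. Everything here is an assumption for the sake of the example: a 1-D grid split into equal blocks, a flow that moves every streamline one cell downstream per step, and the hypothetical functions `owner`, `serial_rounds` and `concurrent_rounds`. It is not the vtkPStreamTracer algorithm.

```python
from collections import deque


def owner(cell, cells_per_rank):
    """Rank that owns a cell in a block-partitioned 1-D grid."""
    return cell // cells_per_rank


def serial_rounds(seeds, n_cells):
    """One streamline at a time: every integration step takes its own round."""
    return sum(n_cells - seed for seed in seeds)


def concurrent_rounds(seeds, n_cells, n_ranks):
    """Each round, every rank advances one streamline it currently owns.

    Assumes n_cells is a multiple of n_ranks.
    """
    cells_per_rank = n_cells // n_ranks
    queues = [deque() for _ in range(n_ranks)]
    for seed in seeds:
        queues[owner(seed, cells_per_rank)].append(seed)
    rounds = 0
    while any(queues):
        arrivals = []                     # hand-offs become visible next round
        for queue in queues:
            if not queue:
                continue                  # this rank is idle in this round
            cell = queue.popleft() + 1    # one integration step downstream
            if cell >= n_cells:
                continue                  # the streamline left the domain
            arrivals.append(cell)
        for cell in arrivals:
            queues[owner(cell, cells_per_rank)].append(cell)
        rounds += 1
    return rounds


seeds = [0, 4, 8, 12]                     # one seed at the start of each block
serial = serial_rounds(seeds, n_cells=16)
concurrent = concurrent_rounds(seeds, n_cells=16, n_ranks=4)
print(serial, concurrent)
```

In this toy setup the serial scheme needs 40 rounds and the concurrent one 16. The gain stays well below a factor of 4 because every streamline eventually has to pass through the most downstream block, which shows how much the achievable scaling depends on the flow and the partitioning.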
>> On Thu, May 31, 2012 at 4:16 AM, Stephan 
>> Rogge<Stephan.Rogge at tu-cottbus.de>
>> wrote:
>> Hello,
>>
>> I have a question related to the parallelism of the stream tracer: If 
>> I understand the code correctly, each line integration (trace) is 
>> processed in its own MPI process. Right?
>>
>> To test the scalability of the stream tracer I've loaded a structured 
>> (curvilinear) grid, applied the filter with a seed resolution of 
>> 1500, and checked the timings in a single-thread and a multi-thread 
>> (Multi Core enabled in PV GUI) situation.
>>
>> I was really surprised that multi core slows the execution down to 
>> 4 seconds. The single core takes only 1.2 seconds. Data migration 
>> cannot be the explanation for that behavior (0.5 seconds). What is 
>> the problem here?
>> Please see attached some statistics...
>>
>> Data:
>> * Structured (Curvilinear) Grid
>> * 244030 Cells
>> * 37 MB Memory
>>
>> System:
>> * Intel i7-2600K (4 Cores + HT = 8 Threads)
>> * 16 GB Ram
>> * Windows 7 64 Bit
>> * ParaView (master-branch, 64 bit compilation)
>>
>> #################################
>> Single Thread (Seed resolution 1500):
>> #################################
>>
>> Local Process
>> Still Render,  0.014 seconds
>> RenderView::Update,  1.222 seconds
>>      vtkPVView::Update,  1.222 seconds
>>          Execute vtkStreamTracer id: 2184,  1.214 seconds
>> Still Render,  0.015 seconds
>>
>> #################################
>> Eight Threads (Seed resolution 1500):
>> #################################
>>
>> Local Process
>> Still Render,  0.029 seconds
>> RenderView::Update,  4.134 seconds
>> vtkSMDataDeliveryManager: Deliver Geome,  0.619 seconds
>>      FullRes Data Migration,  0.619 seconds
>> Still Render,  0.042 seconds
>>      OpenGL Dev Render,  0.01 seconds
>>
>>
>> Render Server, Process 0
>> RenderView::Update,  4.134 seconds
>>      vtkPVView::Update,  4.132 seconds
>>          Execute vtkStreamTracer id: 2193,  3.941 seconds
>> FullRes Data Migration,  0.567 seconds
>>      Dataserver gathering to 0,  0.318 seconds
>>      Dataserver sending to client,  0.243 seconds
>>
>> Render Server, Process 1
>> Execute vtkStreamTracer id: 2193,  3.939 seconds
>>
>> Render Server, Process 2
>> Execute vtkStreamTracer id: 2193,  3.938 seconds
>>
>> Render Server, Process 3
>> Execute vtkStreamTracer id: 2193,  4.12 seconds
>>
>> Render Server, Process 4
>> Execute vtkStreamTracer id: 2193,  3.938 seconds
>>
>> Render Server, Process 5
>> Execute vtkStreamTracer id: 2193,  3.939 seconds
>>
>> Render Server, Process 6
>> Execute vtkStreamTracer id: 2193,  3.938 seconds
>>
>> Render Server, Process 7
>> Execute vtkStreamTracer id: 2193,  3.939 seconds
>>
>> Cheers,
>> Stephan
>>
>>
>> _______________________________________________
>> Powered by www.kitware.com
>>
>> Visit other Kitware open-source projects at 
>> http://www.kitware.com/opensource/opensource.html
>>
>> Please keep messages on-topic and check the ParaView Wiki at:
>> http://paraview.org/Wiki/ParaView
>>
>> Follow this link to subscribe/unsubscribe:
>> http://www.paraview.org/mailman/listinfo/paraview