[Paraview] Parallel Streamtracer

Stephan Rogge Stephan.Rogge at tu-cottbus.de
Fri Jun 8 04:47:35 EDT 2012


Hello Burlen,

thank you very much for your post. I really would like to test your plugin,
so I've started to build it. Unfortunately I got a lot of compiler errors
(e.g. vtkstd isn't used in PV master anymore). Which PV version is the base
for your plugin?

Regards,
Stephan

-----Original Message-----
From: Burlen Loring [mailto:bloring at lbl.gov] 
Sent: Thursday, June 7, 2012 17:54
To: Stephan Rogge
Cc: 'Yuanxin Liu'; paraview at paraview.org
Subject: Re: [Paraview] Parallel Streamtracer

Hi Stephan,

I've experienced the scaling behavior that you report when I was working on
a project that required generating millions of streamlines interactively in
ParaView for a topological mapping algorithm. To get the required scaling I
wrote a stream tracer that uses a load-on-demand approach with a tunable
block cache, so that all ranks can integrate any streamline and stay busy
throughout the entire computation. It was very effective on our data and
I've used it to integrate 30 million streamlines in about 10 minutes on 256
cores. If you really need better scalability than the distributed data
tracing approach implemented in PV, you might take a look at our work. The
downside of our approach is that, in order to provide the demand loading,
the reader has to implement a VTK object that provides an API giving the
integrator direct access to I/O functionality. In case you're interested,
the stream tracer class is vtkSQFieldTracer and our reader is
vtkSQBOVReader. The latest release can be found here:
https://github.com/burlen/SciberQuestToolKit/tarball/SQTK-20120531
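
In case it helps to picture what "load on demand with a tunable block cache"
means, here is a minimal Python sketch of the idea. It is only an
illustration of the concept, not the actual SciberQuestToolKit API; the
names BlockCache and read_block are made up for this example.

from collections import OrderedDict

class BlockCache:
    # Toy LRU cache of grid blocks, loaded on demand (illustration only).
    def __init__(self, capacity, read_block):
        self.capacity = capacity      # tunable number of resident blocks
        self.read_block = read_block  # callback doing the reader's direct I/O
        self.blocks = OrderedDict()   # block id -> data, kept in LRU order

    def get(self, block_id):
        if block_id in self.blocks:
            self.blocks.move_to_end(block_id)   # mark as most recently used
            return self.blocks[block_id]
        if len(self.blocks) >= self.capacity:
            self.blocks.popitem(last=False)     # evict least recently used
        data = self.read_block(block_id)        # read from disk only when needed
        self.blocks[block_id] = data
        return data

Any rank can then ask the cache for whichever block its current streamline
has wandered into, which is what keeps all ranks busy.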

Burlen

On 06/04/2012 02:21 AM, Stephan Rogge wrote:
> Hello Leo,
>
> ok, I took the "disk_out_ref.ex2" example data set and did some time 
> measurements. Remember, my machine has 4 Cores + HyperThreading.
>
> My first observation is that PV seems to have a problem with 
> distributing the data when the Multi-Core option (GUI) is enabled. 
> When PV is started with built-in Multi-Core, I was not able to apply a 
> stream tracer with more than 1000 seed points (PV freezes and 
> never comes back). Otherwise, when the pvserver processes have been started 
> manually, I was able to use up to 100,000 seed points. Is this a bug?
>
> Now let's have a look at the scaling performance. As you suggested, 
> I've used the D3 filter for distributing the data across the processes. 
> The stream tracer execution time for 10,000 seed points:
>
> ##   Builtin: 10.063 seconds
> ##   1 MPI process (no D3): 10.162 seconds
> ##   4 MPI processes: 15.615 seconds
> ##   8 MPI processes: 14.103 seconds
>
> and 100,000 seed points:
>
> ##   Builtin: 100.603 seconds
> ##   1 MPI process (no D3): 100.967 seconds
> ##   4 MPI processes: 168.1 seconds
> ##   8 MPI processes: 171.325 seconds
>
> I cannot see any positive scaling behavior here. Maybe this example 
> is not appropriate for scaling measurements?
>
> One more thing: I've visualized the vtkProcessId and saw that the 
> whole vector field is partitioned. I thought that each streamline is 
> integrated in its own process, but it seems that this is not the case. 
> This could explain my scaling issues: for small vector fields 
> the overhead of synchronization becomes too large and decreases the 
> overall performance.
>
> My suggestion is to have a parallel StreamTracer which is built for a 
> single machine with several threads. Could it be worthwhile to randomly 
> distribute the seeds over all available (local) processes, as in the 
> sketch below? Of course, each process would have access to the whole 
> vector field.
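>
> To make that concrete, here is a rough Python sketch of what I mean (my 
> own illustration, not existing ParaView code): every local process keeps 
> the whole vector field and only the seed points are split, using a 
> deterministic shuffle so each process gets a random but reproducible, 
> non-overlapping subset.
>
> import random
>
> def seeds_for_rank(all_seeds, rank, num_ranks, shuffle_seed=42):
>     # Shuffle the seed indices the same way on every process.
>     order = list(range(len(all_seeds)))
>     random.Random(shuffle_seed).shuffle(order)
>     # Process 'rank' integrates every num_ranks-th seed of the shuffled list.
>     return [all_seeds[i] for i in order[rank::num_ranks]]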
>
> Cheers,
> Stephan
>
>
>
> From: Yuanxin Liu [mailto:leo.liu at kitware.com]
> Sent: Friday, June 1, 2012 16:13
> To: Stephan Rogge
> Cc: Andy Bauer; paraview at paraview.org
> Subject: Re: [Paraview] Parallel Streamtracer
>
> Hi, Stephan,
>    I did measure the performance at some point and was able to get 
> fairly decent speedup with more processors, so I am surprised you are 
> seeing such huge latency.
>
>    Of course, the performance is sensitive to the input.  It is also 
> sensitive to how readers distribute data. So, one thing you might want 
> to try is to attach the "D3" filter to the reader.
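>
>    In pvpython, that would look roughly like the following sketch (the 
> file name, vector array name, and seed count are placeholders to adapt 
> to your data):
>
> from paraview.simple import *
>
> reader = OpenDataFile('disk_out_ref.ex2')
>
> # Repartition the data across the pvserver ranks before tracing.
> d3 = D3(Input=reader)
>
> tracer = StreamTracer(Input=d3, SeedType='Point Source')
> tracer.SeedType.NumberOfPoints = 1000    # number of seed points to trace
> tracer.Vectors = ['POINTS', 'V']         # the vector array of your data set
>
> Show(tracer)
> Render()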
>
>    If that doesn't help, I will be happy to get your data and take a look.
>
> Leo
>
> On Fri, Jun 1, 2012 at 1:54 AM, Stephan Rogge <Stephan.Rogge at tu-cottbus.de> wrote:
> Leo,
>
> As I mentioned in my initial post of this thread: I used the 
> up-to-date master branch of ParaView, which means I have already used 
> your implementation.
>
> I can imagine that parallelizing this algorithm can be very tough, and I 
> can see that distributing the calculation over 8 processes does not lead 
> to nice scaling.
>
> But I don't understand the huge amount of latency when using the 
> StreamTracer in Cave mode with two viewports and two pvserver 
> processes on the same machine (extra machine for the client). I guess 
> the tracer filter is applied for each viewport separately? This would 
> be OK as long as both filter executions run in parallel, and I doubt 
> that this is the case.
>
> Can you help to clarify my problem?
>
> Regards,
> Stephan
>
>
> From: Yuanxin Liu [mailto:leo.liu at kitware.com]
> Sent: Thursday, May 31, 2012 21:33
> To: Stephan Rogge
> Cc: Andy Bauer; paraview at paraview.org
> Subject: Re: [Paraview] Parallel Streamtracer
>
> It is in the current VTK and ParaView master.  The class is 
> vtkPStreamTracer.
>
> Leo
> On Thu, May 31, 2012 at 3:31 PM, Stephan Rogge <stephan.rogge at tu-cottbus.de> wrote:
> Hi, Andy and Leo,
>
> thanks for your replies.
>
> Is it possible to get this new implementation? I would like to give it a try.
>
> Regards,
> Stephan
>
> On 31.05.2012 at 17:48, Yuanxin Liu <leo.liu at kitware.com> wrote:
> Hi, Stephan,
>     The previous implementation only has serial performance:  It 
> traces the streamlines one at a time and never starts a new streamline 
> until the previous one finishes.  With communication overhead, it is 
> not surprising it got slower.
>
>    My new implementation lets the processes work on 
> different streamlines simultaneously and should scale much better.
>
> Leo
>
> On Thu, May 31, 2012 at 11:27 AM, Andy Bauer <andy.bauer at kitware.com> wrote:
> Hi Stephan,
>
> The parallel stream tracer uses the partitioning of the grid to 
> determine which process does the integration. When the streamline 
> exits the subdomain of a process, there is a search to see if it enters 
> a subdomain assigned to any other process before figuring out whether 
> it has left the entire domain.
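>
> As a toy 1-D illustration of that handoff (not the real vtkPStreamTracer
> code), each "rank" below owns one interval of the domain, integrates a
> streamline while it stays inside, and passes it on when it leaves:
>
> def owning_rank(x, bounds):
>     # Which partition owns position x? None means x left the whole domain.
>     for rank, (lo, hi) in enumerate(bounds):
>         if lo <= x < hi:
>             return rank
>     return None
>
> def trace(seed, velocity, bounds, dt=0.01, max_steps=10000):
>     x = seed
>     rank = owning_rank(x, bounds)
>     for _ in range(max_steps):
>         x += dt * velocity(x)            # one integration step
>         new_rank = owning_rank(x, bounds)
>         if new_rank is None:
>             break                        # streamline left the entire domain
>         if new_rank != rank:
>             rank = new_rank              # hand the streamline off to that process
>     return x
>
> # Four processes, each owning a quarter of [0, 4); a constant velocity
> # pushes the seed across every partition boundary in turn.
> bounds = [(0.0, 1.0), (1.0, 2.0), (2.0, 3.0), (3.0, 4.0)]
> print(trace(0.5, lambda x: 1.0, bounds))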
>
> Leo, copied here, has been improving the streamline implementation 
> inside of VTK so you may want to get his newer version. It is a pretty 
> tough algorithm to parallelize efficiently without making any 
> assumptions on the flow or partitioning.
>
> Andy
>
> On Thu, May 31, 2012 at 4:16 AM, Stephan Rogge <Stephan.Rogge at tu-cottbus.de> wrote:
> Hello,
>
> I have a question related to the parallelism of the stream tracer: if 
> I understand the code correctly, each line integration (trace) is 
> processed in its own MPI process. Right?
>
> To test the scalability of the stream tracer I've loaded a structured
> (curvilinear) grid, applied the filter with a seed resolution of 
> 1500, and checked the timings in a single- and multi-thread (Multi Core 
> enabled in the PV GUI) situation.
>
> I was really surprised that multi-core slows the execution down 
> to 4 seconds; the single core takes only 1.2 seconds. Data migration 
> cannot explain that behavior (0.5 seconds). What is the 
> problem here?
>
> Please see attached some statistics...
>
> Data:
> * Structured (Curvilinear) Grid
> * 244030 Cells
> * 37 MB Memory
>
> System:
> * Intel i7-2600K (4 Cores + HT = 8 Threads)
> * 16 GB Ram
> * Windows 7 64 Bit
> * ParaView (master-branch, 64 bit compilation)
>
> #################################
> Single Thread (Seed resolution 1500):
> #################################
>
> Local Process
> Still Render,  0.014 seconds
> RenderView::Update,  1.222 seconds
>     vtkPVView::Update,  1.222 seconds
>         Execute vtkStreamTracer id: 2184,  1.214 seconds
> Still Render,  0.015 seconds
>
> #################################
> Eight Threads (Seed resolution 1500):
> #################################
>
> Local Process
> Still Render,  0.029 seconds
> RenderView::Update,  4.134 seconds
> vtkSMDataDeliveryManager: Deliver Geome,  0.619 seconds
>     FullRes Data Migration,  0.619 seconds
> Still Render,  0.042 seconds
>     OpenGL Dev Render,  0.01 seconds
>
>
> Render Server, Process 0
> RenderView::Update,  4.134 seconds
>     vtkPVView::Update,  4.132 seconds
>         Execute vtkStreamTracer id: 2193,  3.941 seconds
> FullRes Data Migration,  0.567 seconds
>     Dataserver gathering to 0,  0.318 seconds
>     Dataserver sending to client,  0.243 seconds
>
> Render Server, Process 1
> Execute vtkStreamTracer id: 2193,  3.939 seconds
>
> Render Server, Process 2
> Execute vtkStreamTracer id: 2193,  3.938 seconds
>
> Render Server, Process 3
> Execute vtkStreamTracer id: 2193,  4.12 seconds
>
> Render Server, Process 4
> Execute vtkStreamTracer id: 2193,  3.938 seconds
>
> Render Server, Process 5
> Execute vtkStreamTracer id: 2193,  3.939 seconds
>
> Render Server, Process 6
> Execute vtkStreamTracer id: 2193,  3.938 seconds
>
> Render Server, Process 7
> Execute vtkStreamTracer id: 2193,  3.939 seconds
>
> Cheers,
> Stephan
>
>



