[Paraview] Parallel Streamtracer

Mon Jun 4 13:31:04 EDT 2012

Hi, Stephan,
  I will look into the multi-core issue as well as the performance issue.

  Some quick answers:

  - Yes, the whole vector field is partitioned and the streamlines are
passed from one process to another. This is why the performance can be
highly sensitive to how the data is distributed and how the streamlines
travel between data partitions.

  - Your suggestion makes sense if the data is small enough to fit on a
single machine. This is definitely something we would like to do in the
future. Right now, the implementation is targeted more towards handling
large data that has to be distributed across multiple machines.
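
For what it is worth, here is a minimal sketch of that single-machine idea,
assuming every worker can read the whole field. It is not ParaView or VTK
code: the analytic field `velocity`, the fixed-step RK4 integrator `trace`,
and the helper `trace_all` are all hypothetical names made up for
illustration. The seeds are shuffled and dealt out to local workers, and no
streamline ever has to be handed to another worker.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def velocity(x, y):
    """Hypothetical analytic field (rigid rotation), a stand-in for real data."""
    return -y, x


def trace(seed, step=0.01, n_steps=100):
    """Integrate one streamline from `seed` with fixed-step RK4."""
    x, y = seed
    line = [(x, y)]
    for _ in range(n_steps):
        k1x, k1y = velocity(x, y)
        k2x, k2y = velocity(x + 0.5 * step * k1x, y + 0.5 * step * k1y)
        k3x, k3y = velocity(x + 0.5 * step * k2x, y + 0.5 * step * k2y)
        k4x, k4y = velocity(x + step * k3x, y + step * k3y)
        x += step * (k1x + 2 * k2x + 2 * k3x + k4x) / 6.0
        y += step * (k1y + 2 * k2y + 2 * k3y + k4y) / 6.0
        line.append((x, y))
    return line


def trace_all(seeds, n_workers=4, rng_seed=0):
    """Shuffle the seed indices, deal them out to workers, trace each batch."""
    order = list(range(len(seeds)))
    random.Random(rng_seed).shuffle(order)
    # Round-robin split of the shuffled indices: one batch per worker.
    batches = [order[w::n_workers] for w in range(n_workers)]

    def run_batch(batch):
        # Every worker reads the same (whole) field through `velocity`.
        return [(i, trace(seeds[i])) for i in batch]

    lines = [None] * len(seeds)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for result in pool.map(run_batch, batches):
            for i, line in result:
                lines[i] = line  # restore the original seed order
    return lines
```

Calling `trace_all([(1.0, 0.0), (0.0, 2.0)], n_workers=2)` returns one
polyline per seed, in seed order. In CPython, threads running pure-Python
arithmetic will not actually run faster because of the global interpreter
lock, so this only illustrates the work decomposition, not a speedup.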

Leo

On Mon, Jun 4, 2012 at 5:21 AM, Stephan Rogge
<Stephan.Rogge at tu-cottbus.de> wrote:

> Hello Leo,
>
> ok, I took the "disk_out_ref.ex2" example data set and did some time
> measurements. Remember, my machine has 4 cores + HyperThreading.
>
> My first observation is that PV seems to have a problem with distributing
> the data when the Multi-Core option (GUI) is enabled. When PV is started
> with builtin Multi-Core, I was not able to apply a stream tracer with more
> than 1000 seed points (PV freezes and never comes back). However, when the
> pvserver processes were started manually, I was able to use up to
> 100,000 seed points. Is this a bug?
>
> Now let's look at the scaling performance. As you suggested, I've
> used the D3 filter to distribute the data across the processes. The
> stream tracer execution time for 10,000 seed points:
>
> ##   Builtin: 10.063 seconds
> ##   1 MPI-Process (no D3): 10.162 seconds
> ##   4 MPI-Processes: 15.615 seconds
> ##   8 MPI-Processes: 14.103 seconds
>
> and for 100,000 seed points:
>
> ##   Builtin: 100.603 seconds
> ##   1 MPI-Process (no D3): 100.967 seconds
> ##   4 MPI-Processes: 168.1 seconds
> ##   8 MPI-Processes: 171.325 seconds
>
> I cannot see any positive scaling behavior here. Maybe this example is not
> appropriate for scaling measurements?
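
As a rough reading of the numbers above, speedup relative to the single
MPI process run, and parallel efficiency (speedup divided by the process
count), can be computed as in the sketch below. The choice of baseline and
the helper name `speedup_table` are assumptions made for illustration. Note
that the baseline run was measured without D3, so the comparison is not
perfectly like for like.

```python
# Timings in seconds, copied from the measurements quoted above.
# Keys of the inner dicts are the number of MPI processes.
timings = {
    10_000: {1: 10.162, 4: 15.615, 8: 14.103},
    100_000: {1: 100.967, 4: 168.1, 8: 171.325},
}


def speedup_table(runs):
    """Map process count -> (speedup, efficiency) relative to the 1-process run."""
    base = runs[1]
    return {p: (base / t, base / (t * p)) for p, t in runs.items()}


for n_seeds, runs in timings.items():
    for procs, (speedup, efficiency) in speedup_table(runs).items():
        print(f"{n_seeds:>7} seeds, {procs} proc: "
              f"speedup {speedup:.2f}, efficiency {efficiency:.2f}")
```

A speedup below 1.0 means the parallel run was slower than the single
process run, which is the case for every multi-process row here.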
>
> One more thing: I've visualized the vtkProcessId and saw that the whole
> vector field is partitioned. I had thought that each streamline is
> integrated in its own process, but this does not seem to be the case. This
> could explain my scaling issues: for small vector fields, the
> synchronization overhead becomes too large and decreases the overall
> performance.
>
> My suggestion is to have a parallel StreamTracer that is built for a
> single machine with several threads. Would it be worthwhile to randomly
> distribute the seeds over all available (local) processes? Of course, each
> process would have access to the whole vector field.
>
> Cheers,
> Stephan
>
>
>
> From: Yuanxin Liu [mailto:leo.liu at kitware.com]
> Sent: Friday, June 1, 2012 16:13
> To: Stephan Rogge
> Cc: Andy Bauer; paraview at paraview.org
> Subject: Re: [Paraview] Parallel Streamtracer
>
> Hi, Stephan,
>   I did measure the performance at some point and was able to get fairly
> decent speed up with more processors. So I am surprised you are seeing huge
> latency.
>
>   Of course, the performance is sensitive to the input.  It is also
> sensitive to how readers distribute data. So, one thing you might want to
> try is to attach the "D3" filter to the reader.
>
>   If that doesn't help,  I will be happy to get your data and take a look.
>
> Leo
>
> On Fri, Jun 1, 2012 at 1:54 AM, Stephan Rogge
> <Stephan.Rogge at tu-cottbus.de> wrote:
> Leo,
>
> As I mentioned in my initial post of this thread: I used the up-to-date
> master branch of ParaView. Which means I have already used your
> implementation.
>
> I can imagine that parallelizing this algorithm is very tough. And I can
> see that distributing the calculation over 8 processes does not lead to
> nice scaling.
>
> But I don't understand the huge latency when using the StreamTracer in
> Cave-Mode with two viewports and two pvserver processes on the same
> machine (with an extra machine for the client). I guess the tracer filter
> is applied for each viewport separately? This would be ok as long as both
> filter executions run in parallel. But I doubt that this is the case.
>
> Can you help to clarify my problem?
>
> Regards,
> Stephan
>
>
> From: Yuanxin Liu [mailto:leo.liu at kitware.com]
> Sent: Thursday, May 31, 2012 21:33
> To: Stephan Rogge
> Cc: Andy Bauer; paraview at paraview.org
> Subject: Re: [Paraview] Parallel Streamtracer
>
> It is in the current VTK and ParaView master.  The class is
> vtkPStreamTracer.
>
> Leo
> On Thu, May 31, 2012 at 3:31 PM, Stephan Rogge
> <stephan.rogge at tu-cottbus.de> wrote:
> Hi, Andy and Leo,
>
> thanks for your replies.
>
> Is it possible to get this new implementation? I would like to give it a try.
>
> Regards,
> Stephan
>
> On 31.05.2012 at 17:48, Yuanxin Liu <leo.liu at kitware.com> wrote:
> Hi, Stephan,
>    The previous implementation only has serial performance: it traces the
> streamlines one at a time and never starts a new streamline until the
> previous one finishes. With communication overhead, it is not surprising
> that it got slower.
>
>   My new implementation lets the processes work on different
> streamlines simultaneously and should scale much better.
>
> Leo
>
> On Thu, May 31, 2012 at 11:27 AM, Andy Bauer <andy.bauer at kitware.com>
> wrote:
> Hi Stephan,
>
> The parallel stream tracer uses the partitioning of the grid to determine
> which process does the integration. When the streamline exits the subdomain
> of a process, there is a search to see if it enters a subdomain assigned to
> any other process before determining whether it has left the entire
> domain.
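
The hand-off described above can be sketched with a toy 1D example. This
is not the vtkPStreamTracer code and it uses no MPI: the subdomains are just
intervals in a list, and `find_owner` and `trace_partitioned` are
hypothetical helpers. A particle is advanced by the owner of its current
subdomain. When it steps outside, all subdomains are searched for a new
owner, and the trace ends only if none contains the point.

```python
def find_owner(x, bounds):
    """Return the index of the subdomain [lo, hi) containing x, or None."""
    for rank, (lo, hi) in enumerate(bounds):
        if lo <= x < hi:
            return rank
    return None


def trace_partitioned(seed, bounds, velocity=1.0, step=0.25, max_steps=1000):
    """Advance a particle with forward Euler steps, handing it to whichever
    subdomain contains it, and stop once no subdomain does."""
    x = seed
    owner = find_owner(x, bounds)
    path = [x] if owner is not None else []
    handoffs = 0
    for _ in range(max_steps):
        if owner is None:
            break  # the particle has left the entire domain
        x += velocity * step
        lo, hi = bounds[owner]
        if not (lo <= x < hi):
            # Left the current subdomain: search for another owner.
            new_owner = find_owner(x, bounds)
            if new_owner is not None:
                handoffs += 1  # in a real run this is a message between processes
            owner = new_owner
        if owner is not None:
            path.append(x)
    return path, handoffs


bounds = [(0.0, 1.0), (1.0, 2.0), (2.0, 3.0), (3.0, 4.0)]
path, handoffs = trace_partitioned(0.5, bounds)
print(len(path), handoffs)
```

Each hand-off stands in for communication between two processes, which is
why a streamline that crosses many partitions costs more than one that
stays inside a single subdomain.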
>
> Leo, copied here, has been improving the streamline implementation inside
> of VTK, so you may want to get his newer version. It is a pretty tough
> algorithm to parallelize efficiently without making any assumptions about
> the flow or partitioning.
>
> Andy
>
> On Thu, May 31, 2012 at 4:16 AM, Stephan Rogge
> <Stephan.Rogge at tu-cottbus.de> wrote:
> Hello,
>
> I have a question related to the parallelism of the stream tracer: if I
> understand the code correctly, each line integration (trace) is processed
> in its own MPI process. Right?
>
> To test the scalability of the stream tracer, I loaded a structured
> (curvilinear) grid, applied the filter with a seed resolution of 1500, and
> checked the timings in a single-threaded and a multi-threaded (Multi-Core
> enabled in the PV GUI) situation.
>
> I was really surprised that multi-core increases the execution time to
> about 4 seconds. The single core takes only 1.2 seconds. Data migration
> cannot be the explanation for that behavior (0.5 seconds). What is the
> problem here?
>
> Please see attached some statistics...
>
> Data:
> * Structured (Curvilinear) Grid
> * 244030 Cells
> * 37 MB Memory
>
> System:
> * Intel i7-2600K (4 cores + HT = 8 threads)
> * 16 GB RAM
> * Windows 7 64-bit
> * ParaView (master branch, 64-bit compilation)
>
> #################################
> Single Thread (Seed resolution 1500):
> #################################
>
> Local Process
> Still Render,  0.014 seconds
> RenderView::Update,  1.222 seconds
>    vtkPVView::Update,  1.222 seconds
>        Execute vtkStreamTracer id: 2184,  1.214 seconds
> Still Render,  0.015 seconds
>
> #################################
> Eight Threads (Seed resolution 1500):
> #################################
>
> Local Process
> Still Render,  0.029 seconds
> RenderView::Update,  4.134 seconds
> vtkSMDataDeliveryManager: Deliver Geome,  0.619 seconds
>    FullRes Data Migration,  0.619 seconds
> Still Render,  0.042 seconds
>    OpenGL Dev Render,  0.01 seconds
>
>
> Render Server, Process 0
> RenderView::Update,  4.134 seconds
>    vtkPVView::Update,  4.132 seconds
>        Execute vtkStreamTracer id: 2193,  3.941 seconds
> FullRes Data Migration,  0.567 seconds
>    Dataserver gathering to 0,  0.318 seconds
>    Dataserver sending to client,  0.243 seconds
>
> Render Server, Process 1
> Execute vtkStreamTracer id: 2193,  3.939 seconds
>
> Render Server, Process 2
> Execute vtkStreamTracer id: 2193,  3.938 seconds
>
> Render Server, Process 3
> Execute vtkStreamTracer id: 2193,  4.12 seconds
>
> Render Server, Process 4
> Execute vtkStreamTracer id: 2193,  3.938 seconds
>
> Render Server, Process 5
> Execute vtkStreamTracer id: 2193,  3.939 seconds
>
> Render Server, Process 6
> Execute vtkStreamTracer id: 2193,  3.938 seconds
>
> Render Server, Process 7
> Execute vtkStreamTracer id: 2193,  3.939 seconds
>
> Cheers,
> Stephan