[Paraview] Parallel Streamtracer

Yuanxin Liu leo.liu at kitware.com
Fri Jun 8 12:10:56 EDT 2012


Hi,
  I have recently gotten Burlen's code and updated it to work with the
latest ParaView.  Aside from vtkstd, there are also a few backward-
incompatible VTK changes (see the VTK 6.0 section on the VTK wiki), but
it is not too much work. I will be happy to send either of you my code
changes if you need a reference.

Leo


On Fri, Jun 8, 2012 at 10:25 AM, Stephan Rogge
<Stephan.Rogge at tu-cottbus.de> wrote:

> Someone told me that you have to clear your build directory completely and
> start a fresh PV build.
>
> Stephan
>
> -----Original Message-----
> From: burlen [mailto:burlen.loring at gmail.com]
> Sent: Friday, June 8, 2012 16:21
> To: Stephan Rogge
> Cc: 'Yuanxin Liu'; paraview at paraview.org
> Subject: Re: [Paraview] Parallel Streamtracer
>
> Hi Stephan,
>
> Oh, thanks for the update, I wasn't aware of these changes. I have been
> working with 3.14.1.
>
> Burlen
>
> On 06/08/2012 01:47 AM, Stephan Rogge wrote:
> > Hello Burlen,
> >
> > thank you very much for your post. I really would like to test your
> > plugin, so I've started to build it. Unfortunately, I got a lot of
> > compiler errors (e.g., vtkstd isn't used in PV master anymore). Which
> > PV version is the base for your plugin?
> >
> > Regards,
> > Stephan
> >
> > -----Original Message-----
> > From: Burlen Loring [mailto:bloring at lbl.gov]
> > Sent: Thursday, June 7, 2012 17:54
> > To: Stephan Rogge
> > Cc: 'Yuanxin Liu'; paraview at paraview.org
> > Subject: Re: [Paraview] Parallel Streamtracer
> >
> > Hi Stephan,
> >
> > I experienced the scaling behavior that you report while working on a
> > project that required generating millions of streamlines for a
> > topological mapping algorithm interactively in ParaView. To get the
> > required scaling I wrote a stream tracer that uses a load-on-demand
> > approach with a tunable block cache, so that all ranks can integrate
> > any streamline and stay busy throughout the entire computation. It was
> > very effective on our data, and I've used it to integrate 30 million
> > streamlines in about 10 minutes on 256 cores. If you really need
> > better scalability than the distributed-data tracing approach
> > implemented in PV, you might take a look at our work. The downside of
> > our approach is that, in order to provide the demand loading, the
> > reader has to implement a VTK object that provides an API giving the
> > integrator direct access to I/O functionality. In case you're
> > interested, the stream tracer class is vtkSQFieldTracer and our reader
> > is vtkSQBOVReader.
> > The latest release can be found here:
> > https://github.com/burlen/SciberQuestToolKit/tarball/SQTK-20120531
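> >
> > If you end up building it, the plugin can be loaded from pvpython with
> > something along these lines (the library name and path are only an
> > example and depend on your build):
> >
> >   from paraview.simple import LoadPlugin
> >
> >   # point this at your build of the plugin; the file name is an example
> >   plugin = '/path/to/libSciberQuestToolKit.so'
> >   LoadPlugin(plugin, remote=False, ns=globals())  # load on the client
> >   LoadPlugin(plugin, remote=True,  ns=globals())  # load on the server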
> >
> > Burlen
> >
> > On 06/04/2012 02:21 AM, Stephan Rogge wrote:
> >> Hello Leo,
> >>
> >> OK, I took the "disk_out_ref.ex2" example data set and did some time
> >> measurements. Remember, my machine has 4 cores + Hyper-Threading.
> >>
> >> My first observation is that PV seems to have a problem with
> >> distributing the data when the Multi-Core option (GUI) is enabled.
> >> When PV is started with builtin Multi-Core, I was not able to apply a
> >> stream tracer with more than 1000 seed points (PV freezes and never
> >> comes back). However, when the pvserver processes are started
> >> manually, I was able to use up to 100,000 seed points. Is this a bug?
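> >>
> >> (By "started manually" I mean something along these lines; the number
> >> of processes and the port are arbitrary:)
> >>
> >>   # server side, e.g.:
> >>   #   mpiexec -np 8 pvserver --server-port=11111
> >>   # then connect from the ParaView client, or from pvpython:
> >>   from paraview.simple import *
> >>   Connect('localhost', 11111)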
> >>
> >> Now let's have a look at the scaling performance. As you suggested,
> >> I've used the D3 filter to distribute the data across the processes.
> >> The stream tracer execution time for 10,000 seed points:
> >>
> >> ##   Builtin: 10.063 seconds
> >> ##   1 MPI process (no D3): 10.162 seconds
> >> ##   4 MPI processes: 15.615 seconds
> >> ##   8 MPI processes: 14.103 seconds
> >>
> >> and 100,000 seed points:
> >>
> >> ##   Builtin: 100.603 seconds
> >> ##   1 MPI process (no D3): 100.967 seconds
> >> ##   4 MPI processes: 168.1 seconds
> >> ##   8 MPI processes: 171.325 seconds
> >>
> >> I cannot see any positive scaling behavior here. Maybe this example
> >> is not appropriate for scaling measurements?
> >>
> >> One more thing: I've visualized the vtkProcessId array and saw that
> >> the whole vector field is partitioned. I thought that each streamline
> >> is integrated in its own process, but it seems that this is not the
> >> case. This could explain my scaling issues: for small vector fields
> >> the synchronization overhead becomes too large and decreases the
> >> overall performance.
> >> My suggestion is to have a parallel StreamTracer built for a single
> >> machine with several threads. Could it be worth randomly distributing
> >> the seeds over all available (local) processes? Of course, each
> >> process would have access to the whole vector field.
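> >>
> >> To make the idea concrete, here is a rough, untested sketch of such a
> >> seed-splitting scheme using mpi4py and plain VTK (the file name, the
> >> "velocity" array name and the seed-cloud parameters are placeholders;
> >> every rank reads the full field, which is only reasonable for small
> >> data):
> >>
> >>   from mpi4py import MPI
> >>   import vtk
> >>
> >>   comm = MPI.COMM_WORLD
> >>   rank, size = comm.Get_rank(), comm.Get_size()
> >>
> >>   # every rank reads the whole (small) field
> >>   reader = vtk.vtkXMLStructuredGridReader()
> >>   reader.SetFileName("field.vts")
> >>   reader.Update()
> >>
> >>   # global seed cloud; each rank keeps every size-th point (round robin)
> >>   cloud = vtk.vtkPointSource()
> >>   cloud.SetNumberOfPoints(10000)
> >>   cloud.SetRadius(1.0)
> >>   cloud.Update()
> >>
> >>   pts = vtk.vtkPoints()
> >>   all_pts = cloud.GetOutput().GetPoints()
> >>   for i in range(rank, all_pts.GetNumberOfPoints(), size):
> >>       pts.InsertNextPoint(all_pts.GetPoint(i))
> >>   local_seeds = vtk.vtkPolyData()
> >>   local_seeds.SetPoints(pts)
> >>
> >>   # each rank integrates only its share of the seeds over the full field
> >>   tracer = vtk.vtkStreamTracer()
> >>   tracer.SetInputConnection(reader.GetOutputPort())
> >>   tracer.SetSourceData(local_seeds)
> >>   tracer.SetInputArrayToProcess(0, 0, 0,
> >>       vtk.vtkDataObject.FIELD_ASSOCIATION_POINTS, "velocity")
> >>   tracer.SetIntegrationDirectionToBoth()
> >>   tracer.Update()
> >>   print("rank %d traced %d lines"
> >>         % (rank, tracer.GetOutput().GetNumberOfLines()))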
> >>
> >> Cheers,
> >> Stephan
> >>
> >>
> >>
> >> From: Yuanxin Liu [mailto:leo.liu at kitware.com]
> >> Sent: Friday, June 1, 2012 16:13
> >> To: Stephan Rogge
> >> Cc: Andy Bauer; paraview at paraview.org
> >> Subject: Re: [Paraview] Parallel Streamtracer
> >>
> >> Hi, Stephan,
> >>     I did measure the performance at some point and was able to get a
> >> fairly decent speed-up with more processors, so I am surprised you
> >> are seeing huge latency.
> >>
> >>     Of course, the performance is sensitive to the input.  It is also
> >> sensitive to how readers distribute data. So, one thing you might
> >> want to try is to attach the "D3" filter to the reader.
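> >>
> >>     In pvpython, a reader -> D3 -> stream tracer pipeline looks
> >> roughly like this (untested sketch using the disk_out_ref.ex2 example;
> >> the 'V' vector array and the point-cloud settings are assumptions):
> >>
> >>   from paraview.simple import *
> >>
> >>   # read the data and repartition it across the pvserver ranks
> >>   reader = OpenDataFile('disk_out_ref.ex2')
> >>   # (make sure the 'V' point array is enabled on the reader)
> >>   d3 = D3(Input=reader)
> >>
> >>   # stream tracer seeded with a point cloud
> >>   tracer = StreamTracer(Input=d3, SeedType='Point Source')
> >>   tracer.Vectors = ['POINTS', 'V']
> >>   tracer.SeedType.NumberOfPoints = 10000
> >>
> >>   Show(tracer)
> >>   Render()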
> >>
> >>     If that doesn't help, I will be happy to get your data and take a
> >> look.
> >> Leo
> >>
> >> On Fri, Jun 1, 2012 at 1:54 AM, Stephan Rogge
> >> <Stephan.Rogge at tu-cottbus.de> wrote:
> >> Leo,
> >>
> >> As I mentioned in my initial post of this thread, I used the
> >> up-to-date master branch of ParaView, which means I have already used
> >> your implementation.
> >>
> >> I can imagine that parallelizing this algorithm can be very tough, and
> >> I can see that distributing the calculation over 8 processes does not
> >> lead to nice scaling.
> >>
> >> But I don't understand the huge latency when using the StreamTracer
> >> in CAVE mode with two viewports and two pvserver processes on the
> >> same machine (plus an extra machine for the client). I guess the
> >> tracer filter is applied for each viewport separately? This would be
> >> OK as long as both filter executions run in parallel, and I doubt
> >> that this is the case.
> >> Can you help clarify my problem?
> >>
> >> Regards,
> >> Stephan
> >>
> >>
> >> From: Yuanxin Liu [mailto:leo.liu at kitware.com]
> >> Sent: Thursday, May 31, 2012 21:33
> >> To: Stephan Rogge
> >> Cc: Andy Bauer; paraview at paraview.org
> >> Subject: Re: [Paraview] Parallel Streamtracer
> >>
> >> It is in the current VTK and ParaView master.  The class is
> >> vtkPStreamTracer.
> >>
> >> Leo
> >> On Thu, May 31, 2012 at 3:31 PM, Stephan Rogge
> >> <stephan.rogge at tu-cottbus.de> wrote:
> >> Hi, Andy and Leo,
> >>
> >> thanks for your replies.
> >>
> >> Is it possible to get this new implementation? I would like to give it a try.
> >>
> >> Regards,
> >> Stephan
> >>
> >> On May 31, 2012, at 17:48, Yuanxin Liu <leo.liu at kitware.com> wrote:
> >> Hi, Stephan,
> >>      The previous implementation only has serial performance: it
> >> traces the streamlines one at a time and never starts a new
> >> streamline until the previous one finishes. With the communication
> >> overhead, it is not surprising that it got slower.
> >>
> >>     My new implementation lets the processes work on different
> >> streamlines simultaneously and should scale much better.
> >>
> >> Leo
> >>
> >> On Thu, May 31, 2012 at 11:27 AM, Andy Bauer <andy.bauer at kitware.com>
> >> wrote:
> >> Hi Stephan,
> >>
> >> The parallel stream tracer uses the partitioning of the grid to
> >> determine which process does the integration. When a streamline exits
> >> the subdomain of a process, there is a search to see if it enters a
> >> subdomain assigned to any other process, before figuring out whether
> >> it has left the entire domain.
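> >>
> >> A toy, single-process sketch of that hand-off logic (plain Python, not
> >> the actual VTK code; the strip decomposition and the velocity field
> >> are made up for illustration):
> >>
> >>   # domain [0,1]x[0,1] split into vertical strips, one per "rank"
> >>   NUM_RANKS = 4
> >>
> >>   def owner_of(x):
> >>       """Rank owning position x, or None if x left the whole domain."""
> >>       if not (0.0 <= x[0] <= 1.0 and 0.0 <= x[1] <= 1.0):
> >>           return None
> >>       return min(int(x[0] * NUM_RANKS), NUM_RANKS - 1)
> >>
> >>   def velocity(x):
> >>       # simple rotation about the domain center
> >>       return (-(x[1] - 0.5), x[0] - 0.5)
> >>
> >>   def trace(seed, h=0.01, max_steps=1000):
> >>       x, rank, handoffs = seed, owner_of(seed), 0
> >>       for _ in range(max_steps):
> >>           v = velocity(x)
> >>           x = (x[0] + h * v[0], x[1] + h * v[1])  # one Euler step
> >>           new_rank = owner_of(x)
> >>           if new_rank is None:       # streamline left the entire domain
> >>               break
> >>           if new_rank != rank:       # crossed into another subdomain:
> >>               handoffs += 1          # in VTK this is an MPI hand-off
> >>               rank = new_rank
> >>       return handoffs
> >>
> >>   print("hand-offs for one streamline: %d" % trace((0.05, 0.5)))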
> >>
> >> Leo, copied here, has been improving the streamline implementation
> >> inside VTK, so you may want to get his newer version. It is a pretty
> >> tough algorithm to parallelize efficiently without making any
> >> assumptions about the flow or the partitioning.
> >>
> >> Andy
> >>
> >> On Thu, May 31, 2012 at 4:16 AM, Stephan Rogge
> >> <Stephan.Rogge at tu-cottbus.de> wrote:
> >> Hello,
> >>
> >> I have a question related to the parallelism of the stream tracer: as
> >> I understand the code, each line integration (trace) is processed in
> >> its own MPI process. Right?
> >>
> >> To test the scalability of the stream tracer, I loaded a structured
> >> (curvilinear) grid, applied the filter with a seed resolution of 1500,
> >> and checked the timings in a single-threaded and multi-threaded
> >> (Multi-Core enabled in the PV GUI) situation.
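> >>
> >> (In pvpython terms, that setup corresponds roughly to the following;
> >> the file and array names are placeholders:)
> >>
> >>   from paraview.simple import *
> >>
> >>   grid = OpenDataFile('curvilinear_grid.vtk')   # placeholder file name
> >>   tracer = StreamTracer(Input=grid,
> >>                         SeedType='High Resolution Line Source')
> >>   tracer.Vectors = ['POINTS', 'velocity']       # placeholder array name
> >>   tracer.SeedType.Resolution = 1500
> >>   Show(tracer)
> >>   Render()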
> >>
> >> I was really surprised that multi-core slows the execution down to
> >> 4 seconds, while single-core takes only 1.2 seconds. Data migration
> >> cannot explain that behavior (0.5 seconds). What is the problem here?
> >> Please find some statistics attached below...
> >>
> >> Data:
> >> * Structured (Curvilinear) Grid
> >> * 244030 Cells
> >> * 37 MB Memory
> >>
> >> System:
> >> * Intel i7-2600K (4 Cores + HT = 8 Threads)
> >> * 16 GB RAM
> >> * Windows 7 64 Bit
> >> * ParaView (master-branch, 64 bit compilation)
> >>
> >> #################################
> >> Single Thread (Seed resolution 1500):
> >> #################################
> >>
> >> Local Process
> >> Still Render,  0.014 seconds
> >> RenderView::Update,  1.222 seconds
> >>      vtkPVView::Update,  1.222 seconds
> >>          Execute vtkStreamTracer id: 2184,  1.214 seconds
> >> Still Render,  0.015 seconds
> >>
> >> #################################
> >> Eight Threads (Seed resolution 1500):
> >> #################################
> >>
> >> Local Process
> >> Still Render,  0.029 seconds
> >> RenderView::Update,  4.134 seconds
> >> vtkSMDataDeliveryManager: Deliver Geome,  0.619 seconds
> >>      FullRes Data Migration,  0.619 seconds
> >> Still Render,  0.042 seconds
> >>      OpenGL Dev Render,  0.01 seconds
> >>
> >>
> >> Render Server, Process 0
> >> RenderView::Update,  4.134 seconds
> >>      vtkPVView::Update,  4.132 seconds
> >>          Execute vtkStreamTracer id: 2193,  3.941 seconds
> >> FullRes Data Migration,  0.567 seconds
> >>      Dataserver gathering to 0,  0.318 seconds
> >>      Dataserver sending to client,  0.243 seconds
> >>
> >> Render Server, Process 1
> >> Execute vtkStreamTracer id: 2193,  3.939 seconds
> >>
> >> Render Server, Process 2
> >> Execute vtkStreamTracer id: 2193,  3.938 seconds
> >>
> >> Render Server, Process 3
> >> Execute vtkStreamTracer id: 2193,  4.12 seconds
> >>
> >> Render Server, Process 4
> >> Execute vtkStreamTracer id: 2193,  3.938 seconds
> >>
> >> Render Server, Process 5
> >> Execute vtkStreamTracer id: 2193,  3.939 seconds
> >>
> >> Render Server, Process 6
> >> Execute vtkStreamTracer id: 2193,  3.938 seconds
> >>
> >> Render Server, Process 7
> >> Execute vtkStreamTracer id: 2193,  3.939 seconds
> >>
> >> Cheers,
> >> Stephan
> >>
> >>
>

