[Paraview] Parallel Streamtracer

burlen burlen.loring at gmail.com
Fri Jun 8 19:25:37 EDT 2012


Hi Stephan,

As promised here are instructions and a small test dataset. 
http://www.hpcvis.com/vis/sq-field-tracer.html

Burlen

On 06/08/2012 11:14 AM, burlen wrote:
> OK, you had me a little worried there, ;)
>
> I will send you some instructions and example data to test with, our 
> network is down due to an unexpected power outage so it won't be today.
>
> Burlen
>
> On 06/08/2012 07:25 AM, Stephan Rogge wrote:
>> Someone told me that you have to clear your build directory 
>> completely and
>> start a fresh PV build.
>>
>> Stephan
>>
>> -----Original Message-----
>> From: burlen [mailto:burlen.loring at gmail.com]
>> Sent: Friday, June 8, 2012 4:21 PM
>> To: Stephan Rogge
>> Cc: 'Yuanxin Liu'; paraview at paraview.org
>> Subject: Re: [Paraview] Parallel Streamtracer
>>
>> Hi Stephan,
>>
>> Oh, thanks for the update, I wasn't aware of these changes. I have been
>> working with 3.14.1.
>>
>> Burlen
>>
>> On 06/08/2012 01:47 AM, Stephan Rogge wrote:
>>> Hello Burlen,
>>>
>>> thank you very much for your post. I would really like to test your
>>> plugin, so I started building it. Unfortunately I got a lot of
>>> compiler errors (e.g. vtkstd isn't used in PV master anymore). Which
>>> PV version is your plugin based on?
>>>
>>> Regards,
>>> Stephan
>>>
>>> -----Original Message-----
>>> From: Burlen Loring [mailto:bloring at lbl.gov]
>>> Sent: Thursday, June 7, 2012 5:54 PM
>>> To: Stephan Rogge
>>> Cc: 'Yuanxin Liu'; paraview at paraview.org
>>> Subject: Re: [Paraview] Parallel Streamtracer
>>>
>>> Hi Stephan,
>>>
>>> I've experienced the scaling behavior that you report when I was
>>> working on a project that required generating millions of streamlines
>>> for a topological mapping algorithm interactively in ParaView. To get
>>> the required scaling I wrote a stream tracer that uses a
>>> load-on-demand approach with a tunable block cache, so that all
>>> ranks can integrate any streamline and stay busy throughout the
>>> entire computation. It was very effective on our data, and I've
>>> used it to integrate 30 million streamlines in about 10 min on 256
>>> cores. If you
>>> really need better scalability than the distributed data tracing
>>> approach implemented in PV, you might take a look at our work. The
>>> downside of our approach is that, in order to provide the demand
>>> loading, the reader has to implement a VTK object with an API that
>>> gives the integrator direct access to I/O functionality. In case
>>> you're interested, the stream tracer class is vtkSQFieldTracer and
>>> our reader is vtkSQBOVReader. The latest release can be found here:
>>> https://github.com/burlen/SciberQuestToolKit/tarball/SQTK-20120531
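[Editor's note] The load-on-demand idea described above can be sketched as a small, tunable block cache. This is a plain-Python illustration with hypothetical names, not the actual vtkSQFieldTracer/vtkSQBOVReader code:

```python
from collections import OrderedDict

class BlockCache:
    """Tunable LRU cache of field blocks, loaded on demand.

    `load_block` stands in for the reader's direct I/O call; in the real
    design the reader exposes this through a VTK object so the
    integrator can fetch any block it needs.
    """
    def __init__(self, load_block, capacity=8):
        self.load_block = load_block
        self.capacity = capacity
        self.cache = OrderedDict()
        self.loads = 0  # count of actual I/O operations

    def get(self, block_id):
        if block_id in self.cache:
            self.cache.move_to_end(block_id)  # mark most recently used
            return self.cache[block_id]
        data = self.load_block(block_id)      # demand load from storage
        self.loads += 1
        self.cache[block_id] = data
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)    # evict least recently used
        return data
```

With a cache like this, every rank can integrate any streamline and only pays I/O when a block is missing or has been evicted, which is what keeps all ranks busy throughout the computation.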
>>>
>>> Burlen
>>>
>>> On 06/04/2012 02:21 AM, Stephan Rogge wrote:
>>>> Hello Leo,
>>>>
>>>> ok, I took the "disk_out_ref.ex2" example data set and did some time
>>>> measurements. Remember, my machine has 4 Cores + HyperThreading.
>>>>
>>>> My first observation is that PV seems to have a problem with
>>>> distributing the data when the Multi-Core option (GUI) is enabled.
>>>> When PV is started with the builtin Multi-Core mode I was not able
>>>> to apply a stream tracer with more than 1000 seed points (PV freezes
>>>> and never comes back). In contrast, when the pvserver processes were
>>>> started manually I was able to set up to 100,000 seed points. Is
>>>> this a bug?
>>>> Now let's have a look at the scaling performance. As you suggested,
>>>> I've used the D3 filter for distributing the data across the processes.
>>>> The stream tracer execution time for 10,000 seed points:
>>>>
>>>> ##   Builtin: 10.063 seconds
>>>> ##   1 MPI process (no D3): 10.162 seconds
>>>> ##   4 MPI processes: 15.615 seconds
>>>> ##   8 MPI processes: 14.103 seconds
>>>>
>>>> and 100,000 seed points:
>>>>
>>>> ##   Builtin: 100.603 seconds
>>>> ##   1 MPI process (no D3): 100.967 seconds
>>>> ##   4 MPI processes: 168.1 seconds
>>>> ##   8 MPI processes: 171.325 seconds
>>>>
>>>> I cannot see any positive scaling behavior here. Maybe this example
>>>> is not appropriate for scaling measurements?
>>>>
>>>> One more thing: I visualized vtkProcessId and saw that the whole
>>>> vector field is partitioned. I thought that each streamline is
>>>> integrated in its own process, but it seems that this is not the
>>>> case. This could explain my scaling issues: for small vector fields
>>>> the synchronization overhead becomes too large and decreases the
>>>> overall performance.
>>>> My suggestion is to have a parallel StreamTracer which is built for
>>>> a single machine with several threads. Could it be worth randomly
>>>> distributing the seeds over all available (local) processes? Of
>>>> course, each process would have access to the whole vector field.
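[Editor's note] Stephan's suggestion amounts to partitioning the seed points rather than the domain: when every process can see the whole vector field, seeds can be scattered across processes with no synchronization during integration. A minimal plain-Python sketch of that idea (hypothetical names, not the VTK implementation):

```python
import random

def distribute_seeds(seeds, n_procs, shuffle=True, rng_seed=0):
    """Assign seed points to processes round-robin.

    With a shared (non-distributed) vector field, any process can
    integrate any streamline, so a shuffled round-robin split balances
    the load without inter-process communication. Shuffling helps when
    neighboring seeds have correlated (e.g. uniformly long) traces.
    """
    rng = random.Random(rng_seed)
    order = list(seeds)
    if shuffle:
        rng.shuffle(order)  # break up clusters of expensive seeds
    buckets = [[] for _ in range(n_procs)]
    for i, s in enumerate(order):
        buckets[i % n_procs].append(s)
    return buckets
```

Each process would then integrate only its own bucket, which is essentially the embarrassingly parallel case; the trade-off is that every process must hold (or be able to read) the full field.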
>>>>
>>>> Cheers,
>>>> Stephan
>>>>
>>>>
>>>>
>>>> From: Yuanxin Liu [mailto:leo.liu at kitware.com]
>>>> Sent: Friday, June 1, 2012 4:13 PM
>>>> To: Stephan Rogge
>>>> Cc: Andy Bauer; paraview at paraview.org
>>>> Subject: Re: [Paraview] Parallel Streamtracer
>>>>
>>>> Hi, Stephan,
>>>>      I did measure the performance at some point and was able to get
>>>> a fairly decent speedup with more processors. So I am surprised you
>>>> are seeing huge latency.
>>>>
>>>>      Of course, the performance is sensitive to the input.  It is also
>>>> sensitive to how readers distribute data. So, one thing you might
>>>> want to try is to attach the "D3" filter to the reader.
>>>>
>>>>      If that doesn't help, I will be happy to get your data and
>>>> take a look.
>>>> Leo
>>>>
>>>> On Fri, Jun 1, 2012 at 1:54 AM, Stephan
>>>> Rogge<Stephan.Rogge at tu-cottbus.de>
>>>> wrote:
>>>> Leo,
>>>>
>>>> As I mentioned in my initial post of this thread: I used the
>>>> up-to-date master branch of ParaView. Which means I have already used
>>>> your implementation.
>>>>
>>>> I can imagine that parallelizing this algorithm can be very tough.
>>>> And I can see that distributing the calculation over 8 processes
>>>> does not lead to nice scaling.
>>>>
>>>> But I don't understand the huge amount of latency when using the
>>>> StreamTracer in Cave mode with two viewports and two pvserver
>>>> processes on the same machine (plus an extra machine for the
>>>> client). I guess the tracer filter is applied for each viewport
>>>> separately? This would be OK as long as both filter executions run
>>>> in parallel, and I doubt that this is the case.
>>>> Can you help me clarify my problem?
>>>>
>>>> Regards,
>>>> Stephan
>>>>
>>>>
>>>> From: Yuanxin Liu [mailto:leo.liu at kitware.com]
>>>> Sent: Thursday, May 31, 2012 9:33 PM
>>>> To: Stephan Rogge
>>>> Cc: Andy Bauer; paraview at paraview.org
>>>> Subject: Re: [Paraview] Parallel Streamtracer
>>>>
>>>> It is in the current VTK and ParaView master.  The class is
>>>> vtkPStreamTracer.
>>>>
>>>> Leo
>>>> On Thu, May 31, 2012 at 3:31 PM, Stephan
>>>> Rogge<stephan.rogge at tu-cottbus.de>
>>>> wrote:
>>>> Hi, Andy and Leo,
>>>>
>>>> thanks for your replies.
>>>>
>>>> Is it possible to get this new implementation? I would like to give
>>>> it a try.
>>>>
>>>> Regards,
>>>> Stephan
>>>>
>>>> On 05/31/2012 at 5:48 PM, Yuanxin Liu <leo.liu at kitware.com> wrote:
>>>> Hi, Stephan,
>>>>       The previous implementation only has serial performance: it
>>>> traces the streamlines one at a time and never starts a new
>>>> streamline until the previous one finishes. With communication
>>>> overhead, it is not surprising it got slower.
>>>>
>>>>      My new implementation lets the processes work on different
>>>> streamlines simultaneously and should scale much better.
>>>>
>>>> Leo
>>>>
>>>> On Thu, May 31, 2012 at 11:27 AM, Andy Bauer<andy.bauer at kitware.com>
>>> wrote:
>>>> Hi Stephan,
>>>>
>>>> The parallel stream tracer uses the partitioning of the grid to
>>>> determine which process does the integration. When a streamline
>>>> exits the subdomain of a process, there is a search to see whether
>>>> it enters a subdomain assigned to any other process before figuring
>>>> out whether it has left the entire domain.
>>>>
>>>> Leo, copied here, has been improving the streamline implementation
>>>> inside of VTK so you may want to get his newer version. It is a
>>>> pretty tough algorithm to parallelize efficiently without making any
>>>> assumptions on the flow or partitioning.
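[Editor's note] The subdomain hand-off Andy describes can be illustrated with a toy ownership search over axis-aligned blocks. This is a plain-Python sketch with hypothetical names; the real logic lives inside vtkPStreamTracer:

```python
def owning_rank(point, subdomains):
    """Return the rank whose axis-aligned subdomain contains `point`,
    or None if the point has left the entire domain.

    `subdomains` maps rank -> (mins, maxs) bounding boxes. In the real
    tracer, a search like this decides which process continues a
    streamline after it exits the current rank's block; if no rank
    owns the point, the streamline terminates.
    """
    x, y, z = point
    for rank, ((x0, y0, z0), (x1, y1, z1)) in subdomains.items():
        if x0 <= x <= x1 and y0 <= y <= y1 and z0 <= z <= z1:
            return rank
    return None  # outside the whole domain: integration stops
```

Every such hand-off implies communication between ranks, which is one reason a small dataset split over many processes can run slower than the serial case.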
>>>>
>>>> Andy
>>>>
>>>> On Thu, May 31, 2012 at 4:16 AM, Stephan
>>>> Rogge<Stephan.Rogge at tu-cottbus.de>
>>>> wrote:
>>>> Hello,
>>>>
>>>> I have a question related to the parallelism of the stream tracer:
>>>> if I understand the code correctly, each line integration (trace)
>>>> is processed in its own MPI process. Right?
>>>>
>>>> To test the scalability of the stream tracer I loaded a structured
>>>> (curvilinear) grid, applied the filter with a seed resolution of
>>>> 1500, and checked the timings in single- and multi-threaded (Multi
>>>> Core enabled in the PV GUI) situations.
>>>>
>>>> I was really surprised that multi-core slows the execution down to
>>>> 4 seconds, while single-core takes only 1.2 seconds. Data migration
>>>> cannot explain that behavior (0.5 seconds). What is the problem
>>>> here?
>>>> Please see the attached statistics...
>>>>
>>>> Data:
>>>> * Structured (Curvilinear) Grid
>>>> * 244030 Cells
>>>> * 37 MB Memory
>>>>
>>>> System:
>>>> * Intel i7-2600K (4 Cores + HT = 8 Threads)
>>>> * 16 GB Ram
>>>> * Windows 7 64 Bit
>>>> * ParaView (master-branch, 64 bit compilation)
>>>>
>>>> #################################
>>>> Single Thread (Seed resolution 1500):
>>>> #################################
>>>>
>>>> Local Process
>>>> Still Render,  0.014 seconds
>>>> RenderView::Update,  1.222 seconds
>>>>       vtkPVView::Update,  1.222 seconds
>>>>           Execute vtkStreamTracer id: 2184,  1.214 seconds
>>>> Still Render,  0.015 seconds
>>>>
>>>> #################################
>>>> Eight Threads (Seed resolution 1500):
>>>> #################################
>>>>
>>>> Local Process
>>>> Still Render,  0.029 seconds
>>>> RenderView::Update,  4.134 seconds
>>>> vtkSMDataDeliveryManager: Deliver Geome,  0.619 seconds
>>>>       FullRes Data Migration,  0.619 seconds
>>>> Still Render,  0.042 seconds
>>>>       OpenGL Dev Render,  0.01 seconds
>>>>
>>>>
>>>> Render Server, Process 0
>>>> RenderView::Update,  4.134 seconds
>>>>       vtkPVView::Update,  4.132 seconds
>>>>           Execute vtkStreamTracer id: 2193,  3.941 seconds
>>>> FullRes Data Migration,  0.567 seconds
>>>>       Dataserver gathering to 0,  0.318 seconds
>>>>       Dataserver sending to client,  0.243 seconds
>>>>
>>>> Render Server, Process 1
>>>> Execute vtkStreamTracer id: 2193,  3.939 seconds
>>>>
>>>> Render Server, Process 2
>>>> Execute vtkStreamTracer id: 2193,  3.938 seconds
>>>>
>>>> Render Server, Process 3
>>>> Execute vtkStreamTracer id: 2193,  4.12 seconds
>>>>
>>>> Render Server, Process 4
>>>> Execute vtkStreamTracer id: 2193,  3.938 seconds
>>>>
>>>> Render Server, Process 5
>>>> Execute vtkStreamTracer id: 2193,  3.939 seconds
>>>>
>>>> Render Server, Process 6
>>>> Execute vtkStreamTracer id: 2193,  3.938 seconds
>>>>
>>>> Render Server, Process 7
>>>> Execute vtkStreamTracer id: 2193,  3.939 seconds
>>>>
>>>> Cheers,
>>>> Stephan
>>>>
>>>>
>>>> _______________________________________________
>>>> Powered by www.kitware.com
>>>>
>>>> Visit other Kitware open-source projects at
>>>> http://www.kitware.com/opensource/opensource.html
>>>>
>>>> Please keep messages on-topic and check the ParaView Wiki at:
>>>> http://paraview.org/Wiki/ParaView
>>>>
>>>> Follow this link to subscribe/unsubscribe:
>>>> http://www.paraview.org/mailman/listinfo/paraview
>>>>
>>
>


