[vtkusers] Slower rendering with OpenMP and Python?

Wed Jan 16 12:06:39 EST 2019

Thanks, Andras, I'll try out TBB again. I had initially tried to use it in preference to OpenMP, but had trouble building it and bailed out.

From: Andras Lasso <lasso at queensu.ca>
Sent: Tuesday, January 15, 2019 18:25
To: Fahlgren, Eric <eric.fahlgren at smith-nephew.com>; vtkusers at vtk.org
Subject: RE: Slower rendering with OpenMP and Python?

We experienced terrible performance on desktop systems with strong Nvidia GPUs when we switched to VTK's OpenGL2 rendering backend. Apparently, Nvidia's threaded optimization offloaded some work onto the CPU, and that interfered with VTK's multithreading. Maybe the OpenMP backend has the same issue. Switching the SMP backend to TBB solved the issue for us. See more details in this pull request: https://github.com/Slicer/Slicer/pull/930

Andras

From: vtkusers <vtkusers-bounces at vtk.org> On Behalf Of Fahlgren, Eric
Sent: Tuesday, January 15, 2019 6:53 PM
To: vtkusers at vtk.org
Subject: [vtkusers] Slower rendering with OpenMP and Python?

After seeing Felix's thread about speeding up his contour filtering, I wanted to try this myself. Setup: Windows 10, VTK 8.1.2, MSVC 2017, Python 3.7, with the MS MPI tools and SDK v10.0 installed.

I added these to my configuration script:

        -D VTK_SMP_IMPLEMENTATION_TYPE:STRING=OpenMP                 \
        -D Module_vtkFiltersParallelFlowPaths:BOOL=ON                \
        -D Module_vtkFiltersParallelGeometry:BOOL=ON                 \
        -D Module_vtkFiltersParallelStatistics:BOOL=ON               \
        -D Module_vtkFiltersParallelVerdict:BOOL=ON                  \
        -D Module_vtkParallelMPI4Py:BOOL=ON                          \
        -D Module_vtkRenderingParallel:BOOL=ON                       \
        -D Module_vtkRenderingParallelLIC:BOOL=ON                    \

The test is an animation sequence of the lower spine in a flexion-extension event with a dozen bones, most set to 50% transparency and texture mapped, plus various other synthetic geometry like cylinders and toroids; pretty simple stuff (no images of any sort, so no pixels or voxels). The animation does reconfigure some rubber-banding objects (spinal discs and some cables from the test rig), but the geometry mostly stays as-is, and rendering it at the new transforms (xforms) is by far the largest time sink.
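To compare the Sequential and OpenMP builds on an animation like this, a frame-rate measurement can be sketched as below. This is a minimal sketch, not the harness behind the numbers in this message: it assumes the animation can be driven by a hypothetical callable `render_frame` that applies the next set of transforms and renders once (for example by calling `Render()` on the render window).

```python
import time
from typing import Callable


def measure_fps(render_frame: Callable[[], object], n_frames: int = 100, warmup: int = 5) -> float:
    """Return the average frames per second of render_frame over n_frames calls.

    render_frame is a hypothetical callable that advances the animation one
    step and renders it (e.g. update actor transforms, then Render()).
    """
    if n_frames <= 0:
        raise ValueError("n_frames must be positive")
    # Untimed warm-up frames, so one-time costs (shader compilation,
    # texture upload, thread-pool start-up) do not skew the average.
    for _ in range(warmup):
        render_frame()
    start = time.perf_counter()
    for _ in range(n_frames):
        render_frame()
    elapsed = time.perf_counter() - start
    return n_frames / elapsed
```

Running the same script unchanged against each build (SMP=Sequential, OpenMP, TBB) keeps the comparison limited to the SMP backend.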

The SMP=Sequential baseline runs on an Intel i7-7700K with HD 630 integrated graphics at 35 FPS. With all the above parallelism enabled, it runs at about half that, maybe 18 FPS in the best case. Notably, CPU usage is 12% with Sequential (typical of a single thread on an 8-thread CPU), but pegs at 80-90% with OpenMP enabled.

On a machine with a discrete GPU (GTX 1050 Ti), the SMP=Sequential baseline is about 41 FPS, but with OpenMP it again drops to 18 FPS. CPU usage is the same as above: at most one thread with Sequential, and "use it all" with OpenMP.
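Converting the quoted figures into per-frame time and busy threads makes the cost more concrete: the OpenMP build adds roughly 27 ms per frame on the HD 630 machine and roughly 31 ms on the GTX 1050 Ti machine, and 12% of an 8-thread CPU is about one fully busy thread. The sketch below just does that arithmetic; the 85% figure is an assumed midpoint of the reported 80-90% range, and the helper names are illustrative.

```python
def frame_time_ms(fps: float) -> float:
    """Convert a frame rate to the average time spent per frame, in milliseconds."""
    return 1000.0 / fps


def busy_threads(cpu_percent: float, logical_cpus: int = 8) -> float:
    """Estimate how many logical CPUs are fully busy, given total CPU usage in percent."""
    return cpu_percent / 100.0 * logical_cpus


# Figures quoted above: (machine, Sequential FPS, OpenMP FPS).
for label, seq_fps, omp_fps in [("HD 630", 35.0, 18.0), ("GTX 1050 Ti", 41.0, 18.0)]:
    print(f"{label}: {frame_time_ms(seq_fps):.1f} ms -> {frame_time_ms(omp_fps):.1f} ms per frame")

# 12% (Sequential) vs. an assumed 85% midpoint of 80-90% (OpenMP), 8 logical CPUs.
print(f"12% of 8 threads ~ {busy_threads(12):.2f} busy threads")
print(f"85% of 8 threads ~ {busy_threads(85):.2f} busy threads")
```

So the OpenMP build keeps about seven logical CPUs busy while roughly doubling the time per frame, which fits the thread-contention explanation above rather than a lack of parallel work.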

Is this expected for this mix of mostly static geometry?  Or am I setting the wrong configuration variables?  Or am I just doing something dumb?

Thanks,
Eric