[vtk-developers] New SMP implementation for image filters

Thu Mar 10 08:19:22 EST 2016

Hi All,

As of today, the VTK imaging pipeline has a new implementation based
on vtkSMPTools.  A big chunk of this work is from the "Shared Memory
Parallelism" Google Summer of Code project for 2015.

Many of you will not see much difference, because the imaging
pipeline has been multithreaded for many, many years.  The advantages
of the new implementation are that it provides more options for
performance tuning, and it will be easier to maintain as VTK moves
forward.  The new SMP
code will only be used if you set VTK_SMP_IMPLEMENTATION_TYPE
to OpenMP or TBB in cmake.  Otherwise, the old threading code will be
used.

The main difference with the new implementation is the load balancing.
Previously, the data was divided evenly (or roughly so) among the threads.
So if there were 10 slices and 8 threads, then 6 of the threads would get
one slice, and 2 of the threads would get 2 slices.  Now, things follow
a different paradigm: the data is divided into a large number of pieces
that are queued for a thread pool.  Pieces are assigned to threads based
on thread availability, and overall CPU utilization is improved because
the load balancing is done dynamically.
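
To make the contrast concrete, here is a minimal standard-C++ sketch of the
two strategies.  It is an illustration only, not the vtkSMPTools code: the
names StaticSplit and DynamicSplit are hypothetical, and the shared atomic
counter is a simple stand-in for whatever scheduling the OpenMP or TBB
backend actually performs.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Static split (the old behavior, in spirit): every thread's share is
// decided up front.  Returns the number of slices given to each thread.
// Hypothetical helper for illustration only.
std::vector<int> StaticSplit(int numSlices, int numThreads)
{
  assert(numThreads > 0);
  std::vector<int> counts(numThreads, numSlices / numThreads);
  for (int i = 0; i < numSlices % numThreads; ++i)
  {
    ++counts[i]; // the leftover slices go to the first few threads
  }
  return counts;
}

// Dynamic split (the new behavior, in spirit): many small pieces are
// claimed one at a time from a shared counter, so a thread that finishes
// early simply takes the next unclaimed piece.  Returns the number of
// pieces each worker ended up processing.  Hypothetical helper.
template <typename Work>
std::vector<int> DynamicSplit(int numPieces, int numThreads, Work work)
{
  assert(numThreads > 0);
  std::atomic<int> next(0);
  std::vector<int> processed(numThreads, 0);
  std::vector<std::thread> pool;
  for (int t = 0; t < numThreads; ++t)
  {
    pool.emplace_back([&, t]() {
      for (int piece = next.fetch_add(1); piece < numPieces;
           piece = next.fetch_add(1))
      {
        work(piece);
        ++processed[t]; // each worker writes only its own slot
      }
    });
  }
  for (auto& th : pool)
  {
    th.join();
  }
  return processed;
}
```

StaticSplit(10, 8) returns {2, 2, 1, 1, 1, 1, 1, 1}, which is the "6 threads
with one slice, 2 threads with two" case described above.  With DynamicSplit
the per-thread counts vary from run to run, but they always sum to the number
of pieces.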

So, what performance tuning is available and what gains can you expect
to see?  Let's look at the tuning first.  The following new methods are
available for image filters derived from vtkThreadedImageAlgorithm:

// Enable or disable the new behavior for all filters.
static void SetGlobalDefaultEnableSMP(bool);

// Enable or disable the new behavior for just one filter.
void SetEnableSMP(bool);

// Set the size of the image pieces that will be sent to the
// threads for execution. The ideal size will depend on the
// memory use pattern of the image filter that is being used,
// but the default size of 65536 bytes works well for most.
void SetDesiredBytesPerPiece(vtkIdType bytes);

// Set the minimum size of piece to send to a thread.  Obviously
// giving the threads one voxel at a time would be inefficient.
// A default minimum size of [16,1,1] ensures some contiguity.
void SetMinimumPieceSize(const int size[3]);

// Use pieces that are roughly square in shape (or cubic for 3D
// images).  This provides best results for filters that operate
// on a neighborhood around each output voxel.
void SetSplitModeToBlock();

// Use slab-shaped pieces.  This provides best results for filters
// that perform simple operations on the scalars, such as color mapping.
void SetSplitModeToSlab();

// Use thin rod-shaped pieces.  This also provides good results for
// filters like color mapping.  I haven't yet found any algorithms
// for which this splitting method is the best to use.
void SetSplitModeToBeam();
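
To get a feel for what the default of 65536 bytes per piece means, the
following sketch estimates how many pieces an image would be cut into.  The
function EstimatePieceCount is hypothetical and is only back-of-the-envelope
arithmetic: the real splitter also has to honor the minimum piece size and
the split mode, so the actual count will differ.

```cpp
#include <cassert>

// Rough estimate of the number of pieces for an image, assuming pieces
// are simply desiredBytesPerPiece bytes each.  Hypothetical helper, not
// part of VTK; it ignores the minimum piece size and the split mode.
long long EstimatePieceCount(const int dims[3], int bytesPerVoxel,
                             long long desiredBytesPerPiece)
{
  assert(bytesPerVoxel > 0 && desiredBytesPerPiece > 0);
  long long totalBytes = 1LL * dims[0] * dims[1] * dims[2] * bytesPerVoxel;
  // Round up so that a partial final piece is still counted.
  return (totalBytes + desiredBytesPerPiece - 1) / desiredBytesPerPiece;
}
```

For example, a 512x512x100 image of 2-byte scalars is 52428800 bytes, which
gives an estimate of 800 pieces at the default size: far more pieces than
threads, which is what lets the dynamic scheduling even out the load.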

The performance improvements to be gained by tweaking these parameters
are modest, usually less than 20%, but sometimes much higher.  As part
of this patch, I have added a new example to VTK called "ImageBenchmark"
that makes it easy to run a filter under different conditions in order to
optimize the settings.  I'll create a wiki page in the future, but for now,
you can
run "ImageBenchmark --help" to get a comprehensive description of all the
options (assuming that you built VTK with BUILD_EXAMPLES=ON).

Cheers,
  - David