<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div>David,</div><div>   This is going to be really useful for tuning applications.</div><div>   I am trying it out on several different machines all using TBB.</div><div><br></div><div>  Thank you for all your efforts in writing this.</div><div><br></div><div>Regards</div><div>   Andrew </div><div><br></div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">---------- Forwarded message ----------<br>From: David Gobbi <<a href="mailto:david.gobbi@gmail.com">david.gobbi@gmail.com</a>><br>To: VTK Developers <<a href="mailto:vtk-developers@vtk.org">vtk-developers@vtk.org</a>><br>Cc: <br>Date: Thu, 10 Mar 2016 06:19:22 -0700<br>Subject: [vtk-developers] New SMP implementation for image filters<br><div dir="ltr"><div>Hi All,</div><div><br></div><div>As of today, the VTK imaging pipeline has a new implementation based</div><div>on vtkSMPTools.  A big chunk of this work is from the "Shared Memory</div><div>Parallelism" Google Summer of Code project for 2015.</div><div><br></div><div>Many of you will not see much difference, because the imaging</div><div>pipeline has been multithreaded for many, many years.  The advantages</div><div>of the new implementation are that it provides more performance tuning,</div><div>and it will be easier to maintain as VTK moves forward.  The new SMP</div><div>code will only be used if you set VTK_SMP_IMPLEMENTATION_TYPE</div><div>to OpenMP or TBB in cmake.  Otherwise, the old threading code will be</div><div>used.</div><div><br></div><div>The main difference with the new implementation is the load balancing.</div><div>Previously, the data was divided evenly (or roughly so) among the threads.</div><div>So if there were 10 slices and 8 threads, then 6 of the threads would get</div><div>one slice, and 2 of the threads would get 2 slices.  
Now, things follow</div><div>a different paradigm: the data is divided into a large number of pieces</div><div>that are queued for a thread pool.  Pieces are assigned to threads based</div><div>on thread availability, and overall CPU utilization is improved because</div><div>the load balancing is done dynamically.</div><div><br></div><div><br></div><div>So, what performance tuning is available and what gains can you expect</div><div>to see?  Let's look at the tuning first.  The following new methods are</div><div>available for image filters derived from vtkThreadedImageAlgorithm:</div><div><br></div><div>// Enable or disable the new behavior for all filters.</div><div>static void SetGlobalDefaultEnableSMP(bool);</div><div><br></div><div>// Enable or disable the new behavior for just one filter.</div><div>void SetEnableSMP(bool);</div><div><br></div><div>// Set the size of the image pieces that will be sent to the</div><div>// threads for execution. The ideal size will depend on the</div><div>// memory use pattern of the image filter that is being used,</div><div>// but the default size of 65536 bytes works well for most.</div><div>void SetDesiredBytesPerPiece(vtkIdType bytes);</div><div><br></div><div>// Set the minimum size of piece to send to a thread.  Obviously</div><div>// giving the threads one voxel at a time would be inefficient.</div><div>// A default minimum size of [16,1,1] ensures some contiguity.</div><div>void SetMinimumPieceSize(const int size[3]);</div><div><br></div><div>// Use pieces that are roughly square in shape (or cubic for 3D</div><div>// images).  This provides best results for filters that operate</div><div>// on a neighborhood around each output voxel.</div><div>void SetSplitModeToBlock();</div><div><br></div><div>// Use slab-shaped pieces.  
This provides best results for filters</div><div>// that perform simple operations on the scalars, such as color mapping.</div><div>void SetSplitModeToSlab();</div><div><br></div><div>// Use thin rod-shaped pieces.  This also provides good results</div><div>// for filters like color mapping.  I haven't yet found any algorithms</div><div>// for which this splitting method is the best to use.</div><div>void SetSplitModeToBeam();</div><div><br></div><div><br></div><div>The performance improvements to be gained by tweaking these parameters</div><div>are modest, usually less than 20%, but sometimes much higher.  As part</div><div>of this patch, I have added a new example to VTK called "ImageBenchmark"</div><div>that makes it easy to run a filter under different conditions in order to optimize</div><div>the settings.  I'll create a wiki page in the future, but for now, you can run "ImageBenchmark --help" to get a comprehensive description of all the</div><div>options (assuming that you built VTK with BUILD_EXAMPLES=ON).</div><div><br></div><div>Cheers,</div><div>  - David</div></div>
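<div><br></div><div>Editor's note: the load-balancing arithmetic in the message above (10 slices, 8 threads) can be sketched as follows. This is a rough, idealized model and is not taken from the VTK sources: it assumes every unit of work costs the same, and the function names staticSlicesForThread and utilizationPercent are hypothetical, as is the choice of which threads receive the extra slice.</div>

```cpp
// Idealized model: all work units cost the same and all threads run at
// the same speed. Names are hypothetical, not part of VTK.

// Slices handed to thread `t` when `slices` are split as evenly as
// possible across `threads` threads (the old, static scheme). Which
// threads get the extra slice is an arbitrary choice made here.
int staticSlicesForThread(int slices, int threads, int t)
{
  int base = slices / threads;
  int extra = slices % threads;  // this many threads get one more slice
  return base + (t >= threads - extra ? 1 : 0);
}

// CPU utilization in percent (truncated) when `work` equal-cost units
// are shared by `threads` threads. The wall time is set by the busiest
// thread, which processes ceil(work / threads) units.
int utilizationPercent(int work, int threads)
{
  int rounds = (work + threads - 1) / threads;  // ceiling division
  return (100 * work) / (threads * rounds);
}
```

<div>Under these assumptions, utilizationPercent(10, 8) gives 62 (62.5 before integer truncation), because two threads are still busy with a second slice while six sit idle. If each slice is cut into 8 pieces, utilizationPercent(80, 8) gives 100. In this simplified model the gain comes from the finer pieces; the dynamic queue matters in practice because real pieces do not all take the same time.</div>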

<br><br></blockquote></div><div><br></div>-- <br><div class="gmail_signature">___________________________________________<br>Andrew J. P. Maclean<br><br>___________________________________________</div>

</div></div>