[Insight-developers] Multi-threading strategies

Blezek, Daniel J (GE, Research) blezek at crd.ge.com
Mon Jul 30 08:28:58 EDT 2007


Hi Simon,

  Thanks for the reference, I'll look into it.  Many of my thoughts
have been sparked by the GPL'ing of Intel's Threading Building Blocks
(http://www.threadingbuildingblocks.org/).  I have not yet dug into the
implementation, but it appears that TBB provides much finer-grained
control than the current ITK MultiThreader, while also offering
high-level constructs.
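
  As a concrete point of reference, a TBB loop over a raw pixel buffer
would look roughly like the sketch below (a minimal sketch of the
classic tbb::parallel_for idiom; the ScalePixels body and the Scale
function are placeholders of my own, not ITK code):

  #include "tbb/task_scheduler_init.h"
  #include "tbb/parallel_for.h"
  #include "tbb/blocked_range.h"
  #include <cstddef>

  // Hypothetical body object: scales a raw pixel buffer in place.
  class ScalePixels
  {
  public:
    ScalePixels(float *data, float factor)
      : m_Data(data), m_Factor(factor) {}
    void operator()(const tbb::blocked_range<std::size_t> &r) const
    {
      for (std::size_t i = r.begin(); i != r.end(); ++i)
        m_Data[i] *= m_Factor;
    }
  private:
    float *m_Data;
    float  m_Factor;
  };

  void Scale(float *data, std::size_t n, float factor)
  {
    tbb::task_scheduler_init init;  // spins up the worker pool once
    // TBB recursively splits the range and load-balances by work
    // stealing; the blocked_range grain size is one place the finer
    // control shows up.
    tbb::parallel_for(tbb::blocked_range<std::size_t>(0, n),
                      ScalePixels(data, factor));
  }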

  I've used OpenMP just recently and have been very impressed with its
capabilities.  ITK loops, however, are rarely simple enough for OpenMP,
and though GCC 4.x (where I forget what x is) supports it, most
compilers do not, so relying on it would shut out threaded
implementations for many ITK users.  Still, OpenMP wins hands down for
simplicity of implementation, no question: just insert "#pragma omp
parallel for", recompile, and boom, the loop runs on all 8 processors.
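
  For completeness, that one-liner looks like the sketch below (applied
to a placeholder loop of my own, not real ITK code; compile with the
compiler's OpenMP flag, e.g. -fopenmp on a GCC that supports it):

  // Independent iterations are distributed across all the cores.
  void Scale(float *data, long n, float factor)
  {
  #pragma omp parallel for
    for (long i = 0; i < n; ++i)
      data[i] *= factor;
  }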

  I'd like to compare OpenMP, TBB, and ITK's MultiThreader to get a
sense of their relative functionality and performance, perhaps on a
simple Gaussian filter.  This is a sideline project, so it won't happen
overnight.
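
  For the ITK column of such a comparison, the current scatter/gather
pattern looks roughly like the following (a sketch of the
itk::MultiThreader single-method idiom as I understand it; the ScaleJob
payload and the static work split are illustrative):

  #include "itkMultiThreader.h"

  // Hypothetical payload shared by all worker threads.
  struct ScaleJob { float *data; unsigned long n; float factor; };

  static ITK_THREAD_RETURN_TYPE ScaleCallback(void *arg)
  {
    itk::MultiThreader::ThreadInfoStruct *info =
      static_cast<itk::MultiThreader::ThreadInfoStruct *>(arg);
    ScaleJob *job = static_cast<ScaleJob *>(info->UserData);

    // Static split of the index range across the threads.
    unsigned long chunk = job->n / info->NumberOfThreads;
    unsigned long begin = info->ThreadID * chunk;
    unsigned long end = (info->ThreadID == info->NumberOfThreads - 1)
                          ? job->n : begin + chunk;
    for (unsigned long i = begin; i < end; ++i)
      job->data[i] *= job->factor;
    return ITK_THREAD_RETURN_VALUE;
  }

  void Scale(float *data, unsigned long n, float factor)
  {
    ScaleJob job = { data, n, factor };
    itk::MultiThreader::Pointer threader = itk::MultiThreader::New();
    threader->SetSingleMethod(ScaleCallback, &job);
    threader->SingleMethodExecute();  // creates, runs, joins the threads
  }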

-dan


Simon Warfield wrote:
> There are efficient paradigms for maintaining a collection of
> threads, supplying them with work, and having them wait efficiently
> on a condition variable when no work is available.  One of the most
> general-purpose structures for this is the work-pile organization;
> explanations of this and other strategies are in the book
> 'Programming with Threads' by Kleiman, Shah, and Smaalders.  Code for
> the book's examples used to be available from Sun.
> 
> Experience in the parallel computing field makes it clear that, with
> an efficient workpile implementation, it is far cheaper to maintain a
> group of live threads and hand them work as required than to create
> and join threads repeatedly.  It also makes it straightforward to use
> dynamic rather than static load balancing, which in many medical
> imaging situations leads to improved performance.
> 
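[A minimal pthreads sketch of the work-pile structure described above:
persistent workers block on a condition variable and pull items as
they come free, which is what yields the dynamic load balancing.  All
names are placeholders, and shutdown handling is omitted for brevity.]

  #include <pthread.h>
  #include <deque>

  // Hypothetical unit of work: process pixels [begin, end).
  struct WorkItem { long begin, end; };

  class WorkPile
  {
  public:
    WorkPile()
    {
      pthread_mutex_init(&m_Lock, 0);
      pthread_cond_init(&m_Ready, 0);
    }
    ~WorkPile()
    {
      pthread_mutex_destroy(&m_Lock);
      pthread_cond_destroy(&m_Ready);
    }

    void Put(const WorkItem &item)
    {
      pthread_mutex_lock(&m_Lock);
      m_Queue.push_back(item);
      pthread_cond_signal(&m_Ready);  // wake one idle worker
      pthread_mutex_unlock(&m_Lock);
    }

    // Idle workers block here until work arrives.
    WorkItem Take()
    {
      pthread_mutex_lock(&m_Lock);
      while (m_Queue.empty())
        pthread_cond_wait(&m_Ready, &m_Lock);
      WorkItem item = m_Queue.front();
      m_Queue.pop_front();
      pthread_mutex_unlock(&m_Lock);
      return item;
    }

  private:
    pthread_mutex_t      m_Lock;
    pthread_cond_t       m_Ready;
    std::deque<WorkItem> m_Queue;
  };

  // Long-lived worker: faster threads simply come back for more work,
  // so the load balances dynamically across uneven pieces.
  void *Worker(void *arg)
  {
    WorkPile *pile = static_cast<WorkPile *>(arg);
    for (;;) {
      WorkItem item = pile->Take();
      // ... process pixels [item.begin, item.end) ...
      (void)item;  // placeholder: the real body would use the item
    }
    return 0;
  }
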
> More recently, rather than hand-craft these implementations, it has
> become popular to ask the compiler to parallelize loops whose
> iterations can be executed in parallel.  OpenMP provides a simple and
> portable means of doing so (http://www.openmp.org,
> http://en.wikipedia.org/wiki/OpenMP).
> 
> In the past, it has not been possible to effectively reuse any of the
> SMP parallelization strategies on distributed-memory clusters.
> Recently, Intel released their Cluster OpenMP product, which takes
> OpenMP code and uses a virtual shared-memory implementation to
> provide parallelization over a distributed-memory cluster:
> http://www.intel.com/cd/software/products/asmo-na/eng/329023.htm
> I expect that such products will continue to improve over time.
> 
>   My suggestion is to try to simplify the loop structure so that you
> can take advantage of OpenMP, and then simply insert a couple of
> compiler directives to get automatic parallelization.
> 
> --
> Simon
> 
> 
>> Date: Sat, 28 Jul 2007 09:32:23 -0700
>> From: Torsten Rohlfing <torsten at synapse.sri.com>
>> Subject: [Insight-developers] Multi-threading strategies
>> To: insight-developers at itk.org
>> Message-ID: <46AB6F97.9070700 at synapse.sri.com>
>> Content-Type: text/plain; charset="iso-8859-1"
>> 
>> Hi --
>> 
>> I think you also need to consider that there is a cost to suspending
>> and re-activating a thread.  Do you know how you're going to do it?
>> I assume a condition variable or something similar?
>> 
>> From my personal experience, I can say that I once considered this
>> option over creating new threads, and I tried it to some extent, but
>> it did not lead to any tangible benefit using pthreads on Linux.
>> Basically, the cost of the condition variable, combined with the
>> added complexity of the implementation, completely eliminated any
>> benefit from avoiding thread creation and joining.  There may of
>> course be differences depending on your platform and the efficiency
>> of its threads implementation.
>> 
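[A rough way to quantify the trade-off Torsten describes: time the
create/join overhead a scatter/gather pattern pays per invocation, and
compare it against the wake-up latency of a pooled worker.  The sketch
below, assuming POSIX, measures only the first half; the task count
and names are illustrative.]

  #include <pthread.h>
  #include <sys/time.h>
  #include <cstdio>

  // Empty task body: isolates pure thread-lifecycle overhead.
  static void *Noop(void *) { return 0; }

  static double Seconds()
  {
    timeval tv;
    gettimeofday(&tv, 0);
    return tv.tv_sec + tv.tv_usec * 1e-6;
  }

  int main()
  {
    const int N = 10000;
    double start = Seconds();
    for (int i = 0; i < N; ++i) {
      pthread_t t;
      pthread_create(&t, 0, Noop, 0);  // scatter
      pthread_join(t, 0);              // gather
    }
    std::printf("create+join: %g us per task\n",
                (Seconds() - start) / N * 1e6);
    return 0;
  }
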
>> That certainly still leaves one advantage: by keeping threads
>> around, you avoid those incredibly annoying thread
>> creation/annihilation messages in gdb ;)
>> 
>> Cheers!
>>   Torsten
>> 
>> 
>>> That is definitely the preferred method...go for it! :)
>>> 
>>> Stephen
>>> 
>>> Blezek, Daniel J (GE, Research) wrote:
>>> 
>>>> Hi all,
>>>> 
>>>> I was debugging a multi-threaded registration metric today, and
>>>> gdb nicely prints thread creation/destruction messages.  In our
>>>> current MultiThreader, pthreads are created/joined in the
>>>> scatter/gather pattern.  For a general filter, this isn't likely
>>>> to be a problem, 'cause it executes only once (in general).  For
>>>> optimization metrics, it may be called thousands of times, leading
>>>> to a huge number of pthreads created/joined.  Is this efficient?
>>>> Would it be worthwhile to investigate keeping threads around,
>>>> rather than joining them?  They could simply sit idle until they
>>>> have something to do...  This would reduce overhead but add some
>>>> complexity, and we only need to get it right once...
>>>> 
>>>> Stephen Aylward: any comments?

