[Insight-developers] Multi-threading strategies

Stephen R. Aylward Stephen.Aylward at Kitware.com
Mon Sep 10 10:09:10 EDT 2007


Hi Dan,

I agree - I wouldn't expect the thread pool to pay off when processing a 
single filter.  The concept of a pool pays off when processing a 
sequence of filters that would otherwise involve multiple thread 
creations and destructions.

Even if ZThread doesn't pay off much for the main ITK pipeline (the 
improvement may only be minor for the pipelines of 2-3 filters that are 
commonly used in ITK programs), I still think we should strongly 
consider it, since specialized (i.e., tailored, within-filter) 
multi-threading is needed for deformable registration, DTI fiber 
tracking, registration metric computation, etc.
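
To make that concrete, what I have in mind is a single process-wide 
executor that every filter (and every metric evaluation) borrows.  A 
minimal sketch, assuming the ZThread PoolExecutor usage shown in Dan's 
code below and a hypothetical GetGlobalThreadPool() accessor:

#include <zthread/PoolExecutor.h>
#include "itkMultiThreader.h"

// Hypothetical process-wide pool: created once and reused by every
// filter, so worker threads are not created/destroyed for each
// GenerateData() call.
static ZThread::PoolExecutor &
GetGlobalThreadPool()
{
  static ZThread::PoolExecutor pool(
    itk::MultiThreader::GetGlobalDefaultNumberOfThreads() );
  return pool;
}

A filter (or a registration metric) would then call 
GetGlobalThreadPool().execute(...) for each region instead of 
constructing its own executor, which is exactly where the 
creation/destruction savings should show up.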

Stephen



Blezek, Daniel J (GE, Research) wrote:
> Hi Gaëtan,
> 
> I used an ITK Time Probe Collector, which I think reports in seconds.
> I'm a little surprised at the 16-chunk results and don't trust them.
> I'll try the empty filter, but I think it will be very hard to time;
> a profiled run would probably be more helpful.  I'll also post
> results for the bigger radius (it didn't occur to me at the time).
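> 
> For reference, the probes are used roughly like this (a sketch; median
> and medianZ stand for the two already-configured filter instances):
> 
> #include <iostream>
> #include "itkTimeProbesCollectorBase.h"
> 
> // Time one Update() of each filter under a named probe.
> template <class TMedian, class TMedianZ>
> void TimeFilters( TMedian *median, TMedianZ *medianZ )
> {
>   itk::TimeProbesCollectorBase collector;
> 
>   collector.Start( "Median" );
>   median->Update();
>   collector.Stop( "Median" );
> 
>   collector.Start( "MedianZ" );
>   medianZ->Update();
>   collector.Stop( "MedianZ" );
> 
>   collector.Report( std::cout );  // prints the Starts/Stops/Time tables
> }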
> 
> To answer Bill's question: I don't think we can conclusively say that
> ZThreads are slower.  They seem to be on par with the ITK version, but
> I created the thread pool inside the filter rather than using a global
> pool.  I'll refactor the code to create the thread pool outside the
> filter and run this all again.
> 
> -dan
> 
> -----Original Message-----
> From: Gaëtan Lehmann [mailto:gaetan.lehmann at jouy.inra.fr]
> Sent: Friday, September 07, 2007 3:13 PM
> To: Blezek, Daniel J (GE, Research)
> Subject: Re: [Insight-developers] Multi-threading strategies
> 
> 
> Hi Dan,
> 
> The execution times are in seconds?  If so, can you tell us how you
> measured the execution times of the median filters?  The result with
> 16 chunks is really surprising and, from my (small) experience
> measuring execution times with ITK, can't be explained by the overhead
> of the thread management alone.
> 
> It would also be interesting to have the execution time of a filter
> that does nothing other than create the threads (by implementing an
> empty ThreadedGenerateData(), for example).
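> 
> Something like this should be enough - just a sketch from memory of the
> ImageToImageFilter API, so treat the details loosely:
> 
> #include "itkImageToImageFilter.h"
> 
> // A filter whose threaded method does nothing, so timing its Update()
> // measures little more than output allocation plus thread start/join.
> template <class TImage>
> class EmptyThreadedFilter
>   : public itk::ImageToImageFilter< TImage, TImage >
> {
> public:
>   typedef EmptyThreadedFilter                        Self;
>   typedef itk::ImageToImageFilter< TImage, TImage >  Superclass;
>   typedef itk::SmartPointer< Self >                  Pointer;
>   itkNewMacro( Self );
>   itkTypeMacro( EmptyThreadedFilter, ImageToImageFilter );
> 
> protected:
>   EmptyThreadedFilter() {}
>   virtual void ThreadedGenerateData(
>     const typename TImage::RegionType &, int )
>     {
>     // intentionally empty
>     }
> };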
> 
> To get longer execution times, you can simply run the median filter
> with a bigger radius - the execution times should increase dramatically :-)
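> 
> For example (a sketch, with ImageType standing for whatever image type
> you already use):
> 
> #include "itkMedianImageFilter.h"
> 
> typedef itk::MedianImageFilter< ImageType, ImageType > MedianType;
> MedianType::Pointer median = MedianType::New();
> 
> MedianType::InputSizeType radius;
> radius.Fill( 5 );   // 11^3 = 1331 neighbors per pixel in 3D, instead
>                     // of 3^3 = 27 with the default radius of 1
> median->SetRadius( radius );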
> 
> Gaëtan
> 
> 
> 
> On Sep 7, 2007, at 8:47 PM, Blezek, Daniel J (GE, Research) wrote:
> 
>> Hi all,
>> 
>> I've done some looking around and found the ZThread library
>> (http://zthread.sourceforge.net/index.html &
>> http://sourceforge.net/projects/zthread).  It's cross-platform and
>> purports to compile on Linux and Windows, but I only tried Linux.
>> The library has many constructs for threading, including a thread
>> pool execution model where you state how many threads you'd like
>> and then feed it jobs.  I replaced the GenerateData in the Median
>> filter with ZThread library calls and ran some tests on 2 CPU and
>> 8 CPU Linux boxes running RedHat.  I also varied the number of
>> chunks each filter was divided into; ITK itself uses the number of
>> threads to split the work.
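>> 
>> The basic pattern, as I understand the ZThread API, is to subclass
>> Runnable and hand instances to the executor - a sketch:
>> 
>> #include <zthread/PoolExecutor.h>
>> #include <zthread/Runnable.h>
>> 
>> // One job = one Runnable; the executor owns a fixed set of worker
>> // threads and runs whatever jobs it is fed.
>> class ChunkJob : public ZThread::Runnable
>> {
>> public:
>>   ChunkJob( int id ) : m_Id( id ) {}
>>   void run() { /* process chunk m_Id */ }
>> private:
>>   int m_Id;
>> };
>> 
>> int main()
>> {
>>   ZThread::PoolExecutor executor( 8 );       // 8 worker threads
>>   for ( int i = 0; i < 16; i++ )
>>     {
>>     executor.execute( new ChunkJob( i ) );   // feed it jobs
>>     }
>>   executor.wait();                           // block until all jobs finish
>>   return 0;
>> }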
>> 
>> The reports below compare the ZThread version (MedianZ) with the
>> regular ITK thread model (Median).
>> 
>> 8 CPU, 8 chunks
>> Probe Tag    Starts    Stops    Time
>> Median            1        1    0.373023109044879674911499023438
>> MedianZ           1        1    0.410052934079430997371673583984
>> 
>> 2 CPU, 2 chunks
>> Probe Tag    Starts    Stops    Time
>> Median            1        1    2.50991311680991202592849731445
>> MedianZ           1        1    2.42412604950368404388427734375
>> 
>> 8 CPU, 16 chunks
>> Probe Tag    Starts    Stops    Time
>> Median            1        1    0.412385921692475676536560058594
>> MedianZ           1        1    2.42693911609239876270294189453
>> 
>> 2 CPU, 4 chunks
>> Probe Tag    Starts    Stops    Time
>> Median            1        1    3.93622599844820797443389892578
>> MedianZ           1        1    4.21256111224647611379623413086
>> 
>> 
>> I think the 8 CPU, 16 chunks result is a bit skewed, as the jobs are
>> short enough that thread synchronization really slows everything down.
>> I imagine the 8-way overhead is a bit higher than the 2-way.  On the
>> 2 CPU machine, the overhead was minimal.
>> 
>> The Median image filter is a bad example because it runs so quickly;
>> suggestions for a better test are welcome.
>> 
>> Here's the relevant code from my testing; I can include all of it for
>> interested parties.  There is very little change from itkImageSource's
>> implementation.  In this case, I create the threads inside the filter,
>> so thread creation is part of the overhead.  In practice the threads
>> would live in a globally accessible pool used by all executing filters.
>> 
>> Comments welcome,
>> -dan
>> 
>> 
>> //----------------------------------------------------------------------------
>> template< class TInputImage, class TOutputImage >
>> void
>> MedianZThreadImageFilter<TInputImage, TOutputImage>
>> ::GenerateData()
>> {
>>   // Call a method that can be overridden by a subclass to allocate
>>   // memory for the filter's outputs
>>   this->AllocateOutputs();
>> 
>>   // Call a method that can be overridden by a subclass to perform
>>   // some calculations prior to splitting the main computations into
>>   // separate threads
>>   this->BeforeThreadedGenerateData();
>> 
>>   // Do this with ZThread's PoolExecutor
>>   ZThread::PoolExecutor executor( this->GetMultiThreader()->GetNumberOfThreads() );
>>   typename TOutputImage::RegionType splitRegion;
>>   int NumberOfPieces = 2 * this->GetMultiThreader()->GetNumberOfThreads();
>>   try
>>     {
>>     for ( int i = 0; i < NumberOfPieces; i++ )
>>       {
>>       ZThreadStruct* s = new ZThreadStruct();
>>       s->threadId = i;
>>       s->Filter = this;
>>       this->SplitRequestedRegion( s->threadId, NumberOfPieces, splitRegion );
>>       s->region = splitRegion;
>>       executor.execute( s );
>>       }
>>     // Let it all finish
>>     executor.wait();
>>     }
>>   catch ( ZThread::Synchronization_Exception &e )
>>     {
>>     itkGenericExceptionMacro( << "Error adding runnable to executor: " << e.what() );
>>     }
>> 
>>   // Call a method that can be overridden by a subclass to perform
>>   // some calculations after all the threads have completed
>>   this->AfterThreadedGenerateData();
>> }
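>> 
>> The ZThreadStruct itself is omitted here; it boils down to a small
>> Runnable that forwards its region back to the filter, along the lines
>> of this sketch (assumed to be nested in the filter class, so Self is
>> the filter type):
>> 
>> struct ZThreadStruct : public ZThread::Runnable
>> {
>>   int                                threadId;
>>   Self                              *Filter;
>>   typename TOutputImage::RegionType  region;
>> 
>>   void run()
>>     {
>>     // One job = one piece of the split region, run on a pool thread.
>>     Filter->ThreadedGenerateData( region, threadId );
>>     }
>> };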
>> 
>> 
>> 
>> -----Original Message-----
>> From: insight-developers-bounces+blezek=crd.ge.com at itk.org
>> [mailto:insight-developers-bounces+blezek=crd.ge.com at itk.org]
>> On Behalf Of Torsten Rohlfing
>> Sent: Saturday, July 28, 2007 12:32 PM
>> To: insight-developers at itk.org
>> Subject: [Insight-developers] Multi-threading strategies
>> 
>> Hi --
>> 
>> I think you also need to consider that there's a cost to suspending
>> and re-activating a thread.  Do you know how you're going to do it?
>> I assume a condition variable or something?
>> 
>> From my personal experience, I can say that I once considered this
>> option over creating new threads, and I tried it to some extent, but
>> it did not lead to any tangible benefit using pthreads on Linux.
>> Basically, the cost of using the condition variable, plus the added
>> complexity of the implementation, completely eliminated any benefit
>> from avoiding thread creation and joining.  There may of course be
>> differences depending on your platform and the efficiency of its
>> thread implementation.
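>> 
>> For what it's worth, the pattern in question is the usual persistent
>> worker that sleeps on a condition variable between jobs - a minimal
>> pthreads sketch (not ITK code; names made up):
>> 
>> #include <pthread.h>
>> 
>> struct Worker
>> {
>>   pthread_mutex_t lock;
>>   pthread_cond_t  wake;
>>   bool            haveWork;
>>   bool            shutdown;
>>   void          (*job)(void *);
>>   void           *arg;
>> };
>> 
>> // The worker thread never exits between jobs; it just blocks on the
>> // condition variable.  Every hand-off costs a lock, a signal, and a
>> // wakeup, which is the overhead that ate the savings in my tests.
>> void *WorkerLoop( void *p )
>> {
>>   Worker *w = static_cast<Worker *>( p );
>>   pthread_mutex_lock( &w->lock );
>>   while ( !w->shutdown )
>>     {
>>     while ( !w->haveWork && !w->shutdown )
>>       {
>>       pthread_cond_wait( &w->wake, &w->lock );  // suspend until signaled
>>       }
>>     if ( w->haveWork )
>>       {
>>       void (*job)(void *) = w->job;
>>       void *arg = w->arg;
>>       w->haveWork = false;
>>       pthread_mutex_unlock( &w->lock );
>>       job( arg );                               // run the job unlocked
>>       pthread_mutex_lock( &w->lock );
>>       }
>>     }
>>   pthread_mutex_unlock( &w->lock );
>>   return 0;
>> }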
>> 
>> Which certainly still leaves the one advantage that by keeping
>> threads around you avoid those incredibly annoying thread
>> creation/annihilation messages in gdb ;)
>> 
>> Cheers!
>> Torsten
>> 
>>> That is definitely the preferred method...go for it! :)
>>> 
>>> Stephen
>>> 
>>> Blezek, Daniel J (GE, Research) wrote:
>>>> Hi all,
>>>> 
>>>> I was debugging a multi-threaded registration metric today, and gdb
>>>> nicely prints thread creation/destruction messages.  In our current
>>>> MultiThreader, pthreads are created/joined in the scatter/gather
>>>> pattern.  For a general filter, this isn't likely to be a problem,
>>>> 'cause it executes only once (in general).  For optimization metrics,
>>>> it may be called thousands of times, leading to a huge number of
>>>> pthreads created/joined.  Is this efficient?  Would it be worthwhile
>>>> to investigate keeping threads around, rather than joining them?
>>>> They could simply sit idle until they have something to do...  This
>>>> would reduce overhead and may add complexity, but we only need to
>>>> get it right once...
>>>> 
>>>> Stephen Aylward: any comments?
>> -- 
>> Torsten Rohlfing, PhD          SRI International, Neuroscience Program
>> Research Scientist             333 Ravenswood Ave, Menlo Park, CA 94025
>> Phone: ++1 (650) 859-3379      Fax: ++1 (650) 859-2743
>> torsten at synapse.sri.com
>> http://www.stanford.edu/~rohlfing/
>> 
>> "Though this be madness, yet there is a method in't"
>> 
> 
> -- 
> Gaëtan Lehmann
> Biologie du Développement et de la Reproduction
> INRA de Jouy-en-Josas (France)
> tel: +33 1 34 65 29 66    fax: 01 34 65 29 09
> http://voxel.jouy.inra.fr
> 
> 
> 
> 

-- 
=============================================================
Stephen R. Aylward, Ph.D.
Chief Medical Scientist
Kitware, Inc. - Chapel Hill Office
http://www.kitware.com
Phone: (518)371-3971 x300

