[Insight-developers] Multi-threading strategies

Fri Sep 7 15:14:50 EDT 2007

Dan,

So ZThreads are slower?

Bill

On 9/7/07, Blezek, Daniel J (GE, Research) <blezek at crd.ge.com> wrote:
>
>  Hi all,
>
> I've done some looking around and found the ZThread library
> (http://zthread.sourceforge.net/index.html and
> http://sourceforge.net/projects/zthread).
> It's cross-platform and purports to compile on Linux and Windows, but I only
> tried Linux.  The library has many constructs for threading, including a
> thread pool execution model where you state how many threads you'd like and
> then feed it jobs.  I replaced the GenerateData in the Median filter with
> ZThread library calls and ran some tests on 2-CPU and 8-CPU Linux boxes
> running RedHat.  I also varied the number of chunks each filter was divided
> into; by default, ITK splits the work into one chunk per thread.
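The chunking described above can be pictured with a small sketch. It assumes a split along a single index range, where each piece gets ceil(range / pieces) indices and the last piece takes the remainder, so fewer pieces than requested can come back. This is only similar in spirit to ITK's default split along the outermost image dimension; `Piece` and `SplitRange` are hypothetical names, not ITK API, and the real `SplitRequestedRegion` works on N-d image regions.

```cpp
#include <vector>

// One contiguous piece [start, start + size) of an index range.
struct Piece {
  long start;
  long size;
};

// Hypothetical helper (not ITK API): split `range` indices beginning at
// `start` into at most `requested` contiguous pieces of
// ceil(range / requested) indices each. The last piece takes the remainder,
// so fewer pieces than requested can be returned.
std::vector<Piece> SplitRange(long start, long range, int requested) {
  std::vector<Piece> pieces;
  if (range <= 0 || requested <= 0) {
    return pieces;  // nothing to split
  }
  const long perPiece = (range + requested - 1) / requested;  // integer ceil
  for (long offset = 0; offset < range; offset += perPiece) {
    const long remaining = range - offset;
    pieces.push_back(Piece{start + offset, remaining < perPiece ? remaining : perPiece});
  }
  return pieces;
}
```

For example, `SplitRange(0, 100, 16)` yields 15 pieces of 7 indices, the last holding only 2, which is one reason very fine chunking does not always produce the requested number of jobs.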
>
> The reports below compare the ZThread (MedianZ) with the regular ITK
> thread model (Median).
>
> 8 CPU, 8 chunks
>    Probe Tag   Starts   Stops   Time
>       Median        1       1   0.373023109044879674911499023438
>      MedianZ        1       1   0.410052934079430997371673583984
>
> 2 CPU, 2 chunks
>    Probe Tag   Starts   Stops   Time
>       Median        1       1   2.50991311680991202592849731445
>      MedianZ        1       1   2.42412604950368404388427734375
>
> 8 CPU, 16 chunks
>    Probe Tag   Starts   Stops   Time
>       Median        1       1   0.412385921692475676536560058594
>      MedianZ        1       1   2.42693911609239876270294189453
>
> 2 CPU, 4 chunks
>    Probe Tag   Starts   Stops   Time
>       Median        1       1   3.93622599844820797443389892578
>      MedianZ        1       1   4.21256111224647611379623413086
>
> I think the 8 CPU, 16 chunks result is skewed: the jobs are short enough
> that thread synchronization noticeably slows everything down.  I imagine
> the 8-way overhead is higher than the 2-way overhead; on the 2 CPU machine,
> the overhead was minimal.
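One way to read the four reports together is as a single ratio per configuration. This small sketch only divides the reported times; the values are copied from the tables above, not re-measured, and `Run`, `kRuns` and `SlowdownRatio` are illustrative names.

```cpp
#include <cstdio>

// Times copied verbatim from the probe reports above; nothing re-measured.
struct Run {
  const char* label;
  double median;   // regular ITK MultiThreader version
  double medianZ;  // ZThread PoolExecutor version
};

const Run kRuns[] = {
    {"8 CPU, 8 chunks", 0.373023109044879674911499023438, 0.410052934079430997371673583984},
    {"2 CPU, 2 chunks", 2.50991311680991202592849731445, 2.42412604950368404388427734375},
    {"8 CPU, 16 chunks", 0.412385921692475676536560058594, 2.42693911609239876270294189453},
    {"2 CPU, 4 chunks", 3.93622599844820797443389892578, 4.21256111224647611379623413086},
};

// MedianZ / Median: a value above 1 means the ZThread version took longer.
double SlowdownRatio(const Run& r) { return r.medianZ / r.median; }

// Print one ratio per configuration.
void PrintRatios() {
  for (const Run& r : kRuns) {
    std::printf("%-17s MedianZ/Median = %.3f\n", r.label, SlowdownRatio(r));
  }
}
```

With these numbers the ratios come out at about 1.10, 0.97, 5.89 and 1.07, so apart from the 8 CPU, 16 chunks case the two versions are within roughly 10% of each other in either direction.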
>
> The Median image filter is a poor test case because it runs so quickly;
> suggestions for a better test are welcome.
>
> Here's the relevant code from my testing; I can send all of it to
> interested parties.  There is very little change from itkImageSource's
> implementation.  In this case, I create the threads inside the filter, so
> thread creation is part of the measured overhead.  In practice, the threads
> would live in a globally accessible pool used by all executing filters.
>
> Comments welcome,
> -dan
>
>
> //----------------------------------------------------------------------------
> template< class TInputImage, class TOutputImage >
> void
> MedianZThreadImageFilter<TInputImage, TOutputImage>
> ::GenerateData()
> {
>   // Call a method that can be overridden by a subclass to allocate
>   // memory for the filter's outputs
>   this->AllocateOutputs();
>
>   // Call a method that can be overridden by a subclass to perform
>   // some calculations prior to splitting the main computations into
>   // separate threads
>   this->BeforeThreadedGenerateData();
>
>   // Do this with ZThread's
>   ZThread::PoolExecutor executor(this->GetMultiThreader()->GetNumberOfThreads());
>   typename TOutputImage::RegionType splitRegion;
>   int NumberOfPieces = 2 * this->GetMultiThreader()->GetNumberOfThreads();
>   try
>     {
>     for ( int i = 0; i < NumberOfPieces; i++ )
>       {
>       ZThreadStruct* s = new ZThreadStruct();
>       s->threadId = i;
>       s->Filter = this;
>       this->SplitRequestedRegion(s->threadId, NumberOfPieces, splitRegion);
>       s->region = splitRegion;
>       executor.execute ( s );
>       }
>     // Let it all finish
>     executor.wait();
>     }
>   catch ( ZThread::Synchronization_Exception &e )
>     {
>     itkGenericExceptionMacro ( << "Error adding runnable to executor: "
>                                << e.what() );
>     }
>
>   // Call a method that can be overridden by a subclass to perform
>   // some calculations after all the threads have completed
>   this->AfterThreadedGenerateData();
>
> }
>
>
> -----Original Message-----
> From: insight-developers-bounces+blezek=crd.ge.com at itk.org
> [mailto:insight-developers-bounces+blezek=crd.ge.com at itk.org]
> On Behalf Of Torsten Rohlfing
> Sent: Saturday, July 28, 2007 12:32 PM
> To: insight-developers at itk.org
> Subject: [Insight-developers] Multi-threading strategies
>
> Hi --
>
> I think you also need to consider that there's a cost to suspending and
> reactivating a thread. Do you know how you're going to do it? I assume
> with a condition variable or something similar?
>
> From my personal experience, I can say that I once considered this option
> as an alternative to creating new threads, and I tried it to some extent,
> but it did not lead to any tangible benefit using pthreads on Linux.
> Basically, the cost of using the condition variable, plus the added
> complexity of the implementation, completely eliminated any benefit from
> avoiding thread creation and joining. There may of course be differences
> depending on your platform and the efficiency of its threads implementation.
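This trade-off can be checked on a given platform by timing both costs directly. The sketch below assumes C++11 `std::thread` rather than raw pthreads and measures the average cost of creating and joining an empty thread versus one wake-up/acknowledge round trip with a persistent worker parked on a condition variable. It says nothing about implementation complexity, the numbers will vary by OS and threads implementation, and the function names are hypothetical.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Average seconds per create + join of a thread that does no work.
double TimeCreateJoin(int iterations) {
  const auto t0 = std::chrono::steady_clock::now();
  for (int i = 0; i < iterations; ++i) {
    std::thread t([] {});
    t.join();
  }
  const std::chrono::duration<double> dt = std::chrono::steady_clock::now() - t0;
  return dt.count() / iterations;
}

// Average seconds per wake-up/acknowledge round trip with one persistent
// worker that sleeps on a condition variable between rounds.
double TimeConditionVariableRoundTrip(int iterations) {
  std::mutex m;
  std::condition_variable cv;
  int requested = 0;  // rounds posted by the caller
  int completed = 0;  // rounds acknowledged by the worker
  bool quit = false;

  std::thread worker([&] {
    std::unique_lock<std::mutex> lock(m);
    for (;;) {
      cv.wait(lock, [&] { return quit || completed < requested; });
      if (quit) {
        return;
      }
      ++completed;  // the (empty) "work" for this round
      cv.notify_all();
    }
  });

  const auto t0 = std::chrono::steady_clock::now();
  {
    std::unique_lock<std::mutex> lock(m);
    for (int i = 0; i < iterations; ++i) {
      ++requested;     // reactivate the worker...
      cv.notify_all();
      cv.wait(lock, [&] { return completed == requested; });  // ...and wait for it
    }
  }
  const std::chrono::duration<double> dt = std::chrono::steady_clock::now() - t0;

  {
    std::lock_guard<std::mutex> lock(m);
    quit = true;
  }
  cv.notify_all();
  worker.join();
  return dt.count() / iterations;
}
```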
>
> That still leaves one clear advantage: by keeping threads around, you avoid
> those incredibly annoying thread creation/annihilation messages in gdb ;)
>
> Cheers!
>   Torsten
>
> > That is definitely the preferred method...go for it! :)
> >
> > Stephen
> >
> > Blezek, Daniel J (GE, Research) wrote:
> > > Hi all,
> > >
> > >   I was debugging a multi-threaded registration metric today, and gdb
> > > nicely prints thread creation/destruction messages.  In our current
> > > MultiThreader, pthreads are created/joined in the scatter/gather
> > > pattern.  For a general filter, this isn't likely to be a problem,
> > > 'cause it executes only once (in general).  For optimization metrics, it
> > > may be called thousands of times, leading to a huge number of pthreads
> > > created/joined.  Is this efficient?  Would it be worthwhile to
> > > investigate keeping threads around, rather than joining them?  They
> > > could simply sit idle until they have something to do...  This would
> > > reduce overhead but may add complexity; still, we only need to get it
> > > right once...
> > >
> > >   Stephen Aylward: any comments?
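The create/join scatter/gather pattern described here looks roughly like the following. It is a simplified sketch assuming C++11 threads: `ScatterGather` and `SumOfSquares` are hypothetical names, every call spawns all N threads (a real implementation may run one piece on the calling thread), and the example "metric" just sums squares over strided slices. Called once per metric evaluation, the full create/join cost is paid on every call.

```cpp
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Scatter: create one thread per piece. Gather: join them all.
// work(id, count) processes piece `id` out of `count`.
void ScatterGather(int numberOfThreads, const std::function<void(int, int)>& work) {
  if (numberOfThreads < 1) {
    return;
  }
  std::vector<std::thread> threads;
  threads.reserve(static_cast<std::size_t>(numberOfThreads));
  for (int id = 0; id < numberOfThreads; ++id) {
    threads.emplace_back(work, id, numberOfThreads);  // scatter
  }
  for (std::thread& t : threads) {
    t.join();  // gather
  }
}

// Example "metric": each thread sums the squares of a strided slice into its
// own slot, so no locking is needed; the partial sums are combined afterwards.
double SumOfSquares(const std::vector<double>& v, int numberOfThreads) {
  if (numberOfThreads < 1) {
    numberOfThreads = 1;
  }
  std::vector<double> partial(static_cast<std::size_t>(numberOfThreads), 0.0);
  ScatterGather(numberOfThreads, [&](int id, int count) {
    for (std::size_t i = static_cast<std::size_t>(id); i < v.size();
         i += static_cast<std::size_t>(count)) {
      partial[static_cast<std::size_t>(id)] += v[i] * v[i];
    }
  });
  double total = 0.0;
  for (double p : partial) {
    total += p;
  }
  return total;
}
```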
>
> --
> Torsten Rohlfing, PhD          SRI International, Neuroscience Program
>  Research Scientist             333 Ravenswood Ave, Menlo Park, CA 94025
>   Phone: ++1 (650) 859-3379      Fax: ++1 (650) 859-2743
>    torsten at synapse.sri.com        http://www.stanford.edu/~rohlfing/
>
>      "Though this be madness, yet there is a method in't"
>
> _______________________________________________
> Insight-developers mailing list
> Insight-developers at itk.org
> http://www.itk.org/mailman/listinfo/insight-developers
>
>