[Insight-developers] Multi-threading strategies
Blezek, Daniel J (GE, Research)
blezek at crd.ge.com
Mon Sep 10 07:24:19 EDT 2007
Hi Gaëtan,
I used an ITK Time Probe Collector, which I think reports in seconds. I'm a little surprised at the 16 chunk results, and don't trust them. I'll try the empty filter, but I think it will be very hard to time; perhaps a profiled run would be more helpful. I'll also post results for the bigger radius (it didn't occur to me at the time).
To answer Bill's question: I don't think we can conclusively say that ZThreads are slower. They seem to be on par with the ITK version, but I created the thread pool inside the filter, rather than a global pool. I'll refactor the code to create the thread pool outside the filter and run this all again.
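For reference, a minimal sketch of what a pool created once, outside any filter, might look like. It uses only the C++ standard library (std::thread and std::condition_variable), not ZThread or the ITK MultiThreader, and the names SimplePool, Execute and Wait are hypothetical, chosen for illustration only. Idle workers sleep on a condition variable, so thread creation is paid once rather than per filter execution:

```cpp
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical persistent pool (not ZThread, not ITK). Workers are created
// once in the constructor and reused for every batch of jobs.
class SimplePool
{
public:
  explicit SimplePool(std::size_t numThreads)
  {
    for (std::size_t i = 0; i < numThreads; ++i)
    {
      m_Workers.emplace_back([this] { this->WorkerLoop(); });
    }
  }

  ~SimplePool()
  {
    {
      std::lock_guard<std::mutex> lock(m_Mutex);
      m_Stopping = true;
    }
    m_WorkAvailable.notify_all();
    for (auto& worker : m_Workers)
    {
      worker.join();
    }
  }

  // Queue one job and wake a single idle worker.
  void Execute(std::function<void()> job)
  {
    {
      std::lock_guard<std::mutex> lock(m_Mutex);
      m_Jobs.push(std::move(job));
      ++m_Pending;
    }
    m_WorkAvailable.notify_one();
  }

  // Block until every job queued so far has finished.
  void Wait()
  {
    std::unique_lock<std::mutex> lock(m_Mutex);
    m_AllDone.wait(lock, [this] { return m_Pending == 0; });
  }

private:
  void WorkerLoop()
  {
    for (;;)
    {
      std::function<void()> job;
      {
        std::unique_lock<std::mutex> lock(m_Mutex);
        m_WorkAvailable.wait(lock, [this] { return m_Stopping || !m_Jobs.empty(); });
        if (m_Jobs.empty())
        {
          return;  // stopping, and the queue has been drained
        }
        job = std::move(m_Jobs.front());
        m_Jobs.pop();
      }
      job();  // a job that throws would terminate the program in this sketch
      {
        std::lock_guard<std::mutex> lock(m_Mutex);
        if (--m_Pending == 0)
        {
          m_AllDone.notify_all();
        }
      }
    }
  }

  std::mutex                        m_Mutex;
  std::condition_variable           m_WorkAvailable;
  std::condition_variable           m_AllDone;
  std::queue<std::function<void()>> m_Jobs;
  std::size_t                       m_Pending = 0;
  bool                              m_Stopping = false;
  std::vector<std::thread>          m_Workers;
};
```

A filter would then call Execute() once per chunk and Wait() before returning, and the same pool object could serve many filter executions. The pool must be built with at least one thread, otherwise Wait() never returns once a job is queued.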
-dan
-----Original Message-----
From: Gaëtan Lehmann [mailto:gaetan.lehmann at jouy.inra.fr]
Sent: Friday, September 07, 2007 3:13 PM
To: Blezek, Daniel J (GE, Research)
Subject: Re: [Insight-developers] Multi-threading strategies
Hi Dan,
The execution times are in seconds?
If so, can you tell us how you measured the execution times of the median filters? The result with 16 chunks is really surprising and, from my (limited) experience in measuring execution times with ITK, can't be explained by the overhead of thread management alone.
It would also be interesting to have the execution time of a filter that does nothing but create the threads (by implementing an empty ThreadedGenerateData(), for example).
To have longer execution time, you can simply run the median with a bigger radius - the execution times should increase dramatically :-)
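Outside of ITK, the same idea can be sketched with the standard library alone: spawn threads with empty bodies, join them, and time the whole scatter/gather. This is only an analogue of an empty ThreadedGenerateData(), not ITK code, and the function names below are hypothetical:

```cpp
#include <chrono>
#include <thread>
#include <vector>

// Hypothetical helper (not part of ITK): create `numThreads` threads that do
// nothing, join them all, and return the elapsed wall-clock time in seconds.
double TimeEmptyScatterGather(unsigned int numThreads)
{
  const auto start = std::chrono::steady_clock::now();

  std::vector<std::thread> threads;
  threads.reserve(numThreads);
  for (unsigned int i = 0; i < numThreads; ++i)
  {
    threads.emplace_back([] {});  // empty body: only creation/join cost remains
  }
  for (auto& t : threads)
  {
    t.join();
  }

  const std::chrono::duration<double> elapsed =
    std::chrono::steady_clock::now() - start;
  return elapsed.count();
}

// Average over several repetitions, since a single run is too short to
// measure reliably. Returns 0 when `repetitions` is 0.
double MeanEmptyScatterGather(unsigned int numThreads, unsigned int repetitions)
{
  double total = 0.0;
  for (unsigned int r = 0; r < repetitions; ++r)
  {
    total += TimeEmptyScatterGather(numThreads);
  }
  return repetitions > 0 ? total / repetitions : 0.0;
}
```

Calling MeanEmptyScatterGather(8, 1000) and MeanEmptyScatterGather(16, 1000) on the 8 CPU machine would give a rough baseline to subtract from the filter timings.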
Gaëtan
On 7 Sept. 07 at 20:47, Blezek, Daniel J (GE, Research) wrote:
> Hi all,
>
> I've done some looking around and found the ZThread library
> (http://zthread.sourceforge.net/index.html and
> http://sourceforge.net/projects/zthread). It's cross-platform and
> purports to compile on Linux and Windows, but I only tried Linux.
> The library has many
> constructs for threading including a thread pool execution model where
> you state how many threads you'd like and then feed it jobs. I
> replaced the GenerateData in the Median filter with ZThread library
> calls and ran some tests on 2 CPU and 8 CPU Linux boxes, running
> RedHat. I also varied the number of chunks each filter was divided
> into. ITK uses the number of threads to split the work.
>
> The reports below compare the ZThread (MedianZ) with the regular ITK
> thread model (Median).
>
> 8 CPU, 8 chunks
> Probe Tag    Starts    Stops    Time
> Median       1         1        0.373023109044879674911499023438
> MedianZ      1         1        0.410052934079430997371673583984
>
> 2 CPU, 2 chunks
> Probe Tag    Starts    Stops    Time
> Median       1         1        2.50991311680991202592849731445
> MedianZ      1         1        2.42412604950368404388427734375
>
> 8 CPU, 16 chunks
> Probe Tag    Starts    Stops    Time
> Median       1         1        0.412385921692475676536560058594
> MedianZ      1         1        2.42693911609239876270294189453
>
> 2 CPU, 4 chunks
> Probe Tag    Starts    Stops    Time
> Median       1         1        3.93622599844820797443389892578
> MedianZ      1         1        4.21256111224647611379623413086
>
>
> I think the 8 CPU, 16 chunks is a bit skewed, as the jobs are short
> enough that thread synchronization really slows everything down a bit.
> I imagine 8 way overhead is a bit higher than 2 way. On the
> 2 CPU machine, the overhead was minimal.
>
> The Median image filter is a bad example as it runs so quickly:
> suggestions for a better test are welcome.
>
> Here's the relevant code from my testing; I can include all of it for
> interested parties. There is very little change from itkImageSource's
> implementation. In this case, I create the threads inside the filter,
> so thread creation is part of the overhead. In practice they would be
> in a global accessible pool to be used by all executing filters.
>
> Comments welcome,
> -dan
>
>
> //----------------------------------------------------------------------------
> template< class TInputImage, class TOutputImage >
> void
> MedianZThreadImageFilter<TInputImage, TOutputImage>
> ::GenerateData()
> {
>   // Call a method that can be overridden by a subclass to allocate
>   // memory for the filter's outputs
>   this->AllocateOutputs();
>
>   // Call a method that can be overridden by a subclass to perform
>   // some calculations prior to splitting the main computations into
>   // separate threads
>   this->BeforeThreadedGenerateData();
>
>   // Do this with ZThreads: one pool thread per ITK thread,
>   // and twice as many pieces of work as threads
>   ZThread::PoolExecutor executor(this->GetMultiThreader()->GetNumberOfThreads());
>   typename TOutputImage::RegionType splitRegion;
>   int NumberOfPieces = 2 * this->GetMultiThreader()->GetNumberOfThreads();
>   try
>     {
>     for ( int i = 0; i < NumberOfPieces; i++ )
>       {
>       // One runnable per piece, carrying its region and a pointer to the filter
>       ZThreadStruct* s = new ZThreadStruct();
>       s->threadId = i;
>       s->Filter = this;
>       this->SplitRequestedRegion(s->threadId, NumberOfPieces, splitRegion);
>       s->region = splitRegion;
>       executor.execute ( s );
>       }
>     // Let it all finish
>     executor.wait();
>     }
>   catch ( ZThread::Synchronization_Exception &e )
>     {
>     itkGenericExceptionMacro ( << "Error adding runnable to executor: " << e.what() );
>     }
>
>   // Call a method that can be overridden by a subclass to perform
>   // some calculations after all the threads have completed
>   this->AfterThreadedGenerateData();
> }
>
>
>
> -----Original Message-----
> From: insight-developers-bounces+blezek=crd.ge.com at itk.org
> [mailto:insight-developers-bounces+blezek=crd.ge.com at itk.org] On
> Behalf Of Torsten Rohlfing
>
> Sent: Saturday, July 28, 2007 12:32 PM
> To: insight-developers at itk.org
> Subject: [Insight-developers] Multi-threading strategies
>
> Hi --
>
> I think you need to consider also that there's a cost to suspending
> and re-activating a thread. Do you know how you're going to do it?
> I assume a condition variable or something?
>
> From my personal experience, I can say that I considered this option
> once over creating new threads, and I tried it to some extent, but it
> did not lead to any tangible benefit using pthreads on Linux.
> Basically, the cost of using the condition variable with the added
> complexity of the implementation completely eliminated any benefit
> from avoiding thread creation and joining. There may of course be
> differences depending on your platform and the efficiency of its
> threads implementation.
>
> Which certainly still leaves the one advantage that by keeping threads
> around you avoid those incredibly annoying thread creation/
> annihilation messages in gdb ;)
>
> Cheers!
> Torsten
>
> > That is definitely the preferred method...go for it! :)
> >
> > Stephen
> >
> > Blezek, Daniel J (GE, Research) wrote:
> > > Hi all,
> > >
> > > I was debugging a multi-threaded registration metric today, and gdb
> > > nicely prints thread creation/destruction messages. In our current
> > > MultiThreader, pthreads are created/joined in the scatter/gather
> > > pattern. For a general filter, this isn't likely to be a problem,
> > > 'cause it executes only once (in general). For optimization metrics, it
> > > may be called thousands of times, leading to a huge number of pthreads
> > > created/joined. Is this efficient? Would it be worth while to
> > > investigate keeping threads around, rather than joining them? They
> > > could simply sit idle until they have something to do... This would
> > > reduce overhead, but may add complexity, but we only need to get it
> > > right once...
> > >
> > > Stephen Aylward: any comments?
>
> --
> Torsten Rohlfing, PhD SRI International, Neuroscience Program
> Research Scientist 333 Ravenswood Ave, Menlo Park, CA
> 94025
> Phone: ++1 (650) 859-3379 Fax: ++1 (650) 859-2743
> torsten at synapse.sri.com http://www.stanford.edu/~rohlfing/
>
> "Though this be madness, yet there is a method in't"
>
> _______________________________________________
> Insight-developers mailing list
> Insight-developers at itk.org
> http://www.itk.org/mailman/listinfo/insight-developers
--
Gaëtan Lehmann
Biologie du Développement et de la Reproduction INRA de Jouy-en-Josas (France)
tel: +33 1 34 65 29 66 fax: 01 34 65 29 09
http://voxel.jouy.inra.fr