[Insight-developers] Multi-threading strategies

Mon Sep 10 16:06:38 EDT 2007

Stephen,

That is a good point. Maybe it's inactive because they mostly solved the
problem and have a STABLE API.

Bill

On 9/10/07, Stephen R. Aylward <Stephen.Aylward at kitware.com> wrote:
>
> More background...
>
> We have been wanting to evaluate methods for using thread pools in ITK.
>   In particular, for registration metric computation, thread creation
> and destruction seems to induce an overhead - particularly for 32+
> processor systems and during optimization with millions of calls to the
> metric.  Our hope was that thread pools would reduce that overhead.
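
A rough sketch of the thread-pool idea described above, written with standard C++11 threads rather than ZThread or ITK's MultiThreader. SimplePool, Execute, Wait and SumWithPool are hypothetical names invented for this illustration; none of them come from ITK or ZThread. The only point is that the workers are created once and then reused across many metric-style calls:

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical minimal pool: threads are created once and reused for every job.
class SimplePool {
public:
  explicit SimplePool(std::size_t numThreads) {
    assert(numThreads > 0);
    for (std::size_t i = 0; i < numThreads; ++i) {
      workers_.emplace_back([this] { this->WorkerLoop(); });
    }
  }

  ~SimplePool() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      stopping_ = true;
    }
    wake_.notify_all();
    for (std::thread& t : workers_) t.join();
  }

  SimplePool(const SimplePool&) = delete;
  SimplePool& operator=(const SimplePool&) = delete;

  // Queue one job; an idle worker picks it up.
  void Execute(std::function<void()> job) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      jobs_.push(std::move(job));
      ++pending_;
    }
    wake_.notify_one();
  }

  // Block until every job queued so far has finished.
  void Wait() {
    std::unique_lock<std::mutex> lock(mutex_);
    idle_.wait(lock, [this] { return pending_ == 0; });
  }

private:
  void WorkerLoop() {
    for (;;) {
      std::function<void()> job;
      {
        std::unique_lock<std::mutex> lock(mutex_);
        wake_.wait(lock, [this] { return stopping_ || !jobs_.empty(); });
        if (jobs_.empty()) return;  // stopping, and nothing left to run
        job = std::move(jobs_.front());
        jobs_.pop();
      }
      job();
      {
        std::lock_guard<std::mutex> lock(mutex_);
        if (--pending_ == 0) idle_.notify_all();
      }
    }
  }

  std::mutex mutex_;
  std::condition_variable wake_;  // signals workers that a job is available
  std::condition_variable idle_;  // signals Wait() that all jobs are done
  std::queue<std::function<void()>> jobs_;
  std::size_t pending_ = 0;
  bool stopping_ = false;
  std::vector<std::thread> workers_;
};

// Simulate `calls` metric evaluations, each split into `pieces` jobs,
// all running on the same long-lived pool.
long SumWithPool(SimplePool& pool, int calls, int pieces) {
  std::atomic<long> total{0};
  for (int c = 0; c < calls; ++c) {
    for (int p = 0; p < pieces; ++p) {
      pool.Execute([&total, p] { total += p; });
    }
    pool.Wait();
  }
  return total.load();
}
```

Whether this actually beats create/join per call depends on the platform's thread implementation and on the cost of the condition-variable wake-ups, so it would need to be measured rather than assumed.
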
>
> Dan independently found zthreads and began evaluating it.  For the sake
> of evaluation of the technology, and if the technology proves useful, it
> is probably better for us to begin with an inactive project (if it does
> what we want) rather than reinventing it ourselves.
>
> Of course, I have much to learn about zthreads particularities - it may
> end up being a bad starting point...I think it is still a good direction
> to pursue...we are waiting for our funding to start and then we'll jump
> back into the foray...
>
> Stephen
>
> Bill Lorensen wrote:
> > But, looking at the sourceforge site, the zthreads project appears to be
> > inactive. And questions in the forums about porting seem to go
> > unanswered.
> >
> > Bill
> >
> >
> > On 9/10/07, Stephen R. Aylward <Stephen.Aylward at kitware.com> wrote:
> >
> >     Hi Dan,
> >
> >     I agree - I wouldn't expect the thread pool to pay off when
> >     processing a single filter - the concept of a pool pays off when
> >     processing a sequence of filters that would otherwise involve
> >     multiple thread creations and destructions.
> >
> >     Even if zthreads doesn't pay off much for the main ITK pipeline (the
> >     improvement may only be minor for the 2-3 filter pipelines that are
> >     commonly used in ITK programs), I still think we should strongly
> >     consider it since specialized (i.e., tailored, within-filter)
> >     multi-threading is needed for deformable registration, DTI fiber
> >     tracking, registration metric computation, etc.
> >
> >     Stephen
> >
> >
> >
> >     Blezek, Daniel J (GE, Research) wrote:
> >      > Hi Gaëtan,
> >      >
> >      > I used an ITK Time Probe Collector, which I think reports in
> >      > seconds. I'm a little surprised at the 16 chunk results, and don't
> >      > trust them. I'll try the empty filter, but I think it will be very
> >      > hard to time; perhaps a profiled run would be more helpful. I'll
> >      > also post results of the bigger radius (didn't occur to me at the
> >      > time).
> >      >
> >      > To answer Bill's question: I don't think we can conclusively say
> >      > that ZThreads are slower. They seem to be on par with the ITK
> >      > version, but I created the thread pool inside the filter, rather
> >      > than a global pool. I'll refactor the code to create the thread
> >      > pool outside the filter and run this all again.
> >      >
> >      > -dan
> >      >
> >      > -----Original Message-----
> >      > From: Gaëtan Lehmann [mailto:gaetan.lehmann at jouy.inra.fr]
> >      > Sent: Friday, September 07, 2007 3:13 PM
> >      > To: Blezek, Daniel J (GE, Research)
> >      > Subject: Re: [Insight-developers] Multi-threading strategies
> >      >
> >      >
> >      > Hi Dan,
> >      >
> >      > Are the execution times in seconds? If yes, can you tell us how
> >      > you measured the execution times of the median filters? The
> >      > result with 16 chunks is really surprising, and, from my (small)
> >      > experience in measuring execution times with ITK, can't be
> >      > explained only by the overhead of the thread management.
> >      >
> >      > It would also be interesting to have the execution time of a
> >      > filter which does nothing other than create the threads (by
> >      > implementing an empty ThreadedGenerateData(), for example).
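
The "filter that does nothing but create threads" measurement can be approximated outside ITK. The sketch below is an assumption-laden stand-in: it uses std::thread and std::chrono instead of ITK's MultiThreader and TimeProbe, and TimeEmptyScatterGather is a made-up name. It times repeated scatter/gather rounds whose threads have an empty body, so whatever it reports is pure creation and join overhead:

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <thread>
#include <vector>

// Run `rounds` scatter/gather cycles of `numThreads` threads that do nothing,
// and return the elapsed wall-clock time in seconds.
double TimeEmptyScatterGather(int rounds, int numThreads) {
  assert(rounds >= 0 && numThreads > 0);
  const auto start = std::chrono::steady_clock::now();
  for (int r = 0; r < rounds; ++r) {
    std::vector<std::thread> threads;
    threads.reserve(static_cast<std::size_t>(numThreads));
    for (int t = 0; t < numThreads; ++t) {
      threads.emplace_back([] {});  // empty body: only creation cost remains
    }
    for (std::thread& th : threads) th.join();  // gather
  }
  const std::chrono::duration<double> elapsed =
      std::chrono::steady_clock::now() - start;
  return elapsed.count();
}
```

Calling it with, say, a few thousand rounds and the machine's CPU count gives a baseline to subtract from the filter timings; the absolute numbers will vary by platform and should not be read as ITK's own overhead.
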
> >      >
> >      > To get longer execution times, you can simply run the median with
> >      > a bigger radius - the execution times should increase dramatically
> >      > :-)
> >      >
> >      > Gaëtan
> >      >
> >      >
> >      >
> >      > On 7 Sept. 07 at 20:47, Blezek, Daniel J (GE, Research) wrote:
> >      >
> >      >> Hi all,
> >      >>
> >      >> I've done some looking around and found the ZThread library
> >      >> (http://zthread.sourceforge.net/index.html &
> >      >> http://sourceforge.net/projects/zthread).  It's cross-platform
> >      >> and purports to compile on Linux and Windows, but I only tried
> >      >> Linux.  The library has many constructs for threading, including
> >      >> a thread pool execution model where you state how many threads
> >      >> you'd like and then feed it jobs.  I replaced the GenerateData in
> >      >> the Median filter with ZThread library calls and ran some tests
> >      >> on 2-CPU and 8-CPU Linux boxes running RedHat.  I also varied the
> >      >> number of chunks each filter was divided into.  ITK uses the
> >      >> number of threads to split the work.
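
To make the "chunks" idea concrete, here is a simplified sketch of dividing an image's rows into contiguous pieces. It is not ITK's SplitRequestedRegion code; RowRange and SplitRows are hypothetical names, and the image is reduced to a row count for brevity. It shows why asking for more pieces than there is work to divide yields fewer pieces than requested:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// A contiguous block of rows: [begin, begin + size).
struct RowRange {
  int begin;
  int size;
};

// Split `numRows` rows into at most `numPieces` contiguous ranges.
// Every range but the last has ceil(numRows / numPieces) rows.
std::vector<RowRange> SplitRows(int numRows, int numPieces) {
  assert(numRows >= 0 && numPieces > 0);
  std::vector<RowRange> pieces;
  if (numRows == 0) return pieces;
  const int rowsPerPiece = (numRows + numPieces - 1) / numPieces;  // ceiling
  for (int begin = 0; begin < numRows; begin += rowsPerPiece) {
    const int size = std::min(rowsPerPiece, numRows - begin);
    pieces.push_back({begin, size});
  }
  return pieces;
}
```

With a pool, the number of pieces no longer has to equal the number of threads, which is what the 16-chunk and 4-chunk runs below are exploring.
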
> >      >>
> >      >> The reports below compare the ZThread (MedianZ) with the regular
> >      >> ITK thread model (Median).
> >      >>
> >      >> 8 CPU, 8 chunks
> >      >> Probe Tag    Starts    Stops    Time
> >      >> Median            1        1    0.373023109044879674911499023438
> >      >> MedianZ           1        1    0.410052934079430997371673583984
> >      >>
> >      >> 2 CPU, 2 chunks
> >      >> Probe Tag    Starts    Stops    Time
> >      >> Median            1        1    2.50991311680991202592849731445
> >      >> MedianZ           1        1    2.42412604950368404388427734375
> >      >>
> >      >> 8 CPU, 16 chunks
> >      >> Probe Tag    Starts    Stops    Time
> >      >> Median            1        1    0.412385921692475676536560058594
> >      >> MedianZ           1        1    2.42693911609239876270294189453
> >      >>
> >      >> 2 CPU, 4 chunks
> >      >> Probe Tag    Starts    Stops    Time
> >      >> Median            1        1    3.93622599844820797443389892578
> >      >> MedianZ           1        1    4.21256111224647611379623413086
> >      >>
> >      >>
> >      >> I think the 8 CPU, 16 chunks result is a bit skewed, as the jobs
> >      >> are short enough that thread synchronization really slows
> >      >> everything down a bit. I imagine 8-way overhead is a bit higher
> >      >> than 2-way.  On the 2 CPU machine, the overhead was minimal.
> >      >>
> >      >> The Median image filter is a bad example as it runs so quickly:
> >      >> suggestions for a better test are welcome.
> >      >>
> >      >> Here's the relevant code from my testing; I can include all of it
> >      >> for interested parties.  There is very little change from
> >      >> itkImageSource's implementation.  In this case, I create the
> >      >> threads inside the filter, so thread creation is part of the
> >      >> overhead.  In practice they would be in a globally accessible
> >      >> pool to be used by all executing filters.
> >      >>
> >      >> Comments welcome, -dan
> >      >>
> >      >>
> >      >>
> >
> >      >> //----------------------------------------------------------------------------
> >      >> template< class TInputImage, class TOutputImage >
> >      >> void
> >      >> MedianZThreadImageFilter<TInputImage, TOutputImage>
> >      >> ::GenerateData()
> >      >> {
> >      >>   // Call a method that can be overridden by a subclass to allocate
> >      >>   // memory for the filter's outputs
> >      >>   this->AllocateOutputs();
> >      >>
> >      >>   // Call a method that can be overridden by a subclass to perform
> >      >>   // some calculations prior to splitting the main computations into
> >      >>   // separate threads
> >      >>   this->BeforeThreadedGenerateData();
> >      >>
> >      >>   // Do this with ZThread's
> >      >>   ZThread::PoolExecutor executor(
> >      >>     this->GetMultiThreader()->GetNumberOfThreads());
> >      >>   typename TOutputImage::RegionType splitRegion;
> >      >>   int NumberOfPieces = 2 * this->GetMultiThreader()->GetNumberOfThreads();
> >      >>   try
> >      >>     {
> >      >>     for ( int i = 0; i < NumberOfPieces; i++ )
> >      >>       {
> >      >>       ZThreadStruct* s = new ZThreadStruct();
> >      >>       s->threadId = i;
> >      >>       s->Filter = this;
> >      >>       this->SplitRequestedRegion(s->threadId, NumberOfPieces, splitRegion);
> >      >>       s->region = splitRegion;
> >      >>       executor.execute ( s );
> >      >>       }
> >      >>     // Let it all finish
> >      >>     executor.wait();
> >      >>     }
> >      >>   catch ( ZThread::Synchronization_Exception &e )
> >      >>     {
> >      >>     itkGenericExceptionMacro (
> >      >>       << "Error adding runnable to executor: " << e.what() );
> >      >>     }
> >      >>
> >      >>   // Call a method that can be overridden by a subclass to perform
> >      >>   // some calculations after all the threads have completed
> >      >>   this->AfterThreadedGenerateData();
> >      >>
> >      >> }
> >      >>
> >      >>
> >      >>
> >      >> -----Original Message-----
> >      >> From: insight-developers-bounces+blezek=crd.ge.com at itk.org
> >      >> [mailto:insight-developers-bounces+blezek=crd.ge.com at itk.org]
> >      >> On Behalf Of Torsten Rohlfing
> >      >> Sent: Saturday, July 28, 2007 12:32 PM
> >      >> To: insight-developers at itk.org
> >      >> Subject: [Insight-developers] Multi-threading strategies
> >      >>
> >      >> Hi --
> >      >>
> >      >> I think you need to consider also that there's a cost to
> >      >> suspending and re-activating a thread. Do you know how you're
> >      >> going to do it? I assume a condition variable or something?
> >      >>
> >      >> From my personal experience, I can say that I considered this
> >      >> option once over creating new threads, and I tried it to some
> >      >> extent, but it did not lead to any tangible benefit using
> >      >> pthreads on Linux. Basically, the cost of using the condition
> >      >> variable with the added complexity of the implementation
> >      >> completely eliminated any benefit from avoiding thread creation
> >      >> and joining. There may of course be differences depending on
> >      >> your platform and the efficiency of its threads implementation.
> >      >>
> >      >> Which certainly still leaves the one advantage that by keeping
> >      >> threads around you avoid those incredibly annoying thread
> >      >> creation/annihilation messages in gdb ;)
> >      >>
> >      >> Cheers! Torsten
> >      >>
> >      >>> That is definitely the preferred method...go for it! :)
> >      >>>
> >      >>> Stephen
> >      >>>
> >      >>> Blezek, Daniel J (GE, Research) wrote:
> >      >>>> Hi all,
> >      >>>>
> >      >>>>   I was debugging a multi-threaded registration metric today,
> >      >>>> and gdb nicely prints thread creation/destruction messages.
> >      >>>> In our current MultiThreader, pthreads are created/joined in
> >      >>>> the scatter/gather pattern.  For a general filter, this isn't
> >      >>>> likely to be a problem, 'cause it executes only once (in
> >      >>>> general).  For optimization metrics, it may be called
> >      >>>> thousands of times, leading to a huge number of pthreads
> >      >>>> created/joined.  Is this efficient? Would it be worth while to
> >      >>>> investigate keeping threads around, rather than joining them?
> >      >>>> They could simply sit idle until they have something to do...
> >      >>>> This would reduce overhead, but may add complexity, but we
> >      >>>> only need to get it right once...
> >      >>>>
> >      >>>>   Stephen Aylward: any comments?
> >      >>>>
> >      >> --
> >      >> Torsten Rohlfing, PhD          SRI International, Neuroscience Program
> >      >> Research Scientist             333 Ravenswood Ave, Menlo Park, CA 94025
> >      >> Phone: ++1 (650) 859-3379      Fax: ++1 (650) 859-2743
> >      >> torsten at synapse.sri.com
> >      >> http://www.stanford.edu/~rohlfing/
> >      >>
> >      >> "Though this be madness, yet there is a method in't"
> >      >>
> >      >> _______________________________________________
> >      >> Insight-developers mailing list
> >      >> Insight-developers at itk.org
> >      >> http://www.itk.org/mailman/listinfo/insight-developers
> >      >
> >      > --
> >      > Gaëtan Lehmann
> >      > Biologie du Développement et de la Reproduction
> >      > INRA de Jouy-en-Josas (France)
> >      > tel: +33 1 34 65 29 66    fax: 01 34 65 29 09
> >      > http://voxel.jouy.inra.fr
> >      >
> >      >
> >      >
> >      >
> >
> >     --
> >     =============================================================
> >     Stephen R. Aylward, Ph.D.
> >     Chief Medical Scientist
> >     Kitware, Inc. - Chapel Hill Office
> >     http://www.kitware.com
> >     Phone: (518)371-3971 x300
> >
> >
>
> --
> =============================================================
> Stephen R. Aylward, Ph.D.
> Chief Medical Scientist
> Kitware, Inc. - Chapel Hill Office
> http://www.kitware.com
> Phone: (518)371-3971 x300
>