[Insight-developers] Threading image metrics
Karthik Krishnan
Karthik.Krishnan at kitware.com
Mon Mar 14 10:49:42 EST 2005
Dear Jim,
Thank you for your quick reply.
I cannot offer a reasonable explanation for the behaviour, but if you
try the version checked into CVS now and compare it to rev 1.5, it does
offer a 15-20% speedup on linuxgcc335. My recollection is that it did on
Windows too (VS7).
The current revision does pretty much what you've described, since the
ivars m_ThreadMatches[threadId] and m_ThreadCounts[threadId] are
accessed only once per call to ThreadedGetValue, which is called
GetNumberOfThreads() times per metric evaluation.
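
For reference, the pattern being discussed (accumulate in locals, write
each per-thread result once, then roll up across threads) can be
sketched roughly as below. This is only an illustration, not the ITK
code: std::thread and plain vectors stand in for the ITK multithreader
and image iterators, and the names ThreadedCountMatches and
MatchFraction are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical stand-in for ThreadedGetValue: scans one slice of the data,
// accumulates into locals, and writes the two outputs exactly once.
void ThreadedCountMatches(const std::vector<int>& fixedValues,
                          const std::vector<int>& movingValues,
                          std::size_t begin, std::size_t end,
                          unsigned long& count, double& metric)
{
  unsigned long localCount = 0;
  double localMetric = 0.0;
  for (std::size_t i = begin; i < end; ++i)
  {
    ++localCount;
    localMetric += (fixedValues[i] == movingValues[i]) ? 1.0 : 0.0;
  }
  count = localCount;    // single write per call
  metric = localMetric;  // single write per call
}

// Hypothetical driver: one slot per thread, rolled up after all threads join.
double MatchFraction(const std::vector<int>& fixedValues,
                     const std::vector<int>& movingValues,
                     unsigned int numberOfThreads)
{
  assert(numberOfThreads > 0);
  assert(movingValues.size() >= fixedValues.size());

  std::vector<unsigned long> threadCounts(numberOfThreads, 0);
  std::vector<double> threadMatches(numberOfThreads, 0.0);
  std::vector<std::thread> workers;

  const std::size_t n = fixedValues.size();
  for (unsigned int t = 0; t < numberOfThreads; ++t)
  {
    const std::size_t begin = n * t / numberOfThreads;
    const std::size_t end = n * (t + 1) / numberOfThreads;
    workers.emplace_back(ThreadedCountMatches,
                         std::cref(fixedValues), std::cref(movingValues),
                         begin, end,
                         std::ref(threadCounts[t]), std::ref(threadMatches[t]));
  }
  for (std::thread& worker : workers)
  {
    worker.join();
  }

  // Roll up the per-thread results on the calling thread.
  unsigned long totalCount = 0;
  double totalMatches = 0.0;
  for (unsigned int t = 0; t < numberOfThreads; ++t)
  {
    totalCount += threadCounts[t];
    totalMatches += threadMatches[t];
  }
  return totalCount > 0 ? totalMatches / totalCount : 0.0;
}
```

Nothing here is shared between threads while the loops run; the only
cross-thread reads happen after join(), in the roll-up.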
It's still puzzling because although the ivars m_ThreadCounts are shared
data, they aren't mutable, so I don't know why it would slow the threads
down. But that was the only conclusion I could derive after setting
several time probes.
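
One possible reading, which nothing in this thread confirms, is that
neighbouring per-thread slots share a cache line, so a write by one
thread forces the other threads to reload that line ("false sharing").
If that is the cause, padding each slot to its own cache line would be
an alternative to using locals. The sketch below only shows the layout;
the 64-byte line size, the PaddedCount type and CountInParallel are all
assumptions made for illustration, and it needs C++17 so that the
vector honours the alignment.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Assumed cache-line size; 64 bytes is common on x86 but not guaranteed.
constexpr std::size_t kAssumedCacheLine = 64;

// Hypothetical per-thread slot, padded so that two slots never share a line.
struct alignas(kAssumedCacheLine) PaddedCount
{
  unsigned long value = 0;
};
static_assert(sizeof(PaddedCount) == kAssumedCacheLine,
              "each slot should occupy exactly one assumed cache line");

// Each thread increments only its own padded slot; the total is summed
// on the calling thread after all workers have joined.
unsigned long CountInParallel(unsigned int numberOfThreads,
                              unsigned long incrementsPerThread)
{
  assert(numberOfThreads > 0);
  std::vector<PaddedCount> counts(numberOfThreads);
  std::vector<std::thread> workers;

  for (unsigned int t = 0; t < numberOfThreads; ++t)
  {
    workers.emplace_back([&counts, t, incrementsPerThread]
    {
      for (unsigned long i = 0; i < incrementsPerThread; ++i)
      {
        ++counts[t].value;
      }
    });
  }
  for (std::thread& worker : workers)
  {
    worker.join();
  }

  unsigned long total = 0;
  for (const PaddedCount& slot : counts)
  {
    total += slot.value;
  }
  return total;
}
```

Whether this helps would have to be measured with time probes on the
machines in question; the local-variable version avoids the question
entirely.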
Thanks
regards
karthik
Miller, James V (Research) wrote:
> Karthik,
>
> This problem still befuddles me. The MatchCardinality did run slower
> on some systems when threaded. On other systems, multithreaded had
> its expected benefit, near linear speedup in the number of processors.
>
> Is your m_NumberOfPixelsCounted a local variable? If not, you could
> have multiple threads attempting to write to the same memory.
>
> Another option to try is to change the function declaration for
> ThreadedGetValue to be
>
> ::ThreadedGetValue( const FixedImageRegionType &regionForThread,
> int threadId, unsigned long &count, double &metric )
> where count and metric would take the place of
> m_ThreadCounts[threadId] and m_ThreadMatches[threadId], respectively.
> This just moves the problem a bit. You still need to roll up
> the counts and metric from across the threads.
>
> Jim
>
>
>
>
> -----Original Message-----
> *From:* insight-developers-bounces at itk.org
> [mailto:insight-developers-bounces at itk.org]*On Behalf Of *Karthik
> Krishnan
> *Sent:* Monday, March 14, 2005 2:29 AM
> *To:* Insight-developers (E-mail)
> *Subject:* [Insight-developers] Threading image metrics
>
> Hi,
>
> I have been trying to thread the MeanSquares metric in an attempt
> to get registration to run faster on my dual processor machine.
> After following an architecture similar to the one in
> itkMatchCardinalityImageToImageMetric.txx, I was able to thread it
> and happily both processors were used. However this did not speed
> up execution times. In fact threading slowed it down by 2%.
>
> Surprised, I decided to go back to the MatchCardinality metric. A
> Google search turned up the following post:
> http://www.itk.org/pipermail/insight-users/2004-May/008553.html
> That metric also ran 2% slower when threaded, both on linuxgcc335
> (-O2 -g build) and on Debug builds in VS7.
>
> It turns out that using m_ThreadMatches[threadId] and
> m_ThreadCounts[threadId] variables within the iterator in the
> ThreadedGetValue function introduces something akin to a race
> condition. Here is the hypothesis: m_ThreadMatches/Counts are
> ivars whose per-thread elements are contiguous in memory. Moving them
> outside the iterator loop and using local vars within it seems to help.
>
> After the changes, the threaded version runs faster on VS7 and
> linuxgcc335. Still, it's just a 17% reduction in time on
> linuxgcc335. (There don't seem to be any additional mutexes.) I was
> wondering if there were any followups on the mail thread above.
>
> Thanks
> Regards
> Karthik
>
> ::ThreadedGetValue( const FixedImageRegionType &regionForThread,
> int threadId )
> {
> itk::TimeProbe MultiThreadClk1;
> MultiThreadClk1.Start();
>
> //m_ThreadMatches[threadId] = NumericTraits< MeasureType >::Zero;
> //m_ThreadCounts[threadId] = 0;
> MeasureType measure = NumericTraits< MeasureType >::Zero;
> m_NumberOfPixelsCounted = 0;
>
> while(!ti.IsAtEnd())
> {
> index = ti.GetIndex();
>
> typename Superclass::InputPointType inputPoint;
> fixedImage->TransformIndexToPhysicalPoint( index, inputPoint );
>
> if( this->GetFixedImageMask() &&
> !this->GetFixedImageMask()->IsInside( inputPoint ) )
> {
> ++ti;
> continue;
> }
>
> typename Superclass::OutputPointType
> transformedPoint = this->GetTransform()->TransformPoint(
> inputPoint );
>
> if( this->GetMovingImageMask() &&
> !this->GetMovingImageMask()->IsInside( transformedPoint ) )
> {
> ++ti;
> continue;
> }
>
> if( this->GetInterpolator()->IsInsideBuffer( transformedPoint ) )
> {
> const RealType movingValue=
> this->GetInterpolator()->Evaluate( transformedPoint );
> const RealType fixedValue = ti.Get();
> RealType diff;
>
> //m_ThreadCounts[threadId]++;
> m_NumberOfPixelsCounted++;
>
> if (m_MeasureMatches)
> {
> diff = (movingValue == fixedValue); // count matches
> }
> else
> {
> diff = (movingValue != fixedValue); // count mismatches
> }
> //m_ThreadMatches[threadId] += diff;
> measure += diff;
> }
>
> ++ti;
> }
>
> m_ThreadMatches[threadId] = measure;
> m_ThreadCounts[threadId] = m_NumberOfPixelsCounted;
>
>
> MultiThreadClk1.Stop();
> std::cout << " [ Time taken by Function ThreadedGetValue() thread("
> << threadId << ")" << MultiThreadClk1.GetMeanTime()
> << " seconds ]" << std::endl;
> }
>
>
>
>
>