[Insight-developers] itk performance numbers

Thu Jul 26 11:53:32 EDT 2012

Hello,

Well, I did get to it before you:

http://review.source.kitware.com/#/c/6614/

I also upped the size of the image 100x in your test. Here is the current performance on my system:

System: victoria.nlm.nih.gov
Processor: Intel(R) Xeon(R) CPU           X5670  @ 2.93GHz
 Serial #: 
    Cache: 32768
    Clock: 2794.27
    Cores: 12 cpus x 24 Cores = 288
OSName:     Mac OS X
  Release:  10.6.8
  Version:  10K549
  Platform: x86_64
  Operating System is 64 bit
ITK Version: 3.20.1
Virtual Memory: Total: 256 Available: 228
Physical Memory: Total:65536 Available: 58374
           Probe Name:        Count          Min           Mean         Stdev            Max        Total 
 MeanSquares_1_threads            20      0.344348      0.347567    0.00244733      0.352629       6.95134
 MeanSquares_2_threads            20      0.251223      0.300869     0.0179305      0.321404       6.01738
 MeanSquares_4_threads            20      0.215516      0.348677      0.173645      0.678274       6.97355
 MeanSquares_8_threads            20      0.138184      0.182681     0.0297812      0.237129       3.65362

System: victoria.nlm.nih.gov
Processor: 
 Serial #: 
    Cache: 32768
    Clock: 2930
    Cores: 12 cpus x 24 Cores = 288
OSName:     Mac OS X
  Release:  10.6.8
  Version:  10K549
  Platform: x86_64
  Operating System is 64 bit
ITK Version: 4.2.0
Virtual Memory: Total: 256 Available: 228
Physical Memory: Total:65536 Available: 58371
           Probe Name:        Count          Min           Mean         Stdev            Max        Total 
 MeanSquares_1_threads            20      0.382481      0.383342    0.00186954      0.391027       7.66685
 MeanSquares_2_threads            20      0.211908      0.335328     0.0777408      0.435574       6.70655
 MeanSquares_4_threads            20      0.271531      0.315688     0.0390751      0.385683       6.31377
 MeanSquares_8_threads            20      0.147544      0.192132     0.0299427      0.240976       3.84263

In the patch provided, the Jacobian allocation is done implicitly on assignment, on a per-thread basis. What was most unexpected was that when the allocation of the Jacobian was done explicitly outside the threaded part, the time went up by 50%! I presume that allocating the doubles sequentially in the master thread placed them next to each other in memory, which may be a more insidious form of false sharing. Below are the numbers from this run; notice the lack of speedup with more threads:

System: victoria.nlm.nih.gov
Processor: 
 Serial #: 
    Cache: 32768
    Clock: 2930
    Cores: 12 cpus x 24 Cores = 288
OSName:     Mac OS X
  Release:  10.6.8
  Version:  10K549
  Platform: x86_64
  Operating System is 64 bit
ITK Version: 4.2.0
Virtual Memory: Total: 256 Available: 226
Physical Memory: Total:65536 Available: 57091
           Probe Name:        Count          Min           Mean         Stdev            Max        Total 
 MeanSquares_1_threads            20      0.403931       0.40648    0.00213043       0.41389        8.1296
 MeanSquares_2_threads            20      0.243789      0.367603     0.0894637       0.65006       7.35206
 MeanSquares_4_threads            20      0.281336      0.354749     0.0431082      0.440161       7.09497
 MeanSquares_8_threads            20       0.24615      0.301576     0.0552998      0.446528       6.03151
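A minimal sketch of the per-thread arrangement described above, assuming plain C++11 `std::thread` rather than ITK's `MultiThreader`. Each worker allocates its own scratch buffer (a hypothetical stand-in for the Jacobian) inside the thread body, accumulates into a local variable, and writes its partial result to the shared array only once at the end. `ThreadedMeanSquares` and `jacobianScratch` are illustrative names, not ITK API. Whether thread-local allocation avoids the slowdown measured above depends on the allocator and hardware, so treat this as a pattern to benchmark rather than a guaranteed fix.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical mean-squares computation split across std::thread workers.
// Each worker allocates its own scratch buffer (a stand-in for the
// per-thread Jacobian) inside the thread body, so the buffer is requested
// by the thread that uses it instead of by the master thread.
double ThreadedMeanSquares(const std::vector<double>& fixed,
                           const std::vector<double>& moving,
                           unsigned numThreads)
{
  const std::size_t n = std::min(fixed.size(), moving.size());
  if (n == 0) return 0.0;
  if (numThreads == 0) numThreads = 1;

  std::vector<double> partial(numThreads, 0.0);   // one result slot per thread
  std::vector<std::thread> workers;
  const std::size_t chunk = (n + numThreads - 1) / numThreads;

  for (unsigned t = 0; t < numThreads; ++t)
  {
    workers.emplace_back([&, t]() {
      std::vector<double> jacobianScratch(1);     // allocated inside the worker
      const std::size_t begin = t * chunk;
      const std::size_t end = std::min(n, begin + chunk);
      double local = 0.0;                         // accumulate locally, not in partial[t]
      for (std::size_t i = begin; i < end; ++i)
      {
        jacobianScratch[0] = fixed[i] - moving[i];
        local += jacobianScratch[0] * jacobianScratch[0];
      }
      partial[t] = local;                         // single shared write per thread
    });
  }
  for (std::thread& w : workers) w.join();

  double sum = 0.0;
  for (double p : partial) sum += p;
  return sum / static_cast<double>(n);
}
```

Keeping the running sum in `local` means the shared `partial` array is touched once per thread, so even if its elements sit on the same cache line the contention is negligible.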

Brad

On Jul 26, 2012, at 8:56 AM, Rupert Brooks wrote:

> Brad,
> 
> The false sharing issue is a good point; however, I don't think it is the cause of the performance degradation. This part of the class (m_Threader, etc.) has not changed since 3.20. (I used the optimized metrics in my 3.20 builds, so it's in Review/itkOptMeanSquares....) It also does not explain the performance drop in single-threaded mode.
> 
> Testing will tell...  Seems like a Friday afternoon project to me, unless someone else gets there first.
> 
> Rupert
> 
> --------------------------------------------------------------
> Rupert Brooks
> rupert.brooks at gmail.com
> 
> 
> 
> On Wed, Jul 25, 2012 at 5:18 PM, Bradley Lowekamp <blowekamp at mail.nih.gov> wrote:
> Hello,
> 
> Continuing to glance at the class, I also see the following member variables in the MeanSquares class:
> 
>   MeasureType *   m_ThreaderMSE;
>   DerivativeType *m_ThreaderMSEDerivatives;
> 
> These are indexed by the thread ID, and accessing them simultaneously across threads creates the potential for false sharing, which can be a MAJOR problem in threaded algorithms.
> 
> I would think a good solution would be to create a per-thread data structure consisting of the Jacobian, MeasureType, and DerivativeType, plus padding to prevent false sharing, or, equivalently, giving the structure maximum (cache-line) data alignment.
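The per-thread structure proposed above might look like the following sketch. It assumes a 64-byte cache line and C++17 (so that `std::vector` honors the over-aligned element type). `PerThreadData`, `MakePerThreadData` and the member names are hypothetical stand-ins for ITK's `MeasureType`, `DerivativeType` and `JacobianType`, not existing ITK code.

```cpp
#include <cstddef>
#include <vector>

// Assumed cache-line size: 64 bytes is typical for x86-64 but is not
// queried from the hardware here.
constexpr std::size_t kCacheLineSize = 64;

// Hypothetical per-thread record replacing separate arrays indexed by thread
// ID (m_ThreaderMSE, m_ThreaderMSEDerivatives). alignas() makes sizeof a
// multiple of the cache line, so neighboring array elements never share one.
struct alignas(kCacheLineSize) PerThreadData
{
  double              measure = 0.0;  // stand-in for MeasureType
  std::vector<double> derivative;     // stand-in for DerivativeType
  std::vector<double> jacobian;       // flattened stand-in for JacobianType
};

static_assert(sizeof(PerThreadData) % kCacheLineSize == 0,
              "each element must occupy whole cache lines");

// Allocates one padded record per thread; C++17 is assumed so that
// std::vector's allocator honors the extended alignment.
std::vector<PerThreadData> MakePerThreadData(std::size_t numThreads,
                                             std::size_t numParameters,
                                             std::size_t numDimensions)
{
  std::vector<PerThreadData> data(numThreads);
  for (PerThreadData& d : data)
  {
    d.derivative.assign(numParameters, 0.0);
    d.jacobian.assign(numDimensions * numParameters, 0.0);
  }
  return data;
}
```

Note that `alignas` pads only the structure itself. The heap buffers owned by the `std::vector` members are allocated separately, so writes to `measure` are protected from false sharing while the derivative and Jacobian storage is not necessarily.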
> 
> Rupert, would you like to take a stab at this fix?
> 
> Brad
> 
> 
> On Jul 25, 2012, at 4:31 PM, Rupert Brooks wrote:
> 
>> Sorry if this repeats; I just got a bounce from Insight Developers, so I'm trimming the message and resending.
>> --------------------------------------------------------------
>> Rupert Brooks
>> rupert.brooks at gmail.com
>> 
>> 
>> 
>> On Wed, Jul 25, 2012 at 4:12 PM, Rupert Brooks <rupert.brooks at gmail.com> wrote:
>> Aha. Here's the code around line 183 of itkTranslationTransform:
>> 
>> // Compute the Jacobian in one position
>> template <class TScalarType, unsigned int NDimensions>
>> void
>> TranslationTransform<TScalarType, NDimensions>::ComputeJacobianWithRespectToParameters(
>>   const InputPointType &,
>>   JacobianType & jacobian) const
>> {
>>   // the Jacobian is constant for this transform, and it has already been
>>   // initialized in the constructor, so we just need to return it here.
>>   jacobian = this->m_IdentityJacobian;
>>   return;
>> }
>> 
>> That's probably the culprit, although the root cause may be the reallocation of the Jacobian every time through the loop.
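To illustrate the reallocation concern, here is a simplified sketch in plain C++ with assumed types (not ITK's actual `JacobianType`). The caller sizes the Jacobian once, outside the per-pixel loop, and the compute function copies into that existing storage with `std::copy`, which never reallocates. `Jacobian`, `ComputeIdentityJacobian` and `SumJacobianTraces` are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical, simplified Jacobian: a dense row-major matrix.
struct Jacobian
{
  std::size_t rows = 0;
  std::size_t cols = 0;
  std::vector<double> values;

  void SetSize(std::size_t r, std::size_t c)
  {
    rows = r;
    cols = c;
    values.assign(r * c, 0.0);
  }
};

// Stand-in for ComputeJacobianWithRespectToParameters on a transform whose
// Jacobian is constant. The caller must have sized 'out' to match 'identity';
// std::copy then writes into the existing storage and never reallocates.
void ComputeIdentityJacobian(const Jacobian& identity, Jacobian& out)
{
  std::copy(identity.values.begin(), identity.values.end(), out.values.begin());
}

// Per-pixel loop: the Jacobian is declared and sized once, before the loop,
// rather than being constructed (and allocated) on every iteration.
double SumJacobianTraces(const Jacobian& identity, std::size_t numPixels)
{
  Jacobian jacobian;
  jacobian.SetSize(identity.rows, identity.cols);   // the only allocation
  const std::size_t diag = std::min(identity.rows, identity.cols);
  double total = 0.0;
  for (std::size_t p = 0; p < numPixels; ++p)
  {
    ComputeIdentityJacobian(identity, jacobian);    // reuses the same buffer
    for (std::size_t k = 0; k < diag; ++k)
    {
      total += jacobian.values[k * jacobian.cols + k];
    }
  }
  return total;
}
```

The trace sum is only a placeholder for whatever the metric does with the Jacobian; the point is that the buffer's address stays fixed across iterations.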
>> 
>> Rupert
>> 
>> <snipped>
> 
> 

========================================================
Bradley Lowekamp  
Medical Science and Computing for
Office of High Performance Computing and Communications
National Library of Medicine 
blowekamp at mail.nih.gov
