[Insight-developers] itk performance numbers
Bradley Lowekamp
blowekamp at mail.nih.gov
Thu Jul 26 11:53:32 EDT 2012
Hello,
Well, I did get to it before you:
http://review.source.kitware.com/#/c/6614/
I also upped the size of the image 100x in your test. Here is the current performance on my system, first with ITK 3.20.1 and then with ITK 4.2.0:
System: victoria.nlm.nih.gov
Processor: Intel(R) Xeon(R) CPU X5670 @ 2.93GHz
Serial #:
Cache: 32768
Clock: 2794.27
Cores: 12 cpus x 24 Cores = 288
OSName: Mac OS X
Release: 10.6.8
Version: 10K549
Platform: x86_64
Operating System is 64 bit
ITK Version: 3.20.1
Virtual Memory: Total: 256 Available: 228
Physical Memory: Total:65536 Available: 58374
Probe Name:            Count  Min       Mean      Stdev       Max       Total
MeanSquares_1_threads  20     0.344348  0.347567  0.00244733  0.352629  6.95134
MeanSquares_2_threads  20     0.251223  0.300869  0.0179305   0.321404  6.01738
MeanSquares_4_threads  20     0.215516  0.348677  0.173645    0.678274  6.97355
MeanSquares_8_threads  20     0.138184  0.182681  0.0297812   0.237129  3.65362
System: victoria.nlm.nih.gov
Processor:
Serial #:
Cache: 32768
Clock: 2930
Cores: 12 cpus x 24 Cores = 288
OSName: Mac OS X
Release: 10.6.8
Version: 10K549
Platform: x86_64
Operating System is 64 bit
ITK Version: 4.2.0
Virtual Memory: Total: 256 Available: 228
Physical Memory: Total:65536 Available: 58371
Probe Name:            Count  Min       Mean      Stdev       Max       Total
MeanSquares_1_threads  20     0.382481  0.383342  0.00186954  0.391027  7.66685
MeanSquares_2_threads  20     0.211908  0.335328  0.0777408   0.435574  6.70655
MeanSquares_4_threads  20     0.271531  0.315688  0.0390751   0.385683  6.31377
MeanSquares_8_threads  20     0.147544  0.192132  0.0299427   0.240976  3.84263
In the patch provided, the Jacobian is allocated implicitly, on assignment, on a per-thread basis. What was most unexpected was that when the allocation of the Jacobian was done explicitly outside the threaded part, the time went up by 50%! I presume that allocating the doubles sequentially in the master thread placed them next to each other in memory, which may be a more insidious form of false sharing. Below are the numbers from this run; notice the lack of speedup with more threads:
System: victoria.nlm.nih.gov
Processor:
Serial #:
Cache: 32768
Clock: 2930
Cores: 12 cpus x 24 Cores = 288
OSName: Mac OS X
Release: 10.6.8
Version: 10K549
Platform: x86_64
Operating System is 64 bit
ITK Version: 4.2.0
Virtual Memory: Total: 256 Available: 226
Physical Memory: Total:65536 Available: 57091
Probe Name:            Count  Min       Mean      Stdev       Max       Total
MeanSquares_1_threads  20     0.403931  0.40648   0.00213043  0.41389   8.1296
MeanSquares_2_threads  20     0.243789  0.367603  0.0894637   0.65006   7.35206
MeanSquares_4_threads  20     0.281336  0.354749  0.0431082   0.440161  7.09497
MeanSquares_8_threads  20     0.24615   0.301576  0.0552998   0.446528  6.03151
Brad
On Jul 26, 2012, at 8:56 AM, Rupert Brooks wrote:
> Brad,
>
> The false sharing issue is a good point; however, I don't think it is the cause of the performance degradation. This part of the class (m_Threader, etc.) has not changed since 3.20. (I used the optimized metrics in my 3.20 builds, so it's in Review/itkOptMeanSquares....) It also does not explain the performance drop in single-threaded mode.
>
> Testing will tell... Seems like a Friday afternoon project to me, unless someone else gets there first.
>
> Rupert
>
> --------------------------------------------------------------
> Rupert Brooks
> rupert.brooks at gmail.com
>
>
>
> On Wed, Jul 25, 2012 at 5:18 PM, Bradley Lowekamp <blowekamp at mail.nih.gov> wrote:
> Hello,
>
> Continuing to glance at the class, I also see the following member variables in the MeanSquares class:
>
> MeasureType * m_ThreaderMSE;
> DerivativeType *m_ThreaderMSEDerivatives;
>
> These are indexed by the thread ID, and accessing them simultaneously across the threads creates the potential for false sharing, which can be a MAJOR problem with threaded algorithms.
>
> I would think a good solution would be to create a per-thread data structure consisting of the Jacobian, MeasureType, and DerivativeType, plus padding to prevent false sharing, or, equivalently, to assign the maximum data alignment to the structure.
>
> Rupert, would you like to take a stab at this fix?
>
> Brad
>
>
> On Jul 25, 2012, at 4:31 PM, Rupert Brooks wrote:
>
>> Sorry if this repeats; I just got a bounce from Insight Developers, so I'm trimming the message and resending.
>>
>>
>>
>> On Wed, Jul 25, 2012 at 4:12 PM, Rupert Brooks <rupert.brooks at gmail.com> wrote:
>> Aha. Here's the code around line 183 of itkTranslationTransform:
>>
>> // Compute the Jacobian in one position
>> template <class TScalarType, unsigned int NDimensions>
>> void
>> TranslationTransform<TScalarType, NDimensions>::ComputeJacobianWithRespectToParameters(
>>   const InputPointType &,
>>   JacobianType & jacobian) const
>> {
>>   // the Jacobian is constant for this transform, and it has already been
>>   // initialized in the constructor, so we just need to return it here.
>>   jacobian = this->m_IdentityJacobian;
>>   return;
>> }
>>
>> That's probably the culprit, although the root cause may be the reallocation of the Jacobian every time through the loop.
>>
>> Rupert
>>
>> <snipped>
>
>
========================================================
Bradley Lowekamp
Medical Science and Computing for
Office of High Performance Computing and Communications
National Library of Medicine
blowekamp at mail.nih.gov