[Insight-developers] itk performance numbers
Bradley Lowekamp
blowekamp at mail.nih.gov
Fri Jul 27 09:59:04 EDT 2012
OK, here are some more numbers for the latest patch in gerrit. I will follow Rupert's format, as it's the clearest.
MeanSquares:
Threads  3.20    4.2     4.2+patch  4.2+patch as % of 3.20
1        0.3615  0.8214  0.4071     113%
2        0.3222  0.6055  0.3365     104%
4        0.3249  0.4448  0.3293     101%
8        0.1703  0.3093  0.1943     114%
12       0.1457  0.2031  0.1322     91%
24*      0.1062  0.1332  0.0949     89%
MutualInformation:
Threads  3.20    4.2     4.2+patch  4.2+patch as % of 3.20
1        0.1467  0.6103  0.3353     228%
2        0.1036  0.3747  0.1774     171%
4        0.0847  0.2175  0.1262     149%
8        0.0655  0.1291  0.0681     104%
12       0.0551  0.1035  0.0486     88%
24*      0.0460  0.0829  0.0526     114%
*Hyperthreading
The observation to be made about MutualInformation is that while 4.2 is still slower with one thread (even with the patch), the speed-up from threading is now significantly larger.
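To make the threading observation concrete, here is a minimal sketch that recomputes the percentage column and the per-version thread speed-up (single-thread time divided by N-thread time) from the MutualInformation numbers above. The function names are only illustrative, and the inputs are the rounded values from the table, so results may differ from the posted percentages by a point.

```python
# MutualInformation timings copied from the table above, keyed by thread count.
# Columns: (ITK 3.20, ITK 4.2, ITK 4.2 + patch).
timings = {
    1: (0.1467, 0.6103, 0.3353),
    2: (0.1036, 0.3747, 0.1774),
    4: (0.0847, 0.2175, 0.1262),
    8: (0.0655, 0.1291, 0.0681),
    12: (0.0551, 0.1035, 0.0486),
    24: (0.0460, 0.0829, 0.0526),  # hyperthreading
}


def percent_of_baseline(threads):
    """4.2+patch time as a percentage of the 3.20 time at the same thread count."""
    baseline, _, patched = timings[threads]
    return 100.0 * patched / baseline


def thread_speedup(threads, column):
    """Single-thread time divided by N-thread time for one column (0, 1 or 2)."""
    return timings[1][column] / timings[threads][column]


if __name__ == "__main__":
    print("threads  % of 3.20  speedup 3.20  speedup 4.2+patch")
    for n in timings:
        print(f"{n:7d}  {percent_of_baseline(n):9.1f}  "
              f"{thread_speedup(n, 0):12.2f}  {thread_speedup(n, 2):17.2f}")
```

With these inputs, going from 1 to 8 threads gives roughly a 2.2x speed-up for 3.20 and roughly 4.9x for 4.2+patch, which is why the patched 4.2 catches up with 3.20 at higher thread counts despite the slower single-thread time.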
Brad
On Jul 26, 2012, at 2:02 PM, Rupert Brooks wrote:
> OK, that makes way more sense, sorry I didn't understand the first time around.
>
> Just so I've got it right:
> Threads  3.20      4.2+patch  4.2+patch time as percent of 3.20
> 1        0.347567  0.383342   110.293%
> 2        0.300869  0.335328   111.453%
> 4        0.348677  0.315688   90.5388%
> 8        0.182681  0.192132   105.173%
>
> So ITK 4.2 with the patch takes about 10% more time in the 1 and 2 thread cases. That is definitely better than what we were getting. Cool.
>
> Rupert
>
> --------------------------------------------------------------
> Rupert Brooks
> rupert.brooks at gmail.com
>
>
>
> On Thu, Jul 26, 2012 at 1:13 PM, Bradley Lowekamp <blowekamp at mail.nih.gov> wrote:
> Sorry for not being clear! I got too excited by finding the solution to the performance issue with ITKv3 registration in ITKv4.
>
> The first is vanilla 3.20, the second is 4.2 + the gerrit patch. The third is the gerrit patch with the pre-malloc of the Jacobian outside the threaded section! Vanilla 4.2 takes ~2x as long as 3.20 for this test on my system too.
>
> Summary for the MeanSquares metric in your test:
>
> 3.20: 1X
> 4.2: 2+X
> 4.2+gerrit patch: 1X
> 4.2+gerrit patch + single-threaded preallocation of Jacobian: 1.5X
========================================================
Bradley Lowekamp
Medical Science and Computing for
Office of High Performance Computing and Communications
National Library of Medicine
blowekamp at mail.nih.gov