[Insight-developers] memcpy VS iterators copy!!

Bradley Lowekamp blowekamp at mail.nih.gov
Mon Mar 28 11:00:10 EDT 2011


Hello,

This is another performance improvement that I think should be a MUST for v4! We need to replace the for-loop image iterator copies with an abstraction that can use memcpy when possible!

I have been wanting to run the performance comparison for a while, and this was the opportunity to do so! I replaced the for loop in question here with a memcpy (it still has bugs in it, but it's doing the needed work extremely fast!)

# memcpy loop
Executed 10 times with mean 16.8704s

I just replaced the for loop with a memcpy:
    {
    // Offset, in components, of slice i within the output volume buffer.
    const IdentifierType numberOfPixelsInSlice = sliceRegionToRequest.GetNumberOfPixels();
    const size_t numberOfComponents = output->GetNumberOfComponentsPerPixel();
    const IdentifierType numberOfPixelsUpToSlice = numberOfPixelsInSlice * i * numberOfComponents;

    typename TOutputImage::InternalPixelType * outputSliceBuffer = outputBuffer + numberOfPixelsUpToSlice;
    typename TOutputImage::InternalPixelType * inputBuffer = reader->GetOutput()->GetBufferPointer();

    // Copy the whole slice in one call; memcpy takes a size in bytes.
    memcpy( outputSliceBuffer, inputBuffer, sizeof( typename TOutputImage::InternalPixelType ) * numberOfPixelsInSlice * numberOfComponents );
    }

Still, for this case, no copy is better than memcpy.

On Mar 28, 2011, at 10:32 AM, Lowekamp, Bradley (NIH/NLM/LHC) [C] wrote:

> Hello Roger,
> 
> Your benchmark program had a few more dependencies than just ITK, so I wrote my own and attached it. I used a series of TIFF files I have, so I hope it is comparable. I have also arrived at a similar conclusion: the copy loop is expensive and should be avoided. However, my benchmark does indicate that the progress reporting is taking 50% of the additional execution time, which is rather different than your experiment.
> 
> 
> Testing series reader with 349 files.
> Image Size: [2048, 1536, 349]
> 
> # current ITK
> Executed 10 times with mean 24.4403s
> 
> # progress commented out
> Executed 10 times with mean 20.7206s
> 
> # copy loop commented out
> Executed 10 times with mean 16.5306s
> 
> # gerrit patch version
> Executed 10 times with mean 16.9262s
> 
> <itkImageSeriesReaderPerformance.cxx><ATT00001..htm>

========================================================
Bradley Lowekamp  
Lockheed Martin Contractor for
Office of High Performance Computing and Communications
National Library of Medicine 
blowekamp at mail.nih.gov

