[Insight-developers] memcpy VS iterators copy!!

Bradley Lowekamp blowekamp at mail.nih.gov
Mon Mar 28 11:49:00 EDT 2011


Hans,

On Mar 28, 2011, at 11:29 AM, Johnson, Hans J wrote:

> Brad,
> 
> There is only one time listed.  IS 16.8704s fast or slow?  What is the comparison time?
The comparison times were at the bottom of the e-mail, from a prior message. The ImageSeriesReader is performing a copy for the ImageReader to the output image on a per-slice basis, by test images were unsigned chars of:

Testing series reader with 349 files.
Image Size: [2048, 1536, 349]

# memcpy at for each slice
Executed 10 times with mean 16.8704s

# current ITK iterator for loop with progress
Executed 10 times with mean 24.4403s

# iterator for loop with out progress
Executed  10 times with mean 20.7206s

# no for loop 
Executed with 10 times with mean 16.5306s

# gerrit patch version, which avoided the copy by setting a buffer.
Executed 10 times with mean 16.9262s


> This should now be possible now that we have partial template specialization.   How do we identify POD that is suitable for memcopy algorithm vs. the required iterators?
In an ideal world we could just use std::tr1::is_pod<ImageType::InternalPixelType>. But in our case we may need to introduce a NumericTraits field for this. How does the image duplicator deal with it now?

> Where was this loop copy iteration listed?   The FillBuffer is also a prime candidate for a speed improvement. Especially the FillBuffer(0) paradigm for integer and floating point types.

I see a class call ImageRegionDuplicator, which would be similar to the ImageDuplicator in some respects. 

The trick is also going to be determining the maximum input stride and output stride, so that the longest contiguous section of memory can be copied. I would be willing to bet that even memcpying row by row would be faster than the iterator loop. What I am unsure about, is if multiple threads have any benefits.

When I am doing streaming there is a lot of region extraction and compositing, which are really just buffer coping. I think this could have a big impact on this type of operation aswell.


> Hans
> 
> From: Bradley Lowekamp <blowekamp at mail.nih.gov>
> Date: Mon, 28 Mar 2011 11:00:10 -0400
> To: Bradley Lowekamp <blowekamp at mail.nih.gov>
> Cc: ITK <insight-developers at itk.org>, Luis Ibanez <luis.ibanez at kitware.com>
> Subject: Re: [Insight-developers] memcpy VS iterators copy!!
> 
> Hello,
> 
> This is another performance improvement that I think should me a MUST for v4! We need to replace the for loop image iterator copies with an abstraction that can use memcpy when possible!
> 
> I have been wanting to run the performance comparison for a while and this was the opportunity to do so! I replaced the for loop in question here with a memcpy ( it still has bugs it it but it's doing the needed work extremely fast! )
> 
> # memcpy loop
> Executed 10 times with mean 16.8704s
> 
> I just replaced the for loop with a memcpy:
>     {
>     const IdentifierType numberOfPixelsInSlice = sliceRegionToRequest.GetNumberOfPixels();
>     const size_t numberOfComponents = output->GetNumberOfComponentsPerPixel();
>     const IdentifierType numberOfPixelsUpToSlice = numberOfPixelsInSlice * i * numberOfComponents;
> 
>     typename  TOutputImage::InternalPixelType * outputSliceBuffer = outputBuffer + numberOfPixelsUpToSlice;
>     typename  TOutputImage::InternalPixelType * inputBuffer =c reader->GetOutput()->GetBufferPointer();
> 
>     memcpy( outputSliceBuffer, inputBuffer, sizeof( typename TOutputImage::InternalPixelType ) * numberOfPixelsInSlice * numberOfComponents );
>     }
> 
> Still for this case, no copy is still better then memcpy. 
> 
> On Mar 28, 2011, at 10:32 AM, Lowekamp, Bradley (NIH/NLM/LHC) [C] wrote:
> 
>> Hello Roger,
>> 
>> Your benchmark program had a few more dependencies, the just ITK so I wrote my own and attached it. I used a series of tiff I have, so I hope it would be comparable. I have also arrived at a similar conclusion that the copy loop is expensive and should be avoided. However, my benchmark does indicate that the progress reporting is taking 50% of the additional execution time, which is rather different then your experiment.
>> 
>> 
>> Testing series reader with 349 files.
>> Image Size: [2048, 1536, 349]
>> 
>> # current ITK
>> Executed 10 times with mean 24.4403s
>> 
>> # progress commented out
>> Executed  10 times with mean 20.7206s
>> 
>> # copy loop commented out
>> Executed with 10 times with mean 16.5306s
>> 
>> # gerrit patch version
>> Executed 10 times with mean 16.9262s
>> 
>> <itkImageSeriesReaderPerformance.cxx><ATT00001..htm>
> 
> ========================================================
> Bradley Lowekamp  
> Lockheed Martin Contractor for
> Office of High Performance Computing and Communications
> National Library of Medicine 
> blowekamp at mail.nih.gov
> 
> 
> _______________________________________________ Powered by www.kitware.com Visit other Kitware open-source projects at http://www.kitware.com/opensource/opensource.html Kitware offers ITK Training Courses, for more information visit: http://kitware.com/products/protraining.html Please keep messages on-topic and check the ITK FAQ at: http://www.itk.org/Wiki/ITK_FAQ Follow this link to subscribe/unsubscribe: http://www.itk.org/mailman/listinfo/insight-developers 
> 
> Notice: This UI Health Care e-mail (including attachments) is covered by the Electronic Communications Privacy Act, 18 U.S.C. 2510-2521, is confidential and may be legally privileged.  If you are not the intended recipient, you are hereby notified that any retention, dissemination, distribution, or copying of this communication is strictly prohibited.  Please reply to the sender that you have received the message in error, then delete it.  Thank you.

========================================================
Bradley Lowekamp  
Lockheed Martin Contractor for
Office of High Performance Computing and Communications
National Library of Medicine 
blowekamp at mail.nih.gov


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.itk.org/mailman/private/insight-developers/attachments/20110328/39e5afe3/attachment-0001.htm>


More information about the Insight-developers mailing list