[Insight-developers] memcpy VS iterators copy!!

Mon Mar 28 12:56:14 EDT 2011

Brad,

I support pulling in some form of the std::tr1::is_pod<ImageType::InternalPixelType> paradigm.  This is exactly what I was thinking of.  I've looked into this once, and I think there are several places where this paradigm can help solve key performance bottlenecks, where our need to support general pixel types (i.e., pixel types that are "elephant objects") severely impacts the most common scalar/POD performance.

Hans

From: Bradley Lowekamp <blowekamp at mail.nih.gov>
Date: Mon, 28 Mar 2011 11:49:00 -0400
To: Hans Johnson <hans-johnson at uiowa.edu>
Cc: ITK <insight-developers at itk.org>, Luis Ibanez <luis.ibanez at kitware.com>
Subject: Re: [Insight-developers] memcpy VS iterators copy!!

Hans,

On Mar 28, 2011, at 11:29 AM, Johnson, Hans J wrote:

Brad,

  1.  There is only one time listed.  Is 16.8704s fast or slow?  What is the comparison time?

The comparison times were at the bottom of the e-mail, from a prior message. The ImageSeriesReader performs a copy from the ImageReader output to the output image on a per-slice basis. My test images were unsigned chars:

Testing series reader with 349 files.
Image Size: [2048, 1536, 349]

# memcpy for each slice
Executed 10 times with mean 16.8704s

# current ITK iterator for loop with progress
Executed 10 times with mean 24.4403s

# iterator for loop without progress
Executed 10 times with mean 20.7206s

# no for loop
Executed 10 times with mean 16.5306s

# gerrit patch version, which avoided the copy by setting a buffer.
Executed 10 times with mean 16.9262s

  1.  This should be possible now that we have partial template specialization.   How do we identify POD types that are suitable for the memcpy algorithm vs. those that require iterators?

In an ideal world we could just use std::tr1::is_pod<ImageType::InternalPixelType>. But in our case we may need to introduce a NumericTraits field for this. How does the image duplicator deal with it now?
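As a rough illustration of the trait-based dispatch discussed here, the following is a minimal sketch, not ITK code. The names CopyBuffer, CopyBufferImpl and ElephantPixel are hypothetical, and it assumes a C++11 compiler, using std::is_trivially_copyable as a stand-in for std::tr1::is_pod (or for a NumericTraits field, if one were introduced).

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

namespace sketch {

// Hypothetical non-trivial pixel type, present only to exercise the
// general (element-wise) path. The user-provided copy operations make
// it not trivially copyable.
struct ElephantPixel
{
  int value;
  ElephantPixel() : value(0) {}
  ElephantPixel(const ElephantPixel& other) : value(other.value) {}
  ElephantPixel& operator=(const ElephantPixel& other)
  {
    value = other.value;
    return *this;
  }
};

// Fast path: trivially copyable pixels may be copied as raw bytes.
template <typename TPixel>
void CopyBufferImpl(const TPixel* in, TPixel* out, std::size_t n, std::true_type)
{
  std::memcpy(out, in, n * sizeof(TPixel));
}

// General path: element-wise assignment, valid for any assignable pixel type.
template <typename TPixel>
void CopyBufferImpl(const TPixel* in, TPixel* out, std::size_t n, std::false_type)
{
  for (std::size_t i = 0; i < n; ++i)
  {
    out[i] = in[i];
  }
}

// Entry point: the trait selects the implementation at compile time.
// The two buffers are assumed not to overlap.
template <typename TPixel>
void CopyBuffer(const TPixel* in, TPixel* out, std::size_t n)
{
  assert(in != nullptr && out != nullptr);
  CopyBufferImpl(in, out, n,
                 typename std::is_trivially_copyable<TPixel>::type());
}

} // namespace sketch
```

Calling sketch::CopyBuffer on unsigned char buffers takes the memcpy path, while the same call on ElephantPixel buffers falls back to the loop. The caller's code is identical in both cases.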

  1.  Where was this loop copy iteration listed?   The FillBuffer is also a prime candidate for a speed improvement. Especially the FillBuffer(0) paradigm for integer and floating point types.

I see a class called ImageRegionDuplicator, which would be similar to the ImageDuplicator in some respects.

The trick is also going to be determining the maximum input stride and output stride, so that the longest contiguous section of memory can be copied. I would be willing to bet that even memcpying row by row would be faster than the iterator loop. What I am unsure about is whether multiple threads would have any benefit.

When I am doing streaming there is a lot of region extraction and compositing, which are really just buffer copying. I think this could have a big impact on this type of operation as well.
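To make the row-by-row idea concrete, here is a minimal 2-D sketch under stated assumptions: CopyRegionByRows is a hypothetical helper, not an ITK class; buffers are row-major; strides are given in pixels; and the two buffers do not overlap. It collapses to a single memcpy when the region spans full rows in both buffers, which is the "longest contiguous section" case. It does not address the threading question.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

namespace sketch {

// Copy a width x height block of pixels between two row-major buffers.
// `in` and `out` point at the first pixel of the block in each buffer;
// inStride and outStride are the row lengths of the full buffers, in pixels.
template <typename TPixel>
void CopyRegionByRows(const TPixel* in, std::size_t inStride,
                      TPixel* out, std::size_t outStride,
                      std::size_t width, std::size_t height)
{
  static_assert(std::is_trivially_copyable<TPixel>::value,
                "memcpy is only valid for trivially copyable pixels");
  assert(in != nullptr && out != nullptr);
  assert(width <= inStride && width <= outStride);

  // The block covers whole rows on both sides, so its rows are adjacent
  // in memory and one memcpy can move the entire block.
  if (width == inStride && width == outStride)
  {
    std::memcpy(out, in, width * height * sizeof(TPixel));
    return;
  }

  // Otherwise the longest contiguous run is a single row of the block.
  for (std::size_t y = 0; y < height; ++y)
  {
    std::memcpy(out + y * outStride, in + y * inStride,
                width * sizeof(TPixel));
  }
}

} // namespace sketch
```

An N-dimensional version would apply the same test dimension by dimension, merging consecutive dimensions into one run for as long as the region spans the full extent of both buffers.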

Hans

From: Bradley Lowekamp <blowekamp at mail.nih.gov>
Date: Mon, 28 Mar 2011 11:00:10 -0400
To: Bradley Lowekamp <blowekamp at mail.nih.gov>
Cc: ITK <insight-developers at itk.org>, Luis Ibanez <luis.ibanez at kitware.com>
Subject: Re: [Insight-developers] memcpy VS iterators copy!!

Hello,

This is another performance improvement that I think should be a MUST for v4! We need to replace the for loop image iterator copies with an abstraction that can use memcpy when possible!

I have been wanting to run the performance comparison for a while and this was the opportunity to do so! I replaced the for loop in question here with a memcpy ( it still has bugs in it, but it's doing the needed work extremely fast! )

# memcpy loop
Executed 10 times with mean 16.8704s

I just replaced the for loop with a memcpy:
    {
    // Size of one slice, and the offset (in components) of slice i in the output volume.
    const IdentifierType numberOfPixelsInSlice = sliceRegionToRequest.GetNumberOfPixels();
    const size_t numberOfComponents = output->GetNumberOfComponentsPerPixel();
    const IdentifierType numberOfPixelsUpToSlice = numberOfPixelsInSlice * i * numberOfComponents;

    // Destination: start of slice i in the output buffer. Source: the slice just read.
    typename TOutputImage::InternalPixelType * outputSliceBuffer = outputBuffer + numberOfPixelsUpToSlice;
    typename TOutputImage::InternalPixelType * inputBuffer = reader->GetOutput()->GetBufferPointer();

    // Copy the whole slice in one call; the byte count includes every component of every pixel.
    memcpy( outputSliceBuffer, inputBuffer, sizeof( typename TOutputImage::InternalPixelType ) * numberOfPixelsInSlice * numberOfComponents );
    }

Still, for this case, no copy is better than memcpy.

On Mar 28, 2011, at 10:32 AM, Lowekamp, Bradley (NIH/NLM/LHC) [C] wrote:

Hello Roger,

Your benchmark program had a few more dependencies than just ITK, so I wrote my own and attached it. I used a series of TIFFs I have, so I hope it is comparable. I have arrived at a similar conclusion: the copy loop is expensive and should be avoided. However, my benchmark indicates that progress reporting accounts for about 50% of the additional execution time, which is rather different from your experiment.

Testing series reader with 349 files.
Image Size: [2048, 1536, 349]

# current ITK
Executed 10 times with mean 24.4403s

# progress commented out
Executed 10 times with mean 20.7206s

# copy loop commented out
Executed 10 times with mean 16.5306s

# gerrit patch version
Executed 10 times with mean 16.9262s

<itkImageSeriesReaderPerformance.cxx><ATT00001..htm>

========================================================
Bradley Lowekamp
Lockheed Martin Contractor for
Office of High Performance Computing and Communications
National Library of Medicine
blowekamp at mail.nih.gov

_______________________________________________ Powered by www.kitware.com<http://www.kitware.com> Visit other Kitware open-source projects at http://www.kitware.com/opensource/opensource.html Kitware offers ITK Training Courses, for more information visit: http://kitware.com/products/protraining.html Please keep messages on-topic and check the ITK FAQ at: http://www.itk.org/Wiki/ITK_FAQ Follow this link to subscribe/unsubscribe: http://www.itk.org/mailman/listinfo/insight-developers

________________________________
Notice: This UI Health Care e-mail (including attachments) is covered by the Electronic Communications Privacy Act, 18 U.S.C. 2510-2521, is confidential and may be legally privileged.  If you are not the intended recipient, you are hereby notified that any retention, dissemination, distribution, or copying of this communication is strictly prohibited.  Please reply to the sender that you have received the message in error, then delete it.  Thank you.
________________________________
