[Insight-developers] memcpy VS iterators copy!!

Bradley Lowekamp blowekamp at mail.nih.gov
Tue Mar 29 10:24:43 EDT 2011


Bill and Kris,

Using std::copy over memcpy would leave it to the compiler to make the is_pod check, and may be easier. It could also be a little more flexible is some situations.

The real challenge is going to be drill down the ITK layers of abstractions to ensure that such a method will produce the correct results for Image, Image of complex, Image of Vectors, VectorImage, Adapted Image and Adapted Vector Image etc...


I propose the following, create the interface similar to ImageDuplicatior to perform Image region to region coping. Initially, implement it with the iterator copy for loop. Ensure that for all ITK images it works. Then at some point we can begin to optimize the cases which we can detect. During the review process, these types of loops should be replaces with this new Region duplicator class.

Brad


On Mar 28, 2011, at 7:24 PM, Bill Lorensen wrote:

> Should we be using std:;copy instead of memcpy?
> 
> On Mon, Mar 28, 2011 at 4:05 PM, Kris Thielemans <kris.thielemans at csc.mrc.ac.uk> wrote:
> Hi
> 
> any decent C++ compiler should have these optimisations when using
> std::copy. For example, from
>  /usr/lib/gcc/i686-pc-cygwin/3.4.4/include/c++/bits/stl_algobase.h
> 
>  // All of these auxiliary functions serve two purposes.  (1) Replace
>  // calls to copy with memmove whenever possible.  (Memmove, not memcpy,
>  // because the input and output ranges are permitted to overlap.)
>  // (2) If we're using random access iterators, then write the loop as
>  // a for loop with an explicit count.
> 
> Kris
> 
> > -----Original Message-----
> > From: insight-developers-bounces at itk.org
> > [mailto:insight-developers-bounces at itk.org] On Behalf Of
> > Johnson, Hans J
> > Sent: 28 March 2011 17:56
> > To: Bradley Lowekamp
> > Cc: ITK; Luis Ibanez
> > Subject: Re: [Insight-developers] memcpy VS iterators copy!!
> >
> > Brad,
> >
> > I support pulling in some form of
> > std::tr1::is_pod<ImageType::InternalPixelType> paradigm.
> > This was exactly what I was thinking of.  I"ve looked into
> > this once, and I think it has several places where this
> > paradigm can help solve key performance bottle necks where
> > our need to support general pixel types (I.e. We need to
> > support pixel types of "elephant objects") severely impact
> > the most common scalar/POD performance.
> >
> >
> > Hans
> >
> >
> > From: Bradley Lowekamp <blowekamp at mail.nih.gov>
> > Date: Mon, 28 Mar 2011 11:49:00 -0400
> > To: Hans Johnson <hans-johnson at uiowa.edu>
> > Cc: ITK <insight-developers at itk.org>, Luis Ibanez
> > <luis.ibanez at kitware.com>
> > Subject: Re: [Insight-developers] memcpy VS iterators copy!!
> >
> >
> > Hans,
> >
> > On Mar 28, 2011, at 11:29 AM, Johnson, Hans J wrote:
> >
> >
> >       Brad,
> >
> >
> >       1.      There is only one time listed.  IS 16.8704s
> > fast or slow?  What is the comparison time?
> >
> > The comparison times were at the bottom of the e-mail, from a
> > prior message. The ImageSeriesReader is performing a copy for
> > the ImageReader to the output image on a per-slice basis, by
> > test images were unsigned chars of:
> >
> > Testing series reader with 349 files.
> > Image Size: [2048, 1536, 349]
> >
> > # memcpy at for each slice
> > Executed 10 times with mean 16.8704s
> >
> > # current ITK iterator for loop with progress Executed 10
> > times with mean 24.4403s
> >
> > # iterator for loop with out progress
> > Executed  10 times with mean 20.7206s
> >
> > # no for loop
> > Executed with 10 times with mean 16.5306s
> >
> > # gerrit patch version, which avoided the copy by setting a buffer.
> > Executed 10 times with mean 16.9262s
> >
> >
> >
> >       2.      This should now be possible now that we have
> > partial template specialization.   How do we identify POD
> > that is suitable for memcopy algorithm vs. the required iterators?
> >
> > In an ideal world we could just use
> > std::tr1::is_pod<ImageType::InternalPixelType>. But in our
> > case we may need to introduce a NumericTraits field for this.
> > How does the image duplicator deal with it now?
> >
> >
> >       3.      Where was this loop copy iteration listed?
> > The FillBuffer is also a prime candidate for a speed
> > improvement. Especially the FillBuffer(0) paradigm for
> > integer and floating point types.
> >
> >
> > I see a class call ImageRegionDuplicator, which would be
> > similar to the ImageDuplicator in some respects.
> >
> > The trick is also going to be determining the maximum input
> > stride and output stride, so that the longest contiguous
> > section of memory can be copied. I would be willing to bet
> > that even memcpying row by row would be faster than the
> > iterator loop. What I am unsure about, is if multiple threads
> > have any benefits.
> >
> > When I am doing streaming there is a lot of region extraction
> > and compositing, which are really just buffer coping. I think
> > this could have a big impact on this type of operation aswell.
> >
> >
> >
> >       Hans
> >
> >
> >       From: Bradley Lowekamp <blowekamp at mail.nih.gov>
> >       Date: Mon, 28 Mar 2011 11:00:10 -0400
> >       To: Bradley Lowekamp <blowekamp at mail.nih.gov>
> >       Cc: ITK <insight-developers at itk.org>, Luis Ibanez
> > <luis.ibanez at kitware.com>
> >       Subject: Re: [Insight-developers] memcpy VS iterators copy!!
> >
> >
> >       Hello,
> >
> >       This is another performance improvement that I think
> > should me a MUST for v4! We need to replace the for loop
> > image iterator copies with an abstraction that can use memcpy
> > when possible!
> >
> >       I have been wanting to run the performance comparison
> > for a while and this was the opportunity to do so! I replaced
> > the for loop in question here with a memcpy ( it still has
> > bugs it it but it's doing the needed work extremely fast! )
> >
> >       # memcpy loop
> >       Executed 10 times with mean 16.8704s
> >
> >       I just replaced the for loop with a memcpy:
> >           {
> >           const IdentifierType numberOfPixelsInSlice =
> > sliceRegionToRequest.GetNumberOfPixels();
> >           const size_t numberOfComponents =
> > output->GetNumberOfComponentsPerPixel();
> >           const IdentifierType numberOfPixelsUpToSlice =
> > numberOfPixelsInSlice * i * numberOfComponents;
> >
> >           typename  TOutputImage::InternalPixelType *
> > outputSliceBuffer = outputBuffer + numberOfPixelsUpToSlice;
> >           typename  TOutputImage::InternalPixelType *
> > inputBuffer =c reader->GetOutput()->GetBufferPointer();
> >
> >           memcpy( outputSliceBuffer, inputBuffer, sizeof(
> > typename TOutputImage::InternalPixelType ) *
> > numberOfPixelsInSlice * numberOfComponents );
> >           }
> >
> >       Still for this case, no copy is still better then memcpy.
> >
> >       On Mar 28, 2011, at 10:32 AM, Lowekamp, Bradley
> > (NIH/NLM/LHC) [C] wrote:
> >
> >
> >               Hello Roger,
> >
> >               Your benchmark program had a few more
> > dependencies, the just ITK so I wrote my own and attached it.
> > I used a series of tiff I have, so I hope it would be
> > comparable. I have also arrived at a similar conclusion that
> > the copy loop is expensive and should be avoided. However, my
> > benchmark does indicate that the progress reporting is taking
> > 50% of the additional execution time, which is rather
> > different then your experiment.
> >
> >
> >               Testing series reader with 349 files.
> >               Image Size: [2048, 1536, 349]
> >
> >               # current ITK
> >               Executed 10 times with mean 24.4403s
> >
> >               # progress commented out
> >               Executed  10 times with mean 20.7206s
> >
> >               # copy loop commented out
> >               Executed with 10 times with mean 16.5306s
> >
> >               # gerrit patch version
> >               Executed 10 times with mean 16.9262s
> >
> >               <itkImageSeriesReaderPerformance.cxx><ATT00001..htm>
> >
> >
> >       ========================================================
> >       Bradley Lowekamp
> >       Lockheed Martin Contractor for
> >       Office of High Performance Computing and Communications
> >       National Library of Medicine
> >       blowekamp at mail.nih.gov
> >
> >
> >
> >       _______________________________________________ Powered
> > by www.kitware.com <http://www.kitware.com>  Visit other
> > Kitware open-source projects at
> > http://www.kitware.com/opensource/opensource.html
> > <http://www.kitware.com/opensource/opensource.html>  Kitware
> > offers ITK Training Courses, for more information visit:
> > http://kitware.com/products/protraining.html Please keep
> > messages on-topic and check the ITK FAQ at:
> > http://www.itk.org/Wiki/ITK_FAQ Follow this link to
> > subscribe/unsubscribe:
> > http://www.itk.org/mailman/listinfo/insight-developers
> >
> >
> > ________________________________
> >
> >       Notice: This UI Health Care e-mail (including
> > attachments) is covered by the Electronic Communications
> > Privacy Act, 18 U.S.C. 2510-2521, is confidential and may be
> > legally privileged.  If you are not the intended recipient,
> > you are hereby notified that any retention, dissemination,
> > distribution, or copying of this communication is strictly
> > prohibited.  Please reply to the sender that you have
> > received the message in error, then delete it.  Thank you.
> > ________________________________
> >
> >
> >
> > ========================================================
> >
> > Bradley Lowekamp
> >
> > Lockheed Martin Contractor for
> >
> > Office of High Performance Computing and Communications
> >
> > National Library of Medicine
> >
> > blowekamp at mail.nih.gov
> >
> >
> >
> >
> >
> > ________________________________
> >
> > Notice: This UI Health Care e-mail (including attachments) is
> > covered by the Electronic Communications Privacy Act, 18
> > U.S.C. 2510-2521, is confidential and may be legally
> > privileged.  If you are not the intended recipient, you are
> > hereby notified that any retention, dissemination,
> > distribution, or copying of this communication is strictly
> > prohibited.  Please reply to the sender that you have
> > received the message in error, then delete it.  Thank you.
> > ________________________________
> >
> >
> 
> 
> _______________________________________________
> Powered by www.kitware.com
> 
> Visit other Kitware open-source projects at
> http://www.kitware.com/opensource/opensource.html
> 
> Kitware offers ITK Training Courses, for more information visit:
> http://kitware.com/products/protraining.html
> 
> Please keep messages on-topic and check the ITK FAQ at:
> http://www.itk.org/Wiki/ITK_FAQ
> 
> Follow this link to subscribe/unsubscribe:
> http://www.itk.org/mailman/listinfo/insight-developers
> 

========================================================
Bradley Lowekamp  
Lockheed Martin Contractor for
Office of High Performance Computing and Communications
National Library of Medicine 
blowekamp at mail.nih.gov


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.itk.org/mailman/private/insight-developers/attachments/20110329/436c4117/attachment-0001.htm>


More information about the Insight-developers mailing list