[Insight-developers] memcpy VS iterators copy!!

Mon Mar 28 19:24:21 EDT 2011

Should we be using std::copy instead of memcpy?

On Mon, Mar 28, 2011 at 4:05 PM, Kris Thielemans <
kris.thielemans at csc.mrc.ac.uk> wrote:

> Hi
>
> any decent C++ compiler should have these optimisations when using
> std::copy. For example, from
>  /usr/lib/gcc/i686-pc-cygwin/3.4.4/include/c++/bits/stl_algobase.h
>
>  // All of these auxiliary functions serve two purposes.  (1) Replace
>  // calls to copy with memmove whenever possible.  (Memmove, not memcpy,
>  // because the input and output ranges are permitted to overlap.)
>  // (2) If we're using random access iterators, then write the loop as
>  // a for loop with an explicit count.
>
> Kris
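A minimal sketch of the equivalence Kris describes, assuming a C++11 compiler. For a trivially copyable element type such as unsigned char, std::copy on raw pointers gives the same result as memmove, including when the ranges overlap and the destination starts before the source. The helper names CopyWithStdCopy and CopyWithMemmove are hypothetical and only illustrate the point. Whether a particular standard library actually lowers std::copy to memmove is an implementation detail; libstdc++ does, according to the comment quoted above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: element-wise copy via std::copy. For trivially
// copyable T, libstdc++ dispatches this to memmove internally.
template <typename T>
void CopyWithStdCopy(const T* src, T* dst, std::size_t n) {
  std::copy(src, src + n, dst);
}

// Hypothetical helper: the same copy written directly with memmove,
// which (unlike memcpy) is defined for overlapping ranges.
template <typename T>
void CopyWithMemmove(const T* src, T* dst, std::size_t n) {
  assert((src != nullptr && dst != nullptr) || n == 0);
  if (n > 0) {
    std::memmove(dst, src, n * sizeof(T));
  }
}
```

Shifting {1, 2, 3, 4, 5} left by one element with either helper (source `buf + 1`, destination `buf`, 4 elements) yields {2, 3, 4, 5, 5}. std::copy only permits overlap when the destination begins before the source; for the opposite direction, std::copy_backward or memmove is required.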
>
> > -----Original Message-----
> > From: insight-developers-bounces at itk.org
> > [mailto:insight-developers-bounces at itk.org] On Behalf Of
> > Johnson, Hans J
> > Sent: 28 March 2011 17:56
> > To: Bradley Lowekamp
> > Cc: ITK; Luis Ibanez
> > Subject: Re: [Insight-developers] memcpy VS iterators copy!!
> >
> > Brad,
> >
> > I support pulling in some form of the
> > std::tr1::is_pod<ImageType::InternalPixelType> paradigm.
> > This was exactly what I was thinking of. I've looked into
> > this once, and I think there are several places where this
> > paradigm can help solve key performance bottlenecks, where
> > our need to support general pixel types (i.e., pixel types
> > that are "elephant objects") severely impacts performance
> > for the most common scalar/POD case.
> >
> >
> > Hans
> >
> >
> > From: Bradley Lowekamp <blowekamp at mail.nih.gov>
> > Date: Mon, 28 Mar 2011 11:49:00 -0400
> > To: Hans Johnson <hans-johnson at uiowa.edu>
> > Cc: ITK <insight-developers at itk.org>, Luis Ibanez
> > <luis.ibanez at kitware.com>
> > Subject: Re: [Insight-developers] memcpy VS iterators copy!!
> >
> >
> > Hans,
> >
> > On Mar 28, 2011, at 11:29 AM, Johnson, Hans J wrote:
> >
> >
> >       Brad,
> >
> >
> >       1.      There is only one time listed. Is 16.8704s
> > fast or slow? What is the comparison time?
> >
> > The comparison times were at the bottom of the e-mail, from a
> > prior message. The ImageSeriesReader performs a per-slice copy
> > from the ImageReader's output into the output image. My
> > test images were unsigned char, with:
> >
> > Testing series reader with 349 files.
> > Image Size: [2048, 1536, 349]
> >
> > # memcpy for each slice
> > Executed 10 times with mean 16.8704s
> >
> > # current ITK iterator for loop with progress
> > Executed 10 times with mean 24.4403s
> >
> > # iterator for loop without progress
> > Executed 10 times with mean 20.7206s
> >
> > # no for loop
> > Executed 10 times with mean 16.5306s
> >
> > # gerrit patch version, which avoided the copy by setting a buffer
> > Executed 10 times with mean 16.9262s
> >
> >
> >
> >       2.      This should be possible now that we have
> > partial template specialization. How do we identify POD types
> > that are suitable for the memcpy algorithm vs. those that require iterators?
> >
> > In an ideal world we could just use
> > std::tr1::is_pod<ImageType::InternalPixelType>. But in our
> > case we may need to introduce a NumericTraits field for this.
> > How does the image duplicator deal with it now?
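One way to combine the is_pod test with the NumericTraits-style opt-in mentioned above, as a sketch assuming C++11. std::is_trivially_copyable is the standard type trait that governs whether memcpy is valid, so it stands in for std::tr1::is_pod here. IsMemcpyable and CopyPixels are hypothetical names, not ITK API; a pixel type could specialize IsMemcpyable to force either path. Tag dispatch picks the path at compile time, so the element-wise loop is only instantiated for "elephant" pixel types.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <type_traits>

// Hypothetical trait: defaults to std::is_trivially_copyable, but a pixel
// type can specialize it (in the spirit of a NumericTraits field).
template <typename TPixel>
struct IsMemcpyable
  : std::integral_constant<bool, std::is_trivially_copyable<TPixel>::value> {};

namespace detail {
// Fast path: one bulk byte copy for trivially copyable pixels.
template <typename TPixel>
void CopyPixels(const TPixel* in, TPixel* out, std::size_t n, std::true_type) {
  assert((in != nullptr && out != nullptr) || n == 0);
  if (n > 0) {
    std::memcpy(out, in, n * sizeof(TPixel));
  }
}

// General path: element-wise assignment, like the current iterator loop.
template <typename TPixel>
void CopyPixels(const TPixel* in, TPixel* out, std::size_t n, std::false_type) {
  for (std::size_t k = 0; k < n; ++k) {
    out[k] = in[k];
  }
}
}  // namespace detail

// Hypothetical entry point: selects the path at compile time.
template <typename TPixel>
void CopyPixels(const TPixel* in, TPixel* out, std::size_t n) {
  detail::CopyPixels(in, out, n,
                     std::integral_constant<bool, IsMemcpyable<TPixel>::value>());
}

static_assert(IsMemcpyable<float>::value, "scalar pixels take the memcpy path");
static_assert(!IsMemcpyable<std::string>::value,
              "non-trivial pixels take the element-wise path");
```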
> >
> >
> >       3.      Where was this loop copy iteration listed?
> > FillBuffer is also a prime candidate for a speed
> > improvement, especially the FillBuffer(0) paradigm for
> > integer and floating point types.
> >
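A sketch of the FillBuffer(0) fast path suggested above, assuming C++11 and a raw buffer of trivially copyable pixels. FillPixels and IsAllBitsZero are hypothetical names, not ITK's implementation. Instead of special-casing integer and floating point types, it tests whether the fill value's bytes are all zero. This covers 0 for integers and IEEE 754 floats, and it deliberately rejects -0.0, whose sign bit is set.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// True if every byte of value's object representation is zero. Nonzero
// padding bytes (e.g. in x86 long double) only make the check conservative.
template <typename TPixel>
bool IsAllBitsZero(const TPixel& value) {
  unsigned char bytes[sizeof(TPixel)];
  std::memcpy(bytes, &value, sizeof(TPixel));
  for (unsigned char b : bytes) {
    if (b != 0) {
      return false;
    }
  }
  return true;
}

// Hypothetical FillBuffer-style helper: when the fill value is all-bits-zero,
// a single memset produces identical objects; otherwise use std::fill_n.
template <typename TPixel>
void FillPixels(TPixel* buffer, std::size_t n, const TPixel& value) {
  static_assert(std::is_trivially_copyable<TPixel>::value,
                "memset fast path needs trivially copyable pixels");
  assert(buffer != nullptr || n == 0);
  if (n > 0 && IsAllBitsZero(value)) {
    std::memset(buffer, 0, n * sizeof(TPixel));
  } else {
    std::fill_n(buffer, n, value);
  }
}
```

Many compilers already lower std::fill_n to memset for byte-sized types, so the explicit branch matters most for wider integer and floating point pixels.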
> >
> > I see a class called ImageRegionDuplicator, which would be
> > similar to the ImageDuplicator in some respects.
> >
> > The trick is also going to be determining the maximum input
> > stride and output stride, so that the longest contiguous
> > section of memory can be copied. I would be willing to bet
> > that even memcpying row by row would be faster than the
> > iterator loop. What I am unsure about is whether multiple threads
> > would provide any benefit.
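A minimal 2-D sketch of the stride idea described above, assuming row-major buffers of trivially copyable pixels. Buffer2D and CopyRegion2D are hypothetical names, not ITK classes. When the region spans complete rows of both buffers, the whole block is one contiguous run and a single memcpy suffices; otherwise each row is its own contiguous run. In N dimensions the same reasoning merges dimensions for as long as the region covers the full extent of every faster-varying dimension in both buffers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical description of a row-major 2-D buffer.
struct Buffer2D {
  std::size_t width;   // pixels per row (the fastest-varying dimension)
  std::size_t height;  // number of rows
};

// Copies a regionWidth x regionHeight block from (inX, inY) in the input to
// (outX, outY) in the output, using the longest contiguous runs available.
// Returns the number of memcpy calls made, for illustration.
template <typename TPixel>
std::size_t CopyRegion2D(const TPixel* in, Buffer2D inSize,
                         std::size_t inX, std::size_t inY,
                         TPixel* out, Buffer2D outSize,
                         std::size_t outX, std::size_t outY,
                         std::size_t regionWidth, std::size_t regionHeight) {
  static_assert(std::is_trivially_copyable<TPixel>::value,
                "memcpy requires trivially copyable pixels");
  assert(inX + regionWidth <= inSize.width && inY + regionHeight <= inSize.height);
  assert(outX + regionWidth <= outSize.width && outY + regionHeight <= outSize.height);
  if (regionWidth == 0 || regionHeight == 0) {
    return 0;
  }
  const TPixel* src = in + inY * inSize.width + inX;
  TPixel* dst = out + outY * outSize.width + outX;
  // Full rows in both buffers: the selected rows form one contiguous block.
  if (regionWidth == inSize.width && regionWidth == outSize.width) {
    std::memcpy(dst, src, regionWidth * regionHeight * sizeof(TPixel));
    return 1;
  }
  // Otherwise copy row by row, stepping by each buffer's own stride.
  for (std::size_t row = 0; row < regionHeight; ++row) {
    std::memcpy(dst + row * outSize.width, src + row * inSize.width,
                regionWidth * sizeof(TPixel));
  }
  return regionHeight;
}
```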
> >
> > When I am doing streaming there is a lot of region extraction
> > and compositing, which are really just buffer copying. I think
> > this could have a big impact on this type of operation as well.
> >
> >
> >
> >       Hans
> >
> >
> >       From: Bradley Lowekamp <blowekamp at mail.nih.gov>
> >       Date: Mon, 28 Mar 2011 11:00:10 -0400
> >       To: Bradley Lowekamp <blowekamp at mail.nih.gov>
> >       Cc: ITK <insight-developers at itk.org>, Luis Ibanez
> > <luis.ibanez at kitware.com>
> >       Subject: Re: [Insight-developers] memcpy VS iterators copy!!
> >
> >
> >       Hello,
> >
> >       This is another performance improvement that I think
> > should be a MUST for v4! We need to replace the for loop
> > image iterator copies with an abstraction that can use memcpy
> > when possible!
> >
> >       I have been wanting to run the performance comparison
> > for a while and this was the opportunity to do so! I replaced
> > the for loop in question here with a memcpy (it still has
> > bugs in it, but it's doing the needed work extremely fast!)
> >
> >       # memcpy loop
> >       Executed 10 times with mean 16.8704s
> >
> >       I just replaced the for loop with a memcpy:
> >           {
> >           // Element offset of slice i in the output buffer (each pixel
> >           // may have several components, e.g. for VectorImage).
> >           const IdentifierType numberOfPixelsInSlice =
> >             sliceRegionToRequest.GetNumberOfPixels();
> >           const size_t numberOfComponents =
> >             output->GetNumberOfComponentsPerPixel();
> >           const IdentifierType numberOfPixelsUpToSlice =
> >             numberOfPixelsInSlice * i * numberOfComponents;
> >
> >           typename TOutputImage::InternalPixelType * outputSliceBuffer =
> >             outputBuffer + numberOfPixelsUpToSlice;
> >           typename TOutputImage::InternalPixelType * inputBuffer =
> >             reader->GetOutput()->GetBufferPointer();
> >
> >           // Byte-wise copy of the whole slice (needs <cstring>); only
> >           // correct when InternalPixelType is POD / trivially copyable.
> >           std::memcpy( outputSliceBuffer, inputBuffer,
> >             sizeof( typename TOutputImage::InternalPixelType ) *
> >             numberOfPixelsInSlice * numberOfComponents );
> >           }
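A self-contained version of the per-slice memcpy above, assuming slices are stacked along the slowest-varying axis so that each slice occupies one contiguous block of the volume. CopySliceIntoVolume is a hypothetical name; its offset arithmetic mirrors numberOfPixelsUpToSlice in the snippet.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical helper: copies one slice into its place in a volume buffer.
// Slice i starts at element i * pixelsPerSlice * componentsPerPixel.
template <typename TInternalPixel>
void CopySliceIntoVolume(const TInternalPixel* slice, TInternalPixel* volume,
                         std::size_t sliceIndex, std::size_t pixelsPerSlice,
                         std::size_t componentsPerPixel) {
  static_assert(std::is_trivially_copyable<TInternalPixel>::value,
                "memcpy is only valid for trivially copyable pixel types");
  assert(slice != nullptr && volume != nullptr);
  const std::size_t elementsPerSlice = pixelsPerSlice * componentsPerPixel;
  std::memcpy(volume + sliceIndex * elementsPerSlice, slice,
              elementsPerSlice * sizeof(TInternalPixel));
}
```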
> >
> >       Still, for this case, no copy at all is better than memcpy.
> >
> >       On Mar 28, 2011, at 10:32 AM, Lowekamp, Bradley
> > (NIH/NLM/LHC) [C] wrote:
> >
> >
> >               Hello Roger,
> >
> >               Your benchmark program had a few more
> > dependencies than just ITK, so I wrote my own and attached it.
> > I used a series of TIFF files I have, so I hope it is
> > comparable. I have also arrived at a similar conclusion: the
> > copy loop is expensive and should be avoided. However, my
> > benchmark indicates that the progress reporting accounts for
> > about 50% of the additional execution time, which is rather
> > different from your experiment.
> >
> >
> >               Testing series reader with 349 files.
> >               Image Size: [2048, 1536, 349]
> >
> >               # current ITK
> >               Executed 10 times with mean 24.4403s
> >
> >               # progress commented out
> >               Executed 10 times with mean 20.7206s
> >
> >               # copy loop commented out
> >               Executed 10 times with mean 16.5306s
> >
> >               # gerrit patch version
> >               Executed 10 times with mean 16.9262s
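A quick check of the "50%" figure against these numbers, assuming each mean is a per-run time and that the differences between variants come only from the copy loop and the progress reporting:

```latex
\begin{aligned}
t_{\text{copy+progress}} &= 24.4403 - 16.5306 = 7.9097\ \text{s}\\
t_{\text{progress}} &= 24.4403 - 20.7206 = 3.7197\ \text{s}\\
\frac{t_{\text{progress}}}{t_{\text{copy+progress}}} &= \frac{3.7197}{7.9097} \approx 0.47\\
t_{\text{memcpy}} &= 16.8704 - 16.5306 = 0.3398\ \text{s}
\end{aligned}
```

Under that assumption, progress reporting accounts for roughly 47% of the extra time, and the memcpy variant costs about 0.34 s per run over doing no copy at all.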
> >
> >               <itkImageSeriesReaderPerformance.cxx>
> >
> >
> >       ========================================================
> >       Bradley Lowekamp
> >       Lockheed Martin Contractor for
> >       Office of High Performance Computing and Communications
> >       National Library of Medicine
> >       blowekamp at mail.nih.gov
> >
> >
> >
> >       _______________________________________________
> >       Powered by www.kitware.com
> >       Visit other Kitware open-source projects at
> >       http://www.kitware.com/opensource/opensource.html
> >       Kitware offers ITK Training Courses, for more information visit:
> >       http://kitware.com/products/protraining.html
> >       Please keep messages on-topic and check the ITK FAQ at:
> >       http://www.itk.org/Wiki/ITK_FAQ
> >       Follow this link to subscribe/unsubscribe:
> >       http://www.itk.org/mailman/listinfo/insight-developers
> >
> >
> > ________________________________
> >
> >       Notice: This UI Health Care e-mail (including
> > attachments) is covered by the Electronic Communications
> > Privacy Act, 18 U.S.C. 2510-2521, is confidential and may be
> > legally privileged.  If you are not the intended recipient,
> > you are hereby notified that any retention, dissemination,
> > distribution, or copying of this communication is strictly
> > prohibited.  Please reply to the sender that you have
> > received the message in error, then delete it.  Thank you.
> > ________________________________
> >
> >
> >
> >
>
>
>