[Insight-developers] memcpy VS iterators copy!!
Bill Lorensen
bill.lorensen at gmail.com
Mon Mar 28 19:24:21 EDT 2011
Should we be using std::copy instead of memcpy?
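A minimal sketch of what that swap could look like on a contiguous pixel buffer. The helper names CopyWithMemcpy and CopyWithStdCopy are hypothetical and not part of ITK; the point is only that std::copy gives the same result for plain scalar pixels and remains valid for pixel types that memcpy cannot handle.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: copy a contiguous pixel buffer with memcpy.
// Only valid when T is trivially copyable (e.g. unsigned char, float).
template <typename T>
std::vector<T> CopyWithMemcpy(const std::vector<T> & in)
{
  std::vector<T> out(in.size());
  if (!in.empty())
  {
    std::memcpy(out.data(), in.data(), in.size() * sizeof(T));
  }
  return out;
}

// Hypothetical helper: copy the same buffer with std::copy.
// Valid for any copy-assignable T; the library is free to lower this
// to memmove for trivially copyable types.
template <typename T>
std::vector<T> CopyWithStdCopy(const std::vector<T> & in)
{
  std::vector<T> out(in.size());
  std::copy(in.begin(), in.end(), out.begin());
  return out;
}
```

Both helpers should agree on a buffer of unsigned char, while only CopyWithStdCopy is safe to call on something like std::vector<std::string>.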
On Mon, Mar 28, 2011 at 4:05 PM, Kris Thielemans <
kris.thielemans at csc.mrc.ac.uk> wrote:
> Hi
>
> any decent C++ compiler should have these optimisations when using
> std::copy. For example, from
> /usr/lib/gcc/i686-pc-cygwin/3.4.4/include/c++/bits/stl_algobase.h
>
> // All of these auxiliary functions serve two purposes. (1) Replace
> // calls to copy with memmove whenever possible. (Memmove, not memcpy,
> // because the input and output ranges are permitted to overlap.)
> // (2) If we're using random access iterators, then write the loop as
> // a for loop with an explicit count.
>
> Kris
>
> > -----Original Message-----
> > From: insight-developers-bounces at itk.org
> > [mailto:insight-developers-bounces at itk.org] On Behalf Of
> > Johnson, Hans J
> > Sent: 28 March 2011 17:56
> > To: Bradley Lowekamp
> > Cc: ITK; Luis Ibanez
> > Subject: Re: [Insight-developers] memcpy VS iterators copy!!
> >
> > Brad,
> >
> > I support pulling in some form of
> > std::tr1::is_pod<ImageType::InternalPixelType> paradigm.
> > This was exactly what I was thinking of. I've looked into
> > this once, and I think it has several places where this
> > paradigm can help solve key performance bottlenecks where
> > our need to support general pixel types (i.e., we need to
> > support pixel types of "elephant objects") severely impacts
> > the most common scalar/POD performance.
> >
> >
> > Hans
> >
> >
> > From: Bradley Lowekamp <blowekamp at mail.nih.gov>
> > Date: Mon, 28 Mar 2011 11:49:00 -0400
> > To: Hans Johnson <hans-johnson at uiowa.edu>
> > Cc: ITK <insight-developers at itk.org>, Luis Ibanez
> > <luis.ibanez at kitware.com>
> > Subject: Re: [Insight-developers] memcpy VS iterators copy!!
> >
> >
> > Hans,
> >
> > On Mar 28, 2011, at 11:29 AM, Johnson, Hans J wrote:
> >
> >
> > Brad,
> >
> >
> > 1. There is only one time listed. Is 16.8704s
> > fast or slow? What is the comparison time?
> >
> > The comparison times were at the bottom of the e-mail, from a
> > prior message. The ImageSeriesReader is performing a copy from
> > the ImageReader to the output image on a per-slice basis. My
> > test images were unsigned chars of:
> >
> > Testing series reader with 349 files.
> > Image Size: [2048, 1536, 349]
> >
> > # memcpy for each slice
> > Executed 10 times with mean 16.8704s
> >
> > # current ITK iterator for loop with progress
> > Executed 10 times with mean 24.4403s
> >
> > # iterator for loop without progress
> > Executed 10 times with mean 20.7206s
> >
> > # no for loop
> > Executed 10 times with mean 16.5306s
> >
> > # gerrit patch version, which avoided the copy by setting a buffer.
> > Executed 10 times with mean 16.9262s
> >
> >
> >
> > 2. This should be possible now that we have
> > partial template specialization. How do we identify POD
> > that is suitable for the memcpy algorithm vs. the required iterators?
> >
> > In an ideal world we could just use
> > std::tr1::is_pod<ImageType::InternalPixelType>. But in our
> > case we may need to introduce a NumericTraits field for this.
> > How does the image duplicator deal with it now?
> >
> >
> > 3. Where was this loop copy iteration listed?
> > The FillBuffer is also a prime candidate for a speed
> > improvement. Especially the FillBuffer(0) paradigm for
> > integer and floating point types.
> >
> >
> > I see a class called ImageRegionDuplicator, which would be
> > similar to the ImageDuplicator in some respects.
> >
> > The trick is also going to be determining the maximum input
> > stride and output stride, so that the longest contiguous
> > section of memory can be copied. I would be willing to bet
> > that even memcpying row by row would be faster than the
> > iterator loop. What I am unsure about, is if multiple threads
> > have any benefits.
> >
> > When I am doing streaming there is a lot of region extraction
> > and compositing, which are really just buffer copying. I think
> > this could have a big impact on this type of operation as well.
> >
> >
> >
> > Hans
> >
> >
> > From: Bradley Lowekamp <blowekamp at mail.nih.gov>
> > Date: Mon, 28 Mar 2011 11:00:10 -0400
> > To: Bradley Lowekamp <blowekamp at mail.nih.gov>
> > Cc: ITK <insight-developers at itk.org>, Luis Ibanez
> > <luis.ibanez at kitware.com>
> > Subject: Re: [Insight-developers] memcpy VS iterators copy!!
> >
> >
> > Hello,
> >
> > This is another performance improvement that I think
> > should be a MUST for v4! We need to replace the for loop
> > image iterator copies with an abstraction that can use memcpy
> > when possible!
> >
> > I have been wanting to run the performance comparison
> > for a while and this was the opportunity to do so! I replaced
> > the for loop in question here with a memcpy ( it still has
> > bugs in it but it's doing the needed work extremely fast! )
> >
> > # memcpy loop
> > Executed 10 times with mean 16.8704s
> >
> > I just replaced the for loop with a memcpy:
> > {
> > const IdentifierType numberOfPixelsInSlice =
> > sliceRegionToRequest.GetNumberOfPixels();
> > const size_t numberOfComponents =
> > output->GetNumberOfComponentsPerPixel();
> > const IdentifierType numberOfPixelsUpToSlice =
> > numberOfPixelsInSlice * i * numberOfComponents;
> >
> > typename TOutputImage::InternalPixelType *
> > outputSliceBuffer = outputBuffer + numberOfPixelsUpToSlice;
> > typename TOutputImage::InternalPixelType *
> > inputBuffer = reader->GetOutput()->GetBufferPointer();
> >
> > memcpy( outputSliceBuffer, inputBuffer, sizeof(
> > typename TOutputImage::InternalPixelType ) *
> > numberOfPixelsInSlice * numberOfComponents );
> > }
> >
> > Still, for this case, no copy is better than memcpy.
> >
> > On Mar 28, 2011, at 10:32 AM, Lowekamp, Bradley
> > (NIH/NLM/LHC) [C] wrote:
> >
> >
> > Hello Roger,
> >
> > Your benchmark program had a few more
> > dependencies than just ITK, so I wrote my own and attached it.
> > I used a series of TIFFs I have, so I hope it would be
> > comparable. I have also arrived at a similar conclusion that
> > the copy loop is expensive and should be avoided. However, my
> > benchmark does indicate that the progress reporting is taking
> > 50% of the additional execution time, which is rather
> > different from your experiment.
> >
> >
> > Testing series reader with 349 files.
> > Image Size: [2048, 1536, 349]
> >
> > # current ITK
> > Executed 10 times with mean 24.4403s
> >
> > # progress commented out
> > Executed 10 times with mean 20.7206s
> >
> > # copy loop commented out
> > Executed 10 times with mean 16.5306s
> >
> > # gerrit patch version
> > Executed 10 times with mean 16.9262s
> >
> > <itkImageSeriesReaderPerformance.cxx><ATT00001..htm>
> >
> >
> > ========================================================
> > Bradley Lowekamp
> > Lockheed Martin Contractor for
> > Office of High Performance Computing and Communications
> > National Library of Medicine
> > blowekamp at mail.nih.gov
> >
> >
> >
> > _______________________________________________ Powered
> > by www.kitware.com <http://www.kitware.com> Visit other
> > Kitware open-source projects at
> > http://www.kitware.com/opensource/opensource.html
> > <http://www.kitware.com/opensource/opensource.html> Kitware
> > offers ITK Training Courses, for more information visit:
> > http://kitware.com/products/protraining.html Please keep
> > messages on-topic and check the ITK FAQ at:
> > http://www.itk.org/Wiki/ITK_FAQ Follow this link to
> > subscribe/unsubscribe:
> > http://www.itk.org/mailman/listinfo/insight-developers
> >
> >
> > ________________________________
> >
> > Notice: This UI Health Care e-mail (including
> > attachments) is covered by the Electronic Communications
> > Privacy Act, 18 U.S.C. 2510-2521, is confidential and may be
> > legally privileged. If you are not the intended recipient,
> > you are hereby notified that any retention, dissemination,
> > distribution, or copying of this communication is strictly
> > prohibited. Please reply to the sender that you have
> > received the message in error, then delete it. Thank you.
> > ________________________________
> >
> >
> >
>
>
>
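The two ideas in the quoted thread (pick memcpy only when the pixel type allows it, and copy the longest contiguous run, even if that is just one row) could be sketched roughly as follows. This is an illustration under assumptions, not ITK code: CopyPixels, CopyPixelsImpl and CopyRegionByRows are hypothetical names, strides are counted in pixels, and the C++11 trait std::is_trivially_copyable stands in for the std::tr1::is_pod check mentioned above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <vector>

// Fast path (hypothetical): trivially copyable pixels can be copied as raw bytes.
template <typename TPixel>
void CopyPixelsImpl(const TPixel * in, TPixel * out, std::size_t count, std::true_type)
{
  std::memcpy(out, in, count * sizeof(TPixel));
}

// General path (hypothetical): "elephant" pixel types are copied element by element.
template <typename TPixel>
void CopyPixelsImpl(const TPixel * in, TPixel * out, std::size_t count, std::false_type)
{
  std::copy(in, in + count, out);
}

// Copy `count` contiguous pixels, choosing the path at compile time
// by tag dispatch on the trait.
template <typename TPixel>
void CopyPixels(const TPixel * in, TPixel * out, std::size_t count)
{
  if (count == 0)
  {
    return;
  }
  CopyPixelsImpl(in, out, count, std::is_trivially_copyable<TPixel>());
}

// Copy a width x height region between two row-major 2D buffers,
// one contiguous row at a time. Strides are the full row lengths
// of the input and output buffers, in pixels.
template <typename TPixel>
void CopyRegionByRows(const TPixel * in, std::size_t inStride,
                      TPixel * out, std::size_t outStride,
                      std::size_t width, std::size_t height)
{
  for (std::size_t row = 0; row < height; ++row)
  {
    CopyPixels(in + row * inStride, out + row * outStride, width);
  }
}
```

When the region spans full rows in both buffers (width equal to both strides), the whole region is contiguous and a single CopyPixels call over width * height pixels would do; the row loop is the fallback for partial regions.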