[Insight-developers] memcpy VS iterators copy!!
Kris Thielemans
kris.thielemans at csc.mrc.ac.uk
Mon Mar 28 16:05:54 EDT 2011
Hi,
Any decent C++ compiler should already apply these optimisations when using
std::copy. For example, from
/usr/lib/gcc/i686-pc-cygwin/3.4.4/include/c++/bits/stl_algobase.h
// All of these auxiliary functions serve two purposes. (1) Replace
// calls to copy with memmove whenever possible. (Memmove, not memcpy,
// because the input and output ranges are permitted to overlap.)
// (2) If we're using random access iterators, then write the loop as
// a for loop with an explicit count.
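
As a rough illustration of the comment quoted above, here is a minimal sketch that copies a buffer of unsigned char pixels once with std::copy and once with memcpy and checks that the results agree. The function name CopiesAgree is hypothetical and not part of ITK. Whether std::copy is actually lowered to memmove depends on the standard library and on the iterator type, so this is presumably only automatic for pointer-like iterators over trivially copyable types, not for ITK image iterators.

```cpp
#include <algorithm>
#include <cassert>
#include <cstring>
#include <vector>

// Hypothetical helper: copy the same source buffer two ways and compare.
// For trivially copyable elements, a library such as libstdc++ may implement
// the std::copy call below as a single memmove.
bool CopiesAgree(const std::vector<unsigned char>& src)
{
  std::vector<unsigned char> viaCopy(src.size());
  std::vector<unsigned char> viaMemcpy(src.size());

  std::copy(src.begin(), src.end(), viaCopy.begin());

  // Skip memcpy for empty input so no pointer into an empty buffer is used.
  if (!src.empty())
  {
    std::memcpy(viaMemcpy.data(), src.data(), src.size() * sizeof(unsigned char));
  }
  return viaCopy == viaMemcpy;
}
```

Both paths produce identical bytes; any difference between them is purely one of speed, which is what the timings further down the thread measure.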
Kris
> -----Original Message-----
> From: insight-developers-bounces at itk.org
> [mailto:insight-developers-bounces at itk.org] On Behalf Of
> Johnson, Hans J
> Sent: 28 March 2011 17:56
> To: Bradley Lowekamp
> Cc: ITK; Luis Ibanez
> Subject: Re: [Insight-developers] memcpy VS iterators copy!!
>
> Brad,
>
> I support pulling in some form of
> std::tr1::is_pod<ImageType::InternalPixelType> paradigm.
> This was exactly what I was thinking of. I've looked into
> this once, and I think there are several places where this
> paradigm can help solve key performance bottlenecks where
> our need to support general pixel types (i.e. we need to
> support pixel types of "elephant objects") severely impacts
> the most common scalar/POD performance.
>
>
> Hans
>
>
> From: Bradley Lowekamp <blowekamp at mail.nih.gov>
> Date: Mon, 28 Mar 2011 11:49:00 -0400
> To: Hans Johnson <hans-johnson at uiowa.edu>
> Cc: ITK <insight-developers at itk.org>, Luis Ibanez
> <luis.ibanez at kitware.com>
> Subject: Re: [Insight-developers] memcpy VS iterators copy!!
>
>
> Hans,
>
> On Mar 28, 2011, at 11:29 AM, Johnson, Hans J wrote:
>
>
> Brad,
>
>
> 1. There is only one time listed. Is 16.8704s
> fast or slow? What is the comparison time?
>
> The comparison times were at the bottom of the e-mail, from a
> prior message. The ImageSeriesReader is performing a copy from
> the ImageReader to the output image on a per-slice basis. My
> test images were unsigned chars:
>
> Testing series reader with 349 files.
> Image Size: [2048, 1536, 349]
>
> # memcpy for each slice
> Executed 10 times with mean 16.8704s
>
> # current ITK iterator for loop with progress
> Executed 10 times with mean 24.4403s
>
> # iterator for loop without progress
> Executed 10 times with mean 20.7206s
>
> # no for loop
> Executed 10 times with mean 16.5306s
>
> # gerrit patch version, which avoided the copy by setting a buffer.
> Executed 10 times with mean 16.9262s
>
>
>
> 2. This should be possible now that we have
> partial template specialization. How do we identify POD types
> that are suitable for the memcpy algorithm vs. those that require iterators?
>
> In an ideal world we could just use
> std::tr1::is_pod<ImageType::InternalPixelType>. But in our
> case we may need to introduce a NumericTraits field for this.
> How does the image duplicator deal with it now?
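
One way the trait-based selection discussed above could look is sketched below, using tag dispatch on std::is_trivially_copyable from C++11 in place of std::tr1::is_pod. The names CopyPixels and CopyPixelsImpl are hypothetical and are not ITK API; a NumericTraits-based flag, as suggested in the message, would slot into the same place as the standard trait.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>  // only used to demonstrate a non-trivial pixel type
#include <type_traits>

// Generic path: element-wise assignment, valid for any assignable pixel type.
template <typename TPixel>
void CopyPixelsImpl(const TPixel* in, TPixel* out, std::size_t n, std::false_type)
{
  for (std::size_t i = 0; i < n; ++i)
  {
    out[i] = in[i];
  }
}

// Fast path: one memcpy, only valid for trivially copyable pixel types.
template <typename TPixel>
void CopyPixelsImpl(const TPixel* in, TPixel* out, std::size_t n, std::true_type)
{
  if (n > 0)
  {
    std::memcpy(out, in, n * sizeof(TPixel));
  }
}

// Hypothetical entry point: the overload is chosen at compile time.
template <typename TPixel>
void CopyPixels(const TPixel* in, TPixel* out, std::size_t n)
{
  CopyPixelsImpl(in, out, n,
                 typename std::is_trivially_copyable<TPixel>::type());
}
```

With this shape, CopyPixels on unsigned char buffers takes the memcpy path, while a pixel type such as std::string (standing in for an "elephant object") falls back to the assignment loop.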
>
>
> 3. Where was this loop copy iteration listed?
> The FillBuffer is also a prime candidate for a speed
> improvement. Especially the FillBuffer(0) paradigm for
> integer and floating point types.
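
For the FillBuffer(0) case mentioned above, a minimal sketch of a zero-fill fast path might look as follows. FillWithZero is a hypothetical name, not the ITK method, and the sketch assumes that an all-zero byte pattern represents the value zero, which holds for integer types and for IEEE 754 floating point.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical zero-fill for arithmetic pixel types only.
// Assumption: all-zero bytes encode the value 0 (true for integers and
// IEEE 754 floats); other pixel types should keep using an element loop.
template <typename TPixel>
void FillWithZero(TPixel* buffer, std::size_t n)
{
  static_assert(std::is_arithmetic<TPixel>::value,
                "FillWithZero is only meant for integer and floating point pixels");
  if (n > 0)
  {
    std::memset(buffer, 0, n * sizeof(TPixel));
  }
}
```

A fill with a non-zero value cannot generally use memset, so std::fill would remain the fallback there.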
>
>
> I see a class called ImageRegionDuplicator, which would be
> similar to the ImageDuplicator in some respects.
>
> The trick is also going to be determining the maximum input
> stride and output stride, so that the longest contiguous
> section of memory can be copied. I would be willing to bet
> that even memcpying row by row would be faster than the
> iterator loop. What I am unsure about is whether multiple threads
> would bring any benefit.
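
The stride idea above can be sketched for the 2-D case as follows. CopyRegionByRows is a hypothetical helper, not ITK API: it copies a width x height region row by row, and collapses to a single memcpy when both buffers are exactly as wide as the region, i.e. when the longest contiguous run is the whole region. Strides are assumed to be given in pixels of type unsigned char.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Copy a width x height region between two row-major 2-D buffers whose rows
// may be longer than the region. inStride and outStride are row lengths in pixels.
void CopyRegionByRows(const unsigned char* in, std::size_t inStride,
                      unsigned char* out, std::size_t outStride,
                      std::size_t width, std::size_t height)
{
  if (width == 0 || height == 0)
  {
    return;
  }
  if (inStride == width && outStride == width)
  {
    // Rows are adjacent in both buffers: the region is one contiguous run.
    std::memcpy(out, in, width * height);
    return;
  }
  // Otherwise the longest contiguous run is a single row.
  for (std::size_t row = 0; row < height; ++row)
  {
    std::memcpy(out + row * outStride, in + row * inStride, width);
  }
}
```

The same reasoning extends to higher dimensions: the contiguous run grows by one dimension each time the region spans the full extent of both buffers in the faster-varying dimensions.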
>
> When I am doing streaming there is a lot of region extraction
> and compositing, which are really just buffer copying. I think
> this could have a big impact on this type of operation as well.
>
>
>
> Hans
>
>
> From: Bradley Lowekamp <blowekamp at mail.nih.gov>
> Date: Mon, 28 Mar 2011 11:00:10 -0400
> To: Bradley Lowekamp <blowekamp at mail.nih.gov>
> Cc: ITK <insight-developers at itk.org>, Luis Ibanez
> <luis.ibanez at kitware.com>
> Subject: Re: [Insight-developers] memcpy VS iterators copy!!
>
>
> Hello,
>
> This is another performance improvement that I think
> should be a MUST for v4! We need to replace the for loop
> image iterator copies with an abstraction that can use memcpy
> when possible!
>
> I have been wanting to run the performance comparison
> for a while and this was the opportunity to do so! I replaced
> the for loop in question here with a memcpy (it still has
> bugs in it, but it's doing the needed work extremely fast!)
>
> # memcpy loop
> Executed 10 times with mean 16.8704s
>
> I just replaced the for loop with a memcpy:
> {
> // Number of pixels in one slice and components per pixel.
> const IdentifierType numberOfPixelsInSlice =
>   sliceRegionToRequest.GetNumberOfPixels();
> const size_t numberOfComponents =
>   output->GetNumberOfComponentsPerPixel();
> // Offset, in components, of slice i within the output buffer.
> const IdentifierType numberOfPixelsUpToSlice =
>   numberOfPixelsInSlice * i * numberOfComponents;
>
> typename TOutputImage::InternalPixelType * outputSliceBuffer =
>   outputBuffer + numberOfPixelsUpToSlice;
> typename TOutputImage::InternalPixelType * inputBuffer =
>   reader->GetOutput()->GetBufferPointer();
>
> // Copy the whole slice in one call.
> memcpy( outputSliceBuffer, inputBuffer,
>   sizeof( typename TOutputImage::InternalPixelType ) *
>   numberOfPixelsInSlice * numberOfComponents );
> }
>
> Still, for this case, no copy is better than memcpy.
>
> On Mar 28, 2011, at 10:32 AM, Lowekamp, Bradley
> (NIH/NLM/LHC) [C] wrote:
>
>
> Hello Roger,
>
> Your benchmark program had a few more
> dependencies than just ITK, so I wrote my own and attached it.
> I used a series of TIFF files I have, so I hope it is
> comparable. I have also arrived at a similar conclusion:
> the copy loop is expensive and should be avoided. However, my
> benchmark does indicate that the progress reporting is taking
> roughly 50% of the additional execution time, which is rather
> different from your experiment.
>
>
> Testing series reader with 349 files.
> Image Size: [2048, 1536, 349]
>
> # current ITK
> Executed 10 times with mean 24.4403s
>
> # progress commented out
> Executed 10 times with mean 20.7206s
>
> # copy loop commented out
> Executed 10 times with mean 16.5306s
>
> # gerrit patch version
> Executed 10 times with mean 16.9262s
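
Working through the timings above, and assuming that the "progress commented out" run still contains the copy loop while the "copy loop commented out" run contains neither, the overhead splits as follows:

```latex
\begin{align*}
t_{\text{progress}} &= 24.4403 - 20.7206 = 3.7197\ \text{s} \\
t_{\text{copy}}     &= 20.7206 - 16.5306 = 4.1900\ \text{s} \\
\frac{t_{\text{progress}}}{t_{\text{progress}} + t_{\text{copy}}}
                    &= \frac{3.7197}{7.9097} \approx 0.47
\end{align*}
```

So progress reporting accounts for about 47% of the extra 7.9097 s, consistent with the "50%" quoted in the message. For scale, the test volume is 2048 x 1536 x 349 unsigned chars, i.e. 1,097,859,072 bytes.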
>
> <itkImageSeriesReaderPerformance.cxx><ATT00001..htm>
>
>
> ========================================================
> Bradley Lowekamp
> Lockheed Martin Contractor for
> Office of High Performance Computing and Communications
> National Library of Medicine
> blowekamp at mail.nih.gov
>
>
>
>
>
> ________________________________
>
> Notice: This UI Health Care e-mail (including
> attachments) is covered by the Electronic Communications
> Privacy Act, 18 U.S.C. 2510-2521, is confidential and may be
> legally privileged. If you are not the intended recipient,
> you are hereby notified that any retention, dissemination,
> distribution, or copying of this communication is strictly
> prohibited. Please reply to the sender that you have
> received the message in error, then delete it. Thank you.
> ________________________________
>
>
>
>
>