[Insight-developers] memcpy VS iterators copy!!
Kris Thielemans
kris.thielemans at csc.mrc.ac.uk
Tue Mar 29 18:19:28 EDT 2011
Hi
I'm afraid I don't know current (or v4) status of ITK iterators well enough,
but if there's now an STL-complaint iterator concept in ITK, std::copy
should do the proper thing (or it's a bug in ITK or the compiler). The
compiler's is_pod and hence std::copy as well should be pretty smart. Maybe
not as smart as a human, but possibly less error-prone.
Of course, if you have a dumb compiler, it doesn't help (but I'm not sure if
you should cater for that anymore). Seems easy to redo the timings with
std:copy to see what happens in any case.
Kris
PS: if you decide for memcpy, the gcc note implies it should be memmove.
> -----Original Message-----
> From: Bradley Lowekamp [mailto:blowekamp at mail.nih.gov]
> Sent: 29 March 2011 15:25
> To: Bill Lorensen
> Cc: Kris Thielemans; Johnson, Hans J; ITK; Luis Ibanez
> Subject: Re: [Insight-developers] memcpy VS iterators copy!!
>
> Bill and Kris,
>
> Using std::copy over memcpy would leave it to the compiler to
> make the is_pod check, and may be easier. It could also be a
> little more flexible is some situations.
>
> The real challenge is going to be drill down the ITK layers
> of abstractions to ensure that such a method will produce the
> correct results for Image, Image of complex, Image of
> Vectors, VectorImage, Adapted Image and Adapted Vector Image etc...
>
>
> I propose the following, create the interface similar to
> ImageDuplicatior to perform Image region to region coping.
> Initially, implement it with the iterator copy for loop.
> Ensure that for all ITK images it works. Then at some point
> we can begin to optimize the cases which we can detect.
> During the review process, these types of loops should be
> replaces with this new Region duplicator class.
>
> Brad
>
>
>
> On Mar 28, 2011, at 7:24 PM, Bill Lorensen wrote:
>
>
> Should we be using std:;copy instead of memcpy?
>
>
> On Mon, Mar 28, 2011 at 4:05 PM, Kris Thielemans
> <kris.thielemans at csc.mrc.ac.uk> wrote:
>
>
> Hi
>
> any decent C++ compiler should have these
> optimisations when using
> std::copy. For example, from
>
> /usr/lib/gcc/i686-pc-cygwin/3.4.4/include/c++/bits/stl_algobase.h
>
> // All of these auxiliary functions serve two
> purposes. (1) Replace
> // calls to copy with memmove whenever
> possible. (Memmove, not memcpy,
> // because the input and output ranges are
> permitted to overlap.)
> // (2) If we're using random access iterators,
> then write the loop as
> // a for loop with an explicit count.
>
> Kris
>
>
> > -----Original Message-----
> > From: insight-developers-bounces at itk.org
> > [mailto:insight-developers-bounces at itk.org]
> On Behalf Of
> > Johnson, Hans J
> > Sent: 28 March 2011 17:56
> > To: Bradley Lowekamp
> > Cc: ITK; Luis Ibanez
> > Subject: Re: [Insight-developers] memcpy VS
> iterators copy!!
> >
> > Brad,
> >
> > I support pulling in some form of
> >
> std::tr1::is_pod<ImageType::InternalPixelType> paradigm.
> > This was exactly what I was thinking of.
> I"ve looked into
> > this once, and I think it has several places
> where this
> > paradigm can help solve key performance
> bottle necks where
> > our need to support general pixel types (I.e.
> We need to
> > support pixel types of "elephant objects")
> severely impact
> > the most common scalar/POD performance.
> >
> >
> > Hans
> >
> >
> > From: Bradley Lowekamp <blowekamp at mail.nih.gov>
> > Date: Mon, 28 Mar 2011 11:49:00 -0400
> > To: Hans Johnson <hans-johnson at uiowa.edu>
> > Cc: ITK <insight-developers at itk.org>, Luis Ibanez
> > <luis.ibanez at kitware.com>
> > Subject: Re: [Insight-developers] memcpy VS
> iterators copy!!
> >
> >
> > Hans,
> >
> > On Mar 28, 2011, at 11:29 AM, Johnson, Hans J wrote:
> >
> >
> > Brad,
> >
> >
>
> > 1. There is only one time listed.
> IS 16.8704s
>
> > fast or slow? What is the comparison time?
> >
> > The comparison times were at the bottom of
> the e-mail, from a
> > prior message. The ImageSeriesReader is
> performing a copy for
> > the ImageReader to the output image on a
> per-slice basis, by
> > test images were unsigned chars of:
> >
> > Testing series reader with 349 files.
> > Image Size: [2048, 1536, 349]
> >
> > # memcpy at for each slice
> > Executed 10 times with mean 16.8704s
> >
> > # current ITK iterator for loop with progress
> Executed 10
> > times with mean 24.4403s
> >
> > # iterator for loop with out progress
> > Executed 10 times with mean 20.7206s
> >
> > # no for loop
> > Executed with 10 times with mean 16.5306s
> >
> > # gerrit patch version, which avoided the
> copy by setting a buffer.
> > Executed 10 times with mean 16.9262s
> >
> >
> >
>
> > 2. This should now be possible now
> that we have
>
> > partial template specialization. How do we
> identify POD
> > that is suitable for memcopy algorithm vs.
> the required iterators?
> >
> > In an ideal world we could just use
> >
> std::tr1::is_pod<ImageType::InternalPixelType>. But in our
> > case we may need to introduce a NumericTraits
> field for this.
> > How does the image duplicator deal with it now?
> >
> >
>
> > 3. Where was this loop copy
> iteration listed?
>
> > The FillBuffer is also a prime candidate for a speed
> > improvement. Especially the FillBuffer(0) paradigm for
> > integer and floating point types.
> >
> >
> > I see a class call ImageRegionDuplicator,
> which would be
> > similar to the ImageDuplicator in some respects.
> >
> > The trick is also going to be determining the
> maximum input
> > stride and output stride, so that the longest
> contiguous
> > section of memory can be copied. I would be
> willing to bet
> > that even memcpying row by row would be
> faster than the
> > iterator loop. What I am unsure about, is if
> multiple threads
> > have any benefits.
> >
> > When I am doing streaming there is a lot of
> region extraction
> > and compositing, which are really just buffer
> coping. I think
> > this could have a big impact on this type of
> operation aswell.
> >
> >
> >
> > Hans
> >
> >
> > From: Bradley Lowekamp <blowekamp at mail.nih.gov>
> > Date: Mon, 28 Mar 2011 11:00:10 -0400
> > To: Bradley Lowekamp <blowekamp at mail.nih.gov>
> > Cc: ITK <insight-developers at itk.org>,
> Luis Ibanez
> > <luis.ibanez at kitware.com>
> > Subject: Re: [Insight-developers]
> memcpy VS iterators copy!!
> >
> >
> > Hello,
> >
> > This is another performance improvement
> that I think
> > should me a MUST for v4! We need to replace
> the for loop
> > image iterator copies with an abstraction
> that can use memcpy
> > when possible!
> >
> > I have been wanting to run the
> performance comparison
> > for a while and this was the opportunity to
> do so! I replaced
> > the for loop in question here with a memcpy (
> it still has
> > bugs it it but it's doing the needed work
> extremely fast! )
> >
> > # memcpy loop
> > Executed 10 times with mean 16.8704s
> >
> > I just replaced the for loop with a memcpy:
> > {
> > const IdentifierType numberOfPixelsInSlice =
> > sliceRegionToRequest.GetNumberOfPixels();
> > const size_t numberOfComponents =
> > output->GetNumberOfComponentsPerPixel();
> > const IdentifierType
> numberOfPixelsUpToSlice =
> > numberOfPixelsInSlice * i * numberOfComponents;
> >
> > typename TOutputImage::InternalPixelType *
> > outputSliceBuffer = outputBuffer +
> numberOfPixelsUpToSlice;
> > typename TOutputImage::InternalPixelType *
> > inputBuffer =c
> reader->GetOutput()->GetBufferPointer();
> >
> > memcpy( outputSliceBuffer,
> inputBuffer, sizeof(
> > typename TOutputImage::InternalPixelType ) *
> > numberOfPixelsInSlice * numberOfComponents );
> > }
> >
> > Still for this case, no copy is still
> better then memcpy.
> >
> > On Mar 28, 2011, at 10:32 AM, Lowekamp, Bradley
> > (NIH/NLM/LHC) [C] wrote:
> >
> >
> > Hello Roger,
> >
> > Your benchmark program had a few more
> > dependencies, the just ITK so I wrote my own
> and attached it.
> > I used a series of tiff I have, so I hope it would be
> > comparable. I have also arrived at a similar
> conclusion that
> > the copy loop is expensive and should be
> avoided. However, my
> > benchmark does indicate that the progress
> reporting is taking
> > 50% of the additional execution time, which is rather
> > different then your experiment.
> >
> >
> > Testing series reader with 349 files.
> > Image Size: [2048, 1536, 349]
> >
> > # current ITK
> > Executed 10 times with mean 24.4403s
> >
> > # progress commented out
> > Executed 10 times with mean 20.7206s
> >
> > # copy loop commented out
> > Executed with 10 times with
> mean 16.5306s
> >
> > # gerrit patch version
> > Executed 10 times with mean 16.9262s
> >
> >
> <itkImageSeriesReaderPerformance.cxx><ATT00001..htm>
> >
> >
> >
> ========================================================
> > Bradley Lowekamp
> > Lockheed Martin Contractor for
> > Office of High Performance Computing
> and Communications
> > National Library of Medicine
> > blowekamp at mail.nih.gov
> >
> >
> >
> >
> _______________________________________________ Powered
>
> > by www.kitware.com <http://www.kitware.com/>
> <http://www.kitware.com <http://www.kitware.com/> > Visit other
>
> > Kitware open-source projects at
> > http://www.kitware.com/opensource/opensource.html
> >
> <http://www.kitware.com/opensource/opensource.html> Kitware
> > offers ITK Training Courses, for more
> information visit:
> > http://kitware.com/products/protraining.html
> Please keep
> > messages on-topic and check the ITK FAQ at:
> > http://www.itk.org/Wiki/ITK_FAQ Follow this link to
> > subscribe/unsubscribe:
> > http://www.itk.org/mailman/listinfo/insight-developers
> >
> >
> > ________________________________
> >
> > Notice: This UI Health Care e-mail (including
> > attachments) is covered by the Electronic
> Communications
> > Privacy Act, 18 U.S.C. 2510-2521, is
> confidential and may be
> > legally privileged. If you are not the
> intended recipient,
> > you are hereby notified that any retention,
> dissemination,
> > distribution, or copying of this
> communication is strictly
> > prohibited. Please reply to the sender that you have
> > received the message in error, then delete
> it. Thank you.
> > ________________________________
> >
> >
> >
> >
> ========================================================
> >
> > Bradley Lowekamp
> >
> > Lockheed Martin Contractor for
> >
> > Office of High Performance Computing and
> Communications
> >
> > National Library of Medicine
> >
> > blowekamp at mail.nih.gov
> >
> >
> >
> >
> >
> > ________________________________
> >
> > Notice: This UI Health Care e-mail (including
> attachments) is
> > covered by the Electronic Communications
> Privacy Act, 18
> > U.S.C. 2510-2521, is confidential and may be legally
> > privileged. If you are not the intended
> recipient, you are
> > hereby notified that any retention, dissemination,
> > distribution, or copying of this
> communication is strictly
> > prohibited. Please reply to the sender that you have
> > received the message in error, then delete
> it. Thank you.
> > ________________________________
> >
> >
>
>
>
> _______________________________________________
>
> Powered by www.kitware.com <http://www.kitware.com/>
>
> Visit other Kitware open-source projects at
> http://www.kitware.com/opensource/opensource.html
>
> Kitware offers ITK Training Courses, for more
> information visit:
> http://kitware.com/products/protraining.html
>
> Please keep messages on-topic and check the ITK FAQ at:
> http://www.itk.org/Wiki/ITK_FAQ
>
> Follow this link to subscribe/unsubscribe:
> http://www.itk.org/mailman/listinfo/insight-developers
>
>
>
>
> ========================================================
>
> Bradley Lowekamp
>
> Lockheed Martin Contractor for
>
> Office of High Performance Computing and Communications
>
> National Library of Medicine
>
> blowekamp at mail.nih.gov
>
>
>
>
More information about the Insight-developers
mailing list