[Insight-developers] memcpy VS iterators copy!!

Tue Mar 29 18:19:28 EDT 2011

Hi

I'm afraid I don't know current (or v4) status of ITK iterators well enough,
but if there's now an STL-complaint iterator concept in ITK, std::copy
should do the proper thing (or it's a bug in ITK or the compiler). The
compiler's is_pod and hence std::copy as well should be pretty smart. Maybe
not as smart as a human, but possibly less error-prone.

Of course, if you have a dumb compiler, it doesn't help (but I'm not sure if
you should cater for that anymore). Seems easy to redo the timings with
std:copy to see what happens in any case.

Kris

PS: if you decide for memcpy, the gcc note implies it should be memmove.
> -----Original Message-----
> From: Bradley Lowekamp [mailto:blowekamp at mail.nih.gov] 
> Sent: 29 March 2011 15:25
> To: Bill Lorensen
> Cc: Kris Thielemans; Johnson, Hans J; ITK; Luis Ibanez
> Subject: Re: [Insight-developers] memcpy VS iterators copy!!
> 
> Bill and Kris,
> 
> Using std::copy over memcpy would leave it to the compiler to 
> make the is_pod check, and may be easier. It could also be a 
> little more flexible is some situations.
> 
> The real challenge is going to be drill down the ITK layers 
> of abstractions to ensure that such a method will produce the 
> correct results for Image, Image of complex, Image of 
> Vectors, VectorImage, Adapted Image and Adapted Vector Image etc...
> 
> 
> I propose the following, create the interface similar to 
> ImageDuplicatior to perform Image region to region coping. 
> Initially, implement it with the iterator copy for loop. 
> Ensure that for all ITK images it works. Then at some point 
> we can begin to optimize the cases which we can detect. 
> During the review process, these types of loops should be 
> replaces with this new Region duplicator class.
> 
> Brad
> 
> 
> 
> On Mar 28, 2011, at 7:24 PM, Bill Lorensen wrote:
> 
> 
> 	Should we be using std:;copy instead of memcpy?
> 	
> 	
> 	On Mon, Mar 28, 2011 at 4:05 PM, Kris Thielemans 
> <kris.thielemans at csc.mrc.ac.uk> wrote:
> 	
> 
> 		Hi
> 		
> 		any decent C++ compiler should have these 
> optimisations when using
> 		std::copy. For example, from
> 		 
> /usr/lib/gcc/i686-pc-cygwin/3.4.4/include/c++/bits/stl_algobase.h
> 		
> 		 // All of these auxiliary functions serve two 
> purposes.  (1) Replace
> 		 // calls to copy with memmove whenever 
> possible.  (Memmove, not memcpy,
> 		 // because the input and output ranges are 
> permitted to overlap.)
> 		 // (2) If we're using random access iterators, 
> then write the loop as
> 		 // a for loop with an explicit count.
> 		
> 		Kris
> 		
> 
> 		> -----Original Message-----
> 		> From: insight-developers-bounces at itk.org
> 		> [mailto:insight-developers-bounces at itk.org] 
> On Behalf Of
> 		> Johnson, Hans J
> 		> Sent: 28 March 2011 17:56
> 		> To: Bradley Lowekamp
> 		> Cc: ITK; Luis Ibanez
> 		> Subject: Re: [Insight-developers] memcpy VS 
> iterators copy!!
> 		>
> 		> Brad,
> 		>
> 		> I support pulling in some form of
> 		> 
> std::tr1::is_pod<ImageType::InternalPixelType> paradigm.
> 		> This was exactly what I was thinking of.  
> I"ve looked into
> 		> this once, and I think it has several places 
> where this
> 		> paradigm can help solve key performance 
> bottle necks where
> 		> our need to support general pixel types (I.e. 
> We need to
> 		> support pixel types of "elephant objects") 
> severely impact
> 		> the most common scalar/POD performance.
> 		>
> 		>
> 		> Hans
> 		>
> 		>
> 		> From: Bradley Lowekamp <blowekamp at mail.nih.gov>
> 		> Date: Mon, 28 Mar 2011 11:49:00 -0400
> 		> To: Hans Johnson <hans-johnson at uiowa.edu>
> 		> Cc: ITK <insight-developers at itk.org>, Luis Ibanez
> 		> <luis.ibanez at kitware.com>
> 		> Subject: Re: [Insight-developers] memcpy VS 
> iterators copy!!
> 		>
> 		>
> 		> Hans,
> 		>
> 		> On Mar 28, 2011, at 11:29 AM, Johnson, Hans J wrote:
> 		>
> 		>
> 		>       Brad,
> 		>
> 		>
> 		
> 		>       1.      There is only one time listed.  
> IS 16.8704s
> 		
> 		> fast or slow?  What is the comparison time?
> 		>
> 		> The comparison times were at the bottom of 
> the e-mail, from a
> 		> prior message. The ImageSeriesReader is 
> performing a copy for
> 		> the ImageReader to the output image on a 
> per-slice basis, by
> 		> test images were unsigned chars of:
> 		>
> 		> Testing series reader with 349 files.
> 		> Image Size: [2048, 1536, 349]
> 		>
> 		> # memcpy at for each slice
> 		> Executed 10 times with mean 16.8704s
> 		>
> 		> # current ITK iterator for loop with progress 
> Executed 10
> 		> times with mean 24.4403s
> 		>
> 		> # iterator for loop with out progress
> 		> Executed  10 times with mean 20.7206s
> 		>
> 		> # no for loop
> 		> Executed with 10 times with mean 16.5306s
> 		>
> 		> # gerrit patch version, which avoided the 
> copy by setting a buffer.
> 		> Executed 10 times with mean 16.9262s
> 		>
> 		>
> 		>
> 		
> 		>       2.      This should now be possible now 
> that we have
> 		
> 		> partial template specialization.   How do we 
> identify POD
> 		> that is suitable for memcopy algorithm vs. 
> the required iterators?
> 		>
> 		> In an ideal world we could just use
> 		> 
> std::tr1::is_pod<ImageType::InternalPixelType>. But in our
> 		> case we may need to introduce a NumericTraits 
> field for this.
> 		> How does the image duplicator deal with it now?
> 		>
> 		>
> 		
> 		>       3.      Where was this loop copy 
> iteration listed?
> 		
> 		> The FillBuffer is also a prime candidate for a speed
> 		> improvement. Especially the FillBuffer(0) paradigm for
> 		> integer and floating point types.
> 		>
> 		>
> 		> I see a class call ImageRegionDuplicator, 
> which would be
> 		> similar to the ImageDuplicator in some respects.
> 		>
> 		> The trick is also going to be determining the 
> maximum input
> 		> stride and output stride, so that the longest 
> contiguous
> 		> section of memory can be copied. I would be 
> willing to bet
> 		> that even memcpying row by row would be 
> faster than the
> 		> iterator loop. What I am unsure about, is if 
> multiple threads
> 		> have any benefits.
> 		>
> 		> When I am doing streaming there is a lot of 
> region extraction
> 		> and compositing, which are really just buffer 
> coping. I think
> 		> this could have a big impact on this type of 
> operation aswell.
> 		>
> 		>
> 		>
> 		>       Hans
> 		>
> 		>
> 		>       From: Bradley Lowekamp <blowekamp at mail.nih.gov>
> 		>       Date: Mon, 28 Mar 2011 11:00:10 -0400
> 		>       To: Bradley Lowekamp <blowekamp at mail.nih.gov>
> 		>       Cc: ITK <insight-developers at itk.org>, 
> Luis Ibanez
> 		> <luis.ibanez at kitware.com>
> 		>       Subject: Re: [Insight-developers] 
> memcpy VS iterators copy!!
> 		>
> 		>
> 		>       Hello,
> 		>
> 		>       This is another performance improvement 
> that I think
> 		> should me a MUST for v4! We need to replace 
> the for loop
> 		> image iterator copies with an abstraction 
> that can use memcpy
> 		> when possible!
> 		>
> 		>       I have been wanting to run the 
> performance comparison
> 		> for a while and this was the opportunity to 
> do so! I replaced
> 		> the for loop in question here with a memcpy ( 
> it still has
> 		> bugs it it but it's doing the needed work 
> extremely fast! )
> 		>
> 		>       # memcpy loop
> 		>       Executed 10 times with mean 16.8704s
> 		>
> 		>       I just replaced the for loop with a memcpy:
> 		>           {
> 		>           const IdentifierType numberOfPixelsInSlice =
> 		> sliceRegionToRequest.GetNumberOfPixels();
> 		>           const size_t numberOfComponents =
> 		> output->GetNumberOfComponentsPerPixel();
> 		>           const IdentifierType 
> numberOfPixelsUpToSlice =
> 		> numberOfPixelsInSlice * i * numberOfComponents;
> 		>
> 		>           typename  TOutputImage::InternalPixelType *
> 		> outputSliceBuffer = outputBuffer + 
> numberOfPixelsUpToSlice;
> 		>           typename  TOutputImage::InternalPixelType *
> 		> inputBuffer =c 
> reader->GetOutput()->GetBufferPointer();
> 		>
> 		>           memcpy( outputSliceBuffer, 
> inputBuffer, sizeof(
> 		> typename TOutputImage::InternalPixelType ) *
> 		> numberOfPixelsInSlice * numberOfComponents );
> 		>           }
> 		>
> 		>       Still for this case, no copy is still 
> better then memcpy.
> 		>
> 		>       On Mar 28, 2011, at 10:32 AM, Lowekamp, Bradley
> 		> (NIH/NLM/LHC) [C] wrote:
> 		>
> 		>
> 		>               Hello Roger,
> 		>
> 		>               Your benchmark program had a few more
> 		> dependencies, the just ITK so I wrote my own 
> and attached it.
> 		> I used a series of tiff I have, so I hope it would be
> 		> comparable. I have also arrived at a similar 
> conclusion that
> 		> the copy loop is expensive and should be 
> avoided. However, my
> 		> benchmark does indicate that the progress 
> reporting is taking
> 		> 50% of the additional execution time, which is rather
> 		> different then your experiment.
> 		>
> 		>
> 		>               Testing series reader with 349 files.
> 		>               Image Size: [2048, 1536, 349]
> 		>
> 		>               # current ITK
> 		>               Executed 10 times with mean 24.4403s
> 		>
> 		>               # progress commented out
> 		>               Executed  10 times with mean 20.7206s
> 		>
> 		>               # copy loop commented out
> 		>               Executed with 10 times with 
> mean 16.5306s
> 		>
> 		>               # gerrit patch version
> 		>               Executed 10 times with mean 16.9262s
> 		>
> 		>               
> <itkImageSeriesReaderPerformance.cxx><ATT00001..htm>
> 		>
> 		>
> 		>       
> ========================================================
> 		>       Bradley Lowekamp
> 		>       Lockheed Martin Contractor for
> 		>       Office of High Performance Computing 
> and Communications
> 		>       National Library of Medicine
> 		>       blowekamp at mail.nih.gov
> 		>
> 		>
> 		>
> 		>       
> _______________________________________________ Powered
> 		
> 		> by www.kitware.com <http://www.kitware.com/>  
> <http://www.kitware.com <http://www.kitware.com/> >  Visit other
> 		
> 		> Kitware open-source projects at
> 		> http://www.kitware.com/opensource/opensource.html
> 		> 
> <http://www.kitware.com/opensource/opensource.html>  Kitware
> 		> offers ITK Training Courses, for more 
> information visit:
> 		> http://kitware.com/products/protraining.html 
> Please keep
> 		> messages on-topic and check the ITK FAQ at:
> 		> http://www.itk.org/Wiki/ITK_FAQ Follow this link to
> 		> subscribe/unsubscribe:
> 		> http://www.itk.org/mailman/listinfo/insight-developers
> 		>
> 		>
> 		> ________________________________
> 		>
> 		>       Notice: This UI Health Care e-mail (including
> 		> attachments) is covered by the Electronic 
> Communications
> 		> Privacy Act, 18 U.S.C. 2510-2521, is 
> confidential and may be
> 		> legally privileged.  If you are not the 
> intended recipient,
> 		> you are hereby notified that any retention, 
> dissemination,
> 		> distribution, or copying of this 
> communication is strictly
> 		> prohibited.  Please reply to the sender that you have
> 		> received the message in error, then delete 
> it.  Thank you.
> 		> ________________________________
> 		>
> 		>
> 		>
> 		> 
> ========================================================
> 		>
> 		> Bradley Lowekamp
> 		>
> 		> Lockheed Martin Contractor for
> 		>
> 		> Office of High Performance Computing and 
> Communications
> 		>
> 		> National Library of Medicine
> 		>
> 		> blowekamp at mail.nih.gov
> 		>
> 		>
> 		>
> 		>
> 		>
> 		> ________________________________
> 		>
> 		> Notice: This UI Health Care e-mail (including 
> attachments) is
> 		> covered by the Electronic Communications 
> Privacy Act, 18
> 		> U.S.C. 2510-2521, is confidential and may be legally
> 		> privileged.  If you are not the intended 
> recipient, you are
> 		> hereby notified that any retention, dissemination,
> 		> distribution, or copying of this 
> communication is strictly
> 		> prohibited.  Please reply to the sender that you have
> 		> received the message in error, then delete 
> it.  Thank you.
> 		> ________________________________
> 		>
> 		>
> 		
> 		
> 		
> 		_______________________________________________
> 		
> 		Powered by www.kitware.com <http://www.kitware.com/> 
> 		
> 		Visit other Kitware open-source projects at
> 		http://www.kitware.com/opensource/opensource.html
> 		
> 		Kitware offers ITK Training Courses, for more 
> information visit:
> 		http://kitware.com/products/protraining.html
> 		
> 		Please keep messages on-topic and check the ITK FAQ at:
> 		http://www.itk.org/Wiki/ITK_FAQ
> 		
> 		Follow this link to subscribe/unsubscribe:
> 		http://www.itk.org/mailman/listinfo/insight-developers
> 		
> 
> 
> 
> ========================================================
> 
> Bradley Lowekamp  
> 
> Lockheed Martin Contractor for
> 
> Office of High Performance Computing and Communications
> 
> National Library of Medicine 
> 
> blowekamp at mail.nih.gov
> 
> 
> 
>