[Insight-developers] memcpy VS iterators copy!!

Mon Mar 28 16:05:54 EDT 2011

Hi

Any decent C++ compiler should already apply these optimisations when
std::copy is used. For example, from 
 /usr/lib/gcc/i686-pc-cygwin/3.4.4/include/c++/bits/stl_algobase.h

 // All of these auxiliary functions serve two purposes.  (1) Replace
 // calls to copy with memmove whenever possible.  (Memmove, not memcpy,
 // because the input and output ranges are permitted to overlap.)
 // (2) If we're using random access iterators, then write the loop as
 // a for loop with an explicit count. 
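
To illustrate point (1) of that comment, here is a rough sketch (not taken from the libstdc++ sources). For a contiguous range of a trivially copyable type, std::copy and memmove produce the same bytes, which is what lets the library forward one to the other. The helper name copies_match is made up for this example.

```cpp
#include <algorithm>
#include <cstring>
#include <vector>

// Hypothetical helper: copy the same source once with std::copy and once
// with std::memmove, then report whether both destinations are identical.
inline bool copies_match(const std::vector<unsigned char>& src)
{
  std::vector<unsigned char> viaCopy(src.size());
  std::vector<unsigned char> viaMemmove(src.size());

  std::copy(src.begin(), src.end(), viaCopy.begin());
  if (!src.empty())
  {
    std::memmove(viaMemmove.data(), src.data(), src.size());
  }
  return viaCopy == viaMemmove;
}
```

Whether a given standard library actually takes the memmove path is an implementation detail; the sketch only shows that the two are interchangeable for this kind of data.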

Kris

> -----Original Message-----
> From: insight-developers-bounces at itk.org 
> [mailto:insight-developers-bounces at itk.org] On Behalf Of 
> Johnson, Hans J
> Sent: 28 March 2011 17:56
> To: Bradley Lowekamp
> Cc: ITK; Luis Ibanez
> Subject: Re: [Insight-developers] memcpy VS iterators copy!!
> 
> Brad,
> 
> I support pulling in some form of 
> std::tr1::is_pod<ImageType::InternalPixelType> paradigm.  
> This was exactly what I was thinking of.  I've looked into 
> this once, and I think there are several places where this 
> paradigm can help solve key performance bottlenecks, where 
> our need to support general pixel types (i.e. we need to 
> support pixel types of "elephant objects") severely impacts 
> the most common scalar/POD performance.
> 
> 
> Hans
> 
> 
> From: Bradley Lowekamp <blowekamp at mail.nih.gov>
> Date: Mon, 28 Mar 2011 11:49:00 -0400
> To: Hans Johnson <hans-johnson at uiowa.edu>
> Cc: ITK <insight-developers at itk.org>, Luis Ibanez 
> <luis.ibanez at kitware.com>
> Subject: Re: [Insight-developers] memcpy VS iterators copy!!
> 
> 
> Hans, 
> 
> On Mar 28, 2011, at 11:29 AM, Johnson, Hans J wrote:
> 
> 
> 	Brad,
> 
> 
> 	1.	There is only one time listed.  Is 16.8704s 
> fast or slow?  What is the comparison time?
> 
> The comparison times were at the bottom of the e-mail, from a 
> prior message. The ImageSeriesReader is performing a copy from 
> the ImageReader to the output image on a per-slice basis. My 
> test images were unsigned chars:
> 
> Testing series reader with 349 files.
> Image Size: [2048, 1536, 349]
> 
> # memcpy for each slice
> Executed 10 times with mean 16.8704s
> 
> # current ITK iterator for loop with progress
> Executed 10 times with mean 24.4403s
> 
> # iterator for loop without progress
> Executed 10 times with mean 20.7206s
> 
> # no for loop
> Executed 10 times with mean 16.5306s
> 
> # gerrit patch version, which avoided the copy by setting a buffer
> Executed 10 times with mean 16.9262s
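
The attached benchmark source is not reproduced in this thread. As a minimal sketch of how an "Executed 10 times with mean" figure could be produced, assuming plain wall-clock timing with std::chrono (the helper name mean_seconds is hypothetical):

```cpp
#include <chrono>
#include <cstddef>

// Hypothetical helper: run `work` `runs` times and return the mean
// wall-clock time per run, in seconds.
template <typename TWork>
double mean_seconds(TWork work, std::size_t runs)
{
  typedef std::chrono::steady_clock Clock;
  double total = 0.0;
  for (std::size_t i = 0; i < runs; ++i)
  {
    const Clock::time_point start = Clock::now();
    work();
    total += std::chrono::duration<double>(Clock::now() - start).count();
  }
  return runs == 0 ? 0.0 : total / static_cast<double>(runs);
}
```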
> 
> 
> 
> 	2.	This should be possible now that we have 
> partial template specialization.   How do we identify POD 
> that is suitable for the memcpy algorithm vs. the required iterators?
> 
> In an ideal world we could just use 
> std::tr1::is_pod<ImageType::InternalPixelType>. But in our 
> case we may need to introduce a NumericTraits field for this. 
> How does the image duplicator deal with it now?
> 
> 
> 	3.	Where was this loop copy iteration listed?   
> The FillBuffer is also a prime candidate for a speed 
> improvement. Especially the FillBuffer(0) paradigm for 
> integer and floating point types.
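
For the FillBuffer(0) case, a zero fill of integer or floating-point pixels could be sketched as below. This is a hypothetical helper, not ITK's FillBuffer, and it assumes that an all-zero bit pattern represents 0 for the type (true for integers and for IEEE 754 floating point).

```cpp
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical helper: set `count` elements of an arithmetic type to zero
// with a single memset. Assumes all-bits-zero is the value 0 for T.
template <typename T>
void fill_zero(T* buffer, std::size_t count)
{
  static_assert(std::is_arithmetic<T>::value,
                "sketch covers integer and floating-point types only");
  if (count != 0)
  {
    std::memset(buffer, 0, count * sizeof(T));
  }
}
```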
> 
> 
> I see a class called ImageRegionDuplicator, which would be 
> similar to the ImageDuplicator in some respects. 
> 
> The trick is also going to be determining the maximum input 
> stride and output stride, so that the longest contiguous 
> section of memory can be copied. I would be willing to bet 
> that even memcpying row by row would be faster than the 
> iterator loop. What I am unsure about is whether multiple 
> threads would have any benefit.
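
A rough sketch of the row-by-row idea, under the assumption of a 2D region in row-major buffers of a trivially copyable type. The function copy_rows and its parameters are hypothetical, not an ITK API. Strides are given in elements, and when both strides equal the row length the whole region is contiguous and is copied in one call.

```cpp
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical helper: copy `rowCount` rows of `rowLength` elements each.
// `srcStride` and `dstStride` are the distances, in elements, between the
// starts of consecutive rows in the source and destination buffers.
template <typename T>
void copy_rows(const T* src, std::size_t srcStride,
               T* dst, std::size_t dstStride,
               std::size_t rowLength, std::size_t rowCount)
{
  static_assert(std::is_trivially_copyable<T>::value,
                "memcpy is only valid for trivially copyable types");
  if (rowLength == 0 || rowCount == 0)
  {
    return;
  }
  if (rowLength == srcStride && rowLength == dstStride)
  {
    // Both buffers are contiguous over the region: one large copy.
    std::memcpy(dst, src, rowLength * rowCount * sizeof(T));
    return;
  }
  for (std::size_t r = 0; r < rowCount; ++r)
  {
    std::memcpy(dst + r * dstStride, src + r * srcStride,
                rowLength * sizeof(T));
  }
}
```

Extending this to N dimensions would mean finding the largest leading set of dimensions over which both regions span their full buffer extent, which is the stride question raised above.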
> 
> When I am doing streaming there is a lot of region extraction 
> and compositing, which are really just buffer copying. I think 
> this could have a big impact on this type of operation as well.
> 
> 
> 
> 	Hans
> 
> 	
> 	From: Bradley Lowekamp <blowekamp at mail.nih.gov>
> 	Date: Mon, 28 Mar 2011 11:00:10 -0400
> 	To: Bradley Lowekamp <blowekamp at mail.nih.gov>
> 	Cc: ITK <insight-developers at itk.org>, Luis Ibanez 
> <luis.ibanez at kitware.com>
> 	Subject: Re: [Insight-developers] memcpy VS iterators copy!!
> 	
> 
> 	Hello,
> 
> 	This is another performance improvement that I think 
> should be a MUST for v4! We need to replace the for loop 
> image iterator copies with an abstraction that can use memcpy 
> when possible!
> 
> 	I have been wanting to run the performance comparison 
> for a while and this was the opportunity to do so! I replaced 
> the for loop in question here with a memcpy ( it still has 
> bugs in it but it's doing the needed work extremely fast! )
> 
> 	# memcpy loop
> 	Executed 10 times with mean 16.8704s
> 
> 	I just replaced the for loop with a memcpy:
> 	    {
> 	    const IdentifierType numberOfPixelsInSlice =
> 	      sliceRegionToRequest.GetNumberOfPixels();
> 	    const size_t numberOfComponents =
> 	      output->GetNumberOfComponentsPerPixel();
> 	    // Offset, in components, of slice i within the output buffer.
> 	    const IdentifierType numberOfPixelsUpToSlice =
> 	      numberOfPixelsInSlice * i * numberOfComponents;
> 
> 	    typename TOutputImage::InternalPixelType * outputSliceBuffer =
> 	      outputBuffer + numberOfPixelsUpToSlice;
> 	    typename TOutputImage::InternalPixelType * inputBuffer =
> 	      reader->GetOutput()->GetBufferPointer();
> 
> 	    // Copy the whole slice in one call.
> 	    memcpy( outputSliceBuffer, inputBuffer,
> 	            sizeof( typename TOutputImage::InternalPixelType ) *
> 	            numberOfPixelsInSlice * numberOfComponents );
> 	    }
> 
> 	Still, for this case, no copy is better than memcpy. 
> 
> 	On Mar 28, 2011, at 10:32 AM, Lowekamp, Bradley 
> (NIH/NLM/LHC) [C] wrote:
> 
> 
> 		Hello Roger,
> 
> 		Your benchmark program had a few more 
> dependencies than just ITK, so I wrote my own and attached it. 
> I used a series of TIFF files I have, so I hope it is 
> comparable. I have also arrived at a similar conclusion: 
> the copy loop is expensive and should be avoided. However, my 
> benchmark does indicate that the progress reporting is taking 
> 50% of the additional execution time, which is rather 
> different from your experiment.
> 
> 
> 		Testing series reader with 349 files.
> 		Image Size: [2048, 1536, 349]
> 
> 		# current ITK
> 		Executed 10 times with mean 24.4403s
> 
> 		# progress commented out
> 		Executed 10 times with mean 20.7206s
> 
> 		# copy loop commented out
> 		Executed 10 times with mean 16.5306s
> 
> 		# gerrit patch version
> 		Executed 10 times with mean 16.9262s
> 
> 		<itkImageSeriesReaderPerformance.cxx><ATT00001..htm>
> 
> 
> 	========================================================
> 	Bradley Lowekamp  
> 	Lockheed Martin Contractor for
> 	Office of High Performance Computing and Communications
> 	National Library of Medicine 
> 	blowekamp at mail.nih.gov
> 	
> 	
> 
> 	
> 	
> ________________________________
> 
> 	Notice: This UI Health Care e-mail (including 
> attachments) is covered by the Electronic Communications 
> Privacy Act, 18 U.S.C. 2510-2521, is confidential and may be 
> legally privileged.  If you are not the intended recipient, 
> you are hereby notified that any retention, dissemination, 
> distribution, or copying of this communication is strictly 
> prohibited.  Please reply to the sender that you have 
> received the message in error, then delete it.  Thank you. 
> ________________________________
> 
> 
> 
> ========================================================
> Bradley Lowekamp  
> Lockheed Martin Contractor for
> Office of High Performance Computing and Communications
> National Library of Medicine 
> blowekamp at mail.nih.gov
> 
> 
>