[Insight-developers] memcpy VS iterators copy!!

Bradley Lowekamp blowekamp at mail.nih.gov
Tue Mar 29 22:55:18 EDT 2011


Hello,

Bill also posted similar comments on the gerrit patch. It it likely best to keep the discussion more on the mailing list then gerrit.

The ITK iterators do not conform to the STL iterator concept. Therefore, they can not directly be used with the STL algorithms such as std::copy or std::fill etc... 

A key feature of ITK iterator is support for iterating on a subset of the image defined by a region. This makes the increment operators non-trivial. Even if the ITK iterator was implemented as a STL iterator, it would not be very optimized by the compiler because of the conditional needed at each increment for the region. I have spent time trying to get the Intel compiler to vectorize ITK iterator loops, even removing many checks, the compiler just will not do it.  The structure of ITK iterators is just too complicated. 

The iterator for the std::vector is frequently optimized for template algorithms such as std::copy, and iterator loops can be vectorized by compilers too. The std::vector is very close to a pointer, which is why these optimization occur. Additionally, pointers conform to the STL iterator concept, and these optimizations can occur for them as well.

Using std::copy on ITK iterators, will not work, even if we reinvent them.


To implement optimized region copying, we must first determine if the image's buffers can be directly copies, then determine the maximal continuous stride where memcpy or std::copy can be used on the image component. If std::copy is used here it will be passed pointers to the buffer for the iterators, which should be optimized to memcpy if pod. 

To check if the image's buffers are compatible, they must both be itk::Image or itk::VectorImage. If Image, they need to be the same pixel type ( dimension can differ ). If VectorImage, then they need the same component, and the same number of elements. This should rule out blox, adaptors, and label maps etc. For these other cases we fall back to the implementation currently in gerrit.

Jim has been raising the question if the gerrit implementation should even be a object. He is suggesting only a function is needed. My implementation was just following the lead of the itk::ImageDuplicator, which is a class.

Brad

On Mar 29, 2011, at 6:19 PM, Kris Thielemans wrote:

> Hi
> 
> I'm afraid I don't know current (or v4) status of ITK iterators well enough,
> but if there's now an STL-complaint iterator concept in ITK, std::copy
> should do the proper thing (or it's a bug in ITK or the compiler). The
> compiler's is_pod and hence std::copy as well should be pretty smart. Maybe
> not as smart as a human, but possibly less error-prone.
> 
> Of course, if you have a dumb compiler, it doesn't help (but I'm not sure if
> you should cater for that anymore). Seems easy to redo the timings with
> std:copy to see what happens in any case.
> 
> Kris
> 
> PS: if you decide for memcpy, the gcc note implies it should be memmove.
>> -----Original Message-----
>> From: Bradley Lowekamp [mailto:blowekamp at mail.nih.gov]
>> Sent: 29 March 2011 15:25
>> To: Bill Lorensen
>> Cc: Kris Thielemans; Johnson, Hans J; ITK; Luis Ibanez
>> Subject: Re: [Insight-developers] memcpy VS iterators copy!!
>> 
>> Bill and Kris,
>> 
>> Using std::copy over memcpy would leave it to the compiler to
>> make the is_pod check, and may be easier. It could also be a
>> little more flexible is some situations.
>> 
>> The real challenge is going to be drill down the ITK layers
>> of abstractions to ensure that such a method will produce the
>> correct results for Image, Image of complex, Image of
>> Vectors, VectorImage, Adapted Image and Adapted Vector Image etc...
>> 
>> 
>> I propose the following, create the interface similar to
>> ImageDuplicatior to perform Image region to region coping.
>> Initially, implement it with the iterator copy for loop.
>> Ensure that for all ITK images it works. Then at some point
>> we can begin to optimize the cases which we can detect.
>> During the review process, these types of loops should be
>> replaces with this new Region duplicator class.
>> 
>> Brad
>> 
>> 
>> 
>> On Mar 28, 2011, at 7:24 PM, Bill Lorensen wrote:
>> 
>> 
>>      Should we be using std:;copy instead of memcpy?
>> 
>> 
>>      On Mon, Mar 28, 2011 at 4:05 PM, Kris Thielemans
>> <kris.thielemans at csc.mrc.ac.uk> wrote:
>> 
>> 
>>              Hi
>> 
>>              any decent C++ compiler should have these
>> optimisations when using
>>              std::copy. For example, from
>> 
>> /usr/lib/gcc/i686-pc-cygwin/3.4.4/include/c++/bits/stl_algobase.h
>> 
>>               // All of these auxiliary functions serve two
>> purposes.  (1) Replace
>>               // calls to copy with memmove whenever
>> possible.  (Memmove, not memcpy,
>>               // because the input and output ranges are
>> permitted to overlap.)
>>               // (2) If we're using random access iterators,
>> then write the loop as
>>               // a for loop with an explicit count.
>> 
>>              Kris
>> 
>> 
>>> -----Original Message-----
>>> From: insight-developers-bounces at itk.org
>>> [mailto:insight-developers-bounces at itk.org]
>> On Behalf Of
>>> Johnson, Hans J
>>> Sent: 28 March 2011 17:56
>>> To: Bradley Lowekamp
>>> Cc: ITK; Luis Ibanez
>>> Subject: Re: [Insight-developers] memcpy VS
>> iterators copy!!
>>> 
>>> Brad,
>>> 
>>> I support pulling in some form of
>>> 
>> std::tr1::is_pod<ImageType::InternalPixelType> paradigm.
>>> This was exactly what I was thinking of.
>> I"ve looked into
>>> this once, and I think it has several places
>> where this
>>> paradigm can help solve key performance
>> bottle necks where
>>> our need to support general pixel types (I.e.
>> We need to
>>> support pixel types of "elephant objects")
>> severely impact
>>> the most common scalar/POD performance.
>>> 
>>> 
>>> Hans
>>> 
>>> 
>>> From: Bradley Lowekamp <blowekamp at mail.nih.gov>
>>> Date: Mon, 28 Mar 2011 11:49:00 -0400
>>> To: Hans Johnson <hans-johnson at uiowa.edu>
>>> Cc: ITK <insight-developers at itk.org>, Luis Ibanez
>>> <luis.ibanez at kitware.com>
>>> Subject: Re: [Insight-developers] memcpy VS
>> iterators copy!!
>>> 
>>> 
>>> Hans,
>>> 
>>> On Mar 28, 2011, at 11:29 AM, Johnson, Hans J wrote:
>>> 
>>> 
>>>      Brad,
>>> 
>>> 
>> 
>>>      1.      There is only one time listed.
>> IS 16.8704s
>> 
>>> fast or slow?  What is the comparison time?
>>> 
>>> The comparison times were at the bottom of
>> the e-mail, from a
>>> prior message. The ImageSeriesReader is
>> performing a copy for
>>> the ImageReader to the output image on a
>> per-slice basis, by
>>> test images were unsigned chars of:
>>> 
>>> Testing series reader with 349 files.
>>> Image Size: [2048, 1536, 349]
>>> 
>>> # memcpy at for each slice
>>> Executed 10 times with mean 16.8704s
>>> 
>>> # current ITK iterator for loop with progress
>> Executed 10
>>> times with mean 24.4403s
>>> 
>>> # iterator for loop with out progress
>>> Executed  10 times with mean 20.7206s
>>> 
>>> # no for loop
>>> Executed with 10 times with mean 16.5306s
>>> 
>>> # gerrit patch version, which avoided the
>> copy by setting a buffer.
>>> Executed 10 times with mean 16.9262s
>>> 
>>> 
>>> 
>> 
>>>      2.      This should now be possible now
>> that we have
>> 
>>> partial template specialization.   How do we
>> identify POD
>>> that is suitable for memcopy algorithm vs.
>> the required iterators?
>>> 
>>> In an ideal world we could just use
>>> 
>> std::tr1::is_pod<ImageType::InternalPixelType>. But in our
>>> case we may need to introduce a NumericTraits
>> field for this.
>>> How does the image duplicator deal with it now?
>>> 
>>> 
>> 
>>>      3.      Where was this loop copy
>> iteration listed?
>> 
>>> The FillBuffer is also a prime candidate for a speed
>>> improvement. Especially the FillBuffer(0) paradigm for
>>> integer and floating point types.
>>> 
>>> 
>>> I see a class call ImageRegionDuplicator,
>> which would be
>>> similar to the ImageDuplicator in some respects.
>>> 
>>> The trick is also going to be determining the
>> maximum input
>>> stride and output stride, so that the longest
>> contiguous
>>> section of memory can be copied. I would be
>> willing to bet
>>> that even memcpying row by row would be
>> faster than the
>>> iterator loop. What I am unsure about, is if
>> multiple threads
>>> have any benefits.
>>> 
>>> When I am doing streaming there is a lot of
>> region extraction
>>> and compositing, which are really just buffer
>> coping. I think
>>> this could have a big impact on this type of
>> operation aswell.
>>> 
>>> 
>>> 
>>>      Hans
>>> 
>>> 
>>>      From: Bradley Lowekamp <blowekamp at mail.nih.gov>
>>>      Date: Mon, 28 Mar 2011 11:00:10 -0400
>>>      To: Bradley Lowekamp <blowekamp at mail.nih.gov>
>>>      Cc: ITK <insight-developers at itk.org>,
>> Luis Ibanez
>>> <luis.ibanez at kitware.com>
>>>      Subject: Re: [Insight-developers]
>> memcpy VS iterators copy!!
>>> 
>>> 
>>>      Hello,
>>> 
>>>      This is another performance improvement
>> that I think
>>> should me a MUST for v4! We need to replace
>> the for loop
>>> image iterator copies with an abstraction
>> that can use memcpy
>>> when possible!
>>> 
>>>      I have been wanting to run the
>> performance comparison
>>> for a while and this was the opportunity to
>> do so! I replaced
>>> the for loop in question here with a memcpy (
>> it still has
>>> bugs it it but it's doing the needed work
>> extremely fast! )
>>> 
>>>      # memcpy loop
>>>      Executed 10 times with mean 16.8704s
>>> 
>>>      I just replaced the for loop with a memcpy:
>>>          {
>>>          const IdentifierType numberOfPixelsInSlice =
>>> sliceRegionToRequest.GetNumberOfPixels();
>>>          const size_t numberOfComponents =
>>> output->GetNumberOfComponentsPerPixel();
>>>          const IdentifierType
>> numberOfPixelsUpToSlice =
>>> numberOfPixelsInSlice * i * numberOfComponents;
>>> 
>>>          typename  TOutputImage::InternalPixelType *
>>> outputSliceBuffer = outputBuffer +
>> numberOfPixelsUpToSlice;
>>>          typename  TOutputImage::InternalPixelType *
>>> inputBuffer =c
>> reader->GetOutput()->GetBufferPointer();
>>> 
>>>          memcpy( outputSliceBuffer,
>> inputBuffer, sizeof(
>>> typename TOutputImage::InternalPixelType ) *
>>> numberOfPixelsInSlice * numberOfComponents );
>>>          }
>>> 
>>>      Still for this case, no copy is still
>> better then memcpy.
>>> 
>>>      On Mar 28, 2011, at 10:32 AM, Lowekamp, Bradley
>>> (NIH/NLM/LHC) [C] wrote:
>>> 
>>> 
>>>              Hello Roger,
>>> 
>>>              Your benchmark program had a few more
>>> dependencies, the just ITK so I wrote my own
>> and attached it.
>>> I used a series of tiff I have, so I hope it would be
>>> comparable. I have also arrived at a similar
>> conclusion that
>>> the copy loop is expensive and should be
>> avoided. However, my
>>> benchmark does indicate that the progress
>> reporting is taking
>>> 50% of the additional execution time, which is rather
>>> different then your experiment.
>>> 
>>> 
>>>              Testing series reader with 349 files.
>>>              Image Size: [2048, 1536, 349]
>>> 
>>>              # current ITK
>>>              Executed 10 times with mean 24.4403s
>>> 
>>>              # progress commented out
>>>              Executed  10 times with mean 20.7206s
>>> 
>>>              # copy loop commented out
>>>              Executed with 10 times with
>> mean 16.5306s
>>> 
>>>              # gerrit patch version
>>>              Executed 10 times with mean 16.9262s
>>> 
>>> 
>> <itkImageSeriesReaderPerformance.cxx><ATT00001..htm>
>>> 
>>> 
>>> 
>> ========================================================
>>>      Bradley Lowekamp
>>>      Lockheed Martin Contractor for
>>>      Office of High Performance Computing
>> and Communications
>>>      National Library of Medicine
>>>      blowekamp at mail.nih.gov
>>> 
>>> 
>>> 
>>> 
>> _______________________________________________ Powered
>> 
>>> by www.kitware.com <http://www.kitware.com/>
>> <http://www.kitware.com <http://www.kitware.com/> >  Visit other
>> 
>>> Kitware open-source projects at
>>> http://www.kitware.com/opensource/opensource.html
>>> 
>> <http://www.kitware.com/opensource/opensource.html>  Kitware
>>> offers ITK Training Courses, for more
>> information visit:
>>> http://kitware.com/products/protraining.html
>> Please keep
>>> messages on-topic and check the ITK FAQ at:
>>> http://www.itk.org/Wiki/ITK_FAQ Follow this link to
>>> subscribe/unsubscribe:
>>> http://www.itk.org/mailman/listinfo/insight-developers
>>> 
>>> 
>>> ________________________________
>>> 
>>>      Notice: This UI Health Care e-mail (including
>>> attachments) is covered by the Electronic
>> Communications
>>> Privacy Act, 18 U.S.C. 2510-2521, is
>> confidential and may be
>>> legally privileged.  If you are not the
>> intended recipient,
>>> you are hereby notified that any retention,
>> dissemination,
>>> distribution, or copying of this
>> communication is strictly
>>> prohibited.  Please reply to the sender that you have
>>> received the message in error, then delete
>> it.  Thank you.
>>> ________________________________
>>> 
>>> 
>>> 
>>> 
>> ========================================================
>>> 
>>> Bradley Lowekamp
>>> 
>>> Lockheed Martin Contractor for
>>> 
>>> Office of High Performance Computing and
>> Communications
>>> 
>>> National Library of Medicine
>>> 
>>> blowekamp at mail.nih.gov
>>> 
>>> 
>>> 
>>> 
>>> 
>>> ________________________________
>>> 
>>> Notice: This UI Health Care e-mail (including
>> attachments) is
>>> covered by the Electronic Communications
>> Privacy Act, 18
>>> U.S.C. 2510-2521, is confidential and may be legally
>>> privileged.  If you are not the intended
>> recipient, you are
>>> hereby notified that any retention, dissemination,
>>> distribution, or copying of this
>> communication is strictly
>>> prohibited.  Please reply to the sender that you have
>>> received the message in error, then delete
>> it.  Thank you.
>>> ________________________________
>>> 
>>> 
>> 
>> 
>> 
>>              _______________________________________________
>> 
>>              Powered by www.kitware.com <http://www.kitware.com/>
>> 
>>              Visit other Kitware open-source projects at
>>              http://www.kitware.com/opensource/opensource.html
>> 
>>              Kitware offers ITK Training Courses, for more
>> information visit:
>>              http://kitware.com/products/protraining.html
>> 
>>              Please keep messages on-topic and check the ITK FAQ at:
>>              http://www.itk.org/Wiki/ITK_FAQ
>> 
>>              Follow this link to subscribe/unsubscribe:
>>              http://www.itk.org/mailman/listinfo/insight-developers
>> 
>> 
>> 
>> 
>> ========================================================
>> 
>> Bradley Lowekamp
>> 
>> Lockheed Martin Contractor for
>> 
>> Office of High Performance Computing and Communications
>> 
>> National Library of Medicine
>> 
>> blowekamp at mail.nih.gov
>> 
>> 
>> 
>> 
> 



More information about the Insight-developers mailing list