[Insight-developers] Profiling of
Examples/ImageLinearIteratorWithIndex.cxx
Karl Krissian
karl at bwh.harvard.edu
Fri Jun 10 12:06:53 EDT 2005
Hi Jim,
I guess that in the scalar case
there would be less difference between Get() and Value()
(but I guess Get() would still be the most time-consuming method).
There would be no need to overload an operator =,
but the other timings should remain approximately the same.
The next step could be to do the same for neighborhood
iterators in the scalar case, comparing with equivalent C code.
Karl
Miller, James V (Research) wrote:
>Karl,
>
>Thanks for formalizing these suggestions. Get() vs Value() will have to
>be discussed in more detail but I don't see a reason why we couldn't
>do the rest of the optimizations.
>
>Do you have a feeling for how the timings change if we are using
>scalar images instead of images of RGBPixels?
>
>Jim
>
>
>
>-----Original Message-----
>From: Karl Krissian [mailto:karl at bwh.harvard.edu]
>Sent: Wednesday, June 08, 2005 9:08 PM
>To: Miller, James V (Research)
>Cc: ITK; itk users
>Subject: Re: [Insight-developers] Profiling of
>Examples/ImageLinearIteratorWithIndex.cxx
>
>
>
>Hi,
>
>Here are my suggestions for improving ITK speed at least on this example
>and the code that uses the same
>classes (the same technique can also be used in other parts of the ITK
>code):
>
>1. Replacing the operators= in itkRGBPixel.txx by the ones given in
>itkRGBPixel2.txx
>2. Adding inline to the ++ and -- operators in the following classes:
>
>itkImageLinearConstIteratorWithIndex.h
>itkImageSliceConstIteratorWithIndex.h
>
>3. Adding the new classes for linear iterators without index along the line:
>
>itkImageLinearConstIterator.{h,txx}
>itkImageLinearIterator.{h,txx}
>
>4. For even more optimization, adding the corresponding
>OrientedLinearIterators, which inherit from the previous ones,
> where the Orientation is a template,
>allowing a template specialization for the case of the X axis
>(Orientation=0) (replacing +=m_Jump by ++):
>
>itkImageOrientedLinearConstIterator.{h,txx}
>itkImageOrientedLinearIterator.{h,txx}
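The idea in point 4 can be sketched outside ITK as follows. This is only an illustration under stated assumptions: `OrientedLineWalker`, `SumLine` and `CheckOrientedLineWalker` are hypothetical names, not the classes attached to the original mail, and the image is a plain row-major `int` buffer. The general template steps with `+= jump`; the specialization for axis 0 steps with `++`.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for an orientation-templated line iterator (not ITK code).
// Axis is the compile-time orientation; jump is the stride, in pixels,
// between two neighbours along that axis.
template <unsigned int Axis>
struct OrientedLineWalker
{
  const int* pos;
  std::ptrdiff_t jump;

  OrientedLineWalker(const int* start, std::ptrdiff_t stride)
    : pos(start), jump(stride) {}

  void Next() { pos += jump; }  // general case: strided step
};

// Specialization for the X axis: neighbours are contiguous, so the step is ++.
template <>
struct OrientedLineWalker<0>
{
  const int* pos;

  OrientedLineWalker(const int* start, std::ptrdiff_t /*stride*/)
    : pos(start) {}

  void Next() { ++pos; }
};

// Sums n pixels along one line, starting at start.
template <unsigned int Axis>
long SumLine(const int* start, std::ptrdiff_t stride, std::size_t n)
{
  OrientedLineWalker<Axis> it(start, stride);
  long sum = 0;
  for (std::size_t i = 0; i < n; ++i)
  {
    sum += *it.pos;
    it.Next();
  }
  return sum;
}

// Small usage check on a 3 x 2 row-major image.
inline void CheckOrientedLineWalker()
{
  const int image[6] = {1, 2, 3, 4, 5, 6};
  assert(SumLine<0>(image, 1, 3) == 6);  // first row: 1 + 2 + 3
  assert(SumLine<1>(image, 3, 2) == 5);  // first column: 1 + 4
}
```

Because the axis is a template parameter, the choice between the two `Next()` bodies is made at compile time and costs nothing per step.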
>
>The nice part of these changes is that they allow the ITK code to be
>almost as fast as
>its C equivalent for the ImageLinearIteratorWithIndex example.
>
>I profiled the example
>ImageLinearIteratorWithIndex_test.cxx
>
>with the following results:
>
> Procedure        Time (sec)  Change
> ProcessITK         11.12     initial example
> ProcessITK1         6.70     replacing Get() by Value()
> ProcessITK2         2.44     using the redefined = operator for RGBPixel
> ProcessITK3         0.99     adding inline to the ++ and -- operators
> ProcessITK4         0.71     using ImageLinear(Const)Iterator
> ProcessITK5         0.57     using ImageOrientedLinear(Const)Iterator
> ProcessPointers     0.49     C style
>
>Best,
>
>Karl
>
>
>Miller, James V (Research) wrote:
>
>
>
>>Karl,
>>
>>This is a nice experiment. We could certainly add an ImageLinearIterator
>>in contrast to the ImageLinearIteratorWithIndex much like we have an
>>ImageRegionIterator and an ImageRegionIteratorWithIndex. The intent of the
>>WithIndex variants was to provide direct access to the index for those
>>algorithms that needed it.
>>
>>The Get()/Set() vs Value() argument is one of supporting ImageAdaptors.
>>Algorithms that use Get()/Set() can support ImageAdaptors. Algorithms that
>>use Value() cannot use ImageAdaptors. This is a decision that must be
>>made carefully. For instance, the NeighborhoodIterators do not support
>>Value() for similar efficiency reasons. Perhaps there could be another
>>mechanism to identify when ImageAdaptors are not being used and Set/Get
>>can use a faster path to the data. However, part of the speed of Value()
>>is that it allows you to write algorithms in a manner that avoids the
>>creation of temporaries, something that the Set/Get approach sometimes cannot do.
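The trade-off described above can be illustrated with a toy iterator. This is a minimal sketch, not the ITK iterator API: `ToyRgb`, `ToyIterator` and the two copy functions are hypothetical. `Get()` returns a copy, which is the point where an adaptor could transform the pixel; `Value()` returns a reference to the stored pixel, so there is no temporary but also no place to hook an adaptor.

```cpp
#include <cassert>
#include <cstddef>

// Toy three-component pixel; a stand-in for an RGB pixel type.
struct ToyRgb { unsigned char r, g, b; };

// Toy iterator over a pixel buffer (hypothetical, not an ITK class).
class ToyIterator
{
public:
  explicit ToyIterator(ToyRgb* position) : m_Position(position) {}

  // Returns a copy: an adaptor layer could convert the pixel here.
  ToyRgb Get() const { return *m_Position; }

  // Copies the argument into the buffer.
  void Set(const ToyRgb& value) { *m_Position = value; }

  // Returns the stored pixel itself: no temporary is created.
  ToyRgb& Value() { return *m_Position; }

  ToyIterator& operator++() { ++m_Position; return *this; }

private:
  ToyRgb* m_Position;
};

// Copy through Get()/Set(): one temporary pixel per step.
inline void CopyWithGetSet(ToyIterator in, ToyIterator out, std::size_t count)
{
  for (std::size_t i = 0; i < count; ++i, ++in, ++out)
  {
    out.Set(in.Get());
  }
}

// Copy through Value(): direct assignment between stored pixels.
inline void CopyWithValue(ToyIterator in, ToyIterator out, std::size_t count)
{
  for (std::size_t i = 0; i < count; ++i, ++in, ++out)
  {
    out.Value() = in.Value();
  }
}

// Both paths must produce the same result.
inline void CheckCopies()
{
  ToyRgb src[2] = {{1, 2, 3}, {4, 5, 6}};
  ToyRgb a[2] = {};
  ToyRgb b[2] = {};
  CopyWithGetSet(ToyIterator(src), ToyIterator(a), 2);
  CopyWithValue(ToyIterator(src), ToyIterator(b), 2);
  assert(a[1].b == 6 && b[1].b == 6 && a[0].r == b[0].r);
}
```

An optimizer may remove the temporary in a case this simple; the sketch only shows where the temporary arises in principle.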
>>
>>We ran a number of similar experiments when we first developed the
>>ImageRegionIterator and the ImageRegionIteratorWithIndex iterators.
>>I don't think we ever tried to time the LinearIterators. However,
>>one thing we found with timing the iterators was that results would change
>>drastically depending on which iterator you used first. For instance,
>>for a simple test that traverses a volume and sets/gets every pixel,
>>IteratorA may take timeA to traverse the volume and IteratorB may
>>take timeB to traverse the volume with timeA >> timeB. When the
>>order of the experiment was reversed, IteratorB took timeB2 and IteratorA
>>took timeA2 with timeB2 >> timeA2. This inconsistency made it difficult
>>to come to any real conclusions of the relative timings. Relating this
>>to your experiment, the magnitude of the differences may not really be
>>as extreme as the numbers below. But I do believe much of the timing
>>differences are real.
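One common way to reduce this kind of order dependence is to run the candidates several times, rotate which one goes first, and keep the best time for each. The sketch below assumes that approach; it is not what was done in the experiments described in this thread, the thread does not say what caused the effect, and `BestOfRounds` is a hypothetical helper.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <limits>
#include <vector>

// Returns the best wall-clock time in seconds seen for each candidate.
// Candidates run round-robin and the starting candidate rotates every round,
// so no candidate is always the first one to run.
inline std::vector<double> BestOfRounds(
  const std::vector<std::function<void()>>& candidates, int rounds)
{
  using Clock = std::chrono::steady_clock;
  const std::size_t n = candidates.size();
  std::vector<double> best(n, std::numeric_limits<double>::infinity());

  for (int r = 0; r < rounds; ++r)
  {
    for (std::size_t k = 0; k < n; ++k)
    {
      const std::size_t i = (k + static_cast<std::size_t>(r)) % n;
      const Clock::time_point t0 = Clock::now();
      candidates[i]();
      const std::chrono::duration<double> dt = Clock::now() - t0;
      best[i] = std::min(best[i], dt.count());
    }
  }
  return best;
}

// Usage check: every candidate is called once per round.
inline void CheckBestOfRounds()
{
  int callsA = 0;
  int callsB = 0;
  const std::vector<std::function<void()>> candidates = {
    [&callsA] { ++callsA; },
    [&callsB] { ++callsB; }
  };
  const std::vector<double> best = BestOfRounds(candidates, 4);
  assert(callsA == 4 && callsB == 4);
  assert(best.size() == 2 && best[0] >= 0.0 && best[1] >= 0.0);
}
```

In a real comparison each candidate would be a full traversal of the same volume with one iterator type.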
>>
>>Finally, while using gcc is a real world scenario, historically it has
>>not had the best optimizer. I have had cases where gcc compiled code
>>had severe bottlenecks compared to DevStudio .Net compiled code.
>>
>>Since iterators are such an integral part of ITK, we should continue
>>to experiment with methods to improve performance. Hopefully, there
>>are opportunities to improve performance while still supporting
>>ImageAdaptors (for backward compatibility). But we should also
>>look for mechanisms and opportunities to improve algorithm performance
>>where we can reasonably identify ImageAdaptors are not being used.
>>
>>Jim
>>
>>
>>
>>
>>
>>-----Original Message-----
>>From: insight-developers-bounces+millerjv=crd.ge.com at itk.org
>>[mailto:insight-developers-bounces+millerjv=crd.ge.com at itk.org]On Behalf
>>Of Karl Krissian
>>Sent: Thursday, June 02, 2005 6:41 PM
>>To: ITK; itk users
>>Subject: [Insight-developers] Profiling of
>>Examples/ImageLinearIteratorWithIndex.cxx
>>
>>
>>
>>Hi,
>>
>>I decided to compare the processing time of some simple itk iterator
>>example with
>>its equivalent in C.
>>
>>I think the result may be interesting to the ITK community.
>>I used an ITK version on Linux (mobile Pentium Centrino 1.7GHz)
>>compiled with profiling and optimization: -pg -O3 and the profiler is
>>gprof (GNU).
>>
>>I added the following classes for the experiment:
>>
>>Code/Common/itkImageLinearIteratorWithIndex2.h
>>Code/Common/itkImageLinearIteratorWithIndex2.txx
>>
>>Code/Common/itkImageLinearConstIteratorWithIndex2.h
>>Code/Common/itkImageLinearConstIteratorWithIndex2.txx
>>
>>and changed the example:
>>Examples/Iterators/ImageLinearIteratorWithIndex.cxx
>>
>>The code is attached to this email.
>>
>>The new ImageLinearIteratorWithIndex2 could also be called
>>ImageLinearIteratorWithoutIndex
>>because it does not update the index during the ++ and -- operations,
>>which speeds up
>>the traversal.
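The difference between the two kinds of iterator can be sketched with two toy walkers over a row-major buffer. The names `WalkerWithIndex` and `WalkerWithoutIndex` are hypothetical and the layout is an assumption made for illustration; the classes attached to the original mail are not reproduced here.

```cpp
#include <cassert>
#include <cstddef>

// Keeps an (x, y) index in step with the pointer, as a WithIndex iterator would.
struct WalkerWithIndex
{
  const int* pos;
  std::ptrdiff_t jump;  // stride along the walked axis, in pixels
  unsigned int axis;    // 0 = x, 1 = y
  long index[2];

  WalkerWithIndex& operator++()
  {
    pos += jump;
    ++index[axis];      // extra bookkeeping on every step
    return *this;
  }
};

// Moves only the pointer; the index is not tracked at all.
struct WalkerWithoutIndex
{
  const int* pos;
  std::ptrdiff_t jump;

  WalkerWithoutIndex& operator++()
  {
    pos += jump;
    return *this;
  }
};

// Both walkers must land on the same pixel after the same number of steps.
inline void CheckWalkersAgree()
{
  const int image[6] = {10, 11, 12, 20, 21, 22};  // 3 wide, 2 high
  WalkerWithIndex a{image, 1, 0, {0, 0}};
  WalkerWithoutIndex b{image, 1};
  ++a; ++a;
  ++b; ++b;
  assert(a.pos == b.pos && *b.pos == 12);
  assert(a.index[0] == 2 && a.index[1] == 0);
}
```

An algorithm that never asks for the index pays for the `++index[axis]` update without using it, which is the cost the second walker avoids.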
>>
>>The ImageLinearIteratorWithIndex example basically flips an RGB
>>image in the X direction.
>>The idea is to compare the time of this operation using ITK with the
>>time of the equivalent
>>operation using standard C programming (directly accessing pointers to
>>the data).
>>
>>I created different procedures with some slight changes to compare their
>>speed:
>>
>>1. ProcessITK is the original code
>>2. ProcessITK1 replaces inputIt.Get() by inputIt.Value()
>>3. ProcessITK2 replaces outputIt.Set( inputIt.Value() ) by
>>outputIt.Value().Set(inputIt.Value().GetRed(),inputIt.Value().GetGreen(),inputIt.Value().GetBlue())
>>4. ProcessITK3 is like ProcessITK2 but using the new Iterator
>>5. ProcessITK4 is like ProcessITK3 but replaces the ++ and -- operations
>>by IncPos() and DecPos(), which are actual ++ and -- on the pointers
>>6. ProcessPointer does the same operation (without ITK generality) in a
>>C style.
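For reference, a C-style flip in the X direction might look like the sketch below. It is not the ProcessPointer code from the attachment: `Rgb` and `FlipXPointers` are hypothetical, and a 2-D row-major layout is assumed for brevity.

```cpp
#include <cassert>
#include <cstddef>

// Plain three-component pixel (assumed layout, for illustration only).
struct Rgb { unsigned char r, g, b; };

// Mirrors every row of a width x height row-major image from in to out.
inline void FlipXPointers(const Rgb* in, Rgb* out,
                          std::size_t width, std::size_t height)
{
  for (std::size_t y = 0; y < height; ++y)
  {
    const Rgb* src = in + y * width;      // first pixel of the input row
    Rgb* dst = out + y * width + width;   // one past the end of the output row
    for (std::size_t x = 0; x < width; ++x)
    {
      --dst;                              // walk the output row backwards
      dst->r = src->r;
      dst->g = src->g;
      dst->b = src->b;
      ++src;                              // walk the input row forwards
    }
  }
}

// Usage check on a 3 x 2 image whose red channel numbers the pixels.
inline void CheckFlipX()
{
  const Rgb in[6] = {{1, 0, 0}, {2, 0, 0}, {3, 0, 0},
                     {4, 0, 0}, {5, 0, 0}, {6, 0, 0}};
  Rgb out[6] = {};
  FlipXPointers(in, out, 3, 2);
  assert(out[0].r == 3 && out[1].r == 2 && out[2].r == 1);
  assert(out[3].r == 6 && out[4].r == 5 && out[5].r == 4);
}
```

The inner loop contains only pointer increments, decrements and component copies, which is the baseline the iterator versions are compared against.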
>>
>>The results are the following:
>>
>>1. 17.51 sec
>>2. 9.94 sec
>>3. 3.54 sec
>>4. 1.64 sec
>>5. 0.81 sec
>>6. 0.62 sec
>>
>>The details are in the file 'profile' but in summary:
>>
>>1 --> 2 : we avoid creating and deleting an RGB value, which saves
>>approx. 6 sec (FixedArray constructor and destructor)
>>2 --> 3 : we avoid the operator= of FixedArray (loops over the number of
>>elements) and we save 6.74 sec
>>3 --> 4: not updating the index in the iterator decreases the time of ++
>>and -- operators, GoToEndOfLine() and NextLine() are also faster
>>4 --> 5: using ++ and -- instead of += m_Jump and -= m_Jump gains 1.1 sec
>>5 --> 6: there is still some overhead in the iterator, but a small
>>difference.
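The 2 --> 3 step can be illustrated with a toy pixel that offers both a loop-based copy and a component-wise `Set()`. `ToyRgbPixel` is a hypothetical stand-in, not itk::RGBPixel or itk::FixedArray, and whether the component-wise form is still faster depends on the compiler and its optimization settings.

```cpp
#include <cassert>
#include <cstddef>

// Toy RGB pixel with two ways of receiving a value (illustrative only).
struct ToyRgbPixel
{
  unsigned char data[3];

  // Loop-based copy, in the style of a generic fixed-size array assignment.
  void AssignLoop(const ToyRgbPixel& other)
  {
    for (std::size_t i = 0; i < 3; ++i)
    {
      data[i] = other.data[i];
    }
  }

  // Component-wise copy with no loop.
  void Set(unsigned char red, unsigned char green, unsigned char blue)
  {
    data[0] = red;
    data[1] = green;
    data[2] = blue;
  }

  unsigned char GetRed() const { return data[0]; }
  unsigned char GetGreen() const { return data[1]; }
  unsigned char GetBlue() const { return data[2]; }
};

// Both forms must leave the destination with the same components.
inline void CheckAssignments()
{
  const ToyRgbPixel src = {{10, 20, 30}};
  ToyRgbPixel a = {{0, 0, 0}};
  ToyRgbPixel b = {{0, 0, 0}};
  a.AssignLoop(src);
  b.Set(src.GetRed(), src.GetGreen(), src.GetBlue());
  assert(a.GetRed() == 10 && a.GetGreen() == 20 && a.GetBlue() == 30);
  assert(b.GetRed() == 10 && b.GetGreen() == 20 && b.GetBlue() == 30);
}
```

The `Set(GetRed(), GetGreen(), GetBlue())` call mirrors the form used in ProcessITK2 in the list of procedures.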
>>
>>Surprisingly, the procedure GoToBegin() takes 0.05 sec and is only
>>called twice,
>>and most of its time is spent calling
>>itk::ImageRegion<3u>::GetNumberOfPixels() const,
>>which just multiplies the different dimensions and puts the result in an
>>unsigned long (is it a bug of the processor or of the profiler??...).
>>
>>
>>Anyway, I think this experiment can be instructive, and it shows that
>>C++ can be as fast as C,
>>but with a lot of care.
>>Also some of the generality of ITK is lost (like casting from one type to
>>another), but for specific filters it is probably worth it.
>>
>>Any comment is welcome,
>>
>>
>>Karl
>>
>>
>>
>>
>>
>>
>>
>
>
>
>
--
Karl Krissian, PhD
Instructor in Radiology, Harvard Medical School
Laboratory of Mathematics in Imaging, Brigham and Women's Hospital
Tel:617-525-6232, Fax:617-525-6220