[Insight-developers] Profiling of Examples/ImageLinearIteratorWithIndex.cxx

Karl Krissian karl at bwh.harvard.edu
Fri Jun 10 12:06:53 EDT 2005


Hi Jim,

I guess that if it were a scalar case,
there would be less difference between Get() and Value()
(but I guess Get() would still be the most time-consuming method).
There would be no need to overload operator=,
but the other timings should remain approximately the same.

The next step could be to do the same with neighborhood
iterators in the scalar case, comparing them with equivalent C code.

Karl

Miller, James V (Research) wrote:

>Karl,
>
>Thanks for formalizing these suggestions.  Get() vs Value() will have to
>be discussed in more detail but I don't see a reason why we couldn't 
>do the rest of the optimizations.
>
>Do you have a feeling for how the timings change if we are using 
>scalar images instead of images of RGBPixels?
>
>Jim
>
>
>
>-----Original Message-----
>From: Karl Krissian [mailto:karl at bwh.harvard.edu]
>Sent: Wednesday, June 08, 2005 9:08 PM
>To: Miller, James V (Research)
>Cc: ITK; itk users
>Subject: Re: [Insight-developers] Profiling of
>Examples/ImageLinearIteratorWithIndex.cxx
>
>
>
>Hi,
>
>Here are my suggestions for improving ITK speed, at least on this example
>and in code that uses the same classes (the same technique can also be
>used in other parts of the ITK code):
>
>1. Replacing the operator= definitions in itkRGBPixel.txx with the ones
>given in itkRGBPixel2.txx
>2. Declaring the ++ and -- operators inline in the following classes:
>
>itkImageLinearConstIteratorWithIndex.h
>itkImageSliceConstIteratorWithIndex.h
>
>3. Adding new classes for linear iterators that do not update the index along the line:
>
>itkImageLinearConstIterator.{h,txx}
>itkImageLinearIterator.{h,txx}
>
>4. For even more optimization, adding the corresponding
>OrientedLinearIterators, which inherit from the previous ones and take
>the orientation as a template parameter. This allows a template
>specialization for the X axis (Orientation=0) that replaces +=m_Jump with ++:
>
>itkImageOrientedLinearConstIterator.{h,txx}
>itkImageOrientedLinearIterator.{h,txx}
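The orientation-as-template idea in item 4 can be sketched as follows. This is a minimal illustration over a plain contiguous 2D buffer, not the attached ITK classes: OrientedLinearIterator and LineStep are hypothetical names. Because C++ does not allow partially specializing a member function of a class template, the sketch moves the stepping into a small helper struct and fully specializes that helper for Orientation=0, rather than using the inheritance scheme described above.

```cpp
#include <cassert>
#include <cstddef>

// Generic step along an axis: move by the stride (m_Jump).
template <unsigned int Orientation>
struct LineStep
{
  template <typename TPixel>
  static void Forward(TPixel*& p, std::ptrdiff_t jump) { p += jump; }
  template <typename TPixel>
  static void Backward(TPixel*& p, std::ptrdiff_t jump) { p -= jump; }
};

// Full specialization for the X axis: pixels along a row are contiguous,
// so a plain ++/-- on the pointer is enough and the stride is ignored.
template <>
struct LineStep<0>
{
  template <typename TPixel>
  static void Forward(TPixel*& p, std::ptrdiff_t) { ++p; }
  template <typename TPixel>
  static void Backward(TPixel*& p, std::ptrdiff_t) { --p; }
};

// Hypothetical iterator along one line of a 2D image stored row-major.
template <typename TPixel, unsigned int Orientation>
class OrientedLinearIterator
{
  static_assert(Orientation < 2, "this sketch handles 2D images only");

public:
  OrientedLinearIterator(TPixel* start, std::ptrdiff_t width)
    : m_Position(start), m_Jump(Orientation == 0 ? 1 : width)
  {
    assert(width > 0);
  }

  TPixel& Value() { return *m_Position; }

  OrientedLinearIterator& operator++()
  {
    LineStep<Orientation>::Forward(m_Position, m_Jump);
    return *this;
  }

  OrientedLinearIterator& operator--()
  {
    LineStep<Orientation>::Backward(m_Position, m_Jump);
    return *this;
  }

private:
  TPixel*        m_Position;
  std::ptrdiff_t m_Jump;
};
```

Since the orientation is known at compile time, the X-axis case compiles down to a pointer increment with no stride load, which is the saving measured as ProcessITK4 to ProcessITK5.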
>
>The nice part of these changes is that they make the ITK code almost as
>fast as its C equivalent for the ImageLinearIteratorWithIndex example.
>
>I profiled the example
>ImageLinearIteratorWithIndex_test.cxx
>
>with the following results:
>
> ProcessITK        11.12 sec  initial example
> ProcessITK1        6.70 sec  replacing Get() by Value()
> ProcessITK2        2.44 sec  using the redefined = operator for RGBPixel
> ProcessITK3        0.99 sec  adding inline to the ++ and -- operators
> ProcessITK4        0.71 sec  using ImageLinear(Const)Iterator
> ProcessITK5        0.57 sec  using ImageOrientedLinear(Const)Iterator
> ProcessPointers    0.49 sec  C style
>
>Best,
>
>Karl
>
>
>Miller, James V (Research) wrote:
>
>>Karl,
>>
>>This is a nice experiment. We could certainly add an ImageLinearIterator
>>as a counterpart to the ImageLinearIteratorWithIndex, much like we have an
>>ImageRegionIterator and an ImageRegionIteratorWithIndex. The intent of the
>>WithIndex variants was to provide direct access to the index for those
>>algorithms that need it.
>>
>>The Get()/Set() vs. Value() argument is one of supporting ImageAdaptors.
>>Algorithms that use Get()/Set() can support ImageAdaptors. Algorithms that
>>use Value() cannot use ImageAdaptors. This is a decision that must be
>>made carefully. For instance, the NeighborhoodIterators do not support
>>Value() for similar efficiency reasons. Perhaps there could be another
>>mechanism to identify when ImageAdaptors are not being used, so that
>>Set/Get can use a faster path to the data. However, part of the speed of
>>Value() is that it allows you to write algorithms in a manner that avoids
>>the creation of temporaries, something the Set/Get approach sometimes
>>cannot do.
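The trade-off Jim describes can be illustrated with a simplified, hypothetical analogue of ITK's pixel-accessor mechanism (RGB, IdentityAccessor, RedChannelAccessor and SimpleIterator are made-up names, not ITK classes). Get()/Set() route every access through an accessor, so an adaptor can change what the caller sees, but Get() returns a new value. Value() returns a reference to the stored pixel, which avoids the copy but bypasses the accessor entirely.

```cpp
#include <cassert>

// Hypothetical stored pixel type, standing in for an RGB pixel.
struct RGB { unsigned char r, g, b; };

// Identity accessor: callers see the stored pixel unchanged.
struct IdentityAccessor
{
  typedef RGB ExternalType;
  static ExternalType Get(const RGB& internal) { return internal; }
  static void Set(RGB& internal, const ExternalType& external) { internal = external; }
};

// Adaptor-style accessor: callers see only the red channel.
struct RedChannelAccessor
{
  typedef unsigned char ExternalType;
  static ExternalType Get(const RGB& internal) { return internal.r; }
  static void Set(RGB& internal, const ExternalType& external) { internal.r = external; }
};

template <typename TAccessor>
class SimpleIterator
{
public:
  typedef typename TAccessor::ExternalType ExternalType;

  explicit SimpleIterator(RGB* position) : m_Position(position)
  {
    assert(position != nullptr);
  }

  // Adaptor-friendly path: the accessor decides what is read or written,
  // and Get() returns by value (a copy for class-type pixels).
  ExternalType Get() const { return TAccessor::Get(*m_Position); }
  void Set(const ExternalType& value) { TAccessor::Set(*m_Position, value); }

  // Fast path: a reference to the stored pixel, no copy, but the accessor
  // is ignored, so an adaptor cannot intercept it.
  RGB& Value() { return *m_Position; }

  SimpleIterator& operator++() { ++m_Position; return *this; }

private:
  RGB* m_Position;
};
```

With RedChannelAccessor, Get() yields an unsigned char while Value() still exposes the full RGB, which shows why an algorithm written against Value() cannot transparently work on an adapted image.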
>>
>>We ran a number of similar experiments when we first developed the
>>ImageRegionIterator and the ImageRegionIteratorWithIndex iterators.
>>I don't think we ever tried to time the LinearIterators. However,
>>one thing we found when timing the iterators was that the results would
>>change drastically depending on which iterator was run first. For
>>instance, in a simple test that traverses a volume and sets/gets every
>>pixel, IteratorA may take timeA to traverse the volume and IteratorB may
>>take timeB, with timeA >> timeB. When the order of the experiment was
>>reversed, IteratorB took timeB2 and IteratorA took timeA2, with
>>timeB2 >> timeA2. This inconsistency made it difficult to come to any
>>real conclusions about the relative timings. Relating this to your
>>experiment, the differences may not really be as extreme as the numbers
>>below suggest. But I do believe much of the timing difference is real.
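One way to check for the order effect Jim describes is to time each variant in both orders after an untimed warm-up pass and keep the best of several runs. The sketch below is a generic harness, not something used in this thread; the names BestOfSeconds, OrderedTimings and TimeInBothOrders are hypothetical. The warm-up assumes that first-touch costs (page faults, cold caches) are a likely cause of the effect, which the thread does not establish.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <limits>

// Best (minimum) wall-clock time, in seconds, over `repeats` runs of `work`.
double BestOfSeconds(const std::function<void()>& work, int repeats)
{
  assert(repeats > 0);
  double best = std::numeric_limits<double>::infinity();
  for (int i = 0; i < repeats; ++i)
  {
    const auto start = std::chrono::steady_clock::now();
    work();
    const std::chrono::duration<double> elapsed =
      std::chrono::steady_clock::now() - start;
    if (elapsed.count() < best) { best = elapsed.count(); }
  }
  return best;
}

struct OrderedTimings
{
  double aThenB_a, aThenB_b; // A measured first, then B
  double bThenA_b, bThenA_a; // B measured first, then A
};

// Time variants A and B in both orders. If aThenB_a and bThenA_a differ a
// lot, the measurement depends on ordering rather than on the code itself.
OrderedTimings TimeInBothOrders(const std::function<void()>& a,
                                const std::function<void()>& b, int repeats)
{
  a();
  b(); // untimed warm-up pass: touch all the data once
  OrderedTimings t;
  t.aThenB_a = BestOfSeconds(a, repeats);
  t.aThenB_b = BestOfSeconds(b, repeats);
  t.bThenA_b = BestOfSeconds(b, repeats);
  t.bThenA_a = BestOfSeconds(a, repeats);
  return t;
}
```

In use, A and B would be two traversals of the same image (for example ProcessITK4 and ProcessPointer); the workloads must write results somewhere observable so the optimizer cannot discard them.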
>>
>>Finally, while using gcc is a real-world scenario, historically it has
>>not had the best optimizer. I have had cases where gcc-compiled code
>>had severe bottlenecks compared to code compiled with DevStudio .NET.
>>
>>Since iterators are such an integral part of ITK, we should continue
>>to experiment with methods to improve performance. Hopefully, there 
>>are opportunities to improve performance while still supporting 
>>ImageAdaptors (for backward compatibility).  But we should also
>>look for mechanisms and opportunities to improve algorithm performance
>>where we can reasonably determine that ImageAdaptors are not being used.
>>
>>Jim
>>
>>
>>-----Original Message-----
>>From: insight-developers-bounces+millerjv=crd.ge.com at itk.org
>>[mailto:insight-developers-bounces+millerjv=crd.ge.com at itk.org]On Behalf
>>Of Karl Krissian
>>Sent: Thursday, June 02, 2005 6:41 PM
>>To: ITK; itk users
>>Subject: [Insight-developers] Profiling of
>>Examples/ImageLinearIteratorWithIndex.cxx
>>
>>
>>
>>Hi,
>>
>>I decided to compare the processing time of a simple ITK iterator
>>example with its equivalent in C.
>>
>>I think the result may be of interest to the ITK community.
>>I used ITK on Linux (mobile Pentium Centrino, 1.7 GHz), compiled with
>>profiling and optimization (-pg -O3); the profiler was gprof (GNU).
>>
>>I added the following classes for the experiment:
>>
>>Code/Common/itkImageLinearIteratorWithIndex2.h
>>Code/Common/itkImageLinearIteratorWithIndex2.txx
>>
>>Code/Common/itkImageLinearConstIteratorWithIndex2.h
>>Code/Common/itkImageLinearConstIteratorWithIndex2.txx
>>
>>and changed the example:
>>Examples/Iterators/ImageLinearIteratorWithIndex.cxx
>>
>>The code is attached to this email.
>>
>>The new ImageLinearIteratorWithIndex2 could also be called
>>ImageLinearIteratorWithoutIndex, because it does not update the index
>>during the ++ and -- operations, which speeds up the traversal.
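The difference between an iterator that maintains its index and one that does not can be sketched as below. Both classes are hypothetical, minimal stand-ins operating on a row-major 2D buffer; they are not the attached ImageLinearIteratorWithIndex2 code. The "with index" version pays for an extra increment on every ++, while the "without index" version touches only the pointer and recomputes the index from the offset when asked (ComputeIndex is a made-up name).

```cpp
#include <cassert>
#include <cstddef>

struct Index2 { long x; long y; };

// Keeps the index current on every step, so GetIndex() is free.
template <typename TPixel>
class LineIteratorWithIndex
{
public:
  LineIteratorWithIndex(TPixel* origin, long width, Index2 start)
    : m_Position(origin + start.y * width + start.x), m_Index(start)
  {
    assert(width > 0);
  }
  LineIteratorWithIndex& operator++() { ++m_Position; ++m_Index.x; return *this; }
  TPixel& Value() { return *m_Position; }
  Index2 GetIndex() const { return m_Index; }

private:
  TPixel* m_Position;
  Index2  m_Index;
};

// Only moves the pointer; the index is recomputed on demand.
template <typename TPixel>
class LineIterator
{
public:
  LineIterator(TPixel* origin, long width, Index2 start)
    : m_Origin(origin), m_Width(width),
      m_Position(origin + start.y * width + start.x)
  {
    assert(width > 0);
  }
  LineIterator& operator++() { ++m_Position; return *this; }
  TPixel& Value() { return *m_Position; }
  Index2 ComputeIndex() const
  {
    const std::ptrdiff_t offset = m_Position - m_Origin;
    return Index2{ static_cast<long>(offset % m_Width),
                   static_cast<long>(offset / m_Width) };
  }

private:
  TPixel* m_Origin;
  long    m_Width;
  TPixel* m_Position;
};
```

The design choice mirrors the ImageRegionIterator / ImageRegionIteratorWithIndex split: algorithms that need the index every step keep it cheaply, and the rest avoid paying for it.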
>>
>>The ImageLinearIteratorWithIndex example basically flips an RGB image
>>in the X direction. The idea is to compare the time of this operation
>>using ITK with the time of the equivalent operation using standard C
>>programming (directly accessing pointers to the data).
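A pointer-based X flip of an interleaved RGB image, in the spirit of the ProcessPointer variant, might look like this. It is a sketch under assumptions, not the code attached to the email: RGBPixelC and FlipX are hypothetical names, pixels are assumed to be stored row-major with x varying fastest, and input and output must be separate buffers.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical interleaved RGB pixel (three bytes per pixel).
struct RGBPixelC { unsigned char r, g, b; };

// out(x, y) = in(width - 1 - x, y) for every row, using only pointer steps.
void FlipX(const RGBPixelC* in, RGBPixelC* out, std::size_t width, std::size_t height)
{
  assert(in != out); // in-place flipping is not supported by this sketch
  for (std::size_t y = 0; y < height; ++y)
  {
    // src starts one past the end of input row y and walks backwards;
    // dst walks forwards over output row y.
    const RGBPixelC* src = in + (y + 1) * width;
    RGBPixelC* dst = out + y * width;
    RGBPixelC* const rowEnd = dst + width;
    while (dst != rowEnd)
    {
      *dst++ = *--src; // copies the three components at once
    }
  }
}
```

Decrementing src before dereferencing keeps every pointer inside the row or one past its end, so the loop never forms a pointer before the start of the buffer.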
>>
>>I created several procedures with slight changes to compare their
>>speed:
>>
>>1. ProcessITK is the original code.
>>2. ProcessITK1 replaces inputIt.Get() with inputIt.Value().
>>3. ProcessITK2 replaces outputIt.Set( inputIt.Value() ) with
>>   outputIt.Value().Set(inputIt.Value().GetRed(),inputIt.Value().GetGreen(),inputIt.Value().GetBlue())
>>4. ProcessITK3 is like ProcessITK2 but uses the new iterator.
>>5. ProcessITK4 is like ProcessITK3 but replaces the ++ and -- operations
>>   with IncPos() and DecPos(), which are plain ++ and -- on the pointers.
>>6. ProcessPointer does the same operation (without ITK's generality) in
>>   C style.
>>
>>The results are the following:
>>
>>1.   17.51 sec
>>2.     9.94 sec
>>3.     3.54 sec
>>4.     1.64 sec
>>5.     0.81 sec
>>6.     0.62 sec
>>
>>The details are in the file 'profile', but in summary:
>>
>>1 --> 2: we avoid creating and deleting an RGB value, which saves
>>approx. 6 sec (FixedArray constructor and destructor).
>>2 --> 3: we avoid the operator= of FixedArray (which loops over the
>>number of elements) and save 6.74 sec.
>>3 --> 4: not updating the index in the iterator decreases the time of
>>the ++ and -- operators; GoToEndOfLine() and NextLine() are also faster.
>>4 --> 5: using ++ and -- instead of += m_Jump and -= m_Jump gains 1.1 sec.
>>5 --> 6: there is still some overhead in the iterator, but the
>>difference is small.
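The 2 --> 3 step contrasts a generic, loop-based assignment with one that names the three RGB components directly. The contents of itkRGBPixel2.txx are not reproduced in the thread, so the following is only a guess in its spirit: LoopArray and UnrolledRGB are hypothetical stand-ins for a FixedArray-style pixel and a rewritten RGB pixel, and the Set(red, green, blue) member echoes the call used in ProcessITK2.

```cpp
#include <cassert>
#include <cstddef>

// FixedArray-style assignment: a generic loop over VLength components.
template <typename T, std::size_t VLength>
struct LoopArray
{
  T m_Data[VLength];

  LoopArray& operator=(const LoopArray& other)
  {
    for (std::size_t i = 0; i < VLength; ++i) { m_Data[i] = other.m_Data[i]; }
    return *this;
  }
};

// RGB-specific assignment: the three components are copied explicitly,
// with no loop counter, which is easy for a compiler to inline.
template <typename T>
struct UnrolledRGB
{
  T m_Data[3];

  UnrolledRGB& operator=(const UnrolledRGB& other)
  {
    m_Data[0] = other.m_Data[0];
    m_Data[1] = other.m_Data[1];
    m_Data[2] = other.m_Data[2];
    return *this;
  }

  void Set(T red, T green, T blue)
  {
    m_Data[0] = red;
    m_Data[1] = green;
    m_Data[2] = blue;
  }
};
```

Both forms are semantically identical; whether the explicit version is still measurably faster depends on the compiler, since an optimizer may unroll a short fixed-count loop on its own at -O3.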
>>
>>Surprisingly, GoToBegin() takes 0.05 sec although it is only called
>>twice, and most of its time is spent calling
>>itk::ImageRegion<3u>::GetNumberOfPixels() const,
>>which just multiplies the different dimensions and puts the result in
>>an unsigned long (is it a bug of the processor or of the profiler??...).
>>
>>
>>Anyway, I think this experiment can be instructive, and it shows that
>>C++ can be as fast as C, but only with a lot of care.
>>Also, some of the generality of ITK is lost (like casting from one type
>>to another), but for specific filters it is probably worth it.
>>
>>Any comment is welcome,
>>
>>
>>Karl


-- 
Karl Krissian, PhD
Instructor in Radiology, Harvard Medical School
Laboratory of Mathematics in Imaging, Brigham and Women's Hospital
Tel:617-525-6232, Fax:617-525-6220 


