[Insight-users] RE: [Insight-developers] Profiling of Examples/ImageLinearIteratorWithIndex.cxx

Miller, James V (Research) millerjv at crd.ge.com
Thu Jun 9 08:23:30 EDT 2005


Karl,

Thanks for formalizing these suggestions.  Get() vs Value() will have to
be discussed in more detail but I don't see a reason why we couldn't 
do the rest of the optimizations.

Do you have a feeling for how the timings change if we are using 
scalar images instead of images of RGBPixels?

Jim



-----Original Message-----
From: Karl Krissian [mailto:karl at bwh.harvard.edu]
Sent: Wednesday, June 08, 2005 9:08 PM
To: Miller, James V (Research)
Cc: ITK; itk users
Subject: Re: [Insight-developers] Profiling of
Examples/ImageLinearIteratorWithIndex.cxx



Hi,

Here are my suggestions for improving ITK speed at least on this example 
and the code that uses the same
classes (the same technique can also be used in other parts of the ITK 
code):

1. Replacing the operator= implementations in itkRGBPixel.txx with the ones
given in itkRGBPixel2.txx
2. Adding inline to the ++ and -- operators in the following classes:

itkImageLinearConstIteratorWithIndex.h
itkImageSliceConstIteratorWithIndex.h

3. Adding the new classes for linear iterators without index along the line:

itkImageLinearConstIterator.{h,txx}
itkImageLinearIterator.{h,txx}

4. For even more optimization, adding the corresponding
OrientedLinearIterators, which inherit from the previous ones and
take the Orientation as a template parameter. This allows a template
specialization for the case of the X axis (Orientation=0), which
replaces += m_Jump by ++:

itkImageOrientedLinearConstIterator.{h,txx}
itkImageOrientedLinearIterator.{h,txx}

The nice part of these changes is that they allow the ITK code to be
almost as fast as
its C equivalent for the ImageLinearIteratorWithIndex example.
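
The specialization described in point 4 can be sketched roughly as follows.
This is only an illustration of the idea, not the code in the proposed
itkImageOrientedLinearIterator files: LineWalker, SumAlongX and SumAlongY are
hypothetical names, and the image is assumed to be a plain row-major buffer
of int.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for an oriented line iterator: it walks one line of
// a buffer along the axis selected by the template parameter Orientation.
template <unsigned int Orientation>
struct LineWalker
{
  const int *    pos;
  std::ptrdiff_t jump; // distance in pixels between neighbours along the axis

  LineWalker(const int * start, std::ptrdiff_t j) : pos(start), jump(j) {}

  // General case: advance by the stride of the chosen axis.
  LineWalker & operator++() { pos += jump; return *this; }
};

// Specialization for the X axis (Orientation = 0): neighbours are contiguous
// in memory, so the increment is a plain pointer ++ and no jump is stored.
template <>
struct LineWalker<0>
{
  const int * pos;

  explicit LineWalker(const int * start) : pos(start) {}

  LineWalker & operator++() { ++pos; return *this; }
};

// Sum `count` pixels along X starting at `start`.
inline long SumAlongX(const int * start, std::size_t count)
{
  if (count == 0) return 0;
  LineWalker<0> it(start);
  long sum = *it.pos;
  for (std::size_t i = 1; i < count; ++i)
  {
    ++it;
    sum += *it.pos;
  }
  return sum;
}

// Sum `count` pixels along Y in a row-major image that is `width` pixels wide.
inline long SumAlongY(const int * start, std::size_t count, std::ptrdiff_t width)
{
  if (count == 0) return 0;
  LineWalker<1> it(start, width);
  long sum = *it.pos;
  for (std::size_t i = 1; i < count; ++i)
  {
    ++it;
    sum += *it.pos;
  }
  return sum;
}
```

Because the orientation is known at compile time, the X-axis case needs no
stored jump and no addition of a run-time stride in the inner loop.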

I profiled the example
ImageLinearIteratorWithIndex_test.cxx

with the following results:

 ProcessITK        11.12 sec  initial example
 ProcessITK1        6.70 sec  replacing Get() by Value()
 ProcessITK2        2.44 sec  using the redefined = operator for RGBPixel
 ProcessITK3        0.99 sec  adding inline to the ++ and -- operators
 ProcessITK4        0.71 sec  using ImageLinear(Const)Iterator
 ProcessITK5        0.57 sec  using ImageOrientedLinear(Const)Iterator
 ProcessPointers    0.49 sec  C style
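
Read as ratios, the table says that ProcessITK5 is about 19.5 times faster
than the initial example (11.12 / 0.57) and still takes about 16% longer than
the C-style version (0.57 / 0.49). A small helper for this arithmetic, with
Speedup being a hypothetical name used only here:

```cpp
#include <cassert>
#include <cmath>

// Ratio of two timings: how many times faster a run taking `seconds` is
// compared with a run taking `baselineSeconds`.
inline double Speedup(double baselineSeconds, double seconds)
{
  return baselineSeconds / seconds;
}
```

For example, Speedup(11.12, 0.57) compares the initial example with
ProcessITK5, and Speedup(0.57, 0.49) compares ProcessITK5 with
ProcessPointers.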

Best,

Karl


Miller, James V (Research) wrote:

>Karl,
>
>This is a nice experiment. We could certainly add an ImageLinearIterator
>in contrast to the ImageLinearIteratorWithIndex, much like we have an
>ImageRegionIterator and an ImageRegionIteratorWithIndex.  The intent of the
>WithIndex variants was to provide direct access to the index for those 
>algorithms that needed it.
>
>The Get()/Set() vs Value() argument is one of supporting ImageAdaptors.
>Algorithms that use Get()/Set() can support ImageAdaptors.  Algorithms that
>use Value() cannot use ImageAdaptors.  This is a decision that must be 
>made carefully.  For instance, the NeighborhoodIterators do not support
>Value() for similar efficiency reasons.  Perhaps there could be another
>mechanism to identify when ImageAdaptors are not being used, so that Set/Get
>can use a faster path to the data.  However, part of the speed of Value()
>is that it allows you to write algorithms in a manner that avoids the creation
>of temporaries, something that the Set/Get approach sometimes cannot do.
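
The trade-off between Get()/Set() and Value() can be sketched as follows.
This is a simplified model, not ITK's actual iterator or adaptor code:
SketchIterator, IdentityAccessor and NegatingAccessor are hypothetical names,
and the pixel type is assumed to be int. With a larger pixel type such as an
RGB pixel, the by-value return of Get() is where a temporary is created.

```cpp
#include <cassert>

// Hypothetical pixel accessors. An adaptor-style accessor computes the value
// it returns, so there is no stored object that a reference could refer to.
struct IdentityAccessor
{
  static int  Get(const int & stored) { return stored; }
  static void Set(int & stored, int v) { stored = v; }
};

struct NegatingAccessor
{
  static int  Get(const int & stored) { return -stored; }
  static void Set(int & stored, int v) { stored = -v; }
};

template <typename TAccessor>
class SketchIterator
{
public:
  explicit SketchIterator(int * p) : m_Pos(p) {}

  // Get()/Set() go through the accessor, so they work with any adaptor,
  // but Get() has to return by value.
  int  Get() const { return TAccessor::Get(*m_Pos); }
  void Set(int v) { TAccessor::Set(*m_Pos, v); }

  // Value() returns a reference to the stored pixel and bypasses the
  // accessor, so it is only meaningful when no adaptor is involved.
  int & Value() { return *m_Pos; }

  SketchIterator & operator++() { ++m_Pos; return *this; }

private:
  int * m_Pos;
};
```

With NegatingAccessor, Get() and Value() disagree about the pixel value,
which is the reason an algorithm written against Value() cannot be used
with adaptors.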
>
>We ran a number of similar experiments when we first developed the 
>ImageRegionIterator and the ImageRegionIteratorWithIndex iterators. 
>I don't think we ever tried to time the LinearIterators.  However,
>one thing we found when timing the iterators was that results would change
>drastically depending on which iterator you used first.  For instance,
>take a simple test that traverses a volume and sets/gets every pixel.
>IteratorA may take timeA to traverse the volume and IteratorB may
>take timeB to traverse the volume with timeA >> timeB.  When the
>order of the experiment was reversed, IteratorB took timeB2 and IteratorA
>took timeA2 with timeB2 >> timeA2. This inconsistency made it difficult
>to come to any real conclusions about the relative timings. Relating this
>to your experiment, the magnitude of the differences may not really be
>as extreme as the numbers below.  But I do believe much of the timing
>difference is real.
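
One way to check for this kind of order dependence is to time both
candidates in both orders. The sketch below does that with std::chrono. The
names OrderedTimings, SecondsFor and TimeInBothOrders are hypothetical, and
the untimed warm-up pass is an assumption about a possible cause (first-touch
costs such as cold caches); the message above does not say what caused the
effect.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Timings in seconds for two candidates, measured in both orders.
struct OrderedTimings
{
  double aFirst_a; // A, when A runs before B
  double aFirst_b; // B, when A runs before B
  double bFirst_b; // B, when B runs before A
  double bFirst_a; // A, when B runs before A
};

// Wall-clock time of one call to `work`, in seconds.
inline double SecondsFor(const std::function<void()> & work)
{
  const auto start = std::chrono::steady_clock::now();
  work();
  const std::chrono::duration<double> elapsed =
    std::chrono::steady_clock::now() - start;
  return elapsed.count();
}

inline OrderedTimings TimeInBothOrders(const std::function<void()> & a,
                                       const std::function<void()> & b)
{
  // Untimed warm-up pass (an assumption, see above) so that neither
  // candidate is the only one to pay first-touch costs.
  a();
  b();

  OrderedTimings t{};
  t.aFirst_a = SecondsFor(a);
  t.aFirst_b = SecondsFor(b);
  t.bFirst_b = SecondsFor(b);
  t.bFirst_a = SecondsFor(a);
  return t;
}
```

If aFirst_a and bFirst_a differ widely for the same traversal, the
measurement is sensitive to ordering and the relative timings should be read
with caution.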
>
>Finally, while using gcc is a real world scenario, historically it has
>not had the best optimizer.  I have had cases where gcc compiled code
>had severe bottlenecks compared to DevStudio .Net compiled code.
>
>Since iterators are such an integral part of ITK, we should continue
>to experiment with methods to improve performance. Hopefully, there 
>are opportunities to improve performance while still supporting 
>ImageAdaptors (for backward compatibility).  But we should also
>look for mechanisms and opportunities to improve algorithm performance
>where we can reasonably identify ImageAdaptors are not being used.
>
>Jim
>
>
>
>
>
>-----Original Message-----
>From: insight-developers-bounces+millerjv=crd.ge.com at itk.org
>[mailto:insight-developers-bounces+millerjv=crd.ge.com at itk.org]On Behalf
>Of Karl Krissian
>Sent: Thursday, June 02, 2005 6:41 PM
>To: ITK; itk users
>Subject: [Insight-developers] Profiling of
>Examples/ImageLinearIteratorWithIndex.cxx
>
>
>
>Hi,
>
>I decided to compare the processing time of a simple ITK iterator
>example with that of
>its equivalent in C.
>
>I think the result may be of interest to the ITK community.
>I used an ITK version on Linux (mobile Pentium Centrino 1.7GHz)
>compiled with profiling and optimization (-pg -O3); the profiler is
>gprof (GNU).
>
>I added the following classes for the experiment:
>
>Code/Common/itkImageLinearIteratorWithIndex2.h
>Code/Common/itkImageLinearIteratorWithIndex2.txx
>
>Code/Common/itkImageLinearConstIteratorWithIndex2.h
>Code/Common/itkImageLinearConstIteratorWithIndex2.txx
>
>and changed the example:
>Examples/Iterators/ImageLinearIteratorWithIndex.cxx
>
>The code is attached to this email.
>
>The new ImageLinearIteratorWithIndex2 could also be called
>ImageLinearIteratorWithoutIndex
>because it does not update the index during the ++ and -- operations,
>which speeds up
>the iteration.
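
The difference between the two kinds of iterator can be sketched as follows.
WithIndex and WithoutIndex are hypothetical, simplified 2-D structs and not
the attached ITK classes; they only show the extra bookkeeping that the
increment of a WithIndex iterator performs.

```cpp
#include <cassert>
#include <cstddef>

// Keeps an N-D index in sync with the pointer on every step.
struct WithIndex
{
  const int *    pos;
  long           index[2];
  std::ptrdiff_t jump;      // pixels between neighbours along the line
  unsigned int   direction; // axis the line runs along

  WithIndex & operator++()
  {
    ++index[direction]; // extra work on every step
    pos += jump;
    return *this;
  }
};

// Moves only the pointer; the index is not tracked.
struct WithoutIndex
{
  const int *    pos;
  std::ptrdiff_t jump;

  WithoutIndex & operator++()
  {
    pos += jump;
    return *this;
  }
};
```

Both iterators reach the same pixel after the same number of steps; only the
first one can report where it is without further computation.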
>
>The ImageLinearIteratorWithIndex example basically flips an RGB
>image in the X direction.
>The idea is to compare the time of this operation using ITK with the
>time of the equivalent
>operation using standard C programming (directly accessing pointers to
>the data).
>
>I created several procedures with slight changes to compare their
>speed:
>
>1. ProcessITK is the original code
>2. ProcessITK1 replaces inputIt.Get() by inputIt.Value()
>3. ProcessITK2 replaces outputIt.Set( inputIt.Value() )  by
>outputIt.Value().Set(inputIt.Value().GetRed(),inputIt.Value().GetGreen(),inputIt.Value().GetBlue())
>4. ProcessITK3 is like ProcessITK2 but using the new Iterator
>5. ProcessITK4 is like ProcessITK3 but replaces the ++ and -- operations
>by IncPos() and DecPos(), which are actual ++ and -- on the pointers
>6. ProcessPointer does the same operation (without ITK generality) in a
>C style.
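
For reference, a pointer-based flip in the X direction might look like the
sketch below. This is not the attached ProcessPointer code: RGB and FlipX are
hypothetical names, the image is assumed to be 2-D and row-major, and the
input and output buffers are assumed not to overlap.

```cpp
#include <cassert>
#include <cstddef>

struct RGB
{
  unsigned char r, g, b;
};

// Mirror every row of a width x height image in the X direction.
inline void FlipX(const RGB * in, RGB * out, std::size_t width, std::size_t height)
{
  for (std::size_t y = 0; y < height; ++y)
  {
    const RGB * src = in + y * width;          // first pixel of the input row
    RGB *       dst = out + y * width + width; // one past the end of the output row
    for (std::size_t x = 0; x < width; ++x)
    {
      *--dst = *src++; // read forwards, write backwards
    }
  }
}
```

The inner loop is a single pixel copy with one pointer increment and one
pointer decrement, with no index bookkeeping at all.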
>
>The results are the following:
>
>1.   17.51 sec
>2.     9.94 sec
>3.     3.54 sec
>4.     1.64 sec
>5.     0.81 sec
>6.     0.62 sec
>
>The details are in the file 'profile' but in summary:
>
>1 --> 2 : we avoid creating and deleting an RGB value, which saves
>approx. 6 sec (FixedArray constructor and destructor)
>2 --> 3 : we avoid the operator= of FixedArray (loops over the number of
>elements) and we save 6.74 sec
>3 --> 4: not updating the index in the iterator decreases the time of ++
>and -- operators, GoToEndOfLine() and NextLine() are also faster
>4 --> 5: using ++ and -- instead of += m_Jump and -= m_Jump gains 1.1 sec
>5 --> 6: there is still some overhead in the iterator, but a small
>difference.
>
>Surprisingly, the procedure GoToBegin() takes 0.05 sec although it is only
>called twice,
>and most of its time is spent calling
>itk::ImageRegion<3u>::GetNumberOfPixels() const,
>which just multiplies the different dimensions and puts the result in an
>unsigned long (is it a bug of the processor or of the profiler?).
>
>
>Anyway, I think this experiment can be instructive, and it shows that
>C++ can be as fast as C,
>but only with a lot of care.
>Also, some of the generality of ITK is lost (like casting from one type to
>another), but for specific filters it is probably worth it.
>
>Any comment is welcome,
>
>
>Karl
>
>
>
>  
>


