[Insight-users] Release 2.0.0 slower than 1.6.0 and 1.8.1-Cygwin1.5.13, GCC 3.3.3-3

Miller, James V (Research) millerjv at crd.ge.com
Mon Mar 7 17:13:44 EST 2005


I checked in a change to itkImageBase.h and itkImage.h to inline the
the computations in the Transform* methods in a manner that should 
still work for GCC 3.4

They are available in the CVS version of ITK.  I'll ask Luis to patch
the 2.0 release.

Thanks to everyone who helped track down this performance issue.  With 
the fix, the timings for the test routine went from 2.0 seconds to 0.07 
seconds (ImageMomentsCalculator).

Jim



-----Original Message-----
From: insight-users-bounces at itk.org
[mailto:insight-users-bounces at itk.org]On Behalf Of Miller, James V
(Research)
Sent: Monday, March 07, 2005 3:32 PM
To: Frederic Perez; insight-users at itk.org
Cc: Stephen R. Aylward; Karthik.Krishnan at kitware.com
Subject: RE: [Insight-users] Release 2.0.0 slower than 1.6.0 and
1.8.1-Cygwin1.5.13, GCC 3.3.3-3


While making spacing and origin protected instead of private improves
the performance, I don't think it addresses the GCC 3.4 issue of 
subclasses accessing data from a templated superclass.

ITK 1.8.1 (Version:   $Revision: 1.119 $):
 point[i] = static_cast<TCoordRep>( m_Spacing[i] *
   static_cast<double>( index[i] ) + m_Origin[i] );

ITK 2.0.0 (Version:   $Revision: 1.122 $):
 point[i] = static_cast<TCoordRep>( this->GetSpacing()[i] *
   static_cast<double>( index[i] ) + this->GetOrigin()[i] );

Instead of implementation in 1.119, the code should probably be
 point[i] = static_cast<TCoordRep>( this->m_Spacing[i] *
   static_cast<double>( index[i] ) + this->m_Origin[i] );




-----Original Message-----
From: Frederic Perez [mailto:fredericpcx at gmail.com]
Sent: Monday, March 07, 2005 2:39 PM
To: insight-users at itk.org
Cc: Miller, James V (Research); Stephen R. Aylward;
Karthik.Krishnan at kitware.com
Subject: Re: [Insight-users] Release 2.0.0 slower than 1.6.0 and 1.8.1
-Cygwin1.5.13, GCC 3.3.3-3


Hello everyone,

thank you very much for your responses. The posts by James V. Miller
and Karthik
Krishnan convinced me that something strange is happening for the Cygwin 1.5.13,
GCC 3.3.3-3 combination. (By the way, my own output of "gcc -v" matches 
Karthik's.)

I've got the suspicion that the itkGetConstReferenceMacro is not as fast as 
expected---the sequence of actions driving me to that conclusion is given below.
The macro is defined in Common/itkMacro.h as follows:

#define itkGetConstReferenceMacro(name,type) \
  virtual const type & Get##name () const \
  { \
    itkDebugMacro("returning " << #name " of " << this->m_##name ); \
    return this->m_##name; \
  }

I would say that since it defines a virtual method is up to the
compiler to inline
it or not, and in my particular case it is not inlined. Since this
method is called
a lot of times, the executions slow down significatively.

Perhaps you know a method to force the compiler to inline the macro---if 
I'm right about that.

Thanks again for your consideration,

Frederic Perez

--

Reaching the macro (and proposed workaround):

First I built profiled versions of ITK 1.8.1 and 2.0.0, to later link
the testing
program against them both. Unfortunately I got linking errors. Since I didn't 
feel like fixing that, I tried profiling the 
itk::ImageMomentsCalculator<ImageType>::Update method directly. Thus,
I edited the Code/Algorithms/itkImageMomentsCalculator.txx adding some
itk::TimeProbe objects to narrow down the problem. (I haven't seen 
anything in that code related to random number seeding. Maybe I 
misunderstood Stephen R. Aylward post.) The test program was linked to 
**debug** relases for versions 1.8.1 and 2.0.0 of the Insight toolkit. Their 
corresponding executions are summarized as follows:

ITK 1.8.1:
Time region iterator loop: 39.969 s
 Time region iterator inner loop 1:    6.903 s
 Time region iterator transform calls: 5.942 s
 Time region iterator inner loop 2:    7.355 s
GetTotalMass = 4.33882e+07 [ Time: 39.969 seconds ]

ITK 2.0.0:
Time region iterator loop: 75.25 s
 Time region iterator inner loop 1:    7.381 s
 Time region iterator transform calls: 40.031 s
 Time region iterator inner loop 2:    7.716 s
GetTotalMass = 4.33882e+07 [ Time: 75.25 seconds ]

This shows that in debug mode there is still a decrease in efficiency
(recall: Cygwin 1.5.13, GCC 3.3.3-3), and that it is due to the line 
exhibited below:

template<class TImage>
void ImageMomentsCalculator<TImage>::Compute()
{
 <<snipped>>
 m_Image->TransformIndexToPhysicalPoint(indexPosition, physicalPosition);  
 <<snipped>>
}

The core of the Image's TransformIndexToPhysicalPoint inspector is slightly
different in the two versions:

ITK 1.8.1 (Version:   $Revision: 1.119 $):
 point[i] = static_cast<TCoordRep>( m_Spacing[i] *
   static_cast<double>( index[i] ) + m_Origin[i] );

ITK 2.0.0 (Version:   $Revision: 1.122 $):
 point[i] = static_cast<TCoordRep>( this->GetSpacing()[i] *
   static_cast<double>( index[i] ) + this->GetOrigin()[i] );

This particular change occurred between revisions 1.121 and 1.122, due 
to GCC 3.4 considerations. Thus, we have edited the file itkImageBase.h to 
change the spacing and origin attributes to be protected instead of private, 
and reverted the point[i] computation to the 1.119 version. With those 
changes, which I admit are ugly, after rebuilding we obtain:

ITK 2.0.0 MODIFIED:
Time region iterator loop: 41.25 s
 Time region iterator inner loop 1:    7.275 s
 Time region iterator transform calls: 6.28 s
 Time region iterator inner loop 2:    7.581 s
GetTotalMass = 4.33882e+07 [ Time: 41.25 seconds ]

These values match the results for the 1.8.1 build.
_______________________________________________
Insight-users mailing list
Insight-users at itk.org
http://www.itk.org/mailman/listinfo/insight-users


More information about the Insight-users mailing list