[Insight-developers] vnl updates : Rounding Test : Performance and Correctness

Wed Dec 31 09:44:00 EST 2008

Hi Tom,

Thanks for all this additional information.

This is indeed very useful.

Please see comments below.

     Thanks

         Luis

-----------------------
Tom Vercauteren wrote:
> Hey,
> 
> First of all, I apologize for not giving enough background on this
> issue. The fact is that I had already spent a lot of time with this on
> the vxl side. See e.g.
> http://sourceforge.net/mailarchive/forum.php?thread_name=28392e8b0811190226i20ab46a5m2a72a5d9518ce557%40mail.gmail.com&forum_name=vxl-maintainers
> 

    Thanks for pointing out to this discussion in the VNL list.

    Have the issue of SSE2 been resolved ?

> 
> 1) Let me first discuss the performance side.
> 
> According to my timings shown here
>   http://sourceforge.net/mailarchive/forum.php?thread_name=28392e8b0811270614o7f042cf7o5828d2b31c52c88b%40mail.gmail.com&forum_name=vxl-maintainers
> in release mode, the performance improvement is rather a factor 3 or 4
> between an optimized version and the basic if version:
>   time_ref / time_sse : 4.37907
>   time_ref / time_asm : 3.86511
> 
> This is important for image registration as this may lead to a 10%
> improvement on the entire registration.
>

     An improvement of 10% in registration time sound modest
     at the price of having to deal with non-portable code.

     There are many other places in an image registration
     problem where we could squeeze 10% out. From number of
     samples, to number of iterations, to step lengths (or
     its equivalent) in the optimizer...

     In the context of medical imaging, the capability of
     getting the same answer on every machine is a lot more
     important than saving 10% on the execution of a registration.

> Note that timing this is not trivial since the compiler may optimize
> many things out if they are not used in the test program. I provided a
> basic timing test here:
>   http://sourceforge.net/mailarchive/attachment.php?list_name=vxl-maintainers&message_id=28392e8b0811270614o7f042cf7o5828d2b31c52c88b%40mail.gmail.com&counter=1
> 
>

      We have numbers today from the ITK dashboard

http://www.cdash.org/CDash/testSummary.php?project=2&name=itkVNLRoundProfileTest1&date=2008-12-31

      Which also illustrate the platform dependency
      that you pointed out.

> 2) About correctness.
> 
> The implementation of vnl_math_round shipped with ITK is already
> plateform dependent. Here is a snippet that contains only the double
> version:
>   #if VXL_C_MATH_HAS_LROUND
>   inline int vnl_math_rnd(double x) { return static_cast<int>(lround(x)); }
>   #elif defined (VCL_VC) && !defined(__GCCXML__) && !defined(_WIN64)
>    __inline int
>    vnl_math_rnd (double flt)
>    { int intgr;
>     _asm
>     { fld flt
>      fistp intgr
>      } ;
>     return intgr ;
>    }
>   #else
>   inline int vnl_math_rnd(double x) { return (x>=0.0)?(int)(x +
> 0.5):(int)(x - 0.5); }
>   #endif
> 
> You can see that in the #elif block, the operation for half integers
> will be round to nearest even. In the other cases, it will be round
> away from zero.

     This is a problem.
     We shouldn't have this platform dependency.

> 
> The problem with this implementation is that on many plateforms
> including mine, lround is actually slower than the basic if
> implementation ( time_ref / time_lround : 0.657615 ). This is why I
> decided to reimplement vnl_round. Since I was focused on performance
> and vnl_math_rnd did not enforce any specific rounding scheme for
> half-integer, I simply chose to implement the fastest version.
> 

      You bring a good point in your discussion on the VNL list.
      We should deal not only with 'round' but also with 'ceil'
      and 'floor'.

> That being said, it is possible to implement a fast rounding scheme
> that rounds half-integer away from zero. The limitations are:
> - Only works within INT_MIN/2 – 1 and INT_MAX/2 + 1
> - Assumes that the FPU rounding mode has not been changed from the
> default one (same as the asm version above)
> 
> It could thus be possible to have two version of vnl_math_round (one
> that rounds away from zero and one that doesn't care about
> hal-integers) or we could also modify vnl_math_round to round
> half-integers away from zero.
> 
> What will not be trivial, is to have a version of vnl_math_round that
> rounds half-integers to the nearest even integer on all platforms.
> 

      I'm not sure that we need this anti-bias property,
      at least not in the context of image registration.

      I can certainly see its importance in finance...   :-)

      but for the purpose of mapping physical coordinates to
      image grid IJK numbers, the fraction of point that will
      fall directly on .5 locations will already be very small.
      I found unlikely that this small set of points may afect
      the outcome of a registration.

> 
> 3) Which half-integer rounding mode should be preferred?
> 
> That's a tough question.
> * As mentioned by Steve, for statistical analysis, round to even might
> be preferred.
> * Round to even might not be very intuitive at first sight
> * If you use nearest-neighbor interpolation, then you might need round
> up or round down (but not round away from zero) to avoid weird things
> for translations of half integers
> * In most cases, you don't really care :-)
> 

       We could still provide several implementations in ITK,
       but each one properly named.

       the rounding strategy shouldn't be hidden in the details
       of the implementation, and the compiler options of the day.
       It should be clearly marked in the name of the rounding
       function.

       We should have something very explicit, such as:

               itk::Math::Round()
               itk::Math::RoundNonBiased()

        In this way, when writing ITK code we can select the
        rounding strategy that is appropriate for the context.

> 
> 4) VNL implementation or ITK implementation?
> 
> * VNL seems to be the good place for this since it is already there
> and there is currently no such basic operations in ITK
> * It is easier for the user if only one version is available
> * An ITK version would however be easier to maintain for ITK developpers
> * The all set of rounding functions takes 200 lines in vnl_math.h and
> needs some cmake magic.
> 
> 

         I agree that VNL is in principle the right place
         to put these methods. However, given the different
         domain emphasis of ITK and VNL, the requirements
         for these methods is somehow different.

         We have been bitten in the past by the rounding
         issues in VNL, and it is not worth the trouble.

         Note that we already have an ITK-variant of the
         vnl_math_round method in the itkIndex.h file, in
         lines 263-290:
http://www.itk.org/cgi-bin/viewcvs.cgi/Code/Common/itkIndex.h?root=Insight&r1=1.56&r2=1.57

         That we had to add in order to get around the
         platform-dependent issues of VNL.

> My 2cts,
> Tom
> 
> 

           Very valuable 2cts, I must say.