[Insight-developers] vnl updates : Rounding Test : Performance and Correctness
Tom Vercauteren
tom.vercauteren at m4x.org
Wed Dec 31 04:38:05 EST 2008
Hey,
First of all, I apologize for not giving enough background on this
issue. The fact is that I had already spent a lot of time with this on
the vxl side. See e.g.
http://sourceforge.net/mailarchive/forum.php?thread_name=28392e8b0811190226i20ab46a5m2a72a5d9518ce557%40mail.gmail.com&forum_name=vxl-maintainers
1) Let me first discuss the performance side.
According to my timings shown here
http://sourceforge.net/mailarchive/forum.php?thread_name=28392e8b0811270614o7f042cf7o5828d2b31c52c88b%40mail.gmail.com&forum_name=vxl-maintainers
in release mode, the performance improvement is rather a factor 3 or 4
between an optimized version and the basic if version:
time_ref / time_sse : 4.37907
time_ref / time_asm : 3.86511
This is important for image registration as this may lead to a 10%
improvement on the entire registration.
Note that timing this is not trivial since the compiler may optimize
many things out if they are not used in the test program. I provided a
basic timing test here:
http://sourceforge.net/mailarchive/attachment.php?list_name=vxl-maintainers&message_id=28392e8b0811270614o7f042cf7o5828d2b31c52c88b%40mail.gmail.com&counter=1
2) About correctness.
The implementation of vnl_math_round shipped with ITK is already
plateform dependent. Here is a snippet that contains only the double
version:
#if VXL_C_MATH_HAS_LROUND
inline int vnl_math_rnd(double x) { return static_cast<int>(lround(x)); }
#elif defined (VCL_VC) && !defined(__GCCXML__) && !defined(_WIN64)
__inline int
vnl_math_rnd (double flt)
{ int intgr;
_asm
{ fld flt
fistp intgr
} ;
return intgr ;
}
#else
inline int vnl_math_rnd(double x) { return (x>=0.0)?(int)(x +
0.5):(int)(x - 0.5); }
#endif
You can see that in the #elif block, the operation for half integers
will be round to nearest even. In the other cases, it will be round
away from zero.
The problem with this implementation is that on many plateforms
including mine, lround is actually slower than the basic if
implementation ( time_ref / time_lround : 0.657615 ). This is why I
decided to reimplement vnl_round. Since I was focused on performance
and vnl_math_rnd did not enforce any specific rounding scheme for
half-integer, I simply chose to implement the fastest version.
That being said, it is possible to implement a fast rounding scheme
that rounds half-integer away from zero. The limitations are:
- Only works within INT_MIN/2 – 1 and INT_MAX/2 + 1
- Assumes that the FPU rounding mode has not been changed from the
default one (same as the asm version above)
It could thus be possible to have two version of vnl_math_round (one
that rounds away from zero and one that doesn't care about
hal-integers) or we could also modify vnl_math_round to round
half-integers away from zero.
What will not be trivial, is to have a version of vnl_math_round that
rounds half-integers to the nearest even integer on all platforms.
3) Which half-integer rounding mode should be preferred?
That's a tough question.
* As mentioned by Steve, for statistical analysis, round to even might
be preferred.
* Round to even might not be very intuitive at first sight
* If you use nearest-neighbor interpolation, then you might need round
up or round down (but not round away from zero) to avoid weird things
for translations of half integers
* In most cases, you don't really care :-)
4) VNL implementation or ITK implementation?
* VNL seems to be the good place for this since it is already there
and there is currently no such basic operations in ITK
* It is easier for the user if only one version is available
* An ITK version would however be easier to maintain for ITK developpers
* The all set of rounding functions takes 200 lines in vnl_math.h and
needs some cmake magic.
My 2cts,
Tom
On Wed, Dec 31, 2008 at 12:25 AM, Luis Ibanez <luis.ibanez at kitware.com> wrote:
>
> And when adding a template and Macro based implementations
> to the test, and compiling for Release (in the same platform)
> we get the following numbers:
>
>
> Initial Value = -10
> Value Increment = 2e-05
> Probe Tag Starts Stops Time (s)
> Functor 100 100 0.0109601
> Macro 100 100 0.00650821
> if-round 100 100 0.00651316
> std::vector 100 100 0.00443856
> vnl_math_rnd 100 100 0.00450361
>
>
> Error in : -8.5 : -9 : -8
> Error in : -6.5 : -7 : -6
>
>
>
> These results don't quite make sense to me...
> I would have blamed memory-coherence, but after adding
> a global for-loop that repeats the computation 100 times,
> the values remain about the same.
>
>
> ----
>
>
> Any guesses are welcome...
>
>
> BTW,
> If somebody can also explain why the rounding
> errors appear only in:
>
> -8.5 and -6.5
>
> and not in all even numbers in the range -10:10
>
> -8.5, -6.5, -4.5, -2.5, 0.5, 2.5, 4.5, 6.5, 8.5
>
>
> that will be very kind as well... :-)
>
>
>
> Luis
>
>
>
> ------------------
> Luis Ibanez wrote:
>>
>>
>> Just for completeness,
>>
>> Here is the test output
>> when run on Debian Linux with gcc 4.2 (no Debug, no Release)
>>
>>
>> cd Insight/Testing/Code/Common
>>
>> ctest -R itkVNLRound -V
>>
>> 101/184 Testing itkVNLRoundProfileTest1
>> Test command: /home/ibanez/bin/Insight/bin/itkCommonTests2
>> itkVNLRoundProfileTest1
>>
>>
>> Initial Value = -10
>> Value Increment = 2e-05
>>
>> Probe Tag Starts Stops Time (s)
>> if-round 1 1 0.0325642
>> std::vector 1 1 0.0169749
>> vnl_math_rnd 1 1 0.026947
>>
>>
>> Error in : -8.5 : -9 : -8
>> Error in : -6.5 : -7 : -6
>>
>> Tested 1000000 entries
>>
>> -- Process completed
>> ***Failed
>>
>> 0% tests passed, 1 tests failed out of 1
>>
>> The following tests FAILED:
>>
>> -------------------------------
>>
>> (All number below are in "seconds"):
>>
>> The net time of running vnl_math_rnd (one million times) is
>>
>> 0.026947 - 0.0169749 = 0.0099721
>>
>> The net time of running the if-based rounding (one million times) is
>>
>> 0.0325642 - 0.0169749 = 0.0155893
>>
>> The difference between vnl_math_rnd() and the
>> if-based implementation is
>>
>> -0.0056172
>>
>> which is
>>
>> 36 % of the time of running the if-based version
>>
>> or
>>
>> 56 % of the time of running the vnl_math_rnd() version
>>
>>
>> Note that in the great scheme of things we are talking about
>> a difference of
>>
>> 5 nanoseconds
>>
>> per execution.
>>
>>
>> NOTE: The implementation of the if-based version is made in a
>> C function call. Converting it to a template or a macro
>> should already bring them closer.
>>
>>
>> Luis
>>
>>
>> ---
>>
>> Machine Characteristics:
>>
>> processor : 0
>> vendor_id : GenuineIntel
>> cpu family : 6
>> model : 23
>> model name : Intel(R) Core(TM)2 Duo CPU T9600 @ 2.80GHz
>> stepping : 6
>> cpu MHz : 800.000
>> cache size : 6144 KB
>> physical id : 0
>> siblings : 2
>> core id : 0
>> cpu cores : 2
>> fdiv_bug : no
>> hlt_bug : no
>> f00f_bug : no
>> coma_bug : no
>> fpu : yes
>> fpu_exception : yes
>> cpuid level : 10
>> wp : yes
>> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
>> cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pb
>> x smx est tm2 ssse3 cx16 xtpr sse4_1 lahf_lm
>> bogomips : 5590.52
>> clflush size : 64
>>
>> processor : 1
>> vendor_id : GenuineIntel
>> cpu family : 6
>> model : 23
>> model name : Intel(R) Core(TM)2 Duo CPU T9600 @ 2.80GHz
>> stepping : 6
>> cpu MHz : 800.000
>> cache size : 6144 KB
>> physical id : 0
>> siblings : 2
>> core id : 1
>> cpu cores : 2
>> fdiv_bug : no
>> hlt_bug : no
>> f00f_bug : no
>> coma_bug : no
>> fpu : yes
>> fpu_exception : yes
>> cpuid level : 10
>> wp : yes
>> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
>> cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pb
>> x smx est tm2 ssse3 cx16 xtpr sse4_1 lahf_lm
>> bogomips : 5585.97
>> clflush size : 64
>>
>>
>>
>> -----------------------
>> Luis Ibanez wrote:
>>
>>>
>>> To get some hard data on timing and correctness
>>> I just added a test:
>>>
>>> itkVNLRoundProfileTest1.cxx
>>>
>>> to Insight/Testing/Code/Common
>>>
>>>
>>> It does the following:
>>>
>>>
>>> A) Compares the "if" implementation
>>> of rounding to the vnl_math_rnd()
>>>
>>> B) Measures the computation time needed
>>> for executing vnl_math_rnd()
>>>
>>>
>>> After running the test on Linux, gcc 4.2 we observe that
>>> it fails due to the policy of rounding to Even...
>>>
>>>
>>> Here are my early impressions:
>>>
>>> 1) Rounding is too much of a basic operation.
>>>
>>> 2) Implementing it is a very minimal amount of code
>>>
>>> 3) We use it everywhere
>>>
>>> 4) We need it to be cross-platform
>>>
>>> 5) We need it to be reasonably fast,
>>> but it is not the most demanding part of our
>>> algorithms.
>>>
>>>
>>>
>>> That leads me to suggest:
>>>
>>>
>>> a) Let's stop using vnl_math_rnd()
>>>
>>> b) Write our own (let's discuss the details in a separate email)
>>>
>>> We are talking about 10 ~ 20 lines of code,
>>> we should be able to do that :-)
>>>
>>>
>>> c) Have a biased and non-biased implementation
>>>
>>> * double itkRound( double )
>>> * double itkNonBiasRound( double )
>>> * ... and maybe other flavors too...
>>>
>>>
>>> and use one or the other depending on the application.
>>>
>>>
>>>
>>> The overall impression is that: ITK being a library with
>>> more than 130,000 lines of code, it would be shameful to
>>> depend on a third party library for implementing correctly
>>> an operation as basic as rounding, particularly when we
>>> can implement it in 10 lines of code.
>>>
>>>
>>>
>>> Luis
>>>
>>>
>>>
>>> --------------------
>>> Bill Lorensen wrote:
>>>
>>>> I wonder what the radiology community says about this?
>>>>
>>>> On Tue, Dec 30, 2008 at 4:32 PM, Steve M. Robbins <steve at sumost.ca>
>>>> wrote:
>>>>
>>>>> On Tue, Dec 30, 2008 at 01:58:28PM -0500, Luis Ibanez wrote:
>>>>>
>>>>>
>>>>>> Choosing the rounding standard based on the performance of the
>>>>>> platform
>>>>>> doesn't seem to be a good solution for ITK.
>>>>>>
>>>>>> We would want a round() function that produces the *same* output on
>>>>>> *every* platform.
>>>>>
>>>>>
>>>>>
>>>>> That makes sense to me.
>>>>>
>>>>>
>>>>>
>>>>>> If we pick a rounding policy, it should be the same for all
>>>>>> platforms, and it should also include a specification on how
>>>>>> it will apply to negative numbers.
>>>>>
>>>>>
>>>>>
>>>>> I've always been under the impression that round-to-even is to be
>>>>> preferred, since the other methods (round-up, round-down, or
>>>>> round-to-zero) will bias the calculations.
>>>>>
>>>>> Quoting Wikipedia:
>>>>>
>>>>> Despite the custom of rounding the number 4.5 up to 5, in fact 4.5
>>>>> is no nearer to 5 than it is to 4 (it is 0.5 away from both). When
>>>>> dealing with large sets of scientific or statistical data, where
>>>>> trends are important, traditional rounding on average biases the
>>>>> data upwards slightly. Over a large set of data, or when many
>>>>> subsequent rounding operations are performed as in digital signal
>>>>> processing, the round-to-even rule tends to reduce the total
>>>>> rounding error, with (on average) an equal portion of numbers
>>>>> rounding up as rounding down. This generally reduces upwards
>>>>> skewing of the result.
>>>>>
>>>>> http://en.wikipedia.org/wiki/Rounding
>>>>>
>>>>> Also, according to Goldberg's classic paper [1] Knuth also prefers
>>>>> round-to-even, which is a strong recommendation in my books ;-)
>>>>>
>>>>> [1] http://docs.sun.com/source/806-3568/ncg_goldberg.html
>>>>>
>>>>> Cheers,
>>>>> -Steve
>>>>>
>>>>> -----BEGIN PGP SIGNATURE-----
>>>>> Version: GnuPG v1.4.9 (GNU/Linux)
>>>>>
>>>>> iD8DBQFJWpNp0i2bPSHbMcURAkkEAKCeGnxwyjeQYc+NiJ7tcPiw/3birgCfZ9zv
>>>>> F5W5uGojrYesCPeonZloUpw=
>>>>> =E/CS
>>>>> -----END PGP SIGNATURE-----
>>>>>
>>>>> _______________________________________________
>>>>> Insight-developers mailing list
>>>>> Insight-developers at itk.org
>>>>> http://www.itk.org/mailman/listinfo/insight-developers
>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>
More information about the Insight-developers
mailing list