[vtk-developers] [EXTERNAL] Comparing doubles with floats: precision issues

Fri Jun 16 17:28:38 EDT 2017

Haven’t seen a great reply, and this was a while ago, but here goes.  This is also all theoretical, from days 20 years ago when I worked on OpenGL drivers.

How about the following:

test = (static_cast<double>(fval) - minval) | (maxval - static_cast<double>(fval))

Now, test will be non-negative if your case should pass, negative if not.  I’m sure I missed some casts above, but you can see what I am doing.  Each difference is negative only when fval is outside its bound, so by OR’ing the sign bits of the two doubles, you see whether either one went negative. So, either
if(test >= 0)
  succeed;
else
  not so much;

Or, if your hardware is faster in integer space (and dirty),
if (<bits of test as an unsigned 64-bit integer> & 0x8000000000000000)

I suspect this whole test should run in under 10 clocks...

Be sure to test with different compilers, and especially optimized.
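
A minimal sketch of the sign-bit idea in valid C++, assuming IEEE 754 doubles, default rounding, and no NaN inputs. The names DoubleBits and InRangeBySignBits are hypothetical helpers, not VTK functions. The differences are arranged so that each one is non-negative for an in-range value, and the two sign bits are combined with OR so that either bound failing rejects the value:

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: copy the bit pattern of a double into a 64-bit
// unsigned integer (memcpy avoids aliasing problems).
inline std::uint64_t DoubleBits(double d)
{
  std::uint64_t u;
  std::memcpy(&u, &d, sizeof u);
  return u;
}

// Returns true when minval <= fval <= maxval, decided from sign bits only.
// Assumes IEEE 754 binary64 and that none of the inputs is NaN.
inline bool InRangeBySignBits(float fval, double minval, double maxval)
{
  const double v = static_cast<double>(fval);
  const double lo = v - minval;  // negative only when fval < minval
  const double hi = maxval - v;  // negative only when fval > maxval
  const std::uint64_t signMask = 0x8000000000000000ull;
  // If either sign bit is set, the value is outside the range.
  return ((DoubleBits(lo) | DoubleBits(hi)) & signMask) == 0;
}
```

Whether this beats the plain two-comparison form is something to measure on the target compiler and hardware rather than assume.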

Alan

From: vtk-developers [mailto:vtk-developers-bounces at vtk.org] On Behalf Of David Gobbi
Sent: Monday, June 12, 2017 12:16 PM
To: VTK Developers <vtk-developers at vtk.org>
Subject: [EXTERNAL] [vtk-developers] Comparing doubles with floats: precision issues

Hi All,

This is one of those picky math questions that deals with numerical precision.  Let's say that one has a data set with scalar type "float", and wants to select values within a range (minval, maxval) where minval, maxval are of type "double":

    if (fval >= minval && fval <= maxval) { ... }

Now let's say you don't want "fval" to be converted to double, because floats are faster than doubles on your hardware:

   float fminval = static_cast<float>(minval);
   float fmaxval = static_cast<float>(maxval);
   ...
   if (fval >= fminval && fval <= fmaxval) { ... }

Unfortunately, there are some cases where fval <= fmaxval even though fval > maxval.  Reducing the precision of the range has invalidated the check.  In order to fix things, you must round each bound inward: choose fminval to be the float closest to but not less than minval, and choose fmaxval to be the float closest to but not greater than maxval.

    float fminval = NearestFloatNotLessThan(minval);
    float fmaxval = NearestFloatNotGreaterThan(maxval);

With these, (fval >= fminval && fval <= fmaxval) gives the same result as (fval >= minval && fval <= maxval) and all is right with the world.
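
For reference, the failure that the plain static_cast version runs into is easy to reproduce. This is a minimal sketch, assuming IEEE 754 float and double. InRangeDouble and InRangeNaiveFloat are hypothetical names used only for illustration:

```cpp
// Range check carried out in double precision (the reference behaviour).
inline bool InRangeDouble(float fval, double minval, double maxval)
{
  // fval is promoted to double for both comparisons.
  return fval >= minval && fval <= maxval;
}

// Range check carried out in float after a plain round-to-nearest narrowing
// of the bounds.
inline bool InRangeNaiveFloat(float fval, double minval, double maxval)
{
  const float fminval = static_cast<float>(minval);
  const float fmaxval = static_cast<float>(maxval);
  return fval >= fminval && fval <= fmaxval;
}
```

With minval = 0.0, maxval = 0.1 and fval = static_cast<float>(0.1), the double check rejects the value while the naive float check accepts it, because the float nearest to 0.1 is slightly larger than the double 0.1.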

So my question is, have any other devs created a solution for this issue, either for VTK or for related code?  I'm considering a solution based on the C++11 function std::nextafter(), as described on this stackoverflow page: https://stackoverflow.com/questions/15294046/round-a-double-to-the-closest-and-greater-float
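
One possible shape for the two helpers named above, sketched with std::nextafter. The function names come from this message, but the bodies are an assumption rather than existing VTK code, and they assume finite inputs within the range of float (NaN is not handled):

```cpp
#include <cmath>
#include <limits>

// Largest float whose value is <= x.
inline float NearestFloatNotGreaterThan(double x)
{
  float f = static_cast<float>(x);  // round to nearest
  if (static_cast<double>(f) > x)
  {
    // Rounding went up: step down to the adjacent float.
    f = std::nextafter(f, -std::numeric_limits<float>::infinity());
  }
  return f;
}

// Smallest float whose value is >= x.
inline float NearestFloatNotLessThan(double x)
{
  float f = static_cast<float>(x);  // round to nearest
  if (static_cast<double>(f) < x)
  {
    // Rounding went down: step up to the adjacent float.
    f = std::nextafter(f, std::numeric_limits<float>::infinity());
  }
  return f;
}
```

When x is exactly representable as a float, both functions return that same float, so the adjustment only costs anything for bounds that fall between two floats.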

 - David
