[vtk-developers] [EXTERNAL] Comparing doubles with floats: precision issues

Robert Maynard robert.maynard at kitware.com
Wed Jun 21 08:29:26 EDT 2017


We did something fairly similar to the proposed solution when doing
range computations in the past for ParaView, with the increased
challenge of wanting to move N values in a given direction. I think
that using nextafter will be the best way to get correct results for
your problem though. I would also state that the float -> integer bit
trick is trickier than it looks on the surface and I recommend reading
more on this subject before you dive in (
https://randomascii.wordpress.com/2012/01/23/stupid-float-tricks-2/ ).
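
For illustration, here is a minimal sketch of the nextafter approach. It
assumes IEEE 754 floats, the default round-to-nearest mode, and non-NaN
input. The helper names are the hypothetical ones used further down this
thread, and FloatRangeFor is made up for this example; none of them is an
existing VTK API. Note the directions: for the float test to select
exactly the same values as the double test, the lower bound has to round
up and the upper bound has to round down.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Largest float whose value is <= x. (Hypothetical helper, not a VTK API.)
inline float NearestFloatNotGreaterThan(double x)
{
  assert(!std::isnan(x));
  float f = static_cast<float>(x); // rounds to nearest in the default mode
  if (static_cast<double>(f) > x)
  {
    // The cast rounded up, so step one float toward -infinity.
    f = std::nextafter(f, -std::numeric_limits<float>::infinity());
  }
  return f;
}

// Smallest float whose value is >= x. (Hypothetical helper, not a VTK API.)
inline float NearestFloatNotLessThan(double x)
{
  assert(!std::isnan(x));
  float f = static_cast<float>(x);
  if (static_cast<double>(f) < x)
  {
    // The cast rounded down, so step one float toward +infinity.
    f = std::nextafter(f, std::numeric_limits<float>::infinity());
  }
  return f;
}

// Float bounds for which (fval >= fminval && fval <= fmaxval) matches
// (fval >= minval && fval <= maxval) for every float fval.
inline void FloatRangeFor(double minval, double maxval,
                          float& fminval, float& fmaxval)
{
  fminval = NearestFloatNotLessThan(minval);    // never below minval
  fmaxval = NearestFloatNotGreaterThan(maxval); // never above maxval
}
```

With maxval = 0.1, for example, the plain cast gives a float slightly
above 0.1, while NearestFloatNotGreaterThan(0.1) gives the float just
below it.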


On Fri, Jun 16, 2017 at 6:31 PM, David Gobbi <david.gobbi at gmail.com> wrote:
> Hi Alan,
>
> Thanks for the idea, these tricks are always useful to know.  They don't
> solve my issue, though, because my goal isn't just optimization.
>
> The thing is, I already have closed classes that do "if (fval >= fminval &&
> fval <= fmaxval)" where all variables are of type "float".  My problem is,
> that I have a range (minval, maxval) in double-precision, and I have to
> compute (fminval, fmaxval) in single precision to provide to the existing
> code.  As described above, a naive typecast gives the wrong answer in edge
> cases, which is why fminval = NearestFloatNotLessThan(minval) and fmaxval
> = NearestFloatNotGreaterThan(maxval) are necessary.
>
> Cheers,
>  - David
>
>
> On Fri, Jun 16, 2017 at 3:28 PM, Scott, W Alan <wascott at sandia.gov> wrote:
>>
>> Haven’t seen a great reply, and this was a while ago, but here goes.  This
>> is also all theoretical, from days 20 years ago when I worked on OpenGL
>> drivers.
>>
>>
>>
>> How about the following:
>>
>>
>>
>> test = (minval - static_cast<double>(fval)) & (static_cast<double>(fval) -
>> maxval)
>>
>>
>>
>> Now, test will be positive if your case should pass, negative if not.  I’m
>> sure I missed some casts above, but you can see what I am doing.  By And’ing
>> the positive bit on the two floats, you see if result is  positive. So,
>> either
>>
>> if(test >= 0)
>>
>>   succeed;
>>
>> else
>>
>>   not so much;
>>
>>
>>
>> Or, if your hardware is faster in integer space (and dirty),
>>
>> if (<unsigned double>test & 0x800000000k)
>>
>>
>>
>> I suspect this whole test should run in under 10 clocks...
>>
>>
>>
>> Be sure to test with different compilers, and especially optimized.
>>
>>
>>
>> Alan
>>
>>
>>
>> From: vtk-developers [mailto:vtk-developers-bounces at vtk.org] On Behalf Of
>> David Gobbi
>> Sent: Monday, June 12, 2017 12:16 PM
>> To: VTK Developers <vtk-developers at vtk.org>
>> Subject: [EXTERNAL] [vtk-developers] Comparing doubles with floats:
>> precision issues
>>
>>
>>
>> Hi All,
>>
>>
>>
>> This is one of those picky math questions that deals with numerical
>> precision.  Let's say that one has a data set with scalar type "float", and
>> wants to select values within a range (minval, maxval) where minval, maxval
>> are of type "double":
>>
>>
>>
>>     if (fval >= minval && fval <= maxval) { ... }
>>
>>
>>
>> Now let's say you don't want "fval" to be converted to double, because
>> floats are faster than doubles on your hardware:
>>
>>
>>
>>    float fminval = static_cast<float>(minval);
>>
>>    float fmaxval = static_cast<float>(maxval);
>>
>>    ...
>>
>>    if (fval >= fminval && fval <= fmaxval) { ... }
>>
>>
>>
>> Unfortunately, there are some cases where fval <= fmaxval even though
>> fval > maxval.  Reducing the precision of the range has invalidated the check.
>> In order to fix things, you must choose fminval to be the value closest to
>> but not less than minval, and choose fmaxval to be the value closest to but
>> not more than maxval.
>>
>>
>>
>>     float fminval = NearestFloatNotLessThan(minval);
>>
>>     float fmaxval = NearestFloatNotGreaterThan(maxval);
>>
>>
>>
>> With these, (fval >= fminval && fval <= fmaxval) gives the same result as
>> (fval >= minval && fval <= maxval) and all is right with the world.
>>
>>
>>
>> So my question is, have any other devs created a solution for this issue,
>> either for VTK or for related code?  I'm considering a solution based on the
>> C++11 function std::nextafter(), as described on this stackoverflow page:
>> https://stackoverflow.com/questions/15294046/round-a-double-to-the-closest-and-greater-float
>>
>>
>>
>>  - David