[vtk-developers] floating-point vs. integer performance

Sun Jun 17 22:14:13 EDT 2001

Hi Christopher,

I found this nifty little trick at http://www.stereopsis.com/FPU.html
that works with IEEE doubles on a little endian machine for numbers
in the range [-32768,+32767]:

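#include <string.h>  // for memcpy

static inline int FastFloor(double x)
{
  int lo;
  x += 68719476736.0*1.5;      // (2**(52-16))*1.5: low word now holds x in 16.16 fixed point
  memcpy(&lo, &x, sizeof lo);  // low 32 bits on little endian, without an aliasing int* cast
  return lo >> 16;             // drop the 16 fraction bits
}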

I clocked it at 7.8 times faster than using int(floor(x)),
and 3.3 times faster than using just int(x) on my PIII computer.

After converting vtkImageReslice to use the above function, the
integer version of the code is only 25% faster than the float version
instead of being 100% faster (the code contains a large number
of float->int conversions).  I haven't done any low-level
profiling of float vs. integer operations yet.

 - David

--
  David Gobbi, MSc                       dgobbi at irus.rri.ca
  Advanced Imaging Research Group
  Robarts Research Institute, University of Western Ontario

On Fri, 15 Jun 2001, Volpe, Christopher R (CRD) wrote:

>
> |> No, but I could try and get back to you.  I wouldn't expect any
> |> difference.
>
> Hmmm. I wonder what would explain the discrepancy then.
>
> |>
> |> > I've done some experiments with integer arithmetic vs.
> |> > floating point arithmetic when I was toying with the idea of
> |> > representing points and doing transformations in fixed point.
> |> > (This was mainly because the results were to be used as indices
> |> > into a 3D volume, and in CPU time, the compiler-generated x86
> |> > processor state change to trunc mode to do an integer cast is
> |> > like watching continental drift.
> |>
> |> I don't use the floor() function; it is pitifully slow on
> |> x86 processors.
>
> Yes, like I said, continental drift.
>
> |> You can just do the float->int conversion in the current rounding
> |> mode, followed by an 'if' statement that adds or subtracts 1 as
> |> necessary. Much faster.
>
> Which is what I was alluding to below, except that I skip the
> check because my data range is all positive and round mode works
> just as well for me.
>
> |>
> |> > Using inline assembler to hard code a round instruction
> |> > speeds this up so that it is merely agonizingly slow, but I
> |> > digress...) I've found that floating point beats integer
> |> > arithmetic by a small margin on a PII. I think my test was
> |> > something simple like multiplying two volatile variables and
> |> > storing results in a tight loop. Not the best of tests, I'm
> |> > sure.
> |>
> |> Ah, but there are several tricks you can play with fixed-point
> |> to speed things up.
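One common fixed-point trick for the indexing case Chris describes is to convert the start position and step to 16.16 fixed point once, then walk along a row with integer adds and shifts, so no float->int conversion happens per sample. The sketch assumes a 16.16 format, non-negative positions, and the hypothetical names ToFixed and SampleIndices; the thread does not say which tricks David had in mind.

```c
#include <assert.h>
#include <stdint.h>

/* 16.16 fixed point: 16 integer bits, 16 fraction bits (assumed format). */
typedef int32_t fixed16;
#define FIXED_ONE 65536.0

/* Convert once, outside any inner loop. Truncates, so assumes
   0 <= x < 32768. */
static inline fixed16 ToFixed(double x)
{
  return (fixed16)(x * FIXED_ONE);
}

/* Hypothetical helper: store in idx[0..n-1] the voxel index of each sample
   along a row that starts at 'start' and advances 'step' voxels per sample.
   Inside the loop each index costs one integer add and one shift. Assumes
   every position stays non-negative, so the shift acts as floor. */
static void SampleIndices(double start, double step, int n, int *idx)
{
  fixed16 pos = ToFixed(start);
  fixed16 inc = ToFixed(step);
  for (int i = 0; i < n; i++) {
    idx[i] = (int)(pos >> 16);  /* integer part of the current position */
    pos += inc;
  }
}
```

The step is truncated to a multiple of 1/65536, so the position can drift by up to n/65536 voxels over n samples; long rows may need the start recomputed periodically.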
>
> Yes, which amounts to avoiding unnecessary work when possible.
> But unfortunately that doesn't really help you if your CPU takes
> more time doing integer primitive ops than floating point primitive
> ops. I should probably take another look at that and see if I can
> determine whether there is a difference between the PII and PIII
> in this.
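The test Chris describes (multiplying two volatile variables and storing the result in a tight loop) could be reconstructed roughly as follows for comparing processors. The function names, operand values, and the use of clock() are assumptions. Because every access to a volatile object must actually be performed, the loop measures loads and stores as much as the multiply itself, which fits his own caveat about the test.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical reconstruction of the volatile-multiply test: multiply two
   volatile operands in a tight loop and time it with clock(). Operand
   values and iteration counts are arbitrary. */
static double TimeIntMultiply(long iters, int *result)
{
  volatile int a = 3, b = 7, r = 0;
  clock_t t0 = clock();
  for (long i = 0; i < iters; i++)
    r = a * b;                 /* volatile: both loads and the store happen every pass */
  *result = r;
  return (double)(clock() - t0) / CLOCKS_PER_SEC;
}

static double TimeFloatMultiply(long iters, float *result)
{
  volatile float a = 3.0f, b = 7.0f, r = 0.0f;
  clock_t t0 = clock();
  for (long i = 0; i < iters; i++)
    r = a * b;
  *result = r;
  return (double)(clock() - t0) / CLOCKS_PER_SEC;
}
```

Running both with the same iteration count on each machine gives a ratio to compare; absolute times depend heavily on compiler flags.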
>
>
> |> > I don't know about Irix, but a while back, on Solaris, an
> |> > integer multiplication would generate a FUNCTION CALL (_imul(),
> |> > I believe) to do the operation, unless you specified some
> |> > obscure compiler flag to generate SPARC v8 architecture
> |> > instructions, or something along those lines.
> |>
> |> That is almost beyond comprehension.
>
> Yes, I know.
>
> |> I'll check the SGI compiler man pages to be sure it is not the same.
>
> Or just generate assembly output and have a look at it. I don't
> think this is the problem though. The imul thing was, if I recall,
> only on a Sun, and it was several (at least 4) years ago.
>
> -Chris
>
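Following Chris's suggestion to look at the generated assembly, a translation unit like this can be compiled with the compiler's assembly-output option (-S on gcc and most Unix cc drivers) and the output inspected. IntMul and TruncToInt are hypothetical names. On x86 code that uses the x87 FPU, a truncating cast typically shows up as a save and reload of the FPU control word (fnstcw/fldcw) around fistp; on SPARC, the thing to look for is a call where an inline multiply instruction would be expected.

```c
#include <assert.h>

/* Compile with -S (or your compiler's equivalent) and read the output. */

/* On SPARC: is this an inline multiply instruction or a call? */
int IntMul(int a, int b)
{
  return a * b;
}

/* On x87-only x86 code: look for fnstcw/fldcw around the conversion. */
int TruncToInt(double x)
{
  return (int)x;  /* C requires truncation toward zero here */
}
```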