memcpy speed

Mon Mar 27 12:50:10 EST 2000

Hi Sebastien,

I'm certainly not suggesting that all 'for' loops be rewritten
as 'while' or 'do' loops... I think that 'for' loops are
superior programming style, and the other forms should be used
very sparingly -- only when they provide a fairly significant
performance boost.  Hopefully, gcc will soon have optimizations that
eliminate the performance differences between the various forms.
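
For anyone who wants to check the difference on their own compiler and
architecture, here is a minimal timing sketch. It is not the benchmark
from the earlier message: the names copy_while, copy_for, copy_memcpy
and time_copy, the 64-byte buffers and the use of clock() are all
assumptions made for illustration.

```c
#include <assert.h>
#include <string.h>
#include <time.h>

/* Count-down 'while' form. */
static void copy_while(char *cp1, const char *cp2, int count)
{
  int j = count;
  while (--j >= 0)
  {
    *cp1++ = *cp2++;
  }
}

/* Count-up 'for' form. */
static void copy_for(char *cp1, const char *cp2, int count)
{
  int j;
  for (j = 0; j < count; j++)
  {
    *cp1++ = *cp2++;
  }
}

/* Thin wrapper so that memcpy has the same signature as the loops. */
static void copy_memcpy(char *cp1, const char *cp2, int count)
{
  memcpy(cp1, cp2, (size_t)count);
}

typedef void (*copy_fn)(char *, const char *, int);

/* Run 'reps' copies of 'count' bytes and return the CPU time in seconds.
   The buffer size of 64 bytes is an arbitrary choice for small copies. */
static double time_copy(copy_fn fn, int count, long reps)
{
  static char src[64], dst[64];
  volatile char sink = 0;   /* keeps the copies from being optimized away */
  clock_t start;
  long r;

  assert(count >= 0 && count <= (int)sizeof(src));
  start = clock();
  for (r = 0; r < reps; r++)
  {
    src[0] = (char)r;       /* vary the input slightly on each pass */
    fn(dst, src, count);
    sink ^= dst[0];
  }
  return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling time_copy() with each of the three functions for a range of
counts (say 4, 8, 16, 32 and 64 bytes) and a few million repetitions
should show where the crossover lies on a given machine. Results will
vary with the compiler, the optimization flags and the data type.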

As for using *ptr++ to go through a matrix instead of mat[i][j],
I think that is a very bad idea in almost all cases... it makes
the code much harder to understand.  Unrolling the inner
loop into 4 statements provides a higher performance boost
and is much, much easier on the eyes.
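
As a rough illustration of the unrolling described above, here is a
sketch that scales a small matrix. The function name scale_matrix and
the ROWS and COLS sizes are made up for the example, and it assumes
the row length is a multiple of 4 so that no clean-up loop is needed.

```c
#include <assert.h>

#define ROWS 3
#define COLS 8   /* assumed to be a multiple of 4 in this sketch */

/* Scale every element of mat, with the inner loop unrolled by hand
   into 4 statements while keeping the mat[i][j] indexing. */
static void scale_matrix(double mat[ROWS][COLS], double factor)
{
  int i, j;

  assert(COLS % 4 == 0);
  for (i = 0; i < ROWS; i++)
  {
    for (j = 0; j < COLS; j += 4)
    {
      mat[i][j]     *= factor;
      mat[i][j + 1] *= factor;
      mat[i][j + 2] *= factor;
      mat[i][j + 3] *= factor;
    }
  }
}
```

The indices stay visible, so it is still obvious which element each
statement touches. If the row length is not a multiple of 4, a short
ordinary loop after the unrolled one would have to handle the
remaining elements.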

 - David

--
  David Gobbi, MSc                    dgobbi at irus.rri.on.ca
  Advanced Imaging Research Group
  Robarts Research Institute, University of Western Ontario

On Mon, 27 Mar 2000, Sebastien Barre wrote:

> At 13:06 26/03/00 -0500, David Gobbi wrote:
> 
> >Well, I decided to run a benchmark of memcpy() versus
> >loops of the form
> >
> >j = count;
> >while (--j >= 0)
> >{
> >   *cp1++ = *cp2++;
> >}
> >
> >and loops of the form
> >
> >for (j = 0; j < count; j++)
> >{
> >   *cp1++ = *cp2++;
> >}
> >
> >
> >Depending on the architecture and the data type, memcpy is much
> >slower than copying the data in a loop unless you copy at least
> >32 bytes.  For smaller copies, looping is often more than 5 times
> >faster than using memcpy!
> 
> I've also observed that "feature" sometimes. It depends on the 
> architecture, but it might be worth having a front-end for memcpy that will 
> use either memcpy or that kind of loop (choice made at compilation time 
> depending on a #define, for example).
> 
> >Also, with gcc, the 'j = count; while (--j >= 0)' form of looping is
> >around 15% to 30% faster than 'for (j = 0; j < count; j++)' form
> >for copying less than 16 bytes.
> 
> Oh yes, that's a common optimization trick :) (actually I thought it was true 
> for more than 16 bytes also, I guess compilers get better and better at 
> that game).
> 
> BTW, I'm quite sure that there are a lot of places in VTK where optimization 
> might be done by replacing indexed accesses like mat[i][j] = ... by a pointer 
> notation like *ptr = ... (not **ptr, but really *ptr).
> 
> --
> Sebastien BARRE
> IRCOM-SIC, UMR-CNRS 6615 - Université de Poitiers
> Bât. SP2MI, Bvd 3 - Téléport 2, BP 179 F-86960 Futuroscope Cedex
> Tel. : +33 (0)5 49 49 65 95 / 65 83, Fax : +33 (0)5 49 49 65 70
> http://www-sic.univ-poitiers.fr/barre/ or http://www.hds.utc.fr/~barre/
>
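
The memcpy front-end suggested in the quoted message, with the choice
made at compilation time, could be sketched roughly as follows. The
function name small_copy and the macro SMALL_COPY_USE_LOOP are
hypothetical; nothing like this is claimed to exist in VTK.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical build switch: define SMALL_COPY_USE_LOOP (for example
   with -DSMALL_COPY_USE_LOOP) on platforms where a plain loop has been
   measured to beat memcpy for small copies. */
static void small_copy(void *dst, const void *src, size_t count)
{
  assert(dst != NULL && src != NULL);
#ifdef SMALL_COPY_USE_LOOP
  {
    unsigned char *cp1 = (unsigned char *)dst;
    const unsigned char *cp2 = (const unsigned char *)src;

    while (count-- > 0)
    {
      *cp1++ = *cp2++;
    }
  }
#else
  memcpy(dst, src, count);
#endif
}
```

Callers use small_copy() everywhere they would have used memcpy for
short blocks, and the decision is made once per platform in the build
settings rather than at each call site.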