[Insight-developers] Performance impact of smart pointers.

Brad King brad.king@kitware.com
Mon, 6 Aug 2001 16:39:25 -0400 (EDT)


Hello, all:

I wrote a small test program to measure the performance impact of using
itk::SmartPointer and the corresponding ::New() method for allocation
instead of real pointers with plain new.  The tests used the Pentium
time-stamp counter to get exact cycle counts for the test loops.

This was not very formal, but it gives us a quick idea of what goes on. I
wrote a LightObject duplicate called TestObject that contains the
reference count and register/unregister functionality for the smart
pointer.  Two test loops were used:

With smart pointer (TObj inherits TestObject and its reference count):
for(unsigned int i = 0; i < 100000; ++i)
  {
  // Allocate with New() and assign to smart pointer.
  // Destructor of smart pointer frees the memory at end of each iteration.
  TObj::Pointer ptr = TObj::New();
  }

Without smart pointer (TObj does not inherit from TestObject):
for(unsigned int i = 0; i < 100000; ++i)
  {
  // Allocate with new and assign to real pointer.
  TObj* ptr = new TObj;
  // Explicitly free the memory at the end of the iteration.
  delete ptr;
  }

The tests were run with TObj sizes ranging from 10 to 100000 bytes, plus
the extra 4 bytes for the reference count in the smart pointer case.  They
were compiled with gcc -O2 so that the smart pointer code is inlined
nicely.
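
For reference, the smart pointer case can be sketched roughly as below.
This is a minimal illustration, not the actual test program and not ITK's
code: SmartPtr, liveCount, and RunSmartPointerLoop are hypothetical names,
and TestObject here only mimics the reference count and
Register/UnRegister behavior described above.

```cpp
#include <cassert>

// Hypothetical stand-in for the TestObject described above (not ITK code).
class TestObject
{
public:
  void Register() { ++m_ReferenceCount; }
  // Deletes the object when the last reference goes away.
  void UnRegister() { if (--m_ReferenceCount == 0) { delete this; } }
  int GetReferenceCount() const { return m_ReferenceCount; }
protected:
  TestObject() : m_ReferenceCount(0) {}
  virtual ~TestObject() {}
private:
  int m_ReferenceCount;
};

// Hypothetical minimal intrusive smart pointer.
template <typename T>
class SmartPtr
{
public:
  SmartPtr() : m_Pointer(0) {}
  SmartPtr(T* p) : m_Pointer(p) { if (m_Pointer) { m_Pointer->Register(); } }
  SmartPtr(const SmartPtr& o) : m_Pointer(o.m_Pointer)
    { if (m_Pointer) { m_Pointer->Register(); } }
  ~SmartPtr() { if (m_Pointer) { m_Pointer->UnRegister(); } }
  SmartPtr& operator=(const SmartPtr& o)
    {
    // Register first so self-assignment is safe.
    if (o.m_Pointer) { o.m_Pointer->Register(); }
    if (m_Pointer) { m_Pointer->UnRegister(); }
    m_Pointer = o.m_Pointer;
    return *this;
    }
  T* operator->() const { return m_Pointer; }
  T* GetPointer() const { return m_Pointer; }
private:
  T* m_Pointer;
};

class TObj : public TestObject
{
public:
  typedef SmartPtr<TObj> Pointer;
  static Pointer New() { return Pointer(new TObj); }
  static int liveCount;  // hypothetical: number of TObj instances alive
protected:
  TObj() { m_Payload[0] = 0; ++liveCount; }
  ~TObj() { --liveCount; }
private:
  char m_Payload[10];    // object size was varied in the real tests
};
int TObj::liveCount = 0;

// Runs the smart pointer loop and returns how many objects are still alive.
int RunSmartPointerLoop(unsigned int n)
{
  for (unsigned int i = 0; i < n; ++i)
    {
    // Destructor of the smart pointer frees the object each iteration.
    TObj::Pointer ptr = TObj::New();
    }
  return TObj::liveCount;
}
```

Calling RunSmartPointerLoop(100000) should return 0 in this sketch, since
every object is released when its smart pointer goes out of scope.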

Results:
Regardless of the size of the object, the smart pointer case took an
average of 2e6 more cycles total for the execution of the loop (out of
about 25e6).  This comes out to 20 cycles more per iteration, or about
3.3e-8 seconds more per iteration on a Pentium II 600.  The fractional
increase in time ranged from 6% to 8% depending on the size of the object.
Note that the loop does nothing but allocate and deallocate memory on the
heap, so these fractions would shrink if the loop did any other real
work.
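
The arithmetic behind those numbers can be checked with a few lines.  The
helper names are hypothetical, and the sketch assumes a 600 MHz clock and
treats the "about 25e6" figure as the reference total, which gives the
upper end (8%) of the quoted range.

```cpp
#include <cassert>
#include <cmath>

// Extra cycles spent per loop iteration.
double ExtraCyclesPerIteration(double extraCycles, double iterations)
{
  return extraCycles / iterations;
}

// Convert a cycle count to seconds for a given clock rate in Hz.
double CyclesToSeconds(double cycles, double clockHz)
{
  return cycles / clockHz;
}

// Overhead as a fraction of the reference total.
double FractionalIncrease(double extraCycles, double totalCycles)
{
  return extraCycles / totalCycles;
}

// 2e6 extra cycles over 100000 iterations on an assumed 600 MHz clock.
double ExtraSecondsPerIteration()
{
  return CyclesToSeconds(ExtraCyclesPerIteration(2e6, 100000.0), 600e6);
}
```

With the figures above this gives 20 cycles and roughly 3.3e-8 seconds per
iteration, and a fractional increase of about 0.08.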

I would be interested in seeing the real code in Insight that uses memory
allocation in the middle of a tight loop.  There should be another way of
going about it.  If all you need is a pointer to a transform object, then
real pointers can be passed around and used in the loop, as long as one
smart pointer holds a reference to the object somewhere.
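
As a rough sketch of that pattern: one smart pointer owns the object
outside the loop, and the loop itself only sees a real pointer, so no
reference counting happens per iteration.  Transform, TransformPointer,
Apply, and the helper functions below are hypothetical stand-ins, not
ITK's classes.

```cpp
#include <cassert>

// Hypothetical reference-counted transform (not an ITK class).
class Transform
{
public:
  explicit Transform(double scale) : m_Scale(scale), m_ReferenceCount(0) {}
  void Register() { ++m_ReferenceCount; }
  void UnRegister() { if (--m_ReferenceCount == 0) { delete this; } }
  int GetReferenceCount() const { return m_ReferenceCount; }
  double Apply(double x) const { return m_Scale * x; }
private:
  ~Transform() {}  // only destroyed through UnRegister()
  double m_Scale;
  int m_ReferenceCount;
};

// Hypothetical minimal owning smart pointer; not copyable in this sketch.
class TransformPointer
{
public:
  explicit TransformPointer(Transform* p) : m_Pointer(p)
    { if (m_Pointer) { m_Pointer->Register(); } }
  ~TransformPointer() { if (m_Pointer) { m_Pointer->UnRegister(); } }
  Transform* GetPointer() const { return m_Pointer; }
private:
  TransformPointer(const TransformPointer&);
  TransformPointer& operator=(const TransformPointer&);
  Transform* m_Pointer;
};

// The tight loop takes a real pointer: no Register/UnRegister calls here.
double SumTransformed(const Transform* transform, unsigned int n)
{
  double sum = 0.0;
  for (unsigned int i = 0; i < n; ++i)
    {
    sum += transform->Apply(static_cast<double>(i));
    }
  return sum;
}

// The smart pointer keeps the object alive for the duration of the loop.
double RunExample()
{
  TransformPointer owner(new Transform(2.0));
  return SumTransformed(owner.GetPointer(), 4);
}

// The reference count is untouched by the loop.
int ReferenceCountAfterLoop()
{
  TransformPointer owner(new Transform(2.0));
  SumTransformed(owner.GetPointer(), 100000);
  return owner.GetPointer()->GetReferenceCount();
}
```

The real pointer is only valid while the owning smart pointer is alive, so
the owner has to outlive the loop, as it does in both helpers here.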

My conclusion is that performance is not an issue with using
reference-counted objects.  The only time it should make a difference is
when many small instances of a class are allocated (like a pixel type).
In that case individual reference counting is unnecessary, and the
reference count adds memory overhead to every instance.  This is why
itk::Array does not inherit from itk::LightObject.

If there are no objections within the next day or so, I'll close the GNATS
entry on this with a comment including the results described above.

-Brad