[Insight-developers] Empty FixedArray destructor: Performance hit using gcc (times 2)

Sat Jun 7 11:33:38 EDT 2008

Brad's explanation is a major eye opener to me as well.  :-)  I never 
realized before that there could be such a relation between the byte 
alignment of the object allocated by new T[N], and the destructor 
T::~T().  Thanks Brad!

Basically the relevant new-operation is at 
"itkImportImageContainer.txx", right? 
ImportImageContainer::AllocateElements does:
  data = new TElement[size];

So removing the user-defined destructor from FixedArray would /only/ 
solve the issue when TElement (the pixel type) is a FixedArray. And even 
in that case, I guess it /only/ solves the issue when the element type 
of FixedArray doesn't have a user-defined destructor. Right?
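
Whether removing the destructor helps comes down to whether the pixel type ends up with a trivial destructor. Here is a minimal sketch of that point, assuming a C++11 compiler (newer than anything this thread targets) and two hypothetical types, PlainArray and ElementWithDestructor, which are stand-ins and not ITK classes:

```cpp
#include <type_traits>

// Hypothetical stand-in for a FixedArray WITHOUT a user-defined destructor.
template <typename TValue, unsigned int VLength>
struct PlainArray
{
  TValue m_InternalArray[VLength];
};

// Hypothetical element type that does have a user-defined destructor.
struct ElementWithDestructor
{
  ~ElementWithDestructor() {}
  double value;
};

// With a built-in element type the wrapper is trivially destructible,
// so new [] has no destructor calls to keep track of.
static_assert(
  std::is_trivially_destructible<PlainArray<double, 2> >::value,
  "PlainArray<double, 2> should be trivially destructible");

// The element's destructor makes the implicit destructor of the wrapper
// non-trivial again, so removing ~FixedArray() alone would not help here.
static_assert(
  !std::is_trivially_destructible<PlainArray<ElementWithDestructor, 2> >::value,
  "the wrapper inherits non-triviality from its elements");
```

Both assertions are evaluated at compile time, so the snippet only needs to compile.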

I think the issue /could/ be solved more generally, by avoiding new [], 
and using malloc instead.  Within 
ImportImageContainer::AllocateElements:

  data = static_cast<TElement *>(
    std::malloc( sizeof(TElement)* size) );

I'm not a fan of using malloc in general, but this /might/ be a valid 
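reason to do so...  Unfortunately std::malloc does not construct the 
data elements, so it should be followed by placement new, to do the 
construction in-place.  The array form, new ( data ) TElement [size], 
may (depending on the compiler) reserve the same kind of bookkeeping 
space as new [], so it seems safer to construct the elements one at a 
time:

  for ( std::size_t i = 0; i < size; ++i )
    new ( data + i ) TElement;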

If we did so, each delete [] call in itkImportImageContainer.txx 
would have to be replaced by explicitly destructing the elements of 
m_ImportPointer, followed by std::free(m_ImportPointer)...
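
The malloc-based proposal can be sketched roughly as follows. This is only an illustration under stated assumptions, not a patch for itkImportImageContainer.txx: AllocateElementsSketch, DeallocateElementsSketch, CountedElement and g_LiveElements are hypothetical names, the elements are constructed one at a time (non-array placement new adds no bookkeeping bytes), and over-aligned element types and overflow of sizeof(TElement) * size are not handled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical replacement for "data = new TElement[size]".
template <typename TElement>
TElement * AllocateElementsSketch(std::size_t size)
{
  void * raw = std::malloc(sizeof(TElement) * size);
  if (raw == nullptr && size > 0)
  {
    throw std::bad_alloc();
  }
  TElement *  data = static_cast<TElement *>(raw);
  std::size_t constructed = 0;
  try
  {
    // Default-initialize each element in place, as new TElement[size] would.
    for (; constructed < size; ++constructed)
    {
      ::new (static_cast<void *>(data + constructed)) TElement;
    }
  }
  catch (...)
  {
    // Undo the elements built so far, then release the buffer.
    while (constructed > 0)
    {
      data[--constructed].~TElement();
    }
    std::free(raw);
    throw;
  }
  return data;
}

// Hypothetical replacement for "delete [] data". The caller has to pass the
// element count, because no count is stored in front of the buffer anymore.
template <typename TElement>
void DeallocateElementsSketch(TElement * data, std::size_t size)
{
  assert(data != nullptr || size == 0);
  for (std::size_t i = size; i > 0; --i)
  {
    data[i - 1].~TElement(); // destroy in reverse order of construction
  }
  std::free(data);
}

// Hypothetical element type, used only to check that every constructor call
// is matched by a destructor call.
static int g_LiveElements = 0;
struct CountedElement
{
  CountedElement() { ++g_LiveElements; }
  ~CountedElement() { --g_LiveElements; }
  double foo[2];
};
```

After AllocateElementsSketch<CountedElement>(5) the counter should read 5, and it should drop back to 0 after the matching DeallocateElementsSketch call. The price of this design is that the container itself has to remember the element count for the destruction loop.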

What do you think?

Kind regards, Niels

P.S. FWIW, Boost's array implementation is somewhat similar to 
FixedArray, and doesn't have a user-defined destructor. 
http://www.boost.org/doc/libs/1_35_0/boost/array.hpp

David Cole wrote:
> Now, the question is: will the GCC developers use extra bytes in
> future releases to record the length of the array to avoid this sort
> of performance hit...? (Matching the alignment that the allocator is
> careful to return would be a good feature for 'new'...)

Brad King wrote:
> Sorry for the delayed response but I've been away from email all
> afternoon.  Here is what is going on, using Luis's test case from
> elsewhere in this thread:
>
> class MyArray
> {
> public:
>  MyArray() {};
>  ~MyArray() {};
>  double operator[](unsigned int k) { return foo[k]; }
>  double foo[2];
> };
>
> Since double does not *have* to be aligned at 8 bytes on x86 the
> alignment of this struct is 4 bytes...WITH OR WITHOUT the destructor.
> Now consider the code
>
>  void* x = new MyArray[2];
>
> It's the address to which 'x' points that changes when the destructor
> is added or removed.  The malloc that is done returns an 8-byte
> aligned memory buffer in both cases.  It's operator new that adjusts
> it.
>
> When the destructor is present GCC needs a place to record the length
> of the array so that upon deletion it knows how many times the
> destructor needs to be called.  Their implementation allocates 4
> additional bytes of memory and chooses the first 4 bytes of the
> resulting buffer to store this count so the data are placed starting
> at 4 bytes past what was returned by malloc, and that value is stored
> in x.  Then all the doubles get poorly aligned and performance
> suffers.
>
> When the destructor is not present it does not need to do anything at
> deletion time but free the memory, so no count is needed and the
> buffer returned by malloc is used directly.  The doubles end up with
> good alignment.
>
> -Brad
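
The effect Brad describes can be observed with a small experiment along these lines. It is a sketch, not a portable guarantee: the C++ standard leaves the amount of array bookkeeping unspecified, and TrackedAllocation, WithDestructor, WithoutDestructor, ArrayCookieOffset and g_LastRawBlock are hypothetical names introduced here. The class-specific operator new[] records the raw block it hands out, so it can be compared with the pointer that the new-expression finally returns.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Raw block most recently returned by TrackedAllocation::operator new[].
static void * g_LastRawBlock = nullptr;

// Hypothetical base class that routes array allocation through malloc.
struct TrackedAllocation
{
  static void * operator new[](std::size_t bytes)
  {
    g_LastRawBlock = std::malloc(bytes);
    if (g_LastRawBlock == nullptr)
    {
      throw std::bad_alloc();
    }
    return g_LastRawBlock;
  }
  static void operator delete[](void * p) noexcept { std::free(p); }
};

// Same layout as the MyArray test case, with a user-defined destructor.
struct WithDestructor : TrackedAllocation
{
  ~WithDestructor() {}
  double foo[2];
};

// Same layout, but without any user-defined destructor.
struct WithoutDestructor : TrackedAllocation
{
  double foo[2];
};

// Number of bytes between the start of the allocated block and the first
// array element, i.e. the space the compiler reserved for its element count.
template <typename T>
std::ptrdiff_t ArrayCookieOffset()
{
  g_LastRawBlock = nullptr;
  T * x = new T[2];
  assert(g_LastRawBlock != nullptr);
  const std::ptrdiff_t offset =
    reinterpret_cast<char *>(x) - static_cast<char *>(g_LastRawBlock);
  delete[] x;
  return offset;
}
```

With GCC one would expect ArrayCookieOffset<WithoutDestructor>() to be 0 and ArrayCookieOffset<WithDestructor>() to be nonzero. The 4 bytes Brad mentions correspond to 32-bit x86. On a typical 64-bit build the count takes 8 bytes, so the doubles stay 8-byte aligned there.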