[Insight-developers] Empty FixedArray destructor: Performance hit using gcc (times 2)

Karthik Krishnan karthik.krishnan at kitware.com
Thu Jun 5 13:23:43 EDT 2008


An empty destructor that is not virtual is just unnecessary lines of code.

An empty default constructor is also unnecessary lines of code, unless
another constructor (for example a copy constructor) is explicitly
provided. In that case the compiler no longer generates the default
constructor, so it has to be written out.

So I would vote that the destructor be removed. It's less code to maintain :)

However, the empty constructor must be kept, since a copy constructor
is explicitly provided.

Please correct me if I am wrong.

--
karthik


On Thu, Jun 5, 2008 at 4:27 PM, Tom Vercauteren <tom.vercauteren at m4x.org> wrote:
> Hi,
>
> Thanks for your tests, it's great to see such reactivity!
>
> Below is another test that will show the performance hit. You don't
> need to recompile ITK to use it. What we did was to run a simple loop
> on a C array of FixedArray. Then we hack around to get an 8 byte
> aligned C array of FixedArray and run the loop again.
>
> In this case, the performance hit is clearly not as large as the one
> we get in the real world case but is still large enough to be
> conclusive.
>
>   Initial alignment: 4
>   Initial execution time: 920ms
>   New alignment: 0
>   Execution time: 880ms
>
> Let me know what it gives on your setup.
>
> If the destructor is not implemented you would get ( Initial
> alignment: 0 ) and the same timing results.
>
> Tom
>
>
>
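> #include <cstddef>   // std::size_t
> #include <cstring>   // memset
> #include <ctime>     // clock, clock_t, CLOCKS_PER_SEC
> #include <iostream>
> #include <itkFixedArray.h>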
>
> int main()
> {
>   // Define the number of elements in the array
>   const unsigned int nelements = 10000000;
>
>   // Define the number of runs used for timing
>   const unsigned int nrun = 10;
>
>   // Declare a simple timer
>   clock_t t;
>
>   typedef itk::FixedArray<double,2> ArrayType;
>
>   // Declare an array of nelements FixedArray
>   // and add a small margin to play with pointers
>   // but not map outside the allocated memory
>   ArrayType * vec = new ArrayType[nelements+8];
>
>   // Fill it up with zeros
>   memset(vec,0,(nelements+8)*sizeof(ArrayType));
>
>
>
>
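>   // Display the alignment of the array (a pointer does not fit in an
>   // int on 64-bit platforms, so convert to std::size_t instead)
>   std::cout << "Initial alignment: "
>             << (reinterpret_cast<std::size_t>(vec) & 7) << "\n";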
>
>   // Start a simple experiment
>   t = clock();
>   double acc1 = 0.0;
>   for (unsigned int i=0;i<nrun;++i)
>   {
>      for (unsigned int j=0;j<nelements;++j)
>      {
>         acc1+=vec[j][0];
>      }
>   }
>
>   // Get the final timing and display it
>   t=clock() - t;
>
>   std::cout << "Initial execution time: "
>             << (t*1000.0) / CLOCKS_PER_SEC << "ms\n";
>
>
>
>
>
>   // We now emulate an 8 bytes aligned array
>
>   // Cast the pointer to char to play with bytes
>   char * p = reinterpret_cast<char*>( vec );
>
>   // Move the char pointer until it is aligned on 8 bytes
>   while (reinterpret_cast<std::size_t>(p) % 8) ++p;
>
>   // Cast the 8 bytes aligned pointer back to the original type
>   ArrayType * vec2 = reinterpret_cast<ArrayType*>( p );
>
>   // Make sure the new pointer is well aligned by
>   // displaying the alignment
>   std::cout << "New alignment: "
>             << (reinterpret_cast<std::size_t>(vec2) & 7) << "\n";
>
>   // Start the simple experiment on the 8 byte aligned array
>   t = clock();
>   double acc2 = 0.0;
>   for (unsigned int i=0;i<nrun;++i)
>   {
>      for (unsigned int j=0;j<nelements;++j)
>      {
>         acc2+=vec2[j][0];
>      }
>   }
>
>   // Get the final timing and display it
>   t=clock() - t;
>
>   std::cout << "Execution time: "
>             << (t*1000.0) / CLOCKS_PER_SEC << "ms\n";
>
>
>
>
>   // Free up the memory
>   delete [] vec;
>
>   // Make sure we do something with the sums otherwise everything
>   // could be optimized away by the compiler
>   return acc1+acc2;
> }
>
>
>
> On Thu, Jun 5, 2008 at 5:04 PM, Gert Wollny <gert at die.upm.es> wrote:
>> On Thursday, 05.06.2008, at 10:24 -0400, Luis Ibanez wrote:
>>> Hi Gert,
>>>
>>> Thanks for the quick report !
>>>
>>> It makes sense that -g flag will prevent the method
>>> from being optimized away.
>>>
>>> If you have a chance,
>>> could you please test what happens when no -g is
>>> used, and the optimization flag is set to -O3 ?
>> It was not optimized away, and valgrind/kcachegrind tells me the
>> destructor is located in libITKCommon.so.
>>
>> Actually, with -O3 the whole loop was optimized away. This is weird, to
>> say the least, because if the compiler doesn't see the implementation
>> of the constructor and the destructor and uses the explicitly
>> instantiated ones, it cannot know whether something essential is done
>> in either of them, like changing a global variable.
>>
>> I've added some code to force the loop (attached).
>>
>> BTW: I think -g doesn't change the optimizers at all (with g++).
>>
>> Best
>>
>> Gert
>>
>>
>>
>>
>>
> _______________________________________________
> Insight-developers mailing list
> Insight-developers at itk.org
> http://www.itk.org/mailman/listinfo/insight-developers
