[Insight-developers] Empty FixedArray destructor: Performance hit using gcc (times 2)

Bill Lorensen bill.lorensen at gmail.com
Thu Jun 5 13:29:15 EDT 2008


Here is an FAQ on destructors:
http://www.informit.com/guides/content.aspx?g=cplusplus&seqNum=370

They say that an empty destructor is pretty much useless.

Bill

On Thu, Jun 5, 2008 at 1:23 PM, Karthik Krishnan
<karthik.krishnan at kitware.com> wrote:
> An empty destructor that is not virtual is just unnecessary lines of code.
>
> An empty default constructor is also unnecessary lines of code,
> unless another constructor (such as a copy constructor) is explicitly
> provided, in which case the default constructor has to be written out.
>
> So I would vote that the destructor be removed. It's less code to maintain :)
>
> However, the empty constructor must be kept, since a copy constructor
> is explicitly provided.
>
> Please correct me if I am wrong.
>
> --
> karthik
>
>
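To illustrate the point about the copy constructor: once a class declares any constructor of its own, the compiler no longer generates the default constructor implicitly. The sketch below checks this at compile time (C++11 or later). The three structs are made-up stand-ins, not ITK classes.

```cpp
#include <type_traits>

// Hypothetical types for illustration only; none of these are ITK classes.

// Declares a copy constructor and nothing else: the implicit default
// constructor is suppressed, so "WithCopyOnly x;" would not compile.
struct WithCopyOnly
{
  WithCopyOnly(const WithCopyOnly &) {}
};

// Declares a copy constructor, so the (empty) default constructor
// has to be written out by hand to keep "WithCopyAndDefault x;" valid.
struct WithCopyAndDefault
{
  WithCopyAndDefault() {}
  WithCopyAndDefault(const WithCopyAndDefault &) {}
};

// Declares no special members: the compiler generates all of them.
struct NoSpecialMembers
{
  double data[2];
};

static_assert(!std::is_default_constructible<WithCopyOnly>::value,
              "a user-declared copy constructor suppresses the default one");
static_assert(std::is_default_constructible<WithCopyAndDefault>::value,
              "the hand-written default constructor restores it");
static_assert(std::is_default_constructible<NoSpecialMembers>::value,
              "with no user-declared constructors it is implicit");
```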
> On Thu, Jun 5, 2008 at 4:27 PM, Tom Vercauteren <tom.vercauteren at m4x.org> wrote:
>> Hi,
>>
>> Thanks for your tests, it's great to see such reactivity!
>>
>> Below is another test that will show the performance hit. You don't
>> need to recompile ITK to use it. What we did was to run a simple loop
>> on a C array of FixedArray. Then we hack around to get an 8-byte-aligned
>> C array of FixedArray and run the loop again.
>>
>> In this case, the performance hit is clearly not as large as the one
>> we get in the real world case but is still large enough to be
>> conclusive.
>>
>>   Initial alignment: 4
>>   Initial execution time: 920ms
>>   New alignment: 0
>>   Execution time: 880ms
>>
>> Let me know what it gives on your setup.
>>
>> If the destructor is not implemented you would get ( Initial
>> alignment: 0 ) and the same timing results.
>>
>> Tom
>>
>>
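A possible explanation for the alignment change, offered as a sketch rather than a confirmed diagnosis. A user-provided destructor, even an empty one, makes a type non-trivially destructible. Under the Itanium C++ ABI that gcc follows, new[] for such a type stores the element count in front of the array, which shifts the returned pointer by sizeof(size_t). On a 32-bit build that would be 4 bytes, consistent with the "Initial alignment: 4" reported above (the 32-bit build is an assumption; the thread does not state it). The code below only checks the type-level difference; ImplicitDtor and EmptyUserDtor are made-up stand-ins, not ITK types.

```cpp
#include <type_traits>

// Hypothetical stand-ins for a fixed-size array of two doubles.

// No destructor declared: the implicit one is trivial, so delete[]
// has nothing to run per element and new[] needs no element count.
struct ImplicitDtor
{
  double v[2];
};

// Same layout, but with an empty user-provided destructor: the type
// is no longer trivially destructible, so delete[] must know how many
// elements to destroy.
struct EmptyUserDtor
{
  double v[2];
  ~EmptyUserDtor() {}
};

static_assert(std::is_trivially_destructible<ImplicitDtor>::value,
              "implicit destructor is trivial");
static_assert(!std::is_trivially_destructible<EmptyUserDtor>::value,
              "an empty user-provided destructor is still non-trivial");
static_assert(sizeof(ImplicitDtor) == sizeof(EmptyUserDtor),
              "the destructor does not change the object size");
```

Printing the address returned by new[] for each type, modulo 8 or 16, on your own setup is a quick way to see whether the shift appears there.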
>>
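>> #include <cstddef>   // std::size_t
>> #include <cstring>   // memset
>> #include <ctime>     // clock, clock_t, CLOCKS_PER_SEC
>> #include <iostream>
>> #include <itkFixedArray.h>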
>>
>> int main()
>> {
>>   // Define the number of elements in the array
>>   const unsigned int nelements = 10000000;
>>
>>   // Define the number of runs used for timing
>>   const unsigned int nrun = 10;
>>
>>   // Declare a simple timer
>>   clock_t t;
>>
>>   typedef itk::FixedArray<double,2> ArrayType;
>>
>>   // Declare an array of nelements FixedArray
>>   // and add a small margin to play with pointers
>>   // but not map outside the allocated memory
>>   ArrayType * vec = new ArrayType[nelements+8];
>>
>>   // Fill it up with zeros
>>   memset(vec,0,(nelements+8)*sizeof(ArrayType));
>>
>>
>>
>>
>>   // Display the alignment of the array
>>   std::cout << "Initial alignment: " << (reinterpret_cast<std::size_t>(vec) & 7) << "\n";
>>
>>   // Start a simple experiment
>>   t = clock();
>>   double acc1 = 0.0;
>>   for (unsigned int i=0;i<nrun;++i)
>>   {
>>      for (unsigned int j=0;j<nelements;++j)
>>      {
>>         acc1+=vec[j][0];
>>      }
>>   }
>>
>>   // Get the final timing and display it
>>   t=clock() - t;
>>
>>   std::cout << "Initial execution time: "
>>             << (t*1000.0) / CLOCKS_PER_SEC << "ms\n";
>>
>>
>>
>>
>>
>>   // We now emulate an 8-byte-aligned array
>>
>>   // Cast the pointer to char to play with bytes
>>   char * p = reinterpret_cast<char*>( vec );
>>
>>   // Move the char pointer until it is aligned on 8 bytes
>>   while (reinterpret_cast<std::size_t>(p) % 8) ++p;
>>
>>   // Cast the 8-byte-aligned pointer back to the original type
>>   ArrayType * vec2 = reinterpret_cast<ArrayType*>( p );
>>
>>   // Make sure the new pointer is well aligned by
>>   // displaying the alignment
>>   std::cout << "New alignment: " << (reinterpret_cast<std::size_t>(vec2) & 7) << "\n";
>>
>>   // Start the simple experiment on the 8-byte-aligned array
>>   t = clock();
>>   double acc2 = 0.0;
>>   for (unsigned int i=0;i<nrun;++i)
>>   {
>>      for (unsigned int j=0;j<nelements;++j)
>>      {
>>         acc2+=vec2[j][0];
>>      }
>>   }
>>
>>   // Get the final timing and display it
>>   t=clock() - t;
>>
>>   std::cout << "Execution time: "
>>             << (t*1000.0) / CLOCKS_PER_SEC << "ms\n";
>>
>>
>>
>>
>>   // Free up the memory
>>   delete [] vec;
>>
>>   // Make sure we do something with the sums otherwise everything
>>   // could be optimized away by the compiler
>>   return static_cast<int>(acc1 + acc2);
>> }
>>
>>
>>
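The pointer-bumping loop in the test above can also be written as a single rounding step. This is a minimal sketch, assuming the alignment is a power of two; align_up and align_pointer are hypothetical helper names, not part of ITK or the standard library.

```cpp
#include <cstdint>

// Hypothetical helper: round addr up to the next multiple of alignment.
// Assumes alignment is a non-zero power of two (8 in the test above).
constexpr std::uintptr_t align_up(std::uintptr_t addr, std::uintptr_t alignment)
{
  return (addr + alignment - 1) & ~(alignment - 1);
}

// Hypothetical helper: advance a byte pointer to the next aligned address.
// The caller must have allocated enough slack (the test above reserves
// 8 extra elements) so the result stays inside the buffer.
inline char * align_pointer(char * p, std::uintptr_t alignment)
{
  const std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(p);
  return p + (align_up(addr, alignment) - addr);
}
```

With these, the while loop would become `p = align_pointer(p, 8);`. Note that reading ArrayType objects through the shifted pointer is only meaningful as a timing experiment, since no objects were constructed at those addresses.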
>> On Thu, Jun 5, 2008 at 5:04 PM, Gert Wollny <gert at die.upm.es> wrote:
>>> On Thursday, 05.06.2008, at 10:24 -0400, Luis Ibanez wrote:
>>>> Hi Gert,
>>>>
>>>> Thanks for the quick report !
>>>>
>>>> It makes sense that -g flag will prevent the method
>>>> from being optimized away.
>>>>
>>>> If you have a chance,
>>>> could you please test what happens when no -g is
>>>> used, and the optimization flag is set to -O3 ?
>>> It was not optimized away, and valgrind/kcachegrind tells me the
>>> destructor is located in libITKCommon.so.
>>>
>>> Actually, with -O3 the whole loop was optimized away. This is weird, to
>>> say the least, because if the compiler doesn't see the implementation
>>> of the constructor and the destructor and uses the explicitly
>>> instantiated one, it cannot know whether something essential is done
>>> in either of them, like changing a global variable.
>>>
>>> I've added some code to force the loop (attached).
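The attachment is not part of this archive page, so the following is not Gert's code. It is one common way to keep an optimizer from discarding a benchmark loop, sketched under that caveat: the result is written to a volatile variable, which the compiler must treat as an observable side effect. The function name accumulate_observed is made up for this example.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: sum n doubles and force the result to be computed.
inline double accumulate_observed(const double * data, std::size_t n)
{
  assert(data != nullptr || n == 0);

  double acc = 0.0;
  for (std::size_t i = 0; i < n; ++i)
  {
    acc += data[i];
  }

  // Writing to a volatile object is an observable side effect, so the
  // compiler cannot drop the computation of acc even if the caller
  // ignores the return value.
  volatile double sink = acc;
  return sink;
}
```

Keeping the accumulation in a plain local and touching the volatile only once avoids adding a memory store to every iteration, which would otherwise distort the timing being measured.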
>>>
>>> BTW: I think -g doesn't change the optimizers at all (with g++).
>>>
>>> Best
>>>
>>> Gert
>>>
>>>
>>>
>>>
>>>
>> _______________________________________________
>> Insight-developers mailing list
>> Insight-developers at itk.org
>> http://www.itk.org/mailman/listinfo/insight-developers

