[Insight-developers] Empty FixedArray destructor: Performance hit using gcc (times 2)
Karthik Krishnan
karthik.krishnan at kitware.com
Thu Jun 5 13:23:43 EDT 2008
An empty destructor that is not virtual is just unnecessary lines of code.
An empty default constructor is likewise unnecessary, unless another
constructor (such as a copy constructor) is explicitly declared. In that
case the compiler no longer generates a default constructor, so one must
be provided.
So I would vote that the destructor be removed. It's less code to maintain :)
However, the empty constructor must be kept, since a copy constructor
is explicitly provided.
Please correct me if I am wrong.
--
karthik
On Thu, Jun 5, 2008 at 4:27 PM, Tom Vercauteren <tom.vercauteren at m4x.org> wrote:
> Hi,
>
> Thanks for your tests, it's great to see such reactivity!
>
> Below is another test that will show the performance hit. You don't
> need to recompile ITK to use it. What we did was to run a simple loop
> on a C array of FixedArray. Then we hack around to get an 8 byte
> aligned C array of FixedArray and run the loop again.
>
> In this case, the performance hit is clearly not as large as the one
> we get in the real world case but is still large enough to be
> conclusive.
>
> Initial alignment: 4
> Initial execution time: 920ms
> New alignment: 0
> Execution time: 880ms
>
> Let me know what it gives on your setup.
>
> If the destructor is not implemented you would get ( Initial
> alignment: 0 ) and the same timing results.
>
> Tom
>
>
>
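> #include <cstddef>   // std::size_t
> #include <cstring>   // memset
> #include <ctime>     // clock, clock_t, CLOCKS_PER_SEC
> #include <iostream>
> #include <itkFixedArray.h>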
>
> int main()
> {
> // Define the number of elements in the array
> const unsigned int nelements = 10000000;
>
> // Define the number of runs used for timing
> const unsigned int nrun = 10;
>
> // Declare a simple timer
> clock_t t;
>
> typedef itk::FixedArray<double,2> ArrayType;
>
> // Declare an array of nelements FixedArray
> // and add a small margin so that we can shift the pointer
> // without reading outside the allocated memory
> ArrayType * vec = new ArrayType[nelements+8];
>
> // Fill it up with zeros
> memset(vec,0,(nelements+8)*sizeof(ArrayType));
>
>
>
>
> // Display the alignment of the array
> std::cout << "Initial alignment: " << (reinterpret_cast<std::size_t>(vec) & 7) << "\n";
>
> // Start a simple experiment
> t = clock();
> double acc1 = 0.0;
> for (unsigned int i=0;i<nrun;++i)
> {
> for (unsigned int j=0;j<nelements;++j)
> {
> acc1+=vec[j][0];
> }
> }
>
> // Get the final timing and display it
> t=clock() - t;
>
> std::cout << "Initial execution time: "
> << (t*1000.0) / CLOCKS_PER_SEC << "ms\n";
>
>
>
>
>
> // We now emulate an 8-byte aligned array
>
> // Cast the pointer to char to play with bytes
> char * p = reinterpret_cast<char*>( vec );
>
> // Move the char pointer until it is aligned on 8 bytes
> while (reinterpret_cast<std::size_t>(p) % 8) ++p;
>
> // Cast the 8 bytes aligned pointer back to the original type
> ArrayType * vec2 = reinterpret_cast<ArrayType*>( p );
>
> // Make sure the new pointer is well aligned by
> // displaying the alignment
> std::cout << "New alignment: " << (reinterpret_cast<std::size_t>(vec2) & 7) << "\n";
>
> // Start the simple experiment on the 8 byte aligned array
> t = clock();
> double acc2 = 0.0;
> for (unsigned int i=0;i<nrun;++i)
> {
> for (unsigned int j=0;j<nelements;++j)
> {
> acc2+=vec2[j][0];
> }
> }
>
> // Get the final timing and display it
> t=clock() - t;
>
> std::cout << "Execution time: "
> << (t*1000.0) / CLOCKS_PER_SEC << "ms\n";
>
>
>
>
> // Free up the memory
> delete [] vec;
>
> // Make sure we do something with the sums otherwise everything
> // could be optimized away by the compiler
> return static_cast<int>(acc1+acc2);
> }
>
>
>
> On Thu, Jun 5, 2008 at 5:04 PM, Gert Wollny <gert at die.upm.es> wrote:
>> On Thursday, 05.06.2008, at 10:24 -0400, Luis Ibanez wrote:
>>> Hi Gert,
>>>
>>> Thanks for the quick report !
>>>
>>> It makes sense that -g flag will prevent the method
>>> from being optimized away.
>>>
>>> If you have a chance,
>>> could you please test what happens when no -g is
>>> used, and the optimization flag is set to -O3 ?
>> It was not optimized away, and valgrind/kcachegrind tells me the
>> destructor is located in libITKCommon.so.
>>
>> Actually, with -O3 the whole loop was optimized away. This is weird, to
>> say the least, because if the compiler doesn't see the implementation
>> of the constructor and the destructor and uses the explicitly
>> instantiated one, it cannot know whether something essential is done
>> in either of them, like changing a global variable.
>>
>> I've added some code to force the loop (attached).
>>
>> BTW: I think -g doesn't change the optimizer's behavior at all (with g++).
>>
>> Best
>>
>> Gert
>>
>>
>>
>>
>>
> _______________________________________________
> Insight-developers mailing list
> Insight-developers at itk.org
> http://www.itk.org/mailman/listinfo/insight-developers