[ITK-dev] [ITK] About the last optimization on VariableLengthVector

Luc Hermitte luc.hermitte at c-s.fr
Mon Nov 16 08:55:36 EST 2015


Hi,

On 14/11/2015 00:04, Matt McCormick wrote:
> Thanks for all your awesome work on increasing the performance of
> itk::VariableLengthVector's and ITK.

Thank you.

> [...snip...]

>> Third Problem: Local Variables
>> 3.1- Auxiliary variables
>> Some algorithms (e.g. BSpline interpolation) require dynamic data
>> structures like matrices. This implies an allocation at each iteration.
>> This particular problem isn't addressed in this patch. The solution to
>> 3.2 will help provide a solution.
>>
>> 3.2- Factorized computations
>> [...]
>>
>> We need a C++98 solution.
>> The only way I see is to rely on cached variables.
>> This means that the ImageFunctions are no longer [[pure]] functions:
>> they'll mutate some variables that we want to allocate once per
>> ImageFunction and ... per thread.
>> Indeed the ImageFunctions shall be reentrant.
>>
>> I was first hoping to use thread_local variables. Alas, they cannot be
>> non-static attributes. This is a problem: if I'm not mistaken, the same
>> ImageFunction could be used multiple times in the same processing chain.
>>
>> We still have a simple solution, but it requires a second modification
>> to Evaluate() signature: we need to pass the current ThreadId.
>>
>> The new drawback: code migration requires more work. We cannot write
>> Evaluate() in terms of EvaluateNoCopy() as the latter requires the
>> threadId that the former doesn't know. This is the reason why I expect
>> the tests to fail.
>> Instead, we need to convert all client code (like the WarpImageFilter)
>> and have the default implementation of EvaluateNoCopy() call the
>> already written but not yet converted Evaluate().
> 
> This would be messy, and it would not work very well since the
> ImageFunctions are not involved in the orchestration of the threading.
> Instead, it would be better to make the necessary auxiliary variables
> private variables.  Parallel code that uses ImageFunctions should
> create one ImageFunction instance per thread.

I'm afraid that with this choice, code that currently works well in
parallel won't work anymore. End users will need to rewrite all uses of
image functions in order to use them in MT code. And I guess we will
also need to rewrite several test cases.

Unless we duplicate the Evaluate() code: i.e. unless we keep the
current version and add the new, optimized rewrite (for VLV in most
cases, or for every kind of variable in some other cases, e.g. BSpline
interpolation, where caching matrix allocations helps reduce
computation time).
Duplicating image function algorithms doesn't seem to be a good choice
either.
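
For reference, here is a rough, stand-alone sketch (C++98) of the
threadId-based caching I have in mind. Everything in it is
illustrative: the class, the member layout and the method names are
hypothetical, not the actual patch nor existing ITK API. The point is
only that the scratch buffers are allocated once, before the threads
start, and that EvaluateNoCopy() then reuses the buffer of the calling
thread. Matt's alternative, one ImageFunction instance per thread,
avoids the extra threadId parameter but implies the client-code
rewrites described above.

  // Illustrative only: a simplified "image function" that caches one scratch
  // vector per thread, keyed by the threadId passed to EvaluateNoCopy().
  #include <vector>
  #include <cstddef>

  typedef std::vector<double> PixelType; // stand-in for VariableLengthVector

  class DemoInterpolator
  {
  public:
    explicit DemoInterpolator(std::size_t pixelSize) : m_PixelSize(pixelSize) {}

    // Called once before the threads start, e.g. from the calling filter's
    // BeforeThreadedGenerateData().
    void SetNumberOfThreads(std::size_t nbThreads)
    { m_ScratchPerThread.resize(nbThreads, PixelType(m_PixelSize)); }

    // Fills 'output' in place; all allocations happened up front, so calling
    // this in a tight per-pixel loop does not touch the heap.
    void EvaluateNoCopy(double index, PixelType & output, std::size_t threadId)
    {
      PixelType & scratch = m_ScratchPerThread[threadId]; // reused, never reallocated
      for (std::size_t i = 0; i < m_PixelSize; ++i)
        { scratch[i] = index * static_cast<double>(i); }  // dummy computation
      output = scratch; // element-wise copy into the caller's preallocated pixel
    }

  private:
    std::size_t            m_PixelSize;
    std::vector<PixelType> m_ScratchPerThread; // one scratch buffer per thread
  };

  int main()
  {
    DemoInterpolator f(16);
    f.SetNumberOfThreads(4);
    PixelType pixel(16);
    f.EvaluateNoCopy(0.5, pixel, /* threadId = */ 2); // each thread uses its own slot
    return 0;
  }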


>> Fourth Problem: heterogeneous data types (i.e. casting)
>> The input image type, the output image type, and the internal pixel type
>> may all be different. That's why the code contains so many static_casts.
>>
>> Alas a static_cast<> on pixels will induce the creation of new pixel values.
>> This is contrary to the purpose of this patch.
>> Copy constructors and assignment operators already support on-the-fly
>> implicit static_casting.
>> We still need to be able to perform casting only when it is actually required.
>> That's where the new functions CastInto(), MoveInto() and As() come to
>> the rescue.
> 
> So compilers are not intelligent enough to make a static_cast<> a no-op when
> the types are the same?

Unfortunately, it doesn't seem so. Check
http://stackoverflow.com/questions/19106826/can-static-cast-to-same-type-introduce-runtime-overhead
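
To illustrate with a toy type (a simplified stand-in, not the real
itk::VariableLengthVector): a static_cast<> on a class type
materializes a temporary through the copy constructor even when the
source and destination types are identical, so the deep copy and its
allocation still take place, whereas a plain assignment can reuse the
destination's buffer.

  #include <cstddef>
  #include <iostream>

  // Toy deep-copying vector, only to make the copies observable.
  struct Vec
  {
    explicit Vec(std::size_t n) : m_Data(new double[n]), m_Size(n) {}
    Vec(const Vec & o) : m_Data(new double[o.m_Size]), m_Size(o.m_Size)
    {
      std::cout << "copy ctor: fresh allocation of " << m_Size << " doubles\n";
      for (std::size_t i = 0; i < m_Size; ++i) { m_Data[i] = o.m_Data[i]; }
    }
    Vec & operator=(const Vec & o)
    {
      std::cout << "operator=: reuses the existing buffer\n";
      for (std::size_t i = 0; i < m_Size && i < o.m_Size; ++i) { m_Data[i] = o.m_Data[i]; }
      return *this; // resizing logic omitted for brevity
    }
    ~Vec() { delete [] m_Data; }
    double *    m_Data;
    std::size_t m_Size;
  };

  int main()
  {
    Vec src(1000), dst(1000);
    double d = static_cast<double>(2.0); // built-in type: a true no-op
    dst = static_cast<Vec>(src); // temporary copy-constructed (allocation!), then assigned
    dst = src;                   // no temporary: operator= alone
    (void)d;
    return 0;
  }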

Moreover, in the case of converting VLV<double> to VLV<int>, with a
static_cast, we'll force the creation of a new variable, which is what I
am striving to avoid.

Hence the new helper functions, which will use the converting assignment
operators in the case of VLV, or a static_cast in all other cases.
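
To make that concrete, here is a hedged sketch of the kind of dispatch
I have in mind. The signatures are illustrative only (the actual
CastInto()/MoveInto()/As() in the patch may differ), and std::vector
stands in for VariableLengthVector: the scalar path keeps the plain
static_cast, while the vector path does a converting, element-wise
assignment into an already-allocated destination, so no temporary
vector is created.

  #include <vector>
  #include <cstddef>
  #include <cassert>

  // Generic fallback: a plain static_cast, fine for scalars and small PODs.
  template <typename TDest, typename TSrc>
  void CastInto(const TSrc & src, TDest & dest)
  {
    dest = static_cast<TDest>(src);
  }

  // Overload for vector-like pixels: converting, element-wise assignment
  // into a destination that is assumed to be preallocated; no temporary.
  template <typename TDest, typename TSrc>
  void CastInto(const std::vector<TSrc> & src, std::vector<TDest> & dest)
  {
    assert(dest.size() == src.size());
    for (std::size_t i = 0; i < src.size(); ++i)
      { dest[i] = static_cast<TDest>(src[i]); }
  }

  int main()
  {
    double s = 3.7;
    int    d = 0;
    CastInto(s, d);    // scalar path: d == 3, nothing else happens

    std::vector<double> in(10, 1.5);
    std::vector<int>    out(10); // destination allocated once, up front
    CastInto(in, out); // vector path: in-place element conversion, no temporary
    return 0;
  }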

Regards,
-- 
Luc Hermitte

