[ITK-dev] [ITK Community] [Insight-developers] non-deterministic v4 registrations in 4.5.x

Simon Alexander skalexander at gmail.com
Fri Mar 28 16:57:40 EDT 2014


There is a lot going on here, and I'm not certain that I've got all the
moving pieces straight in my mind yet, but I've had a quick look at the
implementation now. I believe the Mattes v4 implementation is similar to
other metrics in its approach.

As I suggested earlier in the thread: I believe accumulations like this:

  for( ThreadIdType threadID = 1; threadID < this->GetNumberOfThreadsUsed(); threadID++ )
    {
    this->m_ThreaderJointPDFSum[0] += this->m_ThreaderJointPDFSum[threadID];
    }


will guarantee that we don't have absolutely consistent results between
different thread counts, because floating-point addition is not associative.

When I perform only transform initialization and a single evaluation of
the metric (i.e. outside of the registration routines), I get results
consistent with this. For example, a center-of-mass initialization between
two MR image volumes gives me (double precision):

   - 1 thread :  -0.396771472451519
   - 2 threads: -0.396771472450998
   - 8 threads: -0.396771472451149

for the metric evaluation (i.e. via GetValue() of the metric).

AFAICS, the magnitude of these deltas is consistent with the accumulation
above.  It means there is no chance of binary equivalence between different
thread counts/partitionings, but you can do this accumulation quite a few
times before the accumulated divergence gets into digits to worry about.
This sort of thing is avoidable, but at some space/speed cost.

However, in the registration for this case it takes only about twenty steps
for the metric estimates to diverge in the third significant digit!
(via registration->GetOptimizer()->GetCurrentMetricValue())

Clearly the optimizer is not following the same path, so I think something
else must be going on.

So at this point I don't think the data partitioning of the metric is the
root cause, but I will have a more careful look later.

Any holes in this analysis you can see so far?

When I have time to get back into this, I plan to have a look at the
optimizer next, unless you have better suggestions of where to look next.

cheers,
Simon



On Wed, Mar 19, 2014 at 12:56 PM, Simon Alexander <skalexander at gmail.com> wrote:

> Brian, my apologies for the typo.
>
> I assume you all are at least as busy as I am; just didn't want to leave
> the impression that I would definitely be able to pursue this, but I will
> try.
>
>
> On Wed, Mar 19, 2014 at 12:45 PM, brian avants <stnava at gmail.com> wrote:
>
>> it's brian - and, yes, we all have "copious free time" of course.
>>
>>
>> brian
>>
>>
>>
>>
>> On Wed, Mar 19, 2014 at 12:43 PM, Simon Alexander <skalexander at gmail.com> wrote:
>>
>>> Thanks for the summary Brain.
>>>
>>> A lot of partitioning issues fundamentally  come down to the lack of
>>> associativity & distributivity  of fp operations.  Not sure I can do
>>> anything practical to improve it  but I will have a look if I can find a
>>> bit of my "copious free time" .
>>>
>>>
>>> On Wed, Mar 19, 2014 at 12:29 PM, brian avants <stnava at gmail.com> wrote:
>>>
>>>> yes - i understand.
>>>>
>>>> * matt mccormick implemented compensated summation to address - it
>>>> helps but is not a full fix
>>>>
>>>> * truncating floating point precision greatly reduces the effect you
>>>> are talking about but is unsatisfactory to most people ... not sure if the
>>>> functionality for that truncation was taken out of the v4 metrics but it
>>>> was in there at one point.
>>>>
>>>> * there may be a small and undiscovered bug that contributes to this in
>>>> mattes specifically but i don't think that's the issue.  we saw this effect
>>>> even in mean squares.  if there is a bug it may be beyond just mattes.   we
>>>> cannot disprove that there is a bug.  if anyone knows of a way to do that,
>>>> let me know.
>>>>
>>>> * any help is appreciated
>>>>
>>>>
>>>> brian
>>>>
>>>>
>>>>
>>>>
>>>> On Wed, Mar 19, 2014 at 12:24 PM, Simon Alexander <
>>>> skalexander at gmail.com> wrote:
>>>>
>>>>> Brain,
>>>>>
>>>>> I could have sworn I had initially added a follow up email clarifying
>>>>> this but since I can't find it in the current quoted exchange, let me
>>>>> reiterate:
>>>>>
>>>>> This is not a case of different results on different systems.
>>>>> This is a case of different results on the same system if you use a
>>>>> different number of threads.
>>>>>
>>>>> So while that possibly could be some odd intrinsics issue, for
>>>>> example, the far more likely thing is that data partitioning is not being
>>>>> handled in a way that ensures consistency.
>>>>>
>>>>>  Originally I was also seeing intra-system differences due to internal
>>>>> precision, but that was a separate issue and has been solved.
>>>>>
>>>>> Hope that is more clear!
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Mar 19, 2014 at 12:13 PM, Simon Alexander <
>>>>> skalexander at gmail.com> wrote:
>>>>>
>>>>>> Brian,
>>>>>>
>>>>>> Do you mean the generality of my AVX  internal precision problem?
>>>>>>
>>>>>> I agree that is a very common issue. The surprising thing there was
>>>>>> that we were already constraining the code generation in a way that worked
>>>>>> across the different processor generations and types we used, up until we hit
>>>>>> the first Haswell cpus with AVX2 support (even though no AVX2 instructions
>>>>>> were generated).  Perhaps it shouldn't have surprised me, but it took me a
>>>>>> few tests to work that out because the problem was confounded with the
>>>>>> problem I discuss in this thread (which is unrelated).  Once I separated
>>>>>> them it was easy to spot.
>>>>>>
>>>>>> So that is a solved issue for now, but I am still interested in the
>>>>>> partitioning issue in the image metric, as I only have a workaround for
>>>>>> now.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Mar 19, 2014 at 11:24 AM, brian avants <stnava at gmail.com> wrote:
>>>>>>
>>>>>>>
>>>>>>> http://software.intel.com/en-us/articles/consistency-of-floating-point-results-using-the-intel-compiler
>>>>>>>
>>>>>>> just as an example of the generality of this problem
>>>>>>>
>>>>>>>
>>>>>>> brian
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Mar 19, 2014 at 11:22 AM, Simon Alexander <
>>>>>>> skalexander at gmail.com> wrote:
>>>>>>>
>>>>>>>> Brian, Luis,
>>>>>>>>
>>>>>>>> Thanks.  I have been using Mattes as you suspect.
>>>>>>>>
>>>>>>>> I don't quite understand how precision is specifically the issue
>>>>>>>> with # of cores.  There are all kinds of issues with precision and order of
>>>>>>>> operations in numerical analysis, but often data partitioning (i.e. for
>>>>>>>> concurrency) schemes can be set up so that the actual sums are done the
>>>>>>>> same way regardless of number of workers, which keeps your final results
>>>>>>>> identical.  Is there some reason this can't be done for the Mattes metric?
>>>>>>>>   I really should look at the implementation to answer that, of course.
>>>>>>>>
>>>>>>>> Do you have a pointer to earlier discussions?  If I can find the
>>>>>>>> time I'd like to dig into this a bit, but I'm not sure when I'll have the
>>>>>>>> bandwidth.  I've "solved" this currently by constraining the core count.
>>>>>>>>
>>>>>>>> Perhaps interestingly, my earlier experiments were confounded a bit
>>>>>>>> by a precision issue, but that had to do with intrinsics generation on my
>>>>>>>> compiler behaving differently on systems with AVX2 (even though only AVX
>>>>>>>> intrinsics were being generated).  So that made things confusing at first
>>>>>>>> until I separated the issues.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Mar 19, 2014 at 9:49 AM, brian avants <stnava at gmail.com> wrote:
>>>>>>>>
>>>>>>>>> yes - we had several discussions about this during v4 development.
>>>>>>>>>
>>>>>>>>> experiments showed that differences are due to precision.
>>>>>>>>>
>>>>>>>>> one solution was to truncate precision to the point that is
>>>>>>>>> reliable.
>>>>>>>>>
>>>>>>>>> but there are problems with that too.   last i checked, this was
>>>>>>>>> an open problem, in general, in computer science.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> brian
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, Mar 19, 2014 at 9:16 AM, Luis Ibanez <
>>>>>>>>> luis.ibanez at kitware.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Simon,
>>>>>>>>>>
>>>>>>>>>> We are aware of some multi-threading related issues in
>>>>>>>>>> the registration process that result in metric values changing
>>>>>>>>>> depending on the number of cores used.
>>>>>>>>>>
>>>>>>>>>> Are you using the MattesMutualInformationMetric ?
>>>>>>>>>>
>>>>>>>>>> At some point it was suspected that the problem was the
>>>>>>>>>> result of accumulative rounding, in the contributions that
>>>>>>>>>> each pixel makes to the metric value.... this may or may
>>>>>>>>>> not be related to what you are observing.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>    Thanks
>>>>>>>>>>
>>>>>>>>>>        Luis
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Thu, Feb 20, 2014 at 3:27 PM, Simon Alexander <
>>>>>>>>>> skalexander at gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> I've been finding some regressions in registration results when
>>>>>>>>>>> using systems with different numbers of cores (so the thread count is
>>>>>>>>>>> different).  This is resolved by fixing the global max.
>>>>>>>>>>>
>>>>>>>>>>> It's difficult for me to run the identical code against
>>>>>>>>>>> 4.4.2, but similar experiments were run in that timeframe without these
>>>>>>>>>>> regressions.
>>>>>>>>>>>
>>>>>>>>>>> I recall that there were changes affecting multithreading in the
>>>>>>>>>>> v4 registration in the 4.5.0 release, so I thought this might be a side effect.
>>>>>>>>>>>
>>>>>>>>>>> So a few questions:
>>>>>>>>>>>
>>>>>>>>>>> Is this behaviour expected?
>>>>>>>>>>>
>>>>>>>>>>> Am I correct that this was not the behaviour in 4.4.x ?
>>>>>>>>>>>
>>>>>>>>>>> Does anyone who has a feel for  the recent changes 4.4.2 ->
>>>>>>>>>>> 4.5.[0,1]  have a good idea where to start looking?  I haven't yet dug into
>>>>>>>>>>> the multithreading architecture, but this "smells" like a data partitioning
>>>>>>>>>>> issue to me.
>>>>>>>>>>>
>>>>>>>>>>> Any other thoughts?
>>>>>>>>>>>
>>>>>>>>>>> cheers,
>>>>>>>>>>> Simon
>>>>>>>>>>>
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> Powered by www.kitware.com
>>>>>>>>>>>
>>>>>>>>>>> Visit other Kitware open-source projects at
>>>>>>>>>>> http://www.kitware.com/opensource/opensource.html
>>>>>>>>>>>
>>>>>>>>>>> Kitware offers ITK Training Courses, for more information visit:
>>>>>>>>>>> http://kitware.com/products/protraining.php
>>>>>>>>>>>
>>>>>>>>>>> Please keep messages on-topic and check the ITK FAQ at:
>>>>>>>>>>> http://www.itk.org/Wiki/ITK_FAQ
>>>>>>>>>>>
>>>>>>>>>>> Follow this link to subscribe/unsubscribe:
>>>>>>>>>>> http://www.itk.org/mailman/listinfo/insight-developers
>>>>>>>>>>>
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> Community mailing list
>>>>>>>>>>> Community at itk.org
>>>>>>>>>>> http://public.kitware.com/cgi-bin/mailman/listinfo/community
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>