[ITK-dev] [ITK Community] [Insight-developers] non-deterministic v4 registrations in 4.5.x

brian avants stnava at gmail.com
Wed Mar 19 12:29:47 EDT 2014


yes - i understand.

* matt mccormick implemented compensated summation to address this; it helps
but is not a full fix

* truncating floating-point precision greatly reduces the effect you are
talking about, but is unsatisfactory to most people. i'm not sure whether the
truncation functionality was taken out of the v4 metrics, but it was in
there at one point.

* there may be a small, undiscovered bug that contributes to this in
mattes specifically, but i don't think that's the issue: we saw this effect
even in mean squares, so if there is a bug it may go beyond mattes. we
cannot disprove that there is a bug. if anyone knows of a way to do that,
let me know.

* any help is appreciated
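For readers unfamiliar with compensated summation, here is a minimal Python sketch of the classic Kahan algorithm. It is a generic textbook version, not the implementation that went into ITK, and the function names are hypothetical. It shows why the technique helps: the low-order bits that a plain running sum drops are carried in a separate correction term and fed back in. It also hints at why it is not a full fix: the result is still a rounded floating-point number, so a different grouping of the terms (for example, a different number of threads) can still change the last bits.

```python
import math


def naive_sum(values):
    """Plain left-to-right accumulation in double precision."""
    total = 0.0
    for v in values:
        total += v
    return total


def kahan_sum(values):
    """Kahan compensated summation (generic sketch, not ITK's code)."""
    total = 0.0
    compensation = 0.0  # running estimate of the low-order bits lost so far
    for v in values:
        y = v - compensation            # re-inject the previously lost bits
        t = total + y                   # low-order bits of y may be lost here
        compensation = (t - total) - y  # what was actually added, minus y
        total = t
    return total


# Ten tiny terms after a large one: each 1e-16 is below half an ulp of 1.0,
# so the plain loop discards every one of them.
values = [1.0] + [1e-16] * 10
reference = math.fsum(values)  # correctly rounded sum, used as ground truth

print("naive:", naive_sum(values))
print("kahan:", kahan_sum(values))
print("fsum: ", reference)
```

The plain loop stays at exactly 1.0, while the compensated loop recovers the contribution of the small terms and lands much closer to the correctly rounded reference.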


brian




On Wed, Mar 19, 2014 at 12:24 PM, Simon Alexander <skalexander at gmail.com> wrote:

> Brian,
>
> I could have sworn I had initially added a follow-up email clarifying this,
> but since I can't find it in the current quoted exchange, let me reiterate:
>
> This is not a case of different results on different systems. This
> is a case of different results on the same system if you use a different
> number of threads.
>
> So while that possibly could be some odd intrinsics issue, for example,
> the far more likely thing is that data partitioning is not being handled in
> a way that ensures consistency.
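The distinction drawn here can be illustrated outside ITK with a minimal Python sketch. It assumes a simplified model in which each thread sums one contiguous chunk and the per-thread partial sums are then added; this is not how ITK's metric threading is actually organized, and all function names are hypothetical. Because floating-point addition is not associative, the chunk boundaries, and hence the result, move with the thread count. Fixing the block size independently of the thread count, and combining block sums in a fixed order, makes the result reproducible for any number of workers.

```python
from concurrent.futures import ThreadPoolExecutor


def sequential_sum(values):
    """Left-to-right double-precision sum. An explicit loop is used because
    recent Python versions make the built-in sum() compensated for floats."""
    total = 0.0
    for v in values:
        total += v
    return total


def per_thread_sum(values, num_threads):
    """One contiguous chunk per thread: chunk boundaries depend on num_threads."""
    n = len(values)
    bounds = [i * n // num_threads for i in range(num_threads + 1)]
    chunks = [values[bounds[i]:bounds[i + 1]] for i in range(num_threads)]
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        partials = list(pool.map(sequential_sum, chunks))
    return sequential_sum(partials)


def fixed_block_sum(values, num_threads, block_size=2):
    """Blocks of a fixed size, independent of num_threads. Threads only decide
    who computes each block; block sums are combined in block order."""
    blocks = [values[i:i + block_size] for i in range(0, len(values), block_size)]
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        block_sums = list(pool.map(sequential_sum, blocks))  # map keeps order
    return sequential_sum(block_sums)


values = [0.1, 0.2, 0.3]
for threads in (1, 2, 3):
    print(threads, per_thread_sum(values, threads), fixed_block_sum(values, threads))
```

With these three values the per-thread version changes in the last bit between one and two threads, while the fixed-block version does not. The trade-off is that the block size, rather than the thread count, now determines the rounding, so changing the block size can still change the result.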
>
> Originally I was also seeing inter-system differences due to internal
> precision, but that was a separate issue and has been solved.
>
> Hope that is more clear!
>
>
>
> On Wed, Mar 19, 2014 at 12:13 PM, Simon Alexander <skalexander at gmail.com> wrote:
>
>> Brian,
>>
>> Do you mean the generality of my AVX internal precision problem?
>>
>> I agree that is a very common issue. The surprising thing there was that
>> we were already constraining the code generation in a way that worked
>> across the different processor generations and types we used, up until we
>> hit the first Haswell CPUs with AVX2 support (even though no AVX2
>> instructions were generated). Perhaps it shouldn't have surprised me, but
>> it took me a few tests to work that out because the problem was confounded
>> with the problem I discuss in this thread (which is unrelated). Once I
>> separated them it was easy to spot.
>>
>> So that is a solved issue for now, but I am still interested in the
>> partitioning issue in the image metric, as I only have a workaround for
>> now.
>>
>>
>>
>> On Wed, Mar 19, 2014 at 11:24 AM, brian avants <stnava at gmail.com> wrote:
>>
>>>
>>> http://software.intel.com/en-us/articles/consistency-of-floating-point-results-using-the-intel-compiler
>>>
>>> just as an example of the generality of this problem
>>>
>>>
>>> brian
>>>
>>>
>>>
>>>
>>> On Wed, Mar 19, 2014 at 11:22 AM, Simon Alexander <skalexander at gmail.com> wrote:
>>>
>>>> Brian, Luis,
>>>>
>>>> Thanks.  I have been using Mattes as you suspect.
>>>>
>>>> I don't quite understand how precision is specifically the issue with the
>>>> number of cores. There are all kinds of issues with precision and order of
>>>> operations in numerical analysis, but data partitioning schemes (i.e. for
>>>> concurrency) can often be set up so that the actual sums are done the
>>>> same way regardless of the number of workers, which keeps your final
>>>> results identical. Is there some reason this can't be done for the Mattes
>>>> metric? I really should look at the implementation to answer that, of course.
>>>>
>>>> Do you have a pointer to earlier discussions?  If I can find the time
>>>> I'd like to dig into this a bit, but I'm not sure when I'll have the
>>>> bandwidth.  I've "solved" this currently by constraining the core count.
>>>>
>>>> Perhaps interestingly, my earlier experiments were confounded a bit by
>>>> a precision issue, but that had to do with intrinsics generation on my
>>>> compiler behaving differently on systems with AVX2 (even though only AVX
>>>> intrinsics were being generated).  So that made things confusing at first
>>>> until I separated the issues.
>>>>
>>>>
>>>> On Wed, Mar 19, 2014 at 9:49 AM, brian avants <stnava at gmail.com> wrote:
>>>>
>>>>> yes - we had several discussions about this during v4 development.
>>>>>
>>>>> experiments showed that differences are due to precision.
>>>>>
>>>>> one solution was to truncate precision to the point that is reliable.
>>>>>
>>>>> but there are problems with that too. last i checked, this was an
>>>>> open problem, in general, in computer science.
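A minimal sketch of the precision-truncation idea, assuming each per-pixel contribution is quantized to a fixed binary grid before it is accumulated. This is a generic fixed-point technique, not the truncation code that was once in the v4 metrics; `frac_bits` and the function names are hypothetical, and the contiguous chunks stand in for threads. Once every contribution is an integer multiple of 2**-frac_bits, accumulation is exact integer arithmetic, so the partitioning cannot affect the result. The cost is that everything below the grid spacing is thrown away, which is why this is unsatisfying as a general fix.

```python
import math
import random


def quantize(value, frac_bits):
    """Round value to the nearest multiple of 2**-frac_bits, returned as an
    integer count of grid steps (multiplying by a power of two is exact)."""
    return round(value * 2 ** frac_bits)


def partitioned_fixed_point_sum(values, num_chunks, frac_bits=30):
    """Split values into num_chunks contiguous chunks, accumulate each chunk
    in integer grid units, and add the chunk totals. Integer addition is exact
    and associative, so num_chunks cannot change the result."""
    n = len(values)
    bounds = [i * n // num_chunks for i in range(num_chunks + 1)]
    total = 0
    for i in range(num_chunks):
        chunk_total = 0
        for v in values[bounds[i]:bounds[i + 1]]:
            chunk_total += quantize(v, frac_bits)
        total += chunk_total
    return total / 2 ** frac_bits  # single rounding back to a float


rng = random.Random(0)
values = [rng.uniform(-1.0, 1.0) for _ in range(1000)]
results = {partitioned_fixed_point_sum(values, k) for k in range(1, 9)}
print(results, math.fsum(values))
```

Every chunk count yields the same value, and that value differs from the exact sum by at most half a grid step per contribution.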
>>>>>
>>>>>
>>>>> brian
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Mar 19, 2014 at 9:16 AM, Luis Ibanez <luis.ibanez at kitware.com> wrote:
>>>>>
>>>>>> Hi Simon,
>>>>>>
>>>>>> We are aware of some multi-threading related issues in
>>>>>> the registration process that result in metric values changing
>>>>>> depending on the number of cores used.
>>>>>>
>>>>>> Are you using the MattesMutualInformationMetric ?
>>>>>>
>>>>>> At some point it was suspected that the problem was the
>>>>>> result of accumulated rounding in the contributions that
>>>>>> each pixel makes to the metric value; this may or may
>>>>>> not be related to what you are observing.
>>>>>>
>>>>>>
>>>>>>    Thanks
>>>>>>
>>>>>>        Luis
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, Feb 20, 2014 at 3:27 PM, Simon Alexander <skalexander at gmail.com> wrote:
>>>>>>
>>>>>>> I've been finding some regressions in registration results when
>>>>>>> using systems with different numbers of cores (so the thread count is
>>>>>>> different). This is resolved by fixing the global maximum number of threads.
>>>>>>>
>>>>>>> It's difficult for me to run the identical code against 4.4.2,
>>>>>>> but similar experiments were run in that timeframe without these
>>>>>>> regressions.
>>>>>>>
>>>>>>> I recall that there were changes affecting multithreading in the v4
>>>>>>> registration in the 4.5.0 release, so I thought this might be a side effect.
>>>>>>>
>>>>>>> So a few questions:
>>>>>>>
>>>>>>> Is this behaviour expected?
>>>>>>>
>>>>>>> Am I correct that this was not the behaviour in 4.4.x?
>>>>>>>
>>>>>>> Does anyone who has a feel for the recent changes 4.4.2 ->
>>>>>>> 4.5.[0,1] have a good idea where to start looking? I haven't yet dug into
>>>>>>> the multithreading architecture, but this "smells" like a data partitioning
>>>>>>> issue to me.
>>>>>>>
>>>>>>> Any other thoughts?
>>>>>>>
>>>>>>> cheers,
>>>>>>> Simon
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Powered by www.kitware.com
>>>>>>>
>>>>>>> Visit other Kitware open-source projects at
>>>>>>> http://www.kitware.com/opensource/opensource.html
>>>>>>>
>>>>>>> Kitware offers ITK Training Courses, for more information visit:
>>>>>>> http://kitware.com/products/protraining.php
>>>>>>>
>>>>>>> Please keep messages on-topic and check the ITK FAQ at:
>>>>>>> http://www.itk.org/Wiki/ITK_FAQ
>>>>>>>
>>>>>>> Follow this link to subscribe/unsubscribe:
>>>>>>> http://www.itk.org/mailman/listinfo/insight-developers
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Community mailing list
>>>>>>> Community at itk.org
>>>>>>> http://public.kitware.com/cgi-bin/mailman/listinfo/community
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>