[Insight-users] Debugging bspline registration

Mon Sep 20 16:54:12 EDT 2010

On Thu, Sep 16, 2010 at 13:41, Jim Miller <millerjv at ge.com> wrote:
> Andriy,
> Just to clarify:
> a. Inconsistent behavior relative to fixed and moving selection in Release
> mode

Again, the setup is as follows. Image1 is my fixed image, and Image2 is
the same image with a slightly different origin (as I described in my
initial email). The moving image stays the same. The behavior is
inconsistent depending on whether I select Image1 or Image2 as fixed.

Yes, I observe inconsistent behavior in Release mode with multiple
threads (convergence for one choice of fixed image, divergence for the
other).

> b. Consistent behavior relative to fixed and moving selection in Debug mode

In Debug mode, registration converges in both cases with multiple
threads, although the number of iterations differs.

> c. Consistent behavior relative to fixed and moving selection with 1 thread
> (in Release? in Debug?)

Yes, consistent convergence behavior (same number of iterations on
each registration level) with 1 thread in both Release and Debug.

> If the multi-threaded execution time is equivalent to the single threaded,
> then I would guess that a resampling stage is dominating the performance
> rather than the metric evaluation/optimization.

I had an off-list discussion with Stephen Aylward about this, and he
suggested a similar idea. I use 100K samples, which I would think
should be sufficient to justify multithreading. What I discovered is
that the optimization convergence is very different depending on whether
I use one thread or multiple threads, which contributes a lot to the
difference in performance. Here's a sample of the results (compiled in
Debug mode):

16 threads (Transformation -- Num. Iterations -- Execution time):
Rigid -- 65 -- 20.6 sec
ScaleVersor -- 10 -- 8.6 sec
ScaleSkewVersor -- 13 -- 9.2 sec
Affine -- 15 -- 9.7 sec
BSpline -- 15 -- 38.2 sec

1 thread (Transformation -- Num. Iterations -- Execution time):
Rigid -- 93 -- 102 sec
ScaleVersor -- 15 -- 22.8 sec
ScaleSkewVersor -- 10 -- 19.3 sec
Affine -- 24 -- 35.2 sec
BSpline -- 6 -- 114.3 sec
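
Because the iteration counts differ between the two runs, total time
alone does not show how much the threads help. A rough way to separate
the two effects is to divide each stage's time by its iteration count.
The small Python sketch below does that with the numbers above. The
function names are mine, and the per-iteration figures are only an
approximation, since each stage's time also includes setup work that
does not scale with the number of iterations.

```python
# (iterations, seconds) per stage, copied from the Debug-mode results above.
threads_16 = {
    "Rigid": (65, 20.6),
    "ScaleVersor": (10, 8.6),
    "ScaleSkewVersor": (13, 9.2),
    "Affine": (15, 9.7),
    "BSpline": (15, 38.2),
}
threads_1 = {
    "Rigid": (93, 102.0),
    "ScaleVersor": (15, 22.8),
    "ScaleSkewVersor": (10, 19.3),
    "Affine": (24, 35.2),
    "BSpline": (6, 114.3),
}


def seconds_per_iteration(results):
    """Average wall time per optimizer iteration for each stage."""
    return {stage: secs / iters for stage, (iters, secs) in results.items()}


def per_iteration_speedup(single, multi):
    """Ratio of single-thread to multi-thread time per iteration."""
    s = seconds_per_iteration(single)
    m = seconds_per_iteration(multi)
    return {stage: s[stage] / m[stage] for stage in s}


speedup = per_iteration_speedup(threads_1, threads_16)
for stage, ratio in speedup.items():
    print(f"{stage}: {ratio:.2f}x less time per iteration with 16 threads")
```

By this measure every stage spends less time per iteration with 16
threads, so part of the difference in total time comes from the
threads and part from the different number of iterations.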

> The multithreaded b-spline has a number of available optimizations, caching
> different levels of information.  How do you have the method configured? I
> am assuming that you are using the optimized registration methods.

I am using optimized methods. ITK was configured using standard Slicer
parameters (optimized methods + review).

> Certainly sounds like a threading issue with either the various cached
> information being corrupted or used incorrectly, the consolidation of what
> each thread computed being combined incorrectly, or a thread dying.

As Stephen explained to me, in the multithreaded version of the metric
calculation, they were not able to completely eliminate the
discrepancy between the metric value computed with one thread vs
multiple threads. I quote from his email:

==>
For multi-threaded methods, you will get slightly different results
than on single threaded. We did everything we could to minimize
this, but in the end, accumulating 1/N results (given N threads) and
then summing those N values leads to small rounding differences
compared to simply summing N values in a single register. We spent
quite some time trying to minimize this effect, but it still persists.

Given small differences in the very small bits shouldn't have much
effect, but it could be enough when accumulated over tens (hundreds?)
of thousands of samples and repeated over many iterations such that the
two have slightly different results. Those slight differences in one
iteration can cause rapid divergence of answers, particularly if your
scale values are really large. So, one image converges and the other
goes shooting off.
<==

This is exactly what seems to happen in my case.
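
To make the rounding effect concrete, here is a minimal Python sketch
(not ITK code) that sums the same values once in a single accumulator
and once as per-chunk partial sums that are then combined, which is
roughly what Stephen describes. The helper names, the chunking scheme,
and the random sample values are my own assumptions; the real metric
splits and accumulates its samples differently.

```python
import math
import random


def sum_single(values):
    """Accumulate all values in one running total, in order."""
    total = 0.0
    for v in values:
        total += v
    return total


def sum_chunked(values, n_threads):
    """Sum contiguous chunks separately, then add the partial sums.

    This mimics per-thread accumulation but runs sequentially.
    """
    chunk = (len(values) + n_threads - 1) // n_threads
    partials = [sum_single(values[i:i + chunk])
                for i in range(0, len(values), chunk)]
    return sum_single(partials)


# Tiny deterministic case: the two orders of addition disagree.
tiny = [1e16, 1.0, -1e16, 1.0]
print(sum_single(tiny), sum_chunked(tiny, 2))

# Larger case with 100K made-up samples of mixed magnitude.
random.seed(0)
samples = [random.uniform(-1.0, 1.0) * 10.0 ** random.randint(-3, 3)
           for _ in range(100_000)]
single = sum_single(samples)
chunked = sum_chunked(samples, 16)
print("single - chunked:", single - chunked)
print("single - fsum   :", single - math.fsum(samples))
```

Floating-point addition is not associative, so regrouping the same
terms can change the last bits of the sum. The tiny case is an extreme
example; with realistic sample values the difference is usually far
smaller, but as the quote says, an optimizer can amplify it over many
iterations.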

Now, my question is, should I use multi-threaded registration for
critical applications or not?

Is the different convergence behavior due to the unavoidable rounding
differences, or is this a more serious bug?

> You can probably check whether a thread died by adding some code where the
> multithreader waits for the threads to return.

There already seems to be a check to make sure each thread exited
correctly (see itk::MultiThreader::SingleMethodProxy()). Is this not
sufficient?
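
For reference, the kind of check I understand Jim to mean looks
something like the sketch below. It is written in Python with the
standard concurrent.futures module, not with itk::MultiThreader, and
partial_metric and run_and_verify are hypothetical names used only for
illustration. The idea is to wait for every worker and then confirm
that each one actually returned a result before combining anything.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_metric(chunk):
    """Hypothetical per-thread work: sum of squared differences."""
    return sum((f - m) ** 2 for f, m in chunk)


def run_and_verify(chunks, worker=partial_metric):
    """Run one worker per chunk and fail loudly if any worker died."""
    with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
        futures = [pool.submit(worker, c) for c in chunks]
    # Leaving the with-block waits for all submitted work to finish.
    failed = [i for i, f in enumerate(futures) if f.exception() is not None]
    if failed:
        raise RuntimeError(f"worker(s) {failed} did not finish cleanly")
    return [f.result() for f in futures]


chunks = [[(1.0, 0.0)], [(2.0, 0.0), (3.0, 1.0)]]
print(run_and_verify(chunks))
```

A check like this only catches a worker that failed outright. It would
not catch a worker that finished normally but read or wrote corrupted
cached data.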

AF

> Jim
>
> On Sep 13, 2010, at 7:21 PM, Andriy Fedorov wrote:
>
> When I compile the same code in Debug mode, bspline optimizer reaches
> convergence without error independently of the image choice. When I
> run the same example with 1 thread, I also have convergence and no
> error.
>
> Interestingly, the number of iterations with 1 thread is different for
> each of the transformation levels. BSpline converges in 6 iterations.
> Also, execution time is very similar for 1 thread as it is for 16
> threads (just ~1 minute!), with the registration results very similar
> visually.
>
> Jim Miller
> Senior Scientist
> GE Research
> Interventional and Therapy
> GE imagination at work
>