[Insight-users] Small problem using LBFGSB
Rupert Brooks
rupe.brooks at gmail.com
Wed Apr 30 20:09:32 EDT 2008
Hi Tom,
> > The line search makes a cubic polynomial fit to the function, and
> > polynomial fits can be sensitive to erroneous data. I got around the
> > problem by making my gradient more well behaved.
>
> I understand that, but how did you manage to make your gradient better
> behaved? Similar to many ITK applications, I use an image similarity
> metric with linear interpolation for the image interpolation and
> nearest-neighbor interpolation for the image gradient interpolation. I
> tried using both a smoothed gradient image and a central difference
> gradient image to compute the gradient of my cost function but I still
> have the above problem in some cases. Maybe I should try using linear
> interpolation also for the image gradient interpolation...
My case is kind of specific, so I didn't go into details. But since
you ask :-) I was computing a gradient of MI inverse-compositionally,
using the Mattes (or, originally, Thevenaz-Unser) formulation. That
formulation is asymmetric, so the gradient was not quite right.
The solution I found might be more general - at least if you're using
a metric with a probability distribution in the middle.
One of my colleagues here has a paper on pre-seeding the histogram
with a prior distribution. By pre-seeding with a small, uniform
distribution (so just pre-filling the bins) I avoided having any zero
bins in my PDF, and this seemed to clear up the problem.
The paper in question is:
M. Toews, D.L. Collins and T. Arbel, "Maximum a posteriori local
histogram estimation for image registration", Proceedings of the 8th
International Conference on Medical Image Computing and Computer
Assisted Intervention (MICCAI), Lecture Notes in Computer Science,
Vol. 3750, p. 163, Palm Springs, CA, Oct. 2005.
Solving this particular problem wasn't his original intent, though.
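To make the idea concrete, here is a rough numpy sketch of what I mean
by pre-filling the bins (just an illustration, not the actual ITK
metric code, and the prior weight eps is an arbitrary choice):

  # Toy sketch of pre-seeding a joint histogram with a small uniform
  # prior before computing MI, so no bin in the PDF is exactly zero.
  import numpy as np

  def mutual_information(fixed, moving, bins=32, eps=1e-3):
      # Joint histogram of intensity pairs
      hist, _, _ = np.histogram2d(fixed.ravel(), moving.ravel(), bins=bins)
      # Pre-seed every bin with a small uniform count ("pre-filling the bins")
      hist = hist + eps
      # Normalize to a joint PDF and form the marginals
      pxy = hist / hist.sum()
      px = pxy.sum(axis=1, keepdims=True)
      py = pxy.sum(axis=0, keepdims=True)
      # MI = sum p(x,y) * log( p(x,y) / (p(x) p(y)) ); no zero bins, so
      # every log term stays finite
      return np.sum(pxy * np.log(pxy / (px * py)))

  # Example: two noisy versions of the same random "image"
  rng = np.random.default_rng(0)
  a = rng.random((64, 64))
  b = a + 0.05 * rng.standard_normal((64, 64))
  print(mutual_information(a, b))

The only point is that adding the small constant before normalizing
keeps every bin of the joint PDF strictly positive.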
As for using a cached gradient image with nearest-neighbor
interpolation to compute the gradient, I've done a few experiments on
how accurate the gradient is with that approach. I find it tends to be
biased a little low - using the DerivativeCalculator as in the
MattesMutualInformation metric is a bit better, but there's still a
small low bias. I think this is because all these methods inherently
pre-filter with a small Gaussian, which smooths the high frequencies a
bit. However, the bias is tiny and doesn't really seem to cause any
problems.
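For what it's worth, the kind of experiment I mean is roughly this
(a 1-D toy in numpy, with a test function I made up - nothing
ITK-specific):

  # Compare the gradient of a known smooth function, sampled at random
  # off-grid points, computed (a) analytically and (b) from a cached
  # central-difference gradient image looked up with nearest-neighbor
  # interpolation.
  import numpy as np

  n = 256
  x = np.arange(n, dtype=float)
  img = np.sin(2 * np.pi * x / 32.0)          # 1-D "image"
  grad_img = np.gradient(img)                  # cached central-difference gradient

  rng = np.random.default_rng(1)
  pts = rng.uniform(1, n - 2, size=10000)      # off-grid sample positions

  analytic = (2 * np.pi / 32.0) * np.cos(2 * np.pi * pts / 32.0)
  nn = grad_img[np.rint(pts).astype(int)]      # nearest-neighbor lookup

  # Ratio of RMS magnitudes: values below 1 indicate the low bias
  print(np.sqrt(np.mean(nn**2)) / np.sqrt(np.mean(analytic**2)))

The printed ratio comes out slightly below 1, which is the small low
bias I was referring to - the central-difference filter attenuates the
high frequencies a little.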
> Hessian-based optimizers is definitely something I would like to
> explore. Especially, I would strongly argue to get Gauss-Newton like
> optimizers (e.g. Levenberg-Marquardt, Powell's dog leg, ESM, etc.) to
> work with least-squares like image similarity criteria in ITK (Mean
> squared error, cross-correlation, etc.). These are not strictly
> speaking Hessian-based optimizers but can be seen as
> pseudo-Hessian-based and could be developed in an Hessian-based
> optimizer API.
In fact, I've been working on exactly this kind of stuff as part of my
thesis, which I'm in the process of finishing up now. At the moment,
I'm pretty swamped with that, but once it's done, I'd be happy to
discuss.
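In the meantime, the basic idea for a least-squares criterion
f(p) = sum_i r_i(p)^2 is to use J^T J as a pseudo-Hessian. A toy numpy
sketch of the Gauss-Newton iteration (the residual model here is a
made-up curve fit, not an image metric):

  # Gauss-Newton for a least-squares criterion f(p) = sum_i r_i(p)^2,
  # using J^T J as a pseudo-Hessian instead of the true Hessian.
  import numpy as np

  def residuals(p, t, y):
      a, b = p
      return a * np.exp(b * t) - y                     # r_i(p)

  def jacobian(p, t):
      a, b = p
      return np.column_stack([np.exp(b * t),           # dr/da
                              a * t * np.exp(b * t)])  # dr/db

  rng = np.random.default_rng(2)
  t = np.linspace(0, 1, 50)
  y = 2.0 * np.exp(-1.5 * t) + 0.01 * rng.standard_normal(t.size)

  p = np.array([1.0, -1.0])                            # initial guess
  for _ in range(10):
      r = residuals(p, t, y)
      J = jacobian(p, t)
      # Gauss-Newton step: solve (J^T J) dp = -J^T r
      dp = np.linalg.solve(J.T @ J, -J.T @ r)
      p = p + dp
  print(p)                                             # should end up near (2.0, -1.5)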
Regards,
Rupert
--
--------------------------------------------------------------
Rupert Brooks
McGill Centre for Intelligent Machines (www.cim.mcgill.ca)
Ph.D Student, Electrical and Computer Engineering
http://www.cyberus.ca/~rbrooks