[Insight-developers] Re: SPSA Optimizer

Stefan Klein stefan at isi.uu.nl
Thu Mar 24 06:55:50 EST 2005


Hi,


>If we refer to the classical gradient descent optimizer,
>http://www.itk.org/cgi-bin/viewcvs.cgi/Code/Numerics/itkGradientDescentOptimizer.cxx?root=Insight&view=markup
>it seems there are two phases. In the first one, the gradient is computed
>without taking into account the scales
>         m_CostFunction->GetValueAndDerivative(
>         this->GetCurrentPosition(), m_Value, m_Gradient );
>In the second phase, the scales are applied to the gradient before
>updating the parameters
>         transformedGradient[j] = m_Gradient[j] / scales[j];

Yep, that was also my first thought. In my first implementation I did it 
like this.
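
For reference, the two phases quoted above can be sketched as a small
standalone function. This is only an illustration, not the ITK code itself:
the names ScaledGradientStep and learningRate are made up for this sketch,
and it assumes a minimisation problem (so the step goes against the
gradient).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Parameters = std::vector<double>;

// Hypothetical helper: one gradient descent step in which the scales are
// applied only when the parameters are updated (phase two), not when the
// gradient is computed (phase one). Assumes minimisation.
Parameters ScaledGradientStep(const Parameters& position,
                              const Parameters& gradient,
                              const Parameters& scales,
                              double learningRate)
{
  assert(position.size() == gradient.size());
  assert(position.size() == scales.size());

  Parameters newPosition(position.size());
  for (std::size_t j = 0; j < position.size(); ++j)
  {
    // Same operation as in the quoted snippet: divide by the scale.
    const double transformedGradient = gradient[j] / scales[j];
    newPosition[j] = position[j] - learningRate * transformedGradient;
  }
  return newPosition;
}
```

With equal gradient components, a parameter with a smaller scale takes a
proportionally larger step.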

>In your implementation, you take into account the scales in the first
>phase (gradient computation) when applying the perturbation
>         m_Delta[j] /= sqrt(scales[j]);

This is what Daniel did in his implementation, and I think it is correct.
Take for example the rigid registration problem. Increasing the rotation
by an angle of 0.1 rad would have a big influence on the similarity
measure. However, increasing the x-translation by 0.1 mm would only have
a small influence on the similarity measure. This would mean that
"f(thetaplus) - f(thetamin)" would be entirely dominated by the
perturbation of the rotation. This is not good for the optimisation
process, I think.
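
A minimal sketch of this scaled perturbation, combined with the standard
two-sided SPSA gradient estimate, could look as follows. It is not taken
from Daniel's or my code: the function names DrawScaledDelta and
EstimateGradient are hypothetical, and the +/-1 Bernoulli perturbation is
the usual SPSA choice, assumed here.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <random>
#include <vector>

using Parameters = std::vector<double>;

// Hypothetical helper: draw a +/-1 perturbation per parameter and divide it
// by sqrt(scales[j]), as in the quoted line "m_Delta[j] /= sqrt(scales[j])".
Parameters DrawScaledDelta(const Parameters& scales, std::mt19937& rng)
{
  std::bernoulli_distribution coin(0.5);
  Parameters delta(scales.size());
  for (std::size_t j = 0; j < scales.size(); ++j)
  {
    assert(scales[j] > 0.0);
    delta[j] = (coin(rng) ? 1.0 : -1.0) / std::sqrt(scales[j]);
  }
  return delta;
}

// Hypothetical helper: two-sided simultaneous perturbation estimate,
// g[j] = (f(thetaplus) - f(thetamin)) / (2 * c * delta[j]).
Parameters EstimateGradient(const std::function<double(const Parameters&)>& f,
                            const Parameters& theta,
                            const Parameters& delta,
                            double c)
{
  assert(theta.size() == delta.size());
  Parameters thetaplus(theta);
  Parameters thetamin(theta);
  for (std::size_t j = 0; j < theta.size(); ++j)
  {
    thetaplus[j] += c * delta[j];
    thetamin[j] -= c * delta[j];
  }
  const double difference = f(thetaplus) - f(thetamin);

  Parameters gradient(theta.size());
  for (std::size_t j = 0; j < theta.size(); ++j)
  {
    gradient[j] = difference / (2.0 * c * delta[j]);
  }
  return gradient;
}
```

With, say, a scale of 1 for the rotation and a much smaller scale for the
translation (illustrative values only), the translation is perturbed by a
larger amount than the rotation, so both contribute to the difference.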

Now suppose that the rotation is already close to its optimum. This means
that "f(thetaplus) - f(thetamin)" would be very small, because the
translation is only perturbed by 0.1 mm.
Moreover, suppose we can only obtain noisy measurements of our function (as
is assumed in SPSA). So, instead of measuring f(theta) directly, we
measure:

F(theta) = f(theta) + epsilon
with epsilon normally distributed N(\mu, \sigma).

'F(thetaplus) - F(thetamin)' would now become extremely sensitive to noise.
Normally we would increase c to be able to cope with high noise, but
this would cause the whole perturbation vector to become larger. For
example, instead of 0.1 mm, we would try a perturbation of 0.5 mm, and
instead of 0.1 rad we would perturb the rotation by 0.5 rad. This is way
too much for the rotation, and would make the registration fail.
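
The argument can be written out as a short worked example. It assumes that
the two noise terms are independent draws, that the second argument of N is
the standard deviation, and that each +/-1 perturbation component is divided
by sqrt(s_j), where s_j is the scale of parameter j. The scale values used
below are illustrative only.

```latex
\begin{align*}
F(\theta^{+}) - F(\theta^{-})
  &= \bigl(f(\theta^{+}) - f(\theta^{-})\bigr)
     + (\epsilon^{+} - \epsilon^{-}), \\
\epsilon^{+} - \epsilon^{-} &\sim N\bigl(0,\ \sqrt{2}\,\sigma\bigr), \\
|c\,\Delta_j| &= \frac{c}{\sqrt{s_j}}.
\end{align*}
```

The noise term does not shrink when f(thetaplus) - f(thetamin) gets small,
so a tiny translation perturbation is easily drowned out. With c = 0.1, a
rotation scale of 1 and a translation scale of 0.04, the rotation is
perturbed by 0.1 rad and the translation by 0.1 / 0.2 = 0.5 mm, without
having to increase c for all parameters at once.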

Groeten!
Stefan.

