<html>

<body>

<font size=3>Hi,<br><br>

<br>

<blockquote type=cite class=cite cite>As for the termination criteria, I

agree that it sounds like a great idea to have a stable of callbacks that

people could attach to any optimizer to achieve their desired termination

criteria. This would be much better than the current situation.<br>

</font></blockquote>&nbsp;<br>

I like this idea as well.<br><br>

<blockquote type=cite class=cite cite><font size=3>Given that that's in

the future, do we want to do anything about the termination condition as

it exists now in the SPSA code? (Even something as minimal as providing a

reference to where a scheme like the StateOfConvergence stuff is

described, so users could read something for guidance would be

valuable.)<br>

</font></blockquote><br>

Maybe it would be better to just remove this whole convergence check and

only terminate when the maximum number of iterations has been reached. It

will save us from some parameters to choose. :)<br><br>

Stefan.<br><br>

<br><br>

<blockquote type=cite class=cite cite><font size=3>For my use, I'll

probably just attach an observer and equip it with whatever termination

condition I need.<br><br>

Zach<br><br>

On Jun 27, 2005, at 7:34 AM, Mathieu De Craene wrote:<br><br>

<blockquote type=cite class=cite cite>Le Monday 27 June 2005 à 13:32

+0200, Stefan Klein a écrit :<br>

<blockquote type=cite class=cite cite>Hi Zach,<br><br>

Thanks for your careful review of this code :)<br><br>

<br>

<blockquote type=cite class=cite cite>(1) The convergence test is a bit

unfamiliar to me. Since it's not<br>

what is described in Spall's papers on SPSA, reading them didn't<br>

help clarify matters.<br><br>

In general, I'm familiar with convergence tests that look at how<br>

much the value of the objective function has changed, or how much<br>

the parameters have changed, or how close the gradient magnitude is<br>

to zero. However, the &quot;StateOfConvergence&quot; test in the SPSA

class is<br>

more unclear. Optimization terminates if StateOfConvergence is less<br>

than some threshold, where StateOfConvergence starts at zero and is<br>

updated as follows:<br>

StateOfConvergence &lt;= DecayRate * (StateOfConvergence +

LearningRate<br>

* GradientMagnitude)<br><br>

Could anyone give some guidance as to why this criteria was chosen<br>

over more simplistic tests such as mentioned above, and how to<br>

choose the threshold and decay rate parameter appropriately?<br>

(Naively, it seems that if the gradient starts large and then goes<br>

to zero, optimization could continue for some time as the &quot;state

of<br>

convergence&quot; parameter decays, even though the optimum has

already<br>

been found. Likewise, if the decay rate is too large, optimization<br>

could be terminated prematurely. Thus, choosing the right decay 

rate<br>

seems very important, but it is not clear how best to do

so!)</blockquote><br>

Good question. I copied this part from Mathieu's code, so he may be<br>

better able to answer this question. The problem of 'traditional'<br>

convergence tests is that they stop if the test has been satisfied<br>

once. In stochastic optimisation it is not unlikely that in one<br>

iteration a zero gradient is measured, but in the next iteration

still<br>

progress is being made. Moreover, the gradient magnitude does not<br>

always cancel at the optimum (due to the stochasticity; that's why

the<br>

decreasing gain sequence is needed). Mathieu's method tries to

attack<br>

this problem, but I do agree now that the parameters are quite hard

to<br>

choose properly.<br><br>

Would the following test be better maybe?:<br><br>

If ( (newPosition - currentPosition).magnitude()&nbsp; &lt;<br>

m_SomeUserEnteredThreshold )<br>

{<br>

&nbsp; m_NumberOfVerySmallSteps++;<br>

&nbsp; if (m_NumberOfVerySmallSteps &gt;=<br>

m_UserEnteredMaximumNumberOfVerySmallSteps)<br>

&nbsp; {<br>

&nbsp;&nbsp;&nbsp;&nbsp; this-&gt;StopOptimization();<br>

&nbsp; }<br>

}<br>

else<br>

{<br>

&nbsp; m_NumberOfVerySmallSteps = 0;<br>

}<br><br>

The parameters to set are maybe more intuitive in this way.<br>

</blockquote><br>

I have to confess that in my experiments, the convergence test

always<br>

fails (for a registration application I mean) and the optimizer

performs<br>

the maximum number of steps specified by the user. On synthetic 

cost<br>

functions however, it works well. My only conclusion is that this

might<br>

not be the best test for noisy cost function such the ones we play<br>

with ;-) I like the scheme suggested by Stefan.<br><br>

In fact, this discussion is not specific to spsa... Maybe it would

be<br>

better on a general point of view to use an observer on the optimizer

so<br>

that the user can call the StopOptimization() method when his own<br>

convergence criterion is satisfied.&nbsp; This is more a question for

ITK<br>

designers... Having a lot of various convergence tests embedded in<br>

different optimizers is not so modular.<br><br>

<blockquote type=cite class=cite cite><br>

<blockquote type=cite class=cite cite>(2) I think that there might be a

bug in how the Scales parameter is<br>

used to scale the perturbations and the gradient value. I've read<br>

the email thread from a few months ago concerning this, but I think<br>

there's a bug in the implementation that breaks this slightly.<br><br>

The logic for dealing with the scaled delta (perturbation) values,<br>

as commented at the bottom of the ComputeGradient() method in<br>

itkSPSAOptimizer.cxx (lines 414-451), assumes that scaling is<br>

accomplished by dividing by sqrt(Scales). This sqrt(Scales) term<br>

eventually is transferred to a numerator, and thus to correct for<br>

that (i.e. put the scaling term back in the denominator where it<br>

belongs), the comment suggests an additional division by Scales.<br>

However in the actual code, the scaling is done (as it should be!)<br>

by dividing by Scales. However, in the code the correction remains<br>

an additional division by Scales, when it should be by Scales^2 (as<br>

per the logic in the comment). Thus, the scaling factor winds up<br>

being canceled out entirely, instead of being transfered from a<br>

numerator to a denominator as a division by Scales^2 would properly<br>

do.</blockquote><br>

You are very right. If you use sqrt(Scales) you should divide by<br>

Scales later; if you use Scales, you should use Scales^2 later. I

was<br>

in a doubt whether to use the first or second option; i forgot why 

I<br>

chose for the first option, but the second option (that you propose<br>

now) indeed gives the desired scaling, so is better!<br><br>

Kind regards,<br>

Stefan.<br><br>

</blockquote><br>

I am a bit perplex about how we have to deal with scales... As far as

I<br>

understand, we could always use Scales of 1 for each parameter and

adapt<br>

the transformation for having parameters evolving in the same

range...<br>

That's how I deal with Scales in my rigid registration code... But 

I<br>

know this is not the ITK way of seeing things. Sometime, a linear<br>

mapping of the parameters space is not the best thing to do : when<br>

optimizing the parameters of an affine transformation, it's good to

have<br>

the x,y and z scaling factors evolving from -inf,+inf instead of

0,+inf.<br>

So a logarithmic mapping of these parameters is a good trick...<br><br>

Is there a gradient descent optimizer in ITK where the gradient is<br>

estimated using finite differences ? It could give us some

inspiration<br>

no ?<br><br>

Math.<br><br>

-- <br>

Mathieu De Craene &lt;decraene@tele.ucl.ac.be&gt;<br>

010 47 93 53 - 0478 51 54 54 (cell) - 010 47 20 89 (fax)<br>

<a href="http://www.tele.ucl.ac.be/~decraene" eudora="autourl">http://www.tele.ucl.ac.be/~decraene</a><br><br>

_______________________________________________<br>

Insight-users mailing list<br>

Insight-users@itk.org<br>

<a href="http://www.itk.org/mailman/listinfo/insight-users" eudora="autourl">http://www.itk.org/mailman/listinfo/insight-users</a><br>

</blockquote></font></blockquote></body>

<br>

</html>