[Insight-users] Parameter scales for registration (second try)

Nick Tustison ntustison at gmail.com
Tue May 7 17:22:10 EDT 2013


Hi Joel,

Also, could you clarify your question a bit?  The snippet of interest in 
the old itkRegularStepGradientDescentOptimizer is

  for(unsigned int j=0; j<spaceDimension; j++)
    {
    newPosition[j] = currentPosition[j] + transformedGradient[j] * factor;
    }
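
For context (going from memory of the base class rather than the exact
source), the transformedGradient in that snippet has already been divided
by the parameter scales earlier in AdvanceOneStep(), roughly:

  const ScalesType & scales = this->GetScales();
  for(unsigned int j=0; j<spaceDimension; j++)
    {
    transformedGradient[j] = m_Gradient[j] / scales[j];
    }

and "factor" is the current step length divided by the magnitude of that
scaled gradient.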

When you write of "first scaling", are you referring to the multiplication by 
"factor"?

Nick



On May 7, 2013, at 3:42 PM, Joël Schaerer <joel.schaerer at gmail.com> wrote:

> Hi Brian,
> 
> I did read this code, but if I understand correctly, ModifyGradientByScales does the equivalent of the first scaling that was already done in the old itkRegularStepGradientDescentOptimizer.
> 
> However, the whole point of my post is that it would make sense (I think!) to apply the scales *again* after adjusting the gradient for the step size (or "learning rate"), which doesn't seem to be done in this code either.
> 
> Note that in itkGradientDescentOptimizerv4, the gradient isn't normalized, so I don't think my change is needed. But if you implemented a RegularStepGradientOptimizerv4, I think re-applying the scales after scaling the gradient would be a good thing.
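> 
> To make this concrete, here is the kind of update I have in mind for a hypothetical regular-step v4 optimizer (just a sketch of the proposed order of operations, not existing ITK code):
> 
>   // the gradient has already been divided by the scales once and then
>   // normalized / multiplied by the step length ("factor")
>   for( unsigned int j = 0; j < spaceDimension; j++ )
>     {
>     // proposed: apply the scales a second time when the update is added
>     newPosition[j] = currentPosition[j] + transformedGradient[j] * factor / scales[j];
>     }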
> 
> joel
> 
> 
> On 05/07/2013 06:07 PM, brian avants wrote:
>> yes - there is something you are missing.  read the code below: 
>> 
>>   /* Begin threaded gradient modification.
>>    * Scale by gradient scales, then estimate the learning
>>    * rate if options are set to (using the scaled gradient),
>>    * then modify by learning rate. The m_Gradient variable
>>    * is modified in-place. */
>>   this->ModifyGradientByScales();
>>   this->EstimateLearningRate();
>>   this->ModifyGradientByLearningRate();
>> 
>> the call to this->ModifyGradientByScales(); changes the gradient according to the scales, as the name suggests.  the v4 optimizers all behave in this general manner, although this snippet is taken from the gradient descent class.
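>> 
>> schematically (simplifying the threaded implementation and ignoring the weights handling), the scale modification amounts to:
>> 
>>   const ScalesType & scales = this->GetScales();
>>   for( SizeValueType j = 0; j < numberOfParameters; j++ )
>>     {
>>     m_Gradient[j] = m_Gradient[j] / scales[j];
>>     }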
>> 
>> so - the transform expects that the update has already been modified by the scales, so that the only thing the transform needs to do (if anything at all) is multiply by a scalar.
>> 
>> also, you are looking at the base class, which is only used if the derived class does not implement UpdateTransformParameters.  for instance, the GaussianDisplacementField transform will also smooth the parameters when this function is called.
>> 
>> is this clear enough?
>> 
>> 
>> 
>> brian
>> 
>> 
>> 
>> 
>> On Tue, May 7, 2013 at 11:59 AM, Joël Schaerer <joel.schaerer at gmail.com> wrote:
>> I spent a while looking at the v4 optimization framework. I can follow your reasoning until UpdateTransformParameters is called on the transform. However, at this stage, the old scaling is still done:
>> 
>> itkTransform.hxx:
>>   if( factor == 1.0 )
>>     {
>>     for( NumberOfParametersType k = 0; k < numberOfParameters; k++ )
>>       {
>>       this->m_Parameters[k] += update[k];
>>       }
>>     }
>>   else
>>     {
>>     for( NumberOfParametersType k = 0; k < numberOfParameters; k++ )
>>       {
>>       this->m_Parameters[k] += update[k] * factor;
>>       }
>>     }
>> 
>> which makes sense, since parameter scales are an optimizer concept that transforms know nothing about.
>> 
>> So (if I understand correctly), the code has been shuffled around quite a bit, but the behavior is still the same.
>> 
>> Is there something I'm missing?
>> 
>> joel
>> 
>> 
>> 
>> On 07/05/2013 16:40, brian avants wrote:
>>> also - to take away a bit of the "mystery" surrounding v4 optimization, let's see how the gradient descent AdvanceOneStep function works:
>>> 
>>> void
>>> GradientDescentOptimizerv4
>>> ::AdvanceOneStep()
>>> {
>>>   itkDebugMacro("AdvanceOneStep");
>>> 
>>>   /* Begin threaded gradient modification.
>>>    * Scale by gradient scales, then estimate the learning
>>>    * rate if options are set to (using the scaled gradient),
>>>    * then modify by learning rate. The m_Gradient variable
>>>    * is modified in-place. */
>>>   this->ModifyGradientByScales();
>>>   this->EstimateLearningRate();
>>>   this->ModifyGradientByLearningRate();
>>> 
>>>   try
>>>     {
>>>     /* Pass gradient to transform and let it do its own updating */
>>>     this->m_Metric->UpdateTransformParameters( this->m_Gradient );
>>>     }
>>>   catch ( ExceptionObject & )
>>>     {
>>>     this->m_StopCondition = UPDATE_PARAMETERS_ERROR;
>>>     this->m_StopConditionDescription << "UpdateTransformParameters error";
>>>     this->StopOptimization();
>>> 
>>>     // Pass exception to caller
>>>     throw;
>>>     }
>>> 
>>>   this->InvokeEvent( IterationEvent() );
>>> }
>>> 
>>> 
>>> i hope this does not look too convoluted.  then the base metric class does this:
>>> 
>>> template<unsigned int TFixedDimension, unsigned int TMovingDimension, class TVirtualImage>
>>> void
>>> ObjectToObjectMetric<TFixedDimension, TMovingDimension, TVirtualImage>
>>> ::UpdateTransformParameters( const DerivativeType & derivative, ParametersValueType factor )
>>> {
>>>   /* Rely on transform::UpdateTransformParameters to verify proper
>>>    * size of derivative */
>>>   this->m_MovingTransform->UpdateTransformParameters( derivative, factor );
>>> }
>>> 
>>> 
>>> so the transform parameters should be updated in a way that is consistent with: 
>>> 
>>> newPosition[j] = currentPosition[j] + transformedGradient[j] * factor / scales[j];
>>> 
>>> factor defaults to 1 ... anyway, as you can infer from the above discussion, even the basic gradient descent optimizer can be used to take "regular steps" if you want.
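>>> 
>>> putting the three calls and the transform update together, the net per-parameter effect is roughly:
>>> 
>>>   // ModifyGradientByScales:       g[j] = g[j] / scales[j]
>>>   // EstimateLearningRate:         learningRate chosen using the scaled g (if enabled)
>>>   // ModifyGradientByLearningRate: g[j] = g[j] * learningRate
>>>   // transform update:             p[j] = p[j] + g[j] * factor
>>>   newPosition[j] = currentPosition[j] + ( m_Gradient[j] / scales[j] ) * learningRate * factor;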
>>> 
>>> 
>>> 
>>> brian
>>> 
>>> 
>>> 
>>> 
>>> On Tue, May 7, 2013 at 10:23 AM, brian avants <stnava at gmail.com> wrote:
>>> brad
>>> 
>>> did this issue ever go up on jira?  i do remember discussing it with you at a meeting.  our solution is in the v4 optimizers.
>>> 
>>> the trivial additive parameter update doesn't work in more general cases, e.g. when you need to compose parameters with parameter updates.
>>> 
>>> to resolve this limitation, the v4 optimizers pass the update step to the transformations.
>>> 
>>> this implements the idea that "the transforms know how to update themselves".
>>> 
>>> there are several other differences, as nick pointed out, that reduce the need for users to experiment with scales.
>>> 
>>> for basic scenarios like the one joel is discussing, i prefer the conjugate gradient optimizer with line search.
>>> 
>>> itkConjugateGradientLineSearchOptimizerv4.h
>>> 
>>> when combined with the scale estimators, this leads to registration algorithms with very few parameters to tune: 1 parameter if you don't consider multi-resolution.
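>>> 
>>> as a rough sketch of the wiring (the MetricType typedef and the metric object are assumed to already exist):
>>> 
>>>   typedef itk::RegistrationParameterScalesFromPhysicalShift< MetricType >
>>>     ScalesEstimatorType;
>>>   ScalesEstimatorType::Pointer scalesEstimator = ScalesEstimatorType::New();
>>>   scalesEstimator->SetMetric( metric );
>>> 
>>>   typedef itk::ConjugateGradientLineSearchOptimizerv4 OptimizerType;
>>>   OptimizerType::Pointer optimizer = OptimizerType::New();
>>>   optimizer->SetMetric( metric );
>>>   optimizer->SetScalesEstimator( scalesEstimator );   // scales estimated rather than hand-tuned
>>>   optimizer->SetDoEstimateLearningRateOnce( true );   // learning rate from the estimator
>>>   optimizer->SetNumberOfIterations( 100 );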
>>> 
>>> 
>>> brian
>>> 
>>> 
>>> 
>>> 
>>> On Tue, May 7, 2013 at 9:27 AM, Nick Tustison <ntustison at gmail.com> wrote:
>>> Hi Brad,
>>> 
>>> I certainly don't disagree with Joel's findings.  It seems like a
>>> good fix that should be put up on Gerrit.  There were several
>>> components that we kept when upgrading the registration framework;
>>> the optimizers weren't one of them.
>>> 
>>> Also, could you elaborate a bit more on the "convoluted" aspects
>>> of parameter advancement?  There's probably a reason for it and
>>> we could explain why.
>>> 
>>> Nick
>>> 
>>> 
>>> 
>> 
>> 
> 
