Proposals:Statistics Framework Runtime Vector Size: Difference between revisions

From KitwarePublic

Latest revision as of 20:40, 20 December 2005

Refactoring the Statistics Framework to have Runtime Length

Currently, the Statistics Framework requires the MeasurementVector to have a length defined at compile time.

Rationale for having compile time length

The statistics classes in ITK define MeasurementVectorSize (the length of each measurement vector) as a static const value. Until now this has been sufficient, since typical statistics operations sample an image: the number of measurement vectors varies, but the measurement vector size is usually fixed and depends on the dimension of the parametric space.

Rationale for having run time length

For algorithms such as Normalized Cuts [1] and other kernel PCA feature-space projection techniques [2], it may be necessary to keep the dimensionality of the feature space variable. This requires replacing the static const MeasurementVectorSize with an instance variable (ivar) that is set at run time.

[1] Spectral Grouping Using the Nystrom Method, PAMI, vol. 26, no. 2, Feb 2004

[2] Nonlinear Component Analysis as a Kernel Eigenvalue Problem, Neural Computation, vol. 10, 1998





API changes for the user

The statistics framework can be broadly classified into:

- Classes that derive from Sample (list samples, subsamples, SampleAdaptors, membership samples, Histogram, VariableDimensionHistogram etc.)
- Algorithms (that derive from SampleAlgorithmBase)
- DistanceMetrics (that derive from DistanceMetric)
- DensityFunctions
- Others

1. itk::Sample

This class now provides methods to set and get the measurement vector length. The length must be set explicitly when the measurement vectors are variable-size containers (itk::Array etc.), as below.

 unsigned int length = 3;  // example value; may be computed at run time
 typedef itk::Sample< itk::Array< double > > SampleType;
 SampleType::Pointer sample = SampleType::New();
 sample->SetMeasurementVectorSize( length );  // required for variable-size containers
 SampleType::MeasurementVectorType m( length );
 m.Fill( 4.57 );
 sample->PushBack( m );

Any class that processes a sample whose measurement vector length has not been set will usually throw an exception. The static const macro for accessing the length is no longer available; use the Get/Set methods instead.
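
The contract described above can be sketched without ITK. The following is only a standalone illustration, not the ITK implementation: RuntimeSample and all of its members are hypothetical names, std::vector<double> stands in for itk::Array, and treating a length of 0 as "not set" is an assumption made for the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a sample whose measurement vector length is an
// instance variable set at run time. Not ITK code.
class RuntimeSample
{
public:
  typedef std::vector<double> MeasurementVectorType;

  RuntimeSample() : m_MeasurementVectorSize(0) {}

  void SetMeasurementVectorSize(std::size_t s) { m_MeasurementVectorSize = s; }
  std::size_t GetMeasurementVectorSize() const { return m_MeasurementVectorSize; }

  // Throws if the length was never set (still 0) or if mv does not match it.
  void PushBack(const MeasurementVectorType & mv)
  {
    if (m_MeasurementVectorSize == 0)
    {
      throw std::logic_error("measurement vector size has not been set");
    }
    if (mv.size() != m_MeasurementVectorSize)
    {
      throw std::length_error("measurement vector has the wrong length");
    }
    m_Vectors.push_back(mv);
  }

  // Number of measurement vectors stored so far.
  std::size_t Size() const { return m_Vectors.size(); }

  const MeasurementVectorType & GetMeasurementVector(std::size_t i) const
  {
    assert(i < m_Vectors.size()); // an out-of-range index is a caller error
    return m_Vectors[i];
  }

private:
  std::size_t m_MeasurementVectorSize;
  std::vector<MeasurementVectorType> m_Vectors;
};
```

In this sketch, calling PushBack before SetMeasurementVectorSize throws, which mirrors the behaviour described for samples whose length has not been set.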


2. DistanceMetrics

Distance metrics also provide methods to set and get the measurement vector length. Typedefs for MeanType, OriginType, etc. have been changed from fixed-length to variable-length containers, so objects of these types must be constructed with an explicit length. For instance:

 typedef itk::Vector< float, 2 > MeasurementVectorType;
 typedef itk::Statistics::EuclideanDistance< MeasurementVectorType > DistanceMetricType;
 DistanceMetricType::Pointer distanceMetric = DistanceMetricType::New();
 DistanceMetricType::OriginType originPoint( 2 );  // not DistanceMetricType::OriginType originPoint;
 MeasurementVectorType queryPointA;
 MeasurementVectorType queryPointB;
 originPoint[0] = 0;
 originPoint[1] = 0;
 queryPointA[0] = 2;
 queryPointA[1] = 2;
 queryPointB[0] = 3;
 queryPointB[1] = 3;
 distanceMetric->SetOrigin( originPoint );
 std::cout << "Euclidean distance between the two query points (A and B) = " 
           << distanceMetric->Evaluate( queryPointA, queryPointB ) << std::endl;
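
As a rough standalone illustration of a distance metric whose origin has a run-time length, consider the sketch below. It is not the ITK class: RuntimeEuclideanDistance is a hypothetical name, std::vector<double> stands in for the variable-length OriginType, and throwing on a length mismatch is an assumption about how such a check might look.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical run-time-length analogue of a Euclidean distance metric.
class RuntimeEuclideanDistance
{
public:
  typedef std::vector<double> VectorType;

  // In this sketch the origin's length defines the measurement vector size.
  void SetOrigin(const VectorType & origin) { m_Origin = origin; }
  std::size_t GetMeasurementVectorSize() const { return m_Origin.size(); }

  // Distance from the origin to x.
  double Evaluate(const VectorType & x) const { return Evaluate(m_Origin, x); }

  // Distance between two vectors; their lengths must agree.
  double Evaluate(const VectorType & a, const VectorType & b) const
  {
    if (a.size() != b.size())
    {
      throw std::length_error("measurement vectors differ in length");
    }
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i)
    {
      const double d = a[i] - b[i];
      sum += d * d;
    }
    return std::sqrt(sum);
  }

private:
  VectorType m_Origin;
};
```

With the values used in the example above (A = (2, 2), B = (3, 3)), Evaluate( A, B ) in this sketch returns the square root of 2.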


3. DensityFunctions

The density functions also contain the measurement vector length as an ivar, which is set with:

 densityfunction->SetMeasurementVectorSize( length );


4. SampleAlgorithms

Several statistics algorithms derive from SampleAlgorithmBase. They generally take an itk::Sample as input and produce some statistically relevant information or another sample. These classes also hold the measurement vector length as an ivar and provide public Set/Get macros for it. They query the sample passed as input for its measurement vector length, and they contain consistency checks to ensure, for instance, that the parameters passed to the algorithm have lengths consistent with it.

 // sample and weightFunction are assumed to have been created earlier
 typedef itk::Sample< itk::Array< float > > SampleType;
 typedef itk::Statistics::WeightedCovarianceCalculator< SampleType > CalculatorType;
 CalculatorType::MeanType mean( 3 );
 CalculatorType::Pointer calculator = CalculatorType::New();
 calculator->SetMean( mean );
 calculator->SetInputSample( sample );  // length is queried from sample; it must be 3 or an exception is thrown
 calculator->SetWeightFunction( weightFunction.GetPointer() );

A few classes return Array and VariableSizeMatrix objects (the covariance matrix, for instance). Their API is largely the same as that of itk::Matrix, but code that accesses static const members of the returned matrix will no longer work.
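
The kind of consistency check described in this section can be sketched as follows. This is a standalone illustration under stated assumptions, not the WeightedCovarianceCalculator: ComputeCovariance is a hypothetical free function, it computes a plain unweighted covariance about a supplied mean, and nested std::vector objects stand in for Array and VariableSizeMatrix.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

typedef std::vector<double> VectorType;
typedef std::vector<VectorType> MatrixType; // square matrix sized at run time

// Hypothetical helper: unweighted sample covariance about a given mean.
// The mean and every measurement vector must have the declared length.
MatrixType ComputeCovariance(const std::vector<VectorType> & sample,
                             std::size_t measurementVectorSize,
                             const VectorType & mean)
{
  if (mean.size() != measurementVectorSize)
  {
    throw std::length_error("mean length differs from measurement vector size");
  }
  if (sample.size() < 2)
  {
    throw std::invalid_argument("need at least two measurement vectors");
  }

  MatrixType cov(measurementVectorSize, VectorType(measurementVectorSize, 0.0));
  for (std::size_t n = 0; n < sample.size(); ++n)
  {
    const VectorType & x = sample[n];
    if (x.size() != measurementVectorSize)
    {
      throw std::length_error("measurement vector has the wrong length");
    }
    // Accumulate the outer product of the deviation from the mean.
    for (std::size_t i = 0; i < measurementVectorSize; ++i)
    {
      for (std::size_t j = 0; j < measurementVectorSize; ++j)
      {
        cov[i][j] += (x[i] - mean[i]) * (x[j] - mean[j]);
      }
    }
  }

  // Normalize by N - 1 (unbiased estimate).
  const double norm = static_cast<double>(sample.size() - 1);
  for (std::size_t i = 0; i < measurementVectorSize; ++i)
  {
    for (std::size_t j = 0; j < measurementVectorSize; ++j)
    {
      cov[i][j] /= norm;
    }
  }
  return cov;
}
```

Passing a mean of length 3 together with a sample of length-2 vectors makes this sketch throw before any computation starts, which is the behaviour the consistency checks are meant to provide.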


5. itk::Histogram

The existing histogram class is untouched. The itk::VariableDimensionHistogram handles histograms where the number of histogram axes is not known a priori. In future, the Histogram class may be deprecated or removed, and the classes that generate histograms will then generate VariableDimensionHistograms.
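
To illustrate what a histogram with a run-time number of axes involves, here is a minimal standalone sketch. It is not itk::VariableDimensionHistogram: RuntimeHistogram and its methods are hypothetical, all axes share one [lower, upper) range for brevity, and frequencies are stored in a dense array indexed by a flattened bin offset.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical histogram whose number of axes is chosen at run time.
class RuntimeHistogram
{
public:
  // size[d] is the number of bins along axis d; every axis spans [lower, upper).
  RuntimeHistogram(const std::vector<std::size_t> & size, double lower, double upper)
    : m_Size(size), m_Lower(lower), m_Upper(upper)
  {
    if (size.empty() || !(lower < upper))
    {
      throw std::invalid_argument("need at least one axis and lower < upper");
    }
    std::size_t total = 1;
    for (std::size_t d = 0; d < size.size(); ++d)
    {
      if (size[d] == 0)
      {
        throw std::invalid_argument("every axis needs at least one bin");
      }
      total *= size[d];
    }
    m_Frequency.assign(total, 0);
  }

  // The number of axes equals the measurement vector length.
  std::size_t GetMeasurementVectorSize() const { return m_Size.size(); }

  // Adds one count for mv; returns false if mv lies outside the bounds.
  bool IncreaseFrequency(const std::vector<double> & mv)
  {
    if (mv.size() != m_Size.size())
    {
      throw std::length_error("measurement vector has the wrong length");
    }
    std::size_t offset = 0;
    for (std::size_t d = 0; d < m_Size.size(); ++d)
    {
      if (mv[d] < m_Lower || mv[d] >= m_Upper)
      {
        return false;
      }
      const double width = (m_Upper - m_Lower) / static_cast<double>(m_Size[d]);
      std::size_t bin = static_cast<std::size_t>((mv[d] - m_Lower) / width);
      if (bin >= m_Size[d])
      {
        bin = m_Size[d] - 1; // guard against floating-point rounding
      }
      offset = offset * m_Size[d] + bin; // row-major flattening
    }
    ++m_Frequency[offset];
    return true;
  }

  // Frequency of the bin addressed by one bin index per axis.
  unsigned long GetFrequency(const std::vector<std::size_t> & index) const
  {
    if (index.size() != m_Size.size())
    {
      throw std::length_error("index has the wrong length");
    }
    std::size_t offset = 0;
    for (std::size_t d = 0; d < m_Size.size(); ++d)
    {
      if (index[d] >= m_Size[d])
      {
        throw std::out_of_range("bin index out of range");
      }
      offset = offset * m_Size[d] + index[d];
    }
    return m_Frequency[offset];
  }

private:
  std::vector<std::size_t>   m_Size;
  double                     m_Lower;
  double                     m_Upper;
  std::vector<unsigned long> m_Frequency;
};
```

Because the bin counts arrive as a run-time vector, the same class serves measurement vectors of any length, which is the property that a histogram templated over its dimension cannot offer.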

6. Others

A few other classes, such as GoodnessOfFit and KdTrees, contain the measurement vector length as an ivar. It is the user's responsibility to set these as appropriate.


