[Insight-developers] Draft for the ITK statistical modelling module

Jisung Kim bahrahm@yahoo.com
Thu, 11 Oct 2001 07:29:11 -0700 (PDT)


Hi.

I have written a document about the ITK statistical
modelling module. Neither the document nor the module
is complete in any sense, but the document explains
the architecture of the module and reflects its
current implementation. The attached document is a
text file wrapped at 79 columns. I haven't checked in
the new code yet. As soon as the naming and namespace
issues are settled, I will check it in. Before you
read the following "some current issues" section,
please read the attached document first.

Thank you,
Jisung.

-- Some current issues

   * Do we need a separate namespace for this
module? If so, what is a good name? itks? I think
having a separate namespace from itk would be nice.
The statistical modelling module should be a set of
general statistical tools, and there are possible
class name conflicts. Grouping this module's classes
under a namespace might help users distinguish it
from other parts of ITK. Since users probably already
use more than one namespace in their programs, such
as itk and std (at least I do), there is no extra
typing.

   * Do the class names make sense to users who have
basic statistical knowledge? I have never seen
"feature domain" in any literature.

   * I will make the Label and DensityEstimate
classes internally use itk::MapContainer. The key
type of the map structure is the InstanceIdentifier
type, and the value type is the label or density
estimate value. The problem is that the
InstanceIdentifier can be either unsigned long
(PointSetTable) or itk::Index (other FeatureDomain
subclasses). Since MapContainer requires a less-than
operator (<) for the key type, I need something like
itk::Index with a less-than operator. I am
considering creating a subclass of itk::Index. The
name of the class could be "itks"::ComparableIndex.
Is this approach a good idea? Or are there other
options?
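
A rough sketch of what I have in mind follows. The
lexicographic ordering and the derivation from
itk::Index are only assumptions on my part, not a
settled design; the class and method names are
tentative.

 // tentative sketch: an itk::Index with a
 // lexicographic operator< so that it can serve as
 // the key type of itk::MapContainer
 template< unsigned int VDimension >
 struct ComparableIndex : public itk::Index< VDimension >
 {
   bool operator<(const ComparableIndex< VDimension >& other) const
   {
     for (unsigned int i = 0 ; i < VDimension ; i++)
       {
       if ( (*this)[i] < other[i] ) { return true ; }
       if ( (*this)[i] > other[i] ) { return false ; }
       }
     return false ; // equal indices are not "less than"
   }
 } ;

 // this would allow, for example:
 // typedef itk::MapContainer< ComparableIndex< 2 >, double >
 //         DensityEstimateContainerType ;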


=====
Jisung Kim
bahrahm@yahoo.com
106 Mason Farm Rd.
129 Radiology Research Lab., CB# 7515
Univ. of North Carolina at Chapel Hill
Chapel Hill, NC 27599-7515


ITK Statistical Modelling Module

1. INTRODUCTION

    The main purpose of the module is to provide ways to store data
    and to find a "good" or "acceptable" statistical model that fits
    the data well. In image analysis, the data would be the image
    itself or the results of image processing. Since statistical
    modelling is an iterative process that involves different stages
    such as selection of features, selection of models, model
    fitting, and validation of the selected model, this module would
    eventually provide adequate components for each stage.

    I think this small set of basic components could be used for
    implementing statistical processes and for evaluating image
    processing algorithms.

2. DESIGN CONSIDERATIONS

    * Reducing learning effort
        - Familiar concepts and class names : since this module is
        not intended to introduce novel concepts and functionality,
        a user with basic knowledge of statistical modelling should
        be able to learn how to use this module quickly, without
        learning new concepts or procedures.

    * Interoperability with existing ITK classes
        - ITK native data objects : there should be easy ways to
        plug in or import data from Image or PointSet objects.
        - A pipeline mechanism may be needed.
        
3. CURRENT DEVELOPMENT FOCUSES

    * Data container structures for feature vectors, class labels, and
    density estimates   

    * Characteristic functions of distributional models and their
    parameter estimation methods. The chosen characteristic function
    is the probability density function (PDF).

    * Utilities for importing data from ITK's native data classes
    such as Image and PointSet into the FeatureDomain subclasses.

4. DATA CONTAINER CLASSES

    * FeatureDomain Classes

    - Overview
        A feature domain is the part of a feature space that is
        known to us through the data. A feature vector is an ITK
        Point with two template arguments: one is the type of the
        elements in the vector, and the other is the number of
        dimensions (Point< TFeatureElement, VFeatureDimension >).
        Although each subclass of this class may organise its data
        elements differently, each is a container of multiple
        elements of the same type. The commonalities of the
        subclasses are: 1) each element in a container has an
        identifier, a feature vector, and a frequency value, 2) each
        container has an iterator that provides GetFeature and
        GetFrequency methods.
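
        The following is a minimal sketch of that common access
        pattern. Only GetFeature and GetFrequency are specified
        here; the Iterator type name and the Begin/End calls are
        assumptions about what the interface could look like.

        // sketch only: "domain" stands for an instance of any
        // FeatureDomain subclass, e.g. a DenseHistogram, and the
        // type names are tentative
        FeatureDomainType::Iterator iter = domain->Begin() ;
        while ( iter != domain->End() )
          {
          // feature vector of the current element
          FeatureDomainType::FeatureType f = iter.GetFeature() ;
          // frequency of the current element (value type assumed)
          float frequency = iter.GetFrequency() ;
          ++iter ;
          }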
        

    - Histogram class (subclass)
        Users can think of the Histogram class as a multidimensional
        array. Each element (bin) of this class has two additional
        values - a min and a max vector. The min and max vectors
        define the range of the bin within the feature space, and
        the feature value reported for a bin is the centre point of
        the bin (for example, a one-dimensional bin with min = 2.0
        and max = 4.0 reports the feature value 3.0). The Histogram
        class uses ITK Index objects as the identifiers of the
        elements of a Histogram object.

        There are two subclasses of this class - DenseHistogram and
        SparseHistogram. Their only difference is the internal data
        storage structure: an ITK Image for DenseHistogram and a
        std::map for SparseHistogram.

    - Table class (subclass)
        This class has two subclasses - PointSetTable and
        ImageTableAdaptor.

        Users can think of a Table as a table with two columns, one
        for the identifier (presumably unsigned long - the default
        for ITK PointSet) and the other for the feature vector. This
        class is intended to be the most general container among the
        subclasses of the FeatureDomain class.

        ImageTableAdaptor can be thought of as a multidimensional
        array. Its identifier is an ITK Index, and a pixel value
        serves as a feature vector. The purpose of this adaptor
        class is to provide a direct way to plug in an ITK Image
        object.

        The two subclasses have a fixed value range for frequency -
        zero and one. If a feature is present, its frequency is one;
        otherwise, it is zero.
           

    - Inheritance tree

        FeatureDomain (abstract)
          |
          |- Histogram (abstract)
          |    |
          |    |- DenseHistogram
          |    |- SparseHistogram
          |
          |- Table (abstract)
               |
               |- PointSetTable
               |- ImageTableAdaptor
  
    * Label Class
    
    - Overview
        This class will be used to store a class label for each
        instance in a FeatureDomain object. To access the label of
        each instance, users use the InstanceIdentifier, which is
        kept in sync with the InstanceIdentifier of the feature
        domain object.

    * DensityEstimate Class
    
    - Overview
        This class will be used to store a density estimate for
        each instance in a FeatureDomain object. A DensityEstimate
        object holds density estimates for a single class label.
        Accessing the density estimate value of each instance is
        done in the same way as in the Label class.

    * Sample Class
    
    - Overview
        This class can be thought of as a work space for users and
        for higher-level statistical functions. It includes the
        proper type definitions of the DensityEstimate class, the
        Label class, and the InstanceIdentifier for the
        FeatureDomain type given by the template arguments, so users
        don't have to worry about those type definitions themselves.
        Since a DensityEstimate object can store density estimate
        values for only a single class label, this class supports
        multiple class labels by providing an internal map structure
        that has class labels as keys and pointers to each label's
        density estimate object as values.
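
        A minimal sketch of that internal structure follows. The
        use of std::map and of smart pointers is an assumption; the
        text above only fixes the roles of the key (class label)
        and the value (density estimate object).

        // sketch only: the label value type, the pointer type, and
        // the member name are tentative
        typedef std::map< LabelValueType,
                          DensityEstimateType::Pointer >
                DensityEstimateMapType ;
        DensityEstimateMapType m_DensityEstimates ;

        // GetDensityEstimate(label) would then roughly be
        //   return m_DensityEstimates[ label ] ;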

5. CHARACTERISTIC FUNCTIONS AND THEIR ESTIMATORS
 
    * DensityFunction classes

    - Overview
        A subclass of this class is a PDF for a specific
        distribution. The input to the class is a feature, and the
        output is the density estimate value for that feature. Each
        parameter of the PDF has Get/Set"parameter" methods for
        estimation.

    - Inheritance tree

        DensityFunction (abstract)
        |
        |-  ChiSquareDensityFunction
        |-  GaussianDensityFunction
        |-    :
        |-    :
        |-  GaussianMixtureModelDensityFunction
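
        As a concrete illustration, a GaussianDensityFunction could
        be used roughly as sketched below. The Evaluate name and
        the Mean/Covariance parameter names are assumptions; this
        document only specifies the feature-in, density-out
        behaviour and the Get/Set"parameter" pattern.

        // sketch only: interface names are tentative ; "mean",
        // "covariance", and "feature" are assumed defined elsewhere
        typedef GaussianDensityFunction< float, 2 > myGDFType ;
        myGDFType::Pointer gdf = myGDFType::New() ;
        gdf->SetMean( mean ) ;             // Set"parameter" example
        gdf->SetCovariance( covariance ) ; // Set"parameter" example
        double density = gdf->Evaluate( feature ) ;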
             

    * DensityFunctionEstimator classes

    - Overview
        A subclass of this class performs parameter estimation for
        a specific PDF. Parameter estimation can be done by direct
        computation (in cases where only the mean, standard
        deviation, and/or covariance matrix are required) or by
        iterative methods such as maximum likelihood expectation
        maximisation (MLEM). The
        GaussianMixtureModelDensityFunctionEstimator is an example
        of an MLEM method.

        After the estimator produces a set of parameter estimates,
        using the data from a Sample object, the parameters of a
        DensityFunction object, and its estimation algorithm, it
        generates a DensityEstimate object filled with density
        estimate values for each instance in the FeatureDomain
        object.

    - Inheritance tree

        DensityFunctionEstimator (abstract)
        |
        |-  ChiSquareDensityFunctionEstimator
        |-  GaussianDensityFunctionEstimator
        |-    :
        |-    :
        |-  GaussianMixtureModelDensityFunctionEstimator

    - Process model diagram
    
    Sample ----------       
                    |-->  DensityFunctionEstimator --> DensityEstimate
    DensityFunction--                  |
            ^                          |
            |                          |
            ---------------------------|
            (parameter estimates of a specific density function)

        
6. UTILITY CLASSES (not complete)

   * ImageToPointSetFilter< TInputImage, TOutputPointSet >
       moves the index of each pixel into a point of the PointSet
       and the pixel value into the corresponding point data of the
       PointSet

   * ImageToHistogramFilter< TImage, THistogram > 
       generates a Histogram object from an itk::Image object

   * NearestNeighborFinder< TFeatureDomain >
       brute-force search for the K nearest neighbors. The inputs
       are the number of neighbors (K), a FeatureDomain instance,
       and a query point. The output is a vector of neighbor points
       (feature type); a minimal standalone sketch appears after
       this list.
 
   * FastRandomUnitNormalVariateGenerator 
       A fast generator of pseudo-random variates from the unit Normal
       distribution. It keeps a pool of about 1000 variates, and
       generates new ones by picking 4 from the pool, rotating the
       4-vector with these as its components, and replacing the old
       variates with the components of the rotated vector. 
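
   The following is a self-contained sketch of the brute-force
   search described for NearestNeighborFinder above. It uses
   std::vector in place of a FeatureDomain instance and plain
   vectors as features; the real class would iterate over the
   domain with its own iterator (GetFeature/GetFrequency) instead.

    #include <algorithm>
    #include <utility>
    #include <vector>

    typedef std::vector< double > FeatureType ;

    double SquaredDistance(const FeatureType& a, const FeatureType& b)
    {
      double sum = 0.0 ;
      for (unsigned int i = 0 ; i < a.size() ; i++)
        { sum += (a[i] - b[i]) * (a[i] - b[i]) ; }
      return sum ;
    }

    // returns the K features in 'domain' closest to 'query'
    std::vector< FeatureType >
    FindNearestNeighbors(unsigned int K,
                         const std::vector< FeatureType >& domain,
                         const FeatureType& query)
    {
      // pair each feature's squared distance with its position
      std::vector< std::pair< double, unsigned int > > distances ;
      for (unsigned int i = 0 ; i < domain.size() ; i++)
        {
        distances.push_back(
          std::make_pair(SquaredDistance(domain[i], query), i)) ;
        }
      // sort by distance and keep the first K
      std::sort(distances.begin(), distances.end()) ;
      std::vector< FeatureType > neighbors ;
      for (unsigned int i = 0 ; i < K && i < distances.size() ; i++)
        { neighbors.push_back(domain[ distances[i].second ]) ; }
      return neighbors ;
    }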

7. FEATURES WHICH WILL APPEAR SOON 

   * more distributional models
       Student's t, F, Gamma, Chi-square, and generalised Lambda
       distributions
   
   * classification methods
       Parzen windowing, K-means clustering

   * Goodness-of-fit parameter estimation for Gaussian Mixture Model


8. EXAMPLE

 //////////////////////////////////////////////////////////////////
 // the following four segments of code are examples of instantiating
 // four different FeatureDomain subclasses
 
 // creates a DenseHistogram instance 
 // feature vector type is itk::Point< float, 2 > 
 // identifier type is itk::Index< 2 >
 typedef DenseHistogram< float, 2 > myDenseHistogramType ;
 myDenseHistogramType::Pointer dh = myDenseHistogramType::New() ;
 
 // creates a SparseHistogram instance 
 // feature vector type is itk::Point< float, 3 > 
 // identifier type is itk::Index< 3 >
 typedef SparseHistogram< float, 3 > mySparseHistogramType ;
 mySparseHistogramType::Pointer sh = mySparseHistogramType::New() ;
 
 // creates an ImageTableAdaptor instance 
 // feature vector type is itk::Point< float, 3 > 
 // identifier type is itk::Index< 2 >
 typedef itk::Point< float, 3 > myPixelType ;
 typedef itk::Image< myPixelType, 2> myImageType ;
 typedef ImageTableAdaptor< myImageType > myAdaptorType ;
 myAdaptorType::Pointer adaptor = myAdaptorType::New() ;  

 // creates a PointSetTable instance 
 // feature vector type is itk::Point< float, 2 > 
 // identifier type is unsigned long
 typedef PointSetTable< float, 2 > myPointSetTableType ;
 myPointSetTableType::Pointer pTable = myPointSetTableType::New() ;  
 
 ////////////////////////////////////////////////////////////////
 // the following code is a more realistic example of
 // generating density estimates using a Gaussian PDF

 // creates a Sample instance
 // feature vector type is itk::Point< float, 2 >
 // identifier type is unsigned long
 // Label object type is Label< unsigned long, int >
 // DensityEstimate object type is 
 // DensityEstimate< unsigned long, double > 
 typedef Sample< myPointSetTableType, int, double > mySampleType ;
 mySampleType::Pointer sample = mySampleType::New() ;

 // plug-in a PointSetTable instance
 sample->SetFeatureDomain(pTable) ;

 // mySampleType::LabelType is equal to
 // Label< myPointSetTableType::InstanceIdentifier, int >
 mySampleType::LabelType::Pointer label = sample->GetLabel() ;

 // mySampleType::DensityEstimateType is equal to
 // DensityEstimate< myPointSetTableType::InstanceIdentifier, double >
 // the argument of the GetDensityEstimate method is a class label
 mySampleType::DensityEstimateType::Pointer de =
                                       sample->GetDensityEstimate(0) ;

 // creates a GaussianDensityFunction instance
 // feature vector type is itk::Point< float, 2 >
 typedef GaussianDensityFunction< mySampleType::FeatureElementType,
                                  mySampleType::FeatureDimension > 
                                  myGDF ;
 myGDF::Pointer gdf = myGDF::New() ;            
 
 // creates a GaussianDensityFunctionEstimator instance
 typedef GaussianDensityFunctionEstimator< myGDF, mySampleType >
 myGDFEstimatorType ;
 myGDFEstimatorType::Pointer estimator = myGDFEstimatorType::New() ;
 
 estimator->SetSample(sample) ;
 estimator->SetDensityFunction(gdf) ;
 // run estimation procedure
 estimator->Execute() ;
 // store the resulting density estimate values for class label 0
 // in the Sample instance
 sample->SetDensityEstimate(0, estimator->GetDensityEstimate()) ;

  
9. DEFINITIONS (not complete)

   Statistical Model 
       distributional models such as Gaussian, Uniform, and Gaussian
       Mixture Model.  
        
   Features 
      Properties of an instance in a set of observations that
      observers are interested in for the purpose of statistical
      analysis. For example, an image pixel may have its intensity
      and gradient magnitude as its features.


   Feature Space 
      The space where the features lie. Its dimensionality is the
      number of features that an instance has. The features (feature
      vector) of the previous example exist in a two-dimensional
      feature space in which one dimension is intensity and the
      other is gradient magnitude.
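
      For the pixel example above, such a feature vector could be
      built as sketched below (the variable names are only
      illustrative):

       // feature vector type as described in section 4
       typedef itk::Point< float, 2 > FeatureType ;
       FeatureType feature ;
       feature[0] = intensity ;          // first feature dimension
       feature[1] = gradientMagnitude ;  // second feature dimension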

       
   Feature Domain
      The part of a feature space that is known to us through the
      data. In this module, Feature Domain is interchangeable with
      the data.

      
                
