[Insight-developers] Draft for the ITK statistical modelling module
Jisung Kim
bahrahm@yahoo.com
Thu, 11 Oct 2001 07:29:11 -0700 (PDT)
Hi.
I have written a document about the ITK statistical modelling
module. Neither the document nor the module is complete in any
sense. However, the document explains the architecture of the
module and reflects its current implementation. The attached
document is a text file with a column width of 79. I haven't
checked in the new code yet; as soon as the naming and namespace
issues are settled, I will check it in.
Please read the attached document before reading the "some current
issues" section below.
Thank you,
Jisung.
-- Some current issues
* Do we need a separate namespace for this module? If so, what is
a good name? itks? I think having a namespace separate from itk
would be nice. The statistical modelling module should be a set of
general statistical tools, and there are possible class name
conflicts. Grouping this module's classes under a namespace might
help users distinguish it from other parts of ITK. Since users are
probably already using more than one namespace in their programs,
such as itk and std (at least I do), there is no extra typing.
* Do the class names make sense to users who have basic
statistical knowledge? I have never seen "feature domain" in any
literature.
* I will make the Label and DensityEstimate classes internally use
itk::MapContainer. The key type of the map structure is the
InstanceIdentifier type, and the value type is the label or
density estimate value. The problem is that the InstanceIdentifier
can be either unsigned long (PointSetTable) or itk::Index (other
FeatureDomain subclasses). Since MapContainer requires a less-than
operator (<) for the key type, I need something like itk::Index
with a less-than operator. I am considering creating a subclass of
itk::Index; the class could be named "itks"::ComparableIndex. Is
this approach a good idea? Or are there other options?
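For reference, here is a minimal sketch of what such a comparable index
could look like, using only the standard library. The class name, the
member layout, and the lexicographic ordering choice are all hypothetical
illustrations, not the actual itk::Index API:

```cpp
#include <array>
#include <map>

// Hypothetical sketch: an N-dimensional index with a lexicographic
// operator< so it can serve as the key type of a std::map (which has
// the same key requirement as itk::MapContainer).
template <unsigned int VDimension>
struct ComparableIndex
{
  std::array<long, VDimension> m_Index{};

  long& operator[](unsigned int i)       { return m_Index[i]; }
  long  operator[](unsigned int i) const { return m_Index[i]; }

  // Lexicographic comparison: decide on the first differing coordinate.
  bool operator<(const ComparableIndex& other) const
  {
    for (unsigned int i = 0; i < VDimension; ++i)
    {
      if (m_Index[i] < other.m_Index[i]) return true;
      if (m_Index[i] > other.m_Index[i]) return false;
    }
    return false; // equal indices are not "less than" each other
  }
};
```

With this in place, something like std::map< ComparableIndex<2>, float >
would work as the internal storage for labels or density estimates.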
=====
Jisung Kim
bahrahm@yahoo.com
106 Mason Farm Rd.
129 Radiology Research Lab., CB# 7515
Univ. of North Carolina at Chapel Hill
Chapel Hill, NC 27599-7515
[Attachment: ITK_statistical_modeling.txt]
ITK Statistical Modelling Module
1. INTRODUCTION
The main purpose of the module is to provide ways to store data
and to find a "good" or "acceptable" statistical model that fits
the data well. In image analysis, the data would be the image
itself or the results of image processing. Since statistical
modelling is an iterative process that involves different stages,
such as selection of features, selection of models, model fitting,
and validation of the selected model, this module should
eventually provide adequate components for each stage.
I think this small set of basic components could be used for
implementing statistical processes and for evaluating image
processing algorithms.
2. DESIGN CONSIDERATIONS
* Reducing learning efforts
- Familiar concepts and class names : Since this module is not
intended to present novel concepts and functionalities, a user
who has basic knowledge of statistical modelling should be able
to learn how to use this module quickly, without learning new
concepts or procedures.
* Inter-operability with existing ITK classes
- ITK native data objects : there should be easy ways to plug in
or import data from Image or PointSet objects.
- A pipeline mechanism may be needed.
3. CURRENT DEVELOPMENT FOCUSES
* Data container structures for feature vectors, class labels, and
density estimates
* Characteristic functions of distributional models and their
parameter estimation methods. The chosen characteristic function
is the probability density function (PDF).
* Utilities for importing event data from ITK's native data classes
such as Image and PointSet to event data classes (FeatureDomain
subclasses).
4. DATA CONTAINER CLASSES
* FeatureDomain Classes
- Overview
A feature domain is the part of a feature space that is known
to us through the data. A feature vector is a type of ITK Point,
which has two template arguments: the type of the elements in
the vector, and the number of dimensions
(Point< TFeatureElement, VFeatureDimension >). Although each
subclass of this class might organise its data elements
differently, each is a container holding multiple elements of
the same type. The commonalities of the subclasses are: 1) each
element in a container has an identifier, a feature vector, and
a frequency value; 2) a container has an iterator with
GetFeature and GetFrequency methods.
- Histogram class (subclass)
Users can think of the Histogram class as a multidimensional
array. Each element (bin) of this class has two additional values,
min and max vectors, which define the bin's range within the
feature space. The feature value for a bin is the centre point of
the bin. The Histogram class uses ITK Index objects as the
identifiers of the elements of a Histogram object.
There are two subclasses of this class, DenseHistogram and
SparseHistogram. Their only difference is the internal data
storage structure: an ITK Image for DenseHistogram and a std::map
for SparseHistogram.
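To illustrate the SparseHistogram idea, here is a minimal
one-dimensional sketch built on a std::map, under the assumption of
equal-width bins over a fixed range. All names and the equal-width
assumption are illustrative only, not the actual class interface:

```cpp
#include <cmath>
#include <map>

// Hypothetical sketch of a sparse histogram for 1 feature dimension:
// equal-width bins over [min, max), a std::map from bin index to
// frequency, and bin centres as the representative feature values.
class SparseHistogram1D
{
public:
  SparseHistogram1D(double min, double max, long bins)
    : m_Min(min), m_Width((max - min) / bins), m_Bins(bins) {}

  // Map a feature value to its bin index, clamping to the valid range.
  long GetIndex(double feature) const
  {
    long i = static_cast<long>(std::floor((feature - m_Min) / m_Width));
    if (i < 0) i = 0;
    if (i >= m_Bins) i = m_Bins - 1;
    return i;
  }

  void IncreaseFrequency(double feature)
  { ++m_Frequency[GetIndex(feature)]; }

  long GetFrequency(long index) const
  {
    auto it = m_Frequency.find(index);
    return it == m_Frequency.end() ? 0 : it->second; // empty bins not stored
  }

  // The feature value reported for a bin is its centre point.
  double GetBinCentre(long index) const
  { return m_Min + (index + 0.5) * m_Width; }

private:
  double m_Min, m_Width;
  long m_Bins;
  std::map<long, long> m_Frequency; // only non-empty bins use memory
};
```

The dense variant would differ only in the storage: a contiguous array
(or an ITK Image) indexed directly by bin index instead of the map.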
- Table class (subclass)
This class has two subclasses, PointSetTable and
ImageTableAdaptor.
Users can think of it as a table with two columns: one for the
identifier (presumably unsigned long, the default for ITK
PointSet) and the other for the feature vector. This class is
intended to be the most general container among the subclasses
of the FeatureDomain class.
ImageTableAdaptor can be thought of as a multidimensional
array. Its identifier is an ITK Index, and a pixel value is a
feature vector. The purpose of this adaptor class is to provide
a direct way to plug in an ITK Image object.
The two subclasses have fixed value ranges for frequency: zero
and one. If a feature is present, its frequency is one;
otherwise, it is zero.
- Inheritance tree
FeatureDomain (abstract)
|
|- Histogram (abstract)
| |
| |- DenseHistogram
| |- SparseHistogram
|
|- Table (abstract)
|
|- PointSetTable
|- ImageTableAdaptor
* Label Class
- Overview
This class will be used to store a class label for each
instance in a FeatureDomain class. To access the label for an
instance, users use an InstanceIdentifier that is in sync with
the InstanceIdentifier of the feature domain class.
* DensityEstimate Class
- Overview
This class will be used to store a density estimate for each
instance in a FeatureDomain class. A DensityEstimate object
holds density estimates for a single class label. Accessing the
density estimate value for an instance works the same way as in
the Label class.
* Sample Class
- Overview
This class can be thought of as a workspace for users and
higher-level statistical functions. It includes the proper type
definitions of the DensityEstimate class, the Label class, and
the InstanceIdentifier for the FeatureDomain type given as a
template argument, so users don't have to worry about defining
such types themselves. Since a DensityEstimate object can store
density estimate values for only a single class label, this
class supports multiple class labels by providing an internal
map structure with class labels as keys and pointers to each
label's density estimate object as values.
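A minimal sketch of that internal bookkeeping, one DensityEstimate
container per class label held in a map, could look like the following.
All class and member names here are illustrative, not the actual API:

```cpp
#include <map>
#include <memory>

// Hypothetical stand-ins for the module's classes.
using InstanceIdentifier = unsigned long;

struct DensityEstimate
{
  // density estimate value per instance, keyed by InstanceIdentifier
  std::map<InstanceIdentifier, double> m_Estimates;
};

class Sample
{
public:
  // Returns the estimate container for a label, creating it on demand.
  std::shared_ptr<DensityEstimate> GetDensityEstimate(int label)
  {
    auto& entry = m_DensityEstimates[label];
    if (!entry)
    {
      entry = std::make_shared<DensityEstimate>();
    }
    return entry;
  }

  // Stores a (typically estimator-produced) container for a label.
  void SetDensityEstimate(int label, std::shared_ptr<DensityEstimate> de)
  {
    m_DensityEstimates[label] = de;
  }

private:
  // class label -> that label's density estimate object
  std::map<int, std::shared_ptr<DensityEstimate>> m_DensityEstimates;
};
```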
5. CHARACTERISTIC FUNCTIONS AND THEIR ESTIMATORS
* DensityFunction classes
- Overview
A subclass of this class is a PDF for a specific
distribution. The input to the class is a feature, and the
output is the density estimate value for that feature. Each
parameter of the PDF has Get/Set"Parameter" methods for
estimation.
- Inheritance tree
DensityFunction (abstract)
|
|- ChiSquareDensityFunction
|- GaussianDensityFunction
|- :
|- :
|- GaussianMixtureModelDensityFunction
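As a concrete illustration of the DensityFunction idea, here is a
univariate Gaussian sketch with parameters exposed through Set methods
and an Evaluate method returning the density at a feature value. The
names follow the document's convention but are otherwise hypothetical:

```cpp
#include <cmath>

// Hypothetical univariate GaussianDensityFunction sketch.
class GaussianDensityFunction
{
public:
  void SetMean(double mean) { m_Mean = mean; }
  void SetStandardDeviation(double sd) { m_Sd = sd; }

  // p(x) = exp(-(x - mu)^2 / (2 sigma^2)) / (sigma * sqrt(2 pi))
  double Evaluate(double feature) const
  {
    constexpr double kPi = 3.14159265358979323846;
    const double z = (feature - m_Mean) / m_Sd;
    return std::exp(-0.5 * z * z) / (m_Sd * std::sqrt(2.0 * kPi));
  }

private:
  double m_Mean = 0.0;
  double m_Sd = 1.0;
};
```

A multivariate version would take a mean vector and covariance matrix
through the same kind of Set"Parameter" methods.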
* DensityFunctionEstimator classes
- Overview
A subclass of this class performs parameter estimation for a
specific PDF. Parameter estimation can be done by direct
computation (in cases where only the mean, standard deviation,
and/or covariance matrix is required) or by iterative methods
such as maximum likelihood expectation maximisation (MLEM). The
GaussianMixtureModelDensityFunctionEstimator is an example of
an MLEM method.
After the estimator produces a set of parameter estimates
using data from a Sample object, parameters from a
DensityFunction object, and estimation algorithms, it will
generate a DensityEstimate object filled with density estimate
values for each instance in a FeatureDomain object.
- Inheritance tree
DensityFunctionEstimator (abstract)
|
|- ChiSquareDensityFunctionEstimator
|- GaussianDensityFunctionEstimator
|- :
|- :
|- GaussianMixtureModelDensityFunctionEstimator
- Process model diagram
Sample ----------------+
                       |--> DensityFunctionEstimator --> DensityEstimate
DensityFunction -------+             |
      ^                              |
      |                              |
      +------------------------------+
       (parameter estimates of a specific density function)
6. UTILITY CLASSES (not complete)
* ImageToPointSetFilter< TInputImage, TOutputPointSet >
moves the index of a pixel into a point of the PointSet and the
pixel value into the point data of the PointSet
* ImageToHistogramFilter< TImage, THistogram >
generates a Histogram object from an itk::Image object
* NearestNeighborFinder< TFeatureDomain >
brute-force search for the K nearest neighbours. Inputs are the
number of neighbours (K), a FeatureDomain instance, and a query
point. Output is a vector of neighbour points (feature type).
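The brute-force search could be sketched as follows: compute the
distance from the query to every feature in the domain and keep the K
smallest. The fixed 2-D feature type and function names are illustrative
assumptions:

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical 2-D feature type standing in for itk::Point.
using Feature = std::array<double, 2>;

double SquaredDistance(const Feature& a, const Feature& b)
{
  double d = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i)
  {
    const double diff = a[i] - b[i];
    d += diff * diff;
  }
  return d;
}

// Brute-force K-nearest-neighbour search: O(N log K) via partial sort.
std::vector<Feature> FindNearestNeighbors(std::size_t k,
                                          const std::vector<Feature>& domain,
                                          const Feature& query)
{
  std::vector<Feature> neighbors(domain);
  // Partially sort so the k closest features come first.
  std::partial_sort(
    neighbors.begin(), neighbors.begin() + k, neighbors.end(),
    [&query](const Feature& a, const Feature& b)
    { return SquaredDistance(a, query) < SquaredDistance(b, query); });
  neighbors.resize(k);
  return neighbors;
}
```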
* FastRandomUnitNormalVariateGenerator
A fast generator of pseudo-random variates from the unit Normal
distribution. It keeps a pool of about 1000 variates, and
generates new ones by picking 4 from the pool, rotating the
4-vector with these as its components, and replacing the old
variates with the components of the rotated vector.
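For comparison, a simple baseline generator of unit Normal variates
using the Marsaglia polar method is sketched below. This is NOT the
pool-based method described above, just a reference point for the
distribution the fast generator must reproduce; all names are
hypothetical:

```cpp
#include <cmath>
#include <random>

// Baseline unit Normal generator via the Marsaglia polar method.
class PolarNormalGenerator
{
public:
  explicit PolarNormalGenerator(unsigned seed) : m_Engine(seed) {}

  double GetVariate()
  {
    if (m_HasSpare)
    {
      m_HasSpare = false;
      return m_Spare;
    }
    double u, v, s;
    do
    {
      u = 2.0 * m_Uniform(m_Engine) - 1.0; // uniform on (-1, 1)
      v = 2.0 * m_Uniform(m_Engine) - 1.0;
      s = u * u + v * v;
    } while (s >= 1.0 || s == 0.0); // reject points outside the unit disc
    const double factor = std::sqrt(-2.0 * std::log(s) / s);
    m_Spare = v * factor; // each accepted point yields two variates
    m_HasSpare = true;
    return u * factor;
  }

private:
  std::mt19937 m_Engine;
  std::uniform_real_distribution<double> m_Uniform{0.0, 1.0};
  double m_Spare = 0.0;
  bool m_HasSpare = false;
};
```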
7. FEATURES WHICH WILL APPEAR SOON
* more distributional models
Student's t, F, Gamma, Chi-Square, and generalised Lambda
distributions
* classification methods
Parzen windowing, K-means clustering
* Goodness-of-fit parameter estimation for Gaussian Mixture Model
8. EXAMPLE
//////////////////////////////////////////////////////////////////
// the following four segments of code are examples of instantiating
// four different FeatureDomain subclasses
// creates a DenseHistogram instance
// feature vector type is itk::Point< float, 2 >
// identifier type is itk::Index< 2 >
typedef DenseHistogram< float, 2 > myDenseHistogramType ;
myDenseHistogramType::Pointer dh = myDenseHistogramType::New() ;
// creates a SparseHistogram instance
// feature vector type is itk::Point< float, 3 >
// identifier type is itk::Index< 3 >
typedef SparseHistogram< float, 3 > mySparseHistogramType ;
mySparseHistogramType::Pointer sh = mySparseHistogramType::New() ;
// creates an ImageTableAdaptor instance
// feature vector type is itk::Point< float, 3 >
// identifier type is itk::Index< 2 >
typedef itk::Point< float, 3 > myPixelType ;
typedef itk::Image< myPixelType, 2> myImageType ;
typedef ImageTableAdaptor< myImageType > myAdaptorType ;
myAdaptorType::Pointer adaptor = myAdaptorType::New() ;
// creates a PointSetTable instance
// feature vector type is itk::Point< float, 2 >
// identifier type is unsigned long
typedef PointSetTable< float, 2 > myPointSetTableType ;
myPointSetTableType::Pointer pTable = myPointSetTableType::New() ;
////////////////////////////////////////////////////////////////
// the following code is a more realistic example of
// generating density estimates using a Gaussian PDF
// creates a Sample instance
// feature vector type is itk::Point< float, 2 >
// identifier type is unsigned long
// Label object type is Label< unsigned long, int >
// DensityEstimate object type is
// DensityEstimate< unsigned long, double >
typedef Sample< myPointSetTableType, int, double > mySampleType ;
mySampleType::Pointer sample = mySampleType::New() ;
// plug-in a PointSetTable instance
sample->SetFeatureDomain(pTable) ;
// mySampleType::LabelType is equal to
// Label< myPointSetTableType::InstanceIdentifier, int >
mySampleType::LabelType::Pointer label = sample->GetLabel() ;
// mySampleType::DensityEstimateType is equal to
// DensityEstimate< myPointSetTableType::InstanceIdentifier, double >
// the argument of the GetDensityEstimate method is a class label
mySampleType::DensityEstimateType::Pointer de =
sample->GetDensityEstimate(0) ;
// creates a GaussianDensityFunction instance
// feature vector type is itk::Point< float, 2 >
typedef GaussianDensityFunction< mySampleType::FeatureElementType,
                                 mySampleType::FeatureDimension > myGDF ;
myGDF::Pointer gdf = myGDF::New() ;
// creates a GaussianDensityFunctionEstimator instance
typedef GaussianDensityFunctionEstimator< myGDF, mySampleType >
myGDFEstimatorType ;
myGDFEstimatorType::Pointer estimator = myGDFEstimatorType::New() ;
estimator->SetSample(sample) ;
estimator->SetDensityFunction(gdf) ;
// run estimation procedure
estimator->Execute() ;
// store the resulting density estimate values for class label 0
// in the Sample instance
sample->SetDensityEstimate(0, estimator->GetDensityEstimate()) ;
9. DEFINITIONS (not complete)
Statistical Model
distributional models such as Gaussian, Uniform, and Gaussian
Mixture Model.
Features
Properties of an instance in a set of observations that
observers are interested in for the purpose of statistical
analysis. For example, an image pixel may have its intensity
and gradient magnitude as its features.
Feature Space
The space where the features lie. Its dimensionality is the
number of features that an instance has. The previous example's
features (feature vector) exist in a two-dimensional feature
space in which one dimension is intensity and the other is
gradient magnitude.
Feature Domain
The part of a feature space that is known to us through the
data. In this module, "feature domain" is interchangeable with
the data.