[Insight-developers] Draft for the ITK statistical modelling module

Thu, 11 Oct 2001 12:16:49 -0400

I looked through the writeup. Here are my comments in no
particular order:

*	I think you also need CDFs.  CDFs are used more in decision criteria than PDFs.
*	I don't like the FeatureDomain name (or FeatureSpace or Features). I have never heard the term
FeatureDomain used before in statistics. I tend to use "feature" as a term for something that is
extracted or estimated from data. I think you are mixing the concepts of a "measurement", a "sample",
a "population", a "sample from a population", a "subsample", and an "estimated pmf (histogram)".  I
suggest you identify how each of these concepts is supposed to be used, and which concepts need to be
represented as objects. Some suggestions for the underlying concepts:

*	RandomVariable 
*	FunctionOfARandomVariable
*	Sample

*	This is a confusing concept because you can have a single sample of a random variable, or you
can take a "sample" that consists of a number of draws from a random variable. You want to make sure
there is no confusion about what you mean by sample in the toolkit.
*	Perhaps the solution is to have a "Sample" and a "Measurement".  A "Measurement" is a single
sample of a random variable and a "Sample" is a collection of "Measurement"s.  I guess in your current
terminology a "Measurement" corresponds to your "Observation".

*	Measurement (see above)
*	Density
*	Distribution (typically a CDF).
*	Parameter (population parameter)
*	Estimate (something calculated from a collection of rv's. If the estimate is of something
other than a parameter of the distribution, then it is hard to distinguish from a function of a
random variable).
*	Test - Student-t, Chi-squared, Hotelling T^2, F
*	Table 

*	In the context of representing the area under a CDF.

*	NearestNeighborFinder - I would call this a NearestNeighborLocator.
*	FastRandomUnitNormalVariateGenerator

*	I still do not like this name.  If this is the proposed algorithm for generating samples from a
standardized Gaussian distribution, then I propose we just call it NormalVariateGenerator.
*	However, I think this "algorithm" should really just be a method on a "Density" or
"Distribution".  Every "Distribution" should be able to generate random variates.  The simplest way
to do this is to draw a sample from a uniform distribution and run that value through the inverse CDF
of the specified Distribution. If a CDF cannot be inverted (easily), you can do a binary search for
the x value that generates the specified CDF value.  This could be the default implementation in the
"Density" and "Distribution" classes.
*	Along these lines, a Distribution should be able to answer "test" questions, e.g. what is the
probability of a random variable having a value less than or equal to "blah"?
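
To make the inverse-CDF suggestion concrete, here is a rough sketch of what such a default implementation might look like. Everything in it is an assumption for illustration: the names Distribution, EvaluateCDF, EvaluateInverseCDF, GenerateVariate and GaussianDistribution are hypothetical and are not existing ITK classes, and the bracketing-plus-bisection search is just one way to do the binary search on a continuous, non-decreasing CDF.

```cpp
#include <cassert>
#include <cmath>
#include <random>

// Hypothetical base class; the names are illustrative, not the ITK API.
class Distribution {
public:
  virtual ~Distribution() = default;

  // P(X <= x). Assumed continuous and non-decreasing in x.
  virtual double EvaluateCDF(double x) const = 0;

  // Default inverse CDF: bracket the target probability, then bisect.
  // Subclasses with a closed-form inverse can override this.
  virtual double EvaluateInverseCDF(double p) const {
    assert(p > 0.0 && p < 1.0);
    double lo = -1.0;
    double hi = 1.0;
    while (EvaluateCDF(lo) > p) lo *= 2.0;  // widen until CDF(lo) <= p
    while (EvaluateCDF(hi) < p) hi *= 2.0;  // widen until CDF(hi) >= p
    for (int i = 0; i < 200 && hi - lo > 1e-12; ++i) {
      const double mid = 0.5 * (lo + hi);
      if (EvaluateCDF(mid) < p) {
        lo = mid;
      } else {
        hi = mid;
      }
    }
    return 0.5 * (lo + hi);
  }

  // Inverse-transform sampling: push a uniform draw through the inverse CDF.
  template <class Engine>
  double GenerateVariate(Engine& engine) const {
    std::uniform_real_distribution<double> uniform(0.0, 1.0);
    double u = uniform(engine);
    while (u <= 0.0) u = uniform(engine);  // keep u strictly inside (0, 1)
    return EvaluateInverseCDF(u);
  }
};

// Example subclass: only the CDF is supplied; sampling comes from the base.
class GaussianDistribution : public Distribution {
public:
  GaussianDistribution(double mean, double sigma)
      : m_Mean(mean), m_Sigma(sigma) {}

  double EvaluateCDF(double x) const override {
    return 0.5 * std::erfc(-(x - m_Mean) / (m_Sigma * std::sqrt(2.0)));
  }

private:
  double m_Mean;
  double m_Sigma;
};
```

With this layout, the "test" question above is a direct call to EvaluateCDF(x), and a dedicated generator class is only needed where bisection turns out to be too slow.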

-----Original Message-----
From: Jisung Kim [mailto:bahrahm@yahoo.com]
Sent: Thursday, October 11, 2001 10:29 AM
To: insight-dev-list
Subject: [Insight-developers] Draft for the ITK statistical modelling
module

Hi.

I wrote a document about the ITK statistical modelling
module. Neither the document nor the module is
complete in any sense. However, the document explains
the architecture of the module and reflects the current
implementation of the module. The attached document is
a text file with column size 79. I haven't checked in
the new code. As soon as the naming and namespace issues
are settled, I will check in the new code.
Before you read the following "some current issues"
section, please read the attached document first.

Thank you,
Jisung.

-- Some current issues

   * Do we need a separate namespace for this
module? If so, what is a good name? itks? I think having
a separate namespace from itk would be nice. The
statistical modelling module should be a set of
general statistical tools. There are possible class
name conflicts. Grouping this module's classes in a
namespace might help users distinguish it from other
parts of ITK. Since users are probably already using
more than one namespace in their programs, such as itk
and std (at least I do), there is no extra typing.

   * Do the class names make sense to users who have
basic statistical knowledge? I have never seen
"feature domain" in any literature.

   * I will make the Label and DensityEstimate classes
internally use itk::MapContainer. The key type of the
map is the InstanceIdentifier type, and the value type
is the label or density estimate value. The problem is
that the InstanceIdentifier can be either unsigned long
(PointSetTable) or itk::Index (other FeatureDomain
subclasses). Since MapContainer requires a less-than
operator (<) for the key type, I need something like
itk::Index with a less-than operator. I am considering
creating a subclass of itk::Index. The name of the
class could be "itks"::ComparableIndex. Is this
approach a good idea? Or are there any other options?
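
One possible alternative to subclassing, sketched below under stated assumptions, is to leave the index type alone and give the map a comparison functor. The Index struct here is a hypothetical stand-in for itk::Index, IndexLess and LabelMap are made-up names, and the sketch uses std::map directly; whether itk::MapContainer can accept a custom comparator is not something this sketch establishes.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <map>

// Hypothetical stand-in for itk::Index: an N-dimensional integer index.
template <std::size_t Dimension>
struct Index {
  std::array<long, Dimension> m_Index;
};

// Strict weak ordering on indices via lexicographic comparison,
// which is what an ordered map needs from its key comparison.
template <std::size_t Dimension>
struct IndexLess {
  bool operator()(const Index<Dimension>& a,
                  const Index<Dimension>& b) const {
    return std::lexicographical_compare(a.m_Index.begin(), a.m_Index.end(),
                                        b.m_Index.begin(), b.m_Index.end());
  }
};

// Label storage keyed by index; the value type is an assumed label type.
template <std::size_t Dimension>
using LabelMap =
    std::map<Index<Dimension>, unsigned int, IndexLess<Dimension>>;
```

Usage would look like `LabelMap<2> labels; labels[idx] = 7;`. The ordering has no statistical meaning; it only exists so that the map can store and look up keys.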

=====
Jisung Kim
bahrahm@yahoo.com
106 Mason Farm Rd.
129 Radiology Research Lab., CB# 7515
Univ. of North Carolina at Chapel Hill
Chapel Hill, NC 27599-7515
