[Insight-developers] Draft for the ITK statistical modelling module
Miller, James V (CRD)
millerjv@crd.ge.com
Thu, 11 Oct 2001 12:16:49 -0400
I looked through the writeup. Here are my comments in no
particular order:
* I think you also need CDFs. CDFs are used more often than PDFs in decision criteria.
* I don't like the FeatureDomain name (or FeatureSpace or Features). I have never heard the term
FeatureDomain used in statistics. I tend to use "feature" for something that is
extracted or estimated from data. I think you are mixing the concepts of a "measurement", a "sample",
a "population", a "sample from a population", a "subsample", and an "estimated pmf (histogram)". I
suggest you identify how each of these concepts is supposed to be used, and which concepts need to be
represented as objects. Some suggestions for the underlying concepts:
  * RandomVariable
  * FunctionOfARandomVariable
  * Sample
    * This is a confusing concept because you can have a single sample of a random variable, or you
    can take a "sample" that consists of a number of draws from a random variable. You want to make
    sure there is no confusion about what "sample" means in the toolkit.
    * Perhaps the solution is to have a "Sample" and a "Measurement". A "Measurement" is a single
    sample of a random variable, and a "Sample" is a collection of "Measurement"s. I guess in your
    current terminology a "Measurement" corresponds to your "Observation".
  * Measurement (see above)
  * Density
  * Distribution (typically a CDF).
  * Parameter (population parameter)
  * Estimate (something calculated from a collection of rv's. If the estimate is of something
  other than a parameter of the distribution, then it is hard to distinguish from a function of a
  random variable).
  * Test - Student-t, Chi-squared, Hotelling T^2, F
  * Table
    * In the context of representing the area under a CDF.
* NearestNeighborFinder - I would call this a NearestNeighborLocator.
* FastRandomUnitNormalVariateGenerator
  * I still do not like this name. If this is the proposed algorithm for generating samples from a
  standardized Gaussian distribution, then I propose we just call it NormalVariateGenerator.
  * However, I think this "algorithm" should really just be a method on a "Density" or
  "Distribution". Every "Distribution" should be able to generate random variables. The simplest way
  to do this is to draw a sample from a uniform distribution and run that value through the inverse CDF
  of the specified Distribution. If a CDF cannot be inverted (easily), you can do a binary search for
  the x value that yields the specified CDF value. This could be the default implementation in the
  "Density" and "Distribution" classes.
  * Along these lines, a Distribution should be able to answer "test" questions, e.g. what is the
  probability of a random variable having a value less than or equal to "blah"?
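
To make the last few points concrete, here is a rough sketch of what I have in mind. Every name in it
(Measurement, Sample, Distribution, EvaluateCDF, EvaluateInverseCDF, GenerateSample,
StandardNormalDistribution) is made up for illustration and is not existing ITK API. It only assumes a
CDF that is non-decreasing and continuous, so the default inverse can fall back on a binary search.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical names throughout; none of this is existing ITK API.
using Measurement = double;               // a single draw of a random variable
using Sample = std::vector<Measurement>;  // a collection of Measurements

class Distribution
{
public:
  virtual ~Distribution() = default;

  // P(X <= x): answers the "test" question directly.
  virtual double EvaluateCDF(double x) const = 0;

  // Default inverse CDF by binary search (bisection), for 0 < p < 1.
  // The bracket [lo, hi] is widened until it contains the requested p.
  virtual double EvaluateInverseCDF(double p) const
  {
    double lo = -1.0;
    double hi = 1.0;
    while (EvaluateCDF(lo) > p) { lo *= 2.0; }
    while (EvaluateCDF(hi) < p) { hi *= 2.0; }
    for (int i = 0; i < 200 && hi - lo > 1e-12; ++i)
    {
      const double mid = 0.5 * (lo + hi);
      if (EvaluateCDF(mid) < p) { lo = mid; } else { hi = mid; }
    }
    return 0.5 * (lo + hi);
  }

  // Draw n Measurements: a uniform variate pushed through the inverse CDF.
  template <typename Engine>
  Sample GenerateSample(std::size_t n, Engine & engine) const
  {
    std::uniform_real_distribution<double> uniform(0.0, 1.0);
    Sample sample;
    sample.reserve(n);
    while (sample.size() < n)
    {
      const double u = uniform(engine);
      if (u > 0.0) { sample.push_back(EvaluateInverseCDF(u)); } // skip u == 0
    }
    return sample;
  }
};

// Standard normal: closed-form CDF, inverse inherited from the default.
class StandardNormalDistribution : public Distribution
{
public:
  double EvaluateCDF(double x) const override
  {
    return 0.5 * std::erfc(-x / std::sqrt(2.0));
  }
};
```

A subclass with a closed-form inverse CDF would simply override EvaluateInverseCDF. A dedicated fast
generator for the normal case could still exist behind the same GenerateSample interface.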
-----Original Message-----
From: Jisung Kim [mailto:bahrahm@yahoo.com]
Sent: Thursday, October 11, 2001 10:29 AM
To: insight-dev-list
Subject: [Insight-developers] Draft for the ITK statistical modelling module
Hi.
I wrote a document about the ITK statistical modelling
module. Neither the document nor the module is
complete in any sense. However, the document explains
the architecture of the module and reflects the current
implementation of the module. The attached document is
a text file with column size 79. I haven't checked in
the new code. As soon as the naming and namespace issues
are settled, I will check in the new code.
Before you read the following "some current issues"
section, please read the attached document first.
Thank you,
Jisung.
-- Some current issues
* Do we need a separate namespace for this
module? If so, what is a good name? itks? I think having
a separate namespace from itk would be nice. The
statistical modelling module should be a set of
general statistical tools. There are possible class
name conflicts. Grouping this module's classes in a
namespace might help users distinguish it from other
parts of ITK. Since users are probably already using
more than one namespace in their programs, such as itk
and std (at least I do), there is no extra typing.
* Do the class names make sense to users who have
basic statistical knowledge? I have never seen
"feature domain" in any literature.
* I will make the Label and DensityEstimate classes
internally use itk::MapContainer. The key type of the
map structure is the InstanceIdentifier type, and the
value type is the label or density estimate value.
The problem is that the InstanceIdentifier can
be either unsigned long (PointSetTable) or itk::Index
(other FeatureDomain subclasses). Since MapContainer
requires a less-than operator (<) for the key type, I
need something like itk::Index with a less-than
operator. I am considering creating a subclass of
itk::Index. The name of the class can be
"itks"::ComparableIndex. Is this approach a good
idea? Or are there any other options?
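
One possible alternative to subclassing, sketched under
assumptions: keep the index type as it is and supply the
ordering separately as a comparison functor. In the sketch
below std::array stands in for itk::Index and std::map
stands in for itk::MapContainer so that it compiles
without ITK. IndexLess and LabelMap are hypothetical
names, and whether MapContainer can accept a custom
comparator is not assumed here.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <map>

namespace itks // hypothetical namespace name, taken from the question above
{
// Stand-in for itk::Index so the sketch compiles without ITK.
template <std::size_t VDimension>
using Index = std::array<long, VDimension>;

// Strict weak ordering on indices: lexicographic, first coordinate first.
template <std::size_t VDimension>
struct IndexLess
{
  bool operator()(const Index<VDimension> & a,
                  const Index<VDimension> & b) const
  {
    return std::lexicographical_compare(a.begin(), a.end(),
                                        b.begin(), b.end());
  }
};

// Map from an index-valued instance identifier to a label value.
template <std::size_t VDimension, typename TLabel>
using LabelMap =
  std::map<Index<VDimension>, TLabel, IndexLess<VDimension>>;
} // namespace itks
```

With this, a declaration such as itks::LabelMap<2, int>
can be keyed by 2-D indices, and the unsigned long case
needs no comparator at all because the built-in < already
applies.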
=====
Jisung Kim
bahrahm@yahoo.com
106 Mason Farm Rd.
129 Radiology Research Lab., CB# 7515
Univ. of North Carolina at Chapel Hill
Chapel Hill, NC 27599-7515