[IGSTK-Developers] "many simple specialized" components vs. "fewer, more complex and general components"

Tue Jun 5 15:08:01 EDT 2007

Hi David,

                     That sounds reasonable.

At this point, it is just a matter of defining an

                 "Objective Measure of Complexity"

and with it, we could proceed to define a Threshold of how much
"complexity" is acceptable in an IGSTK component.

The label "Too Complex" doesn't make any sense if we don't have an
objective metric that can tell use how much complexity is too much
complexity.

Without an objective measure we will end up engaging in pointless
discussions, because the degree of complexity will be left to the
subjective aesthetic perception of every developer.

My suggestion for objectively measuring the complexity of an IGSTK
component is to use the notion of Markov Process / Chains:

             http://en.wikipedia.org/wiki/Markov_chain

in the following way:

   In the State Machine of the component, take the transition table,
   and evaluate the probabilities of every transition for being
   invoked. Then compute the Entropy of that set of probabilities,
   and use it as a measure of the "complexity" of the component.

In this context, a component with 5 states, and 7 inputs, will
have 35 transitions. In the plain case were all transitions are
equally likely to be triggered, their probabilities are 1/35.
then the component will have a complexity of

    K = - Sum (from 1 to 35) of (1/35) [ log( 1/35 ) / log(2) ]

    K = 5.12 bits.

A component with 20 equally probable transitions will have a
complexity K = 4.32 bits.

I will suggest that acceptable threshold of complexity for IGSTK
components should be 5 bits. This corresponds to a state machine
table of 32 equally probable transitions.

If you look a the Wiki page that evaluates the completeness of
the transition tables in IGSTK state machines:

http://public.kitware.com/IGSTKWIKI/index.php/State_Machine_Transition_Tables_Completeness

you will find that the components with the maximum number of
transitions are the ToolCalibration ant the Tracker, with:

ToolCalibration : 171 Transitions : 164 of which are undefined
Tracker         :  90 Transitions :  80 of which are undefined

If we assume that the undefined transitions will never happen,
(which is probably the reason why the developers never considered
this transitions in the table, in the first place), and we assume
that the defined transitions are equally probable, then we get:

     K( Tracker )         = 3.32 bits
     K( ToolCalibration ) = 2.8  bits

In the case where some of the transitions are more likely than
others, the Entropy of the transition table will diminish and
therefore the K measure of complexity will be lower.

This measure of complexity reflects the intuition that a complex
components have more functionality ("transitions"), and that it
has more uncertainty about its current state. It also matches
the notion that more complex components will require more lines
of code for performing a 100% code coverage.

Note that this measure of complexity is logarithmic in nature:

   a component with 100 transitions has just the double
   of complexity of a component with 10 transitions.
   That is, 6.6 bits versus 3.3 bits.

   We should keep this in mind when we compare the complexity
   of two components, or the complexity of two implementation
   of the same component.

One nice property of this suggested measure is that if
we take two components Sa and Sb, as Frank suggested earlier,
each one with complexity measures  K(Sa) and K(Sb) respectively,
and we assume that their functionalities are completely orthogonal,
that is, they are not redundant, and we fuse them together in a
single "more complex" component, the transition table of the combined
state machine in Ca will have a number of states equal to the product
of the number of states in Sa times the number of states in Sb.
Similarly its number of inputs will be the product of the number of
inputs in Sa times the number of inputs in Sb.  As a result the
measure of complexity of Ca will satisfy:

             K( Ca )  =   K( Sa )   +    K( Sb )

If Sa and Sb are not orthogonal, then the joint probability of
their transitions will not be the produce of the independent
probabilities, and we will find that Ca has a lower complexity
than the two independent Sa and Sb components.

In this context we also can interpret the effect of factorizing
functionality of Sa, Sb into a C++ base class Sc.

     Luis

-----------------
David Gobbi wrote:
> Hi Luis,
> 
> I'm with Frank on the idea that complex components are preferable to
> forcing the application programmer to write a complex app that has to
> connect many simple components into a complex web.
> 
> As long as a component can be fully understood, code-covered, and
> tested, it is unfair to call that component "too complex".  Splitting
> such a component in two "just because we can" is not a good enough
> reason, we must also justify our decision in terms of functionality.
> 
> A problem with specialized components is that it means we have more
> components to test, and each component is likely to receive less
> testing (we don't have unlimited resources).  Also, if the components
> are too constrained, then they will only be able to serve the needs of
> a very small audience.
> 
> Our primary means of achieving safety should be through testing and
> code review.  For the actual implementation of the code, we should
> focus on functionality.
> 
> - David
> 
> 
> On 5/31/07, Luis Ibanez <luis.ibanez at kitware.com> wrote:
> 
>>
>>
>> Hi Frank,
>>
>> I agree that we should strive to find the right balance
>> in the granularity of IGSTK components.
>>
>>  From the Algorithmic Theory point of view, we will know
>> whether a component is attempting to do too much or not,
>> by counting the number of "if"-like statements in the code.
>>
>> That will include "if", "switch", and ternary "a?b:c"
>> statements. When we try to engulf in a single component
>> the functionalities that should be implemented in two or
>> more independent components, we will find ourselves
>> introducing:
>>
>>   a) large numbers of states in the State Machine, or
>>   b) large numbers of inputs in the State Machine, or
>>   c) "if" conditions that split the different cases, or
>>   d) "switch" statements that split different cases
>>
>> Some of them will presumably be driven by "enums" and "bool"
>> flags that set the components in "this mode" or "this other mode".
>> The presence of these elements will be an indication of a component
>> that has grown too complex and that should be refactored/slit
>> into simpler components.
>>
>> Where do we draw that line, is what is open for discussion,
>> and we probably have to do it on a case by case basis.
>>
>>  From the pragmatic point of view, we can simply follow the practice
>> of agile programming. Let's start by putting a prototype
>> implementation of the component in the sandbox, and as part
>> of its code review we can discuss if it should be split into
>> multiple components or not.
>>
>> A clear sign will be how many lines of code do you need in the
>> test in order to ensure 100% code coverage of the component.
>> So, just by following our normal development process, the
>> components that are too complex will clearly stand out during
>> code reviews and during continuous dashboard testing.
>>
>>
>>
>> --------
>>
>>
>> Regarding the specific example that you mention:
>>
>> Before engaging in a discussion related to "complexity" we must
>> define what it means and how to measure it objectively.
>>
>> There are multiple concepts of complexity that we may want to
>> consider here, some of them are listed in the Wikipedia entry:
>>
>>       http://en.wikipedia.org/wiki/Complexity
>>
>> When it comes to software, there are at least two measures of
>> complexity that are relevant:
>>
>>
>> 1) How many lines of code it takes to write a program.
>>     This complexity measure is equivalent to Kolmogorov Complexity:
>>
>>     http://en.wikipedia.org/wiki/Kolmogorov_complexity
>>
>>     where the string to be generated is the sequence of states of
>>     the application. States, here being the full set of variables
>>     that completely defines the application.
>>
>>
>> 2) How many different options there are available for using a
>>     program (or a routine, or a component). And therefore how
>>     many decision should be made by the application developer
>>     in order to configure the application for a particular
>>     user case.
>>
>>
>> 3) How many steps are required from the user of the application
>>     in order to perform a task. This is the "complexity" perceived
>>     by a user.
>>
>>
>> In your suggested problem, you seem to be focused on (1) and (2),
>> rather than (3), and the underlying assumption seems to be that by
>> increasing the complexity of the components, we may be able to
>> reduce the complexity of an application.
>>
>>
>> Following your description of the problem, let's consider
>> the two cases:
>>
>>         A) a component Ca
>>         B) two components Sa and Sb
>>
>> where (Ca) offers the same functionality that (Sa+Sb)
>>
>> and the complexity of Ca, let's call it Comp(Ca) is larger than
>> the individual complexities of each Sa and Sb,
>>
>> That is
>>
>>          Comp(Ca)  >=  Comp(Sa)
>>          Comp(Ca)  >=  Comp(Sb)
>>
>>
>>  From the application developer point of view, if we use the notion
>> of complexity (2), it comes down to how many method decision should
>> be made in order to use the component Ca, versus, how many decision
>> should be made in order to use Sa & Sb.
>>
>> For example, let's say that Ca is a "swiss-army-knife" image slicer,
>> that can do:
>>
>>   a) 1 slice orthogonal to a needle, and touching the tip
>>   b) 3 orthogonal slices parallel to image axes and passing
>>        through the needle tip.
>>
>> and that Sa and Sb are respectively the independent components that
>> could do only (a) and only (b).
>>
>>  From the point of view of the application developer, in the case
>> of using Ca, the application should have an "if" statement that
>> switches between the use of functionality (a) and functionality (b)
>> at compile time or at run time (or both). In the case of using Sa
>> and Sb, the application developers must also set an "if" statement
>> indicating when to display slices using Sa, and when to use Sb.
>>
>> In this context, from the point of view of the application developer,
>> and using the concept of complexity (2), there is no difference between
>> using Ca and using Sa+Sb.
>>
>> On the other hand, the testing scenario for Ca requires to exercise
>> all the features of Sa plus all the features of Sb, with the aggravation
>> that some of the settings that make sense in the "Sb" mode of Ca,
>> may not make sense in the "Sa" mode of Ca.
>>
>>
>> Note also that it is quite likely that common functionalities of Sa
>> and Sb may be factorized into a base class Sab from which both Sa
>> and Sb will derive.
>>
>>
>> Before proceeding further with this discussion, we must define the
>> measures of complexity that we consider relevant and we should establish
>> objective methods for measuring those complexity concepts.
>>
>> ---
>>
>>
>> Again, from the pragmatic point of view, I agree with Patrick, that
>> we should probably start writing prototypes in the sandbox, and base
>> our discussions in more concrete cases. We probably will need multiple
>> iterations of design/implementation/testing on every component before
>> we find the right balance between specialization and generality.
>> On the bright side, that is what agile programming is very good at.
>>
>>
>>
>>
>>       Regards,
>>
>>
>>           Luis
>>
>>
>>
>> -----------------------
>> Frank Lindseth wrote:
>> > Luis (and others),
>> >
>> > We had a long discussion about "many simple specialized" components
>> > vs. "fewer, more complex and general components" after you had to  
>> leave
>> > the Tcon yesterday (we should probably have started with this  topic).
>> > It seems like the common opinion is that in order to make it simpler
>> > for the app. developer to satisfy the clinical user requirements   it's
>> > sensible to move a little bit in the more general direction for  
>> some of
>> > the components, at the same time the components should not  become so
>> > complex that it's not possible to test them in the ordinary  way, we
>> > have to find the right balance.
>> > I know you have strong feelings about this Luis, but do you (or  
>> anybody
>> > else for that matter) think that a compromise can be found  somewhere
>> > along the simple comp./complex app - complex comp./simple  app. line?
>> > As you know, this has been my main IGSTK concern from day one, and I
>> > really need some input as to what to except as our "IGSTK practical
>> > trial period" is about to end and we have to take the big decision
>> > regarding what to base future IGS efforts on (it looks promising
>> > regarding other issues, e.g. the "coordinate system" challenge).
>> >
>> > If we need to think in terms of concrete scenarios I believe that the
>> > slicer-comp. should be used (could be specialized both in terms of
>> > modality and functionality) .
>> > Some background information / discussion can be found here:
>> > http://public.kitware.com/IGSTKWIKI/index.php/
>> > Talk:DesignChallenges#Slicing
>> >
>> > A little scenario that can help to trigger some response to this 
>> e-mail:
>> > User/surgeon would like to have an IGS system with a certain  
>> complexity
>> > in terms of required functionality (will increase over the  years, I
>> > know...).
>> > Such an app.  could be realized in different ways depending on the  way
>> > the components are made:
>> > A) Many, simple and specialized components -> Complex app. will be
>> > needed (many obj. , switching, etc.) in order to satisfy the user 
>> above.
>> > B) Fewer, more complex and general components. -> Simple app.  (to
>> > satisfy user).
>> > C) Balanced comp. -> Balanced app.  (to satisfy user).
>> >
>> > List of points that can push the balance in one or the other direction:
>> > = User/surgeon
>> > -Overall safety (not the same as comp. safety):
>> > * It's easier to test a comp. then it is to test an app. (as long as
>> > the comp. is not to complex, i.e. up to a certain level)
>> > * A simple app. is safer and easier to test then a complex one.
>> > * A complex comp. is of course more difficult to to test then a  simple
>> > one, but we should think more like this: lets say that we have  a
>> > complex comp. Ca that offers the same functionality as two simpler
>> > comp. Sa and Sb. As long as it's possible to test Ca, knowing that  
>> this
>> > comp. work properly has added more to the overall safety then  testing
>> > Sa and Sb separately.
>> > * etc. (feel free to add points to this list)
>> >
>> > = App. developer:
>> > * In terms of building a user community, the easier it is to build a
>> > app. with a certain functionality, the better it is. The extreme case
>> > being that the app. dev. only  connect the high level comp. needed  and
>> > make everything accessible to the user trough a gui.
>> > * etc. (feel free to add points to this list)
>> >
>> > = Comp. developer:
>> > * resources for dev. maintenance, doc. testing, etc.
>> > * etc. (feel free to add points to this list)
>> >
>> > Have a nice weekend everybody.
>> > Regards,
>> > Frank
>> >
>> > _______________________________________________
>> > IGSTK-Developers mailing list
>> > IGSTK-Developers at public.kitware.com
>> > http://public.kitware.com/cgi-bin/mailman/listinfo/igstk-developers
>> >
>> _______________________________________________
>> IGSTK-Developers mailing list
>> IGSTK-Developers at public.kitware.com
>> http://public.kitware.com/cgi-bin/mailman/listinfo/igstk-developers
>>
>