[IGSTK-Developers] "many simple specialized" components vs. "fewer, more complex and general components"

Tue Jun 5 20:48:00 EDT 2007

Hi all,

I'm going to try and offer some thoughts as we are working on validating 
state machines out here (and we are either going crazy from the desert 
heat or from the complexity!). I was not part of the original tcon 
conversation on granularity, so I am having a little trouble trying to 
understand Frank's original email; I apologize if I misunderstand anything.

Some conceptual points:
1. Components should have a "natural complexity" based on their 
cohesiveness. That is, they should intuitively encapsulate a set of 
behaviors and state, like a coarse-grained object. An agile approach 
helps here because there should be mindshare as to whether a given 
component is intuitively at the right level. I would also mention that 
our community can help a lot, as the original component developers may 
assume a component is easier to understand than it is to the novice.
2. Components and classes are not the same thing (or shouldn't be). In 
IGSTK they are though, which can add some confusion. A component may be 
implemented as a collection of classes aggregated in specific and 
meaningful ways. In fact I see no reason why IGSTK components could not 
be composed from multiple classes.
3. How a component should be used is very important and should be 
reified in the software. This can be a very tricky thing to do, as many 
toolkits such as IGSTK intend for their components to be used in ways 
that are unanticipated. As components at higher levels of aggregation 
encapsulate more specific use cases (#4), their interfaces (and state 
machines) should reflect that; this is commonly referred to as 
Intentional Programming.
4. I see no reason for components to be at the same level anyway. A 
component Ca may be implemented by a specific combination of components 
Cb and Cc. In fact, this composition style is often how one moves from 
application-independent and reusable components to more 
application-specific (and thus narrowly scoped) ones.
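
As a rough illustration of points 2 and 4, here is a minimal sketch in 
Python (IGSTK itself is C++; every class and method name below is 
hypothetical and not IGSTK API). A component Ca is built from two 
lower-level components Cb and Cc and exposes one narrower, 
use-case-specific operation instead of the full interfaces of its parts:

```python
class Cb:
    """Hypothetical reusable component: holds the latest tool-tip position."""

    def __init__(self):
        self._position = (0.0, 0.0, 0.0)

    def update(self, position):
        self._position = tuple(position)

    def position(self):
        return self._position


class Cc:
    """Hypothetical reusable component: maps a coordinate to a slice index."""

    def __init__(self, spacing):
        self._spacing = spacing

    def slice_index(self, coordinate):
        return int(round(coordinate / self._spacing))


class Ca:
    """Application-specific component composed from Cb and Cc.

    Callers see a single intention-revealing operation; the parts and
    the way they are wired together stay hidden.
    """

    def __init__(self, spacing):
        self._tracker = Cb()
        self._slicer = Cc(spacing)

    def axial_slice_at_tool_tip(self, position):
        self._tracker.update(position)
        z = self._tracker.position()[2]
        return self._slicer.slice_index(z)
```

For example, Ca(2.0).axial_slice_at_tool_tip((1.0, 2.0, 10.0)) would 
return slice index 5 under these assumptions.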

Some pointed comments from reading the email trace:
- I would say there is a 4th type of complexity, and that is the 
complexity a programmer faces when programming within the IGSTK coding model. 
We've had this discussion several times before, along the lines of "are 
we making the coding model so different from standard practice that it 
actually makes our code unsafe?". I don't want to revisit that argument, 
except to say that it is a form of complexity - even if you can 
guarantee the state machine will not get into an unsafe state, you can 
still have many frustrating situations contrary to the developer's 
expectations. The Event model within IGSTK is an example to me. The 
application workflow effort (is that underway?) is an example solution.
- The example in Luis' email of 5/31 (the "swiss-army-knife") is fine, 
though I don't think it is a representative case. One really shouldn't 
implement a component in this way, using conditional logic over 
polymorphism. Of course, accounting for dynamic binding in an 
application is a form of conditional logic that needs to be tested as well.
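
To show what I mean, here is a small Python sketch (the slicer classes 
and the make_slicer helper are hypothetical names for illustration, not 
IGSTK code). The first class selects behavior with a mode flag; the 
second version uses polymorphism, and the conditional reappears only at 
the point where the application picks the concrete type:

```python
from abc import ABC, abstractmethod


class ModalSlicer:
    """'Swiss-army-knife' style: behavior is chosen by a mode flag."""

    def __init__(self, mode):
        self.mode = mode

    def slice_count(self):
        if self.mode == "needle":
            return 1  # one slice orthogonal to the needle
        if self.mode == "axes":
            return 3  # three slices parallel to the image axes
        raise ValueError(f"unknown mode: {self.mode}")


class Slicer(ABC):
    """Polymorphic style: each variant is its own small class."""

    @abstractmethod
    def slice_count(self):
        ...


class NeedleSlicer(Slicer):
    def slice_count(self):
        return 1


class AxesSlicer(Slicer):
    def slice_count(self):
        return 3


def make_slicer(kind):
    """The decision has moved here; it still has to be tested."""
    registry = {"needle": NeedleSlicer, "axes": AxesSlicer}
    return registry[kind]()
```

Either way the application makes one choice per use, but in the second 
style each class can be covered and reviewed on its own.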

On measures:
The Markov Chain analysis is interesting, but I have a few questions and comments:
1. Assuming uniform probabilities on transitions is not appropriate for 
IGSTK. In fact these components are constructed with some pretty blatant 
assumptions about what input is most likely next. We intend to create 
stochastic testing simulations in our tools to account for expected 
outcomes and fault scenarios.
2. Why is the number of states (inputs) in Ca equal to the product of 
the number of states (inputs) in Sa and Sb? As your assumption is that Sa 
and Sb are independent of one another, I would think it would be the sum 
(basically, once Ca's machine is set up and ready to run, there would be 
2 independent graphs in the machine with no interaction between them)?
3. It is counterintuitive that "a component with 100 transitions has just 
double the complexity of a component with 10 transitions". In fact the 
literature on the state explosion problem suggests it is exponential, 
not logarithmic, and that is certainly the case for our coverage tools. 
Harel's seminal work on Statecharts and HSMs is founded on compactness of 
representation (in line with your Kolmogorov complexity measure) in order 
to reduce states and transitions. This is important because your measure 
would suggest that larger-grained components are better. I think for 
IGSTK smaller state machines are better because they are more 
human-understandable and testable. The complexity that comes from 
composing components in other more flexible toolkits doesn't really 
exist in IGSTK due to its strong reliance on static binding and lack of 
external configuration files.
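
To put numbers on points 1 to 3, here is a short Python sketch of the 
entropy measure as I read it from Luis' email. The 5x7 table for Sa is 
his example; the 4x5 table for Sb and the skewed distribution are 
made-up values, used only for illustration:

```python
import math


def entropy_bits(probabilities):
    """Shannon entropy, in bits, of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def uniform(n):
    """n equally likely outcomes."""
    return [1.0 / n] * n


# Point 1: uniform vs. skewed transition probabilities (35 transitions).
k_uniform = entropy_bits(uniform(35))      # equals log2(35)
skewed = [0.5] + [0.5 / 34] * 34           # one transition fires half the time
k_skewed = entropy_bits(skewed)            # lower than k_uniform

# Point 2: product vs. sum when combining Sa and Sb into Ca.
states_a, inputs_a = 5, 7                  # from Luis' example
states_b, inputs_b = 4, 5                  # hypothetical
cells_a = states_a * inputs_a
cells_b = states_b * inputs_b
cells_product = (states_a * states_b) * (inputs_a * inputs_b)
cells_sum = cells_a + cells_b              # two non-interacting graphs

k_a = entropy_bits(uniform(cells_a))
k_b = entropy_bits(uniform(cells_b))
k_product = entropy_bits(uniform(cells_product))  # k_a + k_b
k_sum = entropy_bits(uniform(cells_sum))          # much smaller

# Point 3: the measure grows with the logarithm of the table size.
ratio = math.log2(100) / math.log2(10)     # 2.0, for 10x the transitions
```

Under the product reading K(Ca) = K(Sa) + K(Sb) holds exactly; under the 
sum reading it does not, which is why the choice of model matters for 
any threshold we set.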

We focus a lot on algorithmic complexity, and I'm not sure that was the 
original motivation for Frank's email. It would seem proper usage of 
components by application developers is. We also tend to think that our 
design reduces complexity, but complexity is an inherent attribute of 
the problem space, not the solution space - all we can do is move it 
around and manage it so our solutions are easy to understand, test, and 
reuse. On the one hand IGSTK helps with some of this but then introduces 
its own set of issues as suggested above. In any event, I think Frank's 
question is more about factoring complexity into granular components 
(and application versus framework components) than it is about the 
complexity of the algorithms and state machines. (?)

There are also a lot of metrics out there for component complexity that 
are not algorithmic. The more traditional ones in object-oriented 
programming include dependency analysis, fan-in/out, coupling, and 
cohesion. There are also complexity measures that can be applied to 
statecharts that evaluate their structure - for example average 
branching factor. These all can be readily included into a DART 
dashboard with threshold measures defined that suggest warnings on the 
dashboard much like kwStyle.
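
As a sketch of what such a structural check could look like (Python, 
with a made-up three-state machine and an arbitrary threshold; the 
state names are not taken from any IGSTK component, and the dashboard 
integration itself is not shown):

```python
def average_branching_factor(transitions):
    """Mean number of outgoing transitions per state.

    `transitions` maps each state name to a dict {input: next_state}.
    """
    if not transitions:
        return 0.0
    total = sum(len(outgoing) for outgoing in transitions.values())
    return total / len(transitions)


def fan_in(transitions):
    """Number of incoming transitions for every state."""
    counts = {state: 0 for state in transitions}
    for outgoing in transitions.values():
        for target in outgoing.values():
            counts[target] = counts.get(target, 0) + 1
    return counts


def check(transitions, max_branching=3.0):
    """Return ('OK' or 'WARNING', value) against a hypothetical threshold."""
    value = average_branching_factor(transitions)
    status = "WARNING" if value > max_branching else "OK"
    return status, value


# Hypothetical state machine used only to exercise the functions.
example_machine = {
    "Idle": {"open": "Ready"},
    "Ready": {"start": "Tracking", "close": "Idle"},
    "Tracking": {"stop": "Ready", "close": "Idle"},
}
```

A nightly job could run check() over each component's transition table 
and report the status alongside the existing style warnings.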

Finally, I'll say (as I have said before) that IGSTK tends to rely 
solely on state machines and unit testing to achieve safety. There are a 
number of safety-oriented programming and engineering practices that we 
can also look at - error/fault/failure analysis, requirements 
management, and so forth. These may not sound Agile, but I think the 
application domain necessitates our considering them.

Thanks,
K2

Luis Ibanez wrote:
>
> Hi David,
>
>
>                     That sounds reasonable.
>
>
> At this point, it is just a matter of defining an
>
>
>                 "Objective Measure of Complexity"
>
>
> and with it, we could proceed to define a Threshold of how much
> "complexity" is acceptable in an IGSTK component.
>
>
> The label "Too Complex" doesn't make any sense if we don't have an
> objective metric that can tell us how much complexity is too much
> complexity.
>
>
> Without an objective measure we will end up engaging in pointless
> discussions, because the degree of complexity will be left to the
> subjective aesthetic perception of every developer.
>
>
>
> My suggestion for objectively measuring the complexity of an IGSTK
> component is to use the notion of Markov Process / Chains:
>
>             http://en.wikipedia.org/wiki/Markov_chain
>
> in the following way:
>
>   In the State Machine of the component, take the transition table,
>   and evaluate the probabilities of every transition for being
>   invoked. Then compute the Entropy of that set of probabilities,
>   and use it as a measure of the "complexity" of the component.
>
>
> In this context, a component with 5 states and 7 inputs will
> have 35 transitions. In the plain case where all transitions are
> equally likely to be triggered, their probabilities are 1/35.
> Then the component will have a complexity of
>
>
>    K = - Sum (from 1 to 35) of (1/35) [ log( 1/35 ) / log(2) ]
>
>    K = 5.13 bits.
>
>
> A component with 20 equally probable transitions will have a
> complexity K = 4.32 bits.
>
>
> I will suggest that acceptable threshold of complexity for IGSTK
> components should be 5 bits. This corresponds to a state machine
> table of 32 equally probable transitions.
>
>
> If you look at the Wiki page that evaluates the completeness of
> the transition tables in IGSTK state machines:
>
> http://public.kitware.com/IGSTKWIKI/index.php/State_Machine_Transition_Tables_Completeness 
>
>
> you will find that the components with the maximum number of
> transitions are the ToolCalibration and the Tracker, with:
>
> ToolCalibration : 171 Transitions : 164 of which are undefined
> Tracker         :  90 Transitions :  80 of which are undefined
>
> If we assume that the undefined transitions will never happen
> (which is probably the reason why the developers never considered
> these transitions in the table in the first place), and we assume
> that the defined transitions are equally probable, then we get:
>
>
>     K( Tracker )         = 3.32 bits
>     K( ToolCalibration ) = 2.8  bits
>
>
> In the case where some of the transitions are more likely than
> others, the Entropy of the transition table will diminish and
> therefore the K measure of complexity will be lower.
>
>
> This measure of complexity reflects the intuition that a complex
> component has more functionality ("transitions"), and that it
> has more uncertainty about its current state. It also matches
> the notion that more complex components will require more lines
> of test code to reach 100% code coverage.
>
>
> Note that this measure of complexity is logarithmic in nature:
>
>
>   a component with 100 transitions has just double the
>   complexity of a component with 10 transitions.
>   That is, 6.6 bits versus 3.3 bits.
>
>
>   We should keep this in mind when we compare the complexity
>   of two components, or the complexity of two implementation
>   of the same component.
>
>
> One nice property of this suggested measure is that if
> we take two components Sa and Sb, as Frank suggested earlier,
> each one with complexity measures  K(Sa) and K(Sb) respectively,
> and we assume that their functionalities are completely orthogonal,
> that is, they are not redundant, and we fuse them together in a
> single "more complex" component, the transition table of the combined
> state machine in Ca will have a number of states equal to the product
> of the number of states in Sa times the number of states in Sb.
> Similarly its number of inputs will be the product of the number of
> inputs in Sa times the number of inputs in Sb.  As a result the
> measure of complexity of Ca will satisfy:
>
>
>             K( Ca )  =   K( Sa )   +    K( Sb )
>
>
> If Sa and Sb are not orthogonal, then the joint probability of
> their transitions will not be the product of the independent
> probabilities, and we will find that Ca has a lower complexity
> than the two independent Sa and Sb components.
>
> In this context we also can interpret the effect of factorizing
> functionality of Sa, Sb into a C++ base class Sc.
>
>
>
>     Luis
>
>
>
> -----------------
> David Gobbi wrote:
>> Hi Luis,
>>
>> I'm with Frank on the idea that complex components are preferable to
>> forcing the application programmer to write a complex app that has to
>> connect many simple components into a complex web.
>>
>> As long as a component can be fully understood, code-covered, and
>> tested, it is unfair to call that component "too complex".  Splitting
>> such a component in two "just because we can" is not a good enough
>> reason, we must also justify our decision in terms of functionality.
>>
>> A problem with specialized components is that it means we have more
>> components to test, and each component is likely to receive less
>> testing (we don't have unlimited resources).  Also, if the components
>> are too constrained, then they will only be able to serve the needs of
>> a very small audience.
>>
>> Our primary means of achieving safety should be through testing and
>> code review.  For the actual implementation of the code, we should
>> focus on functionality.
>>
>> - David
>>
>>
>> On 5/31/07, Luis Ibanez <luis.ibanez at kitware.com> wrote:
>>
>>>
>>>
>>> Hi Frank,
>>>
>>> I agree that we should strive to find the right balance
>>> in the granularity of IGSTK components.
>>>
>>>  From the Algorithmic Theory point of view, we will know
>>> whether a component is attempting to do too much or not,
>>> by counting the number of "if"-like statements in the code.
>>>
>>> That will include "if", "switch", and ternary "a?b:c"
>>> statements. When we try to engulf in a single component
>>> the functionalities that should be implemented in two or
>>> more independent components, we will find ourselves
>>> introducing:
>>>
>>>   a) large numbers of states in the State Machine, or
>>>   b) large numbers of inputs in the State Machine, or
>>>   c) "if" conditions that split the different cases, or
>>>   d) "switch" statements that split different cases
>>>
>>> Some of them will presumably be driven by "enums" and "bool"
>>> flags that set the components in "this mode" or "this other mode".
>>> The presence of these elements will be an indication of a component
>>> that has grown too complex and that should be refactored/split
>>> into simpler components.
>>>
>>> Where we draw that line is what is open for discussion,
>>> and we probably have to do it on a case by case basis.
>>>
>>>  From the pragmatic point of view, we can simply follow the practice
>>> of agile programming. Let's start by putting a prototype
>>> implementation of the component in the sandbox, and as part
>>> of its code review we can discuss if it should be split into
>>> multiple components or not.
>>>
>>> A clear sign will be how many lines of code you need in the
>>> test in order to ensure 100% code coverage of the component.
>>> So, just by following our normal development process, the
>>> components that are too complex will clearly stand out during
>>> code reviews and during continuous dashboard testing.
>>>
>>>
>>>
>>> --------
>>>
>>>
>>> Regarding the specific example that you mention:
>>>
>>> Before engaging in a discussion related to "complexity" we must
>>> define what it means and how to measure it objectively.
>>>
>>> There are multiple concepts of complexity that we may want to
>>> consider here, some of them are listed in the Wikipedia entry:
>>>
>>>       http://en.wikipedia.org/wiki/Complexity
>>>
>>> When it comes to software, there are at least three measures of
>>> complexity that are relevant:
>>>
>>>
>>> 1) How many lines of code it takes to write a program.
>>>     This complexity measure is equivalent to Kolmogorov Complexity:
>>>
>>>     http://en.wikipedia.org/wiki/Kolmogorov_complexity
>>>
>>>     where the string to be generated is the sequence of states of
>>>     the application. States, here being the full set of variables
>>>     that completely defines the application.
>>>
>>>
>>> 2) How many different options there are available for using a
>>>     program (or a routine, or a component). And therefore how
>>>     many decisions must be made by the application developer
>>>     in order to configure the application for a particular
>>>     use case.
>>>
>>>
>>> 3) How many steps are required from the user of the application
>>>     in order to perform a task. This is the "complexity" perceived
>>>     by a user.
>>>
>>>
>>> In your suggested problem, you seem to be focused on (1) and (2),
>>> rather than (3), and the underlying assumption seems to be that by
>>> increasing the complexity of the components, we may be able to
>>> reduce the complexity of an application.
>>>
>>>
>>> Following your description of the problem, let's consider
>>> the two cases:
>>>
>>>         A) a component Ca
>>>         B) two components Sa and Sb
>>>
>>> where (Ca) offers the same functionality that (Sa+Sb)
>>>
>>> and the complexity of Ca, let's call it Comp(Ca) is larger than
>>> the individual complexities of each Sa and Sb,
>>>
>>> That is
>>>
>>>          Comp(Ca)  >=  Comp(Sa)
>>>          Comp(Ca)  >=  Comp(Sb)
>>>
>>>
>>>  From the application developer point of view, if we use the notion
>>> of complexity (2), it comes down to how many decisions must
>>> be made in order to use the component Ca, versus how many decisions
>>> must be made in order to use Sa & Sb.
>>>
>>> For example, let's say that Ca is a "swiss-army-knife" image slicer,
>>> that can do:
>>>
>>>   a) 1 slice orthogonal to a needle, and touching the tip
>>>   b) 3 orthogonal slices parallel to image axes and passing
>>>        through the needle tip.
>>>
>>> and that Sa and Sb are respectively the independent components that
>>> could do only (a) and only (b).
>>>
>>>  From the point of view of the application developer, in the case
>>> of using Ca, the application should have an "if" statement that
>>> switches between the use of functionality (a) and functionality (b)
>>> at compile time or at run time (or both). In the case of using Sa
>>> and Sb, the application developers must also set an "if" statement
>>> indicating when to display slices using Sa, and when to use Sb.
>>>
>>> In this context, from the point of view of the application developer,
>>> and using the concept of complexity (2), there is no difference between
>>> using Ca and using Sa+Sb.
>>>
>>> On the other hand, the testing scenario for Ca requires exercising
>>> all the features of Sa plus all the features of Sb, with the
>>> aggravation that some of the settings that make sense in the "Sb"
>>> mode of Ca may not make sense in the "Sa" mode of Ca.
>>>
>>>
>>> Note also that it is quite likely that common functionalities of Sa
>>> and Sb may be factorized into a base class Sab from which both Sa
>>> and Sb will derive.
>>>
>>>
>>> Before proceeding further with this discussion, we must define the
>>> measures of complexity that we consider relevant and we should
>>> establish objective methods for measuring those complexity concepts.
>>>
>>> ---
>>>
>>>
>>> Again, from the pragmatic point of view, I agree with Patrick, that
>>> we should probably start writing prototypes in the sandbox, and base
>>> our discussions in more concrete cases. We probably will need multiple
>>> iterations of design/implementation/testing on every component before
>>> we find the right balance between specialization and generality.
>>> On the bright side, that is what agile programming is very good at.
>>>
>>>
>>>
>>>
>>>       Regards,
>>>
>>>
>>>           Luis
>>>
>>>
>>>
>>> -----------------------
>>> Frank Lindseth wrote:
>>> > Luis (and others),
>>> >
>>> > We had a long discussion about "many simple specialized" components
>>> > vs. "fewer, more complex and general components" after you had to
>>> > leave the Tcon yesterday (we should probably have started with this
>>> > topic).
>>> > It seems like the common opinion is that in order to make it simpler
>>> > for the app. developer to satisfy the clinical user requirements it's
>>> > sensible to move a little bit in the more general direction for some
>>> > of the components. At the same time the components should not become
>>> > so complex that it's not possible to test them in the ordinary way; we
>>> > have to find the right balance.
>>> > I know you have strong feelings about this Luis, but do you (or
>>> > anybody else for that matter) think that a compromise can be found
>>> > somewhere along the simple comp./complex app - complex comp./simple
>>> > app. line?
>>> > As you know, this has been my main IGSTK concern from day one, and I
>>> > really need some input as to what to expect as our "IGSTK practical
>>> > trial period" is about to end and we have to take the big decision
>>> > regarding what to base future IGS efforts on (it looks promising
>>> > regarding other issues, e.g. the "coordinate system" challenge).
>>> >
>>> > If we need to think in terms of concrete scenarios I believe that the
>>> > slicer-comp. should be used (could be specialized both in terms of
>>> > modality and functionality).
>>> > Some background information / discussion can be found here:
>>> > http://public.kitware.com/IGSTKWIKI/index.php/Talk:DesignChallenges#Slicing
>>> >
>>> > A little scenario that can help to trigger some response to this
>>> > e-mail:
>>> > User/surgeon would like to have an IGS system with a certain
>>> > complexity in terms of required functionality (will increase over
>>> > the years, I know...).
>>> > Such an app. could be realized in different ways depending on the
>>> > way the components are made:
>>> > A) Many, simple and specialized components -> Complex app. will be
>>> > needed (many obj., switching, etc.) in order to satisfy the user
>>> > above.
>>> > B) Fewer, more complex and general components -> Simple app. (to
>>> > satisfy user).
>>> > C) Balanced comp. -> Balanced app. (to satisfy user).
>>> >
>>> > List of points that can push the balance in one or the other
>>> > direction:
>>> > = User/surgeon
>>> > -Overall safety (not the same as comp. safety):
>>> > * It's easier to test a comp. than it is to test an app. (as long as
>>> > the comp. is not too complex, i.e. up to a certain level)
>>> > * A simple app. is safer and easier to test than a complex one.
>>> > * A complex comp. is of course more difficult to test than a simple
>>> > one, but we should think more like this: let's say that we have a
>>> > complex comp. Ca that offers the same functionality as two simpler
>>> > comp. Sa and Sb. As long as it's possible to test Ca, knowing that
>>> > this comp. works properly adds more to the overall safety than
>>> > testing Sa and Sb separately.
>>> > * etc. (feel free to add points to this list)
>>> >
>>> > = App. developer:
>>> > * In terms of building a user community, the easier it is to build an
>>> > app. with a certain functionality, the better it is. The extreme case
>>> > being that the app. dev. only connects the high level comp. needed
>>> > and makes everything accessible to the user through a GUI.
>>> > * etc. (feel free to add points to this list)
>>> >
>>> > = Comp. developer:
>>> > * resources for dev. maintenance, doc. testing, etc.
>>> > * etc. (feel free to add points to this list)
>>> >
>>> > Have a nice weekend everybody.
>>> > Regards,
>>> > Frank
>>> >
>>> > _______________________________________________
>>> > IGSTK-Developers mailing list
>>> > IGSTK-Developers at public.kitware.com
>>> > http://public.kitware.com/cgi-bin/mailman/listinfo/igstk-developers
>>> >
>>>
>>

-- 
===
Kevin A. Gary, Ph.D.
Assistant Professor
Division of Computing Studies
Arizona State University at the Polytechnic Campus
(480)727-1373
http://dcst2.east.asu.edu/~kgary
kgary at asu.edu