[IGSTK-Developers] "many simple specialized" componentsvs. "fewer, more complex and general components"

Tue Jun 5 22:28:29 EDT 2007

Yep, we definitely need to find the right balance, ideally, a 
guideline for the future development.
I will summarize the discussion in the mailing list, and make it a 
agenda item.

Kevin Cleary wrote:
> It seems like we should at least have some brief discussion about this on
> the teleconference on Thursday
> 
> Both myself and Ziv will only be available for the first 30 minutes of the
> teleconference as we have to leave at 930 am East Coast Time for our faculty
> retreat (Patrick is lucky and does not have to go!)
> 
> So I will ask Patrick to put this at the top of the agenda and I will help
> moderate 5-10 minutes discussion (I am sure we could spend the whole tcon on
> this but we have many other things to discuss)
> 
> For the record, I am on the medium-weight component side of things, but I am
> not sure I know how to define medium ;)
> 
> Kevin
> -------------------------------------------------------------------
> 
> 
> 
> 
> -----Original Message-----
> From: igstk-developers-bounces+cleary=georgetown.edu at public.kitware.com
> [mailto:igstk-developers-bounces+cleary=georgetown.edu at public.kitware.com]
> On Behalf Of Kevin Gary
> Sent: Tuesday, June 05, 2007 8:48 PM
> To: Luis Ibanez
> Cc: IGSTK-developers
> Subject: Re: [IGSTK-Developers] "many simple specialized" componentsvs.
> "fewer, more complex and general components"
> 
> Hi all,
> 
> I'm going to try and offer some thoughts as we are working on validating 
> state machines out here (and we are either going crazy from the desert 
> heat or from the complexity!). I was not part of the original tcon 
> conversation on granularity, so I am having a little trouble trying to 
> understand Frank's original email; I apologize if I misunderstand anything.
> 
> Some conceptual points:
> 1. Components should have a "natural complexity" based on their 
> cohesiveness. That is, they should intuitively encapsulate a set of 
> behaviors and state, like a coarse-grained object. An agile approach 
> here helps because their should be mindshare as to whether a given 
> component are intuitively at the right level. I would also mention our 
> community can help a lot, as original  component developers may assume a 
> component is easier to understand than it is to the novice.
> 2. Components and classes are not the same thing (or shouldn't be). In 
> IGSTK they are though, which can add some confusion. A component may be 
> implemented as a collection of classes aggregated in specific and 
> meaningful ways. In fact I see no reason why IGSTK components could not 
> be composed from multiple classes.
> 3. How a component should be used is very important and should be 
> reified in the software. This can be a very tricky thing to do, as many 
> toolkits such as IGSTK intend for their components to be used in ways 
> that are unanticipated. As components at higher levels of aggregation 
> encapsulate more specific use cases (#4), their interfaces (and state 
> machines) should reflect that, this is commonly referred to as 
> Intentional Programming.
> 4. I see no reason for components to be at the same level anyway. A 
> component Ca may be implemented by a specific combination of components 
> Cb and Cc. In fact, this composition style is often one moves from 
> application-independent and reusable components to more 
> application-specific (and thus narrowly scoped) ones.
> 
> Some pointed comments from reading the email trace:
> - I would say there is a 4th type of complexity, and that is the 
> complexity of a programmer to program within the IGSTK coding model. 
> We've had this discussion several times before, along the lines of "are 
> we making the coding model so different from standard practice that it 
> actually makes our code unsafe?". I don't want to revisit that argument, 
> except to say that it is a form of complexity - even if you can 
> guarantee the state machine will not get into an unsafe state, you can 
> still have many frustrating situations contrary to the developer's 
> expectations. The Event model within IGSTK is an example to me. The 
> application workflow effort (is that underway?) is an example solution.
> - The example in Luis' email of 5/31 (the "swiss-army-knife") is fine 
> though I don't think it is a representative case. One really shouldn't 
> implement a component in this way, using conditional logic over 
> polymorphism. Of course, accounting for dynamic binding in an 
> application is a form of conditional logic that needs to be tested as well.
> 
> On measures:
> The Markov Chain analysis is interesting, but I have a couple questions:
> 1. Assuming uniform probabilities on transitions is not appropriate for 
> IGSTK. In fact these components are constructed with some pretty blatant 
> assumptions about what input is most likely next. We intend to create 
> stochastic testing simulations in our tools to account for expected 
> outcomes and fault scenarios.
> 2. Why is the number of states (inputs) in Ca equal to the product of 
> the number of states (inputs) in Sa and Sb? As your assumption is Sa and 
> Sb are independent of one another, I would think it would be the sum 
> (basically, once the Ca's machine is set ready to run, there would be 2 
> independent graphs in the machine with no interaction between them)?
> 3. It is counterintuitive that "a component with 100 transition has just 
> double the complexity of a component with 10 transitions". In fact the 
> literature on the state explosion problem suggests it is exponential, 
> not logarithmic, and that is certainly the case for our coverage tools. 
> Harel's seminal work on Statecharts and HSMs is found on compactness of 
> representation (inline with your Kolmogorov complexity measure) in order 
> to reduce states and transitions. This is important because your measure 
> would suggest that larger-grained components are better. I think for 
> IGSTK smaller state machines are better because they are more 
> human-understandable and testable. The complexity that comes from 
> composing components in other more flexible toolkits doesn't really 
> exist in IGSTK due to its strong reliance on static binding and lack of 
> external configuration files.
> 
> We focus a lot on algorithmic complexity, and I'm not sure that was the 
> original motivation for Frank's email. It would seem proper usage of 
> components by application developers is. We also tend to think that our 
> design reduces complexity, but complexity is an inherent attribute of 
> the problem space, not the solution space - all we can do is move it 
> around and manage it so our solutions are easy to understand, test, and 
> reuse. On the one hand IGSTK helps with some of this but then introduces 
> its own set of issues as suggested above. In any event, I think Frank's 
> question is more about factoring complexity into granular components 
> (and application versus framework components) than it is about the 
> complexity of the algorithms and state machines. (?)
> 
> There are also a lot of metrics out there for component complexity that 
> are not algorithmic. The more traditional ones in object-oriented 
> programming include dependency analysis, fan-in/out, coupling, and 
> cohesion. There are also complexity measures that can be applied to 
> statecharts that evaluate their structure - for example average 
> branching factor. These all can be readily included into a DART 
> dashboard with threshold measures defined that suggest warnings on the 
> dashboard much like kwStyle.
> 
> Finally, I'll say 9as I have said before) that IGSTK tends to rely 
> solely on state machines and unit testing to achieve safety. There are a 
> number of safety-oriented programming and engineering practices that we 
> can also look at - error/fault/failure analysis, requirements 
> management, and so forth. These may not sound Agile, but I think the 
> application domain necessitates our considering them.
> 
> Thanks,
> K2
> 
> 
> 
> 
> Luis Ibanez wrote:
>> Hi David,
>>
>>
>>                     That sounds reasonable.
>>
>>
>> At this point, it is just a matter of defining an
>>
>>
>>                 "Objective Measure of Complexity"
>>
>>
>> and with it, we could proceed to define a Threshold of how much
>> "complexity" is acceptable in an IGSTK component.
>>
>>
>> The label "Too Complex" doesn't make any sense if we don't have an
>> objective metric that can tell use how much complexity is too much
>> complexity.
>>
>>
>> Without an objective measure we will end up engaging in pointless
>> discussions, because the degree of complexity will be left to the
>> subjective aesthetic perception of every developer.
>>
>>
>>
>> My suggestion for objectively measuring the complexity of an IGSTK
>> component is to use the notion of Markov Process / Chains:
>>
>>             http://en.wikipedia.org/wiki/Markov_chain
>>
>> in the following way:
>>
>>   In the State Machine of the component, take the transition table,
>>   and evaluate the probabilities of every transition for being
>>   invoked. Then compute the Entropy of that set of probabilities,
>>   and use it as a measure of the "complexity" of the component.
>>
>>
>> In this context, a component with 5 states, and 7 inputs, will
>> have 35 transitions. In the plain case were all transitions are
>> equally likely to be triggered, their probabilities are 1/35.
>> then the component will have a complexity of
>>
>>
>>    K = - Sum (from 1 to 35) of (1/35) [ log( 1/35 ) / log(2) ]
>>
>>    K = 5.12 bits.
>>
>>
>> A component with 20 equally probable transitions will have a
>> complexity K = 4.32 bits.
>>
>>
>> I will suggest that acceptable threshold of complexity for IGSTK
>> components should be 5 bits. This corresponds to a state machine
>> table of 32 equally probable transitions.
>>
>>
>> If you look a the Wiki page that evaluates the completeness of
>> the transition tables in IGSTK state machines:
>>
>>
> http://public.kitware.com/IGSTKWIKI/index.php/State_Machine_Transition_Table
> s_Completeness 
>>
>> you will find that the components with the maximum number of
>> transitions are the ToolCalibration ant the Tracker, with:
>>
>> ToolCalibration : 171 Transitions : 164 of which are undefined
>> Tracker         :  90 Transitions :  80 of which are undefined
>>
>> If we assume that the undefined transitions will never happen,
>> (which is probably the reason why the developers never considered
>> this transitions in the table, in the first place), and we assume
>> that the defined transitions are equally probable, then we get:
>>
>>
>>     K( Tracker )         = 3.32 bits
>>     K( ToolCalibration ) = 2.8  bits
>>
>>
>> In the case where some of the transitions are more likely than
>> others, the Entropy of the transition table will diminish and
>> therefore the K measure of complexity will be lower.
>>
>>
>> This measure of complexity reflects the intuition that a complex
>> components have more functionality ("transitions"), and that it
>> has more uncertainty about its current state. It also matches
>> the notion that more complex components will require more lines
>> of code for performing a 100% code coverage.
>>
>>
>> Note that this measure of complexity is logarithmic in nature:
>>
>>
>>   a component with 100 transitions has just the double
>>   of complexity of a component with 10 transitions.
>>   That is, 6.6 bits versus 3.3 bits.
>>
>>
>>   We should keep this in mind when we compare the complexity
>>   of two components, or the complexity of two implementation
>>   of the same component.
>>
>>
>> One nice property of this suggested measure is that if
>> we take two components Sa and Sb, as Frank suggested earlier,
>> each one with complexity measures  K(Sa) and K(Sb) respectively,
>> and we assume that their functionalities are completely orthogonal,
>> that is, they are not redundant, and we fuse them together in a
>> single "more complex" component, the transition table of the combined
>> state machine in Ca will have a number of states equal to the product
>> of the number of states in Sa times the number of states in Sb.
>> Similarly its number of inputs will be the product of the number of
>> inputs in Sa times the number of inputs in Sb.  As a result the
>> measure of complexity of Ca will satisfy:
>>
>>
>>             K( Ca )  =   K( Sa )   +    K( Sb )
>>
>>
>> If Sa and Sb are not orthogonal, then the joint probability of
>> their transitions will not be the produce of the independent
>> probabilities, and we will find that Ca has a lower complexity
>> than the two independent Sa and Sb components.
>>
>> In this context we also can interpret the effect of factorizing
>> functionality of Sa, Sb into a C++ base class Sc.
>>
>>
>>
>>     Luis
>>
>>
>>
>> -----------------
>> David Gobbi wrote:
>>> Hi Luis,
>>>
>>> I'm with Frank on the idea that complex components are preferable to
>>> forcing the application programmer to write a complex app that has to
>>> connect many simple components into a complex web.
>>>
>>> As long as a component can be fully understood, code-covered, and
>>> tested, it is unfair to call that component "too complex".  Splitting
>>> such a component in two "just because we can" is not a good enough
>>> reason, we must also justify our decision in terms of functionality.
>>>
>>> A problem with specialized components is that it means we have more
>>> components to test, and each component is likely to receive less
>>> testing (we don't have unlimited resources).  Also, if the components
>>> are too constrained, then they will only be able to serve the needs of
>>> a very small audience.
>>>
>>> Our primary means of achieving safety should be through testing and
>>> code review.  For the actual implementation of the code, we should
>>> focus on functionality.
>>>
>>> - David
>>>
>>>
>>> On 5/31/07, Luis Ibanez <luis.ibanez at kitware.com> wrote:
>>>
>>>>
>>>> Hi Frank,
>>>>
>>>> I agree that we should strive to find the right balance
>>>> in the granularity of IGSTK components.
>>>>
>>>>  From the Algorithmic Theory point of view, we will know
>>>> whether a component is attempting to do too much or not,
>>>> by counting the number of "if"-like statements in the code.
>>>>
>>>> That will include "if", "switch", and ternary "a?b:c"
>>>> statements. When we try to engulf in a single component
>>>> the functionalities that should be implemented in two or
>>>> more independent components, we will find ourselves
>>>> introducing:
>>>>
>>>>   a) large numbers of states in the State Machine, or
>>>>   b) large numbers of inputs in the State Machine, or
>>>>   c) "if" conditions that split the different cases, or
>>>>   d) "switch" statements that split different cases
>>>>
>>>> Some of them will presumably be driven by "enums" and "bool"
>>>> flags that set the components in "this mode" or "this other mode".
>>>> The presence of these elements will be an indication of a component
>>>> that has grown too complex and that should be refactored/slit
>>>> into simpler components.
>>>>
>>>> Where do we draw that line, is what is open for discussion,
>>>> and we probably have to do it on a case by case basis.
>>>>
>>>>  From the pragmatic point of view, we can simply follow the practice
>>>> of agile programming. Let's start by putting a prototype
>>>> implementation of the component in the sandbox, and as part
>>>> of its code review we can discuss if it should be split into
>>>> multiple components or not.
>>>>
>>>> A clear sign will be how many lines of code do you need in the
>>>> test in order to ensure 100% code coverage of the component.
>>>> So, just by following our normal development process, the
>>>> components that are too complex will clearly stand out during
>>>> code reviews and during continuous dashboard testing.
>>>>
>>>>
>>>>
>>>> --------
>>>>
>>>>
>>>> Regarding the specific example that you mention:
>>>>
>>>> Before engaging in a discussion related to "complexity" we must
>>>> define what it means and how to measure it objectively.
>>>>
>>>> There are multiple concepts of complexity that we may want to
>>>> consider here, some of them are listed in the Wikipedia entry:
>>>>
>>>>       http://en.wikipedia.org/wiki/Complexity
>>>>
>>>> When it comes to software, there are at least two measures of
>>>> complexity that are relevant:
>>>>
>>>>
>>>> 1) How many lines of code it takes to write a program.
>>>>     This complexity measure is equivalent to Kolmogorov Complexity:
>>>>
>>>>     http://en.wikipedia.org/wiki/Kolmogorov_complexity
>>>>
>>>>     where the string to be generated is the sequence of states of
>>>>     the application. States, here being the full set of variables
>>>>     that completely defines the application.
>>>>
>>>>
>>>> 2) How many different options there are available for using a
>>>>     program (or a routine, or a component). And therefore how
>>>>     many decision should be made by the application developer
>>>>     in order to configure the application for a particular
>>>>     user case.
>>>>
>>>>
>>>> 3) How many steps are required from the user of the application
>>>>     in order to perform a task. This is the "complexity" perceived
>>>>     by a user.
>>>>
>>>>
>>>> In your suggested problem, you seem to be focused on (1) and (2),
>>>> rather than (3), and the underlying assumption seems to be that by
>>>> increasing the complexity of the components, we may be able to
>>>> reduce the complexity of an application.
>>>>
>>>>
>>>> Following your description of the problem, let's consider
>>>> the two cases:
>>>>
>>>>         A) a component Ca
>>>>         B) two components Sa and Sb
>>>>
>>>> where (Ca) offers the same functionality that (Sa+Sb)
>>>>
>>>> and the complexity of Ca, let's call it Comp(Ca) is larger than
>>>> the individual complexities of each Sa and Sb,
>>>>
>>>> That is
>>>>
>>>>          Comp(Ca)  >=  Comp(Sa)
>>>>          Comp(Ca)  >=  Comp(Sb)
>>>>
>>>>
>>>>  From the application developer point of view, if we use the notion
>>>> of complexity (2), it comes down to how many method decision should
>>>> be made in order to use the component Ca, versus, how many decision
>>>> should be made in order to use Sa & Sb.
>>>>
>>>> For example, let's say that Ca is a "swiss-army-knife" image slicer,
>>>> that can do:
>>>>
>>>>   a) 1 slice orthogonal to a needle, and touching the tip
>>>>   b) 3 orthogonal slices parallel to image axes and passing
>>>>        through the needle tip.
>>>>
>>>> and that Sa and Sb are respectively the independent components that
>>>> could do only (a) and only (b).
>>>>
>>>>  From the point of view of the application developer, in the case
>>>> of using Ca, the application should have an "if" statement that
>>>> switches between the use of functionality (a) and functionality (b)
>>>> at compile time or at run time (or both). In the case of using Sa
>>>> and Sb, the application developers must also set an "if" statement
>>>> indicating when to display slices using Sa, and when to use Sb.
>>>>
>>>> In this context, from the point of view of the application developer,
>>>> and using the concept of complexity (2), there is no difference between
>>>> using Ca and using Sa+Sb.
>>>>
>>>> On the other hand, the testing scenario for Ca requires to exercise
>>>> all the features of Sa plus all the features of Sb, with the 
>>>> aggravation
>>>> that some of the settings that make sense in the "Sb" mode of Ca,
>>>> may not make sense in the "Sa" mode of Ca.
>>>>
>>>>
>>>> Note also that it is quite likely that common functionalities of Sa
>>>> and Sb may be factorized into a base class Sab from which both Sa
>>>> and Sb will derive.
>>>>
>>>>
>>>> Before proceeding further with this discussion, we must define the
>>>> measures of complexity that we consider relevant and we should 
>>>> establish
>>>> objective methods for measuring those complexity concepts.
>>>>
>>>> ---
>>>>
>>>>
>>>> Again, from the pragmatic point of view, I agree with Patrick, that
>>>> we should probably start writing prototypes in the sandbox, and base
>>>> our discussions in more concrete cases. We probably will need multiple
>>>> iterations of design/implementation/testing on every component before
>>>> we find the right balance between specialization and generality.
>>>> On the bright side, that is what agile programming is very good at.
>>>>
>>>>
>>>>
>>>>
>>>>       Regards,
>>>>
>>>>
>>>>           Luis
>>>>
>>>>
>>>>
>>>> -----------------------
>>>> Frank Lindseth wrote:
>>>>> Luis (and others),
>>>>>
>>>>> We had a long discussion about "many simple specialized" components
>>>>> vs. "fewer, more complex and general components" after you had to  
>>>> leave
>>>>> the Tcon yesterday (we should probably have started with this  
>>>> topic).
>>>>> It seems like the common opinion is that in order to make it simpler
>>>>> for the app. developer to satisfy the clinical user requirements   
>>>> it's
>>>>> sensible to move a little bit in the more general direction for  
>>>> some of
>>>>> the components, at the same time the components should not  become so
>>>>> complex that it's not possible to test them in the ordinary  way, we
>>>>> have to find the right balance.
>>>>> I know you have strong feelings about this Luis, but do you (or  
>>>> anybody
>>>>> else for that matter) think that a compromise can be found  somewhere
>>>>> along the simple comp./complex app - complex comp./simple  app. line?
>>>>> As you know, this has been my main IGSTK concern from day one, and I
>>>>> really need some input as to what to except as our "IGSTK practical
>>>>> trial period" is about to end and we have to take the big decision
>>>>> regarding what to base future IGS efforts on (it looks promising
>>>>> regarding other issues, e.g. the "coordinate system" challenge).
>>>>>
>>>>> If we need to think in terms of concrete scenarios I believe that the
>>>>> slicer-comp. should be used (could be specialized both in terms of
>>>>> modality and functionality) .
>>>>> Some background information / discussion can be found here:
>>>>> http://public.kitware.com/IGSTKWIKI/index.php/
>>>>> Talk:DesignChallenges#Slicing
>>>>>
>>>>> A little scenario that can help to trigger some response to this 
>>>> e-mail:
>>>>> User/surgeon would like to have an IGS system with a certain  
>>>> complexity
>>>>> in terms of required functionality (will increase over the  years, I
>>>>> know...).
>>>>> Such an app.  could be realized in different ways depending on 
>>>> the  way
>>>>> the components are made:
>>>>> A) Many, simple and specialized components -> Complex app. will be
>>>>> needed (many obj. , switching, etc.) in order to satisfy the user 
>>>> above.
>>>>> B) Fewer, more complex and general components. -> Simple app.  (to
>>>>> satisfy user).
>>>>> C) Balanced comp. -> Balanced app.  (to satisfy user).
>>>>>
>>>>> List of points that can push the balance in one or the other 
>>>> direction:
>>>>> = User/surgeon
>>>>> -Overall safety (not the same as comp. safety):
>>>>> * It's easier to test a comp. then it is to test an app. (as long as
>>>>> the comp. is not to complex, i.e. up to a certain level)
>>>>> * A simple app. is safer and easier to test then a complex one.
>>>>> * A complex comp. is of course more difficult to to test then a  
>>>> simple
>>>>> one, but we should think more like this: lets say that we have  a
>>>>> complex comp. Ca that offers the same functionality as two simpler
>>>>> comp. Sa and Sb. As long as it's possible to test Ca, knowing 
>>>> that  this
>>>>> comp. work properly has added more to the overall safety then  
>>>> testing
>>>>> Sa and Sb separately.
>>>>> * etc. (feel free to add points to this list)
>>>>>
>>>>> = App. developer:
>>>>> * In terms of building a user community, the easier it is to build a
>>>>> app. with a certain functionality, the better it is. The extreme case
>>>>> being that the app. dev. only  connect the high level comp. 
>>>> needed  and
>>>>> make everything accessible to the user trough a gui.
>>>>> * etc. (feel free to add points to this list)
>>>>>
>>>>> = Comp. developer:
>>>>> * resources for dev. maintenance, doc. testing, etc.
>>>>> * etc. (feel free to add points to this list)
>>>>>
>>>>> Have a nice weekend everybody.
>>>>> Regards,
>>>>> Frank
>>>>>
>>>>> _______________________________________________
>>>>> IGSTK-Developers mailing list
>>>>> IGSTK-Developers at public.kitware.com
>>>>> http://public.kitware.com/cgi-bin/mailman/listinfo/igstk-developers
>>>>>
>>>> _______________________________________________
>>>> IGSTK-Developers mailing list
>>>> IGSTK-Developers at public.kitware.com
>>>> http://public.kitware.com/cgi-bin/mailman/listinfo/igstk-developers
>>>>
>> _______________________________________________
>> IGSTK-Developers mailing list
>> IGSTK-Developers at public.kitware.com
>> http://public.kitware.com/cgi-bin/mailman/listinfo/igstk-developers
> 
>