Complexity

From IGSTK

Jump to: navigation, search

Algorithmic complexity is an important concept to consider when addressing the design of software components.

This pages provides some background on the concept of algorithmic complexity and potential ways of measuring it using objective quantitative metrics.

Contents

What is an Algorithm

http://en.wikipedia.org/wiki/Algorithm

The concept of an algorithm originated as a means of recording procedures for solving mathematical problems such as finding the common divisor of two numbers or multiplying two numbers. The concept was formalized in 1936 through Alan Turing's Turing machines and Alonzo Church's lambda calculus, which in turn formed the foundation of computer science.

Measuring Complexity

There are different aspect of complexity that are commonly associated to Algorithms.

For example

In the context of IGSTK, since most of the C++ classes are implemented based on State Machines (close to Turing Machines), multiple measures of complexity are applicable. Here we suggest to consider further some of these measures.

Markov Chain Approach

Definition

http://en.wikipedia.org/wiki/Markov_chain

In mathematics, a Markov chain, named after Andrey Markov, is a discrete-time stochastic process with the Markov property.

A Markov chain is a series of states of a system that has the Markov property. At each time the system may have changed from the state it was in the moment before, or it may have stayed in the same state. The changes of state are called transitions.

If a sequence of states has the Markov property, it means that every future state is conditionally independent of every prior state given the current state.

Markov Property

In probability theory, a stochastic process has the Markov property if the conditional probability distribution of future states of the process, given the present state and all past states, depends only upon the present state and not on any past states, i.e. it is conditionally independent of the past states (the path of the process) given the present state. A process with the Markov property is usually called a Markov process, and may be described as Markovian.

Application to Measuring Complexity in IGSTK

The execution of a State Machine results in a sequence of transitions from state to state. Given a transition table, some of those transitions will happen more often than others. The sequence of transitions can be assimilated to a Markov chain in the sense that there is probability associated with every transition happening. We can also assume that the probability of every transition is independent of the previous transitions

A proposed measure of Complexity for IGSTK is based on the Entropy of the Markov Chain associated to the execution of the State Machine of a given IGSTK class.

In order to build this concept from the base, let's consider first a class that provides a single method whose execution is linear and does not have any internal conditions. Such class will have the lowest level of complexity.

In this context the following properties are irrelevant:

  • How much time does it take to execute the method.
  • How many lines of code were written to implement the method

We have established in previous discussions that when an algorithm is properly implemented in terms of a State Machine, its code shouldn't contain any branching instructions, such as "if", "switch" and C++ ternary operators "bool?true:false". Instead, all the branching and decision making is hard coded in the possible transitions of the State Machine. We assume here that the IGSTK class has been implemented correctly, and therefore, no branching instructions exist inside the code.

In this conditions, the richness in behavior of the class is defined by the potential number of transitions and their respective probabilities of occurrence. Considering the probabilities is important because it is possible that a few transitions may dominate the behavior of the class. In other words, a class may be most of the time cycling to a small subset of the transitions, while other transitions are very rarely triggered. Taken to the limit we can imagine that some of the transitions may simply never be triggered, and therefore they represent functionality that may not be needed and could be removed. This has to be done upon careful consideration, given that some of those transitions may correspond to error conditions that happen very rarely but when they do, they may be critical.

The richness of choices of behavior, as stated in the transition table can then be measured by the uncertainty of the transition. A class whose execution is almost linear will have a lower level of uncertainty and therefore will have a lower level of complexity. A class whose execution allows for many options, will have a high level of uncertainty, and therefore a high level of complexity. Based on this intuition we propose here to use the Entropy of the transition table as a measure of algorithmic complexity for IGSTK classes.


Example

For a hypothetical class T, with a set of three states {S1,S2,S3} and two inputs {I1,I2}, the transition table will have six entries


Transitions I1 I2
S1 S2 S3
S2 S2 S1
S3 S1 S2


Where the probabilities of occurrence of the transitions, assuming that the Markov Property holds, are equal to the values in the following table

Transitions Probabilities I1 I2
S1 0.10 0.05
S2 0.25 0.20
S3 0.35 0.05


Transition uncertainty is given by the negative log in base two of the transition probability and its measurement units are bits.

- log2P(Si,Ij)

The following table shows the uncertainty values

Transitions Uncertainty I1 I2
S1 3.32 bits 4.32 bits
S2 2.00 bits 2.32 bits
S3 1.51 bits 4.32 bits


The Entropy of the transition table is given by the sum of weighted uncertainties of each transition

- \sum_{i=0}^{3}\sum_{j=0}^{2} P(Si,Ij) \log_2 P(Si,Ij)

The weighted uncertainties are computed by multiplying the uncertainty times the probability of the transition, which results in the following table

Weighted Transitions Uncertainty I1 I2
S1 0.332 bits 0.216 bits
S2 0.500 bits 0.464 bits
S3 0.530 bits 0.216 bits

And the sum of these weighted uncertainties, provides the expected uncertainty of a transition, which is also the Entropy of the transition table.

In the case of this hypothetical class and transition table the Entropy value is

Entropy 2.2589 bits

The computation was made with the following Excel Spreadsheet

Increasing Complexity

Here we analyze the effect that adding features to a class will have on the measure of complexity.

Adding a Global boolean mode

Let's assume that we add a boolean state variable to the class in order to split its behavior in two modes. The correct way of implementing this in a state machine is by splitting all the original states into two states. That is, for an original state S, now we have states S0, S1, where S0 represents state S when the boolean flag is false, and S1 represents state S when the boolean flag is true. In this context, the number of states is double, but the number of Inputs remains constant. Therefore, the number of transitions is doubled.

For simplicity we can examine first what happens if the new states simply split uniformly the probabilities of the transition. We are not claiming that this is how it happens in practice, we are just taking here the easy case to analyze it first. When a given probability is divided by two we replace the weighted uncertainty

 U = - P \log_2 P = P \log_2 \frac{1}{P}

with the sum of two uncertainties whose probabilities are half of the original

 U2 = \frac{P}{2} \log_2 \frac{2}{P} + \frac{P}{2} \log_2 \frac{2}{P} = P \log_2 \frac{2}{P}

 U2 = P \log_2 { 2 \frac{1}{P} } = P \log_2 { 2 } + P \log_2 { \frac{1}{P} } = P + P \log_2 { \frac{1}{P} } = P + U1

If we apply this to each one of the transitions, we will end up with a new value of Entropy that is

E2 = E1 + 1bit

Which matches the intuition that adding a global boolean to a class, increases its uncertainty by 1 bit.

It is known that the partition of the two probabilities into equal values is the one that yields the maximum Entropy. Any other partition will result in a smaller Entropy. That means that if instead of splitting a transition of probability P in two transitions of probability {P/2,P/2} we choose to split it into transitions of probability { P/3, 2P/3 }, then the resulting increase of Entropy will not have been 1 bit.

 U2 = \frac{P}{3} \log_2 \frac{3}{P} + \frac{2P}{3} \log_2 \frac{3}{2P}

 U2 = \frac{P}{3} \log_2 3\frac{1}{P} + \frac{2P}{3} \log_2 \frac{3}{2} \frac{1}{P}

 U2 = \left[ \frac{P}{3} +  \frac{2P}{3} \right] \log_2 \frac{1}{P} +  \frac{P}{3} \log_2 3 + \frac{2P}{3} \log_2  \frac{3}{2}

 U2 = P \log_2 \frac{1}{P} +  \frac{P}{3} \log_2 3 + \frac{2P}{3} \log_2  3 + \frac{2P}{3} \log_2  \frac{1}{2}

 U2 = P \log_2 \frac{1}{P} +  P \log_2 3 - \frac{2P}{3}

 U2 = P \log_2 \frac{1}{P} +  P \left[ \log_2 3 - \frac{2}{3} \right]

 U2 = P \log_2 \frac{1}{P} +   0.918 P  = U1 + 0.918 P

if applied to all the transitions then we will end up with

E2 = E1 + 0.918bits

Adding a Local boolean mode

Let's consider now that the new boolean condition is not relevant to all states in the State Machine but only to a small subset of them. In this case, then the split of the uncertainty will affect a smaller fraction of the global Entropy.

From the equations of the previous section we can see that if we add the probabilities of the transitions from states that are affected by the boolean condition, that value times 1 bit is going to be the increase of uncertainty of the state machine behavior.

Pb = P
booleanrelevant
Pc = P
nobooleanrelevant

Pc + Pb = 1

And we assume still a uniform split of the probabilities of the states considered in Pb, then the increase of Entropy will be

E2 = E1 + Pbbits


Example

If we apply this to the previous hypothetical class T, that initially had a set of three states {S1,S2,S3} and two inputs {I1,I2}, and we claim that the new boolean flag is going to be relevant only for state S3, then we have new states S3t and S3f, where {t,f} stand for "true" and "false" of the boolean flag. The transition table then becomes


Transitions I1 I2
S1 S2 S3t
S2 S2 S1
S3t S1 S3f
S3f S3t S2


Where the probabilities of occurrence of the transitions, assuming that the Markov Property holds, are equal to the values in the following table

Transitions Probabilities I1 I2
S1 0.10 0.05
S2 0.25 0.20
S3t 0.28 0.03
S3f 0.07 0.02


The following table shows the uncertainty values

Transitions Uncertainty I1 I2
S1 3.32 bits 4.32 bits
S2 2.00 bits 2.32 bits
S3t 1.83 bits 3.83 bits
S3f 5.05 bits 5.64 bits


The Entropy of the transition table is given by the sum of weighted uncertainties of each transition

- \sum_{i=0}^{3}\sum_{j=0}^{2} P(Si,Ij) \log_2 P(Si,Ij)

The weighted uncertainties are computed by multiplying the uncertainty times the probability of the transition, which results in the following table

Weighted Transitions Uncertainty I1 I2
S1 0.332 bits 0.216 bits
S2 0.500 bits 0.464 bits
S3t 0.514 bits 0.268 bits
S3f 0.152 bits 0.113 bits

And the sum of these weighted uncertainties, provides the expected uncertainty of a transition, which is also the Entropy of the transition table.

In the case of this hypothetical class and transition table the Entropy value is

Entropy 2.5601 bits

Which compared to the Entropy ("Complexity") of the original class represents an increase of 0.3 bits in Complexity.


The computation was made with the following Excel Spreadsheet


The value of 1 bit should not be underestimated. It may sound small, but in practice, an increase of complexity by 1 bit implies that the effort o fully testing the code of the class has to be DOUBLED in terms of lines of code.

Adding a Enum set of N modes

If instead of adding a boolean flag, we decide to add an Enum, and based on it, control N modes of behavior, then the increase in complexity is going to be larger.

The correct conversion of a Enum set into a state machine implies to replicate the states by the number of entries in the Enum set. That is, if the Emum set has five entries representing five different modes of operation, and they are relevant for all the states of the state machine, then the number of states gets multiplied by five, and so is the number of transitions in the transition table.

Repeating the computational exercise we will find that complexity is increased by

log25 = 2.32bits

This increase will be smaller if the Enum modes are not equally probable, and even smaller, if the modes are not relevant to all the states, but just to a subset of the states.

Personal tools
TOOLBOX
LANGUAGES