Error Handling

From IGSTK

Preface

This page is for discussion of how IGSTK should detect and respond to error conditions. Some of the kinds of errors that can occur during use of IGSTK have already been discussed on the Tracker_Hazards page.

Kinds of Errors

The following are the kinds of errors that IGSTK will have to respond to:

  1. Input/Output Errors
    1. Readers/Writers unable to read/write a file (file doesn't exist or is corrupt)
    2. Tracker unable to communicate with tracking device
    3. Network connection is broken (if networking is necessary)
  2. Bad or Missing Data Errors
    1. No line-of-sight when attempting to measure Polaris tool location
    2. Registration failure (landmarks incorrectly identified)
  3. Invalid/Illegal requests by an application using the toolkit
    1. For example, landmark registration request by an application before landmark points are established

(Does anyone want to expand on this?)

Functional Requirements

There are no existing IGSTK requirements for error handling. The following could be used as a starting point. The first two are toolkit-level requirements, and the others are application-level requirements.

  1. All Input/Output errors, whether related to the file system, the network, or any connected devices, must be detected.
  2. All situations where required data is missing or unusable must be detected.
  3. When a recoverable error occurs, the user must be informed and given instructions on how to remedy the error.
  4. When a recoverable error occurs, the user must be given the option of declaring it to be unrecoverable, via a mechanism similar to a "Retry/Fail" dialog box.
  5. When a nonrecoverable error occurs, the user must be informed and the component that suffered the error must become unavailable.

Design

From a design perspective, we must consider how IGSTK classes will notify other classes (or the application) that an error has occurred. There are several options. They are not mutually exclusive; we can implement any or all of them.

  1. Return Values (methods return a value indicating success or failure)
  2. Error Variables (an object has an internal Error variable and a GetError() method)
  3. Exceptions (meaning exceptions as defined by the C++ language standard)
  4. Events (meaning IGSTK events, which are derived from ITK's events)

There are several "boundaries" between layers of code where errors will have to be handled:

  1. the boundary between an object's code and the operating system, e.g. access to files, devices, and the network
  2. the boundary between objects, e.g. between a tracker and a communicator, or between a spatial object and a tracker
  3. the boundary between objects and the application
  4. the boundary between the application and the user

Return Values

Having a method return a boolean error value can provide very compact code.

if (!m_Communication->Read(data))
  {
  // handle the error here
  }

Currently, return codes are used by classes derived from igstkTracker and igstkSerialCommunication to report the success or failure of internal function calls. In the following chunk of code, we use the boolean value from InternalOpenPort() to directly set the input to the state machine:

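void SerialCommunication::AttemptToOpenPort()
{
  // InternalOpenPort() reports success or failure; the boolean selects
  // which of the two inputs is pushed to the state machine.
  m_StateMachine.PushInputBoolean( (bool)InternalOpenPort(), 
                                   m_OpenPortSuccessInput,
                                   m_OpenPortFailureInput );
}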
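To make the pattern concrete, here is a self-contained sketch that imitates what PushInputBoolean() does: one of two inputs is selected according to a boolean result. MiniStateMachine, MiniSerialCommunication, the string inputs and the InternalOpenPort() stub are all hypothetical stand-ins, not the real igstk::StateMachine API.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a state machine input; the real IGSTK type differs.
using Input = std::string;

class MiniStateMachine
{
public:
  // Queue one of two inputs depending on a boolean result, mimicking
  // the role PushInputBoolean() plays in the snippet above.
  void PushInputBoolean(bool condition, const Input & ifTrue, const Input & ifFalse)
    {
    m_Queue.push_back(condition ? ifTrue : ifFalse);
    }

  const std::vector<Input> & GetQueue() const { return m_Queue; }

private:
  std::vector<Input> m_Queue;
};

class MiniSerialCommunication
{
public:
  explicit MiniSerialCommunication(bool portAvailable) : m_PortAvailable(portAvailable) {}

  void AttemptToOpenPort()
    {
    // The return value of the internal call feeds the state machine directly.
    m_StateMachine.PushInputBoolean(InternalOpenPort(),
                                    "OpenPortSuccessInput",
                                    "OpenPortFailureInput");
    }

  const MiniStateMachine & GetStateMachine() const { return m_StateMachine; }

private:
  // Stub: a real implementation would open the serial device here.
  bool InternalOpenPort() { return m_PortAvailable; }

  bool m_PortAvailable;
  MiniStateMachine m_StateMachine;
};
```

Because the boolean is consumed inside the class that produced it, no outside caller has the opportunity to forget to check it.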

This kind of error checking works very nicely, but when a method is called from application code there is always the chance that the caller will not check the return value, and the error will then go undetected.

Return values are an ideal method of handling errors within a single class because of their simplicity.
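If return values are also used at the object-application boundary, a C++17 compiler can help with the unchecked-return-value problem: the standard [[nodiscard]] attribute makes the compiler warn when a call's result is discarded (a warning, not an error, unless warnings are treated as errors). The Communication class, its Read() signature and the ReadOnce() helper below are hypothetical, for illustration only.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical communication class used only for illustration.
class Communication
{
public:
  explicit Communication(std::string buffer) : m_Buffer(std::move(buffer)) {}

  // [[nodiscard]] asks the compiler to warn if a caller ignores the result.
  [[nodiscard]] bool Read(std::string & data)
    {
    if (m_Buffer.empty())
      {
      return false;   // nothing to read: report failure
      }
    data = m_Buffer;
    m_Buffer.clear();
    return true;
    }

private:
  std::string m_Buffer;
};

// A caller that checks the result, as the text recommends.
bool ReadOnce(Communication & comm, std::string & data)
{
  if (!comm.Read(data))
    {
    // handle the error here, e.g. set a state machine input
    return false;
    }
  return true;
}
```

With this declaration, a bare `comm.Read(data);` statement still compiles, but GCC and Clang emit a warning about the ignored result.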

Error Variables

Each IGSTK object could have a "GetError()" method that would return zero if no error has occurred within the object, or an enumerated error code if an error has occurred.

Every time a method is called, that method would clear the m_Error variable if no error occurred, or set the m_Error variable if an error did occur.

m_Communication->Read(data);
if (m_Communication->GetError() == ERROR_VALUE)
  {
  // handle the error here
  }

The code is not as compact as for return codes, but the advantage is that error variables can be used even if the method already returns something:

data = m_Communication->Read();
if (m_Communication->GetError())   // nonzero means an error occurred
  {
  // handle the error here
  }

Or, if the error code is used to drive the state machine:

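void Tracker::AttemptToReadFromCommunicationObject()
{
  m_Communication->Read();
  // SUCCESS is assumed to be an enumerator of the communication class; it must be
  // named through the class (scope operator), not through the m_Communication pointer.
  m_StateMachine.PushInputBoolean( (m_Communication->GetError() == SerialCommunication::SUCCESS), 
                                   m_ReadSuccessInput,
                                   m_ReadFailureInput );
}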
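A self-contained sketch of the error-variable pattern, assuming a hypothetical ErrorCode enumeration nested in the communication class (SUCCESS mirrors the snippet above; TIMEOUT is invented for illustration). The enumeration is reached through the class name, and each call to Read() overwrites the previous error, as described earlier.

```cpp
#include <cassert>
#include <string>

// Hypothetical communication class with an internal error variable.
class Communication
{
public:
  // Hypothetical error codes; SUCCESS is zero, as described in the text.
  enum ErrorCode { SUCCESS = 0, TIMEOUT };

  explicit Communication(ErrorCode nextResult) : m_NextResult(nextResult) {}

  // Read() returns the data and records the outcome in m_Error.
  std::string Read()
    {
    m_Error = m_NextResult;   // every call overwrites the previous error
    return (m_Error == SUCCESS) ? std::string("data") : std::string();
    }

  ErrorCode GetError() const { return m_Error; }

private:
  ErrorCode m_NextResult;     // stub: simulates the device's behaviour
  ErrorCode m_Error = SUCCESS;
};

// Mirrors Tracker::AttemptToReadFromCommunicationObject(): returns the name
// of the input that would be pushed to the state machine.
std::string AttemptToRead(Communication & comm)
{
  comm.Read();
  return (comm.GetError() == Communication::SUCCESS) ? "ReadSuccessInput"
                                                      : "ReadFailureInput";
}
```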

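There is one major drawback to error variables: if the method we are calling can be called from more than one thread, it is difficult to ensure that GetError() gives the correct result, because another thread may call a method on the same object and overwrite the error variable between the original call and the GetError() check.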

As an example of this, the standard "C" library uses the "errno" value to store the error values of system function calls. This caused lots of problems for "C" library developers when people started writing multithreaded applications, because "errno" had to be a different value depending on which thread it was checked from!

Currently the igstkNDICommandInterpreter class has a GetError() method, but the only reason for this is that it was derived from an older package that used GetError(). It is okay to use GetError() for classes that will only be accessed from a single thread, but in general, using return values is safer and is preferred.

Exceptions

Code for catching exceptions looks like this:

try
  {
  m_Communication->Read(data);
  }
catch (...)   // catches any exception; a real handler would usually catch a specific type
  {
  // handle the error here
  }

The advantage of using exceptions is that they cannot "slip by unnoticed". If the caller of a method does not catch an exception, it propagates up the call stack, and if no function on the stack catches it, std::terminate() is called and the program stops immediately.
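The propagation argument can be sketched as follows, assuming a hypothetical CommunicationError type derived from std::runtime_error: Read() throws, the intermediate UpdateTracker() does not catch anything, and the exception still reaches the outermost handler. All function names here are illustrative.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical exception type for communication failures.
class CommunicationError : public std::runtime_error
{
public:
  explicit CommunicationError(const std::string & what) : std::runtime_error(what) {}
};

// Throws instead of returning a status, so the failure cannot be ignored.
std::string Read(bool deviceResponds)
{
  if (!deviceResponds)
    {
    throw CommunicationError("no response from tracking device");
    }
  return "data";
}

// Intermediate layer: it checks nothing, yet the error still propagates.
std::string UpdateTracker(bool deviceResponds)
{
  return Read(deviceResponds);
}

// Outermost layer: catches by const reference and reports the message.
std::string RunOnce(bool deviceResponds)
{
  try
    {
    return UpdateTracker(deviceResponds);
    }
  catch (const CommunicationError & e)
    {
    return std::string("error: ") + e.what();
    }
}
```

If the try/catch in RunOnce() were removed, the same failure would end the program through std::terminate() rather than letting it continue in an unknown state.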

In contrast, if return values or error variables are used, and the caller neglects to check whether an error occurred, the program might crash, or it might continue marching along in an unstable state.

Exceptions are used in ITK but not in VTK. They are not used anywhere in IGSTK.

Good Practices

The use of exceptions requires that developers follow basic guidelines on the organization of the code. The following list of conditions is one typical way of ensuring that exceptions are used safely in the code.

  • Basic Condition: After an exception is thrown, there is no leakage of resources (memory, file handles, mutex locks)
  • Weak Condition: After an exception is thrown, the class is left in a valid state
  • Strong Condition: After an exception is thrown, the class is left in the previous state

The strong condition is of course the safest, and it is also the hardest one to achieve in practice.
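In C++, the Basic condition is usually met with RAII (resources owned by objects whose destructors release them), and the Strong condition with the copy-and-swap idiom: do all the work that might throw on a copy, then commit with a swap that does not throw. The LandmarkList class and its validation rule are hypothetical, chosen only to produce a failure partway through an update.

```cpp
#include <cassert>
#include <stdexcept>
#include <vector>

// Hypothetical container of registration landmark coordinates.
class LandmarkList
{
public:
  const std::vector<double> & GetPoints() const { return m_Points; }

  // Strong condition: either all points are added, or the object is unchanged.
  void AddPoints(const std::vector<double> & points)
    {
    std::vector<double> copy = m_Points;   // may throw std::bad_alloc; *this untouched
    for (double p : points)
      {
      if (p < 0.0)
        {
        // Throwing here leaves m_Points exactly as it was.
        throw std::invalid_argument("negative coordinate");
        }
      copy.push_back(p);
      }
    m_Points.swap(copy);                   // commit: swapping vectors does not throw
    }

private:
  // std::vector owns its memory (RAII), so an exception cannot leak it:
  // this gives the Basic condition without extra code.
  std::vector<double> m_Points;
};
```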

A possible methodology for implementing the Strong condition is contract programming (also known as design by contract), where every transaction between classes involves:

  • Pre-condition
  • Action
  • Post-condition

The pre-condition must be validated before the class attempts to perform the Action. The post-condition is guaranteed to be true after the execution of the Action.

In terms of contract programming, the pre-condition must be satisfied by the client of the class, while the post-condition is guaranteed by the class that provides the service.
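A minimal contract-style sketch, using the "registration before landmarks are established" error from the list at the top of the page. LandmarkRegistration and Point3 are hypothetical, and the threshold of three pairs is illustrative (three non-collinear points are the minimum for a rigid registration). The pre-condition is checked before the Action, so a violation leaves the object unchanged; the post-condition is checked with assert after it.

```cpp
#include <cassert>
#include <stdexcept>
#include <vector>

// Hypothetical point type and registration component, for illustration only.
struct Point3 { double x, y, z; };

class LandmarkRegistration
{
public:
  void AddLandmarkPair(const Point3 & imagePoint, const Point3 & trackerPoint)
    {
    m_ImagePoints.push_back(imagePoint);
    m_TrackerPoints.push_back(trackerPoint);
    }

  void Register()
    {
    // Pre-condition (the client's duty): at least three landmark pairs.
    if (m_ImagePoints.size() < 3)
      {
      throw std::logic_error("registration requested before landmarks were set");
      }

    // Action: placeholder for the real transform computation.
    m_Registered = true;

    // Post-condition (the class's promise), checked in debug builds only.
    assert(IsRegistered() && m_ImagePoints.size() == m_TrackerPoints.size());
    }

  bool IsRegistered() const { return m_Registered; }

private:
  std::vector<Point3> m_ImagePoints;
  std::vector<Point3> m_TrackerPoints;
  bool m_Registered = false;
};
```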

Events

Events are used in VTK to report errors. Each object can generate events, and any other object can set up an "observer" for those events. While events can be used for error handling, that's not what they are really meant for. Both VTK and ITK use events primarily as a means of driving the user interface, and for passing miscellaneous information between components.

The three approaches above (return values, error variables, exceptions) all revolve around the following construct:

  • attempt to do something by executing code A
    • if the attempt failed, execute code B
    • if the attempt succeeded, execute code C

Events just don't fit this construct.

Events can be useful for error reporting in IGSTK when passing information about an error to the application, so that the application can pop up a dialog box for the user.

In particular, events could be useful for reporting errors to the user where the user can take action to remedy the error, since after the observer code has been executed, the code execution returns to the point in the original object where the event was generated.

For example,

  • attempt an IO operation
    • if operation succeeds, call success function
    • if operation fails, call failure function

Then the failure function would be like this:

  • generate an event indicating failure
    • if user indicates "retry", then set a state machine input that will re-attempt the IO operation
    • if the user indicates "fail", then switch from "attempting" state to the "failure" state
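The failure function above can be sketched as follows, assuming a hypothetical IOComponent that invokes an observer callback (a plain std::function standing in for an IGSTK/ITK event observer) and a hypothetical two-valued UserResponse. Because the observer runs synchronously, control returns to the component right after the callback, and the component then chooses the next state.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>

enum class UserResponse { Retry, Fail };
enum class State { Idle, Attempting, Succeeded, Failed };

// Hypothetical component; std::function stands in for an event observer.
class IOComponent
{
public:
  using Observer = std::function<UserResponse(const std::string & message)>;

  IOComponent(Observer observer, int failuresBeforeSuccess)
    : m_Observer(std::move(observer)), m_FailuresLeft(failuresBeforeSuccess) {}

  State Run()
    {
    m_State = State::Attempting;
    while (m_State == State::Attempting)
      {
      if (AttemptIO())
        {
        m_State = State::Succeeded;            // success function
        }
      else if (m_Observer("I/O failed") == UserResponse::Fail)
        {
        m_State = State::Failed;               // user declared it unrecoverable
        }
      // otherwise the user chose Retry: stay in Attempting and try again
      }
    return m_State;
    }

private:
  // Stub: fails a fixed number of times, then succeeds.
  bool AttemptIO()
    {
    if (m_FailuresLeft > 0)
      {
      --m_FailuresLeft;
      return false;
      }
    return true;
    }

  Observer m_Observer;
  int m_FailuresLeft;
  State m_State = State::Idle;
};
```

A real IGSTK component would push "retry" or "fail" inputs to its state machine rather than looping inline; the loop only keeps the sketch short.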

Proposals for Error Handling

[Discussion posted by DG]

There is not, at this time, a consensus on how to handle errors in IGSTK.

The following is a proposal on how errors can be handled:

1) An IGSTK object will usually detect IO errors via return values or error variables or by catching exceptions. The response of an object to an error should always be to set the appropriate input for its state machine.

2) An IGSTK object should report non-recoverable errors by raising an exception. The object-object boundary should use exceptions as the error handling method of choice. An object should not raise an exception immediately upon detecting an error; rather, it should set an input to its state machine, and the state machine should then call a function that raises the exception.

3) An IGSTK object should report recoverable errors by generating an event, and by providing some means for the observer of the event to indicate whether to "retry" or "fail", and if "fail" then an exception should be generated as indicated above.

4) The application should detect non-recoverable errors in the components via exception catching, and recoverable errors by using event observers.
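Proposals 1 and 2 can be sketched together, with all names hypothetical: a low-level exception is caught and turned into a state machine input (proposal 1), and the exception reported to the caller is raised only by the action the state machine runs on its transition to the failure state (proposal 2).

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical exception reported across the object-object boundary.
class NonRecoverableError : public std::runtime_error
{
public:
  explicit NonRecoverableError(const std::string & what) : std::runtime_error(what) {}
};

class Reader
{
public:
  enum class State { Idle, Reading, Done, Failed };
  enum class Input { ReadSuccess, ReadFailure };

  explicit Reader(bool fileIsValid) : m_FileIsValid(fileIsValid) {}

  void RequestRead()
    {
    m_State = State::Reading;
    // Proposal 1: detect the error, then only set a state machine input.
    Input input;
    try
      {
      LowLevelRead();
      input = Input::ReadSuccess;
      }
    catch (const std::runtime_error &)
      {
      input = Input::ReadFailure;
      }
    ProcessInput(input);
    }

  State GetState() const { return m_State; }

private:
  // Minimal transition table: (Reading, input) -> next state plus action.
  void ProcessInput(Input input)
    {
    if (m_State == State::Reading && input == Input::ReadSuccess)
      {
      m_State = State::Done;
      }
    else if (m_State == State::Reading && input == Input::ReadFailure)
      {
      m_State = State::Failed;
      ReportFailure();   // action attached to the transition
      }
    }

  // Proposal 2: the exception is raised by the state machine's action.
  void ReportFailure()
    {
    throw NonRecoverableError("file could not be read");
    }

  // Stub standing in for a file-system call that may throw.
  void LowLevelRead()
    {
    if (!m_FileIsValid)
      {
      throw std::runtime_error("corrupt file");
      }
    }

  bool m_FileIsValid;
  State m_State = State::Idle;
};
```

Because the state changes before the throw, the object is already in a known Failed state when the exception reaches the application (proposal 4).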


KG: There is also some discussion on exceptions over on the State Machines page. I am in favor of translating exceptions into state machine inputs. Exceptions represent abnormal or unexpected program behavior, and a benefit of a state machine is to transform the unexpected into expected outcomes. The state machine can effectively help you say "I do not know what that is or why it went wrong, but I know what state I am in now". And even if you are unsure of the state (example: the network connection failed; did it drop, or is something else going on?), you can say that explicitly with the state machine and make appropriate decisions (the "attempting-to-reconnect" state, retry 5 times, then transition to a more permanent failed state). So I am in favor of #1 above but not #2.

I am not so certain I see the general purpose utility of events for error-handling, and there was a discussion on the t-con on 8/25 that also raised some concern. For one, the call sequence in a single-threaded application when you generate an event in a lightweight event model would seem to create a roundabout interaction with the object generating the event (in fact, this seems to sometimes happen in normal state machine sequences). When an error occurs, the SM should transition to a known state. However, whether the SM can get out of that situation (or even needs to) is another question. And should the application have the ability to change such policies? For example, reconsidering the network connection example above, what if the application would prefer not to retry and instead would prefer the component immediately fail? It seems some error handling ability can be encapsulated in a component but others should in fact be delegated. Doing this while preserving SM encapsulation is tricky.
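One possible way to delegate the policy while keeping the transitions inside the component, sketched under assumptions (the Connection class, its states, SetMaximumRetries() and the SimulateOutage() stub are hypothetical): the application sets only a retry limit, and the component's own transitions implement the "attempting-to-reconnect, retry 5 times, then fail" sequence from the KG comment above. A limit of zero gives the "fail immediately" policy.

```cpp
#include <cassert>

// Hypothetical network component; states follow the reconnect example.
class Connection
{
public:
  enum class State { Connected, AttemptingToReconnect, Failed };

  // Application-level policy: how many reconnection attempts to make.
  void SetMaximumRetries(int retries) { m_MaximumRetries = retries; }

  // Stub: number of reconnect attempts that will fail before one succeeds.
  void SimulateOutage(int failingAttempts) { m_FailingAttempts = failingAttempts; }

  State HandleConnectionLost()
    {
    m_State = State::AttemptingToReconnect;
    for (int attempt = 0; attempt < m_MaximumRetries; ++attempt)
      {
      if (TryReconnect())
        {
        m_State = State::Connected;
        return m_State;
        }
      }
    m_State = State::Failed;   // the more permanent failed state
    return m_State;
    }

private:
  bool TryReconnect()
    {
    if (m_FailingAttempts > 0)
      {
      --m_FailingAttempts;
      return false;
      }
    return true;
    }

  int m_MaximumRetries = 5;    // default taken from the example in the text
  int m_FailingAttempts = 0;
  State m_State = State::Connected;
};
```

The application never sees the intermediate states; it only chooses a number, which keeps the state machine encapsulated.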

Finally, my concern is still more about how the caller is notified that a request for service was not met. If we do not use return codes, then in an error scenario it would seem we would have to rely on exceptions. Return codes also have the benefit of being able to return more information in an object if need be. Events do not make sense unless we are multithreaded.
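To illustrate returning more information in an object, a hypothetical ReadResult type could carry an error code, a message and the data itself. Placing [[nodiscard]] on the type (C++17) makes the compiler warn whenever a function returning it by value is called and the result is ignored. The names are assumptions, not existing IGSTK classes.

```cpp
#include <cassert>
#include <string>

// Hypothetical result object returned instead of a bare boolean.
struct [[nodiscard]] ReadResult
{
  enum class Code { Success, Timeout };

  Code code = Code::Success;
  std::string message;   // human-readable detail for the application
  std::string data;      // payload, valid only when Ok() is true

  bool Ok() const { return code == Code::Success; }
};

ReadResult ReadReply(const std::string & reply)
{
  ReadResult result;
  if (reply.empty())
    {
    result.code = ReadResult::Code::Timeout;
    result.message = "no reply before timeout";
    }
  else
    {
    result.data = reply;
    }
  return result;
}
```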
