[ < ] [ > ] [ << ] [ Up ] [ >> ] [Top] [Contents] [Index] [ ? ]

# 5. clsfy: Statistical Classification

Chapter summary:
Statistical classifiers only work x% of the time. x is inversely proportional to what the theory says it should be.

`clsfy` contains several classes for representing, using and training statistical classifiers. Input data is represented by `vnl_vector<double>`. Output classes are represented by integers [0..n_classes).

 [ < ] [ > ] [ << ] [ Up ] [ >> ] [Top] [Contents] [Index] [ ? ]

## 5.1 Classifiers

All classifiers support the classification of sample vectors, and estimates of class probabilities. All classifiers are derived from `clsfy_classifier_base`.

### Main functions

`unsigned n_dims() const`

Dimensionality of vector space of inputs.

`unsigned n_classes() const`

The number of possible output classes. If n_classes() == 1, this indicates a binary classifier. In this case, most functions return values associated with just the positive (1st) class. As far as the interface is concerned a binary classifier is distinct from a multiclass classifier with n_classes() == 2.

`unsigned classify(x) const`

Most likely class of vector x

`void class_probabilities(vcl_vector<double> & outputs, x) const`

Estimate of a-posteriori class probabilities for vector x. If the classifier is binary (i.e. n_classes == 1), only a single value will be returned, and will be the probability of being in the class 1, also called the positive class.

`double log_l(x)`

If the classifier is binary, an estimate of the a-posteriori log likelihood of being in class 1.

The classifiers all support IO via `vsl_b_read`, `vsl_b_write`, and `vsl_print_summary`.

 [ < ] [ > ] [ << ] [ Up ] [ >> ] [Top] [Contents] [Index] [ ? ]

## 5.2 Builders

The classifier training algorithms are embedded within the classes derived from `clsfy_builder_base`.

### Main functions

`clsfy_classifier_base* new_classifier()`

Create a new classifier of appropriate type on heap and return pointer

`double build(model, training_inputs, training_outputs, n_classes)`

Train the classifier from the data supplied

The concrete builders have attributes that can be modified to control the training process. They should all have default values for these attributes which may allow you to build a classifier without understanding too much about it.

The builders all support IO via `vsl_b_read`, `vsl_b_write`, and `vsl_print_summary`.

 [ < ] [ > ] [ << ] [ Up ] [ >> ] [Top] [Contents] [Index] [ ? ]

### 5.2.1 Strategy Pattern

This code is an example of the strategy pattern (Gamma, et al. Design Patterns, Addison Wesley, 1995.) It is possible to write code that builds and uses a classifier, where your code does not itself know what sort of classifier is being used. Both builders and classifiers can be saved and loaded by base class pointer.

 [ < ] [ > ] [ << ] [ Up ] [ >> ] [Top] [Contents] [Index] [ ? ]

## 5.3 Derived Classes

`clsfy_binary_hyperplane`

Simple two class classifier, where the class boundary is a plane (or line in two dimensions).

`clsfy_binary_hyperplane_ls_builder`

Train a hyperplane classifier using least squares.

`clsfy_pdf_classifier`

A binary classifier that takes a single PDF to describe the positive class (class number 1). The boundary is set on an iso-probability contour.

`clsfy_k_nearest_neighbour`

One of the simplest and most effective classifies around. Don't wait until the end of your PhD before comparing your algorithm with this one.

`clsfy_rbf_parzen`

A Parzen window classifiers that uses a radial basis kernel at each training point.

`clsfy_random_classifier`

Useful for testing, this classifier outputs a preferred class independent of the input data.

 [ < ] [ > ] [ << ] [ Up ] [ >> ] [Top] [Contents] [Index] [ ? ]

## 5.4 Examples

Suppose we wish to compute a classifier from a set of vectors, then estimate the probability that each vector was taken from class 1 by the distribution.

 ```vcl_vector > data_inputs(n); vcl_vector data_targets(n); // Load in the vectors .... // Create an iterator object to pass the data in mbl_data_wrapper > v_data(data_input); // Define what type of builder to use. In this case we want a hyperplane. clsfy_binary_hyperplane_ls_builder builder; // Generate model to build clsfy_classifier_base *classifier = builder.new_classifier(); // I could have created it directly using // clsfy_binary_hyperplane; // Build the model from the data builder.build(*classifier, v_data, data_targets); vsl_print_summary(vcl_cout, classifier); // Now find error; unsigned error; for (int i=0;iclassify(data_inputs[i]) != data_targets[i]) error++; vcl_cout "Training error " << error << " out of " << n << "samples"<

 [ << ] [ >> ] [Top] [Contents] [Index] [ ? ]

This document was generated on May, 1 2013 using texi2html 1.76.