[ITK] Behaviour of KdTreeBasedKmeansEstimator

Jan Ehrhardt ehrhardt at imi.uni-luebeck.de
Wed Nov 5 05:29:58 EST 2014


Hi folks,

I want to implement k-means clustering of sample measurement vectors and
started with the example codes provided in
http://www.itk.org/Doxygen45/html/Statistics_2KdTreeBasedKMeansClustering_8cxx-example.html
and
http://www.itk.org/Wiki/ITK/Examples/Statistics/KdTreeBasedKmeansEstimator

Both examples work, but small changes to the code result in unexpected
behaviour.
If the standard deviation in KdTreeBasedKMeansClustering.cxx is reduced
(e.g. to 5), the algorithm is no longer able to find the correct clusters
with centroids 100 and 200; instead, clusters with centroids 150 and 0 are
found (see the snippet below). Is this intended behaviour? Reducing the
standard deviation should give a better separation of the two classes.
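
To reproduce, this is roughly the relevant part of
KdTreeBasedKMeansClustering.cxx with my change; as far as I can see, only
the standard deviation differs from the example, while the tree
generation, initial means and estimator setup are left untouched:

  #include "itkVector.h"
  #include "itkListSample.h"
  #include "itkNormalVariateGenerator.h"

  typedef itk::Vector< double, 1 > MeasurementVectorType;
  typedef itk::Statistics::ListSample< MeasurementVectorType > SampleType;
  SampleType::Pointer sample = SampleType::New();
  sample->SetMeasurementVectorSize( 1 );

  itk::Statistics::NormalVariateGenerator::Pointer normalGenerator =
    itk::Statistics::NormalVariateGenerator::New();

  MeasurementVectorType mv;

  // first class, centered at 100
  normalGenerator->Initialize( 101 );
  double mean = 100;
  double standardDeviation = 5;   // was 30 in the example
  for ( unsigned int i = 0; i < 100; ++i )
    {
    mv[0] = ( normalGenerator->GetVariate() * standardDeviation ) + mean;
    sample->PushBack( mv );
    }

  // second class, centered at 200
  normalGenerator->Initialize( 3024 );
  mean = 200;
  standardDeviation = 5;          // was 30 in the example
  for ( unsigned int i = 0; i < 100; ++i )
    {
    mv[0] = ( normalGenerator->GetVariate() * standardDeviation ) + mean;
    sample->PushBack( mv );
    }

  // WeightedCentroidKdTreeGenerator, initial means and
  // KdTreeBasedKmeansEstimator as in the example; the estimated means
  // then come out near 150 and 0 instead of 100 and 200.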

The wiki example crashes if the number of samples is increased (exception:
"vectors are not the same size (3 and 0)"). The same happens if I replace
the WeightedCentroidKdTreeGenerator with the KdTreeGenerator in
KdTreeBasedKMeansClustering.cxx, as sketched below.
The crash seems to be related to the bucket size and the number of samples.
Is this a bug in KdTreeGenerator?
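
In case it helps, the swap is essentially just this (type and variable
names roughly as in the example, everything else unchanged):

  #include "itkKdTreeGenerator.h"
  #include "itkKdTreeBasedKmeansEstimator.h"

  // KdTreeGenerator instead of the WeightedCentroidKdTreeGenerator
  // used in the example:
  typedef itk::Statistics::KdTreeGenerator< SampleType > TreeGeneratorType;
  TreeGeneratorType::Pointer treeGenerator = TreeGeneratorType::New();
  treeGenerator->SetSample( sample );
  treeGenerator->SetBucketSize( 16 );
  treeGenerator->Update();

  typedef TreeGeneratorType::KdTreeType TreeType;
  typedef itk::Statistics::KdTreeBasedKmeansEstimator< TreeType > EstimatorType;
  EstimatorType::Pointer estimator = EstimatorType::New();
  estimator->SetKdTree( treeGenerator->GetOutput() );
  // SetParameters(), SetMaximumIteration() and StartOptimization()
  // follow as in the example.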

I'm searching for a robust n-dimensional clustering algorithm. Do you have
any suggestions for alternative algorithms, or for ways to stabilize the
behaviour of the KdTreeBasedKmeansEstimator?

Thanks,
Jan


