[vtkusers] K-means values

Thompson, David C dcthomp at sandia.gov
Tue Mar 15 18:27:25 EDT 2011


Hi Sara,

> I'm using vtkKMeansStatistics to successfully cluster data points.
> However, I'm missing how you access the actual cluster mean values,
> instead of just their labels.  It looks like the order of the labels
> may not correspond to the values of the means, is this true?

I'm not clear on what you mean by "label". I've run the filter on data with 2 columns (named x & y), specifying 2 sets of initial cluster centers on the LEARN_PARAMETERS input: one for k=2 and one for k=3. I get this table:

+----------------+----------------+----------------+----------------+----------------+----------------+-----------------+
| Run ID         | k              | Iterations     | Error          | Cardinality    | x              | y               |
+----------------+----------------+----------------+----------------+----------------+----------------+-----------------+
| 0              | 2              | 3              | 1528.94        | 772            | 0.166201       | 0.12059         |
| 0              | 2              | 3              | 498.266        | 228            | 2.79467        | 2.99856         |
| 1              | 3              | 15             | 546.596        | 397            | -0.341883      | -0.486857       |
| 1              | 3              | 15             | 546.946        | 405            | 0.758854       | 0.855424        |
| 1              | 3              | 15             | 381.077        | 198            | 2.99941        | 3.14951         |
+----------------+----------------+----------------+----------------+----------------+----------------+-----------------+

as the first block of output 1 (i.e., GetOutputDataObject( 1 ).GetBlock( 0 ).Dump() will produce the above). The first 2 rows contain the cluster means for the run with k=2, and the last 3 rows contain those for the run with k=3; the x and y columns hold the coordinates of each mean. Because each cluster center has 2 coordinates (x & y), there is no natural way to order cluster centers by their means. Instead, their order matches the order of the initial cluster centers specified on the LEARN_PARAMETERS input, if that input is provided. Otherwise, the order is random, because the initial centers are generated randomly. Is this what you wanted to know?
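To illustrate why the output order follows the initial guesses, here is a minimal pure-Python sketch of Lloyd's k-means. It is not VTK's implementation; the function name `kmeans`, its parameters, and the sample points are hypothetical. The point is that each center is refined in its own slot and never re-sorted, so swapping the initial guesses swaps the order of the resulting means.

```python
import math


def kmeans(points, initial_centers, max_iter=100, tol=1e-9):
    """Plain Lloyd's k-means (illustrative sketch, not vtkKMeansStatistics).

    Center i of the result is the refined version of initial_centers[i];
    no sorting is applied, so output order follows the initial guesses.
    """
    centers = [tuple(c) for c in initial_centers]
    dim = len(centers[0])
    counts = [0] * len(centers)
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest current center.
        sums = [[0.0] * dim for _ in centers]
        counts = [0] * len(centers)
        for p in points:
            i = min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            counts[i] += 1
            for d, v in enumerate(p):
                sums[i][d] += v
        # Update step: move each center to the mean of its points,
        # keeping it in the same slot (empty clusters keep their old center).
        new_centers = [
            tuple(s / counts[i] for s in sums[i]) if counts[i] else centers[i]
            for i in range(len(centers))
        ]
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, counts


# Hypothetical 2-D data with one group near the origin and one near (3, 3).
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (3.0, 3.0), (2.8, 3.1), (3.1, 2.9)]

a, _ = kmeans(points, [(0.0, 0.0), (3.0, 3.0)])  # origin guess first
b, _ = kmeans(points, [(3.0, 3.0), (0.0, 0.0)])  # same guesses, swapped
print(a)
print(b)  # same means as a, in reversed order
```

With random initial guesses the same logic applies: the order of the means simply reflects whatever order the random guesses were generated in.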

    David


