[vtkusers] K-means values

David Thompson dcthomp at sandia.gov
Wed Mar 16 14:22:07 EDT 2011


Hi Sara,

> It seems like I could solve this by using learning and specifying  
> one iteration, but this seems awkward.  If anyone is aware of a  
> better way to access the means from the kMeansStatistics output,  
> could you let me know?

That would indeed be the only way to have the filter compute the mean  
coordinates for each cluster; the mean coordinates are part of the  
statistical model, not the assessment of data, and the model is  
computed by Learn and Derive. I'm sorry you find it awkward, but  
that's the way things are at the moment. Do you have some suggestion  
on how to change things? It doesn't seem to me to involve a  
significant amount of code to get the means computed:
   kMeansStatistics->SetLearnOption( 1 ); // This is on by default.
   kMeansStatistics->SetMaxNumIterations( 1 );
nor a lot of code to access them:
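   vtkMultiBlockDataSet* model = vtkMultiBlockDataSet::SafeDownCast(
     kMeansStatistics->GetOutputDataObject( 1 ) ); // output 1 is the model
   vtkTable* tab = vtkTable::SafeDownCast( model->GetBlock( 0 ) );
   double xc = tab->GetValueByName( label, "x" ).ToDouble(); // label is the row index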
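
For anyone wondering what a Learn pass limited to one iteration roughly amounts to, here is a minimal standalone sketch in plain C++. It is not VTK's implementation, and the names Point, nearestCenter, and oneIteration are made up for illustration. It assumes 2-D data and squared Euclidean distance: each observation is assigned to its nearest initial center, then each center is replaced by the mean of the observations assigned to it.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical 2-D observation type; not a VTK class.
struct Point { double x; double y; };

// Index of the center closest to p (squared Euclidean distance).
std::size_t nearestCenter(const Point& p, const std::vector<Point>& centers)
{
  std::size_t best = 0;
  double bestDist = std::numeric_limits<double>::max();
  for (std::size_t c = 0; c < centers.size(); ++c)
  {
    const double dx = p.x - centers[c].x;
    const double dy = p.y - centers[c].y;
    const double d = dx * dx + dy * dy;
    if (d < bestDist) { bestDist = d; best = c; }
  }
  return best;
}

// One assign-then-average pass. Returns the updated centers; a center
// with no assigned observations keeps its initial coordinates.
std::vector<Point> oneIteration(const std::vector<Point>& data,
                                const std::vector<Point>& initial)
{
  std::vector<Point> sum(initial.size(), Point{0.0, 0.0});
  std::vector<std::size_t> count(initial.size(), 0);
  for (const Point& p : data)
  {
    const std::size_t c = nearestCenter(p, initial);
    sum[c].x += p.x;
    sum[c].y += p.y;
    ++count[c];
  }
  std::vector<Point> means = initial;
  for (std::size_t c = 0; c < initial.size(); ++c)
  {
    if (count[c] > 0)
    {
      means[c].x = sum[c].x / count[c];
      means[c].y = sum[c].y / count[c];
    }
  }
  return means;
}
```

The returned vector keeps the order of the initial centers, which mirrors the point that cluster rows follow the order of the initial guesses rather than the values of the means.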

	David

> On Mar 15, 2011, at 3:51 PM, Sara Rolfe wrote:
>
>> Hi David,
>>
>> Thanks for your reply.  Right now I'm using vtkKMeansStatistics  
>> without learning and am following the example here:
>>
>> http://www.vtk.org/Wiki/VTK/Examples/InfoVis/KMeansClustering
>>
>> The output that I get using kMeansStatistics->GetOutput()->Dump()  
>> shows the original value, the distance to the nearest cluster, and  
>> the id of the cluster it is assigned to, instead of the cluster mean.
>>
>> +-----------------+-----------------+------------------+
>> | Magnitude       | distance (0)    | closest id (0)   |
>> +-----------------+-----------------+------------------+
>> | 0.0657005       | 6.44972e-06     | 4                |
>> | 0.0652216       | 4.24651e-06     | 4                |
>> | 0.0646891       | 2.33557e-06     | 4                |
>> | 0.0641142       | 9.08931e-07     | 4                |
>> | 0.0635069       | 1.19747e-07     | 4                |
>> | 0.0666587       | 1.2235e-05      | 4                |
>>
>> I think I will probably use learning, but I'd like to get it  
>> working without first.
>>
>> Thanks,
>> Sara
>>
>> On Mar 15, 2011, at 3:27 PM, Thompson, David C wrote:
>>
>>> Hi Sara,
>>>
>>>> I'm using vtkKMeansStatistics to successfully cluster data points.
>>>> However, I'm missing how you access the actual cluster mean values,
>>>> instead of just their labels.  It looks like the order of the labels
>>>> may not correspond to the values of the means. Is this true?
>>>
>>> I'm not clear on what you mean by "label". I've run the filter on  
>>> data with 2 columns (named x & y) and with 2 sets of initial  
>>> cluster center coordinates specified on the LEARN_PARAMETERS  
>>> input: one for k=2 and one for k=3. I get this table:
>>>
>>> +--------+---+------------+---------+-------------+-----------+-----------+
>>> | Run ID | k | Iterations | Error   | Cardinality | x         | y         |
>>> +--------+---+------------+---------+-------------+-----------+-----------+
>>> | 0      | 2 | 3          | 1528.94 | 772         | 0.166201  | 0.12059   |
>>> | 0      | 2 | 3          | 498.266 | 228         | 2.79467   | 2.99856   |
>>> | 1      | 3 | 15         | 546.596 | 397         | -0.341883 | -0.486857 |
>>> | 1      | 3 | 15         | 546.946 | 405         | 0.758854  | 0.855424  |
>>> | 1      | 3 | 15         | 381.077 | 198         | 2.99941   | 3.14951   |
>>> +--------+---+------------+---------+-------------+-----------+-----------+
>>>
>>> as the first block of output 1 (i.e.,  
>>> GetOutputDataObject( 1 ).GetBlock( 0 ).Dump() will produce the  
>>> above). The first 2 rows contain the cluster mean values  
>>> corresponding to the run with k=2 and the final 3 rows have the  
>>> same for the run with k=3. Because there are 2 coordinates (x & y)  
>>> for each cluster center, there is no good way to order cluster  
>>> centers by their means. Instead, their order matches the initial  
>>> guesses at cluster centers specified on the LEARN_PARAMETERS input  
>>> if it exists. Otherwise, the order is random because the initial  
>>> guesses are produced randomly. Is this what you wanted to know?
>>>
>>>    David