[vtkusers] K-means values

Sara Rolfe smrolfe at u.washington.edu
Wed Mar 16 16:07:35 EDT 2011


Hi David,

It's not a difficult fix; I mainly found it awkward because I thought I
was missing something simple.  I understand how it works better now
and can certainly implement it this way.  I appreciate your
clarification.

Sara

On Mar 16, 2011, at 11:22 AM, David Thompson wrote:

> Hi Sara,
>
>> It seems like I could solve this by using learning and specifying  
>> one iteration, but this seems awkward.  If anyone is aware of a  
>> better way to access the means from the kMeansStatistics output,  
>> could you let me know?
>
> That would indeed be the only way to have the filter compute the  
> mean coordinates for each cluster; the mean coordinates are part of  
> the statistical model, not the assessment of data, and the model is  
> computed by Learn and Derive. I'm sorry you find it awkward, but  
> that's the way things are at the moment. Do you have some suggestion  
> on how to change things? It doesn't seem to me to involve a  
> significant amount of code to get the means computed:
>  kMeansStatistics->SetLearnOption( 1 ); // This is on by default.
>  kMeansStatistics->SetMaxNumIterations( 1 );
> nor a lot of code to access them:
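>  vtkMultiBlockDataSet* model = vtkMultiBlockDataSet::SafeDownCast(
>    kMeansStatistics->GetOutputDataObject( 1 ) ); // output 1 holds the model
>  vtkTable* tab = vtkTable::SafeDownCast( model->GetBlock( 0 ) );
>  double xc = tab->GetValueByName( label, "x" ).ToDouble(); // label: row of the cluster in the table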
>
> 	David
>
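To make the distinction between the model and the assessment concrete, here is a minimal sketch in plain C++ (no VTK) of what a single learning pass computes compared with what assessment reports. The names `Point`, `Assessment`, `assessPoint`, and `learnOneIteration` are hypothetical and exist only for this illustration; the use of squared Euclidean distance is an assumption, and this is not a description of how vtkKMeansStatistics is implemented internally.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper types for this sketch; they are not part of VTK.
struct Point { double x; double y; };

struct Assessment {
  std::size_t closestId; // index of the nearest center
  double distance;       // squared Euclidean distance to that center (assumed metric)
};

// "Assess": for one point, report the nearest center and the distance to it.
// This is per-row information and says nothing about the means themselves.
Assessment assessPoint(const Point& p, const std::vector<Point>& centers)
{
  assert(!centers.empty());
  Assessment best{0, std::numeric_limits<double>::max()};
  for (std::size_t c = 0; c < centers.size(); ++c)
  {
    const double dx = p.x - centers[c].x;
    const double dy = p.y - centers[c].y;
    const double d2 = dx * dx + dy * dy;
    if (d2 < best.distance)
    {
      best = Assessment{c, d2};
    }
  }
  return best;
}

// "Learn", limited to one iteration: assign every point to its nearest
// center, then replace each center by the mean of the points assigned to it.
// A center that attracts no points keeps its initial position.
std::vector<Point> learnOneIteration(const std::vector<Point>& data,
                                     const std::vector<Point>& centers)
{
  std::vector<Point> sums(centers.size(), Point{0.0, 0.0});
  std::vector<std::size_t> counts(centers.size(), 0);
  for (const Point& p : data)
  {
    const std::size_t id = assessPoint(p, centers).closestId;
    sums[id].x += p.x;
    sums[id].y += p.y;
    ++counts[id];
  }
  std::vector<Point> means = centers;
  for (std::size_t c = 0; c < centers.size(); ++c)
  {
    if (counts[c] > 0)
    {
      means[c] = Point{sums[c].x / counts[c], sums[c].y / counts[c]};
    }
  }
  return means;
}
```

In this sketch, calling `learnOneIteration(data, guesses)` returns one mean per initial guess, in the same order as the guesses, while `assessPoint` only ever yields a closest id and a distance for a single row. That mirrors why the assessment table alone does not contain the means.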
>> On Mar 15, 2011, at 3:51 PM, Sara Rolfe wrote:
>>
>>> Hi David,
>>>
>>> Thanks for your reply.  Right now I'm using vtkKMeansStatistics
>>> without learning and am following the example here:
>>>
>>> http://www.vtk.org/Wiki/VTK/Examples/InfoVis/KMeansClustering
>>>
>>> The output that I get using kMeansStatistics->GetOutput()->Dump()
>>> shows the original value, the distance to the nearest cluster, and
>>> the id of the cluster it is assigned to, instead of the cluster mean.
>>>
>>> +-----------------+-----------------+------------------+
>>> | Magnitude       | distance (0)    | closest id (0)   |
>>> +-----------------+-----------------+------------------+
>>> | 0.0657005       | 6.44972e-06     | 4                |
>>> | 0.0652216       | 4.24651e-06     | 4                |
>>> | 0.0646891       | 2.33557e-06     | 4                |
>>> | 0.0641142       | 9.08931e-07     | 4                |
>>> | 0.0635069       | 1.19747e-07     | 4                |
>>> | 0.0666587       | 1.2235e-05      | 4                |
>>>
>>> I think I will probably use learning, but I'd like to get it  
>>> working without first.
>>>
>>> Thanks,
>>> Sara
>>>
>>> On Mar 15, 2011, at 3:27 PM, Thompson, David C wrote:
>>>
>>>> Hi Sara,
>>>>
>>>>> I'm using vtkKMeansStatistics to successfully cluster data points.
>>>>> However, I'm missing how you access the actual cluster mean values,
>>>>> instead of just their labels.  It looks like the order of the labels
>>>>> may not correspond to the values of the means. Is this true?
>>>>
>>>> I'm not clear on what you mean by "label". I've run the filter on  
>>>> data with 2 columns (named x & y) and with 2 sets of initial  
>>>> cluster center coordinates specified on the LEARN_PARAMETERS  
>>>> input: one for k=2 and one for k=3. I get this table:
>>>>
>>>> +--------+---+------------+---------+-------------+-----------+-----------+
>>>> | Run ID | k | Iterations | Error   | Cardinality | x         | y         |
>>>> +--------+---+------------+---------+-------------+-----------+-----------+
>>>> | 0      | 2 | 3          | 1528.94 | 772         | 0.166201  | 0.12059   |
>>>> | 0      | 2 | 3          | 498.266 | 228         | 2.79467   | 2.99856   |
>>>> | 1      | 3 | 15         | 546.596 | 397         | -0.341883 | -0.486857 |
>>>> | 1      | 3 | 15         | 546.946 | 405         | 0.758854  | 0.855424  |
>>>> | 1      | 3 | 15         | 381.077 | 198         | 2.99941   | 3.14951   |
>>>> +--------+---+------------+---------+-------------+-----------+-----------+
>>>>
>>>> as the first block of output 1 (i.e., calling Dump() on the vtkTable
>>>> in block 0 of GetOutputDataObject( 1 ) will produce the above).
>>>> The first 2 rows contain the cluster mean values
>>>> corresponding to the run with k=2 and the final 3 rows have the  
>>>> same for the run with k=3. Because there are 2 coordinates (x &  
>>>> y) for each cluster center, there is no good way to order cluster  
>>>> centers by their means. Instead, their order matches the initial  
>>>> guesses at cluster centers specified on the LEARN_PARAMETERS  
>>>> input if it exists. Otherwise, the order is random because the  
>>>> initial guesses are produced randomly. Is this what you wanted to  
>>>> know?
>>>>
>>>>   David
>>>
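If a reproducible label order is wanted even though 2-D centers have no natural ordering, one option is to pick a convention after the fact and relabel. The sketch below, in plain C++ with no VTK calls, sorts centers lexicographically by (x, y) and builds a map from old cluster ids to new ones. The lexicographic rule is an arbitrary choice made for this example, and `Center`, `sortedIdRemap`, and `relabel` are hypothetical names, not VTK API.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical alias for this sketch: one cluster center as (x, y).
using Center = std::array<double, 2>;

// Returns remap such that remap[oldId] == newId, where new ids follow the
// centers sorted by x, then by y (std::array compares lexicographically).
std::vector<std::size_t> sortedIdRemap(const std::vector<Center>& centers)
{
  std::vector<std::size_t> order(centers.size());
  std::iota(order.begin(), order.end(), std::size_t{0});
  std::sort(order.begin(), order.end(),
            [&centers](std::size_t a, std::size_t b) {
              return centers[a] < centers[b];
            });

  std::vector<std::size_t> remap(centers.size());
  for (std::size_t newId = 0; newId < order.size(); ++newId)
  {
    remap[order[newId]] = newId;
  }
  return remap;
}

// Rewrites per-row closest ids in place so they follow the sorted order.
void relabel(std::vector<std::size_t>& closestIds,
             const std::vector<std::size_t>& remap)
{
  for (std::size_t& id : closestIds)
  {
    assert(id < remap.size());
    id = remap[id];
  }
}
```

For example, if the three k=3 centers from the table above arrived in the order (2.99941, 3.14951), (-0.341883, -0.486857), (0.758854, 0.855424), `sortedIdRemap` would map old ids 0, 1, 2 to new ids 2, 0, 1, and `relabel` would apply that mapping to the per-row closest ids.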