[vtk-developers] statistics command line tool

Pebay, Philippe P pppebay at sandia.gov
Mon Feb 1 17:10:46 EST 2010


Hello,

I thought I would let you know about a vtkpython script that I created to access most of the VTK/Titan statistics functionality from the command line. The notable exception is hypothesis testing, which has not yet been implemented for all engines. The main motivation for this script was an OVIS application, but it turns out to be useful for other purposes, such as quick statistics calculations from flat files. The script is called haruspex.py and is located in VTK/Examples/Infovis/Python; it comes with a couple of ancillary CSV files (columns.csv and temperatures.csv) which can be used as examples.

For those who are interested, an example is copied below.
 
Any feedback will be appreciated, as I am planning to extend the applicability of this tool. Thanks,

Philippe

===========================================
First, running the script with -h lists the available options:
===========================================
[pebay@carlit]~/cvs/VTK/Examples/Infovis/Python$ ~/bin/VTK@Debug/bin/vtkpython haruspex.py -h
Usage:
         -h               Help: print this message and exit
         -d <filename>    CSV input data file
         -e <engine>      Type of statistics engine. Available engines are:
                            descriptive
                            order
                            contingency
                            correlative
                            multicorrelative
                            pca
                            kmeans
         [-m <prefix>]    CSV input model file. Default: calculate model from scratch
         [-s <prefix>]    CSV output model (statistics) file prefix. Default: outputModel
         [-a <filename>]  CSV output data (annotated) file. Default: outputData.csv
         [-c <filename>]  CSV columns of interest file. Default: all columns are of interest
         [-v]             Increase verbosity (0 = silent). Default: 0


===========================================
Below is an example run using all options except those that rename the output files, at maximum verbosity, and with NO input model (i.e., as for a first iteration):
(more text after this output dump)
===========================================

[pebay@carlit]~/cvs/VTK/Examples/Infovis/Python$ ~/bin/VTK@Debug/bin/vtkpython haruspex.py -d temperatures.csv -c columns.csv -e descriptive -vv
# Parsed command line:
  Input data file: temperatures.csv
  No input model
  Statistics: descriptive
  Columns of interest in file: columns.csv
  Output data file: outputData.csv
  Output model file prefix: outputModel

# Instantiated a vtkDescriptiveStatistics object

# Reading input data:
  Number of columns: 3
  Number of rows: 32

# Input data:
+-----------+-----------+------------+
| CompId    | Temp1     | Temp2      |
+-----------+-----------+------------+
| 2         | 46        | 45         |
| 3         | 47        | 49         |
| 2         | 46        | 47         |
| 3         | 46        | 46         |
| 2         | 47        | 46         |
| 3         | 47        | 49         |
| 2         | 49        | 49         |
| 3         | 47        | 45         |
| 2         | 50        | 50         |
| 3         | 46        | 46         |
| 2         | 51        | 50         |
| 3         | 48        | 48         |
| 2         | 52        | 54         |
| 3         | 48        | 47         |
| 2         | 52        | 52         |
| 3         | 49        | 49         |
| 2         | 53        | 54         |
| 3         | 50        | 50         |
| 2         | 53        | 54         |
| 3         | 50        | 52         |
| 2         | 53        | 53         |
| 3         | 50        | 51         |
| 2         | 54        | 54         |
| 3         | 49        | 49         |
| 2         | 52        | 52         |
| 3         | 50        | 51         |
| 2         | 52        | 52         |
| 3         | 49        | 47         |
| 2         | 48        | 48         |
| 3         | 48        | 50         |
| 2         | 46        | 48         |
| 3         | 47        | 47         |
+-----------+-----------+------------+

# Reading list of columns of interest:
  Number of columns of interest: 2
  Columns of interest are: [1, 2]

# Calculating statistics:
  Requesting column Temp1
  Requesting column Temp2

# Saving output (annotated) data:
  Wrote outputData.csv

# Saving output model (statistics):
  Output model is a vtkTable
  Wrote outputModel-0.csv
+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+------------+
| Variable  | Cardinalit| Minimum   | Maximum   | Mean      | M2        | M3        | M4        | Standard D| Variance  | g1 Skewnes| G1 Skewnes| g2 Kurtosi| G2 Kurtosi| Sum        |
+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+------------+
| Temp1     | 32        | 46        | 54        | 49.2188   | 185.469   | 136.951   | 2053.84   | 2.44599   | 5.98286   | 0.29245   | 0.32201   | -1.20692  | -1.20538  | 1575       |
| Temp2     | 32        | 45        | 54        | 49.5      | 234       | 93        | 3399      | 2.74743   | 7.54839   | 0.140137  | 0.154301  | -1.1358   | -1.12175  | 1584       |
+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+------------+
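
For readers wondering how the derived columns relate to the primary ones (Cardinality, M2, M3, M4), here is a minimal Python sketch. The formulas are inferred from the numbers printed above rather than taken from the VTK source, and the function name derived_statistics is hypothetical, so treat it as an illustration only.

```python
import math

def derived_statistics(n, m2, m3, m4):
    """Derive variance, skewness and kurtosis estimators from the
    cardinality n and the centered sums M2, M3, M4.

    Assumption: these formulas were inferred from the printed table,
    not copied from vtkDescriptiveStatistics.
    """
    variance = m2 / (n - 1)                  # unbiased sample variance
    std_dev = math.sqrt(variance)
    g1 = m3 / (n * std_dev ** 3)             # skewness, using the sample std dev
    G1 = g1 * n * n / ((n - 1) * (n - 2))    # bias-corrected skewness
    g2 = m4 / (n * variance ** 2) - 3.0      # excess kurtosis
    G2 = ((n + 1) * g2 + 6.0) * (n - 1) / ((n - 2) * (n - 3))
    return {
        "Standard Deviation": std_dev,
        "Variance": variance,
        "g1 Skewness": g1,
        "G1 Skewness": G1,
        "g2 Kurtosis": g2,
        "G2 Kurtosis": G2,
    }

# Primary statistics of the Temp2 row in the output model above.
stats = derived_statistics(32, 234.0, 93.0, 3399.0)
for name, value in stats.items():
    print(f"{name}: {value:.6g}")
```

The printed values can be compared with the Temp2 row of the table; they should agree up to the rounding used in the table.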

===========================================
Now, we are going to do the same thing, but using the previously calculated outputModel as input (that is, for subsequent iterations). Note the updated (aggregated) model that results from this procedure.
This can be repeated ad nauseam.
===========================================

[pebay@carlit]~/cvs/VTK/Examples/Infovis/Python$ ~/bin/VTK@Debug/bin/vtkpython haruspex.py -d temperatures.csv -c columns.csv -e descriptive -m outputModel -vv
# Parsed command line:
  Input data file: temperatures.csv
  Input model file prefix: outputModel
  Statistics: descriptive
  Columns of interest in file: columns.csv
  Output data file: outputData.csv
  Output model file prefix: outputModel

# Instantiated a vtkDescriptiveStatistics object

# Reading input data:
  Number of columns: 3
  Number of rows: 32

# Input data:
+-----------+-----------+------------+
| CompId    | Temp1     | Temp2      |
+-----------+-----------+------------+
| 2         | 46        | 45         |
| 3         | 47        | 49         |
| 2         | 46        | 47         |
| 3         | 46        | 46         |
| 2         | 47        | 46         |
| 3         | 47        | 49         |
| 2         | 49        | 49         |
| 3         | 47        | 45         |
| 2         | 50        | 50         |
| 3         | 46        | 46         |
| 2         | 51        | 50         |
| 3         | 48        | 48         |
| 2         | 52        | 54         |
| 3         | 48        | 47         |
| 2         | 52        | 52         |
| 3         | 49        | 49         |
| 2         | 53        | 54         |
| 3         | 50        | 50         |
| 2         | 53        | 54         |
| 3         | 50        | 52         |
| 2         | 53        | 53         |
| 3         | 50        | 51         |
| 2         | 54        | 54         |
| 3         | 49        | 49         |
| 2         | 52        | 52         |
| 3         | 50        | 51         |
| 2         | 52        | 52         |
| 3         | 49        | 47         |
| 2         | 48        | 48         |
| 3         | 48        | 50         |
| 2         | 46        | 48         |
| 3         | 47        | 47         |
+-----------+-----------+------------+

# Reading input model:
  Number of columns: 15
  Number of rows: 2

# Input Model:
+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+------------+
| Variable  | Cardinalit| Minimum   | Maximum   | Mean      | M2        | M3        | M4        | Standard D| Variance  | g1 Skewnes| G1 Skewnes| g2 Kurtosi| G2 Kurtosi| Sum        |
+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+------------+
| Temp1     | 32        | 46        | 54        | 49.2188   | 185.469   | 136.951   | 2053.84   | 2.44599   | 5.98286   | 0.29245   | 0.32201   | -1.20692  | -1.20538  | 1575       |
| Temp2     | 32        | 45        | 54        | 49.5      | 234       | 93        | 3399      | 2.74743   | 7.54839   | 0.140137  | 0.154301  | -1.1358   | -1.12175  | 1584       |
+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+------------+

# Reading list of columns of interest:
  Number of columns of interest: 2
  Columns of interest are: [1, 2]

# Calculating statistics:
  Requesting column Temp1
  Requesting column Temp2

# Saving output (annotated) data:
  Wrote outputData.csv

# Saving output model (statistics):
  Output model is a vtkTable
  Wrote outputModel-0.csv
+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+------------+
| Variable  | Cardinalit| Minimum   | Maximum   | Mean      | M2        | M3        | M4        | Standard D| Variance  | g1 Skewnes| G1 Skewnes| g2 Kurtosi| G2 Kurtosi| Sum        |
+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+------------+
| Temp1     | 64        | 46        | 54        | 49.2188   | 370.938   | 273.902   | 4107.68   | 2.4265    | 5.8879    | 0.299554  | 0.314125  | -1.14862  | -1.14373  | 3150       |
| Temp2     | 64        | 45        | 54        | 49.5      | 468       | 186       | 6798      | 2.72554   | 7.42857   | 0.143541  | 0.150523  | -1.07518  | -1.06421  | 3168       |
+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+------------+
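
To illustrate what the aggregation of two descriptive models amounts to, the following Python sketch combines two sets of primary statistics with the standard pairwise update formulas for centered sums. It is an assumption that the engine proceeds this way; the sketch only shows that such formulas reproduce the aggregated row printed above. The function name aggregate and the tuple layout are hypothetical.

```python
def aggregate(a, b):
    """Combine two models given as (n, mean, M2, M3, M4) tuples.

    Uses the standard pairwise update formulas for centered sums.
    Assumption: this mirrors, but is not copied from, what the
    descriptive engine does when an input model is supplied.
    """
    n_a, mean_a, m2_a, m3_a, m4_a = a
    n_b, mean_b, m2_b, m3_b, m4_b = b
    n = n_a + n_b
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    m2 = m2_a + m2_b + delta ** 2 * n_a * n_b / n
    m3 = (m3_a + m3_b
          + delta ** 3 * n_a * n_b * (n_a - n_b) / n ** 2
          + 3.0 * delta * (n_a * m2_b - n_b * m2_a) / n)
    m4 = (m4_a + m4_b
          + delta ** 4 * n_a * n_b * (n_a ** 2 - n_a * n_b + n_b ** 2) / n ** 3
          + 6.0 * delta ** 2 * (n_a ** 2 * m2_b + n_b ** 2 * m2_a) / n ** 2
          + 4.0 * delta * (n_a * m3_b - n_b * m3_a) / n)
    return (n, mean, m2, m3, m4)

# Temp2 row of the input model: (Cardinality, Mean, M2, M3, M4).
temp2 = (32, 49.5, 234.0, 93.0, 3399.0)

# Second iteration: the same 32 rows are aggregated with the input model.
combined = aggregate(temp2, temp2)
print(combined)
print("Variance:", combined[2] / (combined[0] - 1))
```

Because the same data file is fed in twice, the two means coincide, so the centered sums simply add up while the cardinality doubles. This is why the mean is unchanged while the variance and the other derived statistics shift slightly.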


--
Philippe Pébay
Sandia National Laboratories



