[Insight-users] PCAShapeModelEstimator -- seems way too slow?
Sayan Pathak
spathak@insightful.com
Fri May 14 22:19:58 EDT 2004
Hi Zach,
Classically, PCA is computed by identifying the eigenvectors corresponding to
the k largest eigenvalues of the covariance matrix generated from the data.
Because image processing typically involves images with a large number of
pixels, the implementation in ITK is a bit different, and the logic behind
choosing such an approach is explained in the documentation of the
itkImagePCAShapeModelEstimator class. Here is an excerpt from the class
documentation explaining the logic:
* To speed the computation, instead of performing the eigen analysis of
* the covariance matrix A*A', where A is a p x t matrix (p = number of
* pixels or voxels in each image, t = number of training images), we
* calculate the eigenvectors of the inner product matrix A'*A. The
* resulting eigenvectors (E) are then multiplied with the matrix A to get
* the principal components. The covariance matrix has a dimension of
* p x p. Since the number of pixels in any image is typically very high,
* the eigen decomposition becomes computationally expensive. The inner
* product matrix, on the other hand, has dimension t x t, where t is
* typically much smaller than p. Hence the eigen decomposition (the most
* compute-intensive part) is orders of magnitude faster.
Let's consider a typical data set of 100 images, each of size 256 x 256
pixels. The A matrix will have dimension 256^2 x 100. Under the classical
approach, the covariance matrix will have dimension 256^2 x 256^2. Using our
approach, the inner product matrix dimension reduces to 100 x 100. This leads
to a huge reduction in both computation and memory requirements.
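The algebra behind the trick can be checked with a minimal NumPy sketch (this
is an illustration of the idea, not ITK code; the variable names and the
random stand-in data are my own):

```python
import numpy as np

rng = np.random.default_rng(0)
p, t = 256, 10                      # p = pixels per image, t = training images
A = rng.standard_normal((p, t))     # stand-in for a matrix of training images

# Classical approach: eigen-decompose the p x p matrix A A'
w_big, V_big = np.linalg.eigh(A @ A.T)

# Inner-product trick: eigen-decompose the t x t matrix A'A, then map its
# eigenvectors back through A to recover the principal components
w_small, E = np.linalg.eigh(A.T @ A)
V_small = A @ E                     # columns are unnormalized components
V_small /= np.linalg.norm(V_small, axis=0)

# The t nonzero eigenvalues agree, and the leading component matches up to sign
print(np.allclose(w_big[-t:], w_small))   # True
```

Since A e is an eigenvector of A*A' whenever e is an eigenvector of A'*A
(A A' (A e) = A (A'A e) = lambda (A e)), both routes span the same leading
subspace; only the t x t decomposition is ever performed.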
In your case, the A matrix has a size of 300 x 1100. The covariance matrix
will have dimension 300 x 300, while the inner product matrix will be
1100 x 1100. This would explain the large computation time you are seeing.
One approach would be to compute the covariance matrix and then identify the
k largest eigenvectors using the existing functions in ITK. Ideally, both
methods would be implemented, switching between one and the other depending
on the size of the training set, the two scenarios being large images with
few training images (use the inner product matrix) and small images with many
training images (use the covariance matrix).
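A size-based switch of the kind described could be sketched as follows (a
hypothetical NumPy helper for illustration only; `principal_components` is
not part of the ITK API):

```python
import numpy as np

def principal_components(A, k):
    """Top-k principal directions of the columns of the p x t matrix A,
    picking whichever eigen-decomposition is cheaper for the given shape.
    (Illustrative helper only; not part of ITK.)"""
    p, t = A.shape
    if t < p:
        # Large images, few training images: decompose the t x t inner product
        _, E = np.linalg.eigh(A.T @ A)      # eigenvalues in ascending order
        V = A @ E[:, ::-1][:, :k]           # map back through A, keep k largest
        V /= np.linalg.norm(V, axis=0)      # renormalize the mapped vectors
    else:
        # Small images, many training images: decompose the p x p covariance
        _, V = np.linalg.eigh(A @ A.T)
        V = V[:, ::-1][:, :k]               # reorder to descending, keep k
    return V
```

Either branch returns the same leading subspace; individual columns may
differ by a sign flip, which is the usual ambiguity in eigenvectors.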
Hope this explanation helps. Thanks,
Sayan
>Hello,
>Recently, I've been doing some PCA work, and I noticed that itk's
>ImagePCAShapeModelEstimator seems to be surprisingly slow.
>I have, for example, 1100 images of size 20x15 -- a pretty modest data
>set to do PCA on. Matlab (running on a 450 MHz ultrasparc-II) can
>compute the principal components on such a dataset in a few seconds,
>even when I intentionally do things in a particularly slow way.
>Using the ITK PCA estimator, this same operation takes 15+ minutes on
>my personal machine (867 mhz g4). It's not a RAM or VM issue since the
>process never seems to grab more than 100 megs of RAM, and doesn't hit
>the disk at all.
>This seems especially strange given the effort in the PCA class's
>implementation toward efficiency. (Though I do realize that a dataset
>such as mine with more images than pixels per image does defeat some of
>the optimizations rolled in...)
>What could I possibly be doing wrong? I profiled the itk PCA code, and
>nothing looks overtly wrong -- it just seems to take way too long!
>Zach Pincus