[Insight-users] PCAShapeModelEstimator -- seems way too slow?
Sayan Pathak
spathak@insightful.com
Fri May 14 22:19:58 EDT 2004
Hi Zach,
Classically, PCA is computed by identifying the eigenvectors corresponding to
the k largest eigenvalues of the covariance matrix generated from the data.
Because image processing typically involves images with a large number of
pixels, the implementation in ITK is a bit different, and the logic behind
choosing such an approach is explained in the documentation of the
itkImagePCAShapeModelEstimator class. Here is an excerpt from the class
documentation explaining the logic:
* To speed the computation, instead of performing the eigen analysis of
* the covariance matrix A*A', where A is a p x t matrix (p = number of
* pixels or voxels in each image, t = number of training images), we
* calculate the eigenvectors of the inner product matrix A'*A. The
* resulting eigenvectors (E) are then multiplied with the matrix A to get
* the principal components. The covariance matrix has a dimension of
* p x p. Since the number of pixels in any image is typically very high,
* the eigen decomposition becomes computationally expensive. The inner
* product matrix, on the other hand, has dimension t x t, where t is
* typically much smaller than p. Hence the eigen decomposition (the most
* compute-intensive part) is orders of magnitude faster.
Let's consider a typical data set of 100 images, each of size 256 x 256
pixels. The A matrix will have dimension 256^2 x 100. Under the classical
approach, the covariance matrix will have dimension 256^2 x 256^2. Using our
approach, the inner product matrix dimension reduces to 100 x 100. This leads
to a huge reduction in both computation and memory requirements.
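The algebra behind the trick can be checked with a minimal NumPy sketch (this
is an illustration of the idea, not ITK code; the variable names and the
random stand-in data are my own):

```python
import numpy as np

rng = np.random.default_rng(0)
p, t = 256, 10                      # p = pixels per image, t = training images
A = rng.standard_normal((p, t))     # stand-in for a matrix of training images

# Classical approach: eigen-decompose the p x p matrix A A'
w_big, V_big = np.linalg.eigh(A @ A.T)

# Inner-product trick: eigen-decompose the t x t matrix A'A, then map its
# eigenvectors back through A to recover the principal components
w_small, E = np.linalg.eigh(A.T @ A)
V_small = A @ E                     # columns are unnormalized components
V_small /= np.linalg.norm(V_small, axis=0)

# The t nonzero eigenvalues agree, and the leading component matches up to sign
print(np.allclose(w_big[-t:], w_small))   # True
```

Since A e is an eigenvector of A*A' whenever e is an eigenvector of A'*A
(A A' (A e) = A (A'A e) = lambda (A e)), both routes span the same leading
subspace; only the t x t decomposition is ever performed.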
In your case, the A matrix has a size of 300 x 1100. The covariance matrix
will have dimension 300 x 300, while the inner product matrix will be
1100 x 1100. This would explain the large computation time you are seeing.
One approach would be to compute the covariance matrix and then identify the
k largest eigenvectors using the existing functions in ITK. Ideally, both
methods would be implemented, switching between one and the other depending
on the size of the training set, the two scenarios being large images with
few training images (use the inner product matrix) and small images with many
training images (use the covariance matrix).
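A size-based switch of the kind described could be sketched as follows (a
hypothetical NumPy helper for illustration only; `principal_components` is
not part of the ITK API):

```python
import numpy as np

def principal_components(A, k):
    """Top-k principal directions of the columns of the p x t matrix A,
    picking whichever eigen-decomposition is cheaper for the given shape.
    (Illustrative helper only; not part of ITK.)"""
    p, t = A.shape
    if t < p:
        # Large images, few training images: decompose the t x t inner product
        _, E = np.linalg.eigh(A.T @ A)      # eigenvalues in ascending order
        V = A @ E[:, ::-1][:, :k]           # map back through A, keep k largest
        V /= np.linalg.norm(V, axis=0)      # renormalize the mapped vectors
    else:
        # Small images, many training images: decompose the p x p covariance
        _, V = np.linalg.eigh(A @ A.T)
        V = V[:, ::-1][:, :k]               # reorder to descending, keep k
    return V
```

Either branch returns the same leading subspace; individual columns may
differ by a sign flip, which is the usual ambiguity in eigenvectors.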
Hope this explanation helps. Thanks,
Sayan
>Hello,
>Recently, I've been doing some PCA work, and I noticed that itk's
>ImagePCAShapeModelEstimator seems to be surprisingly slow.
>I have, for example, 1100 images of size 20x15 -- a pretty modest data
>set to do PCA on. Matlab (running on a 450 MHz ultrasparc-II) can
>compute the principal components on such a dataset in a few seconds,
>even when I intentionally do things in a particularly slow way.
>Using the ITK PCA estimator, this same operation takes 15+ minutes on
>my personal machine (867 mhz g4). It's not a RAM or VM issue since the
>process never seems to grab more than 100 megs of RAM, and doesn't hit
>the disk at all.
>This seems especially strange given the effort in the PCA class's
>implementation toward efficiency. (Though I do realize that a dataset
>such as mine with more images than pixels per image does defeat some of
>the optimizations rolled in...)
>What could I possibly be doing wrong? I profiled the itk PCA code, and
>nothing looks overtly wrong -- it just seems to take way too long!
>Zach Pincus