[Insight-users] PCAShapeModelEstimator -- seems way too slow?

Luis Ibanez luis.ibanez@kitware.com
Sat May 15 00:52:32 EDT 2004


Hi Zach,


Thanks for looking into this performance issue.


If you start profiling the code, you may find
the following class useful:

    Insight/Code/Common/
       itkTimeProbesCollectorBase.h

http://www.itk.org/Insight/Doxygen/html/classitk_1_1TimeProbesCollectorBase.html

It allows you to accumulate the time spent between
two points in the code. The usage is quite simple:


   #include "itkTimeProbesCollectorBase.h"


   itk::TimeProbesCollectorBase   timer;


   timer.Start("Block1");

   ... code code code

   timer.Stop("Block1");



   timer.Report();



The "tags" like "Block1" should match between the
Stop() and the Start() methods. You can add as
much tags as you want, and you can also nest them.


Note that the time resolution is much better under Unix
than under Windows. Try to put the tags around pieces
of code that take at least a few milliseconds
to execute.
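
For example, here is a minimal sketch of how the probes could be
placed around the PCA estimator itself. The image type, typedefs and
the helper function below are only illustrative, not taken from your
code:

   #include "itkImage.h"
   #include "itkImagePCAShapeModelEstimator.h"
   #include "itkTimeProbesCollectorBase.h"

   typedef itk::Image< float, 2 >                          ImageType;
   typedef itk::ImagePCAShapeModelEstimator< ImageType,
                                             ImageType >   EstimatorType;

   // 'estimator' is assumed to be already configured with the
   // training images and the number of components required.
   void TimeEstimation( EstimatorType * estimator )
   {
     itk::TimeProbesCollectorBase timer;

     timer.Start("PCA-Update");
     estimator->Update();      // the whole estimation, eigen analysis included
     timer.Stop("PCA-Update");

     timer.Report();           // accumulated times are printed to std::cout
   }

Moving the probes inside the filter code itself, for example around
the eigen analysis and around the code that packs the image data into
matrices, will tell you where the time is really going.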



Please let us know what you find.


   Thanks


     Luis


-----------------------
Zachary Pincus wrote:

> Sayan,
> 
> I am most familiar with how PCA works and the particular strategies used 
> to speed the calculations, both in general and within ITK specifically. 
> Moreover, I understand that a data set such as mine does defeat the ITK 
> optimization, as I mentioned earlier.
> 
>>> (Though I do realize that a dataset such as mine with more images 
>>> than pixels per image does defeat some of the optimizations rolled 
>>> in...)
> 
> 
> Nevertheless, calculating the inner product A'A (a ~1000x1000 matrix) 
> and its eigenvectors *SHOULD* take a minute at most. To verify this, I 
> performed the same calculations in ITK on my machine, in Matlab on a 
> 450 MHz SPARC II, and in Numeric Python on my machine.
> 
> ITK takes about 10-15 minutes to compute the PCA decomposition for 1100 
> images of 300 px each. Now my test results:
> 
> First: Matlab calculates eig(A'*A) in about 30 sec, even when A is 300 x 
> 1100. This is the "particularly slow way" of getting the results, as I 
> mentioned earlier. So there is a true and significant difference between 
> Matlab and the ITK/VNL code, *regardless* of the fact that my data set 
> is a strange shape.
> 
> Second: For a better comparison than Matlab running on another machine, 
> I ran some Numeric Python code to compute the eigenvectors of A'A on my 
> personal machine. This calls BLAS and LAPACK, and even for the 300x1000 
> case, the calculation completes within a minute. So, again, the ITK PCA 
> routine is seeing a profound slowdown somewhere.
> 
> Conclusion: There is some slowdown somewhere, either in how I compiled 
> the code, or in how I call the code, or in how the code is implemented, 
> that is independent of the fact that I'm passing in suboptimal data. 
> Perhaps the slowness isn't in the actual VNL linear algebra code, but in 
> the wrappers that pack the matrices. I'll try to profile the whole run 
> and look.
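
One quick way to separate the VNL eigen solver from the ITK wrapping
code would be to time the decomposition alone on a random symmetric
matrix of the same size. Here is a rough sketch; it assumes the
analysis goes through vnl_symmetric_eigensystem, which is an
assumption on my part, so check which solver the estimator actually
calls:

   #include <vnl/vnl_matrix.h>
   #include <vnl/algo/vnl_symmetric_eigensystem.h>
   #include "itkTimeProbesCollectorBase.h"
   #include <cstdlib>
   #include <iostream>

   int main()
   {
     const unsigned int t = 1100;   // number of training images

     // Random symmetric t x t matrix as a stand-in for A'A.
     vnl_matrix<double> M( t, t );
     for( unsigned int r = 0; r < t; r++ )
       {
       for( unsigned int c = r; c < t; c++ )
         {
         M(r,c) = M(c,r) = std::rand() / static_cast<double>( RAND_MAX );
         }
       }

     itk::TimeProbesCollectorBase timer;

     timer.Start("vnl-eigen");
     vnl_symmetric_eigensystem<double> eig( M );   // full decomposition
     timer.Stop("vnl-eigen");

     std::cout << "largest eigenvalue: "
               << eig.get_eigenvalue( t - 1 ) << std::endl;
     timer.Report();
     return 0;
   }

If this alone is fast, then the time is being spent in the wrappers
that pack the matrices, as you suspect.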
> 
> Zach
> 
> Department of Biochemistry and Program in Biomedical Informatics
> Stanford University School of Medicine
> 
> 
> On May 14, 2004, at 2:19 PM, Sayan Pathak wrote:
> 
>> Hi Zach,
>> Classically, PCA is computed by identifying the eigen vectors
>> corresponding to the k largest eigen values of the covariance matrix
>> (generated from the data). Because in image processing one uses
>> images with a large number of pixels, the implementation in ITK is a
>> bit different, and the logic behind choosing such an approach is
>> explained in the documentation of the itkImagePCAShapeModelEstimator
>> class. Here is an excerpt from the class documentation explaining
>> the logic:
>>
>>
>>  * To speed the computation, instead of performing the eigen analysis
>>  * of the covariance matrix A*A' (where A is a p x t matrix, p = number
>>  * of pixels or voxels in each image and t = number of training images),
>>  * we calculate the eigen vectors of the inner product matrix A'*A. The
>>  * resulting eigen vectors (E) are then multiplied with the matrix A to
>>  * get the principal components. The covariance matrix has a dimension
>>  * of p x p. Since the number of pixels in any image is typically very
>>  * high, its eigen decomposition becomes computationally expensive. The
>>  * inner product matrix, on the other hand, has a dimension of t x t,
>>  * where t is typically much smaller than p. Hence the eigen
>>  * decomposition (the most compute intensive part) is orders of
>>  * magnitude faster.
>>
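
The identity behind this trick, in the same notation as the excerpt
above, may be worth spelling out. If v is an eigenvector of the inner
product matrix,

     (A'A) v = lambda v,      v != 0,

then multiplying both sides by A on the left gives

     (A A') (A v) = lambda (A v),

so A v is an eigenvector of the covariance matrix A A' with the same
eigenvalue. The nonzero eigenvalues of A'A and A A' therefore
coincide, and the principal components are recovered (up to
normalization) by multiplying the eigenvectors of the much smaller
t x t matrix by A.
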
>> Let's consider a typical data set of 100 images, each of size 256 x 256
>> pixels. The A matrix will have a dimension of 256^2 x 100. As per the
>> classical approach, the covariance matrix will have a dimension of
>> 256^2 x 256^2. Using our approach, the inner product matrix dimension
>> reduces to 100 x 100. This leads to a huge reduction in both the
>> computation and the memory requirements.
>>
>> In your case, the A matrix will have a size of 300 x 1100. The
>> covariance matrix will have a dimension of 300 x 300, while the inner
>> product matrix will be 1100 x 1100. This would explain the large
>> computation time you are seeing.
>>
>> One approach could be to compute the covariance matrix and then
>> identify its k largest eigen vectors using the existing functions in
>> ITK. Ideally, both methods would be implemented, switching between one
>> and the other depending on the size of the training set, the two
>> scenarios being large images with few training images (use the inner
>> product matrix) and small images with a large number of training
>> images (use the covariance matrix).
>>
>> Hope this explanation helps. Thanks,
>>
>> Sayan
>>
>>
>>
>>
>>
>>> Hello,
>>
>>
>>> Recently, I've been doing some PCA work, and I noticed that itk's
>>> ImagePCAShapeModelEstimator seems to be surprisingly slow.
>>
>>
>>> I have, for example, 1100 images of size 20x15 -- a pretty modest data
>>> set to do PCA on. Matlab (running on a 450 MHz ultrasparc-II) can
>>> compute the principal components on such a dataset in a few seconds,
>>> even when I intentionally do things in a particularly slow way.
>>
>>
>>> Using the ITK PCA estimator, this same operation takes 15+ minutes on
>>> my personal machine (867 MHz G4). It's not a RAM or VM issue since the
>>> process never seems to grab more than 100 megs of RAM, and doesn't hit
>>> the disk at all.
>>
>>
>>> This seems especially strange given the effort in the PCA class's
>>> implementation toward efficiency. (Though I do realize that a dataset
>>> such as mine with more images than pixels per image does defeat some of
>>> the optimizations rolled in...)
>>
>>
>>> What could I possibly be doing wrong? I profiled the itk PCA code, and
>>> nothing looks overtly wrong -- it just seems to take way too long!
>>
>>
>>> Zach Pincus
>>
>>
> 
> _______________________________________________
> Insight-users mailing list
> Insight-users@itk.org
> http://www.itk.org/mailman/listinfo/insight-users
> 





