[ITK-dev] efficiency of vnl_matrix

Matt McCormick matt.mccormick at kitware.com
Thu Mar 12 19:30:47 EDT 2015


Hi Jian,

Thanks for sharing this performance testing code. As Chuck also
demonstrated, measuring performance is critical for real progress.

Thanks,
Matt

On Thu, Mar 12, 2015 at 11:39 AM, Jian Cheng <jian.cheng.1983 at gmail.com> wrote:
> Hi,
>
> For (2), the attachment is a comparison I made.
> It adds additional tests on vnl to the code from the previous link
> http://nghiaho.com/?p=1726 .
> Some header files can be downloaded from
> https://github.com/DiffusionMRITool/dmritool/tree/master/Modules/HelperFunctions/include
>
> Then you can build and run
> // With eigen
> // g++ -DTEST_EIGEN test_matrix_pseudoinverse.cpp -o
> test_matrix_pseudoinverse_vnl -lopencv_core  -O3 -DNDEBUG
>
> // With ARMA OpenBLAS
> // g++ -DTEST_ARMA test_matrix_pseudoinverse.cpp -o
> test_matrix_pseudoinverse -lopencv_core -larmadillo -lgomp -fopenmp
> -lopenblas -O3 -DNDEBUG -DHAVE_INLINE
>
> // with vnl
> // g++ -DTEST_VNL test_matrix_pseudoinverse.cpp -o
> test_matrix_pseudoinverse -lopencv_core -lvnl -lvnl_algo
> -I/usr/include/vxl/core -I/usr/include/vxl/vcl -O3 -DNDEBUG
> -DUTL_USE_FASTLAPACK
>
> // with vnl + openblas
> // g++ -DTEST_VNL_BLAS test_matrix_pseudoinverse.cpp -o
> test_matrix_pseudoinverse -lopencv_core -lopenblas  -lvnl -lvnl_algo
> -I/usr/include/vxl/core -I/usr/include/vxl/vcl -DNDEBUG  -O3
> -DUTL_USE_FASTLAPACK
> // with vnl + mkl
> // g++ -DTEST_VNL_BLAS test_matrix_pseudoinverse.cpp -o
> test_matrix_pseudoinverse -lopencv_core -lmkl_intel_lp64
> -lmkl_intel_thread -lmkl_core -liomp5 -lpthread -lm  -lvnl -lvnl_algo
> -I/usr/include/vxl/core -I/usr/include/vxl/vcl -O3 -DNDEBUG
> -DUTL_USE_FASTLAPACK
>
> // with utl+openblas
> // g++ -DTEST_UTL test_matrix_pseudoinverse.cpp -o
> test_matrix_pseudoinverse -lopencv_core -lopenblas  -lvnl -lvnl_algo
> -I/usr/include/vxl/core -I/usr/include/vxl/vcl -DNDEBUG  -O3
> -DUTL_USE_FASTLAPACK
> // with utl + mkl;
> // g++ -DTEST_UTL test_matrix_pseudoinverse.cpp -o
> test_matrix_pseudoinverse -lopencv_core -lmkl_intel_lp64
> -lmkl_intel_thread -lmkl_core -liomp5 -lpthread -lm  -lvnl -lvnl_algo
> -I/usr/include/vxl/core -I/usr/include/vxl/vcl -DNDEBUG -O3
> -DUTL_USE_FASTLAPACK
>
> In my experiments, using blas functions from mkl is the most efficient way.
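
A minimal timing harness in the spirit of the comparison above might look like the sketch below. It is not the code from the attachment: `naive_multiply` and `best_time_seconds` are hypothetical names, and the naive triple loop only stands in for whichever backend (vnl, Eigen, Armadillo, OpenBLAS, MKL) is being measured.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Naive row-major product C = A * B for n x n matrices.
// Placeholder for the backend under test (an assumption, not the attachment's code).
void naive_multiply(const std::vector<double>& a, const std::vector<double>& b,
                    std::vector<double>& c, std::size_t n) {
  assert(a.size() == n * n && b.size() == n * n && c.size() == n * n);
  for (std::size_t i = 0; i < n; ++i) {
    for (std::size_t j = 0; j < n; ++j) {
      double sum = 0.0;
      for (std::size_t k = 0; k < n; ++k) sum += a[i * n + k] * b[k * n + j];
      c[i * n + j] = sum;
    }
  }
}

// Run `f` `repeats` times and return the best wall-clock time in seconds.
// Taking the minimum reduces noise from other processes on the machine.
template <typename F>
double best_time_seconds(F&& f, int repeats) {
  assert(repeats > 0);
  double best = -1.0;
  for (int r = 0; r < repeats; ++r) {
    const auto start = std::chrono::steady_clock::now();
    f();
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    if (best < 0.0 || elapsed.count() < best) best = elapsed.count();
  }
  return best;
}
```

Passing each backend's multiply as a lambda to `best_time_seconds` with the same inputs and the same optimization flags (for example -O3 -DNDEBUG, as in the commands above) keeps the comparison fair.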
>
> best,
> Jian Cheng
>
> On 03/12/2015 10:41 AM, Matt McCormick wrote:
>> Hi,
>>
>> From the discussion so far, it appears the following series of steps
>> could be taken to move forward on performance:
>>
>> 1) Replace vnl_vector and vnl_matrix with vnl_vector_fixed and
>> vnl_matrix_fixed when possible.
>>
>> 2) Add Jian Cheng's BLAS and LAPACK backends for vnl_vector and vnl_matrix.
>>
>> 3) Add support for armadillo or eigen.
>>
>> 1) and 2) will be relatively easy to make, and will hopefully have an
>> immediate impact on performance.  3) will take more work to happen,
>> and it will take longer to impact the toolkit.  We will need to have
>> cross-platform builds of the libraries in the repository.  Also, many
>> ITK classes encapsulate their use of VNL very poorly, so it will not
>> be as simple as swapping out or improving their backends.
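
To illustrate why step 1) can help, here is a rough sketch contrasting a run-time sized matrix with a compile-time sized one. `DynMatrix` and `FixedMatrix` are hypothetical stand-ins written for this example, not the vnl classes; the sketch only assumes the general idea that a fixed-size type can keep its storage inline and have its dimensions checked by the compiler.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical run-time sized matrix: dimensions are data, storage is on the heap.
struct DynMatrix {
  std::size_t rows, cols;
  std::vector<double> data;  // row-major
  DynMatrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0) {}
  double& operator()(std::size_t i, std::size_t j) { return data[i * cols + j]; }
  double operator()(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
};

DynMatrix multiply(const DynMatrix& a, const DynMatrix& b) {
  assert(a.cols == b.rows);     // shape mismatch is only caught at run time
  DynMatrix c(a.rows, b.cols);  // one heap allocation per result
  for (std::size_t i = 0; i < a.rows; ++i)
    for (std::size_t j = 0; j < b.cols; ++j)
      for (std::size_t k = 0; k < a.cols; ++k) c(i, j) += a(i, k) * b(k, j);
  return c;
}

// Hypothetical compile-time sized matrix: dimensions are part of the type,
// storage is an inline std::array, so no heap allocation is needed.
template <std::size_t R, std::size_t C>
struct FixedMatrix {
  std::array<double, R * C> data{};  // zero-initialized, row-major
  double& operator()(std::size_t i, std::size_t j) { return data[i * C + j]; }
  double operator()(std::size_t i, std::size_t j) const { return data[i * C + j]; }
};

// The shared inner dimension K is enforced by the type system, and the loop
// bounds are constants the compiler can unroll.
template <std::size_t R, std::size_t K, std::size_t C>
FixedMatrix<R, C> multiply(const FixedMatrix<R, K>& a, const FixedMatrix<K, C>& b) {
  FixedMatrix<R, C> c;
  for (std::size_t i = 0; i < R; ++i)
    for (std::size_t j = 0; j < C; ++j)
      for (std::size_t k = 0; k < K; ++k) c(i, j) += a(i, k) * b(k, j);
  return c;
}
```

With the fixed-size version, multiplying a 3x3 matrix by a 2x2 matrix fails to compile instead of tripping an assertion at run time.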
>>
>> 2 cents,
>> Matt
>>
>> On Thu, Mar 12, 2015 at 10:20 AM, Bradley Lowekamp
>> <blowekamp at mail.nih.gov> wrote:
>>> Chuck,
>>>
>>> Thank you for giving us that important conclusion, under quite difficult
>>> circumstances.
>>>
>>> I wonder if there is any distinction in the usage of vnl_matrix vs
>>> vnl_matrix_fixed. I would expect that operations done for pixel transforms
>>> should have their dimensions known at compile time and should be able to
>>> use vnl_matrix_fixed.
>>>
>>> I have also considered methods to transform a whole array of points at a
>>> time. I wonder whether, for 3x3*3*256 sized operations ( scan-line size ),
>>> there would be a benefit from the library based operations.
>>>
>>> Brad
>>>
>>>
>>> On Mar 12, 2015, at 10:02 AM, Chuck Atkins <chuck.atkins at kitware.com> wrote:
>>>
>>> I worked with Julie Langou, maintainer of LAPACK, on this project a few
>>> years ago.  The funding situation ended up very strange and messy and we
>>> basically had to cram 3 months' worth of effort into 3 weeks, so needless to
>>> say, we were not able to really achieve our goals.  However, we spent a fair
>>> amount of time profiling ITK and analyzing its hot spots from vnl to
>>> determine where to best spend the small amount of time we had.  The results
>>> were not as straightforward as we expected.  It turns out that most of the
>>> use of vnl_matrix and vnl_vector was actually for an enormous number of
>>> operations on very small vectors and matrices (dimensions of 2, 3, or
>>> 4), often for coordinate and geometry calculations or for small per-pixel
>>> operations that were not easily vectorized in the implementation at the
>>> time.  In these cases, the overhead of calling out to a BLAS or LAPACK
>>> library was much too expensive and the existing use of VNL was far more
>>> optimal.  This falls apart, however, when trying to use vnl for more complex
>>> algorithms, since the larger matrix operations will be where the benefit can
>>> be seen.  So just re-implementing the vnl vector and matrix classes and
>>> operators with underlying BLAS and LAPACK routines turned out to not be the
>>> best solution for ITK as a whole.
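
One way to picture the trade-off described above is a multiply that keeps tiny matrices on plain inline loops and only hands larger ones to an external backend. This is a sketch under assumptions, not something from the profiling work: `GemmBackend`, `small_multiply`, `hybrid_multiply`, and the cut-over value `kSmallLimit = 4` are all hypothetical, with 4 chosen only because dimensions of 2, 3, or 4 are mentioned as the common case. A real threshold would have to be measured.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Signature of a hypothetical large-matrix backend computing C = A * B for
// row-major n x n matrices (for example, a thin wrapper around a BLAS gemm).
using GemmBackend =
    std::function<void(const double*, const double*, double*, std::size_t)>;

// Plain loops with no library call; cheap enough for 2x2, 3x3, and 4x4 inputs.
inline void small_multiply(const double* a, const double* b, double* c,
                           std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) {
    for (std::size_t j = 0; j < n; ++j) {
      double sum = 0.0;
      for (std::size_t k = 0; k < n; ++k) sum += a[i * n + k] * b[k * n + j];
      c[i * n + j] = sum;
    }
  }
}

// Assumed cut-over size; the right value depends on the machine and backend.
constexpr std::size_t kSmallLimit = 4;

// Returns true when the backend was used, false when the inline path ran.
bool hybrid_multiply(const double* a, const double* b, double* c,
                     std::size_t n, const GemmBackend& backend) {
  assert(a && b && c);
  if (n <= kSmallLimit || !backend) {
    small_multiply(a, b, c, n);  // avoid call overhead for tiny matrices
    return false;
  }
  backend(a, b, c, n);  // large case: let the optimized library do the work
  return true;
}
```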
>>>
>>> - Chuck
>>> [...] advantage of the performance gains of large block matrix and vector
>>> operations seen with optimized BLAS and LAPACK libraries, the
>>> computations needed to be re-worked to act in an SoA (struct of
>>> arrays) fashion instead.  Given our limited time and resources, this
>>> was out of scope for what we could tackle.
>>>
>>> * Typically AoS and SoA refer to storage layout but I'm using it to
>>> refer to computation layout.  The terminology may not be correct but
>>> I think you can understand what I mean.
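
A small sketch of the AoS versus SoA computation layout mentioned above, applied to transforming 3D points by a 3x3 matrix. The names `Mat3`, `Point3`, `PointBatch`, `transform_one`, and `transform_batch` are hypothetical and chosen for illustration; this is one reading of the idea, not the reworked ITK code, which was never written.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

using Mat3 = std::array<double, 9>;  // row-major 3x3 matrix

// AoS-style computation: one tiny matrix-vector product per point.
struct Point3 {
  double x, y, z;
};

Point3 transform_one(const Mat3& m, const Point3& p) {
  return Point3{m[0] * p.x + m[1] * p.y + m[2] * p.z,
                m[3] * p.x + m[4] * p.y + m[5] * p.z,
                m[6] * p.x + m[7] * p.y + m[8] * p.z};
}

// SoA-style computation: each coordinate lives in its own contiguous array,
// so a whole batch (for example, one scan line) is processed in one pass.
struct PointBatch {
  std::vector<double> x, y, z;
};

PointBatch transform_batch(const Mat3& m, const PointBatch& in) {
  const std::size_t n = in.x.size();
  assert(in.y.size() == n && in.z.size() == n);
  PointBatch out;
  out.x.resize(n);
  out.y.resize(n);
  out.z.resize(n);
  // Long, uniform loop over contiguous data: the kind of work that compilers
  // can vectorize and that block matrix routines are designed for.
  for (std::size_t i = 0; i < n; ++i) {
    out.x[i] = m[0] * in.x[i] + m[1] * in.y[i] + m[2] * in.z[i];
    out.y[i] = m[3] * in.x[i] + m[4] * in.y[i] + m[5] * in.z[i];
    out.z[i] = m[6] * in.x[i] + m[7] * in.y[i] + m[8] * in.z[i];
  }
  return out;
}
```

Both functions compute the same values; the difference is that the batch form amounts to a single 3x3 by 3xN product, which is large enough for an optimized library call to pay off.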
>>> On Thu, Mar 12, 2015 at 8:32 AM, Bradley Lowekamp <blowekamp at mail.nih.gov>
>>> wrote:
>>>> Hello,
>>>>
>>>> If I were writing my own ITK classes and needed a fast matrix library, I
>>>> would likely pursue an additional dependency on an efficient numeric library
>>>> for that project, such as eigen.
>>>>
>>>> However, for the broad appeal of ITK I would think a flexible back end
>>>> would be best, as I think there are a variety of BLAS and LAPACK libraries
>>>> available ( commercial, open source, vendor free ). It would be nice to pick
>>>> one that has been optimized for the current architecture. I would think it
>>>> would be most flexible to use this interface in the back end of a chosen
>>>> numeric interface ( currently VNL ). Unfortunately, I don't have as much
>>>> experience with these libraries as I'd like.
>>>>
>>>> Brad
>>>>
>>>> On Mar 12, 2015, at 5:15 AM, m.staring at lumc.nl wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I think the eigen library is a mature and very fast library for this
>>>>> kind of thing:
>>>>> http://eigen.tuxfamily.org/index.php?title=Main_Page
>>>>>
>>>>> You may want to check it out, to see if it offers what you need.
>>>>>
>>>>> It would be great to be able to use it within ITK.
>>>>>
>>>>> 2c
>>>>> Marius
>>>>>
>>>>> -----Original Message-----
>>>>> From: Insight-developers [mailto:insight-developers-bounces at itk.org] On
>>>>> Behalf Of Jian Cheng
>>>>> Sent: Wednesday, March 11, 2015 23:17
>>>>> To: Matt McCormick
>>>>> Cc: Chuck Atkins; ITK
>>>>> Subject: Re: [ITK-dev] efficiency of vnl_matrix
>>>>>
>>>>> Hi Matt,
>>>>>
>>>>> Thanks for your help, and also for the ITK workshop at UNC last time.
>>>>>
>>>>> It is very unfortunate. The efficiency of these numerical math operators
>>>>> is very important for many applications.
>>>>>
>>>>> I recently released an ITK-based toolbox, called dmritool, for diffusion
>>>>> MRI data processing.
>>>>> It has some files that add support for blas, lapack, and mkl to
>>>>> vnl_matrix and vnl_vector.
>>>>>
>>>>> http://diffusionmritool.github.io/dmritool_doxygen/utlBlas_8h_source.html
>>>>>
>>>>> http://diffusionmritool.github.io/dmritool_doxygen/utlVNLBlas_8h_source.html
>>>>>
>>>>> Those functions are not internal to the vnl_matrix class. They
>>>>> operate on the data pointer stored in the vnl_matrix object.
>>>>> Thus, I later made an N-dimensional array library which internally
>>>>> includes those functions, and also supports expression templates to avoid
>>>>> temporary copies.
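
For readers unfamiliar with the technique, a minimal expression-template sketch follows. It is not the utl library's implementation: `Expr`, `Sum`, and `Vec` are hypothetical names, and only element-wise addition is shown. The point is that `a + b + c` builds a lightweight expression object, and the sum is evaluated in a single loop when it is assigned to a `Vec`, without an intermediate vector for `a + b`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CRTP base: marks a type as an expression and gives access to the real type.
template <typename E>
struct Expr {
  const E& self() const { return static_cast<const E&>(*this); }
};

// Node representing lhs + rhs. It stores references only and computes
// nothing until an element is requested.
template <typename L, typename R>
struct Sum : Expr<Sum<L, R>> {
  const L& lhs;
  const R& rhs;
  Sum(const L& l, const R& r) : lhs(l), rhs(r) { assert(l.size() == r.size()); }
  double operator[](std::size_t i) const { return lhs[i] + rhs[i]; }
  std::size_t size() const { return lhs.size(); }
};

struct Vec : Expr<Vec> {
  std::vector<double> data;
  explicit Vec(std::size_t n, double v = 0.0) : data(n, v) {}

  // Evaluate any expression element by element in one pass.
  template <typename E>
  Vec(const Expr<E>& e) : data(e.self().size()) {
    for (std::size_t i = 0; i < data.size(); ++i) data[i] = e.self()[i];
  }

  double operator[](std::size_t i) const { return data[i]; }
  double& operator[](std::size_t i) { return data[i]; }
  std::size_t size() const { return data.size(); }
};

// Building the expression is cheap: no element is touched here.
template <typename L, typename R>
Sum<L, R> operator+(const Expr<L>& l, const Expr<R>& r) {
  return Sum<L, R>(l.self(), r.self());
}
```

Because `Sum` holds references, the expression should be assigned to a `Vec` in the same statement that builds it; storing it with `auto` would leave references to temporaries that no longer exist.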
>>>>>
>>>>> http://diffusionmritool.github.io/dmritool_doxygen/utlMatrix_8h_source.html
>>>>>
>>>>> http://diffusionmritool.github.io/dmritool_doxygen/utlVector_8h_source.html
>>>>>
>>>>> The efficiency comparison between vnl_vector/vnl_matrix and the
>>>>> vector/matrix using openblas, lapack, or mkl can be found by running these
>>>>> two tests:
>>>>> https://github.com/DiffusionMRITool/dmritool/blob/master/Modules/HelperFunctions/test/utlVNLBlasGTest.cxx
>>>>>
>>>>> https://github.com/DiffusionMRITool/dmritool/blob/master/Modules/HelperFunctions/test/utlVNLLapackGTest.cxx
>>>>>
>>>>> Maybe some of the code can be used as patches somewhere in ITK. I am not
>>>>> sure. Maybe we need more discussion on it.
>>>>> With your help and discussion, I will be very glad to make my first
>>>>> patch to ITK.
>>>>> Thanks.
>>>>>
>>>>> best,
>>>>> Jian Cheng
>>>>>
>>>>>
>>>>> On 03/11/2015 04:39 PM, Matt McCormick wrote:
>>>>>> Hi Jian,
>>>>>>
>>>>>> Yes, it would be wonderful to improve the efficiency of these basic
>>>>>> numerical operations.
>>>>>>
>>>>>> Funding for the Refactor Numerical Libraries effort has ended, and
>>>>>> the effort is currently frozen.  However, you are more than welcome to
>>>>>> pick it up and we can help you get it into ITK.  More information on
>>>>>> the patch submission process can be found here [1] and in the ITK
>>>>>> Software Guide.
>>>>>>
>>>>>> Thanks,
>>>>>> Matt
>>>>>>
>>>>>> [1]
>>>>>> https://insightsoftwareconsortium.github.io/ITKBarCamp-doc/CommunitySoftwareProcess/SubmitAPatchToGerrit/index.html
>>>>>>
>>>>>> On Wed, Mar 11, 2015 at 4:07 PM, Jian Cheng <jian.cheng.1983 at gmail.com>
>>>>>> wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> My task using ITK involves intensive matrix-matrix products,
>>>>>>> pseudo-inverses, etc.
>>>>>>> Thus the performance is mainly determined by the matrix
>>>>>>> library I use.
>>>>>>> At first I used vnl_matrix and vnl_vector in ITK. Then I found them
>>>>>>> very inefficient because the vnl matrix lib does not use blas and lapack.
>>>>>>> After I wrote my own matrix class which uses openblas and lapack, I
>>>>>>> got a huge gain in performance.
>>>>>>>
>>>>>>> I found there is a proposal to improve the efficiency of numerical
>>>>>>> libraries in ITK.
>>>>>>> http://www.itk.org/Wiki/ITK/Release_4/Refactor_Numerical_Libraries
>>>>>>> I am not sure how the proposal is progressing.
>>>>>>> I wonder when the vnl matrix lib can internally support blas and
>>>>>>> lapack, or mkl, so that we can just use it without loss of
>>>>>>> efficiency.
>>>>>>> Thanks.
>>>>>>>
>>>>>>> best,
>>>>>>> Jian Cheng
>>>>>>> _______________________________________________
>>>>>>> Powered by www.kitware.com
>>>>>>>
>>>>>>> Visit other Kitware open-source projects at
>>>>>>> http://www.kitware.com/opensource/opensource.html
>>>>>>>
>>>>>>> Kitware offers ITK Training Courses, for more information visit:
>>>>>>> http://kitware.com/products/protraining.php
>>>>>>>
>>>>>>> Please keep messages on-topic and check the ITK FAQ at:
>>>>>>> http://www.itk.org/Wiki/ITK_FAQ
>>>>>>>
>>>>>>> Follow this link to subscribe/unsubscribe:
>>>>>>> http://public.kitware.com/mailman/listinfo/insight-developers
>>>
>

