[Rtk-users] GPUs testing

Elena Padovani elenapadovani.lk at gmail.com
Thu Jul 19 05:44:51 EDT 2018


Hi Simon,
Thank you for the fast reply. I changed RTK_CUDA_PROJECTIONS_SLAB_SIZE,
but unfortunately nothing changed. I also compiled with the
RTK_TIME_EACH_FILTER flag on, and I do not understand why the report says
that CudaFDKBackProjectionImageFilter took 0.0275 s,
CudaFDKConeBeamReconstructionFilter took 9.58 s and CudaFFTRampImageFilter
took 0.0389 s, while the PrintTiming method reports: prefilter operations
6.65 s, ramp filter 1.71 s, backprojection 1.21 s. So my question is: where
is the remaining time spent? For instance, is the difference
(Backprojection = 1.21 s minus CudaFDKBackProjectionImageFilter = 0.0275 s)
the time needed to copy memory from CPU to GPU? The same question applies
to the ramp filter.
Moreover, it seems to me that the slow step is the CUDAWeighting filter, so
do you think that increasing the number of threads per block, which is
currently { 16, 16, 2 }, could help?
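
For reference, here is a minimal sketch of how a CUDA launch grid is commonly
derived from a fixed block size. It is an illustration in Python, not RTK's
code: the helper grid_dim and the 1024 x 768 x 16 projection stack are
hypothetical, and only the { 16, 16, 2 } block size comes from the message
above. Under this common scheme the number of blocks follows from the data
extent and the block size, not from the number of CUDA cores, so whether a
different block size helps has to be measured.

```python
import math


def grid_dim(extent, block):
    """Blocks needed per axis to cover `extent` elements with one thread each.

    Hypothetical helper for illustration; uses integer ceiling division.
    """
    return tuple(-(-n // b) for n, b in zip(extent, block))


block = (16, 16, 2)        # threads per block, as quoted in the message
extent = (1024, 768, 16)   # ASSUMED projection stack size, not from RTK

grid = grid_dim(extent, block)
threads_per_block = math.prod(block)
total_blocks = math.prod(grid)

print("grid:", grid)
print("threads per block:", threads_per_block)
print("total blocks:", total_blocks)
```

A { 16, 16, 2 } block already holds 512 threads, and current CUDA devices
allow at most 1024 threads per block, so there is limited room to increase it.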

Here is what the application shows me with the -v option:

Reconstructing and writing... It took 11.8574 s
FDKConeBeamReconstructionFilter timing:
  Prefilter operations: 6.65107 s
  Ramp filter: 1.71472 s
  Backprojection: 1.21037 s


*********************************************************************************
Probe Tag                                     Starts    Stops     Time (s)
*********************************************************************************
ConstantImageSource                           1         1         0.0962241
CudaCropImageFilter                           43        43        0.00230094
CudaFDKBackProjectionImageFilter              43        43        0.0275691
CudaFDKConeBeamReconstructionFilter           1         1         9.58291
CudaFFTRampImageFilter                        43        43        0.0389145
ExtractImageFilter                            43        43        0.0130324
FFTWRealToHalfHermitianForwardFFTImageFilter  12        12        0.00128049
ImageFileReader                               686       686       0.0481416
ImageFileWriter                               1         1         11.8383
ImageSeriesReader                             686       686       0.0484766
ProjectionsReader                             1         1         44.7685
Self                                          129       129       0.0506474
StreamingImageFilter                          2         2         27.713
VarianObiRawImageFilter                       686       686       0.0135297
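
A quick arithmetic check on the numbers reported above. It assumes, without
confirmation from the RTK or ITK sources, that the Time (s) column is a mean
per Start/Stop pair rather than a total. Under that assumption, the per-call
time multiplied by the 43 calls lands close to the PrintTiming figures, and
the three PrintTiming phases add up to roughly the
CudaFDKConeBeamReconstructionFilter time. The variable names are illustrative
only.

```python
# All values are copied from the -v report in this message.
calls = 43
mean_backprojection_s = 0.0275691   # CudaFDKBackProjectionImageFilter
mean_ramp_s = 0.0389145             # CudaFFTRampImageFilter
print_timing_s = {
    "prefilter": 6.65107,
    "ramp": 1.71472,
    "backprojection": 1.21037,
}
fdk_total_s = 9.58291               # CudaFDKConeBeamReconstructionFilter

# ASSUMPTION: "Time (s)" is a mean per call, so total = calls * mean.
est_backprojection_s = calls * mean_backprojection_s
est_ramp_s = calls * mean_ramp_s
phases_sum_s = sum(print_timing_s.values())

print(f"backprojection: {est_backprojection_s:.3f} s estimated, "
      f"{print_timing_s['backprojection']:.3f} s from PrintTiming")
print(f"ramp filter:    {est_ramp_s:.3f} s estimated, "
      f"{print_timing_s['ramp']:.3f} s from PrintTiming")
print(f"sum of phases:  {phases_sum_s:.3f} s, FDK filter total {fdk_total_s:.3f} s")
```

If the assumption holds, little time is left unaccounted for in the ramp
filter and the backprojection, and the prefilter operations (6.65 s) remain
the largest share of the reconstruction filter.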

At the beginning I was using my own application with my own data; I have now
switched back to the wiki VarianReconstruction test (with a 512^3
reconstructed volume).

Thank you again,
Kind Regards

Elena

2018-07-18 22:00 GMT+02:00 Simon Rit <simon.rit at creatis.insa-lyon.fr>:

> Hi,
> Thanks for sharing your results.
> RTK uses CUFFT for the ramp filtering, which does its own blocks/grid
> management. For backprojection, it's pretty simple (see
> https://github.com/SimonRit/RTK/blob/master/src/rtkCudaFDKBackProjectionImageFilter.cu#L198):
> mostly hardcoded, independent of the number of CUDA cores, and it could be
> optimized. There is one compilation parameter that you can try to change to
> see if that speeds up the computation: the cmake variable
> RTK_CUDA_PROJECTIONS_SLAB_SIZE, which controls how many projections are
> backprojected simultaneously.
> We currently don't propose any way to use multiple GPUs.
> Please keep us posted if you continue to do some tests. In particular, I
> advise turning on RTK_TIME_EACH_FILTER in cmake so that you get a report
> with -v option in applications on how much time your program spent in each
> filter.
> Best regards,
> Simon
>
> On Wed, Jul 18, 2018 at 6:48 PM, Elena Padovani <
> elenapadovani.lk at gmail.com> wrote:
>
>> Hi RTK-users,
>>
>> I compiled RTK with CUDA and tried to set up a benchmark to analyze the
>> performance trend of GPUs when using the CUDA FDK reconstruction
>> filter. Specifically, when reconstructing the same volume from the same
>> data set on an NVS510, a GTX860M and a GTX970M, I got results consistent
>> with the number of CUDA cores of each GPU. Indeed, when setting up this
>> benchmark I was expecting the reconstruction time to decrease as the
>> number of CUDA cores increased (at least until the size of the
>> reconstructed volume became the actual bottleneck). However, when testing
>> on a Tesla P100, I got performance comparable to the GTX860M. Would you
>> expect such a result?
>>
>> Unfortunately I am new to CUDA, and I was wondering if any of you could
>> help me figure this out.
>> How does RTK with CUDA manage the number of blocks/grid dimension?
>> Is the number of blocks/grid dimension dependent on the GPU's CUDA cores?
>> Is there a way to use multiple GPUs?
>>
>> The test was carried out with the following data:
>> - 360 projections
>> - reconstructed volume 600x700x800 px
>>
>> Thank you in advance
>> Kind regards
>>
>> Elena
>>
>>
>> _______________________________________________
>> Rtk-users mailing list
>> Rtk-users at public.kitware.com
>> https://public.kitware.com/mailman/listinfo/rtk-users
>>
>>
>