[Rtk-users] GPUs testing
Chao Wu
wuchao04 at gmail.com
Fri Jul 20 07:03:15 EDT 2018
Hi,
At a quick look, the time reported with RTK_TIME_EACH_FILTER seems to be
the time per execution of each filter. I didn't look into the code, so I
have no idea whether it is an average time or the time of the last
execution. In addition (not shown in your example), if one filter has more
than one instance in the pipeline, the report only lists the total number
of executions of all instances.
Regards, Chao
Elena Padovani <elenapadovani.lk at gmail.com> wrote on Thu, Jul 19, 2018 at 11:44 AM:
> Hi Simon,
> Thank you for the fast reply. I changed RTK_CUDA_PROJECTIONS_SLAB_SIZE,
> but unfortunately nothing changed. I also compiled with the flag
> RTK_TIME_EACH_FILTER on, and I do not understand why it tells me
> that CudaFDKBackProjectionImageFilter took 0.0275 s,
> CudaFDKConeBeamReconstructionFilter took 9.58 s
> and CudaFFTRampImageFilter took 0.0389 s, while the PrintTiming method
> tells me that prefilter operations took 6.65 s, the ramp filter 1.71 s and
> backprojection 1.21 s. So my question is: where is the remaining time
> spent? For instance, is the difference
> (backprojection 1.21 s - CudaFDKBackProjectionImageFilter 0.0275 s) the
> time needed to copy the memory from CPU to GPU? The same holds for the ramp
> filter.
> Moreover, it seems to me that what is taking long is the CUDAWeighting
> filter, so do you think that increasing the number of threads per block,
> which is now { 16, 16, 2 }, could help?
>
> Here is what the application shows me with the -v option:
>
> Reconstructing and writing... It took 11.8574 s
> FDKConeBeamReconstructionFilter timing:
> Prefilter operations: 6.65107 s
> Ramp filter: 1.71472 s
> Backprojection: 1.21037 s
>
>
>
> *********************************************************************************
> Probe Tag                                     Starts   Stops   Time (s)
>
> *********************************************************************************
> ConstantImageSource                                1       1   0.0962241
> CudaCropImageFilter                               43      43   0.00230094
> CudaFDKBackProjectionImageFilter                  43      43   0.0275691
> CudaFDKConeBeamReconstructionFilter                1       1   9.58291
> CudaFFTRampImageFilter                            43      43   0.0389145
> ExtractImageFilter                                43      43   0.0130324
> FFTWRealToHalfHermitianForwardFFTImageFilter      12      12   0.00128049
> ImageFileReader                                  686     686   0.0481416
> ImageFileWriter                                    1       1   11.8383
> ImageSeriesReader                                686     686   0.0484766
> ProjectionsReader                                  1       1   44.7685
> Self                                             129     129   0.0506474
> StreamingImageFilter                               2       2   27.713
> VarianObiRawImageFilter                          686     686   0.0135297
>
> At the beginning I was using my own application with my own data; I have
> now switched back to the wiki VarianReconstruction test (with a 512^3
> reconstructed volume).
>
> Thank you again,
> Kind Regards
>
> Elena
>
> 2018-07-18 22:00 GMT+02:00 Simon Rit <simon.rit at creatis.insa-lyon.fr>:
>
>> Hi,
>> Thanks for sharing your results.
>> RTK uses CUFFT for the ramp filtering, which does its own blocks/grid
>> management. For backprojection, it's pretty simple, see
>>
>> https://github.com/SimonRit/RTK/blob/master/src/rtkCudaFDKBackProjectionImageFilter.cu#L198
>> It is mostly hardcoded, independent of the number of CUDA cores, and could
>> be optimized. There is one compilation parameter that you can try changing
>> to see if it speeds up the computation: the CMake variable
>> RTK_CUDA_PROJECTIONS_SLAB_SIZE, which controls how many projections are
>> backprojected simultaneously.
>> We currently don't propose any way to use multiple GPUs.
>> Please keep us posted if you continue to do some tests. In particular, I
>> advise turning on RTK_TIME_EACH_FILTER in CMake so that, with the -v
>> option, applications report how much time your program spent in each
>> filter.
>> Best regards,
>> Simon
>>
>> On Wed, Jul 18, 2018 at 6:48 PM, Elena Padovani <
>> elenapadovani.lk at gmail.com> wrote:
>>
>>> Hi RTK-users,
>>>
>>> I compiled RTK with CUDA and set up a benchmark to analyze the
>>> performance trend of GPUs when using the CUDA FDK reconstruction
>>> filter. Precisely, when reconstructing the same volume from the same
>>> data set on an NVS 510, a GTX 860M and a GTX 970M, I got results
>>> consistent with the number of CUDA cores of each GPU. Indeed, when
>>> setting up this benchmark, I expected the reconstruction time to
>>> decrease as the number of CUDA cores increases (at least until the
>>> dimension of the reconstructed volume becomes the actual bottleneck).
>>> However, when testing on a Tesla P100, I got performance comparable to
>>> the GTX 860M. Would you expect such a result?
>>>
>>> Unfortunately I am new to CUDA, and I was wondering if any of you could
>>> help me figure this out.
>>> How does RTK with CUDA manage the number of blocks/grid dimensions?
>>> Are the number of blocks/grid dimensions dependent on the GPU's CUDA cores?
>>> Is there a way to use multiple GPUs?
>>>
>>> The test was carried out with the following data:
>>> - 360 projections
>>> - reconstructed volume 600x700x800 px
>>>
>>> Thank you in advance
>>> Kind regards
>>>
>>> Elena
>>>
>>>
>>> _______________________________________________
>>> Rtk-users mailing list
>>> Rtk-users at public.kitware.com
>>> https://public.kitware.com/mailman/listinfo/rtk-users
>>>
>>>
>>
>