[Rtk-users] Slow CUDA FDK performance
Moritz Schaar
schaar at imt.uni-luebeck.de
Fri Nov 5 11:13:06 EDT 2021
Hi,
I recently upgraded my Windows 10 system to ITK 5.2.1 including RTK 2.3.0.
This also involved upgrading CUDA from 10.2 to 11.5, moving to Visual Studio 2019, and even updating Python (3.8.5 to 3.8.12).
Using the Python wrapping of RTK, I implemented my own routines that use FDK in a way similar to the rtkfdk application.
On the old system (ITK 5.2.1, RTK 2.1.0) I benchmarked the FDK for a 512x512x200 dataset reconstructed into 256x256x256 with 1.0 mm isotropic voxel size.
The system is equipped with 24 CPU cores and one RTX 2080 Ti. The CPU version took 17.1 seconds and the CUDA version 1.2 seconds.
Running the new software version on the same system results in roughly 19 s CPU time but more than 7 s for the CUDA version.
I don't care much about the absolute timings; what bothers me is how much slower the CUDA version has become relative to the CPU version.
To dig up some more information, I recompiled RTK with RTK_PROBE_EACH_FILTER and ran rtkfdk.exe on the same data. This is what I got:
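To put numbers on that relative change, here is a minimal sketch of the arithmetic using only the timings quoted above. The "new" values are approximate ("roughly 19 s", "more than 7 s"), so the ratios are rough bounds rather than measurements; the variable names are just for illustration.

```python
# Timings quoted above, in seconds.
# The "new" values are approximate, so the derived ratios are only rough.
old = {"cpu": 17.1, "cuda": 1.2}
new = {"cpu": 19.0, "cuda": 7.0}

# How much slower each backend became after the upgrade.
cpu_slowdown = new["cpu"] / old["cpu"]
cuda_slowdown = new["cuda"] / old["cuda"]

# CUDA speedup over the CPU version, before and after the upgrade.
old_speedup = old["cpu"] / old["cuda"]
new_speedup = new["cpu"] / new["cuda"]

print(f"CPU slowdown:  {cpu_slowdown:.2f}x")
print(f"CUDA slowdown: {cuda_slowdown:.2f}x")
print(f"CUDA speedup over CPU, old: {old_speedup:.2f}x")
print(f"CUDA speedup over CPU, new: {new_speedup:.2f}x")
```

The CPU path is only about 10 % slower, while the CUDA path is slower by a factor of nearly six, which is why the CUDA side looks like the place to dig.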
**************************************************************************************************************
Probe Tag                                    Starts  Stops   Time (s)    Memory (kB)  Cuda memory (kB)
**************************************************************************************************************
ChangeInformationImageFilter                 200     200     0.0211846   0            0
ConstantImageSource                          1       1       0.0305991   65668        0
CudaCropImageFilter                          13      13      0.0222911   15786.8      15753.8
CudaDisplacedDetectorImageFilter             13      13      0.0540568   10719.1      16384
CudaFDKBackProjectionImageFilter             13      13      0.0326397   5051.38      5041.23
CudaFDKConeBeamReconstructionFilter          1       1       5.72999     552184       211648
CudaFDKWeightProjectionFilter                13      13      0.0262806   -13892       630.154
CudaFFTRampImageFilter                       13      13      0.148416    43095.4      12499.7
CudaParkerShortScanImageFilter               13      13      0.0467202   2525.85      15753.8
ExtractImageFilter                           13      13      0.0259726   15812.3      -15753.8
ImageFileReader                              200     200     0.0226735   -0.16        0
ImageSeriesReader                            200     200     0.066097    6.12         0
ProjectionsReader                            1       1       26.0388     208488       0
StreamingImageFilter                         2       2       16.0663     547512       191840
VnlRealToHalfHermitianForwardFFTImageFilter  2       2       0.0208174   0            0
Following the conversation on the mailing list, https://public.kitware.com/pipermail/rtk-users/2018-July/010617.html, I see that the CudaFDKConeBeamReconstructionFilter takes 6.41 s, of which roughly 1/3 is spent in the CudaFFTRampImageFilter.
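The 1/3 estimate can be reproduced from the probe table with a short sketch. It assumes that the "Time (s)" column is a mean per call rather than a total (the table itself does not say so), multiplies it by the number of calls, and relates the result to the 5.72999 s listed for CudaFDKConeBeamReconstructionFilter in the table.

```python
# (number of calls, time in s) copied from the probe table above.
# Assumption: "Time (s)" is the mean per call, so total = calls * time.
probes = {
    "CudaCropImageFilter": (13, 0.0222911),
    "CudaDisplacedDetectorImageFilter": (13, 0.0540568),
    "CudaFDKBackProjectionImageFilter": (13, 0.0326397),
    "CudaFDKWeightProjectionFilter": (13, 0.0262806),
    "CudaFFTRampImageFilter": (13, 0.148416),
    "CudaParkerShortScanImageFilter": (13, 0.0467202),
    "ExtractImageFilter": (13, 0.0259726),
}

# Time listed for CudaFDKConeBeamReconstructionFilter (1 call) in the table.
fdk_total = 5.72999

# Total time per filter under the mean-per-call assumption.
totals = {name: calls * t for name, (calls, t) in probes.items()}
ramp_fraction = totals["CudaFFTRampImageFilter"] / fdk_total

# Print the filters from most to least expensive.
for name, total in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:34s} {total:6.3f} s  ({total / fdk_total:5.1%} of FDK filter)")
```

Under that assumption the ramp filter accounts for about a third of the FDK filter time, and no other per-projection filter in the table comes close to it.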
Unfortunately, I don't have these probe results for the old software version, so I have nothing to compare them with.
I also tried RTK v2.2.0, but it makes no difference.
The version I used before (v2.1.0) no longer compiles with CUDA 11.5. I tried small adjustments, e.g. applying this commit https://github.com/SimonRit/RTK/commit/3d3c7506087f5fa98aee75df5af5c30e7e51cbe6, but that did not make it work.
Trying to set up ITK 5.1.2 fails in a similar way with other errors, so getting the old version back for comparison seems impossible.
Can you point me in any direction to find out what the actual issue is? Or does someone have an idea what the reason could be: the CUDA, RTK, or ITK version?
Any help is appreciated.
Best,
Moritz