<div dir="ltr"><div>Hi Moritz,</div><div>Thanks for the report. It's hard to be convinced that something is wrong without being able to reproduce it. According to the <span lang="EN-GB">RTK_PROBE_EACH_FILTER</span> log, most of the time is spent reading the projections, which should take the same time with or without CUDA, so I wonder whether projection reading is the actual issue here. I can try to reproduce the problem; can you just confirm the two configurations: CUDA 10.2, ITK 5.2.1, RTK 2.1.0 vs. CUDA 11.5, ITK 5.2.1, RTK 2.3.0?</div><div>Thanks,</div><div>Simon<br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, Nov 5, 2021 at 4:20 PM Moritz Schaar <<a href="mailto:schaar@imt.uni-luebeck.de">schaar@imt.uni-luebeck.de</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div lang="DE">
<div class="gmail-m_-1899161659826733953WordSection1">
<p class="MsoNormal"><span lang="EN-GB">Hi,<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB"><u></u> <u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB">I recently upgraded my Windows 10 system to ITK 5.2.1 including RTK 2.3.0.<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB">This also involved upgrading CUDA from 10.2 to 11.5, moving to Visual Studio 2019 and even updating Python (3.8.5 to 3.8.12).</span></p>
<p class="MsoNormal"><span lang="EN-GB">Using the Python wrapping of RTK, I implemented my own routines that use FDK, similar to the rtkfdk application.</span></p>
<p class="MsoNormal"><span lang="EN-GB">On the old system (CUDA 10.2, ITK 5.2.1, RTK 2.1.0), I benchmarked FDK on a 512x512x200 projection stack reconstructed into a 256x256x256 volume with 1.0 mm isotropic voxels.</span></p>
<p class="MsoNormal"><span lang="EN-GB">The system has 24 CPU cores and one RTX 2080 Ti; the CPU version took 17.1 s and the CUDA version 1.2 s.</span></p>
<p class="MsoNormal"><span lang="EN-GB">Running the new software versions on the same system takes roughly 19 s for the CPU version but more than 7 s for the CUDA version.</span></p>
<p class="MsoNormal"><span lang="EN-GB">I don’t care about the absolute timings, but the relative increase for the CUDA version (almost 6x, versus about 1.1x for the CPU version) is what bothers me.</span></p>
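<p class="MsoNormal"><span lang="EN-GB">When comparing two installations, it may help to time the reconstruction call on its own, with the projections already loaded, and to report the first call separately from later ones, because one-time costs such as CUDA context creation can end up in the first call. Below is a minimal Python sketch under those assumptions; <code>benchmark</code> and <code>reconstruct</code> are hypothetical names, and <code>reconstruct</code> only stands in for one's own FDK routine (it is not an RTK function).</span></p>

```python
import statistics
import time


def benchmark(fn, warmup=1, repeats=5):
    """Time fn() for warm-up calls and steady-state calls separately.

    The first call(s) may include one-time costs (e.g. CUDA context
    creation), so their times are returned individually, apart from the
    median and minimum of the following repeated calls.
    """
    warmup_times = []
    for _ in range(warmup):
        start = time.perf_counter()
        fn()
        warmup_times.append(time.perf_counter() - start)

    steady_times = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        steady_times.append(time.perf_counter() - start)

    return {
        "warmup_s": warmup_times,
        "median_s": statistics.median(steady_times),
        "min_s": min(steady_times),
    }


if __name__ == "__main__":
    # Hypothetical placeholder for one's own FDK routine, e.g. a function
    # that updates the CPU or CUDA FDK filter on projections that were
    # already read from disk. Replace the dummy workload with the real call.
    def reconstruct():
        sum(i * i for i in range(200_000))

    print(benchmark(reconstruct))
```

<p class="MsoNormal"><span lang="EN-GB">Keeping file reading outside the timed function means the CPU/CUDA comparison reflects only the reconstruction itself.</span></p>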
<p class="MsoNormal"><span lang="EN-GB"><u></u> <u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB">To gather more information, I recompiled RTK with RTK_PROBE_EACH_FILTER enabled and ran rtkfdk.exe on the same data. This is what I got:</span></p>
<table border="1" cellspacing="0" cellpadding="3" style="border-collapse:collapse" lang="EN-GB">
<tr><th align="left">Probe Tag</th><th align="right">Starts</th><th align="right">Stops</th><th align="right">Time (s)</th><th align="right">Memory (kB)</th><th align="right">Cuda memory (kB)</th></tr>
<tr><td>ChangeInformationImageFilter</td><td align="right">200</td><td align="right">200</td><td align="right">0.0211846</td><td align="right">0</td><td align="right">0</td></tr>
<tr><td>ConstantImageSource</td><td align="right">1</td><td align="right">1</td><td align="right">0.0305991</td><td align="right">65668</td><td align="right">0</td></tr>
<tr><td>CudaCropImageFilter</td><td align="right">13</td><td align="right">13</td><td align="right">0.0222911</td><td align="right">15786.8</td><td align="right">15753.8</td></tr>
<tr><td>CudaDisplacedDetectorImageFilter</td><td align="right">13</td><td align="right">13</td><td align="right">0.0540568</td><td align="right">10719.1</td><td align="right">16384</td></tr>
<tr><td>CudaFDKBackProjectionImageFilter</td><td align="right">13</td><td align="right">13</td><td align="right">0.0326397</td><td align="right">5051.38</td><td align="right">5041.23</td></tr>
<tr><td>CudaFDKConeBeamReconstructionFilter</td><td align="right">1</td><td align="right">1</td><td align="right">5.72999</td><td align="right">552184</td><td align="right">211648</td></tr>
<tr><td>CudaFDKWeightProjectionFilter</td><td align="right">13</td><td align="right">13</td><td align="right">0.0262806</td><td align="right">-13892</td><td align="right">630.154</td></tr>
<tr><td>CudaFFTRampImageFilter</td><td align="right">13</td><td align="right">13</td><td align="right">0.148416</td><td align="right">43095.4</td><td align="right">12499.7</td></tr>
<tr><td>CudaParkerShortScanImageFilter</td><td align="right">13</td><td align="right">13</td><td align="right">0.0467202</td><td align="right">2525.85</td><td align="right">15753.8</td></tr>
<tr><td>ExtractImageFilter</td><td align="right">13</td><td align="right">13</td><td align="right">0.0259726</td><td align="right">15812.3</td><td align="right">-15753.8</td></tr>
<tr><td>ImageFileReader</td><td align="right">200</td><td align="right">200</td><td align="right">0.0226735</td><td align="right">-0.16</td><td align="right">0</td></tr>
<tr><td>ImageSeriesReader</td><td align="right">200</td><td align="right">200</td><td align="right">0.066097</td><td align="right">6.12</td><td align="right">0</td></tr>
<tr><td>ProjectionsReader</td><td align="right">1</td><td align="right">1</td><td align="right">26.0388</td><td align="right">208488</td><td align="right">0</td></tr>
<tr><td>StreamingImageFilter</td><td align="right">2</td><td align="right">2</td><td align="right">16.0663</td><td align="right">547512</td><td align="right">191840</td></tr>
<tr><td>VnlRealToHalfHermitianForwardFFTImageFilter</td><td align="right">2</td><td align="right">2</td><td align="right">0.0208174</td><td align="right">0</td><td align="right">0</td></tr>
</table>
<p class="MsoNormal"><span lang="EN-GB"><u></u> <u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB">Following the conversation on the mailing list at
<a href="https://public.kitware.com/pipermail/rtk-users/2018-July/010617.html" target="_blank">https://public.kitware.com/pipermail/rtk-users/2018-July/010617.html</a>, I see that the CudaFDKConeBeamReconstructionFilter takes 6.41 s, of which roughly 1/3 is spent in the CudaFFTRampImageFilter.</span></p>
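<p class="MsoNormal"><span lang="EN-GB">The share of the ramp filter can be checked directly from the probe table. The Python sketch below copies a subset of the rows and assumes that the Time (s) column is the mean time per start, so that a probe's total time is Starts multiplied by Time; <code>PROBES</code> and <code>total_times</code> are hypothetical names, not part of RTK.</span></p>

```python
# Subset of rows copied from the RTK_PROBE_EACH_FILTER report:
# (probe tag, number of starts, reported time in s).
# Assumption: "Time (s)" is the mean time per start, so the total time
# spent in a probe is starts * time.
PROBES = [
    ("CudaCropImageFilter", 13, 0.0222911),
    ("CudaDisplacedDetectorImageFilter", 13, 0.0540568),
    ("CudaFDKBackProjectionImageFilter", 13, 0.0326397),
    ("CudaFDKConeBeamReconstructionFilter", 1, 5.72999),
    ("CudaFDKWeightProjectionFilter", 13, 0.0262806),
    ("CudaFFTRampImageFilter", 13, 0.148416),
    ("CudaParkerShortScanImageFilter", 13, 0.0467202),
    ("ExtractImageFilter", 13, 0.0259726),
    ("ProjectionsReader", 1, 26.0388),
]


def total_times(rows):
    """Return a dict mapping each probe tag to starts * mean time (s)."""
    return {tag: starts * mean_s for tag, starts, mean_s in rows}


if __name__ == "__main__":
    totals = total_times(PROBES)
    fdk = totals["CudaFDKConeBeamReconstructionFilter"]
    ramp = totals["CudaFFTRampImageFilter"]
    print(f"ramp filter: {ramp:.2f} s of {fdk:.2f} s FDK ({ramp / fdk:.0%})")
    print(f"projection reading: {totals['ProjectionsReader']:.2f} s")
```

<p class="MsoNormal"><span lang="EN-GB">With the numbers from this run, the ramp filter accounts for about 1.93 s of the 5.73 s reported for CudaFDKConeBeamReconstructionFilter (about 34%), while ProjectionsReader reports about 26 s.</span></p>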
<p class="MsoNormal"><span lang="EN-GB">Unfortunately, I don’t have these results for the old software version, so I can’t compare these values.</span></p>
<p class="MsoNormal"><span lang="EN-GB"><u></u> <u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB">I also tried v2.2.0, but it doesn’t make a difference.</span></p>
<p class="MsoNormal"><span lang="EN-GB">Unfortunately, the version I used before (v2.1.0) no longer compiles with CUDA 11.5. I tried small adjustments to make it build, e.g. applying this commit
<a href="https://github.com/SimonRit/RTK/commit/3d3c7506087f5fa98aee75df5af5c30e7e51cbe6" target="_blank">
https://github.com/SimonRit/RTK/commit/3d3c7506087f5fa98aee75df5af5c30e7e51cbe6</a>, but without success.</span></p>
<p class="MsoNormal"><span lang="EN-GB">Setting up ITK 5.1.2 fails as well, with different errors, so getting the old version back for comparison seems impossible.</span></p>
<p class="MsoNormal"><span lang="EN-GB"><u></u> <u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB">Could you point me in a direction to find out what the actual issue is? Or does anyone have an idea what the cause could be:
</span>the CUDA, RTK or ITK version?</p>
<p class="MsoNormal">Any help is appreciated.<u></u><u></u></p>
<p class="MsoNormal"><u></u> <u></u></p>
<p class="MsoNormal"><b><span style="color:rgb(20,90,110)" lang="EN-US">Best,<u></u><u></u></span></b></p>
<p class="MsoNormal"><b><span style="color:rgb(20,90,110)" lang="EN-US">Moritz<u></u><u></u></span></b></p>
<p class="MsoNormal"><b><span style="color:rgb(20,90,110)" lang="EN-US"><u></u> <u></u></span></b></p>
</div>
</div>
_______________________________________________<br>
Rtk-users mailing list<br>
<a href="mailto:Rtk-users@public.kitware.com" target="_blank">Rtk-users@public.kitware.com</a><br>
<a href="https://public.kitware.com/mailman/listinfo/rtk-users" rel="noreferrer" target="_blank">https://public.kitware.com/mailman/listinfo/rtk-users</a><br>
</blockquote></div>