<div dir="ltr"><div>Hi,</div><div>I compiled the python packages with exactly the same configurations and I can't reproduce the issue</div><div><span style="font-size:11pt;font-family:"Calibri",sans-serif;color:rgb(31,73,125)" lang="EN-GB">old: CUDA 10.2, ITK 5.1.2, RTK 2.1.0 -> 0.9 s</span></div>0.019904613494873047<br>0.6475656032562256<br>Reconstructing...<br><div>0.9730124473571777</div><div><br></div><span style="font-size:11pt;font-family:"Calibri",sans-serif;color:rgb(31,73,125)" lang="EN-GB">new: CUDA 11.5, ITK 5.2.1, RTK 2.3.0</span><div>0.017342329025268555<br>0.7650339603424072<br>Reconstructing...<br>0.8823671340942383<span style="font-size:11pt;font-family:"Calibri",sans-serif;color:rgb(31,73,125)" lang="EN-GB"></span></div><div><br></div><div>The code I ran is the following<span style="font-size:11pt;font-family:"Calibri",sans-serif;color:rgb(31,73,125)" lang="EN-GB"><br></span></div><div><span style="font-size:11pt;font-family:"Calibri",sans-serif;color:rgb(31,73,125)" lang="EN-GB">#!/usr/bin/env python<br>import sys<br>import itk<br>import time<br>from itk import RTK as rtk<br><br>if len ( sys.argv ) < 3:<br> print( "Usage: FirstReconstruction <outputimage> <outputgeometry>" )<br> sys.exit ( 1 )<br><br># Defines the image type<br>GPUImageType = rtk.CudaImage[itk.F,3]<br>CPUImageType = rtk.Image[itk.F,3]<br><br># Defines the RTK geometry object<br>geometry = rtk.ThreeDCircularProjectionGeometry.New()<br>numberOfProjections = 200<br>firstAngle = 0.<br>angularArc = 360.<br>sid = 600 # source to isocenter distance<br>sdd = 1200 # source to detector distance<br>for x in range(0,numberOfProjections):<br> angle = firstAngle + x * angularArc / numberOfProjections<br> geometry.AddProjection(sid,sdd,angle)<br><br># Writing the geometry to disk<br>xmlWriter = rtk.ThreeDCircularProjectionGeometryXMLFileWriter.New()<br>xmlWriter.SetFilename ( sys.argv[2] )<br>xmlWriter.SetObject ( geometry );<br>xmlWriter.WriteFile();<br><br># Create a stack of empty projection 
images<br>ConstantImageSourceType = rtk.ConstantImageSource[GPUImageType]<br>constantImageSource = ConstantImageSourceType.New()<br>origin = [ -127.75, -127.75, 0. ]<br>sizeOutput = [ 512, 512, numberOfProjections ]<br>spacing = [ 0.5, 0.5, 0.5 ]<br>constantImageSource.SetOrigin( origin )<br>constantImageSource.SetSpacing( spacing )<br>constantImageSource.SetSize( sizeOutput )<br>constantImageSource.SetConstant(0.)<br><br>REIType = rtk.RayEllipsoidIntersectionImageFilter[CPUImageType, CPUImageType]<br>rei = REIType.New()<br>semiprincipalaxis = [ 50, 50, 50]<br>center = [ 0, 0, 10]<br># Set GrayScale value, axes, center...<br>rei.SetDensity(2)<br>rei.SetAngle(0)<br>rei.SetCenter(center)<br>rei.SetAxis(semiprincipalaxis)<br>rei.SetGeometry( geometry )<br>rei.SetInput(constantImageSource.GetOutput())<br><br># Create reconstructed image<br>constantImageSource2 = ConstantImageSourceType.New()<br>sizeOutput = [ 256 ] * 3<br>origin = [ -63.75 ] * 3<br>spacing = [ 0.5 ] * 3<br>constantImageSource2.SetOrigin( origin )<br>constantImageSource2.SetSpacing( spacing )<br>constantImageSource2.SetSize( sizeOutput )<br>constantImageSource2.SetConstant(0.)<br>t0 = time.time()<br>constantImageSource2.Update()<br>t1 = time.time()<br>print(t1-t0)<br><br># Graft the projections to an itk::CudaImage<br>projections = GPUImageType.New()<br>t0 = time.time()<br>rei.Update()<br>t1 = time.time()<br>print(t1-t0)<br>projections.SetPixelContainer(rei.GetOutput().GetPixelContainer())<br>projections.CopyInformation(rei.GetOutput())<br>projections.SetBufferedRegion(rei.GetOutput().GetBufferedRegion())<br>projections.SetRequestedRegion(rei.GetOutput().GetRequestedRegion())<br><br># FDK reconstruction<br>print("Reconstructing...")<br>FDKGPUType = rtk.CudaFDKConeBeamReconstructionFilter<br>feldkamp = FDKGPUType.New()<br>feldkamp.SetInput(0, constantImageSource2.GetOutput())<br>feldkamp.SetInput(1, 
projections)<br>feldkamp.SetGeometry(geometry)<br>feldkamp.GetRampFilter().SetTruncationCorrection(0.0)<br>feldkamp.GetRampFilter().SetHannCutFrequency(0.0)<br>t0 = time.time()<br>feldkamp.Update()<br>t1 = time.time()<br>print(t1-t0)</span></div><br>To be honest, I don't see what to do at this stage... Can you maybe check the same code with your two versions? Any other suggestions?<br>Simon</div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Nov 10, 2021 at 10:03 AM Moritz Schaar <<a href="mailto:schaar@imt.uni-luebeck.de" target="_blank">schaar@imt.uni-luebeck.de</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div lang="DE">
<div>
<p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri",sans-serif;color:rgb(31,73,125)">Hi Simon,<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri",sans-serif;color:rgb(31,73,125)"><u></u> <u></u></span></p>
<p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri",sans-serif;color:rgb(31,73,125)" lang="EN-GB">I completely agree that this is hard to track down. That’s why I am asking for directions
</span><span style="font-size:11pt;font-family:Wingdings;color:rgb(31,73,125)" lang="EN-GB">J</span><span style="font-size:11pt;font-family:"Calibri",sans-serif;color:rgb(31,73,125)" lang="EN-GB"><u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri",sans-serif;color:rgb(31,73,125)" lang="EN-GB">To be more precise about the execution times of my example:<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri",sans-serif;color:rgb(31,73,125)" lang="EN-GB">The timings given in pairs of 17.1/1.2 s and 19/7 s are only the required times of the reconstruction step itself.<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri",sans-serif;color:rgb(31,73,125)" lang="EN-GB">Reading data, pre and post processing are not part of this time measurement.<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri",sans-serif;color:rgb(31,73,125)" lang="EN-GB">So the 7 s average in python is similar to the 6.41 s I obtained from adding everything done in CudaFDKConeBeamReconstructionFilter
using RTK_PROBE_EACH_FILTER.<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri",sans-serif;color:rgb(31,73,125)" lang="EN-GB">The reconstruction step in python simply involves:<u></u><u></u></span></p>
<p><u></u><span style="font-size:11pt;font-family:"Calibri",sans-serif;color:rgb(31,73,125)" lang="EN-GB"><span>-<span style="font:7pt "Times New Roman"">
</span></span></span><u></u><span style="font-size:11pt;font-family:"Calibri",sans-serif;color:rgb(31,73,125)" lang="EN-GB">Instantiating a simple class, which adds nothing to the timings<u></u><u></u></span></p>
<p><u></u><span style="font-size:11pt;font-family:"Calibri",sans-serif;color:rgb(31,73,125)" lang="EN-GB"><span>-<span style="font:7pt "Times New Roman"">
</span></span></span><u></u><span style="font-size:11pt;font-family:"Calibri",sans-serif;color:rgb(31,73,125)" lang="EN-GB">Setting up ConstantImageSource with either rtk.Image or rtk.CudaImage<u></u><u></u></span></p>
<p><u></u><span style="font-size:11pt;font-family:"Calibri",sans-serif;color:rgb(31,73,125)" lang="EN-GB"><span>-<span style="font:7pt "Times New Roman"">
</span></span></span><u></u><span style="font-size:11pt;font-family:"Calibri",sans-serif;color:rgb(31,73,125)" lang="EN-GB">Setting up FDKConeBeamReconstructionFilter/CudaFDKConeBeamReconstructionFilter<u></u><u></u></span></p>
<p><u></u><span style="font-size:11pt;font-family:"Calibri",sans-serif;color:rgb(31,73,125)" lang="EN-GB"><span>-<span style="font:7pt "Times New Roman"">
</span></span></span><u></u><span style="font-size:11pt;font-family:"Calibri",sans-serif;color:rgb(31,73,125)" lang="EN-GB">Setting inputs, geometry and filter<u></u><u></u></span></p>
<p><u></u><span style="font-size:11pt;font-family:"Calibri",sans-serif;color:rgb(31,73,125)" lang="EN-GB"><span>-<span style="font:7pt "Times New Roman"">
</span></span></span><u></u><span style="font-size:11pt;font-family:"Calibri",sans-serif;color:rgb(31,73,125)" lang="EN-GB">Calling Update() and returning the result<u></u><u></u></span></p>
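A minimal sketch of how the reconstruction step could be timed so that a one-off start-up cost (for example CUDA context creation on the first GPU call) is reported separately from repeated runs. `time_pipeline_step` is a hypothetical helper, not part of ITK or RTK; it only assumes the filter exposes ITK's `Update()` and `Modified()` methods, and the demo uses a stand-in function instead of a real filter.

```python
import statistics
import time
from typing import Callable, Dict, List, Optional


def time_pipeline_step(
    update: Callable[[], None],
    invalidate: Optional[Callable[[], None]] = None,
    repeats: int = 5,
) -> Dict[str, float]:
    """Time a warm-up call of `update`, then `repeats` further calls.

    `invalidate` runs before every call. For an ITK filter this would be the
    filter's Modified() method; without it, a second Update() on an
    up-to-date filter returns without re-executing.
    """
    if repeats < 1:
        raise ValueError("repeats must be >= 1")
    runs: List[float] = []
    for _ in range(repeats + 1):
        if invalidate is not None:
            invalidate()
        t0 = time.perf_counter()
        update()
        runs.append(time.perf_counter() - t0)
    first, rest = runs[0], runs[1:]
    return {"first": first, "median": statistics.median(rest), "min": min(rest)}


if __name__ == "__main__":
    # Stand-in for a filter: only the first call pays a simulated start-up cost.
    state = {"initialised": False}

    def fake_update() -> None:
        if not state["initialised"]:
            time.sleep(0.05)  # simulated one-off initialisation
            state["initialised"] = True
        time.sleep(0.01)  # simulated per-run work

    print(time_pipeline_step(fake_update, repeats=3))
```

With a reconstruction filter such as `feldkamp`, the call would be `time_pipeline_step(feldkamp.Update, feldkamp.Modified)`; only that filter is marked as modified, so up-to-date upstream filters should not be re-run. A large gap between `first` and `median` would point to a one-off cost rather than slower kernels. If the later runs come out close to zero, the filter was not actually re-executed and only the first value is meaningful.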
<p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri",sans-serif;color:rgb(31,73,125)" lang="EN-GB"><u></u> <u></u></span></p>
<p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri",sans-serif;color:rgb(31,73,125)" lang="EN-GB">Looks like there was a typo in my mail, the versions compared should be:<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri",sans-serif;color:rgb(31,73,125)" lang="EN-GB">old: CUDA 10.2, ITK 5.1.2, RTK 2.1.0<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri",sans-serif;color:rgb(31,73,125)" lang="EN-GB">new: CUDA 11.5, ITK 5.2.1, RTK 2.3.0<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri",sans-serif;color:rgb(31,73,125)" lang="EN-GB"><u></u> <u></u></span></p>
<p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri",sans-serif;color:rgb(31,73,125)" lang="EN-GB">Sorry for the confusion and thanks for looking into it!<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri",sans-serif;color:rgb(31,73,125)" lang="EN-GB"><u></u> <u></u></span></p>
<p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri",sans-serif;color:rgb(31,73,125)" lang="EN-GB">Best,<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri",sans-serif;color:rgb(31,73,125)" lang="EN-GB">Moritz<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri",sans-serif;color:rgb(31,73,125)" lang="EN-GB"><u></u> <u></u></span></p>
<p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri",sans-serif;color:rgb(31,73,125)" lang="EN-GB"><u></u> <u></u></span></p>
<p class="MsoNormal"><b><span style="font-size:11pt;font-family:"Calibri",sans-serif">Von:</span></b><span style="font-size:11pt;font-family:"Calibri",sans-serif"> Simon Rit <<a href="mailto:simon.rit@creatis.insa-lyon.fr" target="_blank">simon.rit@creatis.insa-lyon.fr</a>>
<br>
<b>Gesendet:</b> Mittwoch, 10. November 2021 09:32<br>
<b>An:</b> Moritz Schaar <<a href="mailto:schaar@imt.uni-luebeck.de" target="_blank">schaar@imt.uni-luebeck.de</a>><br>
<b>Cc:</b> <a href="mailto:rtk-users@public.kitware.com" target="_blank">rtk-users@public.kitware.com</a><br>
<b>Betreff:</b> Re: [Rtk-users] Slow CUDA FDK performance<u></u><u></u></span></p>
<p class="MsoNormal"><u></u> <u></u></p>
<div>
<div>
<p class="MsoNormal">Hi Moritz,<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">Thanks for the report. It's a bit hard to be convinced that something is wrong without being able to reproduce it. From the
<span lang="EN-GB">RTK_PROBE_EACH_FILTER</span> log, most of the time is spent reading the projections which will be the same with or without cuda so I wonder if this is not the issue here. I can try to reproduce the issue, can you just confirm the two configurations
: Cuda 10.2, ITK 5.2.1, RTK 2.1.0 vs Cuda 11.5, ITK 5.2.1 RTK 2.3.0 ?<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">Thanks,<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">Simon<u></u><u></u></p>
</div>
</div>
<p class="MsoNormal"><u></u> <u></u></p>
<div>
<div>
<p class="MsoNormal">On Fri, Nov 5, 2021 at 4:20 PM Moritz Schaar <<a href="mailto:schaar@imt.uni-luebeck.de" target="_blank">schaar@imt.uni-luebeck.de</a>> wrote:<u></u><u></u></p>
</div>
<blockquote style="border-color:currentcolor currentcolor currentcolor rgb(204,204,204);border-style:none none none solid;border-width:medium medium medium 1pt;padding:0cm 0cm 0cm 6pt;margin-left:4.8pt;margin-right:0cm">
<div>
<div>
<p class="MsoNormal"><span lang="EN-GB">Hi,</span><u></u><u></u></p>
<p class="MsoNormal"><span lang="EN-GB"> </span><u></u><u></u></p>
<p class="MsoNormal"><span lang="EN-GB">I recently upgraded my Windows 10 system to ITK 5.2.1 including RTK 2.3.0.</span><u></u><u></u></p>
<p class="MsoNormal"><span lang="EN-GB">This also involved upgrading CUDA from 10.2 to 11.5, Visual Studio 2019 and even python update (3.8.5 to 3.8.12).</span><u></u><u></u></p>
<p class="MsoNormal"><span lang="EN-GB">Using the python wrapping of RTK I implemented own routines that use FDK similar to the rtkfdk application.</span><u></u><u></u></p>
<p class="MsoNormal"><span lang="EN-GB">On the old system (ITK 5.2.1, RTK 2.1.0) I benchmarked the FDK for a 512x512x200 dataset reconstructed into 256x256x256 with 1.0 mm isotropic voxel size.</span><u></u><u></u></p>
<p class="MsoNormal"><span lang="EN-GB">The system is equipped with 24 CPU cores and one RTX 2080 Ti, so the CPU version took 17.1 and the CUDA version 1.2 seconds.</span><u></u><u></u></p>
<p class="MsoNormal"><span lang="EN-GB">Running the new software version on the same system results in roughly 19 s CPU time but more than 7 s for the CUDA version.</span><u></u><u></u></p>
<p class="MsoNormal"><span lang="EN-GB">I don’t care about the actual timings but the relative increase of the CUDA version is what bothers me.</span><u></u><u></u></p>
<p class="MsoNormal"><span lang="EN-GB"> </span><u></u><u></u></p>
<p class="MsoNormal"><span lang="EN-GB">To dig up some more information I recompiled RTK with RTK_PROBE_EACH_FILTER and ran rtkfdk.exe for the same data, this is what I got:</span><u></u><u></u></p>
<p class="MsoNormal"><span lang="EN-GB">**************************************************************************************************************</span><u></u><u></u></p>
<p class="MsoNormal"><span lang="EN-GB">Probe Tag Starts Stops Time (s) Memory (kB) Cuda memory (kB)</span><u></u><u></u></p>
<p class="MsoNormal"><span lang="EN-GB">**************************************************************************************************************</span><u></u><u></u></p>
<p class="MsoNormal"><span lang="EN-GB">ChangeInformationImageFilter 200 200 0.0211846 0 0</span><u></u><u></u></p>
<p class="MsoNormal"><span lang="EN-GB">ConstantImageSource 1 1 0.0305991 65668 0</span><u></u><u></u></p>
<p class="MsoNormal"><span lang="EN-GB">CudaCropImageFilter 13 13 0.0222911 15786.8 15753.8</span><u></u><u></u></p>
<p class="MsoNormal"><span lang="EN-GB">CudaDisplacedDetectorImageFilter 13 13 0.0540568 10719.1 16384</span><u></u><u></u></p>
<p class="MsoNormal"><span lang="EN-GB">CudaFDKBackProjectionImageFilter 13 13 0.0326397 5051.38 5041.23</span><u></u><u></u></p>
<p class="MsoNormal"><span lang="EN-GB">CudaFDKConeBeamReconstructionFilter 1 1 5.72999 552184 211648</span><u></u><u></u></p>
<p class="MsoNormal"><span lang="EN-GB">CudaFDKWeightProjectionFilter 13 13 0.0262806 -13892 630.154</span><u></u><u></u></p>
<p class="MsoNormal"><span lang="EN-GB">CudaFFTRampImageFilter 13 13 0.148416 43095.4 12499.7</span><u></u><u></u></p>
<p class="MsoNormal"><span lang="EN-GB">CudaParkerShortScanImageFilter 13 13 0.0467202 2525.85 15753.8</span><u></u><u></u></p>
<p class="MsoNormal"><span lang="EN-GB">ExtractImageFilter 13 13 0.0259726 15812.3 -15753.8</span><u></u><u></u></p>
<p class="MsoNormal"><span lang="EN-GB">ImageFileReader 200 200 0.0226735 -0.16 0</span><u></u><u></u></p>
<p class="MsoNormal"><span lang="EN-GB">ImageSeriesReader 200 200 0.066097 6.12 0</span><u></u><u></u></p>
<p class="MsoNormal"><span lang="EN-GB">ProjectionsReader 1 1 26.0388 208488 0</span><u></u><u></u></p>
<p class="MsoNormal"><span lang="EN-GB">StreamingImageFilter 2 2 16.0663 547512 191840</span><u></u><u></u></p>
<p class="MsoNormal"><span lang="EN-GB">VnlRealToHalfHermitianForwardFFTImageFilter 2 2 0.0208174 0 0</span><u></u><u></u></p>
<p class="MsoNormal"><span lang="EN-GB"> </span><u></u><u></u></p>
<p class="MsoNormal"><span lang="EN-GB">Following the conversion on the mailing list,
<a href="https://public.kitware.com/pipermail/rtk-users/2018-July/010617.html" target="_blank">
https://public.kitware.com/pipermail/rtk-users/2018-July/010617.html</a>, I see that the CudaFDKConeBeamReconstructionFilter takes 6.41 s of which roughly 1/3 is spent in the CudaFFTRampImageFilter.</span><u></u><u></u></p>
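The "roughly 1/3" figure can be checked against the probe report, under the assumption that its Time (s) column is an average per start, so that a filter's total time is starts × time. That reading of the report format is an assumption, not something the log states; the 6.41 s total is the figure quoted in the text.

```python
# Rows copied from the RTK_PROBE_EACH_FILTER report above:
# (probe tag, number of starts, "Time (s)" column).
rows = [
    ("CudaCropImageFilter", 13, 0.0222911),
    ("CudaDisplacedDetectorImageFilter", 13, 0.0540568),
    ("CudaFDKBackProjectionImageFilter", 13, 0.0326397),
    ("CudaFDKWeightProjectionFilter", 13, 0.0262806),
    ("CudaFFTRampImageFilter", 13, 0.148416),
    ("CudaParkerShortScanImageFilter", 13, 0.0467202),
]

FDK_TOTAL_S = 6.41  # total quoted in the text for CudaFDKConeBeamReconstructionFilter

# Assumption: "Time (s)" is a per-start average, so total = starts * time.
totals = {tag: starts * t for tag, starts, t in rows}
for tag, total in sorted(totals.items(), key=lambda kv: -kv[1]):
    share = total / FDK_TOTAL_S
    print(f"{tag:35s} {total:6.3f} s  ({share:5.1%} of {FDK_TOTAL_S} s)")
```

Under that assumption the ramp filter accounts for about 1.93 s, i.e. roughly 30% of 6.41 s; if the column were already a total, its share would instead be about 2%.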
<p class="MsoNormal"><span lang="EN-GB">Sadly I don’t have these results for the old software version so I can’t relate these values.</span><u></u><u></u></p>
<p class="MsoNormal"><span lang="EN-GB"> </span><u></u><u></u></p>
<p class="MsoNormal"><span lang="EN-GB">However, I also played around with v2.2.0 but it doesn’t make a difference.</span><u></u><u></u></p>
<p class="MsoNormal"><span lang="EN-GB">Sadly, the version I used before (v2.1.0) won’t compile with CUDA 11.5 anymore. I tried to add small adjustments e.g. this commit
<a href="https://github.com/SimonRit/RTK/commit/3d3c7506087f5fa98aee75df5af5c30e7e51cbe6" target="_blank">
https://github.com/SimonRit/RTK/commit/3d3c7506087f5fa98aee75df5af5c30e7e51cbe6</a> to make things work but this didn’t work.</span><u></u><u></u></p>
<p class="MsoNormal"><span lang="EN-GB">The same happens with other errors when trying to setup ITK 5.1.2, so getting back the old version for comparison seems impossible.</span><u></u><u></u></p>
<p class="MsoNormal"><span lang="EN-GB"> </span><u></u><u></u></p>
<p class="MsoNormal"><span lang="EN-GB">Is there any direction you can point me to check what is actually the issue here? Or maybe someone has an idea what could be the reason?
</span>CUDA/RTK/ITK version?<u></u><u></u></p>
<p class="MsoNormal">Any help is appreciated.<u></u><u></u></p>
<p class="MsoNormal"> <u></u><u></u></p>
<p class="MsoNormal"><b><span style="color:rgb(20,90,110)" lang="EN-US">Best,</span></b><u></u><u></u></p>
<p class="MsoNormal"><b><span style="color:rgb(20,90,110)" lang="EN-US">Moritz</span></b><u></u><u></u></p>
<p class="MsoNormal"><b><span style="color:rgb(20,90,110)" lang="EN-US"> </span></b><u></u><u></u></p>
</div>
</div>
<p class="MsoNormal">_______________________________________________<br>
Rtk-users mailing list<br>
<a href="mailto:Rtk-users@public.kitware.com" target="_blank">Rtk-users@public.kitware.com</a><br>
<a href="https://public.kitware.com/mailman/listinfo/rtk-users" target="_blank">https://public.kitware.com/mailman/listinfo/rtk-users</a><u></u><u></u></p>
</blockquote>
</div>
</div>
</div>
</blockquote></div>