<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
        {font-family:Wingdings;
        panose-1:5 0 0 0 0 0 0 0 0 0;}
@font-face
        {font-family:"Cambria Math";
        panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
        {font-family:Calibri;
        panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
        {margin:0cm;
        margin-bottom:.0001pt;
        font-size:12.0pt;
        font-family:"Times New Roman",serif;}
a:link, span.MsoHyperlink
        {mso-style-priority:99;
        color:blue;
        text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
        {mso-style-priority:99;
        color:purple;
        text-decoration:underline;}
p
        {mso-style-priority:99;
        mso-margin-top-alt:auto;
        margin-right:0cm;
        mso-margin-bottom-alt:auto;
        margin-left:0cm;
        font-size:12.0pt;
        font-family:"Times New Roman",serif;}
p.msonormal0, li.msonormal0, div.msonormal0
        {mso-style-name:msonormal;
        mso-margin-top-alt:auto;
        margin-right:0cm;
        mso-margin-bottom-alt:auto;
        margin-left:0cm;
        font-size:12.0pt;
        font-family:"Times New Roman",serif;}
span.E-MailFormatvorlage19
        {mso-style-type:personal;
        font-family:"Calibri",sans-serif;
        color:#1F497D;}
span.E-MailFormatvorlage20
        {mso-style-type:personal-compose;
        font-family:"Calibri",sans-serif;
        color:windowtext;}
.MsoChpDefault
        {mso-style-type:export-only;
        font-family:"Calibri",sans-serif;
        mso-fareast-language:EN-US;}
@page WordSection1
        {size:612.0pt 792.0pt;
        margin:70.85pt 70.85pt 2.0cm 70.85pt;}
div.WordSection1
        {page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang="DE" link="blue" vlink="purple">
<div class="WordSection1">
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US">Hi Simon,<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US">thank you for looking into it!<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US">So there is no general issue, which is nice to know.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US">Sadly I cannot run the old version as I end up with the following error:<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US">“ImportError: DLL load failed while importing _RTKPython”<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US">ITK works, RTK doesn’t. And rebuilding also doesn’t work as too many things changed (CUDA, VS, ..).<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US">In the meantime I figured out that modifying “m_ProjectionSubsetSize” helps to accelerate everything.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US">Checking this with “feldkamp.GetProjectionSubsetSize()” I get for the CPU and CUDA version of the FDKConeBeamReconstructionFilter
 a value of “2”.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US">Regarding your test code I, obviously, get very slow execution times with this value. Increasing the number of subsets
 to 200 I end up with similar values you reported.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US">Now I am wondering where these defaults come from.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US">For CPU I assume that this comes from missing FFTW as given here
<a href="https://github.com/SimonRit/RTK/blob/46ea6e190965bcc89421075830e365063aa8c51a/include/rtkFDKConeBeamReconstructionFilter.hxx#L48">
https://github.com/SimonRit/RTK/blob/46ea6e190965bcc89421075830e365063aa8c51a/include/rtkFDKConeBeamReconstructionFilter.hxx#L48</a>
<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US">However, I do not understand why the CUDA version defaults to 2 instead of 16.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US">In CMake I kept the default: RTK_CUDA_PROJECTIONS_SLAB_SIZE=16<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US">From
<a href="https://github.com/SimonRit/RTK/blob/master/rtkConfiguration.h.in#L35">https://github.com/SimonRit/RTK/blob/master/rtkConfiguration.h.in#L35</a> I assume that this gets copied to SLAB_SIZE which will be used by all CUDA codes.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US">My rtkConfiguration.h also reflects this:<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US">#ifndef SLAB_SIZE<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US">#  define SLAB_SIZE 16<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US">#endif<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US">Do you have an explanation why this gets reduced from 16 to 2? Or a hint where I can have a look?<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US">Best,<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US">Moritz<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><b><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">Von:</span></b><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"> Simon Rit <simon.rit@creatis.insa-lyon.fr>
<br>
<b>Gesendet:</b> Donnerstag, 18. November 2021 12:13<br>
<b>An:</b> Moritz Schaar <schaar@imt.uni-luebeck.de><br>
<b>Cc:</b> rtk-users@public.kitware.com<br>
<b>Betreff:</b> Re: [Rtk-users] Slow CUDA FDK performance<o:p></o:p></span></p>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<div>
<p class="MsoNormal">Hi,<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">I compiled the python packages with exactly the same configurations and I can't reproduce the issue<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D">old: CUDA 10.2, ITK 5.1.2, RTK 2.1.0 -> 0.9 s</span><o:p></o:p></p>
</div>
<p class="MsoNormal">0.019904613494873047<br>
0.6475656032562256<br>
Reconstructing...<o:p></o:p></p>
<div>
<p class="MsoNormal">0.9730124473571777<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D">new: CUDA 11.5, ITK 5.2.1, RTK 2.3.0</span><o:p></o:p></p>
<div>
<p class="MsoNormal">0.017342329025268555<br>
0.7650339603424072<br>
Reconstructing...<br>
0.8823671340942383<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">The code I ran is the following<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D">#!/usr/bin/env python<br>
import sys<br>
import itk<br>
import time<br>
from itk import RTK as rtk<br>
<br>
if len ( sys.argv ) < 3:<br>
  print( "Usage: FirstReconstruction <outputimage> <outputgeometry>" )<br>
  sys.exit ( 1 )<br>
<br>
# Defines the image type<br>
GPUImageType = rtk.CudaImage[itk.F,3]<br>
CPUImageType = rtk.Image[itk.F,3]<br>
<br>
# Defines the RTK geometry object<br>
geometry = rtk.ThreeDCircularProjectionGeometry.New()<br>
numberOfProjections = 200<br>
firstAngle = 0.<br>
angularArc = 360.<br>
sid = 600 # source to isocenter distance<br>
sdd = 1200 # source to detector distance<br>
for x in range(0,numberOfProjections):<br>
  angle = firstAngle + x * angularArc / numberOfProjections<br>
  geometry.AddProjection(sid,sdd,angle)<br>
<br>
# Writing the geometry to disk<br>
xmlWriter = rtk.ThreeDCircularProjectionGeometryXMLFileWriter.New()<br>
xmlWriter.SetFilename ( sys.argv[2] )<br>
xmlWriter.SetObject ( geometry );<br>
xmlWriter.WriteFile();<br>
<br>
# Create a stack of empty projection images<br>
ConstantImageSourceType = rtk.ConstantImageSource[GPUImageType]<br>
constantImageSource = ConstantImageSourceType.New()<br>
origin = [ -127.75, -127.75, 0. ]<br>
sizeOutput = [ 512, 512,  numberOfProjections ]<br>
spacing = [ 0.5, 0.5, 0.5 ]<br>
constantImageSource.SetOrigin( origin )<br>
constantImageSource.SetSpacing( spacing )<br>
constantImageSource.SetSize( sizeOutput )<br>
constantImageSource.SetConstant(0.)<br>
<br>
REIType = rtk.RayEllipsoidIntersectionImageFilter[CPUImageType, CPUImageType]<br>
rei = REIType.New()<br>
semiprincipalaxis = [ 50, 50, 50]<br>
center = [ 0, 0, 10]<br>
# Set GrayScale value, axes, center...<br>
rei.SetDensity(2)<br>
rei.SetAngle(0)<br>
rei.SetCenter(center)<br>
rei.SetAxis(semiprincipalaxis)<br>
rei.SetGeometry( geometry )<br>
rei.SetInput(constantImageSource.GetOutput())<br>
<br>
# Create reconstructed image<br>
constantImageSource2 = ConstantImageSourceType.New()<br>
sizeOutput = [ 256 ] * 3<br>
origin = [ -63.75 ] * 3<br>
spacing = [ 0.5 ] *  3<br>
constantImageSource2.SetOrigin( origin )<br>
constantImageSource2.SetSpacing( spacing )<br>
constantImageSource2.SetSize( sizeOutput )<br>
constantImageSource2.SetConstant(0.)<br>
t0 = time.time()<br>
constantImageSource2.Update()<br>
t1 = time.time()<br>
print(t1-t0)<br>
<br>
# Graft the projections to an itk::CudaImage<br>
projections = GPUImageType.New()<br>
t0 = time.time()<br>
rei.Update()<br>
t1 = time.time()<br>
print(t1-t0)<br>
projections.SetPixelContainer(rei.GetOutput().GetPixelContainer())<br>
projections.CopyInformation(rei.GetOutput())<br>
projections.SetBufferedRegion(rei.GetOutput().GetBufferedRegion())<br>
projections.SetRequestedRegion(rei.GetOutput().GetRequestedRegion())<br>
<br>
# FDK reconstruction<br>
print("Reconstructing...")<br>
FDKGPUType = rtk.CudaFDKConeBeamReconstructionFilter<br>
feldkamp = FDKGPUType.New()<br>
feldkamp.SetInput(0, constantImageSource2.GetOutput())<br>
feldkamp.SetInput(1, projections)<br>
feldkamp.SetGeometry(geometry)<br>
feldkamp.GetRampFilter().SetTruncationCorrection(0.0)<br>
feldkamp.GetRampFilter().SetHannCutFrequency(0.0)<br>
t0 = time.time()<br>
feldkamp.Update()<br>
t1 = time.time()<br>
print(t1-t0)</span><o:p></o:p></p>
</div>
<p class="MsoNormal"><br>
To be honest I don't see to do at this stage... Can you maybe check the same code with your two versions ? Any other suggestion?<br>
Simon<o:p></o:p></p>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<div>
<p class="MsoNormal">On Wed, Nov 10, 2021 at 10:03 AM Moritz Schaar <<a href="mailto:schaar@imt.uni-luebeck.de" target="_blank">schaar@imt.uni-luebeck.de</a>> wrote:<o:p></o:p></p>
</div>
<blockquote style="border:none;border-left:solid #CCCCCC 1.0pt;padding:0cm 0cm 0cm 6.0pt;margin-left:4.8pt;margin-right:0cm">
<div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D">Hi Simon,</span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D"> </span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D">I completely agree that this is hard to track down. That’s why I am asking for directions
</span><span lang="EN-GB" style="font-size:11.0pt;font-family:Wingdings;color:#1F497D">J</span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D">To be more precise about the execution times of my example:</span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D">The timings given in pairs of 17.1/1.2 s and 19/7 s are only the required times of the
 reconstruction step itself.</span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D">Reading data, pre and post processing are not part of this time measurement.</span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D">So the 7 s average in python is similar to the 6.41 s I obtained from adding everything
 done in CudaFDKConeBeamReconstructionFilter using RTK_PROBE_EACH_FILTER.</span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D">The reconstruction step in python simply involves:</span><o:p></o:p></p>
<p><span lang="EN-GB" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D">-</span><span lang="EN-GB" style="font-size:7.0pt;color:#1F497D">         
</span><span lang="EN-GB" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D">Instantiation of a simple class, this doesn’t add anything to the timings</span><o:p></o:p></p>
<p><span lang="EN-GB" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D">-</span><span lang="EN-GB" style="font-size:7.0pt;color:#1F497D">         
</span><span lang="EN-GB" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D">Setting up ConstantImageSource with either rtk.Image or rtk.CudaImage</span><o:p></o:p></p>
<p><span lang="EN-GB" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D">-</span><span lang="EN-GB" style="font-size:7.0pt;color:#1F497D">         
</span><span lang="EN-GB" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D">Setting up FDKConeBeamReconstructionFilter/CudaFDKConeBeamReconstructionFilter</span><o:p></o:p></p>
<p><span lang="EN-GB" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D">-</span><span lang="EN-GB" style="font-size:7.0pt;color:#1F497D">         
</span><span lang="EN-GB" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D">Setting inputs, geometry and filter</span><o:p></o:p></p>
<p><span lang="EN-GB" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D">-</span><span lang="EN-GB" style="font-size:7.0pt;color:#1F497D">         
</span><span lang="EN-GB" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D">Update() and return result</span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D"> </span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D">Looks like there was a typo in my mail, the versions compared should be:</span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D">old: CUDA 10.2, ITK 5.1.2, RTK 2.1.0</span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D">new: CUDA 11.5, ITK 5.2.1, RTK 2.3.0</span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D"> </span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D">Sorry for the confusion and thanks for looking into it!</span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D"> </span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D">Best,</span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D">Moritz</span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D"> </span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D"> </span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><b><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">Von:</span></b><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"> Simon Rit <<a href="mailto:simon.rit@creatis.insa-lyon.fr" target="_blank">simon.rit@creatis.insa-lyon.fr</a>>
<br>
<b>Gesendet:</b> Mittwoch, 10. November 2021 09:32<br>
<b>An:</b> Moritz Schaar <<a href="mailto:schaar@imt.uni-luebeck.de" target="_blank">schaar@imt.uni-luebeck.de</a>><br>
<b>Cc:</b> <a href="mailto:rtk-users@public.kitware.com" target="_blank">rtk-users@public.kitware.com</a><br>
<b>Betreff:</b> Re: [Rtk-users] Slow CUDA FDK performance</span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"> <o:p></o:p></p>
<div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">Hi Moritz,<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">Thanks for the report. It's a bit hard to be convinced that something is wrong without being able to reproduce it. From the
<span lang="EN-GB">RTK_PROBE_EACH_FILTER</span> log, most of the time is spent reading the projections which will be the same with or without cuda so I wonder if this is not the issue here. I can try to reproduce the issue, can you just confirm the two configurations
 : Cuda 10.2, ITK 5.2.1, RTK 2.1.0 vs Cuda 11.5, ITK 5.2.1 RTK 2.3.0 ?<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">Thanks,<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">Simon<o:p></o:p></p>
</div>
</div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"> <o:p></o:p></p>
<div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">On Fri, Nov 5, 2021 at 4:20 PM Moritz Schaar <<a href="mailto:schaar@imt.uni-luebeck.de" target="_blank">schaar@imt.uni-luebeck.de</a>> wrote:<o:p></o:p></p>
</div>
<blockquote style="border:none;border-left:solid windowtext 1.0pt;padding:0cm 0cm 0cm 6.0pt;margin-left:4.8pt;margin-top:5.0pt;margin-right:0cm;margin-bottom:5.0pt;border-color:currentcolor currentcolor currentcolor rgb(204,204,204)">
<div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-GB">Hi,</span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-GB"> </span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-GB">I recently upgraded my Windows 10 system to ITK 5.2.1 including RTK 2.3.0.</span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-GB">This also involved upgrading CUDA from 10.2 to 11.5, Visual Studio 2019 and even python update (3.8.5 to 3.8.12).</span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-GB">Using the python wrapping of RTK I implemented own routines that use FDK similar to the rtkfdk application.</span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-GB">On the old system (ITK 5.2.1, RTK 2.1.0) I benchmarked the FDK for a 512x512x200 dataset reconstructed into 256x256x256 with 1.0 mm isotropic voxel size.</span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-GB">The system is equipped with 24 CPU cores and one RTX 2080 Ti, so the CPU version took 17.1 and the CUDA version 1.2 seconds.</span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-GB">Running the new software version on the same system results in roughly 19 s CPU time but more than 7 s for the CUDA version.</span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-GB">I don’t care about the actual timings but the relative increase of the CUDA version is what bothers me.</span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-GB"> </span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-GB">To dig up some more information I recompiled RTK with RTK_PROBE_EACH_FILTER and ran rtkfdk.exe for the same data, this is what I got:</span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-GB">**************************************************************************************************************</span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-GB">Probe Tag                                    Starts    Stops     Time (s)       Memory (kB)    Cuda memory (kB)</span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-GB">**************************************************************************************************************</span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-GB">ChangeInformationImageFilter                 200       200       0.0211846      0              0</span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-GB">ConstantImageSource                          1         1         0.0305991      65668          0</span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-GB">CudaCropImageFilter                          13        13        0.0222911      15786.8        15753.8</span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-GB">CudaDisplacedDetectorImageFilter             13        13        0.0540568      10719.1        16384</span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-GB">CudaFDKBackProjectionImageFilter             13        13        0.0326397      5051.38        5041.23</span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-GB">CudaFDKConeBeamReconstructionFilter          1         1         5.72999        552184         211648</span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-GB">CudaFDKWeightProjectionFilter                13        13        0.0262806      -13892         630.154</span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-GB">CudaFFTRampImageFilter                       13        13        0.148416       43095.4        12499.7</span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-GB">CudaParkerShortScanImageFilter               13        13        0.0467202      2525.85        15753.8</span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-GB">ExtractImageFilter                           13        13        0.0259726      15812.3        -15753.8</span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-GB">ImageFileReader                              200       200       0.0226735      -0.16          0</span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-GB">ImageSeriesReader                            200       200       0.066097       6.12           0</span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-GB">ProjectionsReader                            1         1         26.0388        208488         0</span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-GB">StreamingImageFilter                         2         2         16.0663        547512         191840</span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-GB">VnlRealToHalfHermitianForwardFFTImageFilter  2         2         0.0208174      0              0</span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-GB"> </span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-GB">Following the conversion on the mailing list,
<a href="https://public.kitware.com/pipermail/rtk-users/2018-July/010617.html" target="_blank">
https://public.kitware.com/pipermail/rtk-users/2018-July/010617.html</a>, I see that the CudaFDKConeBeamReconstructionFilter takes 6.41 s of which roughly 1/3 is spent in the CudaFFTRampImageFilter.</span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-GB">Sadly I don’t have these results for the old software version so I can’t relate these values.</span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-GB"> </span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-GB">However, I also played around with v2.2.0 but it doesn’t make a difference.</span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-GB">Sadly, the version I used before (v2.1.0) won’t compile with CUDA 11.5 anymore. I tried to add small adjustments e.g. this commit
<a href="https://github.com/SimonRit/RTK/commit/3d3c7506087f5fa98aee75df5af5c30e7e51cbe6" target="_blank">
https://github.com/SimonRit/RTK/commit/3d3c7506087f5fa98aee75df5af5c30e7e51cbe6</a> to make things work but this didn’t work.</span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-GB">The same happens with other errors when trying to setup ITK 5.1.2, so getting back the old version for comparison seems impossible.</span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-GB"> </span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-GB">Is there any direction you can point me to check what is actually the issue here? Or maybe someone has an idea what could be the reason?
</span>CUDA/RTK/ITK version?<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">Any help is appreciated.<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"> <o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><b><span lang="EN-US" style="color:#145A6E">Best,</span></b><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><b><span lang="EN-US" style="color:#145A6E">Moritz</span></b><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><b><span lang="EN-US" style="color:#145A6E"> </span></b><o:p></o:p></p>
</div>
</div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">_______________________________________________<br>
Rtk-users mailing list<br>
<a href="mailto:Rtk-users@public.kitware.com" target="_blank">Rtk-users@public.kitware.com</a><br>
<a href="https://public.kitware.com/mailman/listinfo/rtk-users" target="_blank">https://public.kitware.com/mailman/listinfo/rtk-users</a><o:p></o:p></p>
</blockquote>
</div>
</div>
</div>
</blockquote>
</div>
</div>
</body>
</html>