[Rtk-users] Slow CUDA FDK performance

Simon Rit simon.rit at creatis.insa-lyon.fr
Thu Nov 18 06:12:41 EST 2021


Hi,
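I compiled the Python packages with exactly the same configurations and I
can't reproduce the issue. Each run prints three timings in seconds: the
update of the empty volume source, the projection simulation (rei.Update())
and the FDK reconstruction (feldkamp.Update()):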
old: CUDA 10.2, ITK 5.1.2, RTK 2.1.0 -> 0.9 s
0.019904613494873047
0.6475656032562256
Reconstructing...
0.9730124473571777

new: CUDA 11.5, ITK 5.2.1, RTK 2.3.0
0.017342329025268555
0.7650339603424072
Reconstructing...
0.8823671340942383

The code I ran is the following:
#!/usr/bin/env python
import sys
import itk
import time
from itk import RTK as rtk

if len(sys.argv) < 3:
  print("Usage: FirstReconstruction <outputimage> <outputgeometry>")
  sys.exit(1)

# Defines the image type
GPUImageType = rtk.CudaImage[itk.F,3]
CPUImageType = rtk.Image[itk.F,3]

# Defines the RTK geometry object
geometry = rtk.ThreeDCircularProjectionGeometry.New()
numberOfProjections = 200
firstAngle = 0.
angularArc = 360.
sid = 600 # source to isocenter distance
sdd = 1200 # source to detector distance
for x in range(0,numberOfProjections):
  angle = firstAngle + x * angularArc / numberOfProjections
  geometry.AddProjection(sid,sdd,angle)

# Writing the geometry to disk
xmlWriter = rtk.ThreeDCircularProjectionGeometryXMLFileWriter.New()
xmlWriter.SetFilename(sys.argv[2])
xmlWriter.SetObject(geometry)
xmlWriter.WriteFile()

# Create a stack of empty projection images
ConstantImageSourceType = rtk.ConstantImageSource[GPUImageType]
constantImageSource = ConstantImageSourceType.New()
origin = [ -127.75, -127.75, 0. ]
sizeOutput = [ 512, 512,  numberOfProjections ]
spacing = [ 0.5, 0.5, 0.5 ]
constantImageSource.SetOrigin( origin )
constantImageSource.SetSpacing( spacing )
constantImageSource.SetSize( sizeOutput )
constantImageSource.SetConstant(0.)

REIType = rtk.RayEllipsoidIntersectionImageFilter[CPUImageType, CPUImageType]
rei = REIType.New()
semiprincipalaxis = [ 50, 50, 50]
center = [ 0, 0, 10]
# Set GrayScale value, axes, center...
rei.SetDensity(2)
rei.SetAngle(0)
rei.SetCenter(center)
rei.SetAxis(semiprincipalaxis)
rei.SetGeometry( geometry )
rei.SetInput(constantImageSource.GetOutput())

# Create reconstructed image
constantImageSource2 = ConstantImageSourceType.New()
sizeOutput = [ 256 ] * 3
origin = [ -63.75 ] * 3
spacing = [ 0.5 ] *  3
constantImageSource2.SetOrigin( origin )
constantImageSource2.SetSpacing( spacing )
constantImageSource2.SetSize( sizeOutput )
constantImageSource2.SetConstant(0.)
t0 = time.time()
constantImageSource2.Update()
t1 = time.time()
print(t1-t0)

# Graft the projections to an itk::CudaImage
projections = GPUImageType.New()
t0 = time.time()
rei.Update()
t1 = time.time()
print(t1-t0)
projections.SetPixelContainer(rei.GetOutput().GetPixelContainer())
projections.CopyInformation(rei.GetOutput())
projections.SetBufferedRegion(rei.GetOutput().GetBufferedRegion())
projections.SetRequestedRegion(rei.GetOutput().GetRequestedRegion())

# FDK reconstruction
print("Reconstructing...")
FDKGPUType = rtk.CudaFDKConeBeamReconstructionFilter
feldkamp = FDKGPUType.New()
feldkamp.SetInput(0, constantImageSource2.GetOutput())
feldkamp.SetInput(1, projections)
feldkamp.SetGeometry(geometry)
feldkamp.GetRampFilter().SetTruncationCorrection(0.0)
feldkamp.GetRampFilter().SetHannCutFrequency(0.0)
t0 = time.time()
feldkamp.Update()
t1 = time.time()
print(t1-t0)
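A single time.time() difference around Update() can mix one-off costs with the steady-state cost of the filter. Below is a minimal sketch of a hypothetical `benchmark` helper (not part of ITK or RTK) that reports the first call separately from the median of later calls. It assumes, without this being shown by the timings above, that the first GPU call may include setup work such as CUDA context creation; `time.perf_counter` is used because it is a monotonic, high-resolution clock.

```python
import statistics
import time
from typing import Callable, Dict, List


def benchmark(run: Callable[[], object], repeats: int = 5) -> Dict[str, float]:
    """Time `run` several times, reporting the first (cold) call separately.

    Hypothetical helper for illustration only; it is not part of ITK or RTK.
    """
    if repeats < 2:
        raise ValueError("need at least 2 repeats to separate cold and warm runs")
    durations: List[float] = []
    for _ in range(repeats):
        start = time.perf_counter()  # monotonic, high-resolution clock
        run()
        durations.append(time.perf_counter() - start)
    warm = durations[1:]  # later calls, after any one-off setup in the first
    return {
        "first": durations[0],
        "warm_median": statistics.median(warm),
        "warm_min": min(warm),
    }


if __name__ == "__main__":
    # Stand-in CPU workload; in the script above this would wrap the FDK update.
    print(benchmark(lambda: sum(i * i for i in range(100_000))))
```

With the script above, the callable could be `lambda: (feldkamp.Modified(), feldkamp.Update())`. Calling Modified() matters because ITK considers an already-updated, unchanged pipeline up to date, so a bare second Update() would return almost immediately.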

To be honest, I don't see what to do at this stage... Can you maybe check the
same code with your two versions? Any other suggestions?
Simon

On Wed, Nov 10, 2021 at 10:03 AM Moritz Schaar <schaar at imt.uni-luebeck.de>
wrote:

> Hi Simon,
>
> I completely agree that this is hard to track down. That’s why I am asking
> for directions :)
>
> To be more precise about the execution times of my example:
>
> The timings given as CPU/CUDA pairs (17.1 s / 1.2 s on the old setup and
> 19 s / 7 s on the new one) are only the times required by the
> reconstruction step itself.
>
> Reading data, pre- and post-processing are not part of this time
> measurement.
>
> So the 7 s average in Python is similar to the 6.41 s I obtained by
> adding up everything done in CudaFDKConeBeamReconstructionFilter using
> RTK_PROBE_EACH_FILTER.
>
> The reconstruction step in Python simply involves:
>
> - Instantiation of a simple class (this adds nothing to the timings)
> - Setting up ConstantImageSource with either rtk.Image or rtk.CudaImage
> - Setting up FDKConeBeamReconstructionFilter/CudaFDKConeBeamReconstructionFilter
> - Setting inputs, geometry and filter
> - Update() and returning the result
>
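To see which of these steps dominates, each one can be timed separately. Below is a minimal sketch using only the standard library; `timed` and the step labels are hypothetical names, and the bodies of the `with` blocks are placeholders where the corresponding RTK calls would go.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator


@contextmanager
def timed(label: str, log: Dict[str, float]) -> Iterator[None]:
    """Accumulate the wall-clock time spent inside the block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Record even if the block raises, so partial runs are still visible.
        log[label] = log.get(label, 0.0) + time.perf_counter() - start


if __name__ == "__main__":
    timings: Dict[str, float] = {}
    with timed("setup sources", timings):
        pass  # placeholder: configure ConstantImageSource (rtk.Image or rtk.CudaImage)
    with timed("setup filter", timings):
        pass  # placeholder: create the (Cuda)FDK filter, set inputs and geometry
    with timed("update", timings):
        time.sleep(0.01)  # placeholder standing in for the filter's Update()
    for label, seconds in timings.items():
        print(f"{label:15s} {seconds:.4f} s")
```

Accumulating per label (rather than overwriting) lets the same label wrap a step that runs several times, such as an Update() inside a loop.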
> It looks like there was a typo in my mail; the versions compared should be:
>
> old: CUDA 10.2, ITK 5.1.2, RTK 2.1.0
>
> new: CUDA 11.5, ITK 5.2.1, RTK 2.3.0
>
> Sorry for the confusion and thanks for looking into it!
>
> Best,
>
> Moritz
>
> *From:* Simon Rit <simon.rit at creatis.insa-lyon.fr>
> *Sent:* Wednesday, November 10, 2021 09:32
> *To:* Moritz Schaar <schaar at imt.uni-luebeck.de>
> *Cc:* rtk-users at public.kitware.com
> *Subject:* Re: [Rtk-users] Slow CUDA FDK performance
>
> Hi Moritz,
>
> Thanks for the report. It's a bit hard to be convinced that something is
> wrong without being able to reproduce it. From the RTK_PROBE_EACH_FILTER
> log, most of the time is spent reading the projections, which will be the
> same with or without CUDA, so I wonder whether this is the actual issue
> here. I can try to reproduce the issue; can you just confirm the two
> configurations: CUDA 10.2, ITK 5.2.1, RTK 2.1.0 vs CUDA 11.5, ITK 5.2.1,
> RTK 2.3.0?
>
> Thanks,
>
> Simon
>
> On Fri, Nov 5, 2021 at 4:20 PM Moritz Schaar <schaar at imt.uni-luebeck.de>
> wrote:
>
> Hi,
>
> I recently upgraded my Windows 10 system to ITK 5.2.1 including RTK 2.3.0.
>
> This also involved upgrading CUDA from 10.2 to 11.5 and Visual Studio 2019,
> and even a Python update (3.8.5 to 3.8.12).
>
> Using the Python wrapping of RTK, I implemented my own routines that use FDK
> similarly to the rtkfdk application.
>
> On the old system (ITK 5.2.1, RTK 2.1.0) I benchmarked the FDK for a
> 512x512x200 dataset reconstructed into 256x256x256 with 1.0 mm isotropic
> voxel size.
>
> The system is equipped with 24 CPU cores and one RTX 2080 Ti; the CPU
> version took 17.1 s and the CUDA version 1.2 s.
>
> Running the new software version on the same system results in roughly
> 19 s for the CPU version but more than 7 s for the CUDA version.
>
> I don't care about the absolute timings, but the relative increase of the
> CUDA version is what bothers me.
>
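Put as ratios, the figures above show why the CUDA result stands out. A small sketch of that arithmetic; `slowdown` is a hypothetical helper, and since the new CUDA time was "more than 7 s", the CUDA ratio below is a lower bound.

```python
def slowdown(old_seconds: float, new_seconds: float) -> float:
    """Return how many times slower the new run is (values > 1 mean slower)."""
    if old_seconds <= 0:
        raise ValueError("old timing must be positive")
    return new_seconds / old_seconds


# Timings reported above for the same 512x512x200 -> 256x256x256 FDK on one machine.
cpu = slowdown(17.1, 19.0)   # about 1.11, i.e. roughly 11 % slower
cuda = slowdown(1.2, 7.0)    # about 5.83, i.e. almost six times slower
print(f"CPU:  x{cpu:.2f}")
print(f"CUDA: x{cuda:.2f}")
```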
>
> To dig up some more information, I recompiled RTK with
> RTK_PROBE_EACH_FILTER and ran rtkfdk.exe on the same data. This is what I
> got:
>
> *****************************************************************************************************
> Probe Tag                                    Starts  Stops  Time (s)   Memory (kB)   Cuda memory (kB)
> *****************************************************************************************************
> ChangeInformationImageFilter                 200     200    0.0211846  0             0
> ConstantImageSource                          1       1      0.0305991  65668         0
> CudaCropImageFilter                          13      13     0.0222911  15786.8       15753.8
> CudaDisplacedDetectorImageFilter             13      13     0.0540568  10719.1       16384
> CudaFDKBackProjectionImageFilter             13      13     0.0326397  5051.38       5041.23
> CudaFDKConeBeamReconstructionFilter          1       1      5.72999    552184        211648
> CudaFDKWeightProjectionFilter                13      13     0.0262806  -13892        630.154
> CudaFFTRampImageFilter                       13      13     0.148416   43095.4       12499.7
> CudaParkerShortScanImageFilter               13      13     0.0467202  2525.85       15753.8
> ExtractImageFilter                           13      13     0.0259726  15812.3       -15753.8
> ImageFileReader                              200     200    0.0226735  -0.16         0
> ImageSeriesReader                            200     200    0.066097   6.12          0
> ProjectionsReader                            1       1      26.0388    208488        0
> StreamingImageFilter                         2       2      16.0663    547512        191840
> VnlRealToHalfHermitianForwardFFTImageFilter  2       2      0.0208174  0             0
>
> Following the conversation on the mailing list,
> https://public.kitware.com/pipermail/rtk-users/2018-July/010617.html, I
> see that the CudaFDKConeBeamReconstructionFilter takes 6.41 s, of which
> roughly 1/3 is spent in the CudaFFTRampImageFilter.
>
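The 1/3 figure can be checked from the table under an assumption that this thread does not confirm: that the Time (s) column is a mean per start, so a filter's total time is Starts × Time. Below is a minimal sketch with three rows copied from the log; `total_seconds` is a hypothetical helper, not an RTK tool.

```python
from typing import Dict

# Three of the RTK_PROBE_EACH_FILTER rows above, whitespace-separated:
# tag, starts, stops, time (s), memory (kB), CUDA memory (kB).
LOG = """\
CudaFDKConeBeamReconstructionFilter 1 1 5.72999 552184 211648
CudaFFTRampImageFilter 13 13 0.148416 43095.4 12499.7
CudaFDKBackProjectionImageFilter 13 13 0.0326397 5051.38 5041.23
"""


def total_seconds(log: str) -> Dict[str, float]:
    """Map each probe tag to starts * time, ASSUMING time is a per-start mean."""
    totals: Dict[str, float] = {}
    for line in log.splitlines():
        fields = line.split()
        if len(fields) != 6:
            continue  # skip blank or malformed lines
        tag, starts, mean_s = fields[0], int(fields[1]), float(fields[3])
        totals[tag] = starts * mean_s
    return totals


totals = total_seconds(LOG)
ramp = totals["CudaFFTRampImageFilter"]  # 13 * 0.148416 = 1.929408 s
fdk = totals["CudaFDKConeBeamReconstructionFilter"]  # 1 * 5.72999 s
print(f"ramp filter total: {ramp:.2f} s")
print(f"share of 6.41 s: {ramp / 6.41:.0%}")
print(f"share of the {fdk:.2f} s FDK probe: {ramp / fdk:.0%}")
```

Both shares come out near 1/3 (about 30 % and 34 %), whereas the single-call value of 0.148 s would be only a few percent of the total, so the 1/3 estimate relies on multiplying by the number of starts.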
> Unfortunately, I don’t have these results for the old software version, so
> I can’t compare these values.
>
> I also tried v2.2.0, but it doesn’t make a difference.
>
> Sadly, the version I used before (v2.1.0) won’t compile with CUDA 11.5
> anymore. I tried small adjustments to make it compile, e.g. this commit
> https://github.com/SimonRit/RTK/commit/3d3c7506087f5fa98aee75df5af5c30e7e51cbe6
> but this didn’t work.
>
> Setting up ITK 5.1.2 also fails, with other errors, so getting back the
> old version for comparison seems impossible.
>
> Is there any direction you can point me to, to find out what the actual
> issue is? Or maybe someone has an idea what the reason could be: the CUDA,
> RTK or ITK version?
>
> Any help is appreciated.
>
> *Best,*
>
> *Moritz*
>
> _______________________________________________
> Rtk-users mailing list
> Rtk-users at public.kitware.com
> https://public.kitware.com/mailman/listinfo/rtk-users