[Rtk-users] Stream divisions in rtkfdk

Chao Wu wuchao04 at gmail.com
Thu May 22 04:06:44 EDT 2014


Hi Simon,

Thanks for the suggestions.

The problem can be reproduced here (8 GB RAM, 1.5 GB GRAM, RTK 1.0.0) with:

rtksimulatedgeometry -n 30 -o geometry.xml --sdd=1536 --sid=384
rtkprojectgeometricphantom -g geometry.xml -o projections.nii --spacing 0.6 \
  --dimension 1944,1536 --phantomfile SheppLogan.txt
rtkfdk -p . -r projections.nii -o fdk.nii -g geometry.xml --spacing 0.4 \
  --dimension 640,250,640 --hardware=cuda -v -l

With #define VERBOSE enabled (by the way, I found it in itkCudaDataManager.cxx
rather than itkCudaImageDataManager.hxx), I now have a better view of the GRAM
usage.
I found that the size of the volume data in GRAM can be reduced with
--divisions, but the amount of projection data sent to GRAM is not
affected by the --lowmem switch.
So --divisions does not help much when it is mainly the projection data that
takes up GRAM, and --lowmem does not help at all. I did not look into the
upstream part of the code, so I am not sure whether this is the intended
behaviour.
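
For scale, here is a back-of-the-envelope estimate of the raw data sizes in the test case above. It is only a sketch: it assumes 4-byte floats and counts nothing else (no FFT padding, no cuFFT work area, no extra buffers), so it does not describe what rtkfdk actually allocates. The helper name raw_bytes is made up for illustration.

```python
# Rough estimate of raw float32 data sizes for the test case above.
# Assumption: 4 bytes per sample; FFT padding, cuFFT work areas and any
# additional buffers are NOT included.

BYTES_PER_FLOAT = 4
MIB = 1024 ** 2


def raw_bytes(*dims):
    """Bytes needed for a dense float32 array with the given dimensions."""
    count = 1
    for d in dims:
        count *= d
    return count * BYTES_PER_FLOAT


one_projection = raw_bytes(1944, 1536)   # --dimension 1944,1536
all_projections = 30 * one_projection    # -n 30
volume = raw_bytes(640, 250, 640)        # --dimension 640,250,640

print(f"one projection : {one_projection / MIB:8.1f} MiB")
print(f"all projections: {all_projections / MIB:8.1f} MiB")
print(f"full volume    : {volume / MIB:8.1f} MiB")
for divisions in (1, 2, 3):
    # Only the volume term shrinks with --divisions.
    print(f"volume / {divisions}     : {volume / divisions / MIB:8.1f} MiB")
```

Under these assumptions, the projection stack and the volume are of similar size, and only the volume term shrinks as the number of divisions grows.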

On the other hand, I am also looking for ways to reduce the GRAM used
by the CUDA ramp filter. At least one thing should be changed, and one
more could be considered:
- In rtkCudaFFTRampImageFilter.cu, the forward FFT plan (fftFwd) should be
destroyed earlier, right after it has been executed. A plan takes up at
least as much memory as the data.
- cufftExecR2C and cufftExecC2R can operate in place. However, I do not have
a clear idea of how to pad deviceProjection to the size required by
its cufftComplex counterpart.
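
On the padding question: as far as I understand the cuFFT documentation, an in-place real-to-complex transform needs each row of the innermost dimension padded from n floats to 2*(n/2+1) floats (integer division), so that the n/2+1 complex outputs fit in the same buffer. The host-side sketch below only illustrates that layout. The function names are hypothetical, and it is not the RTK implementation.

```python
# Layout sketch for an in-place R2C transform along the innermost dimension.
# Assumption: each real row of length n must occupy 2*(n//2 + 1) floats,
# the size of its (n//2 + 1) complex outputs, as stated in the cuFFT docs.

def complex_row_len(n):
    """Number of complex outputs of a real-to-complex FFT of length n."""
    return n // 2 + 1


def padded_row_floats(n):
    """Floats per row in the padded in-place buffer."""
    return 2 * complex_row_len(n)


def pad_rows(flat, rows, n):
    """Copy a tightly packed rows*n array into the padded in-place layout."""
    stride = padded_row_floats(n)
    out = [0.0] * (rows * stride)
    for r in range(rows):
        # Each row starts at a multiple of the padded stride; the 1 or 2
        # trailing floats of the row stay zero.
        out[r * stride:r * stride + n] = flat[r * n:(r + 1) * n]
    return out


if __name__ == "__main__":
    for n in (1944, 2048, 4096):
        print(n, "->", padded_row_floats(n), "floats per row")
```

On the device, the same re-striding could presumably be done with a pitched copy such as cudaMemcpy2D, or avoided by allocating deviceProjection with the padded row length from the start.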

Any comments?

Best regards,
Chao



2014-05-21 14:30 GMT+02:00 Simon Rit <simon.rit at creatis.insa-lyon.fr>:

> Since it fails in cufft, it's the memory of the projections that is a
> problem. Therefore, it is not surprising that --divisions has no
> influence. But --lowmem should have an influence. I would suggest:
> - to uncomment
> //#define VERBOSE
> in itkCudaImageDataManager.hxx and see how much memory is
> requested.
> - to try to reproduce the problem with simulated data so that we can
> help you in finding a solution.
> Simon
>
> On Wed, May 21, 2014 at 2:21 PM, Chao Wu <wuchao04 at gmail.com> wrote:
> > Hi Simon,
> >
> > Yes, I switched the --lowmem option on and off, and it has no influence on
> > the behaviour I mentioned.
> > In my case the system memory is sufficient to handle the projections plus
> > the volume.
> > The major bottleneck is the amount of graphics memory.
> > If I reconstruct a few more slices than the limit that I found with one
> > stream, the allocation of GPU resources for CUFFT in the
> > CudaFFTRampImageFilter fails (which was more or less expected).
> > However, with --divisions > 1 it is indeed able to reconstruct more slices,
> > but only a very few more; beyond that, CUFFT fails again.
> > I would expect the limit on the number of slices to be approximately
> > proportional to the number of streams, or am I missing something about
> > stream division?
> >
> > Thanks,
> > Chao
> >
> >
> >
> > 2014-05-21 13:43 GMT+02:00 Simon Rit <simon.rit at creatis.insa-lyon.fr>:
> >
> >> Hi Chao,
> >> There are two things that use memory, the volume and the projections.
> >> The --divisions option divides the volume only. The --lowmem option
> >> works on a subset of projections at a time. Did you try this?
> >> Simon
> >>
> >> On Wed, May 21, 2014 at 12:18 PM, Chao Wu <wuchao04 at gmail.com> wrote:
> >> > Hi,
> >> >
> >> > I may need a hint about how stream division works in rtkfdk.
> >> > I noticed that the StreamingImageFilter from ITK is used, but I cannot
> >> > quickly figure out how the division is performed.
> >> > I ran some tests reconstructing 400 projections of 1500x1200 pixels into
> >> > a 640xNx640 volume (the pixel and voxel sizes are comparable).
> >> > The reconstructions were executed by rtkfdk with CUDA.
> >> > When I leave the origin of the volume at the center by default, I can
> >> > reconstruct up to N=200 slices with --divisions=1 due to the limitation
> >> > of the graphics memory. When I increase the number of divisions to 2, I
> >> > can only reconstruct up to 215 slices, and with 3 divisions only up to
> >> > 219 slices. Does anyone have an idea why it scales like this?
> >> > Thanks in advance.
> >> > Thanks in advance.
> >> >
> >> > Best regards,
> >> > Chao