[Rtk-users] rtkfdk: CudaFDKWeightProjectionFilter runs twice

Chao Wu wuchao04 at gmail.com
Tue Jun 2 17:08:52 EDT 2015


Hi Simon,

I found how to reproduce it.
Instead of using proj.mha, if you use separate projection files (e.g.
proj_0.tif ~ proj_7.tif), and add the -l flag you will see the problem. The
--subsetsize is needed to show the issue (not necessarily =1 but should let
the for loop in the FDK filter run several times). The issue is not in the
first time the for loop runs, but only in sequential ones.

Here's my output using separate tif files as projections: without -l it
runs 8 times, with -l it runs 15 times = 1 + 7*2 (first time in the loop is
normal, next ones weight filter run twice).

D:\Chao\RTK>rtkfdk -p . -r proj_ -o fdk.mha -g g --hardware cuda
--subsetsize 1
CudaFDKWeightProjectionFilter::GPUGenerateData()
CudaFDKWeightProjectionFilter::GPUGenerateData()
CudaFDKWeightProjectionFilter::GPUGenerateData()
CudaFDKWeightProjectionFilter::GPUGenerateData()
CudaFDKWeightProjectionFilter::GPUGenerateData()
CudaFDKWeightProjectionFilter::GPUGenerateData()
CudaFDKWeightProjectionFilter::GPUGenerateData()
CudaFDKWeightProjectionFilter::GPUGenerateData()

D:\Chao\RTK>rtkfdk -p . -r proj_ -o fdk.mha -g g --hardware cuda
--subsetsize 1 -l
CudaFDKWeightProjectionFilter::GPUGenerateData()
CudaFDKWeightProjectionFilter::GPUGenerateData()
CudaFDKWeightProjectionFilter::GPUGenerateData()
CudaFDKWeightProjectionFilter::GPUGenerateData()
CudaFDKWeightProjectionFilter::GPUGenerateData()
CudaFDKWeightProjectionFilter::GPUGenerateData()
CudaFDKWeightProjectionFilter::GPUGenerateData()
CudaFDKWeightProjectionFilter::GPUGenerateData()
CudaFDKWeightProjectionFilter::GPUGenerateData()
CudaFDKWeightProjectionFilter::GPUGenerateData()
CudaFDKWeightProjectionFilter::GPUGenerateData()
CudaFDKWeightProjectionFilter::GPUGenerateData()
CudaFDKWeightProjectionFilter::GPUGenerateData()
CudaFDKWeightProjectionFilter::GPUGenerateData()
CudaFDKWeightProjectionFilter::GPUGenerateData()

When I print out the GRAM activities by #define VERBOSE in
itkCudaDataManager.h and add some other debug info in the code (before and
after filter->Update(), and at the beginning and end of GPUGenerateData()
of the filters) I have the following log showing the problem with -l flag
on (I show below the log of the first 4 projections and hope it is still
readable...)
When I focus on the CudaDataManager that updates (copies) the projection
data from CPU to GPU for the weight filter, I actually found two:
00000000036B7CC0 and 00000000036B80C0. The latter one only exists for
projections other than the first projection and runs during the first
weight filter update, and the former one exists for all projections and is
the one running during the only weight filter update for the first
projection and the one taking action when the weight filter reruns during
the ramp filter update for the remaining projections.
(Note that I have a small modification in the GRAM message: when a gpu
buffer is freed for a new allocation, I log it as "Deallocate" instead of
"Freed" to discriminate it from a pure free operation.)

===Projection 0:

>>>>>>>> Start Weight Update
00000000036B78C0::UpdateCPUBuffer GPU->CPU data copy
0000002304620000->0000000009700040 : 4805568
--------VVVVVVVV CudaFDKWeightProjectionFilter GenerateData--------
000000000384B3F0::Allocate Create GPU buffer of size 4805568 Bytes :
0000002304AC0000
00000000036B7CC0::UpdateGPUBuffer CPU->GPU data copy
0000000009BA0040->0000002304AC0000 : 4805568
--------^^^^^^^^ CudaFDKWeightProjectionFilter GenerateData--------
End Weight Update <<<<<<<<

>>>>>>>> Start Ramp Update
--------VVVVVVVV CudaFFTConvolutionImageFilter GenerateData--------
000000000391D390::Allocate Create GPU buffer of size 9704448 Bytes :
0000002304F60000
000000000391C800::Allocate Create GPU buffer of size 31112 Bytes :
00000023058C0000
00000000036B85C0::UpdateGPUBuffer CPU->GPU data copy
00000000039A5660->00000023058C0000 : 31112
00000000038B3A40::Allocate Create GPU buffer of size 846660 Bytes :
00000023059C0000
--------^^^^^^^^ CudaFFTConvolutionImageFilter GenerateData--------
000000000391C800::Freed GPU buffer of size 31112 Bytes : 00000023058C0000
000000000391D390::Freed GPU buffer of size 9704448 Bytes : 0000002304F60000
End Ramp Update <<<<<<<<

>>>>>>>> Start BP Update
--------VVVVVVVV CudaFDKBackProjectionImageFilter GenerateData--------
0000000003832A80::Allocate Create GPU buffer of size 4290772992 Bytes :
0000002306420000
00000000036B79C0::UpdateGPUBuffer CPU->GPU data copy
00000029FFC10040->0000002306420000 : 4290772992
--------^^^^^^^^ CudaFDKBackProjectionImageFilter GenerateData--------
End BP Update <<<<<<<<

===Projection 1:

>>>>>>>> Start Weight Update
000000000380B570::Deallocate GPU buffer of size 2402784 Bytes :
0000002303F20000
000000000380B570::Allocate Create GPU buffer of size 2418336 Bytes :
0000002303F20000
00000000036B75C0::UpdateGPUBuffer CPU->GPU data copy
000000000ABF0040->0000002303F20000 : 2418336
0000000003832530::Deallocate GPU buffer of size 4805568 Bytes :
0000002304180000
0000000003832530::Allocate Create GPU buffer of size 4836672 Bytes :
0000002304180000
0000000003832800::Deallocate GPU buffer of size 4805568 Bytes :
0000002304620000
0000000003832800::Allocate Create GPU buffer of size 4836672 Bytes :
0000002304620000
00000000036B78C0::UpdateCPUBuffer GPU->CPU data copy
0000002304620000->0000000009010040 : 4836672
000000000384B3F0::Freed GPU buffer of size 4805568 Bytes : 0000002304AC0000
--------VVVVVVVV CudaFDKWeightProjectionFilter GenerateData--------
000000000384B170::Allocate Create GPU buffer of size 4836672 Bytes :
0000002305AC0000
00000000036B80C0::UpdateGPUBuffer CPU->GPU data copy
00000000094B0040->0000002305AC0000 : 4836672
--------^^^^^^^^ CudaFDKWeightProjectionFilter GenerateData--------
End Weight Update <<<<<<<<

>>>>>>>> Start Ramp Update
00000000036B78C0::UpdateCPUBuffer GPU->CPU data copy
0000002304620000->0000000009010040 : 4836672
000000000384B170::Freed GPU buffer of size 4836672 Bytes : 0000002305AC0000
--------VVVVVVVV CudaFDKWeightProjectionFilter GenerateData--------
000000000391D110::Allocate Create GPU buffer of size 4836672 Bytes :
0000002305AC0000
00000000036B7CC0::UpdateGPUBuffer CPU->GPU data copy
0000000009950040->0000002305AC0000 : 4836672
--------^^^^^^^^ CudaFDKWeightProjectionFilter GenerateData--------
--------VVVVVVVV CudaFFTConvolutionImageFilter GenerateData--------
00000000039BFED0::Allocate Create GPU buffer of size 9704448 Bytes :
0000002304AC0000
00000000039C01A0::Allocate Create GPU buffer of size 31112 Bytes :
0000002305F60000
00000000036B81C0::UpdateGPUBuffer CPU->GPU data copy
00000000039B8470->0000002305F60000 : 31112
00000000039C02E0::Allocate Create GPU buffer of size 1097208 Bytes :
0000002306060000
00000000038B3A40::Freed GPU buffer of size 846660 Bytes : 00000023059C0000
--------^^^^^^^^ CudaFFTConvolutionImageFilter GenerateData--------
00000000039C01A0::Freed GPU buffer of size 31112 Bytes : 0000002305F60000
00000000039BFED0::Freed GPU buffer of size 9704448 Bytes : 0000002304AC0000
End Ramp Update <<<<<<<<

>>>>>>>> Start BP Update
--------VVVVVVVV CudaFDKBackProjectionImageFilter GenerateData--------
--------^^^^^^^^ CudaFDKBackProjectionImageFilter GenerateData--------
End BP Update <<<<<<<<

===Projection 2:

>>>>>>>> Start Weight Update
00000000036B75C0::UpdateGPUBuffer CPU->GPU data copy
000000000ABF0040->0000002303F20000 : 2418336
00000000036B78C0::UpdateCPUBuffer GPU->CPU data copy
0000002304620000->0000000009010040 : 4836672
000000000391D110::Freed GPU buffer of size 4836672 Bytes : 0000002305AC0000
--------VVVVVVVV CudaFDKWeightProjectionFilter GenerateData--------
000000000391CBC0::Allocate Create GPU buffer of size 4836672 Bytes :
0000002304AC0000
00000000036B80C0::UpdateGPUBuffer CPU->GPU data copy
000000000AF50040->0000002304AC0000 : 4836672
--------^^^^^^^^ CudaFDKWeightProjectionFilter GenerateData--------
End Weight Update <<<<<<<<

>>>>>>>> Start Ramp Update
00000000036B78C0::UpdateCPUBuffer GPU->CPU data copy
0000002304620000->0000000009010040 : 4836672
000000000391CBC0::Freed GPU buffer of size 4836672 Bytes : 0000002304AC0000
--------VVVVVVVV CudaFDKWeightProjectionFilter GenerateData--------
000000000391D020::Allocate Create GPU buffer of size 4836672 Bytes :
0000002304AC0000
00000000036B7CC0::UpdateGPUBuffer CPU->GPU data copy
00000000098E0040->0000002304AC0000 : 4836672
--------^^^^^^^^ CudaFDKWeightProjectionFilter GenerateData--------
--------VVVVVVVV CudaFFTConvolutionImageFilter GenerateData--------
00000000039C0420::Allocate Create GPU buffer of size 9704448 Bytes :
0000002304F60000
00000000039C0100::Allocate Create GPU buffer of size 31112 Bytes :
0000002306180000
00000000036B81C0::UpdateGPUBuffer CPU->GPU data copy
000000000A408080->0000002306180000 : 31112
00000000039C0060::Allocate Create GPU buffer of size 1183044 Bytes :
00000023058C0000
00000000039C02E0::Freed GPU buffer of size 1097208 Bytes : 0000002306060000
--------^^^^^^^^ CudaFFTConvolutionImageFilter GenerateData--------
00000000039C0100::Freed GPU buffer of size 31112 Bytes : 0000002306180000
00000000039C0420::Freed GPU buffer of size 9704448 Bytes : 0000002304F60000
End Ramp Update <<<<<<<<

>>>>>>>> Start BP Update
--------VVVVVVVV CudaFDKBackProjectionImageFilter GenerateData--------
--------^^^^^^^^ CudaFDKBackProjectionImageFilter GenerateData--------
End BP Update <<<<<<<<

===Projection 3:

>>>>>>>> Start Weight Update
00000000036B75C0::UpdateGPUBuffer CPU->GPU data copy
000000000ABF0040->0000002303F20000 : 2418336
00000000036B78C0::UpdateCPUBuffer GPU->CPU data copy
0000002304620000->0000000009010040 : 4836672
000000000391D020::Freed GPU buffer of size 4836672 Bytes : 0000002304AC0000
--------VVVVVVVV CudaFDKWeightProjectionFilter GenerateData--------
000000000391D110::Allocate Create GPU buffer of size 4836672 Bytes :
0000002305A00000
00000000036B80C0::UpdateGPUBuffer CPU->GPU data copy
0000000013400040->0000002305A00000 : 4836672
--------^^^^^^^^ CudaFDKWeightProjectionFilter GenerateData--------
End Weight Update <<<<<<<<

>>>>>>>> Start Ramp Update
00000000036B78C0::UpdateCPUBuffer GPU->CPU data copy
0000002304620000->0000000009010040 : 4836672
000000000391D110::Freed GPU buffer of size 4836672 Bytes : 0000002305A00000
--------VVVVVVVV CudaFDKWeightProjectionFilter GenerateData--------
00000000039C0420::Allocate Create GPU buffer of size 4836672 Bytes :
0000002305A00000
00000000036B7CC0::UpdateGPUBuffer CPU->GPU data copy
00000000098E0040->0000002305A00000 : 4836672
--------^^^^^^^^ CudaFDKWeightProjectionFilter GenerateData--------
--------VVVVVVVV CudaFFTConvolutionImageFilter GenerateData--------
00000000039C0380::Allocate Create GPU buffer of size 9704448 Bytes :
0000002304AC0000
00000000039C0290::Allocate Create GPU buffer of size 31112 Bytes :
0000002305EA0000
00000000036B81C0::UpdateGPUBuffer CPU->GPU data copy
000000000A414080->0000002305EA0000 : 31112
00000000039C0560::Allocate Create GPU buffer of size 1081036 Bytes :
0000002305FA0000
00000000039C0060::Freed GPU buffer of size 1183044 Bytes : 00000023058C0000
--------^^^^^^^^ CudaFFTConvolutionImageFilter GenerateData--------
00000000039C0290::Freed GPU buffer of size 31112 Bytes : 0000002305EA0000
00000000039C0380::Freed GPU buffer of size 9704448 Bytes : 0000002304AC0000
End Ramp Update <<<<<<<<

>>>>>>>> Start BP Update
--------VVVVVVVV CudaFDKBackProjectionImageFilter GenerateData--------
--------^^^^^^^^ CudaFDKBackProjectionImageFilter GenerateData--------
End BP Update <<<<<<<<


Regards,
Chao


2015-06-02 17:43 GMT+02:00 Simon Rit <simon.rit at creatis.insa-lyon.fr>:

> Hi Chao,
> It looks bad but I couldn't reproduce the problem. What I did is add
> messages in the GPUGenerateData and launch a very simple sequence of
> command lines:
>   rtksimulatedgeometry -n 8 -o g
>   rtkprojectshepploganphantom -g g -o proj.mha
>   rtkfdk -p . -r proj.mha -o fdk.mha -g g --hardware cuda
> in which case I go only once in each GPUGenerateData. Can you give us
> a command line example where you can see the problem?
> Thanks,
> Simon
>
> On Mon, Jun 1, 2015 at 6:53 PM, Chao Wu <wuchao04 at gmail.com> wrote:
> > Hi all,
> >
> > When testing CUDA-based FDK I found the CudaFDKWeightProjectionFilter
> seems
> > to rerun unnecessarily. Details below:
> >
> > In FDKConeBeamReconstructionFilter<TInputImage, TOutputImage,
> > TFFTPrecision>::GenerateData() the three sub-filters run separately for
> > timing:
> >
> >     m_PreFilterProbe.Start();
> >     m_WeightFilter->Update();
> >     m_PreFilterProbe.Stop();
> >
> >     m_FilterProbe.Start();
> >   m_RampFilter->Update();
> > m_FilterProbe.Stop();
> >
> >     m_BackProjectionProbe.Start();
> >     m_BackProjectionFilter->Update();
> >     m_BackProjectionProbe.Stop();
> >
> > However with some debug procedure I found when executing
> > m_RampFilter->Update() the m_WeightFilter is updated again. Maybe there's
> > something wrong with filter modified time or flags of CPU/GPU buffer?
> >
> > Furthermore, if I remove m_WeightFilter->Update() then both
> m_WeightFilter
> > and m_RampFilter will rerun during update of m_BackProjectionFilter.
> Only if
> > I remove both m_WeightFilter->Update() and m_RampFilter->Update() and let
> > m_BackProjectionFilter execute the whole minipipeline, all filters will
> run
> > nicely only once for each projection subset.
> >
> > The results are identical for all cases I tested.
> >
> > I did not check whether the same issue applies to the CPU or OpenCL
> version.
> >
> > Regards,
> > Chao
> >
> > _______________________________________________
> > Rtk-users mailing list
> > Rtk-users at public.kitware.com
> > http://public.kitware.com/mailman/listinfo/rtk-users
> >
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://public.kitware.com/pipermail/rtk-users/attachments/20150602/9cd72082/attachment-0002.html>


More information about the Rtk-users mailing list