[Insight-developers] transform internal types
M.Staring at lumc.nl
Fri May 28 08:41:18 EDT 2010
Hi Luis,
I see that my more fundamental arguments don't make much of an impression.
Regarding your more pragmatic questions:
> I agree with you in that it is a waste of resources.
> It is however an insignificant waste of resources.
>
> ...
>
> How many milliseconds are we talking about ?
>
> ...
>
> How many milliseconds does it take to compute the
> registration in the GPU ?
>
We just finished implementing it. For 3D images of size about 440^3, using the MSD metric, 2000 samples from the fixed image to compute its derivative, standard gradient descent, and a B-spline transform, each iteration follows this scheme:
1. convert the transform parameter vector mu_k to float
2. copy mu_k to the GPU
3. compute d MSD / d mu and take an optimisation step, resulting in mu_{k+1}
4. copy mu_{k+1} to the CPU
5. convert mu_{k+1} to double
A second scheme skips steps 1 and 5 (the casts). A sketch of the first scheme's loop body follows.
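For concreteness, here is a minimal CUDA sketch of one iteration of the first scheme. The kernel name ValueAndDerivativeStep, its trivial body, and the launch configuration are illustrative placeholders only, not our actual implementation:

#include <cuda_runtime.h>
#include <algorithm>
#include <vector>

// Hypothetical fused kernel for step 3: computes d MSD / d mu and applies
// one gradient-descent step, updating mu in place on the device.
__global__ void ValueAndDerivativeStep(float* mu, int m, float stepSize)
{
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < m)
  {
    float grad = 0.0f; // placeholder for d MSD / d mu_i
    mu[i] -= stepSize * grad;
  }
}

void RunIteration(std::vector<double>& mu, float* d_mu, float stepSize)
{
  const int m = static_cast<int>(mu.size());

  // Step 1: convert the transform parameters from double to float.
  std::vector<float> muFloat(mu.begin(), mu.end());

  // Step 2: copy mu_k to the GPU.
  cudaMemcpy(d_mu, muFloat.data(), m * sizeof(float), cudaMemcpyHostToDevice);

  // Step 3: compute the derivative and take an optimisation step.
  ValueAndDerivativeStep<<<(m + 255) / 256, 256>>>(d_mu, m, stepSize);

  // Step 4: copy mu_{k+1} back to the CPU.
  cudaMemcpy(muFloat.data(), d_mu, m * sizeof(float), cudaMemcpyDeviceToHost);

  // Step 5: convert back to double. The second scheme stores the host
  // parameters as float, so steps 1 and 5 disappear.
  std::copy(muFloat.begin(), muFloat.end(), mu.begin());
}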
Experiments were performed on an Intel Xeon W3520 @ 2.66 GHz and an NVIDIA GeForce GTX 285, running 64-bit Windows 7.
For the first scheme, with M the size of mu, all timings in microseconds:
step   M1 = 100,000   M2 = 650,000
1+2         1096.69        3243.99
3           3899.09        6374.70
4+5         2315.11        6011.83
And the second scheme:
step   M1 = 100,000   M2 = 650,000
2            726.14        1834.73
3           3988.17        6262.58
4           2112.36        4952.77
Which means that (with cast the difference between the two schemes, and copy the second scheme's transfer time):

        milliseconds       percentage
         M1      M2        M1     M2
cast    0.6     2.5        8%    16%
iter    3.9     6.3       54%    41%
copy    2.8     6.8       39%    44%
total   7.4    15.6
So, for M2, casting back and forth to float takes 2.5 ms = 16% of the computation time, the GetValueAndDerivative() operation on the GPU takes 41%, and transferring the data back and forth is the bottleneck at 44%. Ignoring the data transfer, casting still accounts for 28% of the remaining time (2.5 / (2.5 + 6.3)).
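For reference, CUDA events are one common way to collect per-step timings like those above; the following is only an illustration of the measurement idea, not our benchmark code:

#include <cuda_runtime.h>
#include <cstdio>
#include <vector>

int main()
{
  const int m = 100000; // M1 in the tables above
  std::vector<float> mu(m, 0.0f);
  float* d_mu = nullptr;
  cudaMalloc(&d_mu, m * sizeof(float));

  cudaEvent_t start, stop;
  cudaEventCreate(&start);
  cudaEventCreate(&stop);

  // Time step 2: the host-to-device copy of the parameter vector.
  cudaEventRecord(start);
  cudaMemcpy(d_mu, mu.data(), m * sizeof(float), cudaMemcpyHostToDevice);
  cudaEventRecord(stop);
  cudaEventSynchronize(stop);

  float ms = 0.0f;
  cudaEventElapsedTime(&ms, start, stop); // elapsed time in milliseconds
  std::printf("copy to GPU: %.2f microseconds\n", ms * 1000.0f);

  cudaEventDestroy(start);
  cudaEventDestroy(stop);
  cudaFree(d_mu);
  return 0;
}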
Lessons learned:
Casting is not negligible.
Per-iteration data transfer between CPU and GPU is costly and should be avoided where possible; see the sketch below.
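One direction the second lesson points in, assuming nothing on the host needs mu_k between iterations: keep the parameter vector resident on the device, so the per-iteration copies (steps 2 and 4) vanish from the loop. A sketch, reusing the hypothetical kernel from above:

#include <cuda_runtime.h>
#include <vector>

// Hypothetical fused kernel, as sketched earlier.
__global__ void ValueAndDerivativeStep(float* mu, int m, float stepSize);

std::vector<float> Optimise(std::vector<float> mu, int iterations, float stepSize)
{
  const int m = static_cast<int>(mu.size());
  float* d_mu = nullptr;
  cudaMalloc(&d_mu, m * sizeof(float));

  // One upload before the loop instead of one per iteration.
  cudaMemcpy(d_mu, mu.data(), m * sizeof(float), cudaMemcpyHostToDevice);

  for (int k = 0; k < iterations; ++k)
  {
    // Steps 1, 2, 4 and 5 of the measured schemes disappear from the loop.
    ValueAndDerivativeStep<<<(m + 255) / 256, 256>>>(d_mu, m, stepSize);
  }

  // One download after the loop.
  cudaMemcpy(mu.data(), d_mu, m * sizeof(float), cudaMemcpyDeviceToHost);
  cudaFree(d_mu);
  return mu;
}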
Cheers,
Marius