ITK/Release 4/Refactoring Level Set Framework/LevelSetGPUBostonMeeting 2010-10-14
Latest revision as of 16:00, 9 December 2011
Level Set And GPU
Meeting on Oct 14 2010, 3-4 pm Cambridge, MA
Attendees: Arnaud Gelas, Kishore Mosaliganti, Won-Ki Jeong
A level set implementation can involve the GPU at different levels. At the solver level, the implementation consists of a while loop that iterates until a stopping criterion is satisfied. Inside each iteration, the speed terms are computed at each pixel of the domain, and the resulting update is added to the level-set function to obtain the new function.
 (a) while( Stopping Criterion Is Not Satisfied )
 (b) {
 (c)   for all level set ls_i in the level set container
 (d)   {
 (e)     for all pixels in the domain of ls_i
 (f)     {
 (g)       Compute all speed terms // iterate on the Term Container
 (h)       Evaluate the updated levelset function
 (i)       Compute time step from CFL Condition
 (j)     }
 (k)     Reinitialize to signed distance function (if requested by user)
 (l)   }
 (m) }
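The loop nesting above can be sketched on the CPU as follows. This is only an illustrative sketch in plain Python on a 1D grid, not ITK code: `propagation_term`, `evolve`, the `cfl` factor and the "no change" stopping criterion are all assumptions made for the example. The time step (i) is computed before the update (h) is applied, because the update needs it, and reinitialization (k) is left out.

```python
# Illustrative CPU-only sketch of the solver loop (a)-(m) on a 1D grid.
# All names are hypothetical; none of them are ITK classes or functions.

def propagation_term(phi, i, dx):
    """Constant-speed propagation term: -|d(phi)/dx| by central differences.

    Indices are clamped at the boundaries of the domain.
    """
    left = phi[max(i - 1, 0)]
    right = phi[min(i + 1, len(phi) - 1)]
    return -abs((right - left) / (2.0 * dx))


def evolve(level_sets, terms, dx=1.0, cfl=0.5, max_iterations=10):
    """Evolve every level set in place; return the number of iterations done."""
    iterations_done = 0
    while iterations_done < max_iterations:           # (a) stopping criterion
        any_change = False
        for phi in level_sets:                        # (c) level set container
            # (e), (g): for every pixel, sum the contributions of all terms
            update = [sum(term(phi, i, dx) for term in terms)
                      for i in range(len(phi))]
            max_speed = max((abs(u) for u in update), default=0.0)
            if max_speed == 0.0:
                continue                              # nothing moves for this ls_i
            dt = cfl * dx / max_speed                 # (i) time step from CFL
            for i, u in enumerate(update):            # (h) updated level set
                phi[i] += dt * u
            any_change = True
        if not any_change:                            # assumed stopping criterion
            break
        iterations_done += 1
    return iterations_done
```

For example, calling `evolve([phi], [propagation_term], max_iterations=1)` on the signed distance function `phi = [x - 5.0 for x in range(11)]` moves the zero crossing by half a pixel in a single step.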
We list three scenarios in which the GPU can be used:
- GPU implementation during pixel updates at (g): the pixel neighborhood in the image and the level set information are copied to GPU memory, and the terms are evaluated in a GPU function. This requires minimal changes to the currently proposed design. Each term will have a CPU and a GPU implementation, and a term factory will call the GPU implementation. However, performance is very poor, since data is transferred to the GPU for every pixel update.
- The entire code (the while loop) runs on the GPU, and everything is copied into GPU memory.
  - Downsides:
    - Memory limitation of the GPU (< 2 GB)
    - Code duplication: CPU and GPU
  - Note: copying between CPU and GPU memory runs at 4 GB/s
  - Fastest solution in terms of performance
- In the last scenario, the code nesting is different:
 for all level set ls_i in the level set container
 {
   for all terms in the term container
   {
     for all pixels in the domain of ls_i
     {
       Evaluate the update
       Compute time step from CFL Condition for this term
     }
   }
   Evaluate the updated levelset function
   Reinitialize to signed distance function (if requested by user)
 }
In this scenario the GPU implementation covers the innermost for loop (the loop over pixels).
  - The level set and the image have to be copied to the GPU in each iteration
  - Second-fastest GPU implementation
  - Advantage: there is no code duplication
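The nesting of the last scenario can be illustrated with a small CPU-only sketch in plain Python on a 1D grid. Everything here is hypothetical and chosen for the example: the two terms, the helper names, the `cfl` factor, and in particular the choice to combine the per-term time steps by taking their minimum, which the notes do not specify. The function `apply_term_over_domain` stands for the innermost pixel loop, that is, the part that would run on the GPU.

```python
# Illustrative sketch of the term-major nesting: terms outside, pixels inside.
# All names are hypothetical; none of them are ITK classes or functions.

def propagation_term(phi, i, dx):
    """Constant-speed propagation term: -|d(phi)/dx| by central differences."""
    left = phi[max(i - 1, 0)]
    right = phi[min(i + 1, len(phi) - 1)]
    return -abs((right - left) / (2.0 * dx))


def constant_advance(phi, i, dx):
    """A made-up term that pushes every pixel with the same speed."""
    return -0.5


def apply_term_over_domain(term, phi, dx):
    """Innermost loop over pixels: the candidate for a GPU kernel.

    Returns the term's contribution at every pixel and its largest magnitude,
    from which a per-term CFL time step can be derived.
    """
    contribution = [term(phi, i, dx) for i in range(len(phi))]
    max_speed = max((abs(c) for c in contribution), default=0.0)
    return contribution, max_speed


def step_term_major(phi, terms, dx=1.0, cfl=0.5):
    """One update of a single level set; returns the time step that was used."""
    total = [0.0] * len(phi)
    dt = float("inf")
    for term in terms:                                # loop over the term container
        contribution, max_speed = apply_term_over_domain(term, phi, dx)
        for i, c in enumerate(contribution):          # accumulate the update
            total[i] += c
        if max_speed > 0.0:
            dt = min(dt, cfl * dx / max_speed)        # time step for this term
    if dt == float("inf"):
        return 0.0                                    # no term moves anything
    for i, u in enumerate(total):                     # evaluate updated level set
        phi[i] += dt * u
    return dt
```

Because each term is applied over the whole domain in one call, a single per-term routine can serve both the CPU and the GPU path, which is where the absence of code duplication comes from.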