ITK/Release 4/Refactoring Level Set Framework/LevelSetGPUBostonMeeting 2010-10-14

Latest revision as of 16:00, 9 December 2011

Level Set And GPU

Meeting on Oct 14, 2010, 3-4 pm, Cambridge, MA

Attendees: Arnaud Gelas, Kishore Mosaliganti, Won-Ki Jeong


GPU acceleration can be introduced into the level set implementation at different levels. At the solver level, the implementation is a while loop whose iterations sweep the domain. In each iteration, the speed terms are computed at each pixel of the domain, and the resulting update is added to the level set function to obtain the new function.
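
Written as a formula, one iteration for a level set $\phi$ sums the contributions $u_k(\mathbf{x})$ of the speed terms $k$ at each pixel $\mathbf{x}$. It then takes the time step from a CFL condition with grid spacing $\Delta x$ and a constant $C$, typically $0 < C \le 1$. The notes do not specify the time integration scheme. The form below assumes explicit (forward Euler) stepping, and the symbols are introduced here for illustration only:

$$
\phi^{n+1}(\mathbf{x}) = \phi^{n}(\mathbf{x}) + \Delta t \sum_{k} u_k(\mathbf{x}),
\qquad
\Delta t = \frac{C\,\Delta x}{\max_{\mathbf{x}} \left| \sum_{k} u_k(\mathbf{x}) \right|}
$$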

(a) while( Stopping Criterion Is Not Satisfied )
(b) {
(c)    for all level set ls_i in the level set container
(d)    {
(e)       for all pixels in the domain of ls_i
(f)       {
(g)           Compute all speed terms // iterate on the Term Container
(h)           Evaluate the updated levelset function
(i)           Compute time step from CFL Condition
(j)       }
(k)    Reinitialize to signed distance function (if requested by user)
(l)    }
(m)}
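
The solver loop above can be sketched in C++ as follows. This is a minimal CPU-only sketch, not ITK code. The following are all hypothetical:

- the `LevelSet` and `Term` types;
- the `UpdateLevelSet` and `Solve` functions;
- the 1D grid;
- the stopping criterion (an iteration budget, or every update being zero).

The sketch deliberately departs from the order of (h) and (i). A CFL time step depends on the largest update over all pixels, so the update is applied only after the pixel loop has determined `dt`.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical types: a 1D level set sampled on a uniform grid, and a speed
// term returning its update contribution u_k at pixel i.
using LevelSet = std::vector<double>;
using Term = std::function<double(const LevelSet&, std::size_t)>;

// Steps (e)-(j) for one level set: sum the speed terms at every pixel,
// derive dt from the CFL condition, then add dt * update to phi.
double UpdateLevelSet(LevelSet& phi, const std::vector<Term>& terms,
                      double dx, double cfl) {
  std::vector<double> update(phi.size(), 0.0);
  double maxAbs = 0.0;
  for (std::size_t i = 0; i < phi.size(); ++i) {
    for (const Term& term : terms) update[i] += term(phi, i);  // (g)
    maxAbs = std::max(maxAbs, std::abs(update[i]));
  }
  const double dt = maxAbs > 0.0 ? cfl * dx / maxAbs : 0.0;    // (i)
  for (std::size_t i = 0; i < phi.size(); ++i) {
    phi[i] += dt * update[i];                                  // (h)
  }
  return dt;
}

// Steps (a)-(m): iterate over the level set container until no level set
// moves or the (hypothetical) iteration budget is exhausted.
int Solve(std::vector<LevelSet>& container, const std::vector<Term>& terms,
          double dx, double cfl, int maxIterations) {
  int iterations = 0;
  bool moving = true;
  while (moving && iterations < maxIterations) {               // (a)
    moving = false;
    for (LevelSet& phi : container) {                          // (c)
      if (UpdateLevelSet(phi, terms, dx, cfl) > 0.0) moving = true;
      // (k) Reinitialization to a signed distance function would go here.
    }
    ++iterations;
  }
  return iterations;
}
```

For example, take a single constant term returning `-1.0`, with `dx = 1` and `cfl = 0.5`. Each iteration then lowers every value of the level set by 0.5.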

We list three scenarios in which the GPU can be used:

  • GPU implementation of the pixel updates at (g): the image and level set neighborhoods of the pixel are copied to GPU memory, and the terms are evaluated in a GPU function. This requires minimal changes to the currently proposed design: each term has both a CPU and a GPU implementation, and a term factory calls the GPU one. However, performance is very poor, since data must be copied to the GPU for every pixel.
  • The entire code (the while loop iterations) runs on the GPU, and all data is copied into GPU memory.
    • Downsides:
      • limited GPU memory (< 2 GB)
      • code duplication between the CPU and GPU implementations
    • Note: copying between CPU and GPU memory runs at about 4 GB/s
    • Upside: fastest solution in terms of performance
  • In the last scenario, the code nesting is different:
    for all level set ls_i in the level set container
    {
      for all terms in the term container
      {
        for all pixels in the domain of ls_i
        {
          Evaluate the update
          Compute time step from CFL Condition for this term
        }
      }
      Evaluate the updated levelset function
      Reinitialize to signed distance function (if requested by user)
    }

In this scenario, the GPU implementation covers the innermost for loop (over the pixels of ls_i).

    • The level set and image must be copied to the GPU at every iteration
    • Second-best GPU implementation in terms of performance
    • Upside: there is no code duplication
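
The reordered nesting of the last scenario can be sketched with the same hypothetical C++ types. The names `AccumulateTerm` and `UpdateTermMajor` are illustrative, not ITK API, and no actual GPU code is shown.

- `AccumulateTerm` is the innermost loop over pixels, the part that would be offloaded. Apart from the max reduction, its iterations are independent, which makes it a natural fit for a GPU kernel.
- Contributions are written into a separate update buffer, so every term reads the unmodified level set.
- The notes do not say how the per-term CFL time steps are combined. The sketch assumes a conservative choice that sums the per-term maxima. This never gives a larger step than one based on the pixel-wise maximum of the summed update.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical types, as in the solver-loop sketch.
using LevelSet = std::vector<double>;
using Term = std::function<double(const LevelSet&, std::size_t)>;

// Innermost loop: one term over all pixels of ls_i. Accumulates the term's
// contribution into `update` and returns its max |contribution|, which this
// sketch uses as the term's CFL bound.
double AccumulateTerm(const Term& term, const LevelSet& phi,
                      std::vector<double>& update) {
  double maxAbs = 0.0;
  for (std::size_t i = 0; i < phi.size(); ++i) {
    const double u = term(phi, i);
    update[i] += u;
    maxAbs = std::max(maxAbs, std::abs(u));
  }
  return maxAbs;
}

// Term-major nesting for one level set: terms outside, pixels inside, then
// apply the summed update. Since |sum_k u_k| <= sum_k max|u_k|, summing the
// per-term maxima yields a conservative CFL time step.
double UpdateTermMajor(LevelSet& phi, const std::vector<Term>& terms,
                       double dx, double cfl) {
  std::vector<double> update(phi.size(), 0.0);
  double boundSum = 0.0;
  for (const Term& term : terms) {
    boundSum += AccumulateTerm(term, phi, update);
  }
  const double dt = boundSum > 0.0 ? cfl * dx / boundSum : 0.0;
  for (std::size_t i = 0; i < phi.size(); ++i) phi[i] += dt * update[i];
  // Reinitialization to a signed distance function would follow here.
  return dt;
}
```

For scale, take the 4 GB/s copy rate from the notes at face value. Assume, for illustration, a 256×256×256 image and level set stored as 4-byte floats (about 134 MB together). Re-copying both then costs about 34 ms per iteration, counting 1 GB as 10^9 bytes.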