[vtk-developers] potential speedup for the vtkpolydatatoimagestencil

Mon Jan 23 14:04:16 EST 2012

On Mon, Jan 23, 2012 at 11:19 AM, Mark Roden <mmroden at gmail.com> wrote:
> Hi David,
>
> I'm looking to make the vtkPolyDataToImageStencil class faster.  Right
> now, on a Core i7 machine with release-compiled code,
> translating a body mask in an RTSTRUCT to pixels using this filter is
> prohibitively slow (upwards of a minute for the mask).
>
> There are three possible speedups here, in my mind.  I wanted to
> clear them with you, because I don't want to fork VTK to solve this
> problem, but it has become a serious one for us.
>
> First, treat each plane independently of one another, and then have
> each plane processed by different threads.  This change is fairly
> straightforward theoretically, and something you mentioned you were
> looking into a while back
> (http://www.vtk.org/pipermail/vtkusers/2011-January/114538.html).  Is
> this something you're still investigating?

There were three things that I was considering:
1) multi-threading so that each CPU gets N slices to work on
2) increasing the efficiency of the polygon cutting code; I added some
nice cutting code to vtkClipClosedSurface and planned to eventually
also use it in vtkPolyDataToImageStencil
3) improving the efficiency of extent insertion for the stencils

So far I've only done #3, which was the least important but was the
easiest.  Right now my own apps are bottlenecking on my segmentation
algorithms, rather than on vtkPolyDataToImageStencil, so improving
vtkPolyDataToImageStencil hasn't been a high priority.

> For my work on other projects, I've found that Intel TBB
> (http://threadingbuildingblocks.org/) makes multithreading this kind
> of work very easy; the library is free and works with any C++ project
> that uses the C++0x standard and can use lambda expressions.

VTK will have to continue to support pre-C++0x compilers for a long
time, several more years at least.  So if threading is to be done, it
should be done with VTK's threading classes.
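
The "each CPU gets N slices" idea in item 1 comes down to dividing the
slice range among the threads.  Below is a minimal sketch of that
partitioning in plain pre-C++0x C++ with no VTK dependency.  SplitSlices
is a hypothetical helper, not a VTK function; the assumption is that each
thread would call it with its own thread index and then process only the
slices it is given.

```cpp
#include <cassert>

// Hypothetical helper (not part of VTK): compute the inclusive slice
// range [pieceMin, pieceMax] that piece number `piece` out of `numPieces`
// should process, given the full inclusive slice range [zMin, zMax].
// Returns false if this piece gets no slices (more pieces than slices).
bool SplitSlices(int zMin, int zMax, int piece, int numPieces,
                 int &pieceMin, int &pieceMax)
{
  assert(numPieces > 0 && piece >= 0 && piece < numPieces);
  assert(zMax >= zMin);

  const int total = zMax - zMin + 1;
  const int base = total / numPieces;   // slices that every piece gets
  const int extra = total % numPieces;  // first `extra` pieces get one more

  const int count = base + (piece < extra ? 1 : 0);
  if (count == 0)
    {
    return false;
    }

  // Slices handed out to all pieces before this one.
  const int offset = piece * base + (piece < extra ? piece : extra);
  pieceMin = zMin + offset;
  pieceMax = pieceMin + count - 1;
  return true;
}
```

With slices 0 through 9 and 4 pieces, the pieces get slices 0-2, 3-5,
6-7 and 8-9.  Contiguous blocks are only one possible choice; if the
cost per slice varies a lot, interleaving slices might balance better.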

> Second, change the interior while/for loop collision detection to be a
> hash table.  Right now, in the ThreadedExecute function, there is this
> code:
>
>    for (vtkIdType i = 0; i < numberOfPoints; ++i)
>      {
> ...
>      while( lines->GetNextCell(npts, pointIds) )
>        {
>        for (vtkIdType j = 0; j < npts; ++j)
>          if ( pointIds[j] == i )
>            {
>
> But what if collision detection was changed to use a hash table where
> the hashing function automatically detected point collisions through a
> single pass through the data?

I use a hash table in vtkClipClosedSurface to accelerate the clipping (i.e. #2
on my to-do list above).  Take a look at the vtkClipClosedSurface
code, specifically the vtkCCSEdgeLocator class.
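
As a rough illustration of the single-pass idea for the quoted loop, the
sketch below builds a point-to-line lookup table once, so that finding
the lines that use point i no longer requires rescanning every line.
This is not the vtkCCSEdgeLocator code.  IdType, the vector-of-vectors
line representation, and BuildPointToLineIndex are all hypothetical
stand-ins, and an ordered std::map is used in place of a true hash table
to stay within pre-C++0x C++.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

typedef long long IdType;  // stand-in for vtkIdType (assumption)

typedef std::map<IdType, std::vector<std::size_t> > PointToLineIndex;

// Hypothetical helper: visit every line once and record, for each point
// ID, the indices of the lines that reference it.
PointToLineIndex BuildPointToLineIndex(
  const std::vector<std::vector<IdType> > &lines)
{
  PointToLineIndex index;
  for (std::size_t cell = 0; cell < lines.size(); ++cell)
    {
    const std::vector<IdType> &ids = lines[cell];
    for (std::size_t j = 0; j < ids.size(); ++j)
      {
      assert(ids[j] >= 0);  // point IDs are expected to be non-negative
      index[ids[j]].push_back(cell);
      }
    }
  return index;
}
```

After the table is built, the lines touching point i can be found with
index.find(i), which replaces the inner while/for scan over all lines.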

> Third, there does not appear to be an iterator over the points vector,
> but a Get and Set function.  These functions appear to be pretty slow,
> and go through several thunking layers before data can actually be set
> or not.  Is there an iterator class for points as there is for lines?
> If not, how hard would it be to create such a class?

The vtkPoints::GetData() method returns an array (either a
vtkFloatArray or a vtkDoubleArray) that contains the points.  Once you
have this array, the GetTupleValue() method is a purely inlined method
that can be used to efficiently get the points.  If you know ahead of
time whether the points are double or float, then this is probably the
most efficient way of accessing them, apart from getting the raw
"float *" or "double *".

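To illustrate the raw-pointer option, here is a minimal sketch that
walks a buffer of points stored as consecutive (x, y, z) triples.  It
uses a plain "const double *" with no VTK dependency; ComputeZRange is a
hypothetical helper, and the assumption is that the pointer would come
from the double array returned by vtkPoints::GetData() after checking
that the points really are stored as doubles.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of VTK): find the smallest and largest
// z coordinate in a buffer laid out as x0,y0,z0,x1,y1,z1,...
void ComputeZRange(const double *xyz, std::size_t numPoints,
                   double &zMin, double &zMax)
{
  assert(xyz != 0 && numPoints > 0);

  zMin = zMax = xyz[2];  // z of the first point
  for (std::size_t i = 1; i < numPoints; ++i)
    {
    const double z = xyz[3*i + 2];  // point i starts at offset 3*i
    if (z < zMin) { zMin = z; }
    if (z > zMax) { zMax = z; }
    }
}
```

The loop touches memory directly, with no per-point virtual call, which
is the reason the raw pointer is likely the fastest access path.
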
 - David