[vtk-developers] vtkCellArray memory layout

Thu Mar 21 08:21:51 EDT 2013

>> On Wed, Mar 20, 2013 at 4:35 PM, Will Schroeder <
will.schroeder at kitware.com> wrote:
>>>
>>> David-
>>>
>>> It's hard to answer this in a short manner which is what I prefer
>>> so please forgive the length of this email. I am not sure what is
>>> motivating this work so it's hard to address what you are
>>> doing. Having said this, I would be very,very careful messing
>>> around with the innards of this class, since it might mean delving
>>> deep into some very nasty algorithms like decimation, etc. which
>>> could get ugly very fast (this may be FUD I'm not sure how hard it
>>> would be without further investigation).

Let me back up a bit -- as Berk hinted at, we're looking at ways to
abstract away the implementation of certain data structures so that
alternate storage for tuple arrays, cells, etc may be used. This is done
with the coprocessing projects in mind, so that simulation codes may
directly use their native data structures with certain components of the
pipeline. By introducing containers that map their native memory layouts
into the vtkDataArray or vtkCellArray format, they can avoid having to
allocate memory for a second copy of the data and save some cycles that
would be needed to reformat the layout.

The trouble I'm having with the vtkCellArray API is that the current
random-access methods are very difficult to map onto any other memory
layout. Since the lookup is done by an index into the mixed array of cell
sizes and cell data, any other storage layout must do some non-trivial work
to translate these indices into something that can be used to find the data
that is requested. Using the vector of vectors idea as an example, one
would need to sum the sizes of each inner vector to find the correct cell.
Something like:

// Return cell id given a vtkCellArray index
vtkIdType translateCellArrayIndex(vtkIdType index)
{
  vtkIdType currentIndex = 0;
  vtkIdType cellIndex = 0;
  while (currentIndex < index)
    {
    currentIndex += 1; // npoints
    // Storage is the vector of vectors, each inner vector stores a cell
    currentIndex += this->Storage[cellIndex++].size(); // cell data
    }
  return cellIndex;
}

This is quite expensive for each index lookup! By modifying the API to use
cellIds for random access instead of these implementation-specific indices,
it becomes much, much easier to find the requested cell. The data structure
I mentioned would allow vtkCellArray to do the same thing it's doing now,
but with the benefit of fast cell-based random access.

>>> First, what you are describing is well known as documented in the
>>> VTK Textbook which I assume you have read. Remember this structure
>>> is almost 20 years old and back in the day the cell array mapped
>>> directly into some display list representations (I believe). We
>>> were supporting OpenGL, plus other graphics systems and this was a
>>> very convenient way to represent primitives like polygons and
>>> triangles, etc. Given this, it was very easy to incrementally add
>>> other data structures to provide random access (cell types) and
>>> links from points back to cells (cell links).
>>>
>>> The other thing driving this structure was the ease/speed of
>>> memory allocation into large contiguous arrays. It was and still
>>> is much faster to allocate, and communicate such structures
>>> (read/write, sockets, etc.).For a comparison look at the ITK mesh
>>> sometime, it's a mess, few use it, and it's very fragmented. VTK
>>> has the virtue of simplicity which should not be
>>> underrated. Sometimes we developers get too caught up in cool
>>> implementation tricks and forget about our users and applications
>>> :-)
>>>
>>> Having said all of this, times change as do underlying
>>> technologies (graphics libraries) and our understanding of
>>> technology and how it is used. VTK was originally envisioned as a
>>> read-only, demonstration system:load data, demonstrate some
>>> algorithms, and visualize. To my astonishment people now want to
>>> use it for crazy stuff like editing meshes :-) I certainly think
>>> some clean up of the API is most likely warranted since it grew
>>> organically over 20 years. And as you point out there are
>>> definitely inefficiencies, and so on.

Ah, ok -- this is what I was curious about. So the current implementation
was chosen as it was efficient for use with fixed-pipeline OpenGL.

>>> If I was going to do something today I would ditch the old stuff
>>> and create a new vtkMesh. As a matter of fact Bob O'Bara and I did
>>> precisely this in a project a while back; it held great promise
>>> and really enabled cool things like editing, full topological
>>> hierarchy representation, etc. It would also be nice to add modern
>>> concepts like iterators, etc. into the mix. But since I don't
>>> quite now what you are trying to accomplish it's hard to make
>>> recommendations.

That does sound nice, but it's far more ambitious than what I'm interested
in at the moment ;-)

On Wed, Mar 20, 2013 at 8:13 PM, Robert Maynard <robert.maynard at kitware.com>
wrote:
> The current layout is high advantageous when you consider a fixed
> pipeline, where you iterate a single array and decide what cell
> types you are drawing based on that single array. If we had a flat
> array of point ids without any cell type we can easily create VBO's
> size based off that and quickly blast the coordinates into that
> without branching.

That's what I was thinking. vtkPolyData for example, already stores
different types of cells in separate arrays: vertices, lines, polygons, and
triangle strips all get their own instance of vtkCellArray. If we separated
out triangles and quads from other polygons, it would be easy on modern GL
implementations to just dump the point index array into a VBO after making
the modifications I mentioned.

> I think a more complicated data structure is required, but dual
> arrays is just the tip of the iceberg. The dual array structure
> still has really high overhead for cell types like LINE, and
> TRIANGLE ( 50% and 33% increase in size ). Currently we generally
> have datasets where cells are inserted into CellArray grouped by
> cell type. So generally people load in all Tets and than all
> Hexahedrons and so forth. This leads me to want to use a data
> structure for cell types that represent contiguous spare ranges, ie
> [0, 300) are triangles, and [300,700] are quads. This reduces the
> memory overhead dramatically when we have chunks of contiguous
> cells.

Hmm...a similar idea could be implemented with the vtkCellArray layout I
proposed. If the vtkCellArray could be restricted to only store a single
type of cell through it's API or subclassing, the offsets array could be
left empty and the cell offsets would be computed as needed. This would
save memory and be VBO friendly as you mentioned...

Dave
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://public.kitware.com/pipermail/vtk-developers/attachments/20130321/5f418a19/attachment.html>