Fri Oct 24 13:02:35 EDT 2014
slowdown of VTK when used to render very large numbers (100's to 1000's)
of actors. Last year, Bob O'Bara created a benchmark program that
demonstrates this effect quantitatively. This program showed, for
example, that rendering 1000 actors, each with one geometric primitive
(we use a cube), runs 15 times slower than rendering a single actor with
1000 geometric primitives. In both cases, the actual geometry rendered
by OpenGL, and the resulting images, are identical, yet the 1000-actor
case takes 15 times as long.
In the past few weeks, by making a number of modifications to VTK 3.2
(using the May 11 nightly build) we have reduced this difference in
rendering times from a factor of 15X to 2X. From this work, we have
identified many issues that influence rendering speed for large numbers
of actors, and would like to share our major findings with the
developers. Our goal is to stimulate discussion and suggest possible
enhancements to VTK that would benefit applications using large numbers
of actors. Following is a list of the major issues we discovered and the
code modifications we used to improve rendering speed. We would very much
like to see these improvements (or equivalent improvements based on
suggestions of the "right way" to do some of these things) incorporated
into VTK.
Issue 1: Numerous OpenGL "state" calls are made by the
vtkOpenGLProperty::Render() method. At least 12 gl* calls are made for
each actor every frame. (This was discussed in the vtk-developers list
beginning May 16.)
Action: We modified the vtkOpenGLProperty::Render() method to call new
methods in the vtkOpenGLRenderer class instead of calling OpenGL
directly. The new methods in vtkOpenGLRenderer store and check the last
OpenGL settings for things such as material colors, shade model, and
point size, and only make the gl* call if there's a change.
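
The caching idea in Issue 1 can be sketched roughly as follows. This is
not the code we added to vtkOpenGLRenderer; FakeGL, StateCache, and all
member names are hypothetical, and FakeGL only counts calls so the sketch
runs without an OpenGL context.

```cpp
#include <cassert>

// Hypothetical stand-in for the real gl* entry points. It only counts
// calls so the effect of the cache can be observed without OpenGL.
struct FakeGL {
    int calls = 0;
    void ShadeModel(int /*mode*/) { ++calls; }
    void PointSize(float /*size*/) { ++calls; }
};

// Sketch of a renderer-side cache: remember the last value sent to
// OpenGL and skip the call when the requested value is unchanged.
class StateCache {
public:
    explicit StateCache(FakeGL& gl) : gl_(gl) {}

    void SetShadeModel(int mode) {
        if (shadeValid_ && mode == shadeModel_) return;  // redundant, skip
        gl_.ShadeModel(mode);
        shadeModel_ = mode;
        shadeValid_ = true;
    }

    void SetPointSize(float size) {
        if (pointValid_ && size == pointSize_) return;   // redundant, skip
        gl_.PointSize(size);
        pointSize_ = size;
        pointValid_ = true;
    }

    // Call whenever code outside the cache may have changed GL state.
    void Invalidate() { shadeValid_ = pointValid_ = false; }

private:
    FakeGL& gl_;
    int shadeModel_ = 0;
    float pointSize_ = 0.0f;
    bool shadeValid_ = false;
    bool pointValid_ = false;
};
```

With 1000 actors that share the same property values, only the first
actor reaches the gl* layer. The Invalidate() hook is an assumption on our
part: any code that calls OpenGL directly would otherwise leave the cache
out of date.
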
Issue 2: glMultMatrixd() is called by vtkOpenGLActor::Render(). In the
general case, this is needed to render the actor in the right place. In
our applications, however, the actors nearly always have an identity
matrix, making the glMultMatrixd() call unnecessary. (Mark Beall posted
this to the developers list on May 17, but there were no responses.)
Action: We added an "identity" flag to the vtkMatrix4x4 class to keep
track of its state. It's not ideal since the matrix elements in this
class are public (i.e., someone can change the matrix without the class,
or our identity flag, knowing about it). We also, of course, modified the
vtkOpenGLActor::Render() method to check the matrix identity flag before
calling glMultMatrixd().
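
One way to avoid the public-element problem in Issue 2 is sketched below.
It is an illustration only, not our vtkMatrix4x4 change: TrackedMatrix and
NeedsMultMatrix are hypothetical names, and the elements are made private
so that every write passes through a setter that maintains the flag.

```cpp
#include <cassert>

// Hypothetical 4x4 matrix that tracks whether it is still the identity.
// Elements are private, so every write goes through SetElement().
class TrackedMatrix {
public:
    TrackedMatrix() { MakeIdentity(); }

    void MakeIdentity() {
        for (int i = 0; i < 4; ++i)
            for (int j = 0; j < 4; ++j)
                m_[i][j] = (i == j) ? 1.0 : 0.0;
        identity_ = true;
    }

    void SetElement(int i, int j, double v) {
        assert(i >= 0 && i < 4 && j >= 0 && j < 4);
        if (m_[i][j] == v) return;  // no change, flag stays as it was
        m_[i][j] = v;
        identity_ = false;          // conservative: cleared on any change
    }

    double GetElement(int i, int j) const { return m_[i][j]; }
    bool IsIdentity() const { return identity_; }

private:
    double m_[4][4];
    bool identity_;
};

// True when the render code still has to multiply the matrix onto the
// stack; false when the multiply can be skipped.
bool NeedsMultMatrix(const TrackedMatrix& m) { return !m.IsIdentity(); }
```

The flag is deliberately conservative. Writing identity values back one
element at a time does not restore it, which costs an unnecessary
multiply but never skips a needed one. Only MakeIdentity() sets it again.
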
Issue 3: The vtkFieldData::GetMTime() method creates an iterator even
when the field data contains no arrays.
Action: We added a line to check if vtkFieldData::NumberOfArrays is
greater than zero before instantiating a vtkFieldData::Iterator.
Issue 4: The vtkActor::GetBounds() method traverses the VTK pipeline
checking modified times ad infinitum. With 1000 actors and 120 frames,
this means 120,000 calls to vtkActor::GetMTime(),
vtkPolyDataMapper::GetMTime(), vtkDataObject::GetMTime(), and (see
Issue 3) vtkFieldData::GetMTime().
Action: Since we almost never modify our actors (especially their
geometry), we added a new protected member called "UpdateMode" to
vtkProp3D. It is used to encode 1 of 3 possible modes:
VTK_UPDATE_DEFAULT, VTK_UPDATE_ONCE, and VTK_UPDATE_DONE.
In VTK_UPDATE_DEFAULT mode, the actor behaves in VTK's normal way. The
VTK_UPDATE_ONCE mode configures the actor to update one time only, and
the VTK_UPDATE_DONE mode indicates not to perform the update, but
instead to use the results from the previous/initial update.
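
The three modes can be pictured as a small state machine. The sketch
below is a guess at the shape, not our vtkProp3D patch: the mode names
come from the text above, but the numeric values, the CachedProp class,
and the automatic switch from ONCE to DONE after the first update are
assumptions.

```cpp
#include <cassert>

// Mode names follow the text; the numeric values are assumptions.
enum UpdateMode {
    VTK_UPDATE_DEFAULT = 0,  // update every frame (normal VTK behaviour)
    VTK_UPDATE_ONCE = 1,     // update on the next render only
    VTK_UPDATE_DONE = 2      // never update, reuse earlier results
};

// Hypothetical prop that counts how often its (expensive) update runs.
class CachedProp {
public:
    void SetUpdateMode(UpdateMode m) { mode_ = m; }
    UpdateMode GetUpdateMode() const { return mode_; }
    int UpdateCount() const { return updates_; }

    // Called once per frame for each prop.
    void Render() {
        if (mode_ == VTK_UPDATE_DONE) return;  // skip pipeline checks
        Update();
        // Assumed transition: after one update, ONCE becomes DONE.
        if (mode_ == VTK_UPDATE_ONCE) mode_ = VTK_UPDATE_DONE;
    }

private:
    void Update() { ++updates_; }  // stand-in for the pipeline traversal
    UpdateMode mode_ = VTK_UPDATE_DEFAULT;
    int updates_ = 0;
};
```

Over 120 frames, a prop in the default mode runs its update 120 times,
while a prop set to VTK_UPDATE_ONCE runs it once and then stays in
VTK_UPDATE_DONE until the application resets the mode.
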
Issue 5: The vtkPolyDataMapper::RenderPiece() method traverses the VTK
pipeline checking modification times.
Action: Using our new vtkProp3D::UpdateMode, we bypassed nearly all of
the logic in the RenderPiece method for VTK_UPDATE_DONE actors.
Issue 6: The vtkPolyDataMapper::RenderPiece() method uses the vtkTimer
to measure rendering time.
Action: In VTK_UPDATE_DONE mode, we bypass the timer, using the result
computed in the previous/initial render.
Issue 7: The vtkProp3D::GetMatrix() method traverses the VTK pipeline
checking modification times, and then calls vtkMatrix4x4::DeepCopy().
Note that in this case the DeepCopy is just copying the matrix to
Action: Although we don't know why the DeepCopy call is in there, we
took it out and now return a pointer to the vtkProp3D::Matrix data member.
Issue 8: The vtkOpenGLPolyDataMapper::Draw() method puts 3 OpenGL calls
-- glDisable(GL_COLOR_MATERIAL), glDisable(GL_LIGHTING), and
glEnable(GL_LIGHTING) -- in every display list.
Action: For the GL_COLOR_MATERIAL case, we used the same workaround as
Issue #1 -- adding new logic in the vtkOpenGLRenderer class to check the
OpenGL setting before making the gl* call. The GL_LIGHTING calls were a
bit more interesting. The Draw() method disables lighting for the case
when there are no normals available for either lines or vertices (and
then re-enables lighting for surfaces). We modified the code to simply
check for the presence of lines or vertices first, before worrying about
normals. Since our test case actors don't display lines or vertices,
this kept the GL_LIGHTING calls out of our display lists.
Note: This suggests that, in order to get high throughput with large
numbers of actors, you cannot mix and match lines and surfaces randomly,
but will be much better off grouping actors with lines only and actors
with surfaces only.
Issue 9: The vtkOpenGLActor::Render() method calls glDepthMask() to
either set or clear the depth mask, depending on the actor's opacity.
Action: We added Enable/Disable methods to the vtkRenderer that check
OpenGL state before making the gl* call, in the same way that we handled
the gl* calls in the vtkOpenGLProperty::Render() method (Issue #1).
Issue 10: The vtkRenderer::Render() method allocates and destroys 3
arrays (PropArray, RayCastPropArray, and RenderIntoImagePropArray) each
frame.
Action: We took a shortcut and only destroy the arrays if an actor is
added or removed between frames. In the general case, we'd suggest
reallocating the arrays only if the number of actors has increased since
the previous render.
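
The general-case suggestion in Issue 10 (reallocate only when the actor
count grows) might look like the sketch below. PropArrayCache and its
members are hypothetical names, and Prop is an opaque placeholder for
vtkProp. This is not the shortcut we actually implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

struct Prop;  // opaque placeholder for vtkProp

// Keeps one pointer array alive across frames and grows it on demand.
class PropArrayCache {
public:
    // Returns storage for at least `count` pointers. The contents are
    // unspecified; the caller refills the array every frame.
    Prop** Acquire(std::size_t count) {
        if (count > capacity_) {
            array_.reset(new Prop*[count]);  // old array is freed here
            capacity_ = count;
            ++allocations_;
        }
        return array_.get();
    }

    int Allocations() const { return allocations_; }

private:
    std::unique_ptr<Prop*[]> array_;
    std::size_t capacity_ = 0;
    int allocations_ = 0;
};
```

Rendering 1000 actors for 120 frames then costs one allocation per array
instead of 120. Removing actors leaves the array oversized, which wastes a
little memory but avoids a free/allocate pair.
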
Issue 11: The vtkOpenGLPolyDataMapper::RenderPiece() method calls
MakeCurrent(), which in turn calls glXGetCurrentContext(). This was
discussed in the vtk-developers list beginning May 28.
Action: Nothing was done for this one.
Issue 12: The vtkRenderer::RenderOverlay() method ends up calling
vtkProp::SafeDownCast() for each actor every frame. This is a bit
frustrating since, as we understand it, none of our actors contribute
anything to the RenderOverlay method.
Action: Nothing has been done yet, but one suggestion is to let the
application software enable/disable the RenderOverlay method.
Another approach would be to change how the renderer stores the props.
Right now everything is stored in one big list. It seems it would be
possible to divide this up into multiple lists based on what the type of
the prop is. For example, when the prop is added to the renderer, the
SafeDownCast call could be made and it could be put in the appropriate
list. This would also solve Issue 13 below.
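
A rough sketch of the multiple-list idea, with dynamic_cast standing in
for SafeDownCast. The Prop, Actor, and Actor2D types here are minimal
placeholders rather than the VTK classes, and SortedPropLists is a
hypothetical name.

```cpp
#include <cassert>
#include <vector>

// Minimal placeholder hierarchy, not the real VTK classes.
struct Prop { virtual ~Prop() = default; };
struct Actor : Prop {};    // 3D geometry
struct Actor2D : Prop {};  // overlay-style prop

// Classifies each prop once, when it is added, so the per-frame loops
// can walk a list of the right type without any casting.
class SortedPropLists {
public:
    void AddProp(Prop* p) {
        if (Actor* a = dynamic_cast<Actor*>(p)) {
            actors_.push_back(a);
        } else if (Actor2D* o = dynamic_cast<Actor2D*>(p)) {
            overlays_.push_back(o);
        } else {
            others_.push_back(p);
        }
    }

    const std::vector<Actor*>& Actors() const { return actors_; }
    const std::vector<Actor2D*>& Overlays() const { return overlays_; }
    const std::vector<Prop*>& Others() const { return others_; }

private:
    std::vector<Actor*> actors_;
    std::vector<Actor2D*> overlays_;
    std::vector<Prop*> others_;
};
```

With this layout an overlay pass would iterate only over Overlays(),
which is empty in our test case, so the per-actor cast disappears from
the frame loop. A matching RemoveProp would have to search the same lists.
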
Issue 13: The vtkRenderer::Render() method ends up calling
vtkProp::SafeDownCast() for each actor every frame.
Action: Nothing has been done yet.
Issue 14: The vtkFrustumCoverageCuller::Cull() method processes the same
actor geometry every frame. Our actors have very simple geometry, and
culling may not be of any benefit except when zoomed in very close.
Action: Nothing has been done yet, but we suggest making culling
optional, either actor by actor or for the whole frame.
Issue 15: Using Quantify to measure CPU time, we have observed that the
glFlush() calls for the 1000-actor case consume twice as many CPU cycles
as the 1-actor case (82.1 million vs. 42.3 million). At the bottom of
the glFlush() call hierarchy is the system writev() function. For the
1-actor case, each call to writev() takes an average of 338K cycles,
whereas for the 1000-actor case, each call takes an average of 664K
cycles.
Action: Nothing has been done, other than to verify that the sequence of
gl* calls and display lists are equivalent for both test cases.
1223 Peoples Avenue
Troy, NY 12180
mailto: johnt at simmetrix.com