No subject

Fri Oct 24 13:02:35 EDT 2014

slowdown of VTK when used to render very large numbers (100's to 1000's) 
of actors. Last year, Bob O'Bara created a benchmark program that 
demonstrates this effect quantitatively. This program showed, for 
example, that rendering 1000 actors, each with one geometric primitive 
(we use a cube), runs 15 times slower than rendering a single actor with 
1000 geometric primitives. In both cases, the actual geometry rendered
by OpenGL and the resulting images are identical, yet the 1000-actor
case takes 15X more time.

In the past few weeks, by making a number of modifications to VTK 3.2 
(using the May 11 nightly build) we have reduced this difference in 
rendering times from a factor of 15X to 2X. From this work, we have 
identified many issues that influence rendering speed for large numbers 
of actors, and would like to share our major findings with the 
developers. Our goal is to stimulate discussion and suggest possible 
enhancements to VTK that would benefit applications using large numbers 
of actors. Following is a list of the major issues we discovered and the 
code modifications we used to improve rendering speed. We would very
much like to see these improvements (or equivalent improvements based on
suggestions of the "right way" to do some of these things) incorporated
into VTK.

Issue 1: Numerous OpenGL "state" calls are made by the 
vtkOpenGLProperty::Render() method. At least 12 gl* calls are made for 
each actor every frame. (This was discussed in the vtk-developers list 
beginning May 16.) 

Action: We modified the vtkOpenGLProperty::Render() method to call new 
methods in the vtkOpenGLRenderer class instead of calling OpenGL 
directly. The new methods in vtkOpenGLRenderer store and check the last 
OpenGL settings for things such as material colors, shade model, and 
point size, and only make the gl* call if there's a change. 
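
As a rough illustration of the idea, here is a minimal sketch of such a
state cache. It is not the code we added to vtkOpenGLRenderer: the
FakeGL, StateCache, and CountStateCalls names are hypothetical, and the
gl* calls are replaced by a counter so the sketch runs without an OpenGL
context.

```cpp
#include <cassert>

// Hypothetical stand-in for the OpenGL entry points. It only counts calls,
// so the effect of the cache can be observed without a GL context.
struct FakeGL {
    int calls = 0;
    void ShadeModel(int /*mode*/) { ++calls; }
    void PointSize(float /*size*/) { ++calls; }
};

// Sketch of a renderer-side cache: remember the last value sent for each
// piece of state and forward to GL only when the value changes.
class StateCache {
public:
    explicit StateCache(FakeGL& gl) : gl_(gl) {}

    void SetShadeModel(int mode) {
        if (!shadeModelValid_ || mode != shadeModel_) {
            gl_.ShadeModel(mode);
            shadeModel_ = mode;
            shadeModelValid_ = true;
        }
    }

    void SetPointSize(float size) {
        if (!pointSizeValid_ || size != pointSize_) {
            gl_.PointSize(size);
            pointSize_ = size;
            pointSizeValid_ = true;
        }
    }

private:
    FakeGL& gl_;
    int shadeModel_ = 0;
    float pointSize_ = 0.0f;
    bool shadeModelValid_ = false;  // false until the first call goes through
    bool pointSizeValid_ = false;
};

// Simulates rendering `actorCount` actors that all share the same property
// settings and returns how many GL calls were actually issued.
int CountStateCalls(int actorCount) {
    FakeGL gl;
    StateCache cache(gl);
    for (int i = 0; i < actorCount; ++i) {
        cache.SetShadeModel(1);
        cache.SetPointSize(1.0f);
    }
    return gl.calls;
}
```

With identical properties on every actor, only the first actor's settings
reach GL. The cache is only valid as long as nothing else changes the
OpenGL state behind its back.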

Issue 2: glMultMatrixd() is called by vtkOpenGLActor::Render(). In the 
general case, this is needed to render the actor in the right place. In 
our applications, however, the actors nearly always have an identity 
matrix, making the glMultMatrixd() call unnecessary. (Mark Beall posted  
this to the developers list on May 17, but there were no responses.) 

Action: We added an "identity" flag to the vtkMatrix4x4 class to keep 
track of its state. It's not ideal since the matrix elements in this 
class are public (i.e., someone can change the matrix without the class, 
or our identity flag, knowing about it). We also, of course, modified
the vtkOpenGLActor::Render() method to check the matrix identity flag
before calling glMultMatrixd().
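
The following sketch shows one way such a flag could work. It assumes a
hypothetical Matrix4 class whose elements are private, which is exactly
what vtkMatrix4x4 does not give us; NeedsMultMatrix is likewise an
illustrative name, not a VTK method.

```cpp
#include <cassert>

// Hypothetical 4x4 matrix with private elements and an identity flag.
// Because every write goes through SetElement(), the flag cannot go stale.
class Matrix4 {
public:
    Matrix4() { SetIdentity(); }

    void SetIdentity() {
        for (int r = 0; r < 4; ++r)
            for (int c = 0; c < 4; ++c)
                e_[r][c] = (r == c) ? 1.0 : 0.0;
        identity_ = true;
    }

    // Conservative: any write clears the flag, even one that happens to
    // leave the matrix equal to identity.
    void SetElement(int r, int c, double v) {
        e_[r][c] = v;
        identity_ = false;
    }

    double GetElement(int r, int c) const { return e_[r][c]; }
    bool IsIdentity() const { return identity_; }

private:
    double e_[4][4];
    bool identity_;
};

// The actor's render step would issue the matrix multiply only when this
// returns true.
bool NeedsMultMatrix(const Matrix4& m) { return !m.IsIdentity(); }
```

The flag check costs one branch per actor, versus a 16-element matrix
multiply inside OpenGL for every actor whose matrix is identity.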

Issue 3: The vtkFieldData::GetMTime() creates an iterator even when it 
contains no data. 

Action: We added a line to check if vtkFieldData::NumberOfArrays is 
greater than zero before instantiating a vtkFieldData::Iterator. 
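
A sketch of the early-out, using a hypothetical FieldData class in which
each array is reduced to its modification time. The real
vtkFieldData::Iterator is not reproduced here; a counter stands in for
the cost of constructing it.

```cpp
#include <cassert>
#include <vector>

// Hypothetical field-data container: each array is represented only by
// its modification time.
class FieldData {
public:
    void AddArray(unsigned long arrayMTime) { arrayMTimes_.push_back(arrayMTime); }

    unsigned long GetMTime() {
        unsigned long result = ownMTime_;
        // Early-out: skip the iterator entirely when there are no arrays.
        if (!arrayMTimes_.empty()) {
            ++iteratorsCreated_;  // stands in for constructing the iterator
            for (unsigned long t : arrayMTimes_) {
                if (t > result) result = t;
            }
        }
        return result;
    }

    int IteratorsCreated() const { return iteratorsCreated_; }

private:
    std::vector<unsigned long> arrayMTimes_;
    unsigned long ownMTime_ = 1;
    int iteratorsCreated_ = 0;
};
```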

Issue 4: The vtkActor::GetBounds() method traverses the VTK pipeline 
checking modified times ad infinitum. With 1000 actors and 120 frames, 
this means 120,000 calls to vtkActor::GetMTime(), 
vtkPolyDataMapper::GetMTime(), vtkDataObject::GetMTime(), and (our
friend from Issue 3) vtkFieldData::GetMTime().

Action: Since we almost never modify our actors (especially their 
geometry), we added a new protected member called "UpdateMode" to 
vtkProp3D. It is used to encode 1 of 3 possible modes: 

        VTK_UPDATE_DEFAULT 
        VTK_UPDATE_ONCE 
        VTK_UPDATE_DONE 

In VTK_UPDATE_DEFAULT mode, the actor behaves in VTK's normal way. The 
VTK_UPDATE_ONCE mode configures the actor to update one time only, and 
the VTK_UPDATE_DONE mode indicates not to perform the update, but
instead to use the results from the previous/initial update.
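
The three modes can be sketched roughly as follows. The mode names are
the ones listed above; the Prop class, its counter, and the automatic
switch from VTK_UPDATE_ONCE to VTK_UPDATE_DONE after the first update
are assumptions made for this illustration, not a description of the
actual vtkProp3D changes.

```cpp
#include <cassert>

enum UpdateMode { VTK_UPDATE_DEFAULT, VTK_UPDATE_ONCE, VTK_UPDATE_DONE };

// Hypothetical prop that caches its bounds according to the update mode.
class Prop {
public:
    void SetUpdateMode(UpdateMode m) { mode_ = m; }
    UpdateMode GetUpdateMode() const { return mode_; }
    int PipelineChecks() const { return checks_; }

    const double* GetBounds() {
        if (mode_ != VTK_UPDATE_DONE) {
            ++checks_;  // stands in for the GetMTime() pipeline traversal
            ComputeBounds();
            // Assumption: a one-time update marks itself done afterwards.
            if (mode_ == VTK_UPDATE_ONCE) mode_ = VTK_UPDATE_DONE;
        }
        return bounds_;  // cached result when the mode is VTK_UPDATE_DONE
    }

private:
    void ComputeBounds() {
        // Unit cube, matching the one-cube-per-actor benchmark geometry.
        for (int i = 0; i < 6; ++i) bounds_[i] = (i % 2 == 0) ? -0.5 : 0.5;
    }

    UpdateMode mode_ = VTK_UPDATE_DEFAULT;
    double bounds_[6] = {0, 0, 0, 0, 0, 0};
    int checks_ = 0;
};

// Calls GetBounds() once per frame and reports how many pipeline checks ran.
int ChecksAfterFrames(UpdateMode mode, int frames) {
    Prop p;
    p.SetUpdateMode(mode);
    for (int f = 0; f < frames; ++f) p.GetBounds();
    return p.PipelineChecks();
}
```

Over 120 frames, a prop in the default mode checks the pipeline 120
times, while a VTK_UPDATE_ONCE prop checks it once. The price is that
later changes to the geometry go unnoticed until the mode is reset.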

Issue 5: The vtkPolyDataMapper::RenderPiece() method traverses the VTK 
pipeline checking modification times. 

Action: Using our new vtkProp3D::UpdateMode, we bypassed nearly all of 
the logic in the RenderPiece method for VTK_UPDATE_DONE actors. 

Issue 6: The vtkPolyDataMapper::RenderPiece() method uses the vtkTimer
to measure rendering time.

Action: In VTK_UPDATE_DONE mode, we bypass the timer, using the result 
computed in the previous/initial render. 

Issue 7: The vtkProp3D::GetMatrix() method traverses the VTK pipeline 
checking modification times, and then calls vtkMatrix4x4::DeepCopy(). 
Note that in this case, the DeepCopy just copies the matrix to itself.

Action: Although we don't know why the DeepCopy call is in there, we
took it out and now return a pointer to the vtkProp3D::Matrix data
member.

Issue 8: The vtkOpenGLPolyDataMapper::Draw() method puts 3 OpenGL calls 
-- glDisable(GL_COLOR_MATERIAL), glDisable(GL_LIGHTING), and 
glEnable(GL_LIGHTING) -- in every display list. 

Action: For the GL_COLOR_MATERIAL case, we used the same workaround as
in Issue #1 -- adding new logic in the vtkOpenGLRenderer class to check
the OpenGL setting before making the gl* call. The GL_LIGHTING calls
were a bit more interesting. The Draw() method disables lighting for the
case when there are no normals available for either lines or vertices
(and then re-enables lighting for surfaces). We modified the code to
simply check for the presence of lines or vertices first, before
worrying about normals. Since our test case actors don't display lines
or vertices, this kept the GL_LIGHTING calls out of our display lists.

Note: This suggests that, in order to get high throughput with large
numbers of actors, you cannot mix and match lines and surfaces randomly,
but will be much better off grouping actors with lines only and surfaces
only.

Issue 9: The vtkOpenGLActor::Render() calls glDepthMask(), to either set 
or clear it depending on the actor opacity. 

Action: We added Enable/Disable methods to the vtkRenderer that check
the OpenGL state before making the gl* call, in the same way that we
replaced the gl* calls in the vtkOpenGLProperty::Render() method
(Issue #1).

Issue 10: The vtkRenderer::Render() method allocates and destroys 3 
arrays (PropArray, RayCastPropArray, and RenderIntoImagePropArray) each 
frame. 

Action: We took a shortcut and only destroy the arrays if an actor is 
added or removed between frames. In the general case, we'd suggest 
reallocating the arrays only if the number of actors has increased since 
the previous render. 
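
The general-case suggestion (reallocate only on growth) could look
something like this sketch. PropArrayCache and AllocationsOverFrames are
hypothetical names, and a plain int stands in for the prop pointers that
the real arrays hold.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-renderer scratch array that is kept between frames and
// only grows; it is never freed and reallocated just because a frame ended.
class PropArrayCache {
public:
    // Returns storage with room for `count` entries, reallocating only
    // when `count` exceeds what is already held.
    int* Acquire(std::size_t count) {
        if (count > storage_.size()) {
            storage_.resize(count);
            ++allocations_;
        }
        return storage_.data();
    }

    int Allocations() const { return allocations_; }

private:
    std::vector<int> storage_;
    int allocations_ = 0;
};

// Simulates one Acquire() per frame for the given per-frame actor counts
// and returns the number of reallocations that took place.
int AllocationsOverFrames(const std::vector<std::size_t>& actorCounts) {
    PropArrayCache cache;
    for (std::size_t n : actorCounts) cache.Acquire(n);
    return cache.Allocations();
}
```

With a constant actor count, this allocates once for the whole run
instead of once per frame per array.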

Issue 11: The vtkOpenGLPolyDataMapper::RenderPiece() method calls 
MakeCurrent(), which in turn calls glXGetCurrentContext(). This was 
discussed in the vtk-developers list beginning May 28. 

Action: Nothing was done for this one. 

Issue 12: The vtkRenderer::RenderOverlay() method ends up calling 
vtkProp::SafeDownCast() for each actor every frame. This is a bit 
frustrating since, as we understand it, none of our actors contribute 
anything to the RenderOverlay method. 

Action: Nothing has been done yet, but one suggestion is to let the 
application software enable/disable the RenderOverlay method. 
Another approach would be to change how the renderer stores the props.
Right now everything is stored in one big list. It seems it would be
possible to divide this up into multiple lists based on the type of
each prop. For example, when a prop is added to the renderer, the
SafeDownCast call could be made once and the prop put in the appropriate
list. This would also solve Issue 13 below.
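
A sketch of the multiple-list idea, with the type test done once when the
prop is added. The class names here (Prop, Actor3D, OverlayProp,
PropRegistry) are hypothetical, and dynamic_cast stands in for
SafeDownCast; this is not how vtkRenderer is organized today.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical prop hierarchy; the virtual destructor makes the types
// polymorphic so dynamic_cast can be used.
struct Prop { virtual ~Prop() = default; };
struct Actor3D : Prop {};
struct OverlayProp : Prop {};

// Sorts props into per-type lists when they are added, so the per-frame
// render passes can walk the list they need without any casting.
// The registry stores non-owning pointers.
class PropRegistry {
public:
    void AddProp(Prop* p) {
        if (OverlayProp* o = dynamic_cast<OverlayProp*>(p)) {
            overlays_.push_back(o);
        } else if (Actor3D* a = dynamic_cast<Actor3D*>(p)) {
            actors_.push_back(a);
        } else {
            others_.push_back(p);
        }
    }

    std::size_t ActorCount() const { return actors_.size(); }
    std::size_t OverlayCount() const { return overlays_.size(); }
    std::size_t OtherCount() const { return others_.size(); }

private:
    std::vector<Actor3D*> actors_;
    std::vector<OverlayProp*> overlays_;
    std::vector<Prop*> others_;
};
```

In this arrangement an overlay pass over 1000 plain actors would iterate
an empty list, and the cast cost is paid once per prop rather than once
per prop per frame.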

Issue 13: The vtkRenderer::Render() ends up calling 
vtkProp::SafeDownCast() for each actor every frame. 

Action: Nothing has been done yet. 

Issue 14: The vtkFrustumCoverageCuller::Cull() processes the same actor 
geometry every frame. Our actors have very simple geometry, and culling 
may not be of any benefit except when zoomed in very close. 

Action: Nothing has been done yet, but one suggestion is to make culling
optional, either on an actor-by-actor basis or for the whole frame.

Issue 15: Using Quantify to measure CPU time, we have observed that the 
glFlush() calls for the 1000-actor case consume twice as many CPU cycles 
as the 1-actor case (82.1 million vs. 42.3 million). At the bottom of
the glFlush() call hierarchy is the system writev() function. For the
1-actor case, each call to writev() takes an average of 338K cycles,
whereas for the 1000-actor case, each call takes an average of 664K
cycles.

Action: Nothing has been done, other than to verify that the sequences
of gl* calls and display lists are equivalent for both test cases.

John Tourtellott
-------------------------------------------------------
Simmetrix Inc.
1223 Peoples Avenue
Troy, NY  12180
voice:  518-276-2728
fax:    518-276-2944
mailto: johnt at simmetrix.com
-------------------------------------------------------