[vtk-developers] Flying Edges

Fri Sep 11 16:11:17 EDT 2015

FYI- vtkFlyingEdges2D and vtkFlyingEdges3D have been merged into VTK
proper. Thanks to help from Rob, Sujin, Bill L., Utkarsh, Sankhesh and many
others...

We will be presenting a paper at LDAV 2015 <http://www.ldav.org/> in
October. (Ask me for a preview copy of you are interested.)

Benchmarking indicates this is a very fast isocontouring algorithm,
especially if you have multiple cores your machine (without generating
seams or requiring point merging...build VTK with TBB using vtkSMPTools).
So far we've run it on 2048^3 volumes with up to 160 threads, it runs
great. Like any new algorithm I'm sure there are warts and implementation
issues, so I'd enjoy hearing feedback. Maybe someday we'll slip it under
the hood of vtkContourFilter, but I want to make sure it is rock solid
first.

If you are really ambitious, there is room for further improvement in the
implementation including parallelizing the prefix_sum (third pass), and in
the fourth pass (decoupling the topology and geometry generation). I can
tell you more if you are interested.

Best,
W

On Sat, Jan 10, 2015 at 10:04 AM, Will Schroeder <will.schroeder at kitware.com
> wrote:

> Over the holidays I gave myself the gift of designing and implementing a
> new contouring algorithm :-) To my utter delight and amazement I stumbled
> upon a novel approach that is readily parallelizable and yet wicked fast in
> serial mode as well. The algorithm (vtkFlyingEdges2D/vtkFlyingEdges3D) has
> been pushed into gerrit for review if anyone is interested (FlyingEdges
> branch). Consider this a work in progress, we are benchmarking and working
> through parallel implementation details now. We are planning a paper
> submission in the near future.
>
> TLDR Details
> + As many of you know the VTK community is taking aim at seriously
> reworking the system to better support parallel computing. Much of this
> work is being led by Berk Geveci and his team. For example, Berk introduced
> the vtkSMPTools into VTK about a year ago, and we are actively working with
> some big name chip manufacturers, DOE Labs, research organizations, and
> other customers/collaborators to get this ambitious undertaking done.
> Hopefully individuals in this VTK community will also lend their
> significant expertise as well.
> + Under prodding by Berk, and inspired by Patrick O'Leary's "born
> parallel" slogan to rethinking systems and algorithms for parallel
> computing, the motivation was to take one of the most important
> visualization algorithms (isocontouring) and see if we could redesign it
> for emerging parallel approaches.
> + The venerable Marching Cubes algorithm introduced the key idea that the
> topology of an isosurface within a voxel can be captured with an index into
> a simple, finite case table. Yet the implementation of the algorithm can be
> slower than you might expect because, for example, on interior voxels,
> voxel edges can be processed up to four times, and vertices visited eight
> times (ignoring gradient calculations). Also the output mesh is dynamically
> created: a priori the number of output points and triangles is unknown,
> which means that memory resizing (realloc's) are required. Finally, the
> algorithm is often implemented with a "locator" or hash table to prevent
> the creation of duplicate points on shared voxel edges (the so-called point
> merging problem). Not good for parallel computing.
> + Ken Martin's extremely fast Synchronized Templates algorithm introduced
> the notion of a "voxel axes" or "edge triad" which consists of the three
> x-y-z voxel axes located at the voxel origin. This triad is moved through
> the volume in such a way as to intersect each voxel edge only one time
> (with special treatment on the +x+y+z volume boundaries). (An aside: Ken
> wrote this like 15 years ago, it's been the VTK standard ever since.
> Initially we filed a patent application [since abandoned] but that was
> before we realized how stupid it is for an open source company to hold
> patents. Live and learn :-))
> + Flying Edges builds on these concepts and introduces some novel ideas.
> It is a 3-pass algorithm: the first two passes gather information about the
> volume including "intersections" along x-edges, and the number of output
> primitives and points. The final pass generates output into preallocated
> arrays. Some of the key ideas include:
> -- each pass traverses x-rows in the volume completely independently
> meaning that it is readily parallelizable.
> -- the case table is based on the four voxel x-edges (which can be easily
> transformed to and from a vertex-based MC table). Because the cases are
> edge based, the computation of cases is parallel separable.
> -- The novel concept of "computational trimming" is used to avoid massive
> amounts of work. Basically the fact that isosurfaces are topologically
> smooth is combined with the results of the first pass (building
> x-voxel-edge cases) to reason about where the isosurface can exist in the
> volume. This very simple topological reasoning enables the algorithm to
> skip portions of x-rows, entire x-rows, and even x-y slices very rapidly.
> -- Some simple edge metadata is gathered for each x-row volume edge which
> can be used to allocate and partition memory for separate threads to write
> into without contention (i.e., no mutexes or locks). This metadata also
> enables the algorithm to process the voxels along each x-row in such a way
> as to synchronously increment the point ids and triangle ids without
> requiring point merging or hashing. In fact, topology generation (i.e.,
> creating triangles) is completely independent from point/geometry
> generation (i.e., interpolating point coordinates and attributes along
> edges).
> + Early results show a 2-5x speedup over Synchronized Templates (>10x over
> vtkImageMarchingCubes) when running in serial. However, this is very data
> dependent and I suspect compiler dependent since there is a lot of C++
> templating and in-lining going on (make sure to build optimized/release
> mode). Anyway I am very excited because once the parallel loops are
> functioning properly we should see much more performance improvement. The
> scalability may be hampered however because data movement over the memory
> bus may slow things up (computations are relatively modest compared to the
> total data processed). We'll be benchmarking performance over the next
> couple of months to see where we really are and of course we'll update this
> community at that time.
> + The memory overhead of the algorithm is roughly 2-bits per vertex.
> However as implemented now I'm avoiding bit packing and just use a byte
> (unsigned char) per vertex.
> + Other interesting tidbits:
> -- The basic algorithm is extremely simple to implement. Most of the
> complications come from dealing with volume boundaries. It would be
> possible to significantly reduce code complexity by either padding the
> +x+y+z volume boundaries by a layer of voxels, or alternatively not
> processing the outermost layer of voxels.
> -- Because I am lazy, the algorithm reuses the MC case table as enumerated
> in VTK (vtkMarchingCubesTriangleCases.h). However at instantiation the MC
> table is converted to a Flying Edges edge-based case table. Also note that
> the MC table as originally defined is for a left-handed coordinate system
> (see the original paper). So reordering of triangles is required to make
> sure that normals and triangle ordering is consistent.
> -- On our radar is potential vectorization of operation like interpolation
> across voxel edges. As you know some of the big hardware manufacturers are
> really pushing this now and we'd like to learn how to best take advantage
> of this emerging resource.
> -- Probably the biggest downside of this algorithm is that is can produce
> degenerate triangles (i.e., zero-area triangles in 3D, zero-length line
> segments in 2D). The major reason is that when preallocating output memory,
> we do so based on topological evaluation. However, during actual point
> generation in degenerate cases (a degenerate case occurs when one of more
> vertex scalar values == isocontour value) multiple intersection points can
> occur at the vertex producing degeneracies. Since discarding a degenerate
> triangle would mess up the output preallocation we can't just throw away
> degenerate primitives. However I have already thought through a solution to
> this problem but it is based on (spoiler alert) a much more complex case
> table that enumerates 3-values per vertex: (<,==,>) when compared to the
> isocontour value. This is a novel idea as well, mostly the academic
> literature completely ignores the reality of degeneracies. Maybe next
> holiday season we'll slip it into the algorithm.....
>
> Okay I've got to get back to the pointy-haired management stuff. But
> please feel free to provide feedback, or offers of collaboration :-) In
> particular we'd love support, either in funding or brainpower, to do even
> more cool parallel processing stuff. Hoping to hear from you, and I hope
> your coming year is as exciting as mine is looking to be!
>
> Best,
> W
>
> --
> William J. Schroeder, PhD
> Kitware, Inc.
> 28 Corporate Drive
> Clifton Park, NY 12065
> will.schroeder at kitware.com
> http://www.kitware.com
> (518) 881-4902
>

-- 
William J. Schroeder, PhD
Kitware, Inc.
28 Corporate Drive
Clifton Park, NY 12065
will.schroeder at kitware.com
http://www.kitware.com
(518) 881-4902
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://public.kitware.com/pipermail/vtk-developers/attachments/20150911/f0a58ffe/attachment.html>