# ParaView/Line Integral Convolution

## Introduction

 Figure 1 Examples of ParaView's surface LIC in action. Figure 1a Surface LIC showing coherent structures in the magnetic fields of turbulent plasma. Figure 1b Surface LIC applied to a multiblock CFD simulation of a NASA launch vehicle with block bounds colored by composite id. Detail view of figure 11a. Figure 1c Surface LIC applied to a contour of a CFD simulation of air flow in an office.

The line integral convolution (LIC) vector field visualization technique convolves noise with a vector field, producing streaking patterns that follow vector field tangents. The technique was originally developed for 2D image-based data but has since been extended to work on arbitrary surfaces and volumes. ParaView supports LIC on arbitrary surfaces via the Surface LIC plugin. Some examples of ParaView's surface LIC in action are shown in Figure 1. ParaView's implementation has been designed to work with composite data in parallel and includes a number of customizations that facilitate interactive data exploration. This document describes the parallelization, features, and optimizations for interactive data exploration that were released with ParaView 4.1 and VTK 6.1.
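To make the idea of convolving noise along vector field tangents concrete, the following is a minimal CPU sketch of an image LIC in plain Python. It is not ParaView's implementation, which runs in OpenGL shaders. The function names `lic_pixel` and `lic_image`, the box (uniform) convolution kernel, the Euler integrator, and the always-normalized vectors are all assumptions made for illustration.

```python
import math
import random


def lic_pixel(noise, field, x, y, n_steps, step_size):
    """Average the noise along the streamline through pixel (x, y).

    The streamline is traced forward and backward with Euler steps on
    normalized vectors, and a box kernel weights every sample equally.
    """
    height, width = len(noise), len(noise[0])
    total, count = noise[y][x], 1
    for direction in (1.0, -1.0):
        px, py = x + 0.5, y + 0.5  # start at the pixel center
        for _ in range(n_steps):
            vx, vy = field(px, py)
            mag = math.hypot(vx, vy)
            if mag == 0.0:
                break  # stagnation point: nothing to follow
            px += direction * step_size * vx / mag
            py += direction * step_size * vy / mag
            ix, iy = int(math.floor(px)), int(math.floor(py))
            if not (0 <= ix < width and 0 <= iy < height):
                break  # left the image
            total += noise[iy][ix]
            count += 1
    return total / count


def lic_image(noise, field, n_steps=10, step_size=1.0):
    """Apply lic_pixel to every pixel of a 2D noise image (list of rows)."""
    height, width = len(noise), len(noise[0])
    return [[lic_pixel(noise, field, x, y, n_steps, step_size)
             for x in range(width)] for y in range(height)]


# Example: seeded uniform noise convolved with a circular flow.
rng = random.Random(0)
noise = [[rng.random() for _ in range(16)] for _ in range(16)]
image = lic_image(noise, lambda px, py: (-(py - 8.0), px - 8.0), n_steps=5)
```

Because each output pixel is an average of noise values sampled along one streamline, neighboring pixels on the same streamline share most of their samples, which is what produces the streaks.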

## ParaView Surface LIC Plugin

Figure 2 Run time tunable parameters for the surface LIC plugin in ParaView's display Properties panel.

| Group | Parameter | Description |
| --- | --- | --- |
| Integrator | Select Input Vectors | Selects the vector field. |
| Integrator | Number Of Steps | Number of integration steps used in the first LIC pass. When the two pass LIC is enabled, the number of steps used in the second pass is set automatically. |
| Integrator | Step Size | Integration step size given in the original vector field's units. |
| Integrator | Normalize Vectors | Enable vector field normalization during integration. |
| Integrator | Two pass image LIC | Enable the two-pass image LIC algorithm. |
| Scalar Color Shaders | Color Mode | Select the shader that is used to combine lit and pseudocolored surface geometry with the LIC. (Blend, Multiply) |
| Scalar Color Shaders | LIC Intensity | Sets the intensity of the LIC pattern when using the Blend shader. |
| Scalar Color Shaders | Map Mode Bias | Sets the additive term used to brighten or darken the final colors when using the Multiply shader. |
| Contrast Enhancement | Enhance Contrast | Enable contrast enhancement shaders. (Off, LIC Only, LIC and Color, Color Only) |
| Contrast Enhancement | Low LIC Contrast Enhancement Factor | Adjusts the minimum intensity value realized in the output of the image LIC contrast enhancement. This can be used to make the dark portions of LIC streaks darker when more contrast is needed. |
| Contrast Enhancement | High LIC Contrast Enhancement Factor | Adjusts the maximum intensity value realized in the output of the image LIC contrast enhancement. This can be used to make light portions of LIC streaks lighter when more contrast is needed or when pseudocoloring is dark. |
| Contrast Enhancement | Low Color Contrast Enhancement Factor | Adjusts the minimum lightness value realized in the output of the surface LIC painter color contrast enhancement. Can be used to darken low intensity colors when more contrast is needed. |
| Contrast Enhancement | High Color Contrast Enhancement Factor | Adjusts the maximum lightness value realized in the output of the surface LIC painter color contrast enhancement. Can be used to brighten high intensity colors when pseudocoloring is dark. |
| Contrast Enhancement | AntiAlias | Sets the number of times the image LIC antialiasing stage is applied. 0 disables the antialiasing stage. |
| Fragment masking | Mask On Surface | Use the magnitude of the surface projected vectors for the masking test. If the vector field has an out-of-surface component, disabling Mask On Surface will result in masked fragments matching the pseudocoloring. |
| Fragment masking | Mask Threshold | Vector magnitude below which fragments are masked. If set less than zero, masking is disabled. |
| Fragment masking | Mask Intensity | The fraction of Mask Color to blend with lit and scalar colored surface geometry in the place of the LIC where the threshold criterion is satisfied. |
| Fragment masking | Mask Color | An RGB tuple defining the color to use when masking fragments. |
| Noise texture generator | Noise Type | Select the noise distribution or type. (uniform, Gaussian, Perlin) |
| Noise texture generator | Noise Texture Size | Set the dimension of one side of the square noise texture. For example, a setting of 128 results in a ${\displaystyle 128^{2}}$ pixel noise texture. Large values may negatively impact performance. |
| Noise texture generator | Noise Grain Size | Set the dimension of one side of a texture noise element. For example, if set to 2 then each noise element takes up ${\displaystyle 2^{2}}$ pixels in the noise texture. |
| Noise texture generator | Min Noise Value | Set the minimum gray scale intensity value in the generated noise. |
| Noise texture generator | Max Noise Value | Set the maximum gray scale intensity value in the generated noise. |
| Noise texture generator | Number Of Noise Levels | Set the number of discrete gray scale intensity values in the generated noise. |
| Noise texture generator | Impulse Noise Probability | Set the probability that any given noise element in the generated texture will be filled in by the noise generator. A setting of 1 results in all elements filled, while settings less than 1 produce impulse noise where elements not filled take on a background color. |
| Noise texture generator | Impulse Noise Background Value | Set the gray scale intensity to be used when a noise element is not filled when generating impulse noise. |
| Noise texture generator | Noise Generator Seed | A seed value for the random number generator. |
| Parallelization | Composite Strategy | Select the compositing strategy used for parallel operations (INPLACE, DISJOINT, BALANCED, AUTO). The default, AUTO, will select a strategy based on the input data and view parameters. |
| Interactivity | Use LIC For LOD | When enabled, LIC is computed on LOD during interaction. This can negatively impact performance. |

## OpenGL requirements

ParaView implements surface LIC in OpenGL using a framebuffer ping-pong technique from early GPGPU computing. The required OpenGL extensions are listed in the following table. These requirements are satisfied for most OpenGL 2.1 implementations and all OpenGL 3.0 and newer implementations. The surface LIC painter may be used without graphics hardware by using the llvmpipe OSMesa state tracker found in Mesa3D OpenGL v9.2 and newer. See ParaView and Mesa3D for information on configuring ParaView for use with OSMesa. The OpenGL support on a given system may be checked using either ParaView's or VTK's regression tests. ParaView's LIC tests can be executed by issuing the command "ctest -R SurfaceLIC -L PARAVIEW --verbose" while VTK's LIC tests can be executed by issuing the command "ctest -R LIC --verbose" from their respective build directories. The output of these tests will indicate whether or not the system supports the required extensions.

OpenGL extensions used by the surface LIC painter:

GL_ARB_vertex_buffer_object, GL_ARB_pixel_buffer_object, GL_ARB_depth_buffer_float, GL_ARB_multitexture, GL_EXT_texture3D, GL_ARB_texture_non_power_of_two, GL_ARB_texture_float, GL_ARB_depth_texture, GL_ARB_draw_buffers, GL_EXT_framebuffer_object, GL_EXT_framebuffer_blit, GL_ARB_shading_language_100, GL_ARB_shader_objects, GL_ARB_vertex_shader, GL_ARB_fragment_shader

## Internal Pipeline

 Figure 3 Pipeline internals. Figure 3a (left) Surface LIC pipeline. Figure 3b (right) Internal image LIC pipeline.

The surface LIC algorithm projects vectors defined on an arbitrary surface onto the surface and then from physical space into screen space, where an image LIC is computed. During the projection phase, lit pseudocolored surface geometry is rendered into a texture for later use. When running in parallel, moving the vector field into screen space necessitates a compositing step that makes the screen space vector field consistent in regions of interprocess screen space overlap, and the addition of guard pixel halos that ensure consistent results at process boundaries. Once the image space LIC has been computed, it is combined with the previously rendered lit pseudocolored surface geometry and copied into the back buffer with a depth test. A schematic of the algorithm is presented in figure 3a. Optional processing stages are shaded gray, cached textures are represented by red parallelograms, and green double arrows indicate inter-process communication that occurs only during parallel operation. The right half of the figure (figure 3b) shows a break-out diagram detailing the processing stages used in our image LIC algorithm.

## Noise generator

 Figure 4. LIC of ion velocity colored by velocity magnitude. The difference between the left and right panels is achieved by varying the impulse probability and noise grain parameters of ParaView's noise generator. Fragment masking was applied where ${\displaystyle |V_{i}|\leq 10^{-5}}$ to mask unconvolved fragments in regions of flow stagnation.

For a given vector dataset, the streaking patterns produced by the LIC can vary widely based on the properties of the noise convolved, screen resolution, and scene or view parameters. The properties of the noise play an important role in determining the characteristics of the streaking patterns and can be varied easily, giving us a simple means of controlling the streaking patterns realized. For example, contrast in the streaking patterns is strongly influenced by the selection of noise distribution and by its minimum, maximum, and number of noise levels, while the width of the streaks produced is strongly tied to the noise grain size. Additionally, the number of light and dark pixels in the result can be controlled by varying the impulse probability along with the choice of background intensity. Figure 4 shows an example of how varying noise texture parameters can result in markedly different streaking patterns. ParaView's noise texture generator defines the following 9 run-time tunable degrees of freedom, which together can be used to modify streaking pattern, dynamic range, and contrast in the resulting LIC.

Noise generator parameters

| Parameter | Description |
| --- | --- |
| Noise type | Select a statistical distribution or type of noise to be generated. By default Gaussian noise is generated. However, uniformly distributed noise or Perlin noise can also be generated. The choice of noise distribution impacts the contrast and dynamic range in the resulting LIC. |
| Texture size | This parameter controls the size of the square noise texture in each direction. In the case of Perlin noise the texture size is adjusted to the nearest power of 2. Typically sizes in the range of 128 to 512 work well. Note that very large sizes can reduce GPU performance. |
| Grain size | Select the number of pixels in each direction that each generated noise element fills in the resulting texture. For Perlin noise this sets the size of the largest scale, and must be a power of 2. Grain size can be used to control the LIC streak size. |
| Min value | This parameter sets the lowest gray scale value realizable in the generated noise texture. This parameter can range between 0 and 1 and the default value is 0. The minimum value in the LIC will be greater than or equal to the minimum noise value. Therefore, this influences the minimum and standard deviation of LIC intensities. |
| Max value | This parameter sets the highest gray scale value realizable in the generated noise texture. This parameter can range between 0 and 1 and the default value is 0.8. The maximum value in the LIC will be less than or equal to the maximum noise value. Therefore, this influences the maximum and standard deviation of LIC intensities. |
| Number of levels | Set the number of realizable gray scale values. This parameter can range from 2 to 1024 and the default value is 1024. This parameter influences the contrast and overall smoothness in the LIC. |
| Impulse probability | This parameter controls how likely a given element is to be assigned a value. When set to 1 all elements are filled. When set less than 1 a fraction of the texture's elements are filled with generated noise. Elements that are not filled take on a background color value. The default impulse probability is 1. |
| Background color | The gray scale value to use for untouched pixels when the impulse probability parameter is set less than 1. The default background value is 0. |
| RNG seed | Modify the seed value fed to the random number generator. This can be used to generate similar noise distributions. |

Figure 5 Varying properties of the noise input to the image LIC can be an effective method for controlling streaking characteristics in the resulting LIC. Gaussian, uniform, Perlin, and impulse noise are shown.
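As a rough illustration of how these degrees of freedom interact, here is a small Python sketch of a noise texture generator. It covers uniformly distributed noise only (Gaussian and Perlin noise are omitted), and the function name `make_noise_texture`, its argument names, and the defaults for texture size, grain size, and seed are hypothetical rather than taken from ParaView. The defaults for the min, max, levels, impulse probability, and background follow the values stated above.

```python
import random


def make_noise_texture(size=128, grain=2, min_value=0.0, max_value=0.8,
                       levels=1024, impulse_prob=1.0, background=0.0, seed=1):
    """Return a size x size gray scale texture as a list of rows.

    Each noise element covers grain x grain pixels. An element is filled
    with probability impulse_prob; unfilled elements take the background
    value. Filled elements take one of `levels` evenly spaced values
    between min_value and max_value (levels must be at least 2).
    """
    rng = random.Random(seed)
    n_elems = (size + grain - 1) // grain  # elements per side, rounded up
    elems = []
    for _ in range(n_elems):
        row = []
        for _ in range(n_elems):
            if rng.random() < impulse_prob:
                level = rng.randrange(levels)
                value = min_value + (max_value - min_value) * level / (levels - 1)
            else:
                value = background
            row.append(value)
        elems.append(row)
    # Expand each element to a grain x grain block of pixels.
    return [[elems[y // grain][x // grain] for x in range(size)]
            for y in range(size)]


# Sparse, coarse impulse noise: about 10% of the 4x4 elements are filled.
sparse = make_noise_texture(size=64, grain=4, impulse_prob=0.1, seed=7)
```

In this sketch a larger `grain` yields wider streaks after convolution, and lowering `impulse_prob` shifts the balance of the texture toward the background value.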

## Two pass image LIC

Figure 6 Intermediate results from each stage in the two pass image LIC with contrast enhancement and antialiasing enabled. The result, shown in the CE2 panel, has high contrast LIC streaks with a good balance of light and dark pixels, making it ideal for combination with pseudocoloring of scalar data.

ParaView implements a two-pass LIC computation. Enabling the two-pass computation activates an algorithm sub-pipeline that applies a number of image processing filters, which can greatly improve the visibility of the LIC's streaking patterns. In the first pass a traditional LIC is computed. Image processing filters, consisting of an optional contrast enhancement (CE) filter and a Laplace edge-enhance (EE) filter, are then applied, strengthening the streaking patterns. In the second pass the LIC is recomputed using the output of the image processing filters in the place of a noise texture. Integration in the second pass uses ${\displaystyle 1/2}$ of the number of integration steps to account for the fact that the filtered first pass output is relatively smooth. The image processing and LIC stages that comprise the two pass algorithm are shown in Figure 3b. Figure 6 shows the progression as the streaking patterns form and are strengthened by each successive stage in the two-pass image LIC computation.

Figure 7 Scalar color shaders. Blending (left column) lit colored geometry with the LIC compared to mapping (right column) them onto the LIC, with (bottom row) and without (top row) scalar colors mapped onto the lit surface rendering.

### Mapping colors onto the LIC

The mapping fragment shader is described by the following equation:

${\displaystyle \left.c_{ij}=(L_{ij}+f)*S_{ij}\right.}$

where the indices ${\displaystyle i,j}$ identify a specific fragment, ${\displaystyle c}$ is the final RGB color, ${\displaystyle L}$ is the LIC gray scale intensity, ${\displaystyle S}$ is the scalar RGB color, and ${\displaystyle f}$ is a biasing parameter, typically 0, that may be used for fine tuning. When ${\displaystyle f=0}$, the typical case, scalar colors are transferred unchanged to the final image where the LIC is 1 and are scaled linearly where the LIC gray scale intensity is less than 1, down to black where the LIC is 0. The bias parameter ${\displaystyle f}$ may be set to small positive or negative values between -1 and 1 to increase or decrease LIC values uniformly, resulting in brighter or darker images. When ${\displaystyle f\neq 0}$, final fragment colors, ${\displaystyle c}$, are clamped such that ${\displaystyle 0\leq c\leq 1}$.

With the mapping approach the distribution of intensity values in the LIC directly affects the accuracy and intensity with which scalar colors and lighting effects are transferred to the final rendering. With this shader the RGB values produced will be less than or equal to the maximum intensity values in the image LIC. The greater the number of pixels with an intensity close to 1, the more accurately and brightly the lit pseudocolored surface geometry is rendered. Of course, the need for many high intensity pixels must be balanced with the need for a sufficient number of highly contrasting pixels, where the value is closer to 0, in order to accurately represent the LIC pattern itself. ParaView's noise generator and contrast enhancement stages provide a number of features that aid in interactively achieving this balance.
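The mapping equation can be written out per fragment as in the short Python sketch below. The names `clamp01` and `map_color` are hypothetical, and the actual shader runs in GLSL; this only restates ${\displaystyle c=(L+f)*S}$ with the clamp to the range 0 to 1.

```python
def clamp01(x):
    """Clamp a value to the range [0, 1]."""
    return max(0.0, min(1.0, x))


def map_color(lic, scalar_rgb, bias=0.0):
    """Mapping shader for one fragment: c = (L + f) * S, clamped to [0, 1].

    lic        -- LIC gray scale intensity L
    scalar_rgb -- scalar RGB color S as a 3-tuple
    bias       -- biasing parameter f
    """
    return tuple(clamp01((lic + bias) * s) for s in scalar_rgb)


full = map_color(1.0, (0.2, 0.4, 0.6))    # LIC of 1 passes colors through
half = map_color(0.5, (1.0, 0.5, 0.0))    # LIC of 0.5 halves each channel
black = map_color(0.0, (0.2, 0.4, 0.6))   # LIC of 0 gives black
```

Since the output can never exceed the scalar color when the bias is 0, a LIC with few values near 1 necessarily darkens the rendering, which is the motivation for the contrast enhancement stages.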

### Blending colors with the LIC

The blending fragment shader is described by the following equation:

${\displaystyle \left.c_{ij}=L_{ij}*I+S_{ij}*(1-I)\right.}$

where the indices ${\displaystyle i,j}$ identify a specific fragment, ${\displaystyle c}$ is the final RGB color, ${\displaystyle L}$ is the LIC gray scale value, ${\displaystyle S}$ is the scalar RGB color, and ${\displaystyle I}$ is a constant ranging from 0 to 1, with a default of 0.8. Decreasing ${\displaystyle I}$ to obtain brighter colors has the effect of diminishing the intensity of the LIC, while increasing ${\displaystyle I}$ to obtain a stronger LIC has the effect of washing out colors.

In some cases, when colors are bright, the LIC is difficult to see, and attaining a usable result requires sacrificing both visibility of the LIC and brightness of the pseudocoloring. The blending shader, like the mapping shader, benefits from high contrast streaking patterns and the right balance of high intensity pixels in the LIC in order to preserve streaking patterns and pseudocoloring in the final rendering. Note that although it inherently decreases the visibility of features in both the scalar coloring and the image LIC, the blending approach can be especially useful with curved surfaces and pronounced lighting effects, and also when the scalar color map is very intense.
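For comparison with the mapping shader, the blending equation ${\displaystyle c=L*I+S*(1-I)}$ can be sketched per fragment in Python as follows. The function name `blend_color` is hypothetical; the default intensity of 0.8 is the one stated above.

```python
def blend_color(lic, scalar_rgb, intensity=0.8):
    """Blending shader for one fragment: c = L * I + S * (1 - I).

    lic        -- LIC gray scale value L
    scalar_rgb -- scalar RGB color S as a 3-tuple
    intensity  -- LIC intensity I in [0, 1]
    """
    return tuple(lic * intensity + s * (1.0 - intensity) for s in scalar_rgb)


only_color = blend_color(0.5, (1.0, 0.0, 0.5), intensity=0.0)  # scalar color only
only_lic = blend_color(0.5, (1.0, 0.0, 0.5), intensity=1.0)    # gray LIC only
default = blend_color(0.5, (1.0, 0.0, 0.5))                    # mostly LIC
```

The two extremes show the trade-off described above: at an intensity of 0 the LIC disappears, and at 1 the scalar colors do.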

## Contrast enhancement

The convolution process inherently tends to decrease both the contrast and dynamic range in the LIC, narrowing and concentrating the distribution of resulting intensity values around a mid tone. The use of Gaussian noise during LIC computation produces relatively smooth and pixelation-free streaking but tends to worsen this narrowing effect, since the input intensities are already highly concentrated about a central intensity value. An example of the narrowing and concentration resulting from the convolution can be seen in the top row of figure 8a, where the output of the first LIC pass is shown with its intensity distribution on the right. This narrowing and concentration of intensities in the LIC can result in an overall dark and dull image, making the combination with lit pseudocolored surface geometry difficult. An example of a dark and dull result can be seen in the left panel of figure 9. In order to counteract the narrowing and darkening effects of the convolution, three optional contrast enhancement (CE) stages have been added, one after each LIC stage and one after the combination of scalar colors and LIC. The new stages increase both dynamic range and contrast, improve the streaking patterns, and facilitate combination of the LIC with pre-rendered lit pseudocolored surface geometry.

Figure 8 shows the input and output images of each of the contrast enhancement stages alongside their respective intensity distributions. The intensity distributions for input images include vertical lines indicating the min and max values. The contrast enhancement algorithm works by stretching the input distributions so that in the output distribution the min is 0 and the max is 1, increasing the contrast and dynamic range in the final rendered image. An example of how the CE stages can be used to improve the efficacy of the scalar color shader is shown in the right panel of figure 9.

 Figure 8 CE stage examples. Vertical lines plotted in the input distribution show min and max values. The CE stages map the input distributions onto the range 0 to 1. Use of the CE adjustment factors results in an accumulation of output values that are exactly 0 or 1. Figure 8a Input and output of first image LIC CE stage with histogram. Figure 8b Input and output of the second image LIC CE stage. Figure 8c Input and output of the surface LIC painter color CE stage.

### Image LIC CE stages

The image LIC CE stages are implemented by histogram stretching of the gray scale intensities as follows:

${\displaystyle \left.c_{ij}={\frac {c_{ij}-m}{M-m}}\right.}$

where the indices ${\displaystyle i,j}$ identify a specific fragment, ${\displaystyle c}$ is the fragment's gray scale intensity, ${\displaystyle m}$ is the intensity to map to 0, and ${\displaystyle M}$ is the intensity to map to 1. In the first CE stage, which is applied on the input of the EE stage, ${\displaystyle m}$ and ${\displaystyle M}$ are always set to the current minimum and maximum gray scale color of all fragments. However, in the final CE stage ${\displaystyle m}$ and ${\displaystyle M}$ may be individually adjusted using the following set of equations:

${\displaystyle \left.m=\min(C)+F_{m}*\left(\max(C)-\min(C)\right)\right.}$

${\displaystyle \left.M=\max(C)-F_{M}*\left(\max(C)-\min(C)\right)\right.}$

where ${\displaystyle C=\{c_{00},c_{01},...,c_{nm}\}}$ is the set of gray scale intensities in the input image and ${\displaystyle F_{m}}$ and ${\displaystyle F_{M}}$ are adjustment factors that take on values between 0 and 1. Setting these factors to 0 maps the current minimum and maximum intensity onto 0 and 1 respectively, stretching the values in between to fill the entire range and effectively increasing the dynamic range. Increasing ${\displaystyle F_{m}}$ raises ${\displaystyle m}$ above the input minimum, so the lower portion of the input's intensity distribution is mapped onto values less than 0. This increases the number of low intensity values in the output, which darkens the darker parts of LIC streaks. Increasing ${\displaystyle F_{M}}$ lowers ${\displaystyle M}$ below the input maximum, so the upper portion of the input intensity distribution is mapped onto values greater than 1. This increases the number of high intensity values in the output, which lightens the lighter parts of the LIC streaks.

The adjustment factors provide a means to increase the contrast of LIC streaks and improve the efficacy of the scalar color shaders, which rely on a good balance between light and dark LIC intensities in order to successfully represent both lit scalar pseudocolored surface geometry and LIC in the same rendering. They become especially useful when the minimum and/or maximum intensity values in the CE stage input are unrepresentative of the LIC as a whole. For example, this can occur when, near the dataset boundary or in regions of stagnant flow, input noise values are convolved relatively less than in the majority of the rest of the LIC. These relatively unconvolved fragments have unrepresentative low and high intensity values, reducing the efficacy of the CE stage. The adjustment factors can be used to correct this so that more representative values are mapped to 0 and 1. Figure 8b shows an example where the adjustment factors have been used. The effects of the adjustment factors can be seen in the output distribution's accumulation of values that are exactly 0 and 1, which is the result of clamping values mapped above 1 and below 0.
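The histogram stretching and the role of the adjustment factors can be checked with a few lines of Python. This is a sketch on a flat list of gray scale intensities, not the shader itself; the function name `stretch_contrast` and its argument names are hypothetical, and returning a flat input unchanged is an assumption made here to avoid dividing by zero.

```python
def stretch_contrast(values, low_factor=0.0, high_factor=0.0):
    """Histogram-stretch gray scale intensities onto [0, 1].

    m = min(C) + F_m * (max(C) - min(C))
    M = max(C) - F_M * (max(C) - min(C))
    c = (c - m) / (M - m), clamped to [0, 1]
    """
    lo, hi = min(values), max(values)
    span = hi - lo
    m = lo + low_factor * span
    M = hi - high_factor * span
    if M <= m:
        # Assumption: a degenerate range is passed through unchanged.
        return list(values)
    return [max(0.0, min(1.0, (v - m) / (M - m))) for v in values]


plain = stretch_contrast([0.25, 0.5, 0.75])
adjusted = stretch_contrast([0.0, 0.25, 0.5, 0.75, 1.0],
                            low_factor=0.25, high_factor=0.25)
```

With both factors at 0.25 the second call maps 0.25 to 0 and 0.75 to 1, so the two outlying inputs (0.0 and 1.0) are clamped. This is the accumulation of values at exactly 0 and 1 described above.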

### Antialias stage

Occasionally the use of the image LIC CE stages results in jagged or pixelated streaking patterns in the output of the second LIC pass. This pixelation is a result of oversaturation occurring during the processing of sharp intensity transitions by the EE stage. The level of pixelation that is introduced depends on a number of factors such as the vector data, properties of the noise texture, number of integration steps taken, and the min and max CE factors used. Pixelation can be reduced by increasing the number of integration steps or by enabling the optional anti-aliasing (AA) stage. The AA stage, when enabled, is applied to the input of the final LIC CE stage. By applying the AA stage before the final CE stage, pixelation can be removed while still allowing the final CE stage to boost contrast in preparation for the combination of the LIC and lit pseudocolored surface geometry by the scalar color shaders. This helps to ensure bright scalar colors in the final image. An example showing the use of the AA stage is shown in figure 6.

### Painter CE stage

After the combination of lit pseudocolored surface geometry and LIC, an optional color contrast enhancement (CCE) stage may be applied. This may be used to increase the contrast in the LIC's streaking patterns and brighten pseudocolors. The CCE stage is implemented using histogram stretching on the fragments' lightness in the HSL color space:

${\displaystyle \left.L_{ij}={\frac {L_{ij}-m}{M-m}}\right.}$

where the indices ${\displaystyle i,j}$ identify a specific fragment, ${\displaystyle L}$ is the fragment's lightness in HSL space, ${\displaystyle m}$ is the lightness to map to 0, and ${\displaystyle M}$ is the lightness to map to 1. ${\displaystyle m}$ and ${\displaystyle M}$ take on the minimum and maximum lightness over all fragments by default but may be individually adjusted by the following set of equations:

${\displaystyle \left.m=\min(L)+F_{m}*\left(\max(L)-\min(L)\right)\right.}$

${\displaystyle \left.M=\max(L)-F_{M}*\left(\max(L)-\min(L)\right)\right.}$

where ${\displaystyle L}$ are the fragment lightness values and ${\displaystyle F_{m}}$ and ${\displaystyle F_{M}}$ are the adjustment factors that take on values between 0 and 1. When ${\displaystyle F_{m}}$ and ${\displaystyle F_{M}}$ are 0, the current minimum and maximum lightness values found in the stage's input are used. Increasing ${\displaystyle F_{m}}$ raises ${\displaystyle m}$ above the input minimum, so the lower tail of the distribution is mapped onto values less than 0, increasing the number of dark colors in the output and darkening the darker colors in the image. Increasing ${\displaystyle F_{M}}$ lowers ${\displaystyle M}$ below the input maximum, so the distribution's upper tail is mapped onto values greater than 1, increasing the number of light colors in the output and intensifying brighter colors. Because the lightness channel is clamped to the range 0 to 1, increasing the adjustment factors too much leads to oversaturation in the resulting image. Figure 9, which compares the result with (right) and without (left) the CE stage, shows an example of the improvement that may be attained using the CCE stage.
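A minimal Python sketch of stretching lightness in HSL space is given below, using the standard library's `colorsys` module (which orders the components as hue, lightness, saturation). The function name `enhance_color_contrast` is hypothetical, and passing a flat image through unchanged is an assumption made here; ParaView performs this step in a shader.

```python
import colorsys


def enhance_color_contrast(pixels, low_factor=0.0, high_factor=0.0):
    """Stretch the HSL lightness of a list of RGB tuples onto [0, 1].

    Hue and saturation are left untouched; only lightness is remapped
    with L = (L - m) / (M - m) and then clamped to [0, 1].
    """
    hls = [colorsys.rgb_to_hls(r, g, b) for r, g, b in pixels]
    lightness = [l for _, l, _ in hls]
    lo, hi = min(lightness), max(lightness)
    span = hi - lo
    m = lo + low_factor * span
    M = hi - high_factor * span
    if M <= m:
        # Assumption: a degenerate lightness range is passed through.
        return list(pixels)
    out = []
    for h, l, s in hls:
        stretched = max(0.0, min(1.0, (l - m) / (M - m)))
        out.append(colorsys.hls_to_rgb(h, stretched, s))
    return out


grays = [(0.25, 0.25, 0.25), (0.5, 0.5, 0.5), (0.75, 0.75, 0.75)]
stretched = enhance_color_contrast(grays)
```

Working on lightness alone is what lets the stage raise contrast without shifting the hues chosen by the color map.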

 Figure 9. Improving the transfer of scalar coloring via CE stages. Visualization of surface LIC of magnetic field colored by magnetic field magnitude. Figure 9a (left) Without CE stages. Figure 9b (right) With CE and CCE stages enabled.

## Fragment masking

Masking is a technique whereby a specialized shader handles combining LIC fragments with lit pseudocolored surface geometry where the vector magnitude is below a user-provided threshold. This provides control over how fragments near regions of stagnant flow are handled. When integrating without normalization, the convolution does not smooth out noise in these regions as much as it does where the flow is strong, and these relatively unconvolved noise values can become outliers in the LIC intensity distribution, reducing the efficacy of the contrast enhancement stage and potentially disrupting the visualization. When masking is enabled these unconvolved noise values are discarded. In their place lit pseudocolored geometry is blended with a masking color, resulting in a visually harmonious match of the pseudocolored LIC intensity across the entire surface.

Fragments are masked according to the following equation:

${\displaystyle \left.c_{ij}=M*I+S_{ij}*(1-I)\right.}$

where the indices ${\displaystyle i,j}$ identify a specific fragment, ${\displaystyle c}$ is final RGB color, ${\displaystyle M}$ is the RGB mask color, ${\displaystyle S}$ is the scalar RGB color, and ${\displaystyle I}$ is the mask color intensity. This allows one control over the masking process so that:

• by setting the mask threshold less than the smallest vector magnitude unconvolved noise is rendered directly.
• by setting a unique mask color and mask intensity greater than 0, masked fragments are highlighted.
• by setting mask intensity to 0, masked fragments are replaced by lit pseudocolored surface geometry at their full intensity.
• by setting mask intensity greater than 0, masked fragments are blended with a masking color harmoniously matching the intensity of the nearby LIC.

Figure 4 shows an example of fragment masking where fragments with ${\displaystyle |V|\leq 10^{-5}}$ are blended harmoniously with the surrounding LIC. Without fragment masking, unconvolved noise in stagnant regions of the flow disrupted the visualization. Figure 10a shows an example where masking was not necessary because the flow around stagnant regions varied smoothly. In figures 1a and 1b masking is used with the mask intensity set to 0 in order to display the lit surface of the launch vehicle at full intensity where vector data does not exist.
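The masking test and the masking equation ${\displaystyle c=M*I+S*(1-I)}$ can be sketched per fragment as follows. The function name `shade_fragment` and its arguments are hypothetical, and the strict "less than" comparison against the threshold is an assumption, since the text above describes the test both as "below" and as "less than or equal to".

```python
def shade_fragment(vector_mag, lic_rgb, scalar_rgb,
                   mask_threshold, mask_color, mask_intensity):
    """Return the masked color where the vector magnitude is below the
    threshold, otherwise the already-combined LIC color.

    A negative threshold disables masking, because a magnitude is never
    negative.
    """
    if mask_threshold >= 0.0 and vector_mag < mask_threshold:
        # c = M * I + S * (1 - I)
        return tuple(m * mask_intensity + s * (1.0 - mask_intensity)
                     for m, s in zip(mask_color, scalar_rgb))
    return tuple(lic_rgb)


gray = (0.5, 0.5, 0.5)
# Mask intensity 0: stagnant fragment shows the lit scalar color as-is.
stagnant = shade_fragment(0.0, (0.3, 0.3, 0.3), (1.0, 0.5, 0.0), 1e-5, gray, 0.0)
# Above the threshold: the LIC color is kept.
flowing = shade_fragment(2.0, (0.3, 0.3, 0.3), (1.0, 0.5, 0.0), 1e-5, gray, 0.0)
```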

## Integrator Normalization

 Figure 10 Vector normalization during integration, shown on the right, accurately captures the tangent field but loses information about the relative flow strength as can be seen by comparing to the figure on the left which was generated with normalization disabled.

Normalizing vectors during integration is a trick that can be used to simplify integrator configuration and give the LIC a uniformly smooth look. By using normalized vector field values, the convolution occurs over the same integrated arclength for all pixels in the image. This gives the result a smooth and uniform look and makes it possible to provide reasonable default values for step size and number of steps to the integrator independent of the input vector field. The resulting visualization accurately shows the tangent field; however, perceptual cues indicating variation in the relative strength of the flow are lost, which can make weak, insignificant features prominent and strong, dominant features less so. For example, figure 10 shows a flow where integrator normalization results in the visual emphasis of insignificant features in a stagnant part of the flow. In this case visualizing the tangent field led to much confusion and debate during the analysis of the dataset. Disabling integrator normalization resulted in an accurate visualization of flow features and resolved the confusion. Because visualizing the tangent field using integrator normalization generally produces good results, and because integrator normalization significantly simplifies algorithm configuration, it is the default in ParaView. However, when visualizing flows with large variations in flow speed it can be useful to disable integrator normalization in order to get a representative visualization of the flow.
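The claim that normalization makes the convolution cover the same arclength everywhere can be illustrated with a small Python sketch. The function `streamline_length` is hypothetical and uses simple Euler steps, which may differ from the integrator ParaView uses.

```python
import math


def streamline_length(field, start, n_steps, step_size, normalize):
    """Arclength traced by n_steps Euler steps from start through field.

    field is a callable (x, y) -> (vx, vy). With normalize=True each
    step has length step_size; otherwise it has length step_size * |v|.
    """
    x, y = start
    length = 0.0
    for _ in range(n_steps):
        vx, vy = field(x, y)
        mag = math.hypot(vx, vy)
        if mag == 0.0:
            break  # stagnation point
        if normalize:
            vx, vy = vx / mag, vy / mag
        dx, dy = step_size * vx, step_size * vy
        x, y = x + dx, y + dy
        length += math.hypot(dx, dy)
    return length


def slow(x, y):
    return (0.25, 0.0)   # weak uniform flow


def fast(x, y):
    return (4.0, 0.0)    # strong uniform flow


same_a = streamline_length(slow, (0.0, 0.0), 10, 0.5, normalize=True)
same_b = streamline_length(fast, (0.0, 0.0), 10, 0.5, normalize=True)
short = streamline_length(slow, (0.0, 0.0), 10, 0.5, normalize=False)
long_ = streamline_length(fast, (0.0, 0.0), 10, 0.5, normalize=False)
```

With normalization both flows are convolved over an arclength of 5, so they streak identically. Without it the weak flow covers 1.25 and the strong flow 20, so weak regions stay close to the input noise, which is the perceptual cue for flow strength described above.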

## Optimizations for interactivity

Given the complexity of the surface LIC pipeline, the computational expense of computing the LIC itself, and the large number of shader and noise generator parameters available, it is important to render quickly and efficiently in order to deliver interactive performance as parameters are adjusted. Large differences in the run time of the various stages provide the potential for huge speedups during interaction when the more expensive stages can be skipped. For instance, the vector projection and image LIC stages typically split the majority of the rendering time about equally, with the remaining stages running orders of magnitude faster. When either of these more expensive stages can be skipped during interaction, rendering performance is dramatically improved. To make this possible the output of each shader stage is cached and parameters are grouped according to the shader stage that they affect. See figure 3 for the details. As a user interacts with the visualization, cached results are re-used whenever possible so that only the stage affected by the interaction, and the stages downstream from it, are re-executed, drastically speeding up interactive exploration.
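The caching idea can be sketched in Python as a chain of stages in which changing a parameter invalidates only its own stage and everything downstream. The class `StageCache` and the three-stage linear chain are hypothetical simplifications; the pipeline in figure 3 is not strictly linear, and the stage functions below are placeholders for the real projection, LIC, and color work.

```python
class StageCache:
    """Run an ordered chain of stages, re-executing only what is stale."""

    def __init__(self, stages):
        # stages: ordered list of (name, function) pairs; each function
        # takes the upstream result and returns this stage's result.
        self.stages = stages
        self.results = {}
        self.dirty_from = 0  # index of the first stage needing execution
        self.runs = {name: 0 for name, _ in stages}

    def invalidate(self, name):
        """Mark a stage, and implicitly all stages after it, as stale."""
        index = [n for n, _ in self.stages].index(name)
        self.dirty_from = min(self.dirty_from, index)

    def render(self, source):
        """Return the final result, reusing cached upstream results."""
        if self.dirty_from == 0:
            value = source
        else:
            value = self.results[self.stages[self.dirty_from - 1][0]]
        for name, func in self.stages[self.dirty_from:]:
            value = func(value)
            self.results[name] = value
            self.runs[name] += 1
        self.dirty_from = len(self.stages)
        return self.results[self.stages[-1][0]]


params = {"bias": 0.0}
cache = StageCache([
    ("project", lambda x: x * 2.0),          # stands in for vector projection
    ("lic", lambda x: x + 1.0),              # stands in for the image LIC
    ("color", lambda x: x + params["bias"]), # stands in for the color shader
])
first = cache.render(1.0)
params["bias"] = 0.5
cache.invalidate("color")   # only the cheap color stage re-executes
second = cache.render(1.0)
```

After the second render, `cache.runs` shows that the two expensive placeholder stages ran once each while the color stage ran twice.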

## Parallelization

 Figure 11 Surface LIC computed on a composite dataset with 8 composite nodes in parallel using 4 processes. Left: the result is shown with composite node bounds colored by composite id. Center: The input or source screen space domain decomposition colored by process id is overlaid with surface geometry. Right: In-place disjoint compositor's target screen space domain decomposition colored by process id is overlaid on the output of the image LIC stage.

ParaView's data parallel pipeline allows us to handle datasets larger than can fit on a single compute node and provides a means for achieving faster rendering times on very large datasets. However, load balancing decisions are made based on the data the reader reports as available, and depending on the situation filters may redistribute data, or add or subtract data, during pipeline execution, producing load imbalance by the time rendering occurs. In addition, the distribution of rendering work depends on view parameters such as camera position, view angle, and the positions of the near and far clipping planes. All of this tends to make load balancing a computationally costly rendering algorithm such as the surface LIC nontrivial.

The surface LIC algorithm differs from other parallel rendering algorithms in two ways. First, in the data parallel setting the integration step requires access to off-process vector data in order to produce consistent results at process boundaries. This is dealt with by adding a guard pixel halo to each screen space pixel extent over which the LIC will be computed. Second, after vectors have been locally projected into screen space, they must be composited wherever there is inter-process screen space overlap to ensure global vector field correctness. Both the generation of guard pixels and vector field compositing occur inside ParaView's image compositing pass, further complicating the situation.
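As a rough illustration of the guard pixel halo, the following Python sketch grows an inclusive pixel extent `(x0, x1, y0, y1)` by a fixed number of guard pixels, clamps it to the window, and reports the ratio of guard pixels to valid pixels. The function names and the tuple layout are assumptions made for this example, and how ParaView actually sizes the halo is not covered here, so `n_guard` is simply taken as an input.

```python
def num_pixels(ext):
    """Number of pixels in an inclusive extent (x0, x1, y0, y1)."""
    x0, x1, y0, y1 = ext
    return max(x1 - x0 + 1, 0) * max(y1 - y0 + 1, 0)


def grow_extent(ext, n_guard, window):
    """Add a halo of n_guard pixels on every side, clamped to the window extent."""
    x0, x1, y0, y1 = ext
    wx0, wx1, wy0, wy1 = window
    return (max(x0 - n_guard, wx0), min(x1 + n_guard, wx1),
            max(y0 - n_guard, wy0), min(y1 + n_guard, wy1))


def guard_ratio(ext, n_guard, window):
    """Ratio of guard pixels to valid pixels for one extent."""
    valid = num_pixels(ext)
    return (num_pixels(grow_extent(ext, n_guard, window)) - valid) / valid
```

For a fixed halo width, small extents pay proportionally more: with these helpers a 10x10 extent and a 4 pixel halo has more guard pixels than valid pixels, while a 100x100 extent with the same halo has a ratio well below one.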

### Vector field compositing stage

Given the relatively high computational cost of the surface LIC, it is important to have a good parallel distribution of rendering work, and it can be beneficial to redistribute screen space data to achieve a more balanced work distribution. The vector field compositing stage provides an opportunity to load balance the LIC computation. However, working inside ParaView's image compositing pass makes any attempt at load balancing challenging because ParaView expects the initial screen space data distribution to remain fixed. This restricts which load balancing schemes are practical: any reorganization that moves screen space data from the initial decomposition to a more favorable one requires moving it back after the computation, which adds communication overhead. When load balancing rendering computations in this environment, the goal of distributing work equally must be weighed against the compositing and communication costs.

ParaView provides three vector field compositing strategies for composite data: in-place, in-place disjoint, and balanced. An automatic mode chooses between the first two. To reduce communication overhead, prior to compositing all of the strategies minimize the input screen space bounds using the cached depth buffer values. Once the strategy is selected and the target domain decomposition is determined, each of the target extents is again minimized using depth buffer values to further reduce the amount of compositing and communication overhead.

#### In-place compositing

The in-place strategy composites the vector field onto the minimized screen space extents of the input dataset without additional screen space load balancing. This strategy is optimal when there is no off-process screen space data overlap, as in the case of computing LIC on a slice. However, the strategy can be inefficient when there is substantial off-process screen space data overlap because each overlapping pixel extent becomes a target for compositing and the LIC is redundantly computed on each. For example, the screen space decomposition shown in figure 11b is what would be used for in-place compositing in that situation. Large portions of the screen overlap on as many as 4 processes, resulting in a fourfold duplication of compositing and integration work over those regions. In this case the in-place strategy is inefficient because of the large amount of data that overlaps across processes.
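The amount of duplicated work can be quantified with a simple redundancy factor: the number of pixels processed summed over all processes, divided by the number of unique pixels covered. The Python sketch below is only an illustration; the function names and the `{rank: [extent, ...]}` input layout are assumptions, extents are inclusive `(x0, x1, y0, y1)` tuples, and pixels are enumerated by brute force for clarity rather than speed.

```python
def extent_pixels(ext):
    """Set of (x, y) pixels in an inclusive extent (x0, x1, y0, y1)."""
    x0, x1, y0, y1 = ext
    return {(x, y) for x in range(x0, x1 + 1) for y in range(y0, y1 + 1)}


def redundancy(extents_by_rank):
    """Pixels processed across all ranks divided by unique pixels covered.

    1.0 means no off-process overlap; 4.0 means every pixel is handled
    by four processes.
    """
    total = 0
    unique = set()
    for extents in extents_by_rank.values():
        mine = set()
        for ext in extents:
            mine |= extent_pixels(ext)   # a rank counts each of its pixels once
        total += len(mine)
        unique |= mine
    return total / len(unique) if unique else 1.0
```

A slice split into side-by-side extents gives a factor of 1.0, the case where in-place compositing is optimal, while four processes whose extents coincide give 4.0.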

#### In-place disjoint compositing

The in-place disjoint strategy adds load balancing to the in-place strategy. The target domain decomposition, onto which the vector field is composited and on which the LIC is computed, is constructed by making the minimized input screen space domain decomposition disjoint with respect to itself. The disjointification assigns each pixel to a single process so that redundant compositing and computation are eliminated. Data is left in place, which minimizes the compositing costs. An example of the domain decomposition used by the in-place disjoint strategy is shown in the far right panel of figure 11. The in-place disjoint strategy can be more efficient than the in-place strategy when there is a high degree of off-process screen space overlap. However, the disjointification process increases the number of screen space extents, which tends to increase the number of guard pixels required. This becomes a performance issue as the number of guard pixels approaches the number of valid pixels, and scaling studies show that it is a concern for very large parallel runs. In order to work with ParaView's image compositing pass, the in-place disjoint strategy includes a scatter stage that moves the computed LIC back onto the original domain decomposition. This scatter stage doesn't require compositing because the disjointification process ensures that its source and destination extents are unique. Not accounting for guard pixel compositing, in the worst case the total communication cost of the in-place disjoint strategy is the same as that of the in-place strategy. Hence, when there are many overlapping off-process pixels in the initial screen space domain decomposition, and as long as the ratio of guard pixels to valid pixels is small (<<1), the disjoint strategy is the better choice.
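One way to picture the disjointification is as repeated rectangle subtraction. The Python sketch below is an illustration under stated assumptions, not the algorithm ParaView uses: the function names are hypothetical, extents are inclusive `(x0, x1, y0, y1)` tuples, and overlapping pixels are arbitrarily given to the lowest rank that covers them.

```python
def intersect(a, b):
    """Intersection of two inclusive extents, or None if they do not overlap."""
    x0, x1 = max(a[0], b[0]), min(a[1], b[1])
    y0, y1 = max(a[2], b[2]), min(a[3], b[3])
    return (x0, x1, y0, y1) if x0 <= x1 and y0 <= y1 else None


def subtract(a, b):
    """Pieces of extent a not covered by extent b, as non-overlapping extents."""
    c = intersect(a, b)
    if c is None:
        return [a]
    ax0, ax1, ay0, ay1 = a
    cx0, cx1, cy0, cy1 = c
    pieces = []
    if ax0 < cx0:
        pieces.append((ax0, cx0 - 1, ay0, ay1))   # left strip, full height
    if cx1 < ax1:
        pieces.append((cx1 + 1, ax1, ay0, ay1))   # right strip, full height
    if ay0 < cy0:
        pieces.append((cx0, cx1, ay0, cy0 - 1))   # strip below the overlap
    if cy1 < ay1:
        pieces.append((cx0, cx1, cy1 + 1, ay1))   # strip above the overlap
    return pieces


def make_disjoint(extents_by_rank):
    """Assign every covered pixel to exactly one rank (lowest rank wins here)."""
    claimed = []      # extents already assigned to some rank
    result = {}
    for rank in sorted(extents_by_rank):
        result[rank] = []
        for ext in extents_by_rank[rank]:
            pieces = [ext]
            for c in claimed:
                pieces = [p for piece in pieces for p in subtract(piece, c)]
            result[rank].extend(pieces)
            claimed.extend(pieces)
    return result
```

Even in this small sketch a single input extent can be split into several pieces, which is the growth in the number of extents, and therefore in guard pixels, noted above.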

#### Balanced compositing

The balanced strategy partitions the global minimized input screen space extent into equal size tiles, assigning one tile to each process. The screen space vector field data is composited onto the new target domain decomposition, where the LIC is computed. Like the in-place disjoint strategy, the balanced strategy assigns pixels to processes uniquely, so the LIC is computed only once for each pixel, and like the in-place disjoint strategy, a scatter stage is required to move data back onto the initial domain decomposition. One advantage of the balanced strategy over the in-place disjoint strategy is that, because there is only one pixel extent per process, the ratio of guard pixels to valid pixels tends to be much smaller as the number of processes increases. However, the compositing costs can be relatively higher because data is not necessarily left in place, and in situations where valid pixels don't fill the screen space extents, some processes may be left with no work.
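A minimal Python sketch of the tiling step follows. The grid shape rule used here (the factor pair of the process count closest to square, with more tiles along the longer screen axis) is an assumption made for illustration, as are the function names; the text above only states that the extent is divided into equal size tiles, one per process. Extents are inclusive `(x0, x1, y0, y1)` tuples.

```python
import math


def split_range(lo, hi, n):
    """Split the inclusive range [lo, hi] into n contiguous, near-equal ranges."""
    base, extra = divmod(hi - lo + 1, n)
    ranges, start = [], lo
    for i in range(n):
        length = base + (1 if i < extra else 0)   # spread the remainder evenly
        ranges.append((start, start + length - 1))
        start += length
    return ranges


def balanced_tiles(ext, nprocs):
    """Partition an extent into nprocs tiles; tile i is assigned to process i."""
    x0, x1, y0, y1 = ext
    # Factor pair of nprocs closest to square (assumed rule, see lead-in).
    small = max(d for d in range(1, math.isqrt(nprocs) + 1) if nprocs % d == 0)
    large = nprocs // small
    width, height = x1 - x0 + 1, y1 - y0 + 1
    nx, ny = (large, small) if width >= height else (small, large)
    xs = split_range(x0, x1, nx)
    ys = split_range(y0, y1, ny)
    return [(xa, xb, ya, yb) for (ya, yb) in ys for (xa, xb) in xs]
```

Because the tiles are cut from the bounding extent rather than from the valid pixels, a tile can land entirely on empty screen space, which is how a process ends up with no work.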

#### Automatic compositing

In the automatic compositing strategy, an estimate of the compositing cost is made and a heuristic is used to select either the in-place or the in-place disjoint strategy. Thus the benefits of both strategies are leveraged, while some of the downsides are avoided on a case-by-case basis without user intervention. This is the recommended strategy and is used by default.

## Acknowledgment

The work was performed at Lawrence Berkeley National Laboratory (LBNL) under a non-federal agreement with the University of Tennessee supported by the National Science Foundation under Grant number SF OR.13425-001.01, National Institute for Computational Sciences (NICS) NSF Center for Remote Data Analysis and Visualization (RDAV).

This research used resources of the National Energy Research Scientific Computing Center, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231. Some of the datasets used in the work were created under grant number DE-FG02-10ER55076, Kinetic Physics of Homogeneous Turbulence in Collisionless Plasmas.