= Parallel ParaView =
One of the main purposes of ParaView is to allow users to create visualizations of large data sets that reside on parallel systems without first collecting the data to a single machine. Transferring the data is often slow and wasteful of disk resources, and the visualization of large data sets can easily overwhelm the processing and especially the memory resources of even high-performance workstations. This chapter first describes the concepts behind the parallelism in ParaView. We then discuss in detail the process of starting up ParaView's parallel server components. Lastly, we explain how a parallel visualization session is initiated from within the user interface. Parallel rendering is an essential part of parallel ParaView, so essential that we've given it its own [[ParaView/Users Guide/Parallel Rendering | chapter]] in this version of the book.
The task of setting up a cluster for visualization is unfortunately outside the scope of this book. However, several online resources can help you get started.
== Parallel Structure ==
ParaView has three main logical components: client, data server, and render server. The client is responsible for the GUI and is the interface between you, the user, and ParaView as a whole. The data server reads in files and processes the data through the pipeline. The render server takes the processed data and renders it to present the results to you.
[[File:ParaView_UsersGuide_parallel_architecture.png|thumb|center|800px|'''Figure 11.1''' Parallel Architecture]]
The three logical components can be combined in various configurations. When ParaView is started, the client is connected to what is called the built-in server; in this case, all three components exist within the same process. Alternatively, you can run the server as an independent program and connect a remote client to it, or run the server as a standalone parallel batch program without a GUI. In both of these cases, the server process contains both the data and render server components. The server can also be started as two separate programs: one for the data server and one for the render server. The server programs are data-parallel programs that can be run as a set of independent processes running on different CPUs. The processes use MPI to coordinate their activities as each works on a different piece of the data.
[[file:ParaView_UsersGuide_common_configurations.png|thumb|center|800px|'''Figure 11.2''' Common Configurations of the logical components]]
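As a rough illustration of the "independent server program" configuration, the sketch below only assembles the command line that would start a combined data and render server under MPI; it does not launch anything. The launcher name <code>mpirun</code>, the <code>-np</code> flag, the executable name <code>pvserver</code>, and the <code>--server-port</code> option are assumptions about a typical installation, and the helper <code>server_command</code> is hypothetical. Your MPI distribution may use a different launcher or flags.

```python
import shlex


def server_command(num_procs, program="pvserver", port=11111):
    """Return an argv list that would start `program` as an MPI job.

    Assumptions (not taken from this chapter): the MPI launcher is called
    `mpirun`, it accepts `-np <count>`, and the server accepts
    `--server-port=<port>`.
    """
    if num_procs < 1:
        raise ValueError("num_procs must be at least 1")
    return ["mpirun", "-np", str(num_procs), program, f"--server-port={port}"]


# Combined data + render server running as four cooperating processes.
combined = server_command(4)
print(shlex.join(combined))
```

The returned list can be handed to a process launcher of your choice once the names above have been checked against your own installation.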
=== Client ===
The client is responsible for the user interface of the application. ParaView’s general-purpose client was written to make powerful visualization and analysis capabilities available from an easy-to-use interface. The client component is a serial program that controls the server components through the Server Manager API.
=== Data Server ===
The data server is primarily constructed from VTK readers, sources, and filters. It is responsible for reading and/or generating data, processing it, and producing geometric models that the render server and client will display. The data server exploits data parallelism by partitioning the data, adding ghost levels around the partitions as needed, and running synchronous parallel filters. Each data server process has an identical VTK pipeline, and each process is told which partition of the data it should load and process. By splitting the data, ParaView is able to use the entire aggregate system memory and thus make large data processing possible.
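The partitioning idea can be sketched in a few lines. The example below is a simplified, one-dimensional illustration and not ParaView's or VTK's actual implementation: it splits a range of cells into contiguous pieces, one per process, and widens each piece by a number of ghost cells so that filters needing neighbor information can still run independently. The function name <code>partition_with_ghosts</code> and its parameters are hypothetical.

```python
def partition_with_ghosts(num_cells, num_pieces, ghost_levels=1):
    """Split cells [0, num_cells) into contiguous pieces padded with ghosts.

    Returns one (owned, loaded) pair per process, where each element is a
    half-open (start, stop) range. `owned` is the partition the process is
    responsible for; `loaded` also includes the surrounding ghost cells.
    """
    if num_pieces < 1 or num_cells < num_pieces:
        raise ValueError("need at least one cell per piece")
    if ghost_levels < 0:
        raise ValueError("ghost_levels must be non-negative")

    base, extra = divmod(num_cells, num_pieces)
    pieces = []
    start = 0
    for rank in range(num_pieces):
        # The first `extra` ranks take one additional cell each.
        stop = start + base + (1 if rank < extra else 0)
        # Ghost cells are clamped at the boundaries of the whole data set.
        loaded = (max(0, start - ghost_levels), min(num_cells, stop + ghost_levels))
        pieces.append(((start, stop), loaded))
        start = stop
    return pieces


for rank, (owned, loaded) in enumerate(partition_with_ghosts(10, 3)):
    print(f"process {rank}: owns {owned}, loads {loaded}")
```

Every process runs the same code and differs only in the piece index it is given, which mirrors the identical-pipeline arrangement described above.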
=== Render Server ===
The render server is responsible for rendering the geometry. Like the data server, the render server can be run in parallel and has identical visualization pipelines (only the rendering portion of the pipeline) in all of its processes. Having the ability to run the render server separately from the data server allows for an optimal division of labor between computing platforms. Most large computing clusters are primarily used for batch simulations and do not have hardware rendering resources. Since it is not desirable to move large data files to a separate visualization system, the data server can run on the same cluster that ran the original simulation. The render server can be run on a separate visualization cluster that has hardware rendering resources.
It is possible to run the render server with fewer processes than the data server but never more. Visualization clusters typically have fewer nodes than batch simulation clusters, and processed geometry is usually significantly smaller than the original simulation dump. ParaView repartitions the geometric models on the data server before they are sent to the render server.
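The process-count constraint can be illustrated with a small sketch. The text does not describe how ParaView actually redistributes geometry, so the block mapping below is purely an assumption chosen for illustration: each render process receives the output of a contiguous group of data server processes, and a render process count larger than the data server count is rejected. The function name <code>assign_pieces</code> is hypothetical.

```python
def assign_pieces(num_data_procs, num_render_procs):
    """Group data server ranks by the render server rank that receives them.

    Illustrative block mapping only; not ParaView's real repartitioning.
    """
    if not 1 <= num_render_procs <= num_data_procs:
        raise ValueError(
            "the render server needs at least one process and no more "
            "processes than the data server"
        )
    groups = [[] for _ in range(num_render_procs)]
    for data_rank in range(num_data_procs):
        # Integer scaling maps data ranks onto render ranks in order.
        render_rank = data_rank * num_render_procs // num_data_procs
        groups[render_rank].append(data_rank)
    return groups


# Eight data server processes feeding three render server processes.
for render_rank, data_ranks in enumerate(assign_pieces(8, 3)):
    print(f"render process {render_rank} <- data processes {data_ranks}")
```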
== MPI Availability ==
Until recently, in order to use ParaView's parallel processing features, one needed to build ParaView from the source code as described in the [[ParaView:Build_And_Install | Appendix]]. This was because there are many different versions of MPI, the library ParaView’s servers use internally for parallel communication, and for high-performance computing users it is extremely important to use the version that is delivered with your networking hardware.
As of ParaView 3.10, however, we have begun to package MPI with our binary releases. If you have a multi-core workstation, you can now simply turn on the '''Use Multi-Core''' setting under ParaView's [[ParaView/ Users_Guide/ Settings | Settings]] to make use of all of its cores. This option makes parallel data server mode the default configuration, which can be very effective when you are working on computationally intensive processing tasks.
Otherwise, or when you need to run ParaView on an actual distributed-memory cluster, you need to start up the various components and establish connections between them, as described in the next section.