ParaView/Users Guide/Batch Processing


Batch Processing

ParaView's pvbatch and pvpython command-line executables substitute a Python interpreter for the Qt GUI through which most users control ParaView's back-end data processing and rendering engine. Either may be used for batch processing, that is, to replay visualization sessions in an exact, easily repeated way. The input to either comes in the form of the same Python scripting syntax that was described in the previous section.

Of the two, pvbatch is the more specialized for batch processing and is better suited to running offline on dedicated data-processing supercomputers because:

  • It does not take commands from the terminal, which is usually unavailable on this class of machines.

Therefore you must supply the filename of the script that you want pvbatch to execute.

  • It is permanently joined to the back-end server and thus does not require TCP socket connections to it.

Therefore, in the scripts that you give to pvbatch, it is not possible to Disconnect() from the paired server or Connect() to a different one.

  • It can be run directly as an MPI parallel program in which all pvbatch processes divide the work and cooperate.

Therefore, you typically start pvbatch like this:

[mpiexec -N <numprocessors>] pvbatch [args-for-pvbatch] script-filename [args-for-script]

Creating the Input Deck

There are at least three ways to create a batch script.

The hardest way is writing it by hand using the syntax described in the previous section. You can of course use any text editor for this, but you will probably be more productive if you set up a more fully featured Python IDE such as Idle, or use the Python shell within the ParaView GUI, so that you have access to interactive documentation, tab completion, and quick preview capabilities. Another alternative is to let the ParaView GUI client record all of your actions into a Python script by using the Python Trace feature. Later you can easily tweak the recorded script once you become familiar with ParaView's Python syntax. The third way, and to longtime ParaView users the most traditional, is to instead record a ParaView state file and then load it via a small Python script, as demonstrated in the first example below.


Loading a state file and saving a rendered result

>>> from paraview.simple import *
# Load the state
>>> servermanager.LoadState("/Users/berk/myteststate.pvsm")

At this point you have a working pipeline instantiated on the server, which you can use introspection on to access and then arbitrarily control anything within. At its core ParaView is a visualization engine, so we will demonstrate by simply generating and saving an image.

# Make sure that the view in the state is the active one so we don't have to refer to it by name.
>>> SetActiveView(GetRenderView())
# Now render and save.
>>> Render()
>>> WriteImage("/Users/berk/image.png")

Parameter study

Parameter studies are one example of how batch processing can be extremely useful. In a parameter study one or more pipeline parameters (a filename, a timestep, or a filter property, for example) are varied across some range, but an otherwise identical script is replayed numerous times and the results are saved. After the suite of sessions completes, the set of results is easy to compare. For this type of work I recommend writing a higher-level script that varies the parameter and, for each value, spawns off a pvbatch session in which the parameter is passed in as an argument to the ParaView Python script.
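That higher-level driver can also be sketched in plain Python rather than shell. The following sketch builds the pvbatch command line for each parameter value; the script name bench.py and the option names -# and --triangles are hypothetical stand-ins for your own script and its arguments:

```python
import subprocess

def build_command(runid, numtris, script="bench.py"):
    # Assemble the argv for one pvbatch session. 'bench.py', '-#', and
    # '--triangles' are hypothetical names; substitute your own script
    # and option spellings.
    return ["pvbatch", script, "-#", str(runid), "--triangles", str(numtris)]

if __name__ == "__main__":
    for runid, numtris in enumerate([10, 20, 30, 40, 50]):
        cmd = build_command(runid, numtris)
        print(" ".join(cmd))
        # Uncomment to actually launch each run in sequence:
        # subprocess.check_call(cmd)
```

On a cluster you would typically replace the direct subprocess call with a submission to the batch queue, as the shell scripts below demonstrate.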

The following is a slightly condensed version of a hierarchical set of scripts written during a benchmark study. This benchmark is an example of a parameter study in which the number of triangles rendered in the scene is varied and afterward we examine the output to determine how the rendering rate differs as a function of that parameter change.

This top level script varies the number of triangles and then submits parallel jobs to the cluster's PBS batch queue. See the qsub manpages or ask your system administrators for the exact syntax of the submission command.

RUNID=0
for NUMTRIS in 10 20 30 40 50
do
    mkdir -p ~/tmp/run${RUNID}

    # Pass the parameters through to the submitted script via qsub's -v flag.
    qsub -N run${RUNID} \
        -l "walltime=0:${TLIMIT}:0.0 select=${NNODES}:ncpus=8:arch=wds024c" \
        -j eo -e ~/tmp/run${RUNID}/outstreams.log \
        -v "RUNID=${RUNID},NUMTRIS=${NUMTRIS},NNODES=${NNODES}" \
        <second-level-script-filename>

    let RUNID+=1
done

The second-level script is executed whenever it reaches the top of PBS's priority queue. It examines the parameters it is given and then runs ParaView's pvbatch executable with them. It also does some bookkeeping tasks that are helpful when debugging the batch submission process.


#setup MPI environment
source ${HOME}/

#prepare and run the parallel pvbatch program for the parameter value we are given
batch_command="${HOME}/ParaView-3.8.1/build/bin/pvbatch ${HOME}/ -# ${RUNID} --triangles ${NUMTRIS}"
mpirun -np $NNODES --hostfile $PBS_NODEFILE $batch_command

#move the results to more permanent storage
mv /tmp/bench* ${HOME}/tmp/run${RUNID}

The final level is the script that is executed by pvbatch.

from paraview.simple import *
from optparse import OptionParser
import paraview.benchmark
import math
import sys
import time

parser = OptionParser()
parser.add_option("-#", "--runid", action="store", dest="runid", type="int",
                  default=-1, help="an identifier for this run")
# optparse accepts only single-character short options, so "-nt" is not legal;
# use a single character (or just the long form) instead.
parser.add_option("-t", "--triangles", action="store", dest="triangles", type="int",
                  default=1, help="millions of triangles to render")
(options, args) = parser.parse_args()
print "########################################"
print "RUNID = ", options.runid
print "START_TIME = ", time.localtime()
print "ARGS = ", sys.argv
print "OPTIONS = ", options
print "########################################"


# Derive the sphere resolution from the requested triangle count: a sphere
# with resolutions (side, side) produces roughly 2 * side * side triangles.
side = int(math.sqrt(options.triangles * 1000000 / 2))

TS = Sphere()
TS.PhiResolution = side
TS.ThetaResolution = side

dr = Show()
view = Render()
view.UseImmediateMode = 0

cam = GetActiveCamera()
for i in range(0, 50):
  # Nudge the camera each frame so that every saved image requires a fresh render.
  cam.Azimuth(360.0 / 50)
  WriteImage('/tmp/bench_%d_image_%d.jpg' % (options.runid, i))

print "total Polygons:" + str(dr.SMProxy.GetRepresentedDataInformation(0).GetPolygonCount())
print "view.ViewSize:" + str(view.ViewSize)

# Save the raw timing log gathered by the paraview.benchmark module imported above.
logname = "/tmp/bench_" + str(options.runid) + "_rawlog.txt"
paraview.benchmark.dump_logs(logname)

print "#######"
print "END_TIME = ", time.localtime()
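The resolution arithmetic the benchmark relies on is worth spelling out: a sphere source tessellated with phi and theta resolutions p and t produces on the order of 2*p*t triangles, so hitting a target count means taking a square root. A standalone sketch of that relationship, independent of ParaView:

```python
import math

def resolution_for_triangles(millions):
    # A sphere with PhiResolution = ThetaResolution = side yields roughly
    # 2 * side * side triangles; invert that to find the resolution needed
    # for the requested count (given in millions of triangles).
    return int(math.sqrt(millions * 1000000 / 2))

def approximate_triangles(side):
    # The forward direction of the same estimate.
    return 2 * side * side

side = resolution_for_triangles(10)
print(side, approximate_triangles(side))
```

The count is approximate because the pole caps of the sphere tessellation contribute slightly differently than the interior bands, but for benchmarking purposes the estimate is close enough.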

Large data example

Another important example is visualizing extremely large datasets that cannot easily be worked with interactively. In this setting, the user first constructs a visualization of a small but representative dataset. Typically this takes place by recording a session in the standard GUI client running on some small and easily accessed machine. Later, the user changes the filename property of the reader in the recorded session file. Finally, the user submits the script to a larger machine, which performs the visualization offline and saves results for later inspection.

The essential thing you need to be able to do is substitute the filename and location of the original small dataset with the name and location of the large one. There are two ways to do this.

The first way is to directly edit the filename in either the ParaView state file or the Python script where it is loaded. The task is made easier by the fact that all readers conventionally name the input file property "FileName". Standard Python scripts are well described in other sections, so we will describe ParaView state files here instead. A ParaView state file has the extension .pvsm and its internal format is text-based XML. Simply open the pvsm file in a text editor, search for FileName, and replace all occurrences of the old value with the new one.

For reference, the portion of a pvsm file that specifies a reader's input file is:

    <Proxy group="sources" type="LegacyVTKFileReader" id="160" servers="1">
      <Property name="FileNameInfo" id="160.FileNameInfo" number_of_elements="1">
        <Element index="0" value="/Data/molar.vtk"/>
      </Property>
      <Property name="FileNames" id="160.FileNames" number_of_elements="1">
        <Element index="0" value="/Data/molar.vtk"/>
        <Domain name="files" id="160.FileNames.files"/>
      </Property>
      <Property name="TimestepValues" id="160.TimestepValues"/>
      <SubProxy name="Reader" servers="1"/>
    </Proxy>
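Rather than editing by hand, the same substitution can be scripted with Python's standard library. A minimal sketch, assuming the pvsm layout shown above; the property names "FileName", "FileNames", and "FileNameInfo" cover the common reader conventions, but check your own state file for others:

```python
import xml.etree.ElementTree as ET

def replace_filenames(pvsm_in, pvsm_out, new_file):
    # Rewrite every file-name-like property element in the state file
    # so that all readers point at the replacement dataset.
    tree = ET.parse(pvsm_in)
    for prop in tree.iter("Property"):
        if prop.get("name") in ("FileName", "FileNames", "FileNameInfo"):
            for elem in prop.iter("Element"):
                elem.set("value", new_file)
    tree.write(pvsm_out)
```

This blindly retargets every reader in the state; if the state contains several readers pointing at different files, filter on the Proxy id as well.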

The second way is to set up the pipeline and then use introspection to find and change the filename. This approach is easier to parameterize but somewhat more fragile, since not all readers respond well to having their filenames changed once established. You should at least use caution and change the filename before the pipeline first runs; otherwise some readers will become confused, and you will also waste time processing the smaller file. When loading state files, the proper place to do this is immediately after the LoadState command. For Python scripts, the place to do this is as near to the creation of the reader as possible, and certainly before any Update or Render commands. An example of how to do this follows:

>>> from paraview.simple import *
# Load the state
>>> servermanager.LoadState("/Users/berk/myteststate.pvsm")
# Now the pipeline will be instantiated but it will not have updated yet.
# You can programmatically obtain the reader from the pipeline starting with this command, which lists all readers, sources and filters in the pipeline.
>>> GetSources()
#{('box.ex2', '274'): <paraview.servermanager.ExodusIIReader object at 0x21b3eb70>}
# But it is easier if you note that readers are typically named according to the name of the file that they are created for.
>>> reader = FindSource('box.ex2')
#Now you can change the filename with these two commands:
>>> reader.FileName = ['/path_to/can.ex2']
>>> reader.FileNameChanged()
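To avoid hard-coding the large file's path, the replacement name can itself be passed in on the pvbatch command line, just as the benchmark script above took its parameters. A sketch of the argument handling only; the --data option name is hypothetical, and the paraview.simple calls are as shown above:

```python
from optparse import OptionParser

def parse_data_arg(argv):
    # '--data' is a hypothetical option naming the large dataset that
    # should replace the small one recorded in the state file.
    parser = OptionParser()
    parser.add_option("--data", action="store", dest="data",
                      default=None, help="replacement dataset for the reader")
    options, args = parser.parse_args(argv)
    return options.data

# In the pvbatch script you would then continue, e.g.:
#   reader = FindSource('box.ex2')
#   reader.FileName = [parse_data_arg(sys.argv[1:])]
#   reader.FileNameChanged()
```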