ParaView/Users Guide/Batch Processing


Batch Processing

ParaView's pvbatch and pvpython command line executables substitute a python interpreter for the Qt GUI through which most users control ParaView's back end data processing and rendering engine. Either may be used for batch processing, that is, to replay visualization sessions in an exact, easily repeated way. The input to either comes in the form of the same kind of python script that was described in the previous section.

Of the two, pvbatch is more specialized for batch processing and better suited to running offline on dedicated data processing supercomputers because:

  • It does not take in commands from the terminal, which is usually unavailable on this class of machines.

Therefore you must supply the filename of the script you want pvbatch to execute.

  • It is permanently joined to the back end server and thus does not require TCP socket connections to it.

Therefore in the scripts that you give to pvbatch it is not possible to Disconnect() from the paired server or Connect() to a different one.

  • It can be run directly as an MPI parallel program in which all pvbatch processes divide up the work and cooperate.

Therefore you typically start pvbatch like this:

[mpiexec -N <numprocessors>] pvbatch [args-for-pvbatch] script-filename [args-for-script]
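For example, a typical invocation might look like the following sketch; the script name, data path, and processor count are placeholders, not part of ParaView itself:

<source lang="bash">
# Hypothetical example: run pvbatch on 8 processors, handing one
# argument through to the script itself.
mpiexec -N 8 pvbatch myscript.py /path/to/mydata.vtk
</source>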

Creating the Input Deck

There are at least three ways to create a batch script.

The hardest one is writing it by hand using the syntax described in the previous section. You can of course use any text editor for this but you will probably be more productive if you set up a more fully featured python IDE like Idle or the python shell within the ParaView GUI so that you have access to interactive documentation, tab completion and quick preview capabilities. Another alternative is to let the ParaView GUI client record all of your actions into a python script by using the Python Trace feature. Later you can easily tweak the recorded script once you become familiar with ParaView's python syntax. The third, and to longtime ParaView users the most traditional way, is to instead record a ParaView state file and then load that via a small python script as demonstrated in the first example below.
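To give a flavor of the hand-written approach, the following is a minimal sketch using the paraview.simple module; the source, resolution values, and output path are arbitrary illustrative choices:

<source lang="python">
# A minimal hand-written batch script (illustrative names and paths).
from paraview.simple import *

sphere = Sphere()
sphere.ThetaResolution = 32
sphere.PhiResolution = 32

Show(sphere)                    # create a representation in the active view
Render()                        # render the scene
WriteImage("/tmp/sphere.png")   # save the result to disk
</source>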

Examples

Loading a state file and saving a rendered result

<source lang="python"> >>> from paraview.simple import *

  1. Load the state

>>> servermanager.LoadState("/Users/berk/myteststate.pvsm") </source>

At this point you have a working pipeline instantiated on the server, which you can introspect to access and then arbitrarily control anything within it. At its core ParaView is a visualization engine, so we will demonstrate by simply generating and saving an image.

<source lang="python">

  1. Make sure that the view in the state is the active one so we don't have to refer to it by name.

>>> SetActiveView(GetRenderView())

  1. Now render and save.

>>> Render() >>> WriteImage("/Users/berk/image.png") </source>
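
If you need finer control, the same session can be inspected and modified programmatically before rendering. The following is a sketch only; the source name "Clip1" and the property changed are hypothetical and would come from your own state file:

<source lang="python">
# List the sources that the loaded state created (name -> proxy).
>>> GetSources()

# Fetch one of them by name and change a property before re-rendering.
# "Clip1" is a placeholder; use a name that exists in your own state file.
>>> clip = FindSource("Clip1")
>>> clip.InsideOut = 1
>>> Render()
</source>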

Parameter study

Parameter studies are one example of how batch processing can be extremely useful. In a parameter study, one or more pipeline parameters (for example a filename, a timestep, or a filter property) are varied across some range while an otherwise identical script is replayed numerous times and the results are saved. After the suite of sessions completes, the set of results is easy to compare. For this type of work we recommend writing a higher level script that varies the parameter and, for each value, spawns off a pvbatch session in which the parameter is passed in as an argument to the ParaView python script.

The following is a slightly condensed version of a hierarchical set of scripts written during a benchmark study. This benchmark is an example of a parameter study in which the number of triangles rendered in the scene is varied and afterward we examine the output to determine how the rendering rate differs as a function of that parameter change.

This top level script varies the number of triangles and then submits parallel jobs to the cluster's PBS batch queue. See the qsub manpages or ask your system administrators for the exact syntax of the submission command.

<source lang="bash"> RUNID=0 NNODES=8 TLIMIT=10 for NUMTRIS in 10 20 30 40 50 do

   mkdir ~/tmp/run${RUNID}
   qsub -N run${RUNID} \
       -l "walltime=0:${TLIMIT}:0.0 select=${NNODES}:ncpus=8:arch=wds024c" \
       -j eo -e ~/tmp/run${ID}/outstreams.log \
       -v "RUNID=${ID} NNODES=${NNODES} NUMTRIS=${NUMTRIS}" \
       ~/level2.sh
   let RUNID+=1

done </source>

The second level script is executed whenever it reaches the top of PBS's priority queue. It examines the parameters it is given and then runs ParaView's pvbatch executable with them. It also does some bookkeeping tasks that are helpful when debugging the batch submission process.

<source lang="bash"> echo "RUN NUMBER=${RUNID}"

  1. setup MPI environment

source ${HOME}/openmpipaths.sh

  1. prepare and run the parallel pvbatch program for the parameter value we are given

batch_command="${HOME}/ParaView-3.8.1/build/bin/pvbatch ${HOME}/level3.py -# ${RUNID} -nt ${NUMTRIS}" mpirun -np $NNODES --hostfile $PBS_NODEFILE $batch_command

  1. move the results to more permanent storage

mv /tmp/bench* ${HOME}/tmp/run${DDM_RUNNUM} </source>

The final level is the script that is executed by pvbatch.

<source lang="python">
from paraview.simple import *
from optparse import OptionParser
import paraview.benchmark
import math
import sys
import time

# parse the arguments that the second level script passes in
parser = OptionParser()
parser.add_option("-#", "--runid", action="store", dest="runid", type="int",
                  default=-1, help="an identifier for this run")
parser.add_option("-t", "--triangles", action="store", dest="triangles", type="int",
                  default=1, help="millions of triangles to render")
(options, args) = parser.parse_args()

# record this run's parameters at the top of the output stream
print "########################################"
print "RUNID = ", options.runid
print "START_TIME = ", time.localtime()
print "ARGS = ", sys.argv
print "OPTIONS = ", options
print "########################################"

# turn on detailed timing logs
paraview.benchmark.maximize_logs()

# create a sphere source with approximately the requested number of triangles
TS = Sphere()
side = math.sqrt(options.triangles*1000000/2)
TS.PhiResolution = side
TS.ThetaResolution = side

dr = Show()
view = Render()
view.UseImmediateMode = 0

# spin the camera and save an image at each step
cam = GetActiveCamera()
for i in range(0,50):
    cam.Azimuth(3)
    Render()
    WriteImage('/tmp/bench_%d_image_%d.jpg' % (options.runid, i))

print "total Polygons:" + str(dr.SMProxy.GetRepresentedDataInformation(0).GetPolygonCount())
print "view.ViewSize:" + str(view.ViewSize)

# gather the timing logs from all nodes and dump them to a file
paraview.benchmark.get_logs()
logname = "/tmp/bench_" + str(options.runid) + "_rawlog.txt"
paraview.benchmark.dump_logs(logname)

print "#######"
print "END_TIME = ", time.localtime()
</source>

Large data example

Another important use case is visualizing extremely large datasets that cannot easily be worked with interactively. In this setting, the user first constructs a visualization of a small but representative dataset, typically by recording a session in the standard GUI client running on a small and easily accessed machine. Later, the user edits the filename property of the reader in the recorded session file to point to the full resolution data. Finally, the user submits the script to a larger machine, which performs the visualization offline and saves the results.
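
As a rough sketch of that last step, the script below loads a recorded state and repoints its reader before rendering. All of the names and paths are illustrative, and the property to set (FileNames here, FileName for some readers) depends on which reader the state contains:

<source lang="python">
from paraview.simple import *

# Load the session that was recorded against the small dataset.
servermanager.LoadState("/home/user/smalldata.pvsm")

# Repoint the reader at the full resolution data.  The source name and the
# property name are placeholders that depend on the recorded state and reader.
reader = FindSource("smalldata.vtk")
reader.FileNames = ["/bigdisk/fullresolution.vtk"]

# Render with the view from the state and save the result.
SetActiveView(GetRenderView())
Render()
WriteImage("/bigdisk/fullresolution.png")
</source>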