ParaView/Users Guide/Batch Processing

From KitwarePublic

Batch Processing

ParaView's pvbatch and pvpython command-line executables replace the Qt GUI, through which most users control ParaView's back-end data processing and rendering engine, with a Python interpreter. Either may be used for batch processing, that is, to replay visualization sessions in an exact, easily repeated way. The input to either is the same kind of Python script described in the previous section.

Of the two, pvbatch is more specialized for batch processing and suited to running in an offline mode on dedicated data processing supercomputers because:

  • It does not take in commands from the terminal, which is usually unavailable on this class of machines.

Therefore you must supply the filename of the script you want pvbatch to execute.

  • It is permanently joined to the back-end server and thus does not require TCP socket connections to it.

Therefore in the scripts that you give to pvbatch it is not possible to Disconnect() from the paired server or Connect() to a different one.

  • It can be run directly as an MPI parallel program in which all pvbatch processes divide up the work and cooperate.

Therefore you typically start pvbatch like this:

[mpiexec -np <numprocessors>] pvbatch [args-for-pvbatch] script-filename [args-for-script]
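When batch runs are launched from another program rather than typed by hand, it can help to assemble this command line programmatically. The following is a minimal Python sketch that builds the argument list for the template above. The function name pvbatch_command, the assumption that pvbatch is on the PATH, and the -np launcher flag are all assumptions; adjust them for your MPI installation.

<source lang="python">
import shlex


def pvbatch_command(script, script_args=(), nprocs=None,
                    pvbatch_args=(), pvbatch="pvbatch", launcher="mpiexec"):
    """Build [launcher -np N] pvbatch [pvbatch_args] script [script_args].

    Hypothetical helper: assumes pvbatch is on the PATH and that the MPI
    launcher accepts -np for the process count.
    """
    cmd = []
    if nprocs is not None:
        # Optional MPI prefix: all pvbatch ranks cooperate on one script.
        cmd += [launcher, "-np", str(nprocs)]
    cmd.append(pvbatch)
    cmd += [str(a) for a in pvbatch_args]
    # pvbatch reads no commands from the terminal, so a script file is required.
    cmd.append(script)
    # Everything after the script name is passed through to the script.
    cmd += [str(a) for a in script_args]
    return cmd


if __name__ == "__main__":
    argv = pvbatch_command("level3.py", ["-#", 0, "-nt", 10], nprocs=8)
    # shlex.quote shows how the command would look when typed into a shell.
    print(" ".join(shlex.quote(a) for a in argv))
</source>

The returned list can be handed directly to subprocess.call, which sidesteps shell quoting problems with arguments such as -#.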

Creating the Input Deck

There are at least three ways to create a batch script.

The hardest is to write it by hand using the syntax described in the previous section. You can of course use any text editor for this, but you will probably be more productive with a more fully featured Python environment, such as IDLE or the Python shell within the ParaView GUI, which gives you interactive documentation, tab completion and quick preview capabilities. Another alternative is to let the ParaView GUI client record all of your actions into a Python script by using the Python Trace feature; once you become familiar with ParaView's Python syntax you can easily tweak the recorded script. The third, and to longtime ParaView users the most traditional, way is to record a ParaView state file and then load it with a small Python script, as demonstrated in the first example below.

Examples

Loading a state file and saving a rendered result

<source lang="python">
>>> from paraview.simple import *
>>> # Load the state
>>> servermanager.LoadState("/Users/berk/myteststate.pvsm")
</source>

At this point you have a working pipeline instantiated on the server, and you can use introspection to access and control anything within it. At its core ParaView is a visualization engine, so we will demonstrate by simply generating and saving an image.

<source lang="python">
>>> # Make sure that the view in the state is the active one so we don't have to refer to it by name.
>>> SetActiveView(GetRenderView())
>>> # Now render and save.
>>> Render()
>>> WriteImage("/Users/berk/image.png")
</source>

Parameter study

Parameter studies are one example of how batch processing can be extremely useful. In a parameter study, one or more pipeline parameters (a filename, a timestep, or a filter property, for example) are varied across some range while an otherwise identical script is replayed numerous times and the results are saved. After the suite of sessions completes, the results are easy to compare. For this type of work I recommend writing a higher-level script that varies the parameter and, for each value, spawns a pvbatch session in which the parameter is passed as an argument to the ParaView Python script.
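A higher-level driver of this kind can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a prescribed tool: the function name run_parameter_study and the --runid option name are hypothetical and must match whatever your pvbatch script actually parses, and pvbatch is assumed to be on the PATH.

<source lang="python">
import subprocess


def run_parameter_study(values, script, option="--value",
                        pvbatch="pvbatch", runner=subprocess.call):
    """Run one pvbatch session per parameter value, one after another.

    Hypothetical driver: each session receives a run id and the parameter
    value as script arguments. Returns {value: exit status}.
    """
    statuses = {}
    for runid, value in enumerate(values):
        cmd = [pvbatch, script, "--runid", str(runid), option, str(value)]
        # runner blocks until the session exits; subprocess.call returns its exit code.
        statuses[value] = runner(cmd)
    return statuses
</source>

For example, run_parameter_study(range(5), "study.py", option="--timestep") would run a (hypothetical) study.py five times with timesteps 0 through 4. Passing a different runner, for instance one that wraps each command in a batch-queue submission, lets the same loop drive a cluster instead of running locally.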

The following is a slightly condensed version of a hierarchical set of scripts written during a benchmark study. This benchmark is an example of a parameter study in which the number of triangles rendered in the scene is varied and afterward we examine the output to determine how the rendering rate differs as a function of that parameter change.

This top level script varies the number of triangles and then submits parallel jobs to the cluster's PBS batch queue. See the qsub manpages or ask your system administrators for the exact syntax of the submission command.

<source lang="bash">
RUNID=0
NNODES=8
TLIMIT=10
for NUMTRIS in 10 20 30 40 50
do
   mkdir -p ~/tmp/run${RUNID}
   qsub -N run${RUNID} \
       -l walltime=0:${TLIMIT}:0.0 \
       -l select=${NNODES}:ncpus=8:arch=wds024c \
       -j eo -e ~/tmp/run${RUNID}/outstreams.log \
       -v "RUNID=${RUNID},NNODES=${NNODES},NUMTRIS=${NUMTRIS}" \
       ~/level2.sh
   let RUNID+=1
done
</source>

The second-level script is executed whenever it reaches the top of PBS's priority queue. It examines the parameters it is given and then runs ParaView's pvbatch executable with them. It also does some bookkeeping that is helpful when debugging the batch submission process.

<source lang="bash">
echo "RUN NUMBER=${RUNID}"

# set up the MPI environment
source ${HOME}/openmpipaths.sh

# prepare and run the parallel pvbatch program for the parameter value we are given
batch_command="${HOME}/ParaView-3.8.1/build/bin/pvbatch ${HOME}/level3.py -# ${RUNID} -nt ${NUMTRIS}"
mpirun -np $NNODES --hostfile $PBS_NODEFILE $batch_command

# move the results to more permanent storage
mv /tmp/bench* ${HOME}/tmp/run${RUNID}
</source>

The final level is the script that is executed by pvbatch.
<source lang="python">
from paraview.simple import *
from argparse import ArgumentParser
import paraview.benchmark
import math
import sys
import time

# argparse (unlike optparse) accepts multi-letter single-dash options such as -nt
parser = ArgumentParser()
parser.add_argument("-#", "--runid", action="store", dest="runid", type=int,
                    default=-1, help="an identifier for this run")
parser.add_argument("-nt", "--triangles", action="store", dest="triangles", type=int,
                    default=1, help="millions of triangles to render")
options = parser.parse_args()

print("########################################")
print("RUNID = %d" % options.runid)
print("START_TIME = %s" % time.asctime())
print("ARGS = %s" % sys.argv)
print("OPTIONS = %s" % options)
print("########################################")

paraview.benchmark.maximize_logs()

# A sphere with equal Theta and Phi resolution "side" has roughly 2*side*side triangles.
TS = Sphere()
side = int(math.sqrt(options.triangles * 1000000 / 2))
TS.PhiResolution = side
TS.ThetaResolution = side

dr = Show()
view = GetActiveView()
view.UseImmediateMode = 0  # turn off immediate-mode rendering
Render()

# spin the camera and save one image per step
cam = GetActiveCamera()
for i in range(0, 50):
    cam.Azimuth(3)
    Render()
    WriteImage('/tmp/bench_%d_image_%d.jpg' % (options.runid, i))

print("total Polygons:" + str(dr.SMProxy.GetRepresentedDataInformation(0).GetPolygonCount()))
print("view.ViewSize:" + str(view.ViewSize))

paraview.benchmark.get_logs()
logname = "/tmp/bench_" + str(options.runid) + "_rawlog.txt"
paraview.benchmark.dump_logs(logname)

print("#######")
print("END_TIME = %s" % time.asctime())
</source>
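The sphere sizing in this script rests on a small piece of arithmetic. Assuming the standard vtkSphereSource tessellation of a closed sphere (theta_res triangles in each polar cap, plus two triangles per quad in the phi_res - 3 bands between the caps), the triangle count is 2 * theta_res * (phi_res - 2), which for equal resolutions is close to 2 * side * side. Solving for side gives the sqrt(N/2) used in the script. The pure-Python check below is a sketch of that estimate; the helper names are hypothetical.

<source lang="python">
import math


def sphere_resolution(million_triangles):
    """Resolution used for both PhiResolution and ThetaResolution (as in level3.py)."""
    return int(math.sqrt(million_triangles * 1000000 / 2))


def sphere_triangle_count(theta_res, phi_res):
    """Triangles in a closed sphere, assuming the vtkSphereSource layout:
    theta_res triangles per polar cap plus 2 * theta_res per band,
    with phi_res - 3 bands between the caps."""
    return 2 * theta_res + 2 * theta_res * (phi_res - 3)


side = sphere_resolution(1)
print(side, sphere_triangle_count(side, side))
</source>

Because the benchmark script also prints the actual polygon count, you can compare it with this estimate; if the two disagree (for example because the source limits its resolution), trust the printed count.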

Large data example

Another important example is visualizing extremely large datasets that cannot easily be worked with interactively. In this setting, the user first constructs a visualization of a small but representative dataset, typically by recording a session in the standard GUI client running on some small and easily accessed machine. Next, the user changes the filename property of the reader in the recorded session file. Finally, the user submits the script to a larger machine, which performs the visualization offline and saves the results for later inspection.

The essential thing you need to be able to do is replace the name and location of the original small dataset with those of the large one. There are two ways to do this.

The first way is to directly edit the filename in either the ParaView state file or the Python script where it is loaded. The task is made easier by the fact that readers conventionally name the input file property "FileName" (or a variant such as "FileNames", as in the excerpt below). Standard Python scripts are well described in other sections, so we describe ParaView state files here instead. A ParaView state file has the extension .pvsm, and its internal format is text-based XML. Simply open the pvsm file in a text editor, search for FileName, and replace all occurrences of the old name with the new one.

For reference, the portion of a pvsm file that specifies a reader's input file is: <source lang="xml">

   <Proxy group="sources" type="LegacyVTKFileReader" id="160" servers="1">
     <Property name="FileNameInfo" id="160.FileNameInfo" number_of_elements="1">
       <Element index="0" value="/Data/molar.vtk"/>
     </Property>
     <Property name="FileNames" id="160.FileNames" number_of_elements="1">
       <Element index="0" value="/Data/molar.vtk"/>
       <Domain name="files" id="160.FileNames.files"/>
     </Property>
     <Property name="TimestepValues" id="160.TimestepValues"/>
     <SubProxy name="Reader" servers="1"/>
   </Proxy>

</source>
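The text-editor search and replace can also be scripted, which helps when many state files must be retargeted. Below is a minimal sketch using Python's standard xml.etree.ElementTree module. It assumes the layout shown in the excerpt above (Property elements whose name begins with FileName, each holding Element children with a value attribute). The function name and the file paths in the usage comment are hypothetical, and ElementTree may not preserve every formatting detail of the original file, so keep a backup.

<source lang="python">
import xml.etree.ElementTree as ET


def replace_filenames(pvsm_text, old_path, new_path):
    """Return (new_text, count) with every FileName* property value
    equal to old_path replaced by new_path."""
    root = ET.fromstring(pvsm_text)
    count = 0
    for prop in root.iter("Property"):
        # Covers FileName, FileNames and FileNameInfo properties.
        if not prop.get("name", "").startswith("FileName"):
            continue
        for elem in prop.iter("Element"):
            if elem.get("value") == old_path:
                elem.set("value", new_path)
                count += 1
    return ET.tostring(root, encoding="unicode"), count


# Usage (hypothetical paths):
# with open("small.pvsm") as f:
#     text, n = replace_filenames(f.read(), "/Data/molar.vtk", "/big/molar.vtk")
# with open("large.pvsm", "w") as f:
#     f.write(text)
</source>

Checking the returned count is a cheap guard against a path that was spelled differently in the state file and therefore silently left unchanged.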

The second way is to set up the pipeline and then use introspection to find and change the filename. This approach is easier to parameterize but somewhat more fragile, since not all readers respond well to having their filenames changed once established. At the very least, try to change the filename before the pipeline first runs; otherwise more readers will be confused, and you will also waste time processing the smaller file. When loading state files, the proper place to do this is immediately after the LoadState command. In Python scripts, do it as close to the creation of the reader as possible, and certainly before any Update or Render commands. An example follows:

<source lang="python">
>>> from paraview.simple import *
>>> # Load the state
>>> servermanager.LoadState("/Users/berk/myteststate.pvsm")
>>> # Now the pipeline will be instantiated but it will not have updated yet.
>>> # You can programmatically obtain the reader from the pipeline starting with this command,
>>> # which lists all readers, sources and filters in the pipeline.
>>> GetSources()
{('box.ex2', '274'): <paraview.servermanager.ExodusIIReader object at 0x21b3eb70>}
>>> # But it is easier if you note that readers are typically named according to
>>> # the name of the file that they are created for.
>>> reader = FindSource('box.ex2')
>>> # Now you can change the filename with these two commands:
>>> reader.FileName = ['/path_to/can.ex2']
>>> reader.FileNameChanged()
</source>