[Paraview] Parallel anomaly

Kent Eschenberg eschenbe at psc.edu
Fri Feb 9 15:50:58 EST 2007


I don't know if this will help. I'm working with ParaView 2.4.4 on a
Red Hat Enterprise SMP system.

I'm using MPICH 1.2.7p1 with the ch_p4 "device" and ssh instead of rsh.
This device starts some extra processes for communication (they show
up as using 0 CPU time), so one sees at least twice as many processes as
expected.

I am using an almost trivial bash script, server.job, that does a few
things and then executes

    pvserver $@ -rc --client-host=www.client.edu

The additional arguments that MPICH passes to the script go in at the "$@"
and must come first. MPICH includes a few parameters that are position
dependent, and the MPICH code (linked into pvserver) can't find its
arguments if they are not where it expects them. Unfortunately, using
this script makes the following a little harder to follow.
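
To illustrate the argument ordering described above, here is a minimal sketch of how a wrapper like server.job could assemble the pvserver command line. It is not the original script: the names CLIENT_HOST and build_argv are made up for illustration, and the function only prints the argument list instead of starting pvserver.

```shell
#!/bin/bash
# Sketch of the argument handling in a wrapper like server.job.
# CLIENT_HOST and build_argv are illustrative names, not from the original.
CLIENT_HOST="www.client.edu"

# Print the pvserver command line, one argument per line.
# MPICH's arguments come first; the quoted "$@" keeps each one intact.
# The ParaView options follow.
build_argv() {
    printf '%s\n' pvserver "$@" -rc "--client-host=${CLIENT_HOST}"
}

# Example: the arguments mpirun passed to the first instance in this test.
build_argv -p4pg /me/PI7282 -p4wd /me
```

In a real wrapper the last line would presumably be something like `exec pvserver "$@" -rc --client-host=www.client.edu`. Using exec would replace the bash process with pvserver, which should also remove one process per instance from the ps listing, though I have not verified that with ch_p4.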

For this test I ran

    $ mpirun -np 2 server.job

Here is the information I get from ps. Each item includes the CPU time
followed by the command and its arguments. There were also a few copies
of ssh involved that I didn't list. I've edited the text a little:

    www.server.edu is the SMP server
    www.client.edu is my workstation running the client
    "/me" is my working directory
    "/paraview" is the path to pvserver

This is the primary instance of the job run by mpirun:

    0.0
    /bin/bash
    /me/server.job -p4pg /me/PI7282 -p4wd /home/me

This is the first instance of pvserver:

    0.7
    /paraview/pvserver
    -p4pg /me/PI7282 -p4wd /me -rc --client-host=www.client.edu

This is probably the I/O process for the first pvserver:

    0.0
    /paraview/pvserver
    -p4pg /me/PI7282 -p4wd /me -rc --client-host=www.client.edu

It looks like this is used to fire off the second instance:

    0.0
    ssh
    www.server.edu -l eschenbe -n /me/server.job www.server.edu 33413
    \-p4amslave \-p4yourname www.server.edu \-p4rmrank 1

The second instance first runs my default shell, tcsh:

    0.0
    tcsh
    -c /me/server.job www.server.edu 33413 \-p4amslave \-p4yourname
    www.server.edu \-p4rmrank 1

This looks like the second instance of the job:

    0.0
    /bin/bash
    /me/server.job www.server.edu 33413 -p4amslave -p4yourname
    www.server.edu -p4rmrank 1

Finally, the second instance of pvserver:

    0.7
    /paraview/pvserver
    www.server.edu 33413 4amslave -p4yourname www.server.edu
    -p4rmrank 1 -rc --client-host=www.client.edu

And the communication process for the second pvserver:

    0.0
    /paraview/pvserver
    www.server.edu 33413 4amslave -p4yourname www.server.edu
    -p4rmrank 1 -rc --client-host=www.client.edu
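
A listing like the one above can be gathered with something along these lines. This is only a sketch: the exact ps options used for this post are not recorded here, and the function names and the grep pattern are illustrative choices.

```shell
# Keep only the lines that belong to the job. The pattern is an
# illustrative guess at what is interesting (pvserver, the wrapper, ssh).
job_filter() {
    grep -E 'pvserver|server\.job|ssh'
}

# Show accumulated CPU time and the full command line for the current
# user's processes. "-o time=" and "-o args=" are POSIX ps output fields;
# the trailing "=" suppresses the header line.
list_job_procs() {
    ps -u "$(id -un)" -o time= -o args= | job_filter
}
```

Running list_job_procs on the server while the job is up should show one line per process, including the idle communication processes that ch_p4 adds.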

It works fine as it is, but switching MPICH to the "shmem" device and
skipping server.job might reduce the number of processes a little.
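
For reference, skipping the wrapper would mean handing pvserver to mpirun directly. The sketch below only prints that command rather than running it; direct_launch_cmd is a made-up helper name. I have not tested this, and it assumes an MPICH built with the shared-memory device (ch_shmem in MPICH 1.x) that does not depend on the argument positions discussed above.

```shell
# Dry-run helper: print the direct launch command instead of running it.
# direct_launch_cmd is an illustrative name; $1 is the process count and
# $2 is the host running the ParaView client.
direct_launch_cmd() {
    local np=$1 client=$2
    echo "mpirun -np ${np} pvserver -rc --client-host=${client}"
}

# The two-process test from this post, without server.job.
direct_launch_cmd 2 www.client.edu
```

Whether pvserver still finds its options when launched this way would need to be checked on the actual system.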

Kent
Pittsburgh Supercomputing Center

