[Paraview] Parallel anomaly
Kent Eschenberg
eschenbe at psc.edu
Fri Feb 9 15:50:58 EST 2007
I don't know if this will help. I'm working with ParaView 2.4.4 on a Red Hat
Enterprise SMP system.
I'm using MPICH 1.2.7p1 with the ch_p4 "device" and ssh instead of rsh.
This device generates some extra processes for communication (they show
up as using 0 CPU time), so one sees at least twice as many processes as
expected.
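To tell the real ranks from the ch_p4 helpers in a listing like the one in this message, one rough heuristic is to sort the pvserver entries by CPU time. The function below is a hypothetical sketch (`count_pvserver_procs` is not part of MPICH or ParaView). It assumes one process per input line in the form `<cpu-seconds> <command> [args...]` and treats any pvserver with nonzero CPU time as a rank and any with zero as a helper. Note that `ps -o time` prints HH:MM:SS, so its output would need converting to seconds first.

```bash
# Hypothetical heuristic based on the observation above: with ch_p4, each
# pvserver rank has a helper process of the same name that shows ~0 CPU time.
# Input (assumed format), one process per line:
#   <cpu-seconds> <command> [args...]
# Only lines whose command is pvserver (with or without a path) are counted.
count_pvserver_procs() {
  awk '
    $2 ~ /(^|\/)pvserver$/ {
      if ($1 + 0 > 0) ranks++   # accumulated CPU time: a compute rank
      else helpers++            # zero CPU time: likely a p4 helper
    }
    END { printf "ranks=%d helpers=%d\n", ranks, helpers }
  '
}
```

Given the four pvserver entries from the two-process run in this message, it would report two ranks and two helpers. A helper that does accumulate CPU time would be miscounted, so treat the result as a hint only.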
I am using an almost trivial bash script, server.job, that does a few
things and then executes
pvserver "$@" -rc --client-host=www.client.edu
The additional arguments that MPICH passes to the script go in at the "$@"
and must come first. MPICH includes a few position-dependent parameters,
and the MPICH code (linked into pvserver) can't find its arguments if they
are not where it expects them. Unfortunately, using this script makes the
following a little harder to follow.
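A wrapper along the lines of server.job might look like the sketch below. The real script's setup steps are not given in this message, so they appear only as a placeholder comment. The function names `pvserver_argv` and `launch_pvserver` and the `PVSERVER`/`CLIENT_HOST` variables are hypothetical. The point is the argument order: MPICH's position-dependent arguments go first, passed as a quoted "$@" so each one stays a single word, followed by ParaView's own options.

```bash
# Hypothetical pieces of a server.job-style wrapper.
PVSERVER=${PVSERVER:-/paraview/pvserver}     # path to pvserver (assumed)
CLIENT_HOST=${CLIENT_HOST:-www.client.edu}   # workstation running the client

# Print the argument vector pvserver would receive, one word per line,
# so the ordering can be checked without starting anything.
pvserver_argv() {
  printf '%s\n' "$PVSERVER" "$@" -rc "--client-host=$CLIENT_HOST"
}

# Replace the wrapper shell with pvserver. MPICH's ch_p4 arguments
# (-p4pg/-p4wd on the master, "<host> <port> -p4amslave ..." on slaves)
# must come before ParaView's options.
launch_pvserver() {
  # ... site-specific setup (environment, paths, etc.) would go here ...
  exec "$PVSERVER" "$@" -rc "--client-host=$CLIENT_HOST"
}
```

In the actual script, the last line would be `launch_pvserver "$@"`. Using `exec` also removes the wrapper bash from the process list once pvserver starts, which is a small step toward fewer processes.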
For this test I ran
$ mpirun -np 2 server.job
Here is the information I get from ps. Each item includes the CPU time
followed by the command and its arguments. There were also a few copies
of ssh involved that I didn't list. I've edited the text a little:
www.server.edu is the SMP server
www.client.edu is my workstation running the client
"/me" is my working directory
"/paraview" is the path to pvserver
This is the primary instance of the job run by mpirun:
0.0
/bin/bash
/me/server.job -p4pg /me/PI7282 -p4wd /home/me
This is the first instance of pvserver:
0.7
/paraview/pvserver
-p4pg /me/PI7282 -p4wd /me -rc --client-host=www.client.edu
This is probably the I/O process for the first pvserver:
0.0
/paraview/pvserver
-p4pg /me/PI7282 -p4wd /me -rc --client-host=www.client.edu
It looks like this is used to fire off the second instance:
0.0
ssh
www.server.edu -l eschenbe -n /me/server.job www.server.edu 33413
-p4amslave -p4yourname www.server.edu -p4rmrank 1
The second instance first runs my default shell, tcsh:
0.0
tcsh
-c /me/server.job www.server.edu 33413 -p4amslave -p4yourname
www.server.edu -p4rmrank 1
This looks like the second instance of the job:
0.0
/bin/bash
/me/server.job www.server.edu 33413 -p4amslave -p4yourname
www.server.edu -p4rmrank 1
Finally, the second instance of pvserver:
0.7
/paraview/pvserver
www.server.edu 33413 4amslave -p4yourname www.server.edu
-p4rmrank 1 -rc --client-host=www.client.edu
And the communication process for the second pvserver:
0.0
/paraview/pvserver
www.server.edu 33413 4amslave -p4yourname www.server.edu
-p4rmrank 1 -rc --client-host=www.client.edu
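In the ps listing above, the slave instances receive `-p4rmrank <n>`, which appears to carry the process's rank, while the master started directly by mpirun gets `-p4pg`/`-p4wd` instead. A wrapper that needs per-rank behavior could read that value before handing off to pvserver. The sketch below uses a hypothetical helper, `p4_rank`. It relies on ch_p4 internals observed in this one run rather than any documented interface, and it reports 0 when `-p4rmrank` is absent.

```bash
# Hypothetical helper: infer the rank from MPICH 1.2.x ch_p4 arguments.
# Slaves were seen receiving "-p4rmrank <n>"; the master was not, so it
# is assumed to be rank 0. Heuristic only.
p4_rank() {
  local rank=0
  while [ $# -gt 0 ]; do
    if [ "$1" = "-p4rmrank" ] && [ $# -ge 2 ]; then
      rank=$2
      break
    fi
    shift
  done
  printf '%s\n' "$rank"
}
```

For example, `rank=$(p4_rank "$@")` in the wrapper could give each instance its own log file name before pvserver is started. Because the arguments are only scanned, not consumed, "$@" is still intact afterward.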
It works fine as it is, but switching MPICH to the "shmem" device and
skipping server.job might reduce the number of processes a little.
Kent
Pittsburgh Supercomputing Center