[Paraview] Parallel anomaly

James Galbraith james.galbraith at inl.gov
Fri Feb 9 18:35:02 EST 2007


Hi Kent,

Thanks for the information.  I had no idea the command-line arguments 
that mpich adds were position-dependent.  Is that documented 
somewhere, hopefully somewhere outside your head??

After looking at your listings, it looks like you are seeing some of the 
same anomalies I was seeing.  The ssh command used to create the second 
pvserver has the following command-line parameters added by mpich: 
"\-p4amslave \-p4yourname www.server.edu \-p4rmrank 1".  When you look 
at the actual command line of the pvserver launched with 33413, it shows 
the arguments as: "4amslave -p4yourname www.server.edu -p4rmrank 1".  For 
some reason the leading "-p" is being stripped off the -p4amslave 
argument.  I'm not sure whether this is a problem, but it is something I 
noticed.
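
One way to narrow down where the "-p" disappears would be to log the 
arguments exactly as the startup script receives them, before pvserver 
sees them.  The following is only a minimal sketch: dump_args is a 
hypothetical helper name, not part of mpich or ParaView, and it shows 
what the script was handed, not what pvserver later does with it.

```shell
#!/bin/sh
# dump_args (hypothetical helper): print each argument on its own line,
# numbered and bracketed, so a missing leading "-p" or an argument that
# was split on spaces is easy to spot.
dump_args() {
    i=0
    for arg in "$@"; do
        i=$((i + 1))
        printf '%d: [%s]\n' "$i" "$arg"
    done
}
```

Calling dump_args "$@" >> /tmp/server.job.log at the top of the startup 
script (the log path is just an example) would record one numbered line 
per argument for each rank.  If -p4amslave is intact there, the 
stripping happens after the script hands the arguments to pvserver.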

I tried creating a startup script like the one you described.  It did 
insert the mpich arguments ahead of mine (complete with the missing -p 
ahead of the 4amslave arg):
    /home/galbja/paraview-2.4.4-bin/bin/pvserver-real smclinux 16390 4amslave -p4yourname smclinux -p4rmrank 1 -rc --use-offscreen-rendering
but when I open a data file and accept it, it still looks like only the 
first server is doing any work.  When I run the same scenario on our 
large SMP box (using SGI MPI), each of the four processes uses about 
100% of its CPU for a while.  I don't see this on my cluster or on my 
dual-quad system using mpich.  I'm not convinced the server processes 
know what is going on and who is supposed to be doing what...
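
For reference, a startup script of this kind can be sketched roughly as 
follows.  This is an illustration under assumptions, not the exact 
script used here: the PVSERVER variable and the run_pvserver function 
are hypothetical names, and the path and the two trailing options are 
simply the ones visible in the command line above.

```shell
#!/bin/bash
# Path to the real server binary; PVSERVER is a hypothetical override
# so the wrapper can be tried without ParaView installed.
PVSERVER="${PVSERVER:-/home/galbja/paraview-2.4.4-bin/bin/pvserver-real}"

run_pvserver() {
    # The arguments mpich adds arrive in "$@" and must stay first,
    # ahead of our own options.  Quoting "$@" keeps each argument as a
    # single word even if it contains spaces.
    "$PVSERVER" "$@" -rc --use-offscreen-rendering
}

# As the last line of a real startup script one would call:
#   run_pvserver "$@"
```

Setting PVSERVER=echo and calling run_pvserver with a few sample 
arguments prints the command line that would be used, which makes it 
easy to confirm that the mpich arguments come first.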

Any more ideas??  What to look at??

Thanks again for all your help

Jim

Kent Eschenberg wrote:
> I don't know if this will help. I'm working with 2.4.4 on a Redhat 
> Enterprise SMP system.
>
> I'm using mpich 1.2.7p1, the ch_p4 "device", and ssh instead of rsh. 
> This device generates some extra processes for communication (they 
> show up as using 0 CPU time) so one sees at least twice as many 
> processes as expected.
>
> I am using an almost trivial bash script, server.job, that does a few 
> things then executes
>
>    pvserver $@ -rc --client-host=www.client.edu
>
> The additional arguments passed to the script by mpich go in at the 
> "$@" and must come first. Mpich includes a few parameters that are 
> position dependent and the mpich code (linked to pvserver) can't find 
> its arguments if they are not where they are expected. Using this 
> script, unfortunately, will make the following a little bit harder to 
> follow.
>
> For this test I ran
>
>    $ mpirun -np 2 server.job
>
> Here is the information I get from ps. Each item includes the CPU time 
> followed by the command and its arguments. There were also a few 
> copies of ssh involved that I didn't list. I've edited the text a little:
>
>    www.server.edu is the SMP server
>    www.client.edu is my workstation running the client
>    "/me" is my working directory
>    "/paraview" is the path to pvserver
>
> This is the primary instance of the job run by mpirun:
> 0.0
> /bin/bash
> /me/server.job -p4pg /me/PI7282 -p4wd /home/me
>
> This is the first instance of pvserver:
> 0.7
> /paraview/pvserver
> -p4pg /me/PI7282 -p4wd /me -rc --client-host=www.client.edu
>
> This is probably the I/O process for the first pvserver:
> 0.0
> /paraview/pvserver
> -p4pg /me/PI7282 -p4wd /me -rc --client-host=www.client.edu
>
> It looks like this is used to fire off the second instance:
> 0.0
> ssh
> www.server.edu -l eschenbe -n /me/server.job www.server.edu 33413
> \-p4amslave \-p4yourname www.server.edu \-p4rmrank 1
>
> The second instance first runs my default shell, tcsh:
> 0.0
> tcsh
> -c /me/server.job www.server.edu 33413 \-p4amslave \-p4yourname
> www.server.edu \-p4rmrank 1
>
> This looks like the second instance of the job:
> 0.0
> /bin/bash
> /me/server.job www.server.edu 33413 -p4amslave -p4yourname
> www.server.edu -p4rmrank 1
>
> Finally, the second instance of pvserver:
> 0.7
> /paraview/pvserver
> www.server.edu 33413 4amslave -p4yourname www.server.edu
> -p4rmrank 1 -rc --client-host=www.client.edu
>
> And the communication process for the second pvserver:
> 0.0
> /paraview/pvserver
> www.server.edu 33413 4amslave -p4yourname www.server.edu
> -p4rmrank 1 -rc --client-host=www.client.edu
>
> It works fine as it is but changing MPICH to use "shmem" and skipping 
> server.job might reduce the number of processes a little.
>
> Kent
> Pittsburgh Supercomputing Center


-- 
"To Do Is To Be" - Plato
"To Be is To Do" - Descartes
"Do Be Do Be Do" - Sinatra

James A. Galbraith
Idaho National Laboratory (INL)
Battelle Energy Alliance (BEA)
P.O. box 1625
Idaho Falls, ID  83415-3779
James.Galbraith at inl.gov
(208)526-1864


