[Paraview] strange behaviour with PV3.10 on sgi altix system

pratik pratik.mallya at gmail.com
Thu Apr 28 05:08:02 EDT 2011


Also, I wrote to someone who seemed to have the same problem with SGI 
MPT, and this is what he wrote back (he did not use ParaView, but ran a 
cluster of SGI Altix systems). I don't know much about MPI, so I'm still 
looking into this, but I thought it might be of some assistance to the 
ParaView developers when they try to see why this problem is occurring:

On Wed, Apr 27, 2011 at 02:53:49PM +1000, pratik for help wrote:


> >  The startup mechanism for SGI MPI jobs is quite complex and depends on
> >  the type of executable you are running. If you encounter errors such as
> >  ctrl_connect/connect: Connection refused
> >  or
> >  mpirun: MPT error (MPI_RM_sethosts): err=-1: could not run executable
> >  (case #3)
> >  contact us for an explanation.
> >  
> >  Can you please explain why such errors occur? I am running ParaView on
> >  an SGI Altix cluster and am getting the exact same error!
>    
I worked in some depth on MPT while we had our Altix.  Here are the
details that I remember.

During startup, mpirun will listen on a certain IP/port.  It puts the
IP/port into an environment variable (MPI_ENVIRONMENT, perhaps? I
forget, it starts with MPI_* though), and then starts the worker
processes.  The worker processes (actually, 1 "shepherd" process per
node) will examine $MPI_ENVIRONMENT, and then using those details,
connect back to the mpirun process.  This connection is then used to
communicate job details, as well as stdin/out/err.

The error indicates that this connection could not be made.  The main
reasons are: either the $MPI_ENVIRONMENT variable hasn't been propagated
properly to the worker nodes; or some other process has already connected
to the mpirun (mpirun stops listening once it receives the expected
number of connections, so this can happen if the MPI worker program is
somehow run twice); or there is a firewall or (TCP/IP) networking issue
between the remote worker nodes and the node running mpirun.

I hope that helps.

Kev

-- Dr Kevin Pulo kevin.pulo at anu.edu.au Academic Consultant / Systems 
Programmer www.kev.pulo.com.au NCI NF / ANU SF +61 2 6125 7568


On Thursday 28 April 2011 02:33 PM, pratik wrote:
> Hi,
> Also, can you please tell me how I can rebuild ParaView with the 
> *static* library of the plugin (i.e. the .a file)? Although this is a 
> very inelegant way to solve the problem, I just want the functionality 
> of the TensorGlyph plugin.
>
> pratik
> On Thursday 28 April 2011 02:09 PM, pratik wrote:
>> Hi Utkarsh,
>> So...do you have a hunch what may be going on? I'm sorry if I have 
>> been troubling you a lot, but this is really the last stage of getting PV 
>> working on the cluster: as I said before, the build with 
>> BUILD_SHARED_LIBS off worked perfectly, but the one with that option 
>> on did not....
>> The thing that bothers me is that it is definitely not something 
>> wrong with SGI MPT, since one build of pvserver is working fine. 
>> Having come this far, it is driving me crazy that it still does not 
>> work :(
>>
>> If you need any more information please do let me know. Once again 
>> thanks for all the help.
>>
>> pratik
>> On Wednesday 27 April 2011 08:35 PM, pratik wrote:
>>> I think it is:
>>> pratikm at annapurna:~/source/ParaView/ParaView-3.10.1/NEWBUILD/bin> ldd /home/pratikm/source/ParaView/ParaView-3.10.1/BUILD/bin/pvserver |grep mp
>>>     libmpi++abi1002.so => /opt/sgi/mpt/mpt-1.23/lib/libmpi++abi1002.so (0x00002b61473a3000)
>>>     libmpi.so => /opt/sgi/mpt/mpt-1.23/lib/libmpi.so (0x00002b61474d0000)
>>>     libsma.so => /opt/sgi/mpt/mpt-1.23/lib/libsma.so (0x00002b6147854000)
>>>     libxmpi.so => /opt/sgi/mpt/mpt-1.23/lib/libxmpi.so (0x00002b614ed46000)
>>>     libimf.so => /opt/intel/Compiler/11.1/038/lib/intel64/libimf.so (0x00002b6153a16000)
>>>     libsvml.so => /opt/intel/Compiler/11.1/038/lib/intel64/libsvml.so (0x00002b6153d69000)
>>>     libintlc.so.5 => /opt/intel/Compiler/11.1/038/lib/intel64/libintlc.so.5 (0x00002b6153f80000)
>>> pratikm at annapurna:~/source/ParaView/ParaView-3.10.1/NEWBUILD/bin> ldd /home/pratikm/source/ParaView/ParaView-3.10.1/NEWBUILD/bin/pvserver |grep mp
>>>     libmpi++abi1002.so => /opt/sgi/mpt/mpt-1.23/lib/libmpi++abi1002.so (0x00002ac9ae446000)
>>>     libmpi.so => /opt/sgi/mpt/mpt-1.23/lib/libmpi.so (0x00002ac9ae573000)
>>>     libsma.so => /opt/sgi/mpt/mpt-1.23/lib/libsma.so (0x00002ac9ae8f7000)
>>>     libxmpi.so => /opt/sgi/mpt/mpt-1.23/lib/libxmpi.so (0x00002ac9aee0f000)
>>> pratikm at annapurna:~/source/ParaView/ParaView-3.10.1/NEWBUILD/bin> ldd /home/pratikm/install/bin/pvserver |grep mp
>>>     libmpi.so => /usr/lib64/libmpi.so (0x00002b0a9c9e3000)
>>>
>>> These are precisely the libraries I specified; the first one is the 
>>> pvserver with shared libs enabled, the second one with shared libs 
>>> disabled, and the last one is the "installed" pvserver (the installed 
>>> version of pvserver with shared libs enabled).
>>> Again, the last one is the "installed" pvserver; I am not quite sure 
>>> why the path has changed, but I am 90% sure that 
>>> /usr/lib64/libmpi.so refers to the same SGI MPI library.
>>>
>>> pratik
>>> On Wednesday 27 April 2011 07:26 PM, Utkarsh Ayachit wrote:
>>>> Run "ldd pvserver"; is it using the correct MPI libraries?
>>>>
>>>> Utkarsh
>>>>
>>>> On Wed, Apr 27, 2011 at 8:43 AM, pratik<pratik.mallya at gmail.com>  
>>>> wrote:
>>>>> Also, I tried to start pvserver (with shared libraries 
>>>>> enabled) on just
>>>>> the head node:
>>>>> pratikm at annapurna:~/install/bin>  /usr/bin/mpirun -v -np 2
>>>>> /home/pratikm/install/bin/pvserver
>>>>> MPI: libxmpi.so 'SGI MPT 1.23  03/28/09 11:45:59'
>>>>> MPI: libmpi.so  'SGI MPT 1.23  03/28/09 11:43:39'
>>>>>
>>>>> and it just hangs there!
>>>>>
>>>>> pratik
>>>>> On Wednesday 27 April 2011 06:05 PM, pratik wrote:
>>>>>> Oh! I'm sorry about that....
>>>>>> The client stalls indefinitely, but the server stops 
>>>>>> executing. Since
>>>>>> I am running PV using PBS, the output file of the mpirun gives this:
>>>>>> MPI: libxmpi.so 'SGI MPT 1.23  03/28/09 11:45:59'
>>>>>> MPI: libmpi.so  'SGI MPT 1.23  03/28/09 11:43:39'
>>>>>>     MPI Environmental Settings
>>>>>> MPI: MPI_DSM_DISTRIBUTE (default: not set) : 1
>>>>>> ctrl_connect/connect: Connection refused
>>>>>> ctrl_connect/connect: Connection refused
>>>>>> ctrl_connect/connect: Connection refused
>>>>>> ctrl_connect/connect: Connection refused
>>>>>> ctrl_connect/connect: Connection refused
>>>>>> ctrl_connect/connect: Connection refused
>>>>>> ctrl_connect/connect: Connection refused
>>>>>> ctrl_connect/connect: Connection refused
>>>>>> MPI: MPI_COMM_WORLD rank 2 has terminated without calling 
>>>>>> MPI_Finalize()
>>>>>> MPI: aborting job
>>>>>>
>>>>>> Attached is the CMakeCache of my server build if you want to look at it.
>>>>>>
>>>>>> pratik
>>>>>> On Wednesday 27 April 2011 05:55 PM, Utkarsh Ayachit wrote:
>>>>>>> You need to be more specific about the "something" that's going 
>>>>>>> wrong
>>>>>>> before anyone can provide any additional information.
>>>>>>>
>>>>>>> Utkarsh
>>>>>>>
>>>>>>> On Wed, Apr 27, 2011 at 3:26 AM, 
>>>>>>> pratik<pratik.mallya at gmail.com>    wrote:
>>>>>>>> Hi,
>>>>>>>> I built two versions of PV on the SGI Altix cluster here (SGI MPT
>>>>>>>> MPI), one with BUILD_SHARED_LIBS enabled and one without. Now, the
>>>>>>>> static pvserver functions properly (I am accessing it through my
>>>>>>>> laptop via the reverse connection method), BUT the one with
>>>>>>>> shared libs enabled does not! Can this behaviour be explained?
>>>>>>>> (The second one fails to establish a connection; something is
>>>>>>>> wrong with pvserver.)
>>>>>>>> I have EXACTLY the same CMakeCache on both builds EXCEPT for the
>>>>>>>> BUILD_SHARED_LIBS option.
>>>>>>>> I know that there are many, many things that can go wrong in a
>>>>>>>> cluster installation, so any hints/experience/hunches as to what
>>>>>>>> is going on are welcome.
>>>>>>>>
>>>>>>>> pratik
>>>>>>>> _______________________________________________
>>>>>>>> Powered by www.kitware.com
>>>>>>>>
>>>>>>>> Visit other Kitware open-source projects at
>>>>>>>> http://www.kitware.com/opensource/opensource.html
>>>>>>>>
>>>>>>>> Please keep messages on-topic and check the ParaView Wiki at:
>>>>>>>> http://paraview.org/Wiki/ParaView
>>>>>>>>
>>>>>>>> Follow this link to subscribe/unsubscribe:
>>>>>>>> http://www.paraview.org/mailman/listinfo/paraview
>>>>>>>>
>>>>>
>>>
>>
>
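For what it's worth, the repeated "ctrl_connect/connect: Connection refused" lines in the PBS log quoted above are the generic TCP symptom of connecting to an address where nothing is (or is no longer) listening, independent of MPT itself. A minimal reproduction of just that symptom, assuming nothing about MPT's internals:

```python
import errno
import socket

def connect_to_closed_port():
    """Connect to a port with no listener and report whether the OS
    answers with ECONNREFUSED, the error behind "Connection refused"."""
    probe = socket.socket()
    probe.bind(("127.0.0.1", 0))        # reserve a free port number...
    port = probe.getsockname()[1]
    probe.close()                       # ...then close it: no listener now
    try:
        socket.create_connection(("127.0.0.1", port), timeout=2)
    except OSError as e:
        return e.errno == errno.ECONNREFUSED
    return False
```

This is consistent with Kevin's diagnosis earlier in the thread: the shepherd processes are dialing an address where mpirun is not (or no longer) accepting connections, whether because the rendezvous variable was not propagated, the slots were already consumed, or a firewall got in the way.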

