[Paraview] strange behaviour with PV3.10 on sgi altix system

pratik pratik.mallya at gmail.com
Thu Apr 28 07:56:28 EDT 2011


Hi,
I took the CMakeCache from the static build, toggled BUILD_SHARED_LIBS
to ON, recompiled, and it is working!
Previously I had added some Python bindings and other stuff too...
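
Since starting over from the static build's cache fixed things, the two caches presumably differed in more than the one option. For anyone who wants to check this on their own builds, here is a minimal sketch that lists the entries that differ between two CMakeCache.txt files. It assumes the usual `KEY:TYPE=VALUE` line format; the function names and the file paths in the usage part are only illustrative.

```python
def parse_cache(text):
    """Parse CMakeCache.txt content into a {key: value} dict."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the '//' and '#' comment lines CMake writes.
        if not line or line.startswith(("//", "#")):
            continue
        key_type, sep, value = line.partition("=")
        if not sep:
            continue
        key = key_type.split(":", 1)[0]  # drop the ':TYPE' suffix
        entries[key] = value
    return entries


def diff_caches(text_a, text_b):
    """Return {key: (value_a, value_b)} for every entry that differs."""
    a, b = parse_cache(text_a), parse_cache(text_b)
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(a.keys() | b.keys())
        if a.get(key) != b.get(key)
    }


if __name__ == "__main__":
    # Illustrative paths; point these at the two build directories.
    with open("BUILD/CMakeCache.txt") as fa, open("NEWBUILD/CMakeCache.txt") as fb:
        for key, (old, new) in diff_caches(fa.read(), fb.read()).items():
            print(f"{key}: {old!r} -> {new!r}")
```

Entries that record the build directory itself will always differ between two build trees, so those can be ignored when reading the output.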

pratik
On Thursday 28 April 2011 02:38 PM, pratik wrote:
> Also, I wrote to a person who seemed to have had the same problem with SGI 
> MPT, and this is what he wrote back (he did not use ParaView, but ran a 
> cluster of SGI Altix systems). I don't know much about MPI, so I'm still 
> looking into this, but I thought this may be of some assistance to the 
> ParaView developers if they try to see why this problem is occurring:
> On Wed, Apr 27, 2011 at 02:53:49PM +1000, pratik for help wrote:
>
>    
>> >  The startup mechanism for SGI MPI jobs is quite complex and depends on
>> >  the type of executable you are running. If you encounter errors such as
>> >  ctrl_connect/connect: Connection refused
>> >  or
>> >  mpirun: MPT error (MPI_RM_sethosts): err=-1: could not run executable
>> >  (case #3)
>> >  contact us for an explanation.
>> >  
>> >  Can you please explain why such errors occur? I am running paraview on a
>> >  sgi altix cluster and am getting the exact same error!
>>    
> I worked in some depth on MPT while we had our Altix.  Here are the
> details that I remember.
>
> During startup, mpirun will listen on a certain IP/port.  It puts the
> IP/port into an environment variable (MPI_ENVIRONMENT, perhaps? I
> forget, it starts with MPI_* though), and then starts the worker
> processes.  The worker processes (actually, 1 "shepherd" process per
> node) will examine $MPI_ENVIRONMENT, and then using those details,
> connect back to the mpirun process.  This connection is then used to
> communicate job details, as well as stdin/out/err.
>
> The error indicates that this connection could not be made.  The main
> reasons are: the $MPI_ENVIRONMENT variable hasn't been propagated
> properly; some other process has already connected to the mpirun (the
> mpirun will stop listening once it receives the right number of
> connections), usually because some other MPI program has already
> connected (e.g. if the MPI worker program is somehow run twice); or
> there is a firewall or (TCP/IP) networking issue between the remote
> worker nodes and the node running mpirun.
>
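
The startup handshake described above can be imitated with plain sockets to see where "Connection refused" comes from. This is only a sketch of the general pattern (listen, publish the address in an environment-style variable, accept a fixed number of connections, stop listening); it is not SGI MPT's actual code. The variable name is the guess from the message above, and all function names are made up for illustration.

```python
import socket

# Placeholder name; the real MPT variable name is not confirmed in this thread.
ENV_VAR = "MPI_ENVIRONMENT"


def start_launcher(expected):
    """Listen on an ephemeral loopback port; return (listener, env)."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(expected)
    host, port = listener.getsockname()
    return listener, {ENV_VAR: f"{host}:{port}"}


def shepherd_connect(env):
    """Read the launcher address from env and connect back to it."""
    host, port = env[ENV_VAR].rsplit(":", 1)
    return socket.create_connection((host, int(port)), timeout=5)


def run_demo(expected=2):
    """Let `expected` shepherds check in, then try one connection too many."""
    listener, env = start_launcher(expected)
    shepherds = [shepherd_connect(env) for _ in range(expected)]
    accepted = [listener.accept()[0] for _ in range(expected)]
    listener.close()  # the launcher stops listening once everyone has checked in

    try:
        late = shepherd_connect(env)  # e.g. a worker that was started twice
        late.close()
        outcome = "connected"
    except ConnectionRefusedError:
        outcome = "refused"
    finally:
        for sock in shepherds + accepted:
            sock.close()
    return outcome


if __name__ == "__main__":
    print(run_demo())
```

On a typical Linux host the late attempt should be refused, because nothing is listening on the port any more. If the variable is missing from the worker's environment, `shepherd_connect` fails before it ever reaches the network, which corresponds to the "not propagated properly" case.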
> I hope that helps.
>
> Kev
>
> -- Dr Kevin Pulo kevin.pulo at anu.edu.au Academic Consultant / Systems 
> Programmer www.kev.pulo.com.au NCI NF / ANU SF +61 2 6125 7568
>
> On Thursday 28 April 2011 02:33 PM, pratik wrote:
>> Hi,
>> Also, can you please tell me how I can rebuild ParaView with the 
>> *static* library of the plugin (i.e. the .a file)? Although this is a 
>> very inelegant way to solve the problem, I just want the 
>> functionality of the TensorGlyph plugin.
>>
>> pratik
>> On Thursday 28 April 2011 02:09 PM, pratik wrote:
>>> Hi Utkarsh,
>>> So... do you have a hunch about what may be going on? I'm sorry if I have 
>>> been troubling you a lot, but this is really the last stage in getting 
>>> PV working on the cluster: as I said before, the build with 
>>> BUILD_SHARED_LIBS off worked perfectly, but the one with that option 
>>> on did not...
>>> The thing that bothers me is that it is definitely not something 
>>> wrong with SGI MPT, since one build of pvserver is working fine. 
>>> Having come this far, it is driving me crazy that it is still not 
>>> able to work :(
>>>
>>> If you need any more information please do let me know. Once again 
>>> thanks for all the help.
>>>
>>> pratik
>>> On Wednesday 27 April 2011 08:35 PM, pratik wrote:
>>>> I think it is :
>>>> pratikm@annapurna:~/source/ParaView/ParaView-3.10.1/NEWBUILD/bin> ldd /home/pratikm/source/ParaView/ParaView-3.10.1/BUILD/bin/pvserver |grep mp
>>>>     libmpi++abi1002.so => /opt/sgi/mpt/mpt-1.23/lib/libmpi++abi1002.so (0x00002b61473a3000)
>>>>     libmpi.so => /opt/sgi/mpt/mpt-1.23/lib/libmpi.so (0x00002b61474d0000)
>>>>     libsma.so => /opt/sgi/mpt/mpt-1.23/lib/libsma.so (0x00002b6147854000)
>>>>     libxmpi.so => /opt/sgi/mpt/mpt-1.23/lib/libxmpi.so (0x00002b614ed46000)
>>>>     libimf.so => /opt/intel/Compiler/11.1/038/lib/intel64/libimf.so (0x00002b6153a16000)
>>>>     libsvml.so => /opt/intel/Compiler/11.1/038/lib/intel64/libsvml.so (0x00002b6153d69000)
>>>>     libintlc.so.5 => /opt/intel/Compiler/11.1/038/lib/intel64/libintlc.so.5 (0x00002b6153f80000)
>>>> pratikm@annapurna:~/source/ParaView/ParaView-3.10.1/NEWBUILD/bin> ldd /home/pratikm/source/ParaView/ParaView-3.10.1/NEWBUILD/bin/pvserver |grep mp
>>>>     libmpi++abi1002.so => /opt/sgi/mpt/mpt-1.23/lib/libmpi++abi1002.so (0x00002ac9ae446000)
>>>>     libmpi.so => /opt/sgi/mpt/mpt-1.23/lib/libmpi.so (0x00002ac9ae573000)
>>>>     libsma.so => /opt/sgi/mpt/mpt-1.23/lib/libsma.so (0x00002ac9ae8f7000)
>>>>     libxmpi.so => /opt/sgi/mpt/mpt-1.23/lib/libxmpi.so (0x00002ac9aee0f000)
>>>> pratikm@annapurna:~/source/ParaView/ParaView-3.10.1/NEWBUILD/bin> ldd /home/pratikm/install/bin/pvserver |grep mp
>>>>     libmpi.so => /usr/lib64/libmpi.so (0x00002b0a9c9e3000)
>>>>
>>>> These are precisely the libraries I specified. The first one is the 
>>>> pvserver with shared libs enabled, the second one is the pvserver 
>>>> with shared libs disabled, and the last one is the "installed" 
>>>> pvserver (the installed version of the build with shared libs 
>>>> enabled). I am not quite sure why the path has changed for the 
>>>> installed one, but I am 90% sure that /usr/lib64/libmpi.so refers 
>>>> to the same SGI MPI lib.
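
The "90% sure" above can be checked rather than guessed. Below is a minimal sketch, assuming the two paths printed by ldd in this message, that asks the filesystem whether they resolve to the same file, plus a small helper that pulls the MPI entries out of ldd output. The function names are made up for illustration.

```python
import os


def same_library(path_a, path_b):
    """Return True if both paths resolve to the same file on disk."""
    try:
        return os.path.samefile(path_a, path_b)
    except OSError:
        return False  # at least one path is missing or unreadable


def resolved_libs(ldd_output, needle="libmpi"):
    """Map library names containing `needle` to their resolved real paths."""
    libs = {}
    for line in ldd_output.splitlines():
        if "=>" not in line:
            continue
        name, target = (part.strip() for part in line.split("=>", 1))
        path = target.split(" (")[0].strip()  # drop the load address
        # Skip unresolved entries such as "not found".
        if needle in name and path.startswith("/"):
            libs[name] = os.path.realpath(path)
    return libs


if __name__ == "__main__":
    # Paths taken from the ldd output in this thread.
    print(same_library("/usr/lib64/libmpi.so",
                       "/opt/sgi/mpt/mpt-1.23/lib/libmpi.so"))
```

This only detects symlinks and hard links to one file. A separate copy of the same library would report False, so a False result means "look closer", not "definitely a different MPI".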
>>>>
>>>> pratik
>>>> On Wednesday 27 April 2011 07:26 PM, Utkarsh Ayachit wrote:
>>>>> Do a "pvserver --ldd", is it using the correct mpi libraries?
>>>>>
>>>>> Utkarsh
>>>>>
>>>>> On Wed, Apr 27, 2011 at 8:43 AM, pratik<pratik.mallya at gmail.com>  
>>>>> wrote:
>>>>>> Also, I tried to start the pvserver (with shared libraries enabled)
>>>>>> on just the head node:
>>>>>> pratikm@annapurna:~/install/bin> /usr/bin/mpirun -v -np 2 /home/pratikm/install/bin/pvserver
>>>>>> MPI: libxmpi.so 'SGI MPT 1.23  03/28/09 11:45:59'
>>>>>> MPI: libmpi.so  'SGI MPT 1.23  03/28/09 11:43:39'
>>>>>>
>>>>>> and it just hangs there!
>>>>>>
>>>>>> pratik
>>>>>> On Wednesday 27 April 2011 06:05 PM, pratik wrote:
>>>>>>> Oh! I'm sorry about that...
>>>>>>> The client stalls indefinitely, but the server stops executing. Since
>>>>>>> I am running PV under PBS, the output file of the mpirun gives this:
>>>>>>> MPI: libxmpi.so 'SGI MPT 1.23  03/28/09 11:45:59'
>>>>>>> MPI: libmpi.so  'SGI MPT 1.23  03/28/09 11:43:39'
>>>>>>>     MPI Environmental Settings
>>>>>>> MPI: MPI_DSM_DISTRIBUTE (default: not set) : 1
>>>>>>> ctrl_connect/connect: Connection refused
>>>>>>> ctrl_connect/connect: Connection refused
>>>>>>> ctrl_connect/connect: Connection refused
>>>>>>> ctrl_connect/connect: Connection refused
>>>>>>> ctrl_connect/connect: Connection refused
>>>>>>> ctrl_connect/connect: Connection refused
>>>>>>> ctrl_connect/connect: Connection refused
>>>>>>> ctrl_connect/connect: Connection refused
>>>>>>> MPI: MPI_COMM_WORLD rank 2 has terminated without calling 
>>>>>>> MPI_Finalize()
>>>>>>> MPI: aborting job
>>>>>>>
>>>>>>> Attached is the CMakeCache of my server if you want to look at it.
>>>>>>>
>>>>>>> pratik
>>>>>>> On Wednesday 27 April 2011 05:55 PM, Utkarsh Ayachit wrote:
>>>>>>>> You need to be more specific about the "something" that's going 
>>>>>>>> wrong
>>>>>>>> before anyone can provide any additional information.
>>>>>>>>
>>>>>>>> Utkarsh
>>>>>>>>
>>>>>>>> On Wed, Apr 27, 2011 at 3:26 AM, 
>>>>>>>> pratik<pratik.mallya at gmail.com>    wrote:
>>>>>>>>> Hi,
>>>>>>>>> I built 2 versions of PV on the SGI Altix cluster here (SGI MPT
>>>>>>>>> MPI): one with BUILD_SHARED_LIBS enabled and one without. Now, the
>>>>>>>>> static pvserver functions properly (I am accessing it from my laptop
>>>>>>>>> via the reverse connection method) BUT the one with BUILD_SHARED_LIBS
>>>>>>>>> enabled does not! Can this behaviour be explained? (The second one
>>>>>>>>> fails to establish a connection... something is wrong with pvserver.)
>>>>>>>>> I have EXACTLY the same CMakeCache on both builds EXCEPT the
>>>>>>>>> BUILD_SHARED_LIBS option.
>>>>>>>>> I know that there are many, many things that could go wrong in a
>>>>>>>>> cluster installation, so any hints/experience/hunch as to what is
>>>>>>>>> going on is welcome.
>>>>>>>>>
>>>>>>>>> pratik
