[Paraview] strange behaviour with PV3.10 on sgi altix system
pratik
pratik.mallya at gmail.com
Thu Apr 28 07:56:28 EDT 2011
Hi,
I took the CMakeCache from the static build, toggled BUILD_SHARED_LIBS to ON,
recompiled, and it is working!
Previously I had also added some Python bindings and other settings...
pratik
On Thursday 28 April 2011 02:38 PM, pratik wrote:
> Also, I wrote to a person who seemed to have had the same problem with SGI
> MPT, and this is what he wrote back (he did not use ParaView, but ran a
> cluster of SGI Altix systems). I don't know much about MPI, so I'm still
> looking into this, but I thought it may be of some assistance to the
> ParaView developers if they try to work out why this problem is occurring:
> On Wed, Apr 27, 2011 at 02:53:49PM +1000, pratik for help wrote:
>
>
>> > The startup mechanism for SGI MPI jobs is quite complex and depends on
>> > the type of executable you are running. If you encounter errors such as
>> > ctrl_connect/connect: Connection refused
>> > or
>> > mpirun: MPT error (MPI_RM_sethosts): err=-1: could not run executable
>> > (case #3)
>> > contact us for an explanation.
>> >
>> > Can you please explain why such errors occur? I am running ParaView on an
>> > SGI Altix cluster and am getting exactly the same error!
>>
> I worked in some depth on MPT while we had our Altix. Here are the
> details that I remember.
>
> During startup, mpirun will listen on a certain IP/port. It puts the
> IP/port into an environment variable (MPI_ENVIRONMENT, perhaps? I
> forget; it starts with MPI_*, though), and then starts the worker
> processes. The worker processes (actually, one "shepherd" process per
> node) will examine $MPI_ENVIRONMENT and, using those details,
> connect back to the mpirun process. This connection is then used to
> communicate job details, as well as stdin/out/err.
>
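> (A minimal sketch of that connect-back pattern, not MPT's actual
> protocol; the MPT_LAUNCH_ADDR variable and the ./shepherd binary below
> are made up for illustration, since I don't remember the real names:)
>
>     # launcher (mpirun side): listen, publish the address, spawn the workers
>     import os, socket, subprocess
>
>     srv = socket.socket()
>     srv.bind(("", 0))                         # let the OS pick a free port
>     srv.listen(8)
>     host, port = socket.gethostname(), srv.getsockname()[1]
>     env = dict(os.environ, MPT_LAUNCH_ADDR="%s:%d" % (host, port))
>     subprocess.Popen(["./shepherd"], env=env) # in reality, one shepherd per node
>     conn, _ = srv.accept()                    # job details and stdin/out/err flow over conn
>
>     # shepherd (worker side, a separate process): read the address, connect back
>     host, port = os.environ["MPT_LAUNCH_ADDR"].split(":")
>     sock = socket.socket()
>     sock.connect((host, int(port)))           # "Connection refused" would surface here
>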
> The error indicates that this connection could not be made. The main
> reasons are: the $MPI_ENVIRONMENT variable hasn't been propagated
> properly; some other process has already connected to the mpirun
> (mpirun stops listening once it receives the right number of
> connections), usually because some other MPI program has already
> connected (e.g. if the MPI worker program is somehow run twice); or
> there is a firewall or (TCP/IP) networking issue between the remote
> worker nodes and the node running mpirun.
>
> I hope that helps.
>
> Kev
>
> --
> Dr Kevin Pulo                               kevin.pulo at anu.edu.au
> Academic Consultant / Systems Programmer    www.kev.pulo.com.au
> NCI NF / ANU SF                             +61 2 6125 7568
>
> On Thursday 28 April 2011 02:33 PM, pratik wrote:
>> Hi,
>> Also, can you please tell me how I can rebuild ParaView against the
>> *static* library of the plugin (i.e. the .a file)? Although this is a
>> very inelegant way to solve the problem, I just want the
>> functionality of the TensorGlyph plugin.
>>
>> pratik
>> On Thursday 28 April 2011 02:09 PM, pratik wrote:
>>> Hi Utkarsh,
>>> So... do you have a hunch about what may be going on? I'm sorry if I have
>>> been troubling you a lot, but this is really the last stage in getting
>>> PV working on the cluster: as I said before, the build with
>>> BUILD_SHARED_LIBS off worked perfectly, but the one with that option
>>> on did not...
>>> The thing that bothers me is that it is definitely not something
>>> wrong with SGI MPT, since one build of pvserver is working fine.
>>> Having got this far, it is driving me crazy that it still does not
>>> work :(
>>>
>>> If you need any more information please do let me know. Once again
>>> thanks for all the help.
>>>
>>> pratik
>>> On Wednesday 27 April 2011 08:35 PM, pratik wrote:
>>>> I think it is:
>>>> pratikm at annapurna:~/source/ParaView/ParaView-3.10.1/NEWBUILD/bin> ldd /home/pratikm/source/ParaView/ParaView-3.10.1/BUILD/bin/pvserver |grep mp
>>>> libmpi++abi1002.so => /opt/sgi/mpt/mpt-1.23/lib/libmpi++abi1002.so (0x00002b61473a3000)
>>>> libmpi.so => /opt/sgi/mpt/mpt-1.23/lib/libmpi.so (0x00002b61474d0000)
>>>> libsma.so => /opt/sgi/mpt/mpt-1.23/lib/libsma.so (0x00002b6147854000)
>>>> libxmpi.so => /opt/sgi/mpt/mpt-1.23/lib/libxmpi.so (0x00002b614ed46000)
>>>> libimf.so => /opt/intel/Compiler/11.1/038/lib/intel64/libimf.so (0x00002b6153a16000)
>>>> libsvml.so => /opt/intel/Compiler/11.1/038/lib/intel64/libsvml.so (0x00002b6153d69000)
>>>> libintlc.so.5 => /opt/intel/Compiler/11.1/038/lib/intel64/libintlc.so.5 (0x00002b6153f80000)
>>>> pratikm at annapurna:~/source/ParaView/ParaView-3.10.1/NEWBUILD/bin> ldd /home/pratikm/source/ParaView/ParaView-3.10.1/NEWBUILD/bin/pvserver |grep mp
>>>> libmpi++abi1002.so => /opt/sgi/mpt/mpt-1.23/lib/libmpi++abi1002.so (0x00002ac9ae446000)
>>>> libmpi.so => /opt/sgi/mpt/mpt-1.23/lib/libmpi.so (0x00002ac9ae573000)
>>>> libsma.so => /opt/sgi/mpt/mpt-1.23/lib/libsma.so (0x00002ac9ae8f7000)
>>>> libxmpi.so => /opt/sgi/mpt/mpt-1.23/lib/libxmpi.so (0x00002ac9aee0f000)
>>>> pratikm at annapurna:~/source/ParaView/ParaView-3.10.1/NEWBUILD/bin> ldd /home/pratikm/install/bin/pvserver |grep mp
>>>> libmpi.so => /usr/lib64/libmpi.so (0x00002b0a9c9e3000)
>>>>
>>>> These are precisely the libraries I specified. The first is the
>>>> pvserver built with shared libs enabled, the second is the one with
>>>> shared libs disabled, and the last is the "installed" pvserver (the
>>>> installed copy of the shared-libs build).
>>>> Again, for the last, installed pvserver I am not quite sure why the
>>>> path has changed, but I am 90% sure that /usr/lib64/libmpi.so refers
>>>> to the same SGI MPI library.
>>>>
>>>> pratik
>>>> On Wednesday 27 April 2011 07:26 PM, Utkarsh Ayachit wrote:
>>>>> Do a "pvserver --ldd"; is it using the correct MPI libraries?
>>>>>
>>>>> Utkarsh
>>>>>
>>>>> On Wed, Apr 27, 2011 at 8:43 AM, pratik<pratik.mallya at gmail.com>
>>>>> wrote:
>>>>>> Also, I tried to start pvserver (with shared libraries enabled)
>>>>>> on just the head node:
>>>>>> pratikm at annapurna:~/install/bin> /usr/bin/mpirun -v -np 2
>>>>>> /home/pratikm/install/bin/pvserver
>>>>>> MPI: libxmpi.so 'SGI MPT 1.23 03/28/09 11:45:59'
>>>>>> MPI: libmpi.so 'SGI MPT 1.23 03/28/09 11:43:39'
>>>>>>
>>>>>> and it just hangs there!
>>>>>>
>>>>>> pratik
>>>>>> On Wednesday 27 April 2011 06:05 PM, pratik wrote:
>>>>>>> Oh! I'm sorry about that...
>>>>>>> The client stalls indefinitely, but the server stops executing.
>>>>>>> Since I am running PV through PBS, the output file from mpirun
>>>>>>> gives this:
>>>>>>> MPI: libxmpi.so 'SGI MPT 1.23 03/28/09 11:45:59'
>>>>>>> MPI: libmpi.so 'SGI MPT 1.23 03/28/09 11:43:39'
>>>>>>> MPI Environmental Settings
>>>>>>> MPI: MPI_DSM_DISTRIBUTE (default: not set) : 1
>>>>>>> ctrl_connect/connect: Connection refused
>>>>>>> ctrl_connect/connect: Connection refused
>>>>>>> ctrl_connect/connect: Connection refused
>>>>>>> ctrl_connect/connect: Connection refused
>>>>>>> ctrl_connect/connect: Connection refused
>>>>>>> ctrl_connect/connect: Connection refused
>>>>>>> ctrl_connect/connect: Connection refused
>>>>>>> ctrl_connect/connect: Connection refused
>>>>>>> MPI: MPI_COMM_WORLD rank 2 has terminated without calling
>>>>>>> MPI_Finalize()
>>>>>>> MPI: aborting job
>>>>>>>
>>>>>>> Attached is the CMakeCache of my server if you want to look at it.
>>>>>>>
>>>>>>> pratik
>>>>>>> On Wednesday 27 April 2011 05:55 PM, Utkarsh Ayachit wrote:
>>>>>>>> You need to be more specific about the "something" that's going
>>>>>>>> wrong
>>>>>>>> before anyone can provide any additional information.
>>>>>>>>
>>>>>>>> Utkarsh
>>>>>>>>
>>>>>>>> On Wed, Apr 27, 2011 at 3:26 AM,
>>>>>>>> pratik<pratik.mallya at gmail.com> wrote:
>>>>>>>>> Hi,
>>>>>>>>> I built two versions of PV on the SGI Altix cluster here (SGI MPT
>>>>>>>>> MPI): one with BUILD_SHARED_LIBS enabled and one without. The
>>>>>>>>> static pvserver works properly (I am accessing it from my laptop
>>>>>>>>> via the reverse-connection method), BUT the one with shared libs
>>>>>>>>> enabled does not! Can this behaviour be explained? (The second
>>>>>>>>> one fails to establish a connection; something is wrong with
>>>>>>>>> pvserver.)
>>>>>>>>> I have EXACTLY the same CMakeCache on both builds EXCEPT for the
>>>>>>>>> BUILD_SHARED_LIBS option.
>>>>>>>>> I know that there are many, many things that can go wrong in a
>>>>>>>>> cluster installation, so any hints/experience/hunches as to what
>>>>>>>>> is going on are welcome.
>>>>>>>>>
>>>>>>>>> pratik