[Paraview] Failing to connect to an MPI server in certain cases

Sean Ziegeler seanzig at users.sourceforge.net
Thu Feb 21 18:31:08 EST 2008


Well, it seems to be working now.  My only guess is that it had 
something to do with the X server.  I compiled with regular OpenGL (not 
Mesa), and used the --use-offscreen-rendering switch, so I assume it was 
trying to use pbuffers or something similar that still requires access to 
an X server.  (I did this on purpose, as I do actually want this 
capability eventually, but for now I don't have remote X access to each 
machine.)

At first it was working intermittently.  When it did work it waited 
several seconds, then told me that remote rendering would be disabled 
(which is fine).

After that, I explicitly cleared the DISPLAY env var in the batch 
script.  Now it connects instantly (it still reports that remote 
rendering is disabled) and works every time.
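
For anyone who launches the server from a wrapper rather than a plain 
batch script, here is a minimal sketch of the same idea in Python: copy 
the environment, drop DISPLAY, and start the process with that copy.  The 
helper names (env_without_display, launch_without_display) are made up 
for illustration and are not part of ParaView or GridEngine, and I am 
only assuming this is equivalent to clearing the variable in the batch 
script.

```python
import os
import subprocess


def env_without_display(base_env=None):
    """Return a copy of the environment with DISPLAY removed.

    Hypothetical helper: base_env defaults to the current process
    environment; the original mapping is never modified.
    """
    env = dict(os.environ if base_env is None else base_env)
    env.pop("DISPLAY", None)  # no error if DISPLAY was not set
    return env


def launch_without_display(cmd, base_env=None):
    """Start cmd (a list of arguments) with DISPLAY cleared.

    Hypothetical helper: returns the subprocess.Popen handle so the
    caller can wait on it or inspect the exit code.
    """
    return subprocess.Popen(cmd, env=env_without_display(base_env))
```

A call might look like launch_without_display(["mpirun", "pvserver", 
"--use-offscreen-rendering"]).wait(), where the exact mpirun arguments 
depend on the site and the queue system.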

Perhaps it was a communication timeout.  While the server process was 
waiting for a non-responsive X server, perhaps the client gave up?

Thanks,
Sean

Moreland, Kenneth wrote:
> Offhand I know of nothing that should cause your problem.  We use PBS to launch jobs on our vis clusters and it works fine.  We do have to use reverse connections because (1) outside computers cannot make connections to the cluster nodes and (2) we do not know where the server is going to be allocated anyway.
> 
> Do you have any information on the pvserver job?  Do you have its output?  Is it exiting normally or crashing?  Is there any chance you could run it in a debugger?
> 
> -Ken
> 
>> -----Original Message-----
>> From: paraview-bounces+kmorel=sandia.gov at paraview.org [mailto:paraview-
>> bounces+kmorel=sandia.gov at paraview.org] On Behalf Of Sean Ziegeler
>> Sent: Thursday, February 21, 2008 1:38 PM
>> To: ParaView
>> Subject: [Paraview] Failing to connect to an MPI server in certain cases
>>
>> We use MPI across a grid of Linux x86_64 workstations.  I've compiled PV
>> 3.2.1 with OpenMPI, and it works fine if I use plain-old mpirun.
>> However, if I submit a job through GridEngine (to do load balancing for
>> everyone), it runs the server ok, but I can't connect to it.  I get the
>> following errors:
>>
>> ERROR: In
>> /home/ziegeler/paraview/src/3.2.1-
>> zig/Servers/Common/vtkServerConnection.cxx,
>> line 67
>> vtkServerConnection (0x159b0d0): Server Connection Closed!
>>
>> ERROR: In
>> /home/ziegeler/paraview/src/3.2.1-
>> zig/Servers/Common/vtkServerConnection.cxx,
>> line 351
>> vtkServerConnection (0x159b0d0): Server could failed to gather
>> information.
>>
>> Submitting a parallel job via a batch queue system can affect the
>> environment variables and such, but I would think pvserver would simply
>> fail to execute.  I'm looking in the code around where those error
>> messages occur, but I can't find anything obviously wrong.  Anyone have
>> any ideas?  Anyone know what those error messages tend to indicate other
>> than a general communication failure?
>>
>> Thanks,
>> Sean
>>
>> _______________________________________________
>> ParaView mailing list
>> ParaView at paraview.org
>> http://www.paraview.org/mailman/listinfo/paraview
> 
> 
> 