[Paraview] Paraview 3.6.2 / Open MPI 1.4.1: Server Connection Closed! / Server failed to gather information./cslog

Utkarsh Ayachit utkarsh.ayachit at kitware.com
Wed Mar 17 10:26:05 EDT 2010


Posting back to the mailing list to see if anyone else has any idea
what's going on.

On Tue, Mar 16, 2010 at 12:08 PM, SCHROEDER, Martin
<Martin.SCHROEDER at mtu.de> wrote:
> Hm, I don't think it's broken, because it works under some circumstances (on only one host, or on multiple hosts with pvserver c/s stream logging turned on).
>
> With 2 processes on two hosts and valgrind attached, it works,
> though I get some messages when connecting the client to the server:
> ICET,1:ERROR: icetDisplayNodes: Invalid rank for tile 1.
> ICET,1:ERROR: icetDisplayNodes: Invalid rank for tile 1.
> ICET,1:ERROR: icetDisplayNodes: Invalid rank for tile 1.
> ICET,1:ERROR: icetDisplayNodes: Invalid rank for tile 2.
> ICET,1:ERROR: icetDisplayNodes: Invalid rank for tile 2.
> ICET,1:ERROR: icetDisplayNodes: Invalid rank for tile 2.
> ICET,1:ERROR: icetDisplayNodes: Invalid rank for tile 2.
> ICET,1:ERROR: icetDisplayNodes: Invalid rank for tile 2.
> ICET,1:ERROR: icetDisplayNodes: Invalid rank for tile 2.
> ICET,1:ERROR: icetDisplayNodes: Invalid rank for tile 2.
> ICET,1:ERROR: icetDisplayNodes: Invalid rank for tile 2.
> ICET,1:ERROR: icetDisplayNodes: Invalid rank for tile 2.
> ICET,1:ERROR: icetDisplayNodes: Invalid rank for tile 2.
> ICET,1:ERROR: icetDisplayNodes: Invalid rank for tile 2
>
> After correcting the tile settings for pvserver, it works.
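The tile settings referred to here are pvserver's tile-display dimensions. A rough sketch of what a corrected launch might look like, assuming a 2x1 tile wall and a hypothetical hosts.txt listing the display nodes:

    # hosts.txt and the 2x1 layout are placeholders for the actual setup
    mpirun -np 2 --hostfile hosts.txt \
        pvserver --tile-dimensions-x=2 --tile-dimensions-y=1

ICE-T typically prints "icetDisplayNodes: Invalid rank for tile N" when a tile is assigned a display rank that does not exist, so the product of the two tile dimensions should not exceed the number of server processes.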
>
>
> With 8 processes, 2 on each of 4 hosts, it crashes as described before.
>
>
> The mpirun debug output for the last try was:
>
> [cp003158:17549] procdir: /tmp/openmpi-sessions-ya06894 at cp003158_0/2374/0/1
> [cp003159:32354] procdir: /tmp/openmpi-sessions-ya06894 at cp003159_0/2374/0/2
> [cp003158:17549] jobdir: /tmp/openmpi-sessions-ya06894 at cp003158_0/2374/0
> [cp003158:17549] top: openmpi-sessions-ya06894 at cp003158_0
> [cp003158:17549] tmp: /tmp
> [cp003159:32354] jobdir: /tmp/openmpi-sessions-ya06894 at cp003159_0/2374/0
> [cp003159:32354] top: openmpi-sessions-ya06894 at cp003159_0
> [cp003159:32354] tmp: /tmp
> [cp003162:31564] procdir: /tmp/openmpi-sessions-ya06894 at cp003162_0/2374/0/3
> [cp003163:31530] procdir: /tmp/openmpi-sessions-ya06894 at cp003163_0/2374/0/4
> [cp003163:31530] jobdir: /tmp/openmpi-sessions-ya06894 at cp003163_0/2374/0
> [cp003163:31530] top: openmpi-sessions-ya06894 at cp003163_0
> [cp003163:31530] tmp: /tmp
> [cp002860:20714] [[2374,0],0] node[0].name cp002860 daemon 0 arch ffc91200
> [cp002860:20714] [[2374,0],0] node[1].name cp003158 daemon 1 arch ffc91200
> [cp002860:20714] [[2374,0],0] node[2].name cp003159 daemon 2 arch ffc91200
> [cp002860:20714] [[2374,0],0] node[3].name cp003162 daemon 3 arch ffc91200
> [cp002860:20714] [[2374,0],0] node[4].name cp003163 daemon 4 arch ffc91200
> [cp003158:17549] [[2374,0],1] node[0].name cp002860 daemon 0 arch ffc91200
> [cp003159:32354] [[2374,0],2] node[0].name cp002860 daemon 0 arch ffc91200
> [cp003158:17549] [[2374,0],1] node[1].name cp003158 daemon 1 arch ffc91200
> [cp003158:17549] [[2374,0],1] node[2].name cp003159 daemon 2 arch ffc91200
> [cp003158:17549] [[2374,0],1] node[3].name cp003162 daemon 3 arch ffc91200
> [cp003158:17549] [[2374,0],1] node[4].name cp003163 daemon 4 arch ffc91200
> [cp003159:32354] [[2374,0],2] node[1].name cp003158 daemon 1 arch ffc91200
> [cp003159:32354] [[2374,0],2] node[2].name cp003159 daemon 2 arch ffc91200
> [cp003159:32354] [[2374,0],2] node[3].name cp003162 daemon 3 arch ffc91200
> [cp003159:32354] [[2374,0],2] node[4].name cp003163 daemon 4 arch ffc91200
> [cp003162:31564] jobdir: /tmp/openmpi-sessions-ya06894 at cp003162_0/2374/0
> [cp003162:31564] top: openmpi-sessions-ya06894 at cp003162_0
> [cp003162:31564] tmp: /tmp
> [cp003162:31564] [[2374,0],3] node[0].name cp002860 daemon 0 arch ffc91200
> [cp003162:31564] [[2374,0],3] node[1].name cp003158 daemon 1 arch ffc91200
> [cp003162:31564] [[2374,0],3] node[2].name cp003159 daemon 2 arch ffc91200
> [cp003162:31564] [[2374,0],3] node[3].name cp003162 daemon 3 arch ffc91200
> [cp003162:31564] [[2374,0],3] node[4].name cp003163 daemon 4 arch ffc91200
> [cp002860:20714] Info: Setting up debugger process table for applications
>  MPIR_being_debugged = 0
>  MPIR_debug_state = 1
>  MPIR_partial_attach_ok = 1
>  MPIR_i_am_starter = 0
>  MPIR_proctable_size = 8
>  MPIR_proctable:
>    (i, host, exe, pid) = (0, cp003158, /yaprod/freeware/Linux_x86_64/app/Paraview-3.6.2-OpenMPI/bin/xterm, 17550)
>    (i, host, exe, pid) = (1, cp003158, /yaprod/freeware/Linux_x86_64/app/Paraview-3.6.2-OpenMPI/bin/xterm, 17551)
>    (i, host, exe, pid) = (2, cp003159, /yaprod/freeware/Linux_x86_64/app/Paraview-3.6.2-OpenMPI/bin/xterm, 32355)
>    (i, host, exe, pid) = (3, cp003159, /yaprod/freeware/Linux_x86_64/app/Paraview-3.6.2-OpenMPI/bin/xterm, 32356)
>    (i, host, exe, pid) = (4, cp003162, /yaprod/freeware/Linux_x86_64/app/Paraview-3.6.2-OpenMPI/bin/xterm, 31565)
>    (i, host, exe, pid) = (5, cp003162, /yaprod/freeware/Linux_x86_64/app/Paraview-3.6.2-OpenMPI/bin/xterm, 31566)
>    (i, host, exe, pid) = (6, cp003163, /yaprod/freeware/Linux_x86_64/app/Paraview-3.6.2-OpenMPI/bin/xterm, 31531)
>    (i, host, exe, pid) = (7, cp003163, /yaprod/freeware/Linux_x86_64/app/Paraview-3.6.2-OpenMPI/bin/xterm, 31532)
> [cp003163:31530] [[2374,0],4] node[0].name cp002860 daemon 0 arch ffc91200
> [cp003163:31530] [[2374,0],4] node[1].name cp003158 daemon 1 arch ffc91200
> [cp003163:31530] [[2374,0],4] node[2].name cp003159 daemon 2 arch ffc91200
> [cp003163:31530] [[2374,0],4] node[3].name cp003162 daemon 3 arch ffc91200
> [cp003163:31530] [[2374,0],4] node[4].name cp003163 daemon 4 arch ffc91200
> [cp003158:17562] procdir: /tmp/openmpi-sessions-ya06894 at cp003158_0/2374/1/1
> [cp003158:17562] jobdir: /tmp/openmpi-sessions-ya06894 at cp003158_0/2374/1
> [cp003158:17562] top: openmpi-sessions-ya06894 at cp003158_0
> [cp003158:17562] tmp: /tmp
> [cp003158:17563] procdir: /tmp/openmpi-sessions-ya06894 at cp003158_0/2374/1/0
> [cp003158:17563] jobdir: /tmp/openmpi-sessions-ya06894 at cp003158_0/2374/1
> [cp003158:17563] top: openmpi-sessions-ya06894 at cp003158_0
> [cp003158:17563] tmp: /tmp
> [cp003158:17562] [[2374,1],1] node[0].name cp002860 daemon 0 arch ffc91200
> [cp003158:17562] [[2374,1],1] node[1].name cp003158 daemon 1 arch ffc91200
> [cp003158:17562] [[2374,1],1] node[2].name cp003159 daemon 2 arch ffc91200
> [cp003158:17562] [[2374,1],1] node[3].name cp003162 daemon 3 arch ffc91200
> [cp003158:17562] [[2374,1],1] node[4].name cp003163 daemon 4 arch ffc91200
> [cp003158:17563] [[2374,1],0] node[0].name cp002860 daemon 0 arch ffc91200
> [cp003158:17563] [[2374,1],0] node[1].name cp003158 daemon 1 arch ffc91200
> [cp003158:17563] [[2374,1],0] node[2].name cp003159 daemon 2 arch ffc91200
> [cp003158:17563] [[2374,1],0] node[3].name cp003162 daemon 3 arch ffc91200
> [cp003158:17563] [[2374,1],0] node[4].name cp003163 daemon 4 arch ffc91200
> [cp003159:32370] procdir: /tmp/openmpi-sessions-ya06894 at cp003159_0/2374/1/3
> [cp003159:32370] jobdir: /tmp/openmpi-sessions-ya06894 at cp003159_0/2374/1
> [cp003159:32370] top: openmpi-sessions-ya06894 at cp003159_0
> [cp003159:32370] tmp: /tmp
> [cp003159:32370] [[2374,1],3] node[0].name cp002860 daemon 0 arch ffc91200
> [cp003159:32370] [[2374,1],3] node[1].name cp003158 daemon 1 arch ffc91200
> [cp003159:32370] [[2374,1],3] node[2].name cp003159 daemon 2 arch ffc91200
> [cp003159:32370] [[2374,1],3] node[3].name cp003162 daemon 3 arch ffc91200
> [cp003159:32370] [[2374,1],3] node[4].name cp003163 daemon 4 arch ffc91200
> [cp003159:32369] procdir: /tmp/openmpi-sessions-ya06894 at cp003159_0/2374/1/2
> [cp003159:32369] jobdir: /tmp/openmpi-sessions-ya06894 at cp003159_0/2374/1
> [cp003159:32369] top: openmpi-sessions-ya06894 at cp003159_0
> [cp003159:32369] tmp: /tmp
> [cp003159:32369] [[2374,1],2] node[0].name cp002860 daemon 0 arch ffc91200
> [cp003159:32369] [[2374,1],2] node[1].name cp003158 daemon 1 arch ffc91200
> [cp003159:32369] [[2374,1],2] node[2].name cp003159 daemon 2 arch ffc91200
> [cp003159:32369] [[2374,1],2] node[3].name cp003162 daemon 3 arch ffc91200
> [cp003159:32369] [[2374,1],2] node[4].name cp003163 daemon 4 arch ffc91200
> [cp003162:31580] procdir: /tmp/openmpi-sessions-ya06894 at cp003162_0/2374/1/4
> [cp003162:31579] procdir: /tmp/openmpi-sessions-ya06894 at cp003162_0/2374/1/5
> [cp003162:31579] jobdir: /tmp/openmpi-sessions-ya06894 at cp003162_0/2374/1
> [cp003162:31579] top: openmpi-sessions-ya06894 at cp003162_0
> [cp003162:31579] tmp: /tmp
> [cp003162:31580] jobdir: /tmp/openmpi-sessions-ya06894 at cp003162_0/2374/1
> [cp003162:31580] top: openmpi-sessions-ya06894 at cp003162_0
> [cp003162:31580] tmp: /tmp
> [cp003162:31579] [[2374,1],5] node[0].name cp002860 daemon 0 arch ffc91200
> [cp003162:31579] [[2374,1],5] node[1].name cp003158 daemon 1 arch ffc91200
> [cp003162:31579] [[2374,1],5] node[2].name cp003159 daemon 2 arch ffc91200
> [cp003162:31579] [[2374,1],5] node[3].name cp003162 daemon 3 arch ffc91200
> [cp003162:31579] [[2374,1],5] node[4].name cp003163 daemon 4 arch ffc91200
> [cp003162:31580] [[2374,1],4] node[0].name cp002860 daemon 0 arch ffc91200
> [cp003162:31580] [[2374,1],4] node[1].name cp003158 daemon 1 arch ffc91200
> [cp003162:31580] [[2374,1],4] node[2].name cp003159 daemon 2 arch ffc91200
> [cp003162:31580] [[2374,1],4] node[3].name cp003162 daemon 3 arch ffc91200
> [cp003162:31580] [[2374,1],4] node[4].name cp003163 daemon 4 arch ffc91200
> [cp003163:31545] procdir: /tmp/openmpi-sessions-ya06894 at cp003163_0/2374/1/6
> [cp003163:31546] procdir: /tmp/openmpi-sessions-ya06894 at cp003163_0/2374/1/7
> [cp003163:31546] jobdir: /tmp/openmpi-sessions-ya06894 at cp003163_0/2374/1
> [cp003163:31546] top: openmpi-sessions-ya06894 at cp003163_0
> [cp003163:31546] tmp: /tmp
> [cp003163:31545] jobdir: /tmp/openmpi-sessions-ya06894 at cp003163_0/2374/1
> [cp003163:31545] top: openmpi-sessions-ya06894 at cp003163_0
> [cp003163:31545] tmp: /tmp
> [cp003163:31545] [[2374,1],6] node[0].name cp002860 daemon 0 arch ffc91200
> [cp003163:31545] [[2374,1],6] node[1].name cp003158 daemon 1 arch ffc91200
> [cp003163:31545] [[2374,1],6] node[2].name cp003159 daemon 2 arch ffc91200
> [cp003163:31545] [[2374,1],6] node[3].name cp003162 daemon 3 arch ffc91200
> [cp003163:31545] [[2374,1],6] node[4].name cp003163 daemon 4 arch ffc91200
> [cp003163:31546] [[2374,1],7] node[0].name cp002860 daemon 0 arch ffc91200
> [cp003163:31546] [[2374,1],7] node[1].name cp003158 daemon 1 arch ffc91200
> [cp003163:31546] [[2374,1],7] node[2].name cp003159 daemon 2 arch ffc91200
> [cp003163:31546] [[2374,1],7] node[3].name cp003162 daemon 3 arch ffc91200
> [cp003163:31546] [[2374,1],7] node[4].name cp003163 daemon 4 arch ffc91200
> [cp003163:31530] sess_dir_finalize: proc session dir not empty - leaving
> --------------------------------------------------------------------------
> mpirun has exited due to process rank 6 with PID 31531 on
> node cp003163 exiting without calling "finalize". This may
> have caused other processes in the application to be
> terminated by signals sent by mpirun (as reported here).
> --------------------------------------------------------------------------
> [cp003163:31530] sess_dir_finalize: job session dir not empty - leaving
> [cp003162:31564] sess_dir_finalize: proc session dir not empty - leaving
> [cp003159:32354] sess_dir_finalize: job session dir not empty - leaving
> [cp003158:17549] sess_dir_finalize: job session dir not empty - leaving
> [cp002860:20714] sess_dir_finalize: job session dir not empty - leaving
> [cp002860:20714] sess_dir_finalize: proc session dir not empty - leaving
> orterun: exiting with status 1
>
> martin
>
> -----Original Message-----
> From: Utkarsh Ayachit [mailto:utkarsh.ayachit at kitware.com]
> Sent: Tuesday, 16 March 2010 15:00
> To: SCHROEDER, Martin
> Cc: ParaView
> Subject: Re: [Paraview] Paraview 3.6.2 / Open MPI 1.4.1: Server Connection Closed! / Server failed to gather information./cslog
>
> I am not sure why that could be the case. The only thing that happens when cslog is set is that each server process starts writing out an output log file. Also, I am not sure why MPI would hang on attaching a debugger. Try debugging by just running 2 processes. Is it possible you have a broken MPI?
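One common way to do that with Open MPI is to start each server rank in its own xterm under gdb, which also matches the xterm entries in the MPIR_proctable above. A rough sketch, assuming the compute nodes can open windows on your X display and a hypothetical hosts.txt:

    # hosts.txt is a placeholder for the actual hostfile
    mpirun -np 2 --hostfile hosts.txt \
        xterm -e gdb --args pvserver

In each xterm, "run" starts the rank and "bt" after the crash prints a backtrace.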
>
> Utkarsh
>
>
>
> On Tue, Mar 16, 2010 at 9:54 AM, SCHROEDER, Martin <Martin.SCHROEDER at mtu.de> wrote:
>> Hm, debugging seems more difficult than I thought. mpirun seems to hang when the debugging option is set.
>> I also wonder why this "connection reset by peer" problem doesn't occur when the option "--cslog=somefile" is set...
>>
>>
>> -----Original Message-----
>> From: SCHROEDER, Martin
>> Sent: Monday, 15 March 2010 14:33
>> To: 'Utkarsh Ayachit'
>> Subject: RE: [Paraview] Paraview 3.6.2 / Open MPI 1.4.1: Server
>> Connection Closed! / Server failed to gather information./cslog
>>
>> Yes, it is possible. I'll try it and send you the output.
>> Meanwhile, mpirun sometimes brought back this message:
>>
>> btl_tcp_frag.c:216:mca_btl_tcp_frag_recv] mca_btl_tcp_frag_recv: readv
>> failed: Connection reset by peer (104)
>>
>>
>>
>> -----Original Message-----
>> From: Utkarsh Ayachit [mailto:utkarsh.ayachit at kitware.com]
>> Sent: Friday, 12 March 2010 15:41
>> To: SCHROEDER, Martin
>> Cc: paraview at paraview.org
>> Subject: Re: [Paraview] Paraview 3.6.2 / Open MPI 1.4.1: Server
>> Connection Closed! / Server failed to gather information./cslog
>>
>> Is it possible to attach a debugger to the server processes and see where it crashes?
>>
>> On Fri, Mar 12, 2010 at 7:03 AM, SCHROEDER, Martin <Martin.SCHROEDER at mtu.de> wrote:
>>> Hello,
>>> when I'm trying to run ParaView (pvserver) on a single host using
>>> mpirun with 4-8 processes, it works.
>>> The problem is:
>>> when I'm trying to spread pvserver over multiple hosts, using mpirun
>>> and a hostfile, the server processes and the client crash when I
>>> connect the client to the server.
>>>
>>> I'm getting these messages in the client's shell:
>>>
>>> ERROR: In /yatest/cae/src/Paraview3.6.2/ParaView3/Servers/Common/vtkServerConnection.cxx, line 67
>>> vtkServerConnection (0x1140c30): Server Connection Closed!
>>>
>>> ERROR: In /yatest/cae/src/Paraview3.6.2/ParaView3/Servers/Common/vtkServerConnection.cxx, line 345
>>> vtkServerConnection (0x1140c30): Server failed to gather information.
>>>
>>> If I use the option --cslog=/home/.../cstream.log when executing
>>> pvserver, it is slow, but it works on two hosts with 4 processes on each host.
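For reference, a minimal sketch of the kind of launch being described, with hosts.txt, the port, and the log path as placeholders:

    # hosts.txt, 11111 and /tmp/cstream.log are placeholders
    mpirun -np 8 --hostfile hosts.txt \
        pvserver --server-port=11111 --cslog=/tmp/cstream.log

The client then connects to the first server node on the chosen port (11111 is pvserver's default); --cslog writes the client/server stream log mentioned above.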
>>>
>>> The ParaView client and server are both version 3.6.2; Open MPI is 1.4.1.
>>>
>>> Has anyone experienced the same?
>>> Any hint would be great.
>>>
>>> Mit freundlichen Gruessen / Best regards
>>>
>>> Martin Schröder, FIEA
>>> MTU Aero Engines GmbH
>>> Engineering Systems (CAE)
>>> Dachauer Str. 665
>>> 80995 Muenchen
>>> Germany
>>>
>>> Tel  +49 (0)89  14 89 57 20
>>> Fax  +49 (0)89  14 89-96 89 4
>>> mailto:martin.schroeder at mtu.de
>>> http://www.mtu.de
>>>
>>>
>>>
>>>
>>>
>>
>

