[Paraview] cluster: works with -np 1, crashes when run in parallel :-(
Utkarsh Ayachit
utkarsh.ayachit at kitware.com
Fri Nov 4 12:33:36 EDT 2011
I don't think removing the assert or compiling in release is the
solution. The problem is the fact that
vtkPVClientServerSynchronizedRenderers::SlaveEndRender() is being
called on the satellite nodes.
In a debug build, can you try attaching a debugger and giving me the
call stack when it asserts? I'm also interested in the state of
this->ParallelController when that assert happens. Can you check its
value as well as its class name (by calling
this->ParallelController->GetClassName())?
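
To illustrate the kind of check meant here, a minimal sketch follows. It
reports a controller pointer as "address class-name", or "0 (None)" when the
pointer is null, without ever dereferencing a null pointer. The Controller
and SocketController classes and both helper functions are hypothetical
stand-ins written for this example; they are not the VTK classes and not
the code in the debug patch.

```cpp
#include <cassert>
#include <cstring>
#include <sstream>
#include <string>

// Hypothetical stand-in for a VTK-style controller base class.
// This is NOT vtkMultiProcessController; it only mimics GetClassName()/IsA().
class Controller {
public:
  virtual ~Controller() = default;
  virtual const char* GetClassName() const { return "Controller"; }
  virtual bool IsA(const char* name) const {
    return std::strcmp(name, "Controller") == 0;
  }
};

// Hypothetical stand-in for a socket-based controller subclass.
class SocketController : public Controller {
public:
  const char* GetClassName() const override { return "SocketController"; }
  bool IsA(const char* name) const override {
    return std::strcmp(name, "SocketController") == 0 || Controller::IsA(name);
  }
};

// Describe a controller pointer; a null pointer is reported as "0 (None)"
// instead of being dereferenced.
std::string DescribeController(const Controller* c) {
  if (c == nullptr) {
    return "0 (None)";
  }
  std::ostringstream os;
  os << static_cast<const void*>(c) << " " << c->GetClassName();
  return os.str();
}

// Null-safe type check: true only for a non-null socket controller.
bool IsSocketController(const Controller* c) {
  return c != nullptr && c->IsA("SocketController");
}
```

In a debugger, the equivalent is to print the pointer first and call
GetClassName() only if it is non-null.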
Thanks
Utkarsh
On Fri, Nov 4, 2011 at 11:53 AM, Stéphane Backaert
<stephanebackaert at gmail.com> wrote:
> Thanks for helping me and so quickly!
>
> I applied your patch and here is what I get:
>
> Waiting for client
> Connection URL: cs://localhost:11111
> Client connected.
> 0x1585fa0: 1/2
> 0x1585fa0: Controller: 0 (None)
> 0x1d98770: 0/2
> 0x1d98770: Controller: 0x1cb5d40 vtkSocketController
> pvserver: /home/ucl/tfl/sbackaer/build/ParaView_dev/ParaView/ParaViewCore/ClientServerCore/vtkPVClientServerSynchronizedRenderers.cxx:76: virtual void vtkPVClientServerSynchronizedRenderers::SlaveEndRender(): Assertion `this->ParallelController->IsA("vtkSocketController")' failed.
> [hmem07:32325] *** Process received signal ***
> [hmem07:32325] Signal: Aborted (6)
> [hmem07:32325] Signal code: (-6)
> pvserver: /home/ucl/tfl/sbackaer/build/ParaView_dev/ParaView/ParaViewCore/ClientServerCore/vtkPVClientServerSynchronizedRenderers.cxx:48: virtual void vtkPVClientServerSynchronizedRenderers::MasterEndRender(): Assertion `this->ParallelController->IsA("vtkSocketController")' failed.
> [hmem07:32324] *** Process received signal ***
> [hmem07:32324] Signal: Aborted (6)
> [hmem07:32324] Signal code: (-6)
>
>
> I also tried with another version: 3.12.0-RC3 (sources come from the download page).
> Compiled with Release flag, the server launched with -np 1 works. With -np 2, it does not crash when the client connects to it but the client does not respond at all (client version 3.12.0-RC3 on OSX Lion)... I am recompiling this version with Debug to see if I get the same error message on the server side...
>
> Stephane
>
>
>
>
> On 4 Nov 2011, at 15:12, Utkarsh Ayachit wrote:
>
>> Stephane,
>>
>> Sebastien was referring to a different issue and that problem is not
>> in 3.12-RC3, so we can ignore that response for the time being.
>>
>> I tried to reproduce the problem with the same version as you're using
>> (thanks for reporting the full version number, btw), and things seem
>> to work fine. I've attached a patch that should print out additional
>> debug information that will help us diagnose the problem. Do you mind
>> applying the patch to the server-side and posting the output printed
>> on pvserver?
>>
>> You should get something like the following:
>>
>> mpirun -np 2 ./bin/pvserver --use-offscreen-rendering
>> Waiting for client
>> Connection URL: cs://localhost:11111
>> Client connected.
>> 0x29be7c0: 0/2
>> 0x29be7c0: Controller: 0x28ec0f0 vtkSocketController
>> 0xf3a950: 1/2
>> 0xf3a950: Controller: 0 (None)
>> Exiting...
>> Exiting...
>>
>> Thanks
>> Utkarsh
>>
>> On Fri, Nov 4, 2011 at 9:01 AM, Sebastien Jourdain
>> <sebastien.jourdain at kitware.com> wrote:
>>> Hi Stephane,
>>>
>>> Thanks for reporting it. In fact, that bug has been fixed in
>>> ParaView/next, and you can simply avoid it if you compile ParaView in
>>> Release mode rather than Debug. The problem comes from an invalid test in an
>>> assert().
>>> I will make sure that the fix is in 3.12.
>>>
>>> Seb
>>>
>>> On Fri, Nov 4, 2011 at 5:43 AM, Stéphane Backaert
>>> <stephanebackaert at gmail.com> wrote:
>>>> Hello,
>>>>
>>>> I have installed paraview version 3.12.0-RC3-23-g712c45e on a cluster: compiled with intel compiler, openmpi-1.5.3, no hardware acc. so Mesa-7.9 and --use-offscreen-rendering flag at startup, MPI set to ON in ccmake.
>>>>
>>>> I launch the server with mpirun -np x ./pvserver --use-offscreen-rendering and, through an ssh connection, the client on my laptop (latest version too). That works fine :-) ... but slow :-(.
>>>> So I try to use more procs, for example, mpirun -np 2 ./pvserver --use-offscreen-rendering. There are two processes running on the cluster (I see them with 'ps x' command).
>>>>
>>>> My problem: when I connect my client to this server, the connection is established but the server crashes immediately with the message:
>>>>
>>>> Waiting for client
>>>> Connection URL: cs://localhost:11111
>>>> Client connected.
>>>> pvserver: /home/ucl/tfl/sbackaer/build/ParaView/ParaViewCore/ClientServerCore/vtkPVClientServerSynchronizedRenderers.cxx:76: virtual void vtkPVClientServerSynchronizedRenderers::SlaveEndRender(): Assertion `this->ParallelController->IsA("vtkSocketController")' failed.
>>>> [hmem00:08278] *** Process received signal ***
>>>> [hmem00:08278] Signal: Aborted (6)
>>>> [hmem00:08278] Signal code: (-6)
>>>> pvserver: /home/ucl/tfl/sbackaer/build/ParaView/ParaViewCore/ClientServerCore/vtkPVClientServerSynchronizedRenderers.cxx:48: virtual void vtkPVClientServerSynchronizedRenderers::MasterEndRender(): Assertion `this->ParallelController->IsA("vtkSocketController")' failed.
>>>> [hmem00:08277] *** Process received signal ***
>>>> [hmem00:08277] Signal: Aborted (6)
>>>> [hmem00:08277] Signal code: (-6)
>>>> [hmem00:08278] [ 0] /lib64/libpthread.so.0() [0x3f8a40f4c0]
>>>> [hmem00:08278] [ 1] /lib64/libc.so.6(gsignal+0x35) [0x3f89c329a5]
>>>> [hmem00:08278] [ 2] /lib64/libc.so.6(abort+0x175) [0x3f89c34185]
>>>> [hmem00:08278] [ 3] /lib64/libc.so.6(__assert_fail+0xf5) [0x3f89c2b935]
>>>> [hmem00:08278] [ 4] /home/ucl/tfl/sbackaer/build/ParaView-bin/bin/libvtkPVClientServerCore.so(_ZN38vtkPVClientServerSynchronizedRenderers14SlaveEndRenderEv+0x56) [0x7fa499c32d6e]
>>>> [hmem00:08278] [ 5] /home/ucl/tfl/sbackaer/build/ParaView-bin/bin/libvtkParallel.so.pv3.12(_ZN24vtkSynchronizedRenderers15HandleEndRenderEv+0xfe) [0x7fa493cc1e4c]
>>>> [hmem00:08278] [ 6] /home/ucl/tfl/sbackaer/build/ParaView-bin/bin/libvtkParallel.so.pv3.12(_ZN24vtkSynchronizedRenderers15HandleEndRenderEv+0x72) [0x7fa493cc1dc0]
>>>> [hmem00:08278] [ 7] /home/ucl/tfl/sbackaer/build/ParaView-bin/bin/libvtkParallel.so.pv3.12(+0x26c470) [0x7fa493cc4470]
>>>> [hmem00:08278] [ 8] /home/ucl/tfl/sbackaer/build/ParaView-bin/bin/libvtkCommon.so.pv3.12(+0x26ae71) [0x7fa48f102e71]
>>>> [hmem00:08278] [ 9] /home/ucl/tfl/sbackaer/build/ParaView-bin/bin/libvtkCommon.so.pv3.12(_ZN9vtkObject11InvokeEventEmPv+0x41) [0x7fa48f103381]
>>>> [hmem00:08278] [10] /home/ucl/tfl/sbackaer/build/ParaView-bin/bin/libvtkRendering.so.pv3.12(_ZN11vtkRenderer6RenderEv+0xdcb) [0x7fa492c18267]
>>>> [hmem00:08278] [11] /home/ucl/tfl/sbackaer/build/ParaView-bin/bin/libvtkRendering.so.pv3.12(_ZN21vtkRendererCollection6RenderEv+0xca) [0x7fa492c15fb4]
>>>> [hmem00:08278] [12] /home/ucl/tfl/sbackaer/build/ParaView-bin/bin/libvtkRendering.so.pv3.12(_ZN15vtkRenderWindow14DoStereoRenderEv+0xee) [0x7fa492c2c84c]
>>>> [hmem00:08278] [13] /home/ucl/tfl/sbackaer/build/ParaView-bin/bin/libvtkRendering.so.pv3.12(_ZN15vtkRenderWindow10DoFDRenderEv+0x54a) [0x7fa492c2c754]
>>>> [hmem00:08278] [14] /home/ucl/tfl/sbackaer/build/ParaView-bin/bin/libvtkRendering.so.pv3.12(_ZN15vtkRenderWindow10DoAARenderEv+0x7c3) [0x7fa492c2c1ff]
>>>> [hmem00:08278] [15] /home/ucl/tfl/sbackaer/build/ParaView-bin/bin/libvtkRendering.so.pv3.12(_ZN15vtkRenderWindow6RenderEv+0x868) [0x7fa492c2b7ca]
>>>> [hmem00:08278] [16] /home/ucl/tfl/sbackaer/build/ParaView-bin/bin/libvtkPVClientServerCore.so(_ZN30vtkPVSynchronizedRenderWindows6RenderEj+0x95) [0x7fa499c9caed]
>>>> [hmem00:08278] [17] /home/ucl/tfl/sbackaer/build/ParaView-bin/bin/libvtkPVClientServerCore.so(+0x1dab8e) [0x7fa499c9ab8e]
>>>> [hmem00:08278] [18] /home/ucl/tfl/sbackaer/build/ParaView-bin/bin/libvtkParallel.so.pv3.12(_ZN25vtkMultiProcessController10ProcessRMIEiPvii+0x3cb) [0x7fa493bc99bd]
>>>> [hmem00:08278] [19] /home/ucl/tfl/sbackaer/build/ParaView-bin/bin/libvtkParallel.so.pv3.12(_ZN25vtkMultiProcessController11ProcessRMIsEii+0x6a8) [0x7fa493bc957a]
>>>> [hmem00:08278] [20] /home/ucl/tfl/sbackaer/build/ParaView-bin/bin/libvtkParallel.so.pv3.12(_ZN25vtkMultiProcessController11ProcessRMIsEv+0x22) [0x7fa493bc8ed0]
>>>> [hmem00:08278] [21] ./bin/pvserver() [0x401a70]
>>>> [hmem00:08278] [22] ./bin/pvserver(main+0x25) [0x401aef]
>>>> [hmem00:08278] [23] /lib64/libc.so.6(__libc_start_main+0xfd) [0x3f89c1ec5d]
>>>> [hmem00:08278] [24] ./bin/pvserver() [0x4017c9]
>>>> [hmem00:08278] *** End of error message ***
>>>>
>>>>
>>>>
>>>> I tried with different configuration options in ccmake related to MPI, no changes. The cluster works for other MPI programs...
>>>>
>>>> Any idea?
>>>>
>>>> Thanks!
>>>>
>>>> Stephane
>>>>
>>>>
>>>> _______________________________________________
>>>> Powered by www.kitware.com
>>>>
>>>> Visit other Kitware open-source projects at http://www.kitware.com/opensource/opensource.html
>>>>
>>>> Please keep messages on-topic and check the ParaView Wiki at: http://paraview.org/Wiki/ParaView
>>>>
>>>> Follow this link to subscribe/unsubscribe:
>>>> http://www.paraview.org/mailman/listinfo/paraview
>>>>
>> <PrintDebugTxt.patch>
>
>