[Paraview] 3.98 MPI_Finalize out of order in pvbatch
Burlen Loring
bloring at lbl.gov
Fri Dec 7 15:13:31 EST 2012
Hi Kyle et al.
Below are stack traces where PV is hung. I'm stumped by this and can
get no foothold. I still have one chance if we can get valgrind to run
with MPI on Nautilus, but it's a long shot: valgrinding pvbatch on my
local system throws many hundreds of errors, and I'm not sure which of
those reports are valid.
PV 3.14.1 doesn't hang in pvbatch, so I'm wondering if anyone knows of a
change in 3.98 that may account for the new hang?
Burlen
rank 0
#0 0x00002b0762b3f590 in gru_get_next_message () from
/usr/lib64/libgru.so.0
#1 0x00002b073a2f4bd2 in MPI_SGI_grudev_progress () at grudev.c:1780
#2 0x00002b073a31cc25 in MPI_SGI_progress_devices () at progress.c:93
#3 MPI_SGI_progress () at progress.c:207
#4 0x00002b073a3244eb in MPI_SGI_request_finalize () at req.c:1548
#5 0x00002b073a2b8bee in MPI_SGI_finalize () at adi.c:667
#6 0x00002b073a2e3c04 in PMPI_Finalize () at finalize.c:27
#7 0x00002b073969d96f in vtkProcessModule::Finalize () at
/sw/analysis/paraview/3.98/sles11.1_intel11.1.038/ParaView/ParaViewCore/ClientServerCore/Core/vtkProcessModule.cxx:229
#8 0x00002b0737bb0f9e in vtkInitializationHelper::Finalize () at
/sw/analysis/paraview/3.98/sles11.1_intel11.1.038/ParaView/ParaViewCore/ServerManager/SMApplication/vtkInitializationHelper.cxx:145
#9 0x0000000000403c50 in ParaViewPython::Run (processType=4, argc=2,
argv=0x7fff06195c88) at
/sw/analysis/paraview/3.98/sles11.1_intel11.1.038/ParaView/CommandLineExecutables/pvpython.h:124
#10 0x0000000000403cd5 in main (argc=2, argv=0x7fff06195c88) at
/sw/analysis/paraview/3.98/sles11.1_intel11.1.038/ParaView/CommandLineExecutables/pvbatch.cxx:21
rank 1
#0 0x00002b07391bde70 in __nanosleep_nocancel () from
/lib64/libpthread.so.0
#1 0x00002b073a32c898 in MPI_SGI_millisleep (milliseconds=<value
optimized out>) at sleep.c:34
#2 0x00002b073a326365 in MPI_SGI_slow_request_wait
(request=0x7fff061959f8, status=0x7fff061959d0, set=0x7fff061959f4,
gen_rc=0x7fff061959f0) at req.c:1460
#3 0x00002b073a2c6ef3 in MPI_SGI_slow_barrier (comm=1) at barrier.c:275
#4 0x00002b073a2b8bf8 in MPI_SGI_finalize () at adi.c:671
#5 0x00002b073a2e3c04 in PMPI_Finalize () at finalize.c:27
#6 0x00002b073969d96f in vtkProcessModule::Finalize () at
/sw/analysis/paraview/3.98/sles11.1_intel11.1.038/ParaView/ParaViewCore/ClientServerCore/Core/vtkProcessModule.cxx:229
#7 0x00002b0737bb0f9e in vtkInitializationHelper::Finalize () at
/sw/analysis/paraview/3.98/sles11.1_intel11.1.038/ParaView/ParaViewCore/ServerManager/SMApplication/vtkInitializationHelper.cxx:145
#8 0x0000000000403c50 in ParaViewPython::Run (processType=4, argc=2,
argv=0x7fff06195c88) at
/sw/analysis/paraview/3.98/sles11.1_intel11.1.038/ParaView/CommandLineExecutables/pvpython.h:124
#9 0x0000000000403cd5 in main (argc=2, argv=0x7fff06195c88) at
/sw/analysis/paraview/3.98/sles11.1_intel11.1.038/ParaView/CommandLineExecutables/pvbatch.cxx:21
On 12/04/2012 05:15 PM, Burlen Loring wrote:
> Hi Kyle,
>
> I was wrong about MPI_Finalize being invoked twice; I had misread
> the code. I'm not sure why pvbatch is hanging in MPI_Finalize on
> Nautilus. I haven't been able to find anything in the debugger. This
> is new for 3.98.
>
> Burlen
>
> On 12/03/2012 07:36 AM, Kyle Lutz wrote:
>> Hi Burlen,
>>
>> On Thu, Nov 29, 2012 at 1:27 PM, Burlen Loring<bloring at lbl.gov> wrote:
>>> it looks like pvserver is also impacted, hanging after the gui
>>> disconnects.
>>>
>>>
>>> On 11/28/2012 12:53 PM, Burlen Loring wrote:
>>>> Hi All,
>>>>
>>>> some parallel tests have been failing for some time on Nautilus.
>>>> http://open.cdash.org/viewTest.php?onlyfailed&buildid=2684614
>>>>
>>>> There are MPI calls made after finalize which cause deadlock issues
>>>> on SGI MPT. It affects pvbatch for sure. The following snippet shows
>>>> the bug; the bug report is here:
>>>> http://paraview.org/Bug/view.php?id=13690
>>>>
>>>>
>>>> //----------------------------------------------------------------------------
>>>>
>>>> bool vtkProcessModule::Finalize()
>>>> {
>>>>
>>>> ...
>>>>
>>>> vtkProcessModule::GlobalController->Finalize(1); <------- MPI_Finalize called here
>> This shouldn't be calling MPI_Finalize(), since the finalizedExternally
>> argument is 1 and vtkMPIController::Finalize() guards on it:
>>
>> if (finalizedExternally == 0)
>> {
>> MPI_Finalize();
>> }
>>
>> So my guess is that it's being invoked elsewhere.
>>
>>>> ...
>>>>
>>>> #ifdef PARAVIEW_USE_MPI
>>>> if (vtkProcessModule::FinalizeMPI)
>>>> {
>>>> MPI_Barrier(MPI_COMM_WORLD); <------- barrier after MPI_Finalize
>>>> MPI_Finalize(); <------- second MPI_Finalize
>>>> }
>>>> #endif
>> I've made a patch which should prevent this section of code from ever
>> being called twice by setting the FinalizeMPI flag to false after
>> calling MPI_Finalize(). Can you take a look here:
>> http://review.source.kitware.com/#/t/1808/ and let me know if that
>> helps the issue.
>>
>> Otherwise, would you be able to set a breakpoint on MPI_Finalize() and
>> get a backtrace of where it gets invoked for the second time? That
>> would be very helpful in tracking down the problem.
>>
>> Thanks,
>> Kyle
>