[Paraview-developers] Define NDEBUG in VTK Submodule (WAS: 100% CPU usage once again)

Paul Kapinos kapinos at itc.rwth-aachen.de
Mon Oct 16 11:58:26 EDT 2017


Hi,

(a)
When -DCMAKE_BUILD_TYPE=Release is added to the cmake flags, the build gets the
-DNDEBUG flag instead of -DDEBUG, and VTK is built without debug information.

Hence, adding '-DCMAKE_BUILD_TYPE=Release' is a solution/workaround for the 'VTK
is built with debug' issue. Maybe you would like to add some defaults to the CMake
files in ParaView (if building VTK without debug is what should happen by default).
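
As a minimal sketch of what the flag changes: the VTK debug barrier quoted
further down in this thread compiles to a no-op once NDEBUG is defined. The
stand-alone C++ fragment below only mimics that pattern. FakeBarrier,
barrierCalls, DebugBarrier and BarriersExecuted are hypothetical names, used
so that the fragment builds without an MPI library; it is not VTK code.

```cpp
#include <cassert>

// Hypothetical stand-in for MPI_Barrier(); it only counts how often it runs.
static int barrierCalls = 0;
static void FakeBarrier() { ++barrierCalls; }

// Same shape as vtkMPICommunicatorDebugBarrier(): active unless NDEBUG is set.
static inline void DebugBarrier()
{
#ifdef NDEBUG
  // Release-style build (-DNDEBUG): nothing to do.
#else
  FakeBarrier();
#endif
}

// Calls the debug barrier n times and reports how many barriers really ran.
static int BarriersExecuted(int n)
{
  assert(n >= 0);  // assert() itself is also compiled out under NDEBUG
  barrierCalls = 0;
  for (int i = 0; i < n; ++i)
  {
    DebugBarrier();
  }
  return barrierCalls;
}
```

Compiled with a plain 'g++', BarriersExecuted(3) returns 3; compiled with
'g++ -DNDEBUG' it returns 0, because the barrier call is no longer in the
binary.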

(b)
On the original issue ("100% CPU usage"), adding this flag does not change
anything. Processes of the non-debug build just spin on other MPI calls (cf. the
'pstack' output below: there is no debug information anymore, but we can still
see that there is a PMPI_Bcast() call from
vtkMPICommunicator::BroadcastVoidArray(void*, long long, int, int)). It looks
like the debug version of VTK just adds some more MPI calls but does not change
the general behaviour of the software: (busy) waiting on (blocking) MPI calls.

Well, we do not want to start a discussion about 'how to program MPI
applications'. I'll just cite Jeff Squyres [8]:
> That being said, if an MPI application is waiting a long time for messages,
> perhaps its message passing algorithm should be re-designed to be a bit more
> efficient in terms of communication and computation overlap.
Yes, busy waiting can be avoided at the algorithmic level, cf. [12].
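
One way to read 'avoided at the algorithmic level', sketched under
assumptions: instead of sitting in a blocking call, poll a non-blocking
completion test and sleep between polls. The C++ helper below is neither
ParaView nor MPI code. WaitWithSleep and its parameters are hypothetical; in
an MPI program the predicate would typically wrap MPI_Test or MPI_Iprobe.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Polls `ready` until it returns true, sleeping `pause` between polls so the
// core is released instead of being spun on. Returns the number of polls made.
// Hypothetical helper; in MPI code `ready` could wrap MPI_Test or MPI_Iprobe.
static int WaitWithSleep(const std::function<bool()>& ready,
                         std::chrono::milliseconds pause)
{
  int polls = 1;  // counts the ready() call made in the loop condition
  while (!ready())
  {
    std::this_thread::sleep_for(pause);
    ++polls;
  }
  return polls;
}
```

The price is latency: a message that arrives right after a poll is noticed
only after one 'pause' interval, so the interval has to be chosen per use
case.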

(c)
Meanwhile there has been some progress on the Open MPI side:
- we have been told that the 'mpi_yield_when_idle' option *will really help* even
if top shows 100% CPU,
> since the process is only yielding, the other running processes will get most
> of their time slices,
so setting this parameter is recommended;

- Gilles Gouaillardet developed a patch which solves the 100%-spinning issue for
the latest Open MPI, cf. [13] [14] (tested with v3.0.0; the patch itself needs
some patching to apply to this version).

It is uncertain whether this feature ('-mca mpi_poll_when_idle true') will make
it into mainstream Open MPI; if it does, it will not be before 3.1.
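
To illustrate why a yielding process can still show up with 100% CPU in top
while a really blocking one does not, here is a small C++ sketch. It is only
an analogy and not Open MPI's implementation: IdleStyle and
CpuFractionWhileWaiting are hypothetical names, std::this_thread::yield()
stands in for the yielding progress loop, and the measurement assumes a
POSIX-like system where std::clock() reports the CPU time of the process.

```cpp
#include <chrono>
#include <ctime>
#include <thread>

enum class IdleStyle { Yield, Sleep };

// Waits for `wall` of wall-clock time in the given style and returns the
// fraction of that time the process was charged as CPU time.
static double CpuFractionWhileWaiting(IdleStyle style,
                                      std::chrono::milliseconds wall)
{
  const auto wallStart = std::chrono::steady_clock::now();
  const std::clock_t cpuStart = std::clock();

  while (std::chrono::steady_clock::now() - wallStart < wall)
  {
    if (style == IdleStyle::Yield)
    {
      // Offer the time slice to others, but stay runnable.
      std::this_thread::yield();
    }
    else
    {
      // Block in the kernel; no CPU time is consumed while asleep.
      std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
  }

  const double cpuSeconds =
    static_cast<double>(std::clock() - cpuStart) / CLOCKS_PER_SEC;
  const double wallSeconds = std::chrono::duration<double>(
    std::chrono::steady_clock::now() - wallStart).count();
  return cpuSeconds / wallSeconds;
}
```

On an otherwise idle core the Yield variant is expected to report a fraction
close to 1 and the Sleep variant a fraction close to 0; the yielding loop
only gives way when some other process actually wants the core.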

Maybe you would like to update the wiki with the latest results.

Many thanks for your support
Paul Kapinos

[8] http://blogs.cisco.com/performance/polling-vs-blocking-message-passingprogress
[12] https://stackoverflow.com/questions/14560714/probe-seems-to-consume-the-cpu
[13] https://www.mail-archive.com/devel@lists.open-mpi.org//msg20408.html
[14] https://www.mail-archive.com/devel@lists.open-mpi.org//msg20413.html


------------------------------------------------------------------------------
Thread 1 (Thread 0x7efe3499c780 (LWP 25519)):
#0  0x00007efe1af78599 in opal_progress () from
/opt/MPI/openmpi-1.10.4/linux/gcc_4.8.5/lib/libopen-pal.so.13
#1  0x00007efe23d8b865 in ompi_request_default_wait () from
/opt/MPI/openmpi-1.10.4/linux/gcc_4.8.5/lib/libmpi.so.12
#2  0x00007efe0b57fa20 in ompi_coll_tuned_bcast_intra_generic () from
/opt/MPI/openmpi-1.10.4/linux/gcc_4.8.5/lib/openmpi/mca_coll_tuned.so
#3  0x00007efe0b57fef7 in ompi_coll_tuned_bcast_intra_binomial () from
/opt/MPI/openmpi-1.10.4/linux/gcc_4.8.5/lib/openmpi/mca_coll_tuned.so
#4  0x00007efe0b57478c in ompi_coll_tuned_bcast_intra_dec_fixed () from
/opt/MPI/openmpi-1.10.4/linux/gcc_4.8.5/lib/openmpi/mca_coll_tuned.so
#5  0x00007efe23d9fad0 in PMPI_Bcast () from
/opt/MPI/openmpi-1.10.4/linux/gcc_4.8.5/lib/libmpi.so.12
#6  0x00007efe30a86ee0 in vtkMPICommunicator::BroadcastVoidArray(void*, long
long, int, int) () from
/usr/local_rwth/sw/paraview/5.4.1-gcc_4.8.5-openmpi_1.10.4/lib/paraview-5.4/libvtkParallelMPI-pv5.4.so.1
#7  0x00007efe3084c933 in vtkMultiProcessController::BroadcastProcessRMIs(int,
int) () from
/usr/local_rwth/sw/paraview/5.4.1-gcc_4.8.5-openmpi_1.10.4/lib/paraview-5.4/libvtkParallelCore-pv5.4.so.1
#8  0x00007efe3084cfb5 in vtkMultiProcessController::ProcessRMIs(int, int) ()
from
/usr/local_rwth/sw/paraview/5.4.1-gcc_4.8.5-openmpi_1.10.4/lib/paraview-5.4/libvtkParallelCore-pv5.4.so.1
#9  0x0000000000401751 in main ()
------------------------------------------------------------------------------




On 10/11/2017 06:48 PM, Moreland, Kenneth wrote:
> What happens if you add -DCMAKE_BUILD_TYPE=Release to the cmake flags in your configure line?
> 
> -Ken
> 
> -----Original Message-----
> From: Paul Kapinos [mailto:kapinos at itc.rwth-aachen.de] 
> Sent: Wednesday, October 11, 2017 10:46 AM
> To: Moreland, Kenneth <kmorel at sandia.gov>; paraview-developers at paraview.org; Ayachit, Utkarsh (External Contacts) <utkarsh.ayachit at kitware.com>
> Subject: [EXTERNAL] Re: Define NDEBUG in VTK Submodule (WAS: 100% CPU usage once again)
> 
> We ran 'make VERBOSE=1', and what we see is, for example (note the -DDEBUG flag!):
> ------------------------------------------------------------------------------
> cd
> /w0/tmp/pk224850/PARAVIEW_BUILD/5.4.1-gcc_4.8.5-intelmpi_5.1.3.181/VTK/ThirdParty/hdf5/vtkhdf5/hl/src
> && /usr/bin/gcc  -DDEBUG -DMPICH_IGNORE_CXX_SEEK -DVTK_IN_VTK -D_BSD_SOURCE
> -D_DEFAULT_SOURCE -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE -D_LARGEFILE64_SOURCE
> -D_LARGEFILE_SOURCE -D_POSIX_C_SOURCE=199506L -Dhdf5_hl_EXPORTS
> -I/w0/tmp/pk224850/PARAVIEW_BUILD/5.4.1-gcc_4.8.5-intelmpi_5.1.3.181
> -I/w0/tmp/pk224850/PARAVIEW_BUILD/5.4.1-gcc_4.8.5-intelmpi_5.1.3.181/VTK/ThirdParty/hdf5/vtkhdf5
> -I/w0/tmp/pk224850/PARAVIEW_BUILD/ParaView-v5.4.1/VTK/ThirdParty/hdf5/vtkhdf5/hl/src
> -I/w0/tmp/pk224850/PARAVIEW_BUILD/ParaView-v5.4.1/VTK/ThirdParty/hdf5/vtkhdf5/src
> -I/w0/tmp/pk224850/PARAVIEW_BUILD/5.4.1-gcc_4.8.5-intelmpi_5.1.3.181/VTK/ThirdParty/hdf5
> -I/w0/tmp/pk224850/PARAVIEW_BUILD/ParaView-v5.4.1/VTK/ThirdParty/hdf5
> -I/w0/tmp/pk224850/PARAVIEW_BUILD/5.4.1-gcc_4.8.5-intelmpi_5.1.3.181/VTK/ThirdParty/zlib
> -I/w0/tmp/pk224850/PARAVIEW_BUILD/ParaView-v5.4.1/VTK/ThirdParty/zlib
> -I/w0/tmp/pk224850/PARAVIEW_BUILD/5.4.1-gcc_4.8.5-intelmpi_5.1.3.181/bin
> -I/w0/tmp/pk224850/PARAVIEW_BUILD/5.4.1-gcc_4.8.5-intelmpi_5.1.3.181/VTK/ThirdParty/hdf5/vtkhdf5/src
>  -w -w -std=c99 -finline-functions -fno-common -w -Wextra -Wundef -Wshadow
> -Wpointer-arith -Wbad-function-cast -Wcast-qual -Wcast-align -Wwrite-strings
> -Wconversion -Waggregate-return -Wstrict-prototypes -Wmissing-prototypes
> -Wmissing-declarations -Wredundant-decls -Wnested-externs -Winline
> -fmessage-length=0 -g -fPIC   -Wall   -o CMakeFiles/vtkhdf5_hl.dir/H5DS.c.o   -c
> /w0/tmp/pk224850/PARAVIEW_BUILD/ParaView-v5.4.1/VTK/ThirdParty/hdf5/vtkhdf5/hl/src/H5DS.c
> ------------------------------------------------------------------------------
> 
> Configure line was:
> ------------------------------------------------------------------------------
> cmake  -DPARAVIEW_USE_MPI=ON           \
>                      -DPARAVIEW_USE_VISITBRIDGE=ON               \
>                          -DVISIT_BUILD_READER_CGNS=OFF          \
>                          -DPARAVIEW_ENABLE_CGNS=OFF             \
>                      -DPARAVIEW_BUILD_QT_GUI=OFF     \
>                      -DVTK_USE_X=OFF                 \
>    -DOPENGL_INCLUDE_DIR=IGNORE                       \
>    -DOPENGL_xmesa_INCLUDE_DIR=IGNORE                            \
>    -DOPENGL_gl_LIBRARY=IGNORE                        \
>    -DOSMESA_INCLUDE_DIR=${INST}/include        \
>    -DOSMESA_LIBRARY=${INST}/lib/libOSMesa.so   \
>                      -DVTK_OPENGL_HAS_OSMESA=ON                 \
>                      -DVTK_USE_OFFSCREEN=OFF                    \
>                      -DCMAKE_INSTALL_PREFIX=${INST}     ../ParaView-v${VER}
> ${GCCB}  2>&1 | tee log.$START.01.cmake.txt
> ------------------------------------------------------------------------------
> 
> 
> 
> 
> On 10/11/2017 06:39 PM, Moreland, Kenneth wrote:
>> That's weird. I thought CMake was supposed to automatically add the -DNDEBUG flag whenever you compile anything under Release mode.
>>
>> At any rate, I'll hand that question off to Utkarsh or one of the ParaView team at Kitware. They would be better than me at answering build questions like this.
>>
>> -Ken
>>
>> -----Original Message-----
>> From: Paul Kapinos [mailto:kapinos at itc.rwth-aachen.de] 
>> Sent: Wednesday, October 11, 2017 10:19 AM
>> To: Moreland, Kenneth <kmorel at sandia.gov>; paraview-developers at paraview.org
>> Subject: [EXTERNAL] Re: [Paraview-developers] 100% CPU usage once again
>>
>> Kenneth, thank you for your message and the update of the wiki!
>>
>> Meanwhile I was able to reproduce the busy waiting with Open MPI (and other
>> MPIs) with a 10-line example and asked the Open MPI developers about this:
>> https://www.mail-archive.com/devel@lists.open-mpi.org//msg20407.html
>>
>> The spinning processes are stuck at the MPI_Barrier call in line 40 of the
>> 'ParaView-v5.4.1/VTK/Parallel/MPI/vtkMPICommunicator.cxx' file:
>>>     34 static inline void  vtkMPICommunicatorDebugBarrier(MPI_Comm* handle)
>>>     35 {
>>>     36   // If NDEBUG is defined, do nothing.
>>>     37 #ifdef NDEBUG
>>>     38   (void)handle; // to avoid warning about unused parameter
>>>     39 #else
>>>     40   MPI_Barrier(*handle);
>>>     41 #endif
>>>     42 }
>> It seems that ParaView's automatically built VTK version is built without the
>> NDEBUG define, or in other words, *with* debug stuff.
>>
>> Q: Is there a way to tell ParaView's configure step to set VTK's 'NDEBUG'
>> define? Or, more generally, how does one tune VTK parameters? :o)
>>
>> Have a nice day,
>>
>> Paul Kapinos
>>
>>
>>
>>
>>
>> On 10/09/2017 11:27 PM, Moreland, Kenneth wrote:
>>> Paul,
>>>
>>> Thanks for the info. I updated the Wiki page with your information. Hopefully I captured everything.
>>>
>>> I'll let someone from Kitware respond about the possibility of changing the binaries that are distributed.
>>>
>>> -Ken
>>>
>>> -----Original Message-----
>>> From: Paraview-developers [mailto:paraview-developers-bounces at paraview.org] On Behalf Of Paul Kapinos
>>> Sent: Monday, October 9, 2017 1:08 AM
>>> To: paraview-developers at paraview.org
>>> Subject: [EXTERNAL] [Paraview-developers] 100% CPU usage once again
>>>
>>> (cross-post from 'paraview at paraview.org' 10/04/2017)
>>>
>>> Dear ParaView developers,
>>>
>>> in [1] you say
>>>> If you have information on disabling the busy wait using a different 
>>>> MPI implementation, please contribute back by documenting it here.
>>>
>>> Here we go.
>>>
>>> a)
>>> It is possible to build 'pvserver' using Intel MPI (and GCC compilers).
>>> By setting the I_MPI_WAIT_MODE environment variable to 'enable' you can effectively prevent busy waiting, see [2] p.10 (tested with Intel MPI 5.1(.3.181)).
>>>
>>> Q/FR.1 Could you please provide a version of 'pvserver' built using Intel MPI in the official release of ParaView? [4]
>>>
>>>
>>>
>>> b)
>>> It is (obviously) possible to build 'pvserver' using MPICH (and GCC compilers).
>>> MPICH itself can be *configured* with the '--with-device=ch3:sock' option. This is described as 'slower' than '--with-device=ch3:nemesis'.
>>> In our experiments it turned out that 'pvserver' compiled and linked using MPICH with the 'ch3:sock' configure option did not show the busy waiting aka 100%-CPU behaviour.
>>> Note that this is a *configure-time parameter of the MPI library*.
>>>
>>>
>>> (Tested with MPICH  3.2)
>>>
>>> Q/FR.2 Could you please provide a version of 'pvserver' built using MPICH *configured with the '--with-device=ch3:sock' option* in the official release of ParaView? [4] This binary will very likely even be compatible with 'standard' MPICH installations; we were able to start it even with Intel MPI's 'mpiexec' with success and no busy waiting. However YES you will need to have a MPICH release be build
>>>
>>>
>>>
>>>
>>> c)
>>> It is possible to build 'pvserver' using Open MPI (and GCC compilers).
>>> In [1] you document how to 'turn off' the busy waiting behaviour in Open MPI (cf. [5],[6]).
>>> Unfortunately this *did not work* in our environment (InfiniBand QDR and Intel OmniPath clusters, Open MPI 1.10.4).
>>> Note that we are likely not alone, cf. [7].
>>> Note that the switch likely just moves the spinning location from the MPI library itself to the fabrics library (cf. the screenshots: without 'mpi_yield_when_idle 1' (=default) the 'pvserver' processes spin with 'green' aka user 100%, and with 'mpi_yield_when_idle 1' the processes keep spinning but now with a large 'red' aka kernel time portion).
>>>
>>> Conclusion: the way to disable busy waiting for Open MPI that is documented in [1] is not useful for us. We do not know whether this is a peculiarity of our site or a general issue; we think a survey on this point could be useful.
>>>
>>> Q/FR.3 In case you want to provide precompiled versions of 'pvserver' for Open MPI, remember that there are ABI changes between major versions. So you would likely need to compile and link *three* versions of 'pvserver', with Open MPI 1.10.x (still the default in Linux), 2.x (current), and 3.x (new).
>>>
>>>
>>>
>>> d) In [10] there is a remark about how to disable busy waiting in two further MPI implementations:
>>>> ... IBM MPI has the MP_WAIT_MODE and
>>>> the SUPER-UX MPI/XS library has MPISUSPEND to choose the waiting mode, 
>>>> either polling or sleeping.
>>>
>>> Somebody with access to these MPIs could evaluate this, maybe.
>>>
>>> Have a nice day,
>>>
>>> Paul Kapinos
>>>
>>>
>>>
>>> [1]
>>> https://www.paraview.org/Wiki/Setting_up_a_ParaView_Server#Server_processes_always_have_100.25_CPU_usage
>>>
>>> [2]
>>> https://software.intel.com/sites/products/Whitepaper/Clustertools/amplxe_inspxe_interop_with_mpi.pdf
>>>   (cf. p.10)
>>>
>>> [3]
>>> https://wiki.mpich.org/mpich/index.php/Frequently_Asked_Questions#Q:_Why_does_my_MPI_program_run_much_slower_when_I_use_more_processes.3F
>>>
>>> [4] https://www.paraview.org/download/
>>>
>>> [5] http://www.open-mpi.org/faq/?category=running#oversubscribing
>>>
>>> [6] http://www.open-mpi.org/faq/?category=running#force-aggressive-degraded
>>>
>>> [7] https://www.paraview.org/pipermail/paraview/2008-December/010349.html
>>>
>>> [8] http://blogs.cisco.com/performance/polling-vs-blocking-message-passingprogress
>>>
>>> [9] https://www.open-mpi.org/community/lists/users/2010/10/14505.php
>>>
>>> [10] http://comp.parallel.mpi.narkive.com/3oXMDXno/non-busy-waiting-barrier-in-mpi
>>>
>>> [11] http://blogs.cisco.com/performance/polling-vs-blocking-message-passingprogress
>>>
>>> [12] https://stackoverflow.com/questions/14560714/probe-seems-to-consume-the-cpu
>>>
>>>
>>
>>
> 
> 


-- 
Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, IT Center
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915
