[Paraview] 4.1.0 release candidate build (UNCLASSIFIED)
Angelini, Richard C (Rick) CIV USARMY ARL (US)
richard.c.angelini.civ at mail.mil
Tue Dec 3 14:44:09 EST 2013
Classification: UNCLASSIFIED
Caveats: NONE
I've built 4.1.0 on a couple of our HPC systems. The build is clean, but it
fails when launching parallel servers. On both systems (an SGI Altix/ICE and
an IBM iDataPlex) I'm using gcc and Open MPI and the exact same build
environment that I used to build 4.0.1. However, both systems fail with
identical errors, beginning with a complaint about the "leave pinned" MPI
feature, which is a flag set in our mpirun command environment and which
works with 4.0.1. Did something change behind the scenes in ParaView 4.1.0
that impacts the build or runtime parameters?
The launch command and its output follow:
orterun -x MODULE_VERSION_STACK -x MANPATH -x MPI_VER -x HOSTNAME -x
_MODULESBEGINENV_ -x PBS_ACCOUNT -x HOST -x SHELL -x TMPDIR -x PBS_JOBNAME
-x PBS_ENVIRONMENT -x PBS_O_WORKDIR -x NCPUS -x DAAC_HOME -x GROUP -x
PBS_TASKNUM -x USER -x LD_LIBRARY_PATH -x LS_COLORS -x PBS_O_HOME -x
COMPILER_VER -x HOSTTYPE -x PBS_MOMPORT -x PV_ROOT -x PBS_O_QUEUE -x NLSPATH
-x MODULE_VERSION -x MAIL -x PBS_O_LOGNAME -x PATH -x PBS_O_LANG -x
PBS_JOBCOOKIE -x F90 -x PWD -x _LMFILES_ -x PBS_NODENUM -x LANG -x
MODULEPATH -x LOADEDMODULES -x PBS_JOBDIR -x F77 -x PBS_O_SHELL -x PBS_JOBID
-x MPICC_F77 -x CXX -x ENVIRONMENT -x SHLVL -x HOME -x OSTYPE -x PBS_O_HOST
-x MPIHOME -x FC -x VENDOR -x MACHTYPE -x LOGNAME -x MPICC_CXX -x PBS_QUEUE
-x MPI_HOME -x MODULESHOME -x COMPILER -x LESSOPEN -x OMP_NUM_THREADS -x
PBS_O_MAIL -x CC -x PBS_O_SYSTEM -x MPICC_F90 -x G_BROKEN_FILENAMES -x
PBS_NODEFILE -x MPICC_CC -x PBS_O_PATH -x module -x } -x premode -x premod
-x PBS_HOME -x PBS_GET_IBWINS -x NUM_MPITASKS -np 3 -machinefile
new.1133.machines.txt --prefix
/usr/cta/unsupported/openmpi/gcc/4.4.0/openmpi-1.6.3 -mca orte_rsh_agent ssh
-mca mpi_paffinity_alone 1 -mca maffinity first_use -mca mpi_leave_pinned 1
-mca btl openib,self -mca orte_default_hostname new.1133.machines.txt
pvserver --use-offscreen-rendering --server-port=50481
--client-host=localhost --reverse-connection --timeout=15 --connect-id=30526
[pershing-n0221:01190] Warning: could not find environment variable "}"
--------------------------------------------------------------------------
A process attempted to use the "leave pinned" MPI feature, but no
memory registration hooks were found on the system at run time. This
may be the result of running on a system that does not support memory
hooks or having some other software subvert Open MPI's use of the
memory hooks. You can disable Open MPI's use of memory hooks by
setting both the mpi_leave_pinned and mpi_leave_pinned_pipeline MCA
parameters to 0.
Open MPI will disable any transports that are attempting to use the
leave pinned functionality; your job may still run, but may fall back
to a slower network transport (such as TCP).
Mpool name: rdma
Process: [[43622,1],0]
Local host: xxx-n0221
--------------------------------------------------------------------------
--------------------------------------------------------------------------
WARNING: There is at least one OpenFabrics device found but there are
no active ports detected (or Open MPI was unable to use them). This
is most certainly not what you wanted. Check your cables, subnet
manager configuration, etc. The openib BTL will be ignored for this
job.
Local host: xxx-n0221
--------------------------------------------------------------------------
--------------------------------------------------------------------------
At least one pair of MPI processes are unable to reach each other for
MPI communications. This means that no Open MPI device has indicated
that it can be used to communicate between these processes. This is
an error; Open MPI requires that all MPI processes be able to reach
each other. This error can sometimes be the result of forgetting to
specify the "self" BTL.
Process 1 ([[43622,1],2]) is on host: xxx-n0221
Process 2 ([[43622,1],0]) is on host: xxx-n0221
BTLs attempted: self
Your MPI job is now going to abort; sorry.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
MPI_INIT has failed because at least one MPI process is unreachable
from another. This *usually* means that an underlying communication
plugin -- such as a BTL or an MTL -- has either not loaded or not
allowed itself to be used. Your MPI job will now abort.
You may wish to try to narrow down the problem;
* Check the output of ompi_info to see which BTL/MTL plugins are
available.
* Run your application with MPI_THREAD_SINGLE.
* Set the MCA parameter btl_base_verbose to 100 (or mtl_base_verbose,
if using MTL-based communications) to see exactly which
communication plugins were considered and/or discarded.
--------------------------------------------------------------------------
[pershing-n0221:1198] *** An error occurred in MPI_Init
[pershing-n0221:1198] *** on a NULL communicator
[pershing-n0221:1198] *** Unknown error
[pershing-n0221:1198] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
--------------------------------------------------------------------------
An MPI process is aborting at a time when it cannot guarantee that all
of its peer processes in the job will be killed properly. You should
double check that everything has shut down cleanly.
Reason: Before MPI_INIT completed
Local host: pershing-n0221
PID: 1198
--------------------------------------------------------------------------
--------------------------------------------------------------------------
orterun has exited due to process rank 2 with PID 1198 on
node pershing-n0221 exiting improperly. There are two reasons this could
occur:
1. this process did not call "init" before exiting, but others in
the job did. This can cause a job to hang indefinitely while it waits
for all processes to call "init". By rule, if one process calls "init",
then ALL processes must call "init" prior to termination.
2. this process called "init", but exited without calling "finalize".
By rule, all processes that call "init" MUST call "finalize" prior to
exiting or it will be considered an "abnormal termination"
This may have caused other processes in the application to be
terminated by signals sent by orterun (as reported here).
--------------------------------------------------------------------------
[pershing-n0221:01190] 2 more processes have sent help message
help-mpool-base.txt / leave pinned failed
[pershing-n0221:01190] Set MCA parameter "orte_base_help_aggregate" to 0 to
see all help / error messages
[pershing-n0221:01190] 2 more processes have sent help message
help-mpi-btl-openib.txt / no active ports found
[pershing-n0221:01190] 2 more processes have sent help message
help-mca-bml-r2.txt / unreachable proc
[pershing-n0221:01190] 2 more processes have sent help message
help-mpi-runtime / mpi_init:startup:pml-add-procs-fail
[pershing-n0221:01190] 2 more processes have sent help message
help-mpi-errors.txt / mpi_errors_are_fatal unknown handle
[pershing-n0221:01190] 2 more processes have sent help message
help-mpi-runtime.txt / ompi mpi abort:cannot guarantee all killed
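The messages above suggest a few follow-up checks. As a rough sketch (not the site's actual launcher, which isn't shown in this post), the script below does two things. `export_args` builds the `-x NAME` list only from environment variable names that are valid identifiers, which should avoid stray entries like `-x }`; that stray entry presumably comes from an exported shell function in the environment, though that is a guess. `diag_command` prints, without running it, a reduced `orterun` line with `mpi_leave_pinned` and `mpi_leave_pinned_pipeline` set to 0 and `btl_base_verbose` set to 100, as the Open MPI messages recommend. Both function names are hypothetical, and the `self,sm,tcp` BTL list in the second call is an assumption: it only helps if those components exist in this Open MPI build, which `ompi_info` would confirm.

```shell
#!/bin/sh
# Diagnostic sketch only; nothing here is taken from the site's real launcher.
# Function names and default arguments are illustrative assumptions.

# Print "-x NAME " for every environment variable whose name is a valid
# shell identifier. Entries such as "}" or "BASH_FUNC_module%%" are skipped.
export_args() {
    awk 'BEGIN {
        for (name in ENVIRON)
            if (name ~ /^[A-Za-z_][A-Za-z0-9_]*$/)
                print name
    }' | sort | while read -r name; do
        printf '%s %s ' -x "$name"
    done
}

# Print (do not execute) a reduced rerun command that follows the hints in
# the Open MPI output: leave-pinned disabled, BTL selection logged.
#   $1 = number of processes (default 3, as in the post)
#   $2 = BTL list            (default openib,self, as in the post)
diag_command() {
    np=${1:-3}
    btl=${2:-openib,self}
    echo "orterun -np $np" \
        "-mca mpi_leave_pinned 0 -mca mpi_leave_pinned_pipeline 0" \
        "-mca btl_base_verbose 100 -mca btl $btl" \
        "pvserver --use-offscreen-rendering"
}

# Same transports as the failing run, with the extra diagnostics.
diag_command 3 openib,self
# Assumption: shared-memory and TCP BTLs are available in this build.
diag_command 3 self,sm,tcp
```

If the job starts with `self,sm,tcp` but not with `openib,self`, that would point at the InfiniBand transport setup rather than at ParaView itself. The output of `export_args` can be spliced in front of `-np` once the reduced command behaves.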
________________________________
Rick Angelini
US Army Research Laboratory
CISD/HPC Architectures Team
Building 120 Cube 315
Aberdeen Proving Ground, MD
Phone: 410-278-6266