[Paraview] 4.1.0 release candidate build (UNCLASSIFIED)
Angelini, Richard C (Rick) CIV USARMY ARL (US)
richard.c.angelini.civ at mail.mil
Wed Dec 4 11:14:02 EST 2013
Classification: UNCLASSIFIED
Caveats: NONE
I'm seeing those errors with both client-server parallel pvserver and with
pvbatch. I'm going to throw this over to our systems people to see if
they have any ideas. But I'm suspicious that it is a ParaView thing, since it
happens on two different machines, on two different compute platforms.
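One thing that should help narrow it down (a sketch only; the example path and the trimmed flag list below are assumptions, not our exact setup): run a trivial non-ParaView MPI program under the same launcher settings and see whether the "leave pinned" warning shows up there too.

    # Build the hello-world that ships in the Open MPI tarball (path assumed)
    mpicc -o hello_c openmpi-1.6.3/examples/hello_c.c
    # Launch it with the same MCA settings used for pvserver
    orterun -np 3 -machinefile new.1133.machines.txt \
        -mca mpi_paffinity_alone 1 -mca maffinity first_use \
        -mca mpi_leave_pinned 1 -mca btl openib,self ./hello_c

If hello_c hits the same mpool/openib warnings, the problem is in the MPI runtime environment on those nodes rather than in ParaView 4.1.0.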
________________________________
Rick Angelini
USArmy Research Laboratory
CISD/HPC Architectures Team
Building 120 Cube 315
Aberdeen Proving Ground, MD
Phone: 410-278-6266
-----Original Message-----
From: paraview-bounces at paraview.org [mailto:paraview-bounces at paraview.org]
On Behalf Of Utkarsh Ayachit
Sent: Tuesday, December 03, 2013 10:07 PM
To: Angelini, Richard C (Rick) CIV USARMY ARL (US)
Cc: paraview at paraview.org
Subject: Re: [Paraview] 4.1.0 release candidate build (UNCLASSIFIED)
I can't think of anything in particular that changed that could affect this.
Are you trying this with pvserver? Can you try pvbatch? Same problem?
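Something minimal along these lines would be enough to exercise MPI startup in pvbatch; the script name is just a placeholder and the launcher flags are copied from your pvserver run below:

    # Minimal pvbatch launch under the same settings (test.py is a placeholder script)
    orterun -np 3 -machinefile new.1133.machines.txt \
        -mca mpi_leave_pinned 1 -mca btl openib,self \
        pvbatch --use-offscreen-rendering test.py

If that also fails in MPI_Init, the failure is independent of the client/server connection path.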
On Tue, Dec 3, 2013 at 12:44 PM, Angelini, Richard C (Rick) CIV USARMY ARL
(US) <richard.c.angelini.civ at mail.mil> wrote:
> Classification: UNCLASSIFIED
> Caveats: NONE
>
> I've built 4.1.0 on a couple of our HPC systems and I'm getting a clean
> build, but it fails on execution of the parallel servers. On both systems
> (an SGI Altix/ICE and an IBM iDataPlex) I'm using gcc and Open MPI and the
> exact same build environment that I used to build 4.0.1. However, both
> systems are failing with identical errors that begin with a warning about
> the "Leave Pinned" MPI feature, which is a flag set in our mpirun command
> environment and works with 4.0.1. Did something change behind the scenes
> in ParaView 4.1.0 that impacts the build or runtime parameters?
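A quick way to double-check the build side of that question (commands are generic, not from the original report): confirm that the 4.1.0 pvserver resolves to the same Open MPI as the 4.0.1 build, and see what leave-pinned defaults that runtime reports.

    # Which MPI libraries does the new binary actually link against?
    ldd $(which pvserver) | grep -i mpi
    # What does this Open MPI installation say about leave-pinned?
    ompi_info --all | grep leave_pinned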
>
>
>
> orterun -x MODULE_VERSION_STACK -x MANPATH -x MPI_VER -x HOSTNAME -x
> _MODULESBEGINENV_ -x PBS_ACCOUNT -x HOST -x SHELL -x TMPDIR -x
> PBS_JOBNAME -x PBS_ENVIRONMENT -x PBS_O_WORKDIR -x NCPUS -x DAAC_HOME
> -x GROUP -x PBS_TASKNUM -x USER -x LD_LIBRARY_PATH -x LS_COLORS -x
> PBS_O_HOME -x COMPILER_VER -x HOSTTYPE -x PBS_MOMPORT -x PV_ROOT -x
> PBS_O_QUEUE -x NLSPATH -x MODULE_VERSION -x MAIL -x PBS_O_LOGNAME -x
> PATH -x PBS_O_LANG -x PBS_JOBCOOKIE -x F90 -x PWD -x _LMFILES_ -x
> PBS_NODENUM -x LANG -x MODULEPATH -x LOADEDMODULES -x PBS_JOBDIR -x
> F77 -x PBS_O_SHELL -x PBS_JOBID -x MPICC_F77 -x CXX -x ENVIRONMENT -x
> SHLVL -x HOME -x OSTYPE -x PBS_O_HOST -x MPIHOME -x FC -x VENDOR -x
> MACHTYPE -x LOGNAME -x MPICC_CXX -x PBS_QUEUE -x MPI_HOME -x
> MODULESHOME -x COMPILER -x LESSOPEN -x OMP_NUM_THREADS -x PBS_O_MAIL
> -x CC -x PBS_O_SYSTEM -x MPICC_F90 -x G_BROKEN_FILENAMES -x
> PBS_NODEFILE -x MPICC_CC -x PBS_O_PATH -x module -x } -x premode -x
> premod -x PBS_HOME -x PBS_GET_IBWINS -x NUM_MPITASKS -np 3
> -machinefile new.1133.machines.txt --prefix
> /usr/cta/unsupported/openmpi/gcc/4.4.0/openmpi-1.6.3 -mca
> orte_rsh_agent ssh -mca mpi_paffinity_alone 1 -mca maffinity first_use
> -mca mpi_leave_pinned 1 -mca btl openib,self -mca
> orte_default_hostname new.1133.machines.txt pvserver
> --use-offscreen-rendering --server-port=50481 --client-host=localhost
> --reverse-connection --timeout=15 --connect-id=30526
> [pershing-n0221:01190] Warning: could not find environment variable "}"
> --------------------------------------------------------------------------
> A process attempted to use the "leave pinned" MPI feature, but no
> memory registration hooks were found on the system at run time. This
> may be the result of running on a system that does not support memory
> hooks or having some other software subvert Open MPI's use of the
> memory hooks. You can disable Open MPI's use of memory hooks by
> setting both the mpi_leave_pinned and mpi_leave_pinned_pipeline MCA
> parameters to 0.
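Acting on that suggestion would amount to something like the following on the orterun line (a sketch only; the elided arguments are the same as in the full command above):

    # Disable leave-pinned as the Open MPI help text suggests
    orterun ... -mca mpi_leave_pinned 0 -mca mpi_leave_pinned_pipeline 0 ... pvserver ...

with the trade-off the next paragraph describes: the job may fall back to a slower transport such as TCP.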
>
> Open MPI will disable any transports that are attempting to use the
> leave pinned functionality; your job may still run, but may fall back
> to a slower network transport (such as TCP).
>
> Mpool name: rdma
> Process: [[43622,1],0]
> Local host: xxx-n0221
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> WARNING: There is at least one OpenFabrics device found but there are
> no active ports detected (or Open MPI was unable to use them). This
> is most certainly not what you wanted. Check your cables, subnet
> manager configuration, etc. The openib BTL will be ignored for this
> job.
>
> Local host: xxx-n0221
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> At least one pair of MPI processes are unable to reach each other
> for MPI communications. This means that no Open MPI device has
> indicated that it can be used to communicate between these processes.
> This is an error; Open MPI requires that all MPI processes be able to
> reach each other. This error can sometimes be the result of
> forgetting to specify the "self" BTL.
>
> Process 1 ([[43622,1],2]) is on host: xxx-n0221
> Process 2 ([[43622,1],0]) is on host: xxx-n0221
> BTLs attempted: self
>
> Your MPI job is now going to abort; sorry.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> MPI_INIT has failed because at least one MPI process is
> unreachable from another. This *usually* means that an underlying
> communication plugin -- such as a BTL or an MTL -- has either not
> loaded or not allowed itself to be used. Your MPI job will now abort.
>
> You may wish to try to narrow down the problem;
>
> * Check the output of ompi_info to see which BTL/MTL plugins are
> available.
> * Run your application with MPI_THREAD_SINGLE.
> * Set the MCA parameter btl_base_verbose to 100 (or mtl_base_verbose,
> if using MTL-based communications) to see exactly which
> communication plugins were considered and/or discarded.
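In concrete terms, the first and third suggestions could look like this (same orterun line as above, with the verbose flag appended):

    # List the BTL components this Open MPI installation provides
    ompi_info | grep 'MCA btl'
    # Re-run with verbose BTL selection to see why openib is discarded
    orterun ... -mca btl openib,self -mca btl_base_verbose 100 ... pvserver ...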
> --------------------------------------------------------------------------
> [pershing-n0221:1198] *** An error occurred in MPI_Init
> [pershing-n0221:1198] *** on a NULL communicator
> [pershing-n0221:1198] *** Unknown error
> [pershing-n0221:1198] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
> --------------------------------------------------------------------------
> An MPI process is aborting at a time when it cannot guarantee
> that all of its peer processes in the job will be killed properly.
> You should double check that everything has shut down cleanly.
>
> Reason: Before MPI_INIT completed
> Local host: pershing-n0221
> PID: 1198
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> orterun has exited due to process rank 2 with PID 1198 on node
> pershing-n0221 exiting improperly. There are two reasons this could
> occur:
>
> 1. this process did not call "init" before exiting, but others in the
> job did. This can cause a job to hang indefinitely while it waits for
> all processes to call "init". By rule, if one process calls "init",
> then ALL processes must call "init" prior to termination.
>
> 2. this process called "init", but exited without calling "finalize".
> By rule, all processes that call "init" MUST call "finalize" prior to
> exiting or it will be considered an "abnormal termination"
>
> This may have caused other processes in the application to be
> terminated by signals sent by orterun (as reported here).
> --------------------------------------------------------------------------
> [pershing-n0221:01190] 2 more processes have sent help message help-mpool-base.txt / leave pinned failed
> [pershing-n0221:01190] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
> [pershing-n0221:01190] 2 more processes have sent help message help-mpi-btl-openib.txt / no active ports found
> [pershing-n0221:01190] 2 more processes have sent help message help-mca-bml-r2.txt / unreachable proc
> [pershing-n0221:01190] 2 more processes have sent help message help-mpi-runtime / mpi_init:startup:pml-add-procs-fail
> [pershing-n0221:01190] 2 more processes have sent help message help-mpi-errors.txt / mpi_errors_are_fatal unknown handle
> [pershing-n0221:01190] 2 more processes have sent help message help-mpi-runtime.txt / ompi mpi abort:cannot guarantee all killed
>
> ________________________________
> Rick Angelini
> USArmy Research Laboratory
> CISD/HPC Architectures Team
> Building 120 Cube 315
> Aberdeen Proving Ground, MD
> Phone: 410-278-6266
>
>
>
> Classification: UNCLASSIFIED
> Caveats: NONE
>
>
>
_______________________________________________
Powered by www.kitware.com
Visit other Kitware open-source projects at
http://www.kitware.com/opensource/opensource.html
Please keep messages on-topic and check the ParaView Wiki at:
http://paraview.org/Wiki/ParaView
Follow this link to subscribe/unsubscribe:
http://www.paraview.org/mailman/listinfo/paraview
Classification: UNCLASSIFIED
Caveats: NONE