[Paraview] 4.1.0 release candidate build (UNCLASSIFIED)

Utkarsh Ayachit utkarsh.ayachit at kitware.com
Tue Dec 3 22:06:49 EST 2013


I can't think of anything in particular that changed that could affect
this. Are you trying this with pvserver? Can you try pvbatch? Same
problem?
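To make the pvbatch test concrete, a minimal invocation could look like the sketch below. The machinefile name and MCA parameters are taken from the command line and the Open MPI help text in the log; the Python script name is a placeholder for any small pvbatch script (e.g. one that creates a source and saves a screenshot):

```shell
# Sketch of a pvbatch sanity check, reusing the same launcher and
# machinefile as the failing pvserver run. test.py is a placeholder
# script. The two MCA overrides are the ones the Open MPI help
# message suggests for disabling the leave-pinned feature, to see
# whether the failure is tied to it.
orterun -np 3 -machinefile new.1133.machines.txt \
    -mca mpi_leave_pinned 0 -mca mpi_leave_pinned_pipeline 0 \
    pvbatch --use-offscreen-rendering test.py
```

If pvbatch fails the same way, the problem is in the MPI/runtime layer rather than the client/server connection.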

On Tue, Dec 3, 2013 at 12:44 PM, Angelini, Richard C (Rick) CIV USARMY
ARL (US) <richard.c.angelini.civ at mail.mil> wrote:
> Classification: UNCLASSIFIED
> Caveats: NONE
>
> I've built 4.1.0 on a couple of our HPC systems and I'm getting a clean
> build, but it fails on execution of the parallel servers.   On both systems
> (an SGI Altix/ICE and an IBM iDataPlex) I'm using gcc and OpenMPI and the
> exact same build environment that I used to build 4.0.1.   However, both
> systems are failing with identical errors that begin with a "Leave Pinned"
> MPI warning; that feature is a flag set in our mpirun command environment
> and works with 4.0.1. Did something change behind the scenes in ParaView
> 4.1.0 that impacts the build or runtime parameters?
>
>
>
> orterun -x MODULE_VERSION_STACK -x MANPATH -x MPI_VER -x HOSTNAME -x
> _MODULESBEGINENV_ -x PBS_ACCOUNT -x HOST -x SHELL -x TMPDIR -x PBS_JOBNAME
> -x PBS_ENVIRONMENT -x PBS_O_WORKDIR -x NCPUS -x DAAC_HOME -x GROUP -x
> PBS_TASKNUM -x USER -x LD_LIBRARY_PATH -x LS_COLORS -x PBS_O_HOME -x
> COMPILER_VER -x HOSTTYPE -x PBS_MOMPORT -x PV_ROOT -x PBS_O_QUEUE -x NLSPATH
> -x MODULE_VERSION -x MAIL -x PBS_O_LOGNAME -x PATH -x PBS_O_LANG -x
> PBS_JOBCOOKIE -x F90 -x PWD -x _LMFILES_ -x PBS_NODENUM -x LANG -x
> MODULEPATH -x LOADEDMODULES -x PBS_JOBDIR -x F77 -x PBS_O_SHELL -x PBS_JOBID
> -x MPICC_F77 -x CXX -x ENVIRONMENT -x SHLVL -x HOME -x OSTYPE -x PBS_O_HOST
> -x MPIHOME -x FC -x VENDOR -x MACHTYPE -x LOGNAME -x MPICC_CXX -x PBS_QUEUE
> -x MPI_HOME -x MODULESHOME -x COMPILER -x LESSOPEN -x OMP_NUM_THREADS -x
> PBS_O_MAIL -x CC -x PBS_O_SYSTEM -x MPICC_F90 -x G_BROKEN_FILENAMES -x
> PBS_NODEFILE -x MPICC_CC -x PBS_O_PATH -x module -x } -x premode -x premod
> -x PBS_HOME -x PBS_GET_IBWINS -x NUM_MPITASKS -np 3 -machinefile
> new.1133.machines.txt --prefix
> /usr/cta/unsupported/openmpi/gcc/4.4.0/openmpi-1.6.3 -mca orte_rsh_agent ssh
> -mca mpi_paffinity_alone 1 -mca maffinity first_use -mca mpi_leave_pinned 1
> -mca btl openib,self -mca orte_default_hostname new.1133.machines.txt
> pvserver --use-offscreen-rendering --server-port=50481
> --client-host=localhost --reverse-connection --timeout=15 --connect-id=30526
> [pershing-n0221:01190] Warning: could not find environment variable "}"
> --------------------------------------------------------------------------
> A process attempted to use the "leave pinned" MPI feature, but no
> memory registration hooks were found on the system at run time.  This
> may be the result of running on a system that does not support memory
> hooks or having some other software subvert Open MPI's use of the
> memory hooks.  You can disable Open MPI's use of memory hooks by
> setting both the mpi_leave_pinned and mpi_leave_pinned_pipeline MCA
> parameters to 0.
>
> Open MPI will disable any transports that are attempting to use the
> leave pinned functionality; your job may still run, but may fall back
> to a slower network transport (such as TCP).
>
>   Mpool name: rdma
>   Process:    [[43622,1],0]
>   Local host: xxx-n0221
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> WARNING: There is at least one OpenFabrics device found but there are
> no active ports detected (or Open MPI was unable to use them).  This
> is most certainly not what you wanted.  Check your cables, subnet
> manager configuration, etc.  The openib BTL will be ignored for this
> job.
>
>   Local host: xxx-n0221
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> At least one pair of MPI processes are unable to reach each other for
> MPI communications.  This means that no Open MPI device has indicated
> that it can be used to communicate between these processes.  This is
> an error; Open MPI requires that all MPI processes be able to reach
> each other.  This error can sometimes be the result of forgetting to
> specify the "self" BTL.
>
>   Process 1 ([[43622,1],2]) is on host: xxx-n0221
>   Process 2 ([[43622,1],0]) is on host: xxx-n0221
>   BTLs attempted: self
>
> Your MPI job is now going to abort; sorry.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> MPI_INIT has failed because at least one MPI process is unreachable
> from another.  This *usually* means that an underlying communication
> plugin -- such as a BTL or an MTL -- has either not loaded or not
> allowed itself to be used.  Your MPI job will now abort.
>
> You may wish to try to narrow down the problem;
>
>  * Check the output of ompi_info to see which BTL/MTL plugins are
>    available.
>  * Run your application with MPI_THREAD_SINGLE.
>  * Set the MCA parameter btl_base_verbose to 100 (or mtl_base_verbose,
>    if using MTL-based communications) to see exactly which
>    communication plugins were considered and/or discarded.
> --------------------------------------------------------------------------
> [pershing-n0221:1198] *** An error occurred in MPI_Init
> [pershing-n0221:1198] *** on a NULL communicator
> [pershing-n0221:1198] *** Unknown error
> [pershing-n0221:1198] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
> --------------------------------------------------------------------------
> An MPI process is aborting at a time when it cannot guarantee that all
> of its peer processes in the job will be killed properly.  You should
> double check that everything has shut down cleanly.
>
>   Reason:     Before MPI_INIT completed
>   Local host: pershing-n0221
>   PID:        1198
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> orterun has exited due to process rank 2 with PID 1198 on
> node pershing-n0221 exiting improperly. There are two reasons this could
> occur:
>
> 1. this process did not call "init" before exiting, but others in
> the job did. This can cause a job to hang indefinitely while it waits
> for all processes to call "init". By rule, if one process calls "init",
> then ALL processes must call "init" prior to termination.
>
> 2. this process called "init", but exited without calling "finalize".
> By rule, all processes that call "init" MUST call "finalize" prior to
> exiting or it will be considered an "abnormal termination"
>
> This may have caused other processes in the application to be
> terminated by signals sent by orterun (as reported here).
> --------------------------------------------------------------------------
> [pershing-n0221:01190] 2 more processes have sent help message
> help-mpool-base.txt / leave pinned failed
> [pershing-n0221:01190] Set MCA parameter "orte_base_help_aggregate" to 0 to
> see all help / error messages
> [pershing-n0221:01190] 2 more processes have sent help message
> help-mpi-btl-openib.txt / no active ports found
> [pershing-n0221:01190] 2 more processes have sent help message
> help-mca-bml-r2.txt / unreachable proc
> [pershing-n0221:01190] 2 more processes have sent help message
> help-mpi-runtime / mpi_init:startup:pml-add-procs-fail
> [pershing-n0221:01190] 2 more processes have sent help message
> help-mpi-errors.txt / mpi_errors_are_fatal unknown handle
> [pershing-n0221:01190] 2 more processes have sent help message
> help-mpi-runtime.txt / ompi mpi abort:cannot guarantee all killed
>
> ________________________________
> Rick Angelini
> USArmy Research Laboratory
> CISD/HPC Architectures Team
> Building 120 Cube 315
> Aberdeen Proving Ground, MD
> Phone:  410-278-6266
>
>
>
> Classification: UNCLASSIFIED
> Caveats: NONE
>
>
>
> _______________________________________________
> Powered by www.kitware.com
>
> Visit other Kitware open-source projects at http://www.kitware.com/opensource/opensource.html
>
> Please keep messages on-topic and check the ParaView Wiki at: http://paraview.org/Wiki/ParaView
>
> Follow this link to subscribe/unsubscribe:
> http://www.paraview.org/mailman/listinfo/paraview
>