[Paraview] [EXTERNAL] Re: 4.1.0 release candidate build (UNCLASSIFIED)

Scott, W Alan wascott at sandia.gov
Tue Dec 3 22:10:24 EST 2013


Ah, one thing to try - and I bet this isn't it.  If your issue is with a ParaView client connecting to a remote pvserver, get rid of your client-side .config files.  I have seen cases where, as we try different development versions, the .config files get messed up.  Grasping at straws here, but ...  Those config files are found in ~/.config/ParaView.
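
For example - a sketch that moves the directory aside rather than deleting it, so you can restore it if this isn't the problem:

    # Back up the client-side ParaView settings, then retry the connection.
    mv ~/.config/ParaView ~/.config/ParaView.bak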

Alan

-----Original Message-----
From: paraview-bounces at paraview.org [mailto:paraview-bounces at paraview.org] On Behalf Of Utkarsh Ayachit
Sent: Tuesday, December 03, 2013 8:07 PM
To: Angelini, Richard C (Rick) CIV USARMY ARL (US)
Cc: paraview at paraview.org
Subject: [EXTERNAL] Re: [Paraview] 4.1.0 release candidate build (UNCLASSIFIED)

I can't think of anything in particular that changed that could affect this. Are you trying this with pvserver? Can you try pvbatch? Same problem?
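
Something like this would exercise MPI startup under the same launcher without the client/server connection (a sketch - the script path is made up, and it assumes pvbatch is on your PATH):

    # Minimal pvbatch smoke test; if MPI_Init also fails here, the
    # problem is in the MPI setup rather than the client connection.
    cat > /tmp/smoke.py <<'EOF'
    from paraview.simple import Sphere
    Sphere()
    EOF
    orterun -np 3 pvbatch /tmp/smoke.py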

On Tue, Dec 3, 2013 at 12:44 PM, Angelini, Richard C (Rick) CIV USARMY ARL (US) <richard.c.angelini.civ at mail.mil> wrote:
> Classification: UNCLASSIFIED
> Caveats: NONE
>
> I've built 4.1.0 on a couple of our HPC systems and I'm getting a clean
> build, but it fails on execution of the parallel servers.  On both systems
> (an SGI Altix/ICE and an IBM iDataPlex) I'm using gcc and OpenMPI and the
> exact same build environment that I used to build 4.0.1.  However, both
> systems are failing with identical errors that begin with the "leave
> pinned" MPI feature, a flag set in our mpirun command environment that
> works with 4.0.1. Did something change behind the scenes in ParaView
> 4.1.0 that impacts the build or runtime parameters?
>
>
>
> orterun -x MODULE_VERSION_STACK -x MANPATH -x MPI_VER -x HOSTNAME -x 
> _MODULESBEGINENV_ -x PBS_ACCOUNT -x HOST -x SHELL -x TMPDIR -x 
> PBS_JOBNAME -x PBS_ENVIRONMENT -x PBS_O_WORKDIR -x NCPUS -x DAAC_HOME 
> -x GROUP -x PBS_TASKNUM -x USER -x LD_LIBRARY_PATH -x LS_COLORS -x 
> PBS_O_HOME -x COMPILER_VER -x HOSTTYPE -x PBS_MOMPORT -x PV_ROOT -x 
> PBS_O_QUEUE -x NLSPATH -x MODULE_VERSION -x MAIL -x PBS_O_LOGNAME -x 
> PATH -x PBS_O_LANG -x PBS_JOBCOOKIE -x F90 -x PWD -x _LMFILES_ -x 
> PBS_NODENUM -x LANG -x MODULEPATH -x LOADEDMODULES -x PBS_JOBDIR -x 
> F77 -x PBS_O_SHELL -x PBS_JOBID -x MPICC_F77 -x CXX -x ENVIRONMENT -x 
> SHLVL -x HOME -x OSTYPE -x PBS_O_HOST -x MPIHOME -x FC -x VENDOR -x 
> MACHTYPE -x LOGNAME -x MPICC_CXX -x PBS_QUEUE -x MPI_HOME -x 
> MODULESHOME -x COMPILER -x LESSOPEN -x OMP_NUM_THREADS -x PBS_O_MAIL 
> -x CC -x PBS_O_SYSTEM -x MPICC_F90 -x G_BROKEN_FILENAMES -x 
> PBS_NODEFILE -x MPICC_CC -x PBS_O_PATH -x module -x } -x premode -x 
> premod -x PBS_HOME -x PBS_GET_IBWINS -x NUM_MPITASKS -np 3 
> -machinefile new.1133.machines.txt --prefix
> /usr/cta/unsupported/openmpi/gcc/4.4.0/openmpi-1.6.3 -mca 
> orte_rsh_agent ssh -mca mpi_paffinity_alone 1 -mca maffinity first_use 
> -mca mpi_leave_pinned 1 -mca btl openib,self -mca 
> orte_default_hostname new.1133.machines.txt pvserver 
> --use-offscreen-rendering --server-port=50481 --client-host=localhost 
> --reverse-connection --timeout=15 --connect-id=30526
>
> [pershing-n0221:01190] Warning: could not find environment variable "}"
> --------------------------------------------------------------------------
> A process attempted to use the "leave pinned" MPI feature, but no
> memory registration hooks were found on the system at run time.  This 
> may be the result of running on a system that does not support memory 
> hooks or having some other software subvert Open MPI's use of the 
> memory hooks.  You can disable Open MPI's use of memory hooks by 
> setting both the mpi_leave_pinned and mpi_leave_pinned_pipeline MCA 
> parameters to 0.
>
> Open MPI will disable any transports that are attempting to use the 
> leave pinned functionality; your job may still run, but may fall back 
> to a slower network transport (such as TCP).
>
>   Mpool name: rdma
>   Process:    [[43622,1],0]
>   Local host: xxx-n0221
> --------------------------------------------------------------------------
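> (Per the help text above, disabling the feature would mean launching with
> both parameters zeroed - a sketch, with the remaining orterun arguments
> unchanged:
>
>     orterun -mca mpi_leave_pinned 0 -mca mpi_leave_pinned_pipeline 0 ... pvserver ...
>
> at the possible cost of falling back to a slower transport such as TCP.)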
> --------------------------------------------------------------------------
> WARNING: There is at least one OpenFabrics device found but there are 
> no active ports detected (or Open MPI was unable to use them).  This 
> is most certainly not what you wanted.  Check your cables, subnet 
> manager configuration, etc.  The openib BTL will be ignored for this 
> job.
>
>   Local host: xxx-n0221
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> At least one pair of MPI processes are unable to reach each other
> for MPI communications.  This means that no Open MPI device has 
> indicated that it can be used to communicate between these processes.  
> This is an error; Open MPI requires that all MPI processes be able to 
> reach each other.  This error can sometimes be the result of 
> forgetting to specify the "self" BTL.
>
>   Process 1 ([[43622,1],2]) is on host: xxx-n0221
>   Process 2 ([[43622,1],0]) is on host: xxx-n0221
>   BTLs attempted: self
>
> Your MPI job is now going to abort; sorry.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> MPI_INIT has failed because at least one MPI process is
> unreachable from another.  This *usually* means that an underlying 
> communication plugin -- such as a BTL or an MTL -- has either not 
> loaded or not allowed itself to be used.  Your MPI job will now abort.
>
> You may wish to try to narrow down the problem;
>
>  * Check the output of ompi_info to see which BTL/MTL plugins are
>    available.
>  * Run your application with MPI_THREAD_SINGLE.
>  * Set the MCA parameter btl_base_verbose to 100 (or mtl_base_verbose,
>    if using MTL-based communications) to see exactly which
>    communication plugins were considered and/or discarded.
> --------------------------------------------------------------------------
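> For example, a sketch of the suggested checks (the remaining orterun
> arguments as in the original command above):
>
>     ompi_info | grep btl
>     orterun -mca btl_base_verbose 100 -np 3 ... pvserver ...
>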
> [pershing-n0221:1198] *** An error occurred in MPI_Init
> [pershing-n0221:1198] *** on a NULL communicator
> [pershing-n0221:1198] *** Unknown error
> [pershing-n0221:1198] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
> --------------------------------------------------------------------------
> An MPI process is aborting at a time when it cannot guarantee
> that all of its peer processes in the job will be killed properly.  
> You should double check that everything has shut down cleanly.
>
>   Reason:     Before MPI_INIT completed
>   Local host: pershing-n0221
>   PID:        1198
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> orterun has exited due to process rank 2 with PID 1198 on node
> pershing-n0221 exiting improperly. There are two reasons this could
> occur:
>
> 1. this process did not call "init" before exiting, but others in the 
> job did. This can cause a job to hang indefinitely while it waits for 
> all processes to call "init". By rule, if one process calls "init", 
> then ALL processes must call "init" prior to termination.
>
> 2. this process called "init", but exited without calling "finalize".
> By rule, all processes that call "init" MUST call "finalize" prior to 
> exiting or it will be considered an "abnormal termination"
>
> This may have caused other processes in the application to be 
> terminated by signals sent by orterun (as reported here).
> --------------------------------------------------------------------------
> [pershing-n0221:01190] 2 more processes have sent help message help-mpool-base.txt / leave pinned failed
> [pershing-n0221:01190] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
> [pershing-n0221:01190] 2 more processes have sent help message help-mpi-btl-openib.txt / no active ports found
> [pershing-n0221:01190] 2 more processes have sent help message help-mca-bml-r2.txt / unreachable proc
> [pershing-n0221:01190] 2 more processes have sent help message help-mpi-runtime / mpi_init:startup:pml-add-procs-fail
> [pershing-n0221:01190] 2 more processes have sent help message help-mpi-errors.txt / mpi_errors_are_fatal unknown handle
> [pershing-n0221:01190] 2 more processes have sent help message help-mpi-runtime.txt / ompi mpi abort:cannot guarantee all killed
>
> ________________________________
> Rick Angelini
> USArmy Research Laboratory
> CISD/HPC Architectures Team
> Building 120 Cube 315
> Aberdeen Proving Ground, MD
> Phone:  410-278-6266
>
>
>
> Classification: UNCLASSIFIED
> Caveats: NONE
>
>
>
_______________________________________________
Powered by www.kitware.com

Visit other Kitware open-source projects at http://www.kitware.com/opensource/opensource.html

Please keep messages on-topic and check the ParaView Wiki at: http://paraview.org/Wiki/ParaView

Follow this link to subscribe/unsubscribe:
http://www.paraview.org/mailman/listinfo/paraview

