[Midas] Problem with Midas + Batchmake + Condor

Sorina Camarasu Pop sorina.pop at creatis.insa-lyon.fr
Mon Mar 3 11:50:52 EST 2014


Hi Mike,

Thank you for your prompt reply.

On 03/03/2014 17:27, Michael Grauer wrote:
> Hi Sorina,
>
> These are tough to track down.

I know; I've spent the afternoon on it...

> Can you tell me more about your environment?  Specifically, the 3 
> machines (possibly all the same machine) that are your condor submit, 
> condor manager, and condor execute nodes?

I use the same machine (virtual machine configured as a dual core) for 
my condor submit, condor manager, and condor execute nodes.

> What operating system is your web server, and what version of Condor 
> are you using?

Fedora 18.
For Condor, I first compiled the latest available version, but library
problems prevented me from launching any job. I finally got it working
with the version available through yum install:
condor_version
$CondorVersion: 7.9.1 Aug 24 2012 PRE-RELEASE-UWCS $
$CondorPlatform: X86_64-Fedora_18 $


>  Is your condor submit node the same as your web server (most likely yes)?

yes.

> Are you running your web server as the apache user (most likely yes),

Yes, I even printed out "whoami" to check that it really runs as apache.
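A matching user name does not necessarily mean a matching environment. One difference that a console session as apache would not reproduce is systemd sandboxing of the web server itself: on Fedora, httpd.service may be started with PrivateTmp=yes, which gives Apache its own private /tmp. Whether that has anything to do with the DC_AUTHENTICATE error is an assumption to check, not something established in this thread. The sketch below tests that setting; `private_tmp_enabled` is a hypothetical helper name and the unit name httpd.service is assumed.

```shell
# Hypothetical helper: succeeds when the `systemctl show -p PrivateTmp`
# output read on stdin reports that the unit runs with a private /tmp.
private_tmp_enabled() {
  grep -qx 'PrivateTmp=yes'
}

# Example (assumes systemd and a unit named httpd.service):
#   systemctl show -p PrivateTmp httpd.service | private_tmp_enabled \
#     && echo "httpd runs with a private /tmp"
```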

> and is it your web server that is calling the PHP code that results in
> condor_submit_dag (most likely yes, again)?

Yes.
I use the "standard" BatchMake configuration, i.e. the condorSubmitDag
function from KWBatchmakeComponent.php.

> Can you show the permissions and ownership of the temporary work
> directory where the condor_submit_dag command is executed?

ls -la
...
drwxrwxr-x  3 apache apache 4096  3 mars  16:53 45
drwxrwxr-x  3 apache apache 4096  3 mars  17:41 46

-bash-4.2$ cd 46
-bash-4.2$ ls -la
total 92
drwxrwxr-x  3 apache apache 4096  3 mars  17:41 .
drwxr-xr-x 29 apache apache 4096  3 mars  17:40 ..
-rw-r--r--  1 apache apache  140  3 mars  17:40 adminconfig.cfg
-rw-r--r--  1 apache apache    0  3 mars  17:41 bmGrid.0.error.txt
lrwxrwxrwx  1 apache apache   56  3 mars  17:40 challenge.bms -> /var/www/miccai4/modules/challenge/library/challenge.bms
...
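Since every directory on the path must be searchable by the user, not only the work directory itself, it can help to list ownership and mode for each ancestor as well. The sketch below walks up from a directory with `ls -ld`; `show_path_perms` is a hypothetical helper name. On Fedora, the util-linux command `namei -l <path>` gives a similar per-component view.

```shell
# Hypothetical helper: print `ls -ld` for a directory and each of its
# ancestors up to /, so a missing search (x) permission anywhere on the
# path is easy to spot.
show_path_perms() {
  # Resolve to an absolute physical path; fail if the directory is unreachable.
  p=$(cd "$1" && pwd -P) || return 1
  while :; do
    ls -ld "$p"
    [ "$p" = / ] && break
    p=$(dirname "$p")
  done
}

# Example (run as the user whose access you want to check):
#   show_path_perms /var/www/miccai4/tmp/misc/batchmake/tmp/SSP/7/46
```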


> When you tested as the apache user, did you do this test from the same 
> temporary work directory that Midas/apache would have tried this from?

Yes, from folder /var/www/miccai4/tmp/misc/batchmake/tmp/SSP/7/46 
(drwxrwxr-x , owned by apache)
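A manual re-run is most informative when it uses both the same user and the same working directory as Midas. The sketch below does that; `run_in_dir_as` is a hypothetical helper, it assumes sudo is available, and the directory and DAG file name are taken from this thread. A sudo session still does not inherit the web server's service environment, so a successful console run does not by itself rule out differences there.

```shell
# Hypothetical helper: run a command as USER with DIR as the working
# directory. If USER is already the current user, sudo is skipped.
run_in_dir_as() {
  user=$1 dir=$2
  shift 2
  if [ "$user" = "$(id -un)" ]; then
    (cd "$dir" && "$@")
  else
    # sh -c receives DIR as $1 and the command as the remaining arguments.
    sudo -u "$user" -- sh -c 'cd "$1" && shift && exec "$@"' sh "$dir" "$@"
  fi
}

# Example: repeat the Midas submission as apache from its work directory.
#   run_in_dir_as apache /var/www/miccai4/tmp/misc/batchmake/tmp/SSP/7/46 \
#     condor_submit_dag challenge.dagjob
```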

> Is there any more information in the logs or error logs generated by 
> Condor in the temp work directory that you could share?

tail -f challenge.dagjob.condor.sub
# Note: default on_exit_remove expression:
# ( ExitSignal =?= 11 || (ExitCode =!= UNDEFINED && ExitCode >=0 && ExitCode <= 2))
# attempts to ensure that DAGMan is automatically
# requeued by the schedd if it exits abnormally or
# is killed (e.g., during a reboot).
on_exit_remove  = ( ExitSignal =?= 11 || (ExitCode =!= UNDEFINED && ExitCode >=0 && ExitCode <= 2))
copy_to_spool   = False
arguments       = "-f -l . -Lockfile challenge.dagjob.lock -AutoRescue 1 -DoRescueFrom 0 -Dag challenge.dagjob -CsdVersion $CondorVersion:' '7.9.1' 'Aug' '24' '2012' 'PRE-RELEASE-UWCS' '$ -Dagman /usr/bin/condor_dagman"
environment     = _CONDOR_DAGMAN_LOG=challenge.dagjob.dagman.out;_CONDOR_MAX_DAGMAN_LOG=0
queue

tail -f challenge.0.dagjob
# More information at: http://www.batchmake.org
Universe       = vanilla
Output         = bmGrid.0.out.txt
Error          = bmGrid.0.error.txt
Log            = bmGrid.0.log.txt
Notification   = NEVER
Executable    = /usr/bin/php
Arguments     = "'--version'"
Queue 1
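The node submit file names its own Output, Error and Log files, so one quick way to gather everything Condor wrote for a node is to read those names back out of it. `show_job_logs` is a hypothetical helper; it assumes simple `Key = value` lines like the ones above and matches the keys case-insensitively.

```shell
# Hypothetical helper: show the tail of each Output/Error/Log file named
# in a Condor submit file, resolving names relative to DIR.
show_job_logs() {
  dir=$1 subfile=$2
  (
    cd "$dir" || exit 1
    # Print the value of every Output, Error or Log line, trimmed of blanks.
    awk -F'=' 'tolower($1) ~ /^[ \t]*(output|error|log)[ \t]*$/ {
                 v = $2; gsub(/^[ \t]+|[ \t]+$/, "", v); print v }' "$subfile" |
    while IFS= read -r f; do
      if [ -f "$f" ]; then
        echo "==> $f <=="
        tail -n "${LOG_LINES:-20}" "$f"
      else
        echo "==> $f (missing) <=="
      fi
    done
  )
}

# Example, from the work directory shown above:
#   show_job_logs . challenge.0.dagjob
#   tail -n 50 challenge.dagjob.dagman.out   # DAGMan's own log
```

The DAGMan log name comes from the `_CONDOR_DAGMAN_LOG` entry in challenge.dagjob.condor.sub.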

I hope this can help with debugging the problem...

Thank you,
Sorina

> Thanks,
> Mike
>
>
> On Mon, Mar 3, 2014 at 11:16 AM, Sorina Camarasu Pop
> <sorina.pop at creatis.insa-lyon.fr> wrote:
>
>     Dear Midas users and developers,
>
>     I am trying to configure Midas with the Challenge and BatchMake
>     modules, but I am encountering problems when executing the
>     condor_submit_dag command.
>
>     The error printed by Condor when executing the condor_submit_dag
>     command through the BatchMake module looks like this:
>     "DC_AUTHENTICATE: authentication of <xxx.xxx.xxx.xxx:59888> did
>     not result in a valid mapped user name, which is required for this
>     command (1112 QMGMT_WRITE_CMD), so aborting."
>
>     Nevertheless, if I execute exactly the same command line as apache
>     in a console, everything works fine. I do not understand where the
>     difference comes from.
>
>     Do you know if there is any special configuration needed for Condor
>     to work with the BatchMake module?
>
>     Thank you for your help,
>     Sorina
>
>     -- 
>     Sorina Pop, PhD
>     CNRS Research Engineer
>     CREATIS
>     Tel : +33 (0)4 72 43 72 99
>
>     _______________________________________________
>     Midas mailing list
>     Midas at public.kitware.com
>     http://public.kitware.com/cgi-bin/mailman/listinfo/midas
>




