[Midas] Problem with Midas + Batchmake + Condor
Sorina Camarasu Pop
sorina.pop at creatis.insa-lyon.fr
Mon Mar 3 11:50:52 EST 2014
Hi Mike,
Thank you for your prompt reply.
On 03/03/2014 17:27, Michael Grauer wrote:
> Hi Sorina,
>
> These are tough to track down.
I know; I've spent the whole afternoon on it...
> Can you tell me more about your environment? Specifically, the 3
> machines (possibly all the same machine) that are your condor submit,
> condor manager, and condor execute nodes?
I use the same machine (a virtual machine configured with two cores) as
my Condor submit, manager, and execute node.
> What operating system is your web server, and what version of Condor
> are you using?
Fedora 18.
For Condor, I first compiled the latest available version, but library
problems prevented me from launching any jobs. I finally got it working
with the version available through yum install:
condor_version
$CondorVersion: 7.9.1 Aug 24 2012 PRE-RELEASE-UWCS $
$CondorPlatform: X86_64-Fedora_18 $
> Is your condor submit node the same as your web server (most likely yes)?
Yes.
> Are you running your web server as the apache user (most likely yes),
Yes. I even printed the output of "whoami" to check that it really runs as apache.
> and is it your web server that is calling the PHP code that results in
> condor_submit_dag (most likely yes, again)?
Yes.
I use the "standard" BatchMake configuration, i.e., the condorSubmitDag
function from KWBatchmakeComponent.php.
> Can you show the permissions and ownership of the temporary work
> directory where the condor_submit_dag command is executed?
ls -la
...
drwxrwxr-x 3 apache apache 4096 3 mars 16:53 45
drwxrwxr-x 3 apache apache 4096 3 mars 17:41 46
-bash-4.2$ cd 46
-bash-4.2$ ls -la
total 92
drwxrwxr-x 3 apache apache 4096 3 mars 17:41 .
drwxr-xr-x 29 apache apache 4096 3 mars 17:40 ..
-rw-r--r-- 1 apache apache 140 3 mars 17:40 adminconfig.cfg
-rw-r--r-- 1 apache apache 0 3 mars 17:41 bmGrid.0.error.txt
lrwxrwxrwx 1 apache apache 56 3 mars 17:40 challenge.bms -> /var/www/miccai4/modules/challenge/library/challenge.bms
...
> When you tested as the apache user, did you do this test from the same
> temporary work directory that Midas/apache would have tried this from?
Yes, from the folder /var/www/miccai4/tmp/misc/batchmake/tmp/SSP/7/46
(drwxrwxr-x, owned by apache).
> Is there any more information in the logs or error logs generated by
> Condor in the temp work directory that you could share?
tail -f challenge.dagjob.condor.sub
# Note: default on_exit_remove expression:
# ( ExitSignal =?= 11 || (ExitCode =!= UNDEFINED && ExitCode >=0 && ExitCode <= 2))
# attempts to ensure that DAGMan is automatically
# requeued by the schedd if it exits abnormally or
# is killed (e.g., during a reboot).
on_exit_remove = ( ExitSignal =?= 11 || (ExitCode =!= UNDEFINED && ExitCode >=0 && ExitCode <= 2))
copy_to_spool = False
arguments = "-f -l . -Lockfile challenge.dagjob.lock -AutoRescue 1 -DoRescueFrom 0 -Dag challenge.dagjob -CsdVersion $CondorVersion:' '7.9.1' 'Aug' '24' '2012' 'PRE-RELEASE-UWCS' '$ -Dagman /usr/bin/condor_dagman"
environment = _CONDOR_DAGMAN_LOG=challenge.dagjob.dagman.out;_CONDOR_MAX_DAGMAN_LOG=0
queue
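The on_exit_remove line in challenge.dagjob.condor.sub is a ClassAd expression: when the DAGMan job exits, the job leaves the queue if it is true and is requeued by the schedd if it is false. The small Python model below is only an illustration of that logic (the function name is made up, and None stands in for the ClassAd value UNDEFINED, which ExitCode takes after a signal and ExitSignal takes after a normal exit).

```python
def dagman_on_exit_remove(exit_code, exit_signal):
    """Model of the default DAGMan on_exit_remove expression shown above.

    exit_code is None when the job was killed by a signal (ExitCode is UNDEFINED);
    exit_signal is None when the job exited normally (ExitSignal is UNDEFINED).
    Returns True if the job leaves the queue, False if the schedd requeues it.
    """
    # ExitSignal =?= 11: "is identical to" never yields UNDEFINED, so a
    # missing signal simply compares as False.
    segfault = exit_signal == 11
    # ExitCode =!= UNDEFINED && ExitCode >= 0 && ExitCode <= 2
    exit_code_0_to_2 = exit_code is not None and 0 <= exit_code <= 2
    return segfault or exit_code_0_to_2


if __name__ == "__main__":
    for code, sig in [(0, None), (1, None), (3, None), (None, 11), (None, 9)]:
        action = "removed" if dagman_on_exit_remove(code, sig) else "requeued"
        print(f"ExitCode={code} ExitSignal={sig}: {action}")
```

Under this reading, exit codes 0 to 2 and a segmentation fault (signal 11) remove DAGMan from the queue, while any other exit code or signal, such as a SIGKILL during a reboot, leaves it to be requeued.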
tail -f challenge.0.dagjob
# More information at: http://www.batchmake.org
Universe = vanilla
Output = bmGrid.0.out.txt
Error = bmGrid.0.error.txt
Log = bmGrid.0.log.txt
Notification = NEVER
Executable = /usr/bin/php
Arguments = "'--version'"
Queue 1
I hope this can help with debugging the problem...
Thank you,
Sorina
> Thanks,
> Mike
>
>
> On Mon, Mar 3, 2014 at 11:16 AM, Sorina Camarasu Pop
> <sorina.pop at creatis.insa-lyon.fr> wrote:
>
> Dear Midas users and developers,
>
> I am trying to configure Midas with the Challenge and BatchMake
> modules, but I encounter problems when executing the
> condor_submit_dag command.
>
> The error printed by Condor when the BatchMake module executes the
> condor_submit_dag command looks like this:
> "DC_AUTHENTICATE: authentication of <xxx.xxx.xxx.xxx:59888> did
> not result in a valid mapped user name, which is required for this
> command (1112 QMGMT_WRITE_CMD), so aborting."
>
> Nevertheless, if I execute exactly the same command line as apache
> in a console, everything works fine. I do not understand where the
> difference comes from.
>
> Do you know whether Condor needs any special configuration to
> work with the BatchMake module?
>
> Thank you for your help,
> Sorina
>
> --
> Sorina Pop, PhD
> CNRS Research Engineer
> CREATIS
> Tel : +33 (0)4 72 43 72 99
>
> _______________________________________________
> Midas mailing list
> Midas at public.kitware.com
> http://public.kitware.com/cgi-bin/mailman/listinfo/midas