<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<div class="moz-cite-prefix"><br>
<br>
On 03/03/2014 18:06, Michael Grauer wrote:<br>
</div>
<blockquote
cite="mid:CAKx26+Fz4NhqxGT8KfGKoQuE618xTBjopGKO4hVOfgHo7yhQ2w@mail.gmail.com"
type="cite">
<div dir="ltr">Where did you see the message: <span
style="font-family:arial,sans-serif;font-size:12.800000190734863px">"DC_AUTHENTICATE:
authentication of &lt;xxx.xxx.xxx.xxx:59888&gt; did not result
in a valid mapped user name, which is required for this
command (1112 QMGMT_WRITE_CMD), so aborting."</span></div>
</blockquote>
<br>
In the condor log : /home/condor/localcondor/log/SchedLog<br>
<br>
<blockquote
cite="mid:CAKx26+Fz4NhqxGT8KfGKoQuE618xTBjopGKO4hVOfgHo7yhQ2w@mail.gmail.com"
type="cite">
<div dir="ltr">
<div><font face="arial, sans-serif">Was there any other output
included there?</font></div>
</div>
</blockquote>
<br>
I copied the relevant parts of that log into the attached file. It
contains the output printed both when submitting through batchmake and
when running the condor commands directly.<br>
<br>
<blockquote
cite="mid:CAKx26+Fz4NhqxGT8KfGKoQuE618xTBjopGKO4hVOfgHo7yhQ2w@mail.gmail.com"
type="cite">
<div dir="ltr">
<div><font face="arial, sans-serif">Do you have a "condor" user
on your VM? </font></div>
</div>
</blockquote>
<br>
Yes.<br>
<br>
<blockquote
cite="mid:CAKx26+Fz4NhqxGT8KfGKoQuE618xTBjopGKO4hVOfgHo7yhQ2w@mail.gmail.com"
type="cite">
<div dir="ltr">
<div><font face="arial, sans-serif"> When you successfully run
jobs by doing "condor_submit_dag" from the command line as
the apache user, </font></div>
</div>
</blockquote>
<br>
apache 22773 29731 0 18:18 ? 00:00:00
condor_scheduniv_exec.26.0 -f -l . -Lockfile challenge.dagjob.lock
-AutoRescue 1 -DoRescueFrom 0 -Dag challenge.dagjob -CsdVersion
$CondorVersion: 7.9.1 Aug 24 2012 PRE-RELEASE-UWCS $ -Force -Dagman
/bin/condor_dagman<br>
<br>
<br>
<blockquote
cite="mid:CAKx26+Fz4NhqxGT8KfGKoQuE618xTBjopGKO4hVOfgHo7yhQ2w@mail.gmail.com"
type="cite">
<div dir="ltr">
<div><font face="arial, sans-serif">when you watch your job run
with ps or top, which user runs the actual execution process
(whatever job batchmake will run for you) ? <br>
</font></div>
</div>
</blockquote>
<br>
When launching it with <font face="arial, sans-serif">batchmake
(through the web interface), I cannot find the corresponding
condor process. I only see an httpd process run by
apache.</font><br>
<br>
<blockquote
cite="mid:CAKx26+Fz4NhqxGT8KfGKoQuE618xTBjopGKO4hVOfgHo7yhQ2w@mail.gmail.com"
type="cite">
<div dir="ltr">
<div><font face="arial, sans-serif"><br>
</font></div>
<div><font face="arial, sans-serif">Can you include your
"challenge.bms" script in an email? <br>
</font></div>
</div>
</blockquote>
<br>
Of course, here it is attached.<br>
<br>
<blockquote
cite="mid:CAKx26+Fz4NhqxGT8KfGKoQuE618xTBjopGKO4hVOfgHo7yhQ2w@mail.gmail.com"
type="cite">
<div dir="ltr">
<div><font face="arial, sans-serif"><br>
</font></div>
<div>Can you show me the output of "ls" from a directory where
the submit failed and then again from one where the submit
succeeded, at the end of the job processing run?</div>
</div>
</blockquote>
<br>
Failed :<br>
ls -la 52/<br>
total 56<br>
drwxrwxr-x 3 apache apache 4096 3 mars 18:25 .<br>
drwxr-xr-x 35 apache apache 4096 3 mars 18:25 ..<br>
-rw-r--r-- 1 apache apache 140 3 mars 18:25 adminconfig.cfg<br>
-rw-r--r-- 1 apache apache 355 3 mars 18:25 challenge.0.dagjob<br>
-rw-r--r-- 1 apache apache 332 3 mars 18:25 challenge.1.dagjob<br>
-rw-r--r-- 1 apache apache 564 3 mars 18:25 challenge.2.dagjob<br>
-rw-r--r-- 1 apache apache 355 3 mars 18:25 challenge.3.dagjob<br>
lrwxrwxrwx 1 apache apache 56 3 mars 18:25 challenge.bms ->
/var/www/miccai4/modules/challenge/library/challenge.bms<br>
-rw-r--r-- 1 apache apache 1473 3 mars 18:25 challenge.config.bms<br>
-rw-r--r-- 1 apache apache 1593 3 mars 18:25 challenge.dagjob<br>
-rw-r--r-- 1 apache apache 1043 3 mars 18:25
challenge.dagjob.condor.sub<br>
lrwxrwxrwx 1 apache apache 70 3 mars 18:25
challenge_validator_app.bms ->
/var/www/miccai4/modules/challenge/library/challenge_validator_app.bms<br>
drwxrwxr-x 4 apache apache 4096 3 mars 18:25 data<br>
lrwxrwxrwx 1 apache apache 50 3 mars 18:25 PHP.bmm ->
/var/www/miccai4/modules/challenge/library/PHP.bmm<br>
-rw-r--r-- 1 apache apache 138 3 mars 18:25 userconfig.cfg<br>
lrwxrwxrwx 1 apache apache 67 3 mars 18:25
ValidateImageAveDist.bmm ->
/var/www/miccai4/modules/challenge/library/ValidateImageAveDist.bmm<br>
<br>
<br>
OK (created by batchmake and relaunched by hand):<br>
-bash-4.2$ ls -la 48<br>
total 104<br>
drwxrwxr-x 3 apache apache 4096 3 mars 18:18 .<br>
drwxr-xr-x 35 apache apache 4096 3 mars 18:25 ..<br>
-rw-r--r-- 1 apache apache 140 3 mars 18:09 adminconfig.cfg<br>
-rw-r--r-- 1 apache apache 0 3 mars 18:13 bmGrid.0.error.txt<br>
-rw-r--r-- 1 apache apache 1968 3 mars 18:18 bmGrid.0.log.txt<br>
-rw-r--r-- 1 apache apache 148 3 mars 18:18 bmGrid.0.out.txt<br>
-rw-r--r-- 1 apache apache 355 3 mars 18:09 challenge.0.dagjob<br>
-rw-r--r-- 1 apache apache 332 3 mars 18:09 challenge.1.dagjob<br>
-rw-r--r-- 1 apache apache 564 3 mars 18:09 challenge.2.dagjob<br>
-rw-r--r-- 1 apache apache 355 3 mars 18:09 challenge.3.dagjob<br>
lrwxrwxrwx 1 apache apache 56 3 mars 18:09 challenge.bms ->
/var/www/miccai4/modules/challenge/library/challenge.bms<br>
-rw-r--r-- 1 apache apache 1473 3 mars 18:09
challenge.config.bms<br>
-rw-r--r-- 1 apache apache 1593 3 mars 18:09 challenge.dagjob<br>
-rw-r--r-- 1 apache apache 1042 3 mars 18:18
challenge.dagjob.condor.sub<br>
-rw-r--r-- 1 apache apache 610 3 mars 18:18
challenge.dagjob.dagman.log<br>
-rw-r--r-- 1 apache apache 16074 3 mars 18:18
challenge.dagjob.dagman.out<br>
-rw-r--r-- 1 apache apache 256 3 mars 18:18
challenge.dagjob.dot<br>
-rw-r--r-- 1 apache apache 0 3 mars 18:18
challenge.dagjob.lib.err<br>
-rw-r--r-- 1 apache apache 29 3 mars 18:18
challenge.dagjob.lib.out<br>
-rw-r--r-- 1 apache apache 970 3 mars 18:18
challenge.dagjob.nodes.log<br>
-rw-r--r-- 1 apache apache 243 3 mars 18:18
challenge.dagjob.rescue001<br>
-rw-r--r-- 1 apache apache 243 3 mars 18:13
challenge.dagjob.rescue001.old<br>
lrwxrwxrwx 1 apache apache 70 3 mars 18:09
challenge_validator_app.bms ->
/var/www/miccai4/modules/challenge/library/challenge_validator_app.bms<br>
drwxrwxr-x 4 apache apache 4096 3 mars 18:09 data<br>
lrwxrwxrwx 1 apache apache 50 3 mars 18:09 PHP.bmm ->
/var/www/miccai4/modules/challenge/library/PHP.bmm<br>
-rw-r--r-- 1 apache apache 138 3 mars 18:09 userconfig.cfg<br>
lrwxrwxrwx 1 apache apache 67 3 mars 18:09
ValidateImageAveDist.bmm ->
/var/www/miccai4/modules/challenge/library/ValidateImageAveDist.bmm<br>
<br>
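<br>
To make the difference between the two listings above easier to read,
here is a minimal sketch, assuming Python is available on the VM. The
helper name <code>only_in_each</code> is my own (hypothetical) and is not
part of BatchMake or Condor; it only reports which file names exist in one
job directory but not in the other:<br>
<pre>
```python
import os
import sys


def only_in_each(dir_a, dir_b):
    """Return (names only in dir_a, names only in dir_b), both sorted."""
    names_a = set(os.listdir(dir_a))
    names_b = set(os.listdir(dir_b))
    return sorted(names_a - names_b), sorted(names_b - names_a)


# Optional command-line use: pass the two job directories as arguments.
if __name__ == "__main__" and len(sys.argv) == 3:
    only_a, only_b = only_in_each(sys.argv[1], sys.argv[2])
    print("only in", sys.argv[1], ":", only_a)
    print("only in", sys.argv[2], ":", only_b)
```
</pre>
For folders 52 and 48, the listings above already show that the
bmGrid.0.* files and the DAGMan output files (challenge.dagjob.dagman.out,
challenge.dagjob.nodes.log, ...) exist only in 48, which suggests that in
52 the DAG was never started.<br>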
<blockquote
cite="mid:CAKx26+Fz4NhqxGT8KfGKoQuE618xTBjopGKO4hVOfgHo7yhQ2w@mail.gmail.com"
type="cite">
<div dir="ltr">I'm not sure what is going on, just trying to get
more context...
<div><br>
</div>
<div>I recall I ran into a problem where one machine was the
submitter, and there was a midas user there, with uid 100, and
a midas user on another machine (the execution node) with a
uid 200, and I got what sounded like a similar message--I had
to make sure their uids were the same across machines to deal
with permissions across an NFS mount on both machines. This
sounds nothing like your problem, but I wanted to include it
in case it gives you any ideas.</div>
</div>
</blockquote>
<br>
Thank you for the hint. <br>
My problem seems similar, in the sense that it looks like a user
problem. However, I cannot work out which two users would be
involved: apache and who else?<br>
<br>
I noticed the following line in the condor log (the one attached):<br>
03/03/14 18:39:34 ATTEMPT_ACCESS: Switching to user uid: 48 gid: 48.<br>
uid 48 does correspond to apache. What surprises me is that the log
prints "Switching to user uid: 48". Does that mean that, up to that
point, it was running as some other user?<br>
<br>
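One way I could check the uid question above is to record the identity
and environment of the process in both situations and compare them. This
is only a minimal sketch, assuming Python is available on the VM; the
function names and the choice of environment variables are my own
assumptions, and nothing here comes from Condor or BatchMake:<br>
<pre>
```python
import json
import os
import pwd


def identity_snapshot(env_keys=("USER", "HOME", "PATH", "CONDOR_CONFIG")):
    """Collect real and effective ids, working directory and a few env variables."""
    euid = os.geteuid()
    try:
        effective_user = pwd.getpwuid(euid).pw_name
    except KeyError:
        effective_user = None  # uid without a passwd entry
    snapshot = {
        "uid": os.getuid(),
        "euid": euid,
        "gid": os.getgid(),
        "egid": os.getegid(),
        "effective_user": effective_user,
        "cwd": os.getcwd(),
    }
    # Flatten the environment so two snapshots can be compared key by key.
    for key in env_keys:
        snapshot["env." + key] = os.environ.get(key)
    return snapshot


def diff_snapshots(first, second):
    """Return {key: (first_value, second_value)} for every key that differs."""
    keys = set(first) | set(second)
    return {
        key: (first.get(key), second.get(key))
        for key in sorted(keys)
        if first.get(key) != second.get(key)
    }


if __name__ == "__main__":
    print(json.dumps(identity_snapshot(), indent=2, sort_keys=True))
```
</pre>
Running it once from the console as apache and once from a process
started by the web server, then passing the two results to
<code>diff_snapshots</code>, would show whether the uid, the working
directory or the environment differ between the two cases.<br>
<br>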
<blockquote
cite="mid:CAKx26+Fz4NhqxGT8KfGKoQuE618xTBjopGKO4hVOfgHo7yhQ2w@mail.gmail.com"
type="cite">
<div dir="ltr">
<div><br>
</div>
<div>Can you explain more about the library issues you ran into
earlier that prevented you from running jobs?</div>
</div>
</blockquote>
<br>
I don't remember exactly, but I spent quite some time on that one
too.<br>
In that case, jobs were submitted but stayed idle: if I remember
correctly, a library problem was preventing one of the condor
daemons from starting correctly. I really don't think
this is connected.<br>
<br>
Thank you,<br>
Sorina<br>
<br>
<blockquote
cite="mid:CAKx26+Fz4NhqxGT8KfGKoQuE618xTBjopGKO4hVOfgHo7yhQ2w@mail.gmail.com"
type="cite">
<div dir="ltr">
<div><br>
</div>
<div><br>
</div>
<div><br>
</div>
<div>Thanks,</div>
<div>Mike</div>
<div><font face="arial, sans-serif"><br>
</font></div>
<div><font face="arial, sans-serif"><br>
</font></div>
<div><font face="arial, sans-serif"><br>
</font></div>
</div>
<div class="gmail_extra"><br>
<br>
<div class="gmail_quote">On Mon, Mar 3, 2014 at 11:50 AM, Sorina
Camarasu Pop <span dir="ltr"><<a moz-do-not-send="true"
href="mailto:sorina.pop@creatis.insa-lyon.fr"
target="_blank">sorina.pop@creatis.insa-lyon.fr</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF">
<div>Hi Mike,<br>
<br>
Thank you for your prompt reply.<br>
<br>
On 03/03/2014 17:27, Michael Grauer wrote:<br>
</div>
<div class="">
<blockquote type="cite">
<div dir="ltr">Hi Sorina,
<div><br>
</div>
<div>These are tough to track down. <br>
</div>
</div>
</blockquote>
<br>
</div>
I know, I've spent my afternoon on it...
<div class=""><br>
<br>
<blockquote type="cite">
<div dir="ltr">
<div>Can you tell me more about your environment?
Specifically, the 3 machines (possibly all the
same machine) that are your condor submit, condor
manager, and condor execute nodes? <br>
</div>
</div>
</blockquote>
<br>
</div>
I use the same machine (virtual machine configured as a
dual core) for my condor submit, condor manager, and
condor execute nodes.
<div class=""><br>
<br>
<blockquote type="cite">
<div dir="ltr">
<div>What operating system is your web server, and
what version of Condor are you using? </div>
</div>
</blockquote>
<br>
</div>
Fedora 18.<br>
For Condor, I had compiled the latest version available,
but ran into library problems that prevented me from launching
any job. I finally got it working with the version available
through yum install:<br>
condor_version<br>
$CondorVersion: 7.9.1 Aug 24 2012 PRE-RELEASE-UWCS $<br>
$CondorPlatform: X86_64-Fedora_18 $
<div class=""><br>
<br>
<br>
<blockquote type="cite">
<div dir="ltr">
<div> Is your condor submit node the same as your
web server (most likely yes)?</div>
</div>
</blockquote>
<br>
</div>
yes.
<div class=""><br>
<br>
<blockquote type="cite">
<div dir="ltr">
<div>Are you running your web server as the apache
user (most likely yes), </div>
</div>
</blockquote>
<br>
</div>
Yes, I even printed out "whoami" to check that it really
runs as apache.
<div class=""><br>
<br>
<blockquote type="cite">
<div dir="ltr">
<div>and is it your web server that is calling the
php code that results in condor_submit_dag (most
likely yes, again) ? </div>
</div>
</blockquote>
<br>
</div>
Yes. <br>
I use the "standard" batchmake config, i.e. the
condorSubmitDag function from KWBatchmakeComponent.php
<div class=""><br>
<br>
<blockquote type="cite">
<div dir="ltr">
<div>Can you show the permissions and ownership of
the temporary work directory where the
condor_submit_dag command is executed?</div>
</div>
</blockquote>
<br>
</div>
ls -la<br>
...<br>
drwxrwxr-x 3 apache apache 4096 3 mars 16:53 45<br>
drwxrwxr-x 3 apache apache 4096 3 mars 17:41 46<br>
<br>
-bash-4.2$ cd 46<br>
-bash-4.2$ ls -la<br>
total 92<br>
drwxrwxr-x 3 apache apache 4096 3 mars 17:41 .<br>
drwxr-xr-x 29 apache apache 4096 3 mars 17:40 ..<br>
-rw-r--r-- 1 apache apache 140 3 mars 17:40
adminconfig.cfg<br>
-rw-r--r-- 1 apache apache 0 3 mars 17:41
bmGrid.0.error.txt<br>
lrwxrwxrwx 1 apache apache 56 3 mars 17:40
challenge.bms ->
/var/www/miccai4/modules/challenge/library/challenge.bms<br>
...
<div class=""><br>
<br>
<br>
<blockquote type="cite">
<div dir="ltr">
<div>When you tested as the apache user, did you do
this test from the same temporary work directory
that Midas/apache would have tried this from?</div>
</div>
</blockquote>
<br>
</div>
Yes, from folder
/var/www/miccai4/tmp/misc/batchmake/tmp/SSP/7/46
(drwxrwxr-x , owned by apache)
<div class=""><br>
<br>
<blockquote type="cite">
<div dir="ltr">
<div>Is there any more information in the logs or
error logs generated by Condor in the temp work
directory that you could share?</div>
</div>
</blockquote>
<br>
</div>
tail -f challenge.dagjob.condor.sub<br>
# Note: default on_exit_remove expression:<br>
# ( ExitSignal =?= 11 || (ExitCode =!= UNDEFINED
&& ExitCode >=0 && ExitCode <= 2))<br>
# attempts to ensure that DAGMan is automatically<br>
# requeued by the schedd if it exits abnormally or<br>
# is killed (e.g., during a reboot).<br>
on_exit_remove = ( ExitSignal =?= 11 || (ExitCode =!=
UNDEFINED && ExitCode >=0 && ExitCode
<= 2))<br>
copy_to_spool = False<br>
arguments = "-f -l . -Lockfile challenge.dagjob.lock
-AutoRescue 1 -DoRescueFrom 0 -Dag challenge.dagjob
-CsdVersion $CondorVersion:' '7.9.1' 'Aug' '24' '2012'
'PRE-RELEASE-UWCS' '$ -Dagman /usr/bin/condor_dagman"<br>
environment =
_CONDOR_DAGMAN_LOG=challenge.dagjob.dagman.out;_CONDOR_MAX_DAGMAN_LOG=0<br>
queue<br>
<br>
tail -f challenge.0.dagjob<br>
# More information at: <a moz-do-not-send="true"
href="http://www.batchmake.org" target="_blank">http://www.batchmake.org</a><br>
Universe = vanilla<br>
Output = bmGrid.0.out.txt<br>
Error = bmGrid.0.error.txt<br>
Log = bmGrid.0.log.txt<br>
Notification = NEVER<br>
Executable = /usr/bin/php<br>
Arguments = "'--version'"<br>
Queue 1<br>
<br>
I hope this can help with debugging the problem...<br>
<br>
Thank you,<br>
Sorina
<div class=""><br>
<br>
<blockquote type="cite">
<div dir="ltr">
<div>Thanks,</div>
<div>Mike</div>
<div class="gmail_extra"><br>
<br>
<div class="gmail_quote">On Mon, Mar 3, 2014 at
11:16 AM, Sorina Camarasu Pop <span dir="ltr"><<a
moz-do-not-send="true"
href="mailto:sorina.pop@creatis.insa-lyon.fr"
target="_blank">sorina.pop@creatis.insa-lyon.fr</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0
0 0 .8ex;border-left:1px #ccc
solid;padding-left:1ex">Dear Midas users and
developers,<br>
<br>
I am trying to configure Midas with the
Challenge and BatchMake modules, but I
encounter problems when executing the
condor_submit_dag command.<br>
<br>
The error printed by Condor when executing the
condor_submit_dag command using the Batchmake
module looks like this : "DC_AUTHENTICATE:
authentication of
&lt;xxx.xxx.xxx.xxx:59888&gt; did not result
in a valid mapped user name, which is required
for this command (1112 QMGMT_WRITE_CMD), so
aborting."<br>
<br>
Nevertheless, if I execute exactly the same
command line as apache in a console,
everything works fine. I do not
understand where the difference comes from.<br>
<br>
Do you know if there's any special
configuration for Condor to work with the
Batchmake module ?<br>
<br>
Thank you for your help,<br>
Sorina<span><font color="#888888"><br>
<br>
-- <br>
Sorina Pop, PhD<br>
CNRS Research Engineer<br>
CREATIS<br>
Tel : <a moz-do-not-send="true"
href="tel:%2B33%20%280%294%2072%2043%2072%2099"
value="+33472437299" target="_blank">+33
(0)4 72 43 72 99</a><br>
<br>
_______________________________________________<br>
Midas mailing list<br>
<a moz-do-not-send="true"
href="mailto:Midas@public.kitware.com"
target="_blank">Midas@public.kitware.com</a><br>
<a moz-do-not-send="true"
href="http://public.kitware.com/cgi-bin/mailman/listinfo/midas"
target="_blank">http://public.kitware.com/cgi-bin/mailman/listinfo/midas</a><br>
</font></span></blockquote>
</div>
<br>
<br clear="all">
<div><br>
</div>
<br>
<br>
</div>
</div>
</blockquote>
<br>
<br>
</div>
</div>
</blockquote>
</div>
<br>
<br clear="all">
<div><br>
</div>
-- <br>
Thanks,<br>
Michael Grauer<br>
R & D Engineer<br>
Kitware, Inc.<br>
919 969 6990 x322<br>
<br>
<br>
</div>
</blockquote>
<br>
<br>
<pre class="moz-signature" cols="72">--
Sorina Pop, PhD
CNRS Research Engineer
CREATIS
Tel : +33 (0)4 72 43 72 99</pre>
</body>
</html>