<div dir="ltr">Where did you see the message <span style="font-family:arial,sans-serif;font-size:12.800000190734863px">"DC_AUTHENTICATE: authentication of &lt;xxx.xxx.xxx.xxx:59888&gt; did not result in a valid mapped user name, which is required for this command (1112 QMGMT_WRITE_CMD), so aborting."</span>?<div>
<span style="font-family:arial,sans-serif;font-size:12.800000190734863px"><br></span></div><div><font face="arial, sans-serif">Was there any other output included there?</font></div><div><font face="arial, sans-serif"><br>
</font></div><div><font face="arial, sans-serif">Do you have a "condor" user on your VM? When a job submitted with "condor_submit_dag" from the command line as the apache user runs successfully, which user owns the actual execution process (whatever job BatchMake runs for you) when you watch it with ps or top?</font></div>
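<div><font face="arial, sans-serif">If it helps, here is a rough Python sketch for checking who owns the job process. It assumes the job shows up in ps under the command name "php" (the Executable in your challenge.0.dagjob is /usr/bin/php); the function names owners_of and current_ps_output are made up for illustration.</font></div>
<pre>
```python
import subprocess


def owners_of(command_name, ps_output):
    """Return the set of users owning processes whose command is command_name."""
    owners = set()
    # The first line of the ps output is the column header, so skip it.
    for line in ps_output.splitlines()[1:]:
        fields = line.split(None, 2)  # user, pid, command name
        if len(fields) == 3 and fields[2].strip() == command_name:
            owners.add(fields[0])
    return owners


def current_ps_output():
    """Run ps and return owner, PID, and command name for every process."""
    result = subprocess.run(
        ["ps", "-eo", "user,pid,comm"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Assumption: the job executable appears in ps as "php".
    print(owners_of("php", current_ps_output()))
```
</pre>
<div><font face="arial, sans-serif">Run it while a job is executing; an empty set just means no process with that command name was running at that moment.</font></div>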
<div><font face="arial, sans-serif"><br></font></div><div><font face="arial, sans-serif">Can you include your "challenge.bms" script in an email? </font></div><div><font face="arial, sans-serif"><br></font></div>
<div><font face="arial, sans-serif"><br></font></div><div>Can you show me the output of "ls" from a directory where the submit failed and then again from one where the submit succeeded, at the end of the job processing run?</div>
<div><br></div><div><br></div><div><br></div><div>I'm not sure what is going on, just trying to get more context...</div><div><br></div><div><br></div><div>I recall running into a problem where the submit machine had a midas user with uid 100 and the execution node had a midas user with uid 200, and I got what sounded like a similar message. I had to make the uids match across the machines to sort out permissions on an NFS mount shared by both. This sounds nothing like your problem, but I wanted to mention it in case it gives you any ideas.</div>
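<div>For what it's worth, that kind of uid mismatch can be spotted by comparing the passwd entries from each machine. This is only a sketch: it assumes you have pasted the output of "getent passwd midas" from each host into a string, and the names uid_from_passwd and uids_match, the host labels, and the sample entries are all invented for the example.</div>
<pre>
```python
def uid_from_passwd(passwd_text, username):
    """Return the numeric UID for username from passwd-format text, or None."""
    for line in passwd_text.splitlines():
        fields = line.split(":")
        # A passwd entry has 7 colon-separated fields; the third is the UID.
        if len(fields) == 7 and fields[0] == username:
            return int(fields[2])
    return None


def uids_match(username, passwd_by_host):
    """Return (consistent, uids): whether every host maps username to the same UID."""
    uids = {
        host: uid_from_passwd(text, username)
        for host, text in passwd_by_host.items()
    }
    values = set(uids.values())
    consistent = None not in values and len(values) == 1
    return consistent, uids


if __name__ == "__main__":
    # Hypothetical sample entries mirroring the situation described above.
    hosts = {
        "submit": "midas:x:100:100::/home/midas:/bin/bash",
        "execute": "midas:x:200:200::/home/midas:/bin/bash",
    }
    print(uids_match("midas", hosts))
```
</pre>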
<div><br></div><div>Can you explain more about the library issues you ran into earlier that prevented you from running jobs?</div><div><br></div><div><br></div><div><br></div><div>Thanks,</div><div>Mike</div><div><font face="arial, sans-serif"><br>
</font></div><div><font face="arial, sans-serif"><br></font></div><div><font face="arial, sans-serif"><br></font></div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Mon, Mar 3, 2014 at 11:50 AM, Sorina Camarasu Pop <span dir="ltr"><<a href="mailto:sorina.pop@creatis.insa-lyon.fr" target="_blank">sorina.pop@creatis.insa-lyon.fr</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF">
<div>Hi Mike,<br>
<br>
Thank you for your prompt reply.<br>
<br>
On 03/03/2014 17:27, Michael Grauer wrote:<br>
</div><div class="">
<blockquote type="cite">
<div dir="ltr">Hi Sorina,
<div><br>
</div>
<div>These are tough to track down. <br>
</div>
</div>
</blockquote>
<br></div>
I know, I've spent my afternoon on it...<div class=""><br>
<br>
<blockquote type="cite">
<div dir="ltr">
<div>Can you tell me more about your environment? Specifically,
the 3 machines (possibly all the same machine) that are your
condor submit, condor manager, and condor execute nodes? <br>
</div>
</div>
</blockquote>
<br></div>
I use the same machine (virtual machine configured as a dual core)
for my condor submit, condor manager, and condor execute nodes.<div class=""><br>
<br>
<blockquote type="cite">
<div dir="ltr">
<div>What operating system is your web server, and what version
of Condor are you using? </div>
</div>
</blockquote>
<br></div>
Fedora 18.<br>
For Condor, I had compiled the latest version available, but ran into
library problems that prevented me from launching any job. I
finally got it working with the version available through yum install:<br>
condor_version<br>
$CondorVersion: 7.9.1 Aug 24 2012 PRE-RELEASE-UWCS $<br>
$CondorPlatform: X86_64-Fedora_18 $<div class=""><br>
<br>
<br>
<blockquote type="cite">
<div dir="ltr">
<div> Is your condor submit node the same as your web server
(most likely yes)?</div>
</div>
</blockquote>
<br></div>
yes.<div class=""><br>
<br>
<blockquote type="cite">
<div dir="ltr">
<div>Are you running your web server as the apache user (most
likely yes), </div>
</div>
</blockquote>
<br></div>
Yes, I even printed out "whoami" to check that it really runs as
apache.<div class=""><br>
<br>
<blockquote type="cite">
<div dir="ltr">
<div>and is it your web server that calls the PHP code that
results in condor_submit_dag (most likely yes, again)? </div>
</div>
</blockquote>
<br></div>
Yes. <br>
I use the "standard" batchmake config, i.e. the condorSubmitDag
function from KWBatchmakeComponent.php<div class=""><br>
<br>
<blockquote type="cite">
<div dir="ltr">
<div>Can you show the permissions and ownership of the temporary
work directory where the condor_submit_dag command is
executed?</div>
</div>
</blockquote>
<br></div>
ls -la<br>
...<br>
drwxrwxr-x 3 apache apache 4096 3 mars 16:53 45<br>
drwxrwxr-x 3 apache apache 4096 3 mars 17:41 46<br>
<br>
-bash-4.2$ cd 46<br>
-bash-4.2$ ls -la<br>
total 92<br>
drwxrwxr-x 3 apache apache 4096 3 mars 17:41 .<br>
drwxr-xr-x 29 apache apache 4096 3 mars 17:40 ..<br>
-rw-r--r-- 1 apache apache 140 3 mars 17:40 adminconfig.cfg<br>
-rw-r--r-- 1 apache apache 0 3 mars 17:41 bmGrid.0.error.txt<br>
lrwxrwxrwx 1 apache apache 56 3 mars 17:40 challenge.bms ->
/var/www/miccai4/modules/challenge/library/challenge.bms<br>
...<div class=""><br>
<br>
<br>
<blockquote type="cite">
<div dir="ltr">
<div>When you tested as the apache user, did you do this test
from the same temporary work directory that Midas/apache would
have tried this from?</div>
</div>
</blockquote>
<br></div>
Yes, from folder /var/www/miccai4/tmp/misc/batchmake/tmp/SSP/7/46
(drwxrwxr-x , owned by apache)<div class=""><br>
<br>
<blockquote type="cite">
<div dir="ltr">
<div>Is there any more information in the logs or error logs
generated by Condor in the temp work directory that you could
share?</div>
</div>
</blockquote>
<br></div>
tail -f challenge.dagjob.condor.sub<br>
# Note: default on_exit_remove expression:<br>
# ( ExitSignal =?= 11 || (ExitCode =!= UNDEFINED && ExitCode
>=0 && ExitCode <= 2))<br>
# attempts to ensure that DAGMan is automatically<br>
# requeued by the schedd if it exits abnormally or<br>
# is killed (e.g., during a reboot).<br>
on_exit_remove = ( ExitSignal =?= 11 || (ExitCode =!= UNDEFINED
&& ExitCode >=0 && ExitCode <= 2))<br>
copy_to_spool = False<br>
arguments = "-f -l . -Lockfile challenge.dagjob.lock
-AutoRescue 1 -DoRescueFrom 0 -Dag challenge.dagjob -CsdVersion
$CondorVersion:' '7.9.1' 'Aug' '24' '2012' 'PRE-RELEASE-UWCS' '$
-Dagman /usr/bin/condor_dagman"<br>
environment =
_CONDOR_DAGMAN_LOG=challenge.dagjob.dagman.out;_CONDOR_MAX_DAGMAN_LOG=0<br>
queue<br>
<br>
tail -f challenge.0.dagjob<br>
# More information at: <a href="http://www.batchmake.org" target="_blank">http://www.batchmake.org</a><br>
Universe = vanilla<br>
Output = bmGrid.0.out.txt<br>
Error = bmGrid.0.error.txt<br>
Log = bmGrid.0.log.txt<br>
Notification = NEVER<br>
Executable = /usr/bin/php<br>
Arguments = "'--version'"<br>
Queue 1<br>
<br>
I hope this can help with debugging the problem...<br>
<br>
Thank you,<br>
Sorina<div class=""><br>
<br>
<blockquote type="cite">
<div dir="ltr">
<div>Thanks,</div>
<div>Mike</div>
<div class="gmail_extra"><br>
<br>
<div class="gmail_quote">On Mon, Mar 3, 2014 at 11:16 AM,
Sorina Camarasu Pop <span dir="ltr"><<a href="mailto:sorina.pop@creatis.insa-lyon.fr" target="_blank">sorina.pop@creatis.insa-lyon.fr</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Dear
Midas users and developers,<br>
<br>
I am trying to configure Midas with the Challenge and
BatchMake modules, but I encounter problems when executing
the condor_submit_dag command.<br>
<br>
The error printed by Condor when executing the
condor_submit_dag command using the Batchmake module looks
like this: "DC_AUTHENTICATE: authentication of
<xxx.xxx.xxx.xxx:59888> did not result in a valid
mapped user name, which is required for this command (1112
QMGMT_WRITE_CMD), so aborting."<br>
<br>
Nevertheless, if I execute exactly the same command line
as apache in a console, everything works fine. I
do not understand where the difference comes from.<br>
<br>
Do you know if there's any special configuration for
Condor to work with the Batchmake module?<br>
<br>
Thank you for your help,<br>
Sorina<span><font color="#888888"><br>
<br>
-- <br>
Sorina Pop, PhD<br>
CNRS Research Engineer<br>
CREATIS<br>
Tel : <a href="tel:%2B33%20%280%294%2072%2043%2072%2099" value="+33472437299" target="_blank">+33 (0)4 72 43
72 99</a><br>
<br>
_______________________________________________<br>
Midas mailing list<br>
<a href="mailto:Midas@public.kitware.com" target="_blank">Midas@public.kitware.com</a><br>
<a href="http://public.kitware.com/cgi-bin/mailman/listinfo/midas" target="_blank">http://public.kitware.com/cgi-bin/mailman/listinfo/midas</a><br>
</font></span></blockquote>
</div>
<br>
<br clear="all">
<div><br>
</div>
<br>
<br>
</div>
</div>
</blockquote>
<br>
<br>
<pre cols="72">--
Sorina Pop, PhD
CNRS Research Engineer
CREATIS
Tel : <a href="tel:%2B33%20%280%294%2072%2043%2072%2099" value="+33472437299" target="_blank">+33 (0)4 72 43 72 99</a></pre>
</div></div>
</blockquote></div><br><br clear="all"><div><br></div>-- <br>Thanks,<br>Michael Grauer<br>R & D Engineer<br>Kitware, Inc.<br>919 969 6990 x322<br><br><br>
</div>