[Midas] Problem with Midas + Batchmake + Condor

Michael Grauer michael.grauer at kitware.com
Mon Mar 3 12:06:40 EST 2014


Where exactly did you see the message "DC_AUTHENTICATE: authentication of
<xxx.xxx.xxx.xxx:59888> did not result in a valid mapped user name, which
is required for this command (1112 QMGMT_WRITE_CMD), so aborting"?

Was there any other output included there?

Do you have a "condor" user on your VM? When you successfully run jobs by
running "condor_submit_dag" from the command line as the apache user and
watch the job with ps or top, which user runs the actual execution process
(whatever job BatchMake runs for you)?
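Instead of reading the user column in ps or top, the owner of a process can be read directly from the kernel. A minimal sketch, assuming a Linux host with a /proc filesystem (such as the Fedora 18 VM described below); the helper name `process_owner` is hypothetical. It reports both the real and the effective uid, since those can differ for a job started through a daemon:

```python
import os
import pwd
import sys


def process_owner(pid: int) -> tuple[int, int, str]:
    """Return (real uid, effective uid, effective user name) for a running process.

    Linux-only sketch: reads the "Uid:" line of /proc/<pid>/status, whose
    fields are the real, effective, saved-set and filesystem uids, in that order.
    """
    with open(f"/proc/{pid}/status") as status:
        for line in status:
            if line.startswith("Uid:"):
                real, effective = (int(field) for field in line.split()[1:3])
                try:
                    name = pwd.getpwuid(effective).pw_name
                except KeyError:
                    # The uid has no entry in the local passwd database.
                    name = str(effective)
                return real, effective, name
    raise ValueError(f"no Uid line found for pid {pid}")


if __name__ == "__main__":
    # Pass the pid of the running job (for example the php process the DAG
    # starts); with no argument the script inspects its own process.
    pid = int(sys.argv[1]) if len(sys.argv) > 1 else os.getpid()
    real, effective, name = process_owner(pid)
    print(f"pid {pid}: real uid {real}, effective uid {effective} ({name})")
```

While a job is running, its pid can be found with `pgrep php` and passed as the first argument.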

Can you include your "challenge.bms" script in an email?


Can you show me the output of "ls" from a directory where the submit failed
and then again from one where the submit succeeded, at the end of the job
processing run?
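If the two listings are long, the comparison can also be scripted. A minimal sketch, assuming both work directories are readable by the user running it; the function names `listing` and `compare_dirs` are hypothetical. It compares only entry names, permission bits and owners, because timestamps and sizes always differ between runs:

```python
import os
import pwd
import stat


def listing(path: str) -> dict[str, str]:
    """Map each entry in `path` to an ls -l style "mode owner" summary."""
    result = {}
    for entry in os.scandir(path):
        # Do not follow symlinks, so a linked file such as challenge.bms
        # is reported as a link (mode starting with "l").
        info = entry.stat(follow_symlinks=False)
        try:
            owner = pwd.getpwuid(info.st_uid).pw_name
        except KeyError:
            owner = str(info.st_uid)
        result[entry.name] = f"{stat.filemode(info.st_mode)} {owner}"
    return result


def compare_dirs(failed_dir: str, ok_dir: str) -> list[tuple[str, str, str]]:
    """Return (name, failed summary, succeeded summary) for every entry that is
    missing from one directory or differs in mode/owner; "-" marks a missing entry."""
    failed, ok = listing(failed_dir), listing(ok_dir)
    return [
        (name, failed.get(name, "-"), ok.get(name, "-"))
        for name in sorted(failed.keys() | ok.keys())
        if failed.get(name) != ok.get(name)
    ]
```

For example, `compare_dirs("/path/to/failed/run", "/path/to/successful/run")` (placeholder paths) lists the files that only one run produced, such as missing DAGMan output or log files.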



I'm not sure what is going on, just trying to get more context...


I recall running into a problem where one machine was the submitter, with
a midas user that had uid 100, and another machine (the execution node) had
a midas user with uid 200. I got what sounded like a similar message, and I
had to make the uids match across the machines to deal with permissions on
an NFS mount shared by both. This sounds nothing like your problem, but I
wanted to mention it in case it gives you any ideas.
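When the submit and execute nodes are separate machines, a uid mismatch like the one above can be spotted by comparing their passwd databases. A minimal sketch, assuming you have copied each machine's /etc/passwd (or the output of `getent passwd`) into a string; the function names are hypothetical, and the passwd lines in the demo are made-up example data mirroring the midas uid 100 vs 200 case:

```python
def parse_passwd(text: str) -> dict[str, int]:
    """Map user name -> uid from /etc/passwd-format text (name:pw:uid:gid:...)."""
    uids = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) >= 3:
            uids[fields[0]] = int(fields[2])
    return uids


def uid_mismatches(submit_text: str, execute_text: str) -> dict[str, tuple[int, int]]:
    """Users defined on both machines whose uids differ: name -> (submit uid, execute uid)."""
    submit, execute = parse_passwd(submit_text), parse_passwd(execute_text)
    return {
        name: (submit[name], execute[name])
        for name in sorted(submit.keys() & execute.keys())
        if submit[name] != execute[name]
    }


if __name__ == "__main__":
    # Made-up passwd contents reproducing the midas uid 100 vs 200 situation.
    submit_node = "root:x:0:0:root:/root:/bin/bash\nmidas:x:100:100::/home/midas:/bin/bash\n"
    execute_node = "root:x:0:0:root:/root:/bin/bash\nmidas:x:200:200::/home/midas:/bin/bash\n"
    print(uid_mismatches(submit_node, execute_node))  # {'midas': (100, 200)}
```

On a single VM acting as submit, manager and execute node, this check is not relevant, since every user has exactly one uid.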

Can you explain more about the library issues you ran into earlier that
prevented you from running jobs?



Thanks,
Mike





On Mon, Mar 3, 2014 at 11:50 AM, Sorina Camarasu Pop <
sorina.pop at creatis.insa-lyon.fr> wrote:

>  Hi Mike,
>
> Thank you for your prompt reply.
>
> On 03/03/2014 17:27, Michael Grauer wrote:
>
> Hi Sorina,
>
>  These are tough to track down.
>
>
> I know, I've spent my afternoon on it...
>
>
>  Can you tell me more about your environment?  Specifically, the 3
> machines (possibly all the same machine) that are your condor submit,
> condor manager, and condor execute nodes?
>
>
> I use the same machine (virtual machine configured as a dual core) for my
> condor submit, condor manager, and condor execute nodes.
>
>
>  What operating system is your web server, and what version of Condor are
> you using?
>
>
> Fedora 18.
> For Condor, I had compiled the latest version available, but had some
> library problems that prevented me from launching any job. I finally got
> it working with the version available through yum install:
> condor_version
> $CondorVersion: 7.9.1 Aug 24 2012 PRE-RELEASE-UWCS $
> $CondorPlatform: X86_64-Fedora_18 $
>
>
>
>   Is your condor submit node the same as your web server (most likely
> yes)?
>
>
> yes.
>
>
>  Are you running your web server as the apache user (most likely yes),
>
>
> Yes, I even printed out "whoami" to check that it really runs as apache.
>
>
>  and is it your web server that is calling the php code that results in
> condor_submit_dag (most likely yes, again)?
>
>
> Yes.
> I use the "standard" batchmake config, i.e. the condorSubmitDag function
> from KWBatchmakeComponent.php
>
>
>  Can you show the permissions and ownership of the temporary work
> directory where the condor_submit_dag command is executed?
>
>
> ls -la
> ...
> drwxrwxr-x  3 apache apache 4096  3 mars  16:53 45
> drwxrwxr-x  3 apache apache 4096  3 mars  17:41 46
>
> -bash-4.2$ cd 46
> -bash-4.2$ ls -la
> total 92
> drwxrwxr-x  3 apache apache 4096  3 mars  17:41 .
> drwxr-xr-x 29 apache apache 4096  3 mars  17:40 ..
> -rw-r--r--  1 apache apache  140  3 mars  17:40 adminconfig.cfg
> -rw-r--r--  1 apache apache    0  3 mars  17:41 bmGrid.0.error.txt
> lrwxrwxrwx  1 apache apache   56  3 mars  17:40 challenge.bms -> /var/www/miccai4/modules/challenge/library/challenge.bms
> ...
>
>
>
>  When you tested as the apache user, did you do this test from the same
> temporary work directory that Midas/apache would have tried this from?
>
>
> Yes, from folder /var/www/miccai4/tmp/misc/batchmake/tmp/SSP/7/46
> (drwxrwxr-x , owned by apache)
>
>
>  Is there any more information in the logs or error logs generated by
> Condor in the temp work directory that you could share?
>
>
> tail -f challenge.dagjob.condor.sub
> # Note: default on_exit_remove expression:
> # ( ExitSignal =?= 11 || (ExitCode =!= UNDEFINED && ExitCode >=0 && ExitCode <= 2))
> # attempts to ensure that DAGMan is automatically
> # requeued by the schedd if it exits abnormally or
> # is killed (e.g., during a reboot).
> on_exit_remove  = ( ExitSignal =?= 11 || (ExitCode =!= UNDEFINED && ExitCode >=0 && ExitCode <= 2))
> copy_to_spool   = False
> arguments       = "-f -l . -Lockfile challenge.dagjob.lock -AutoRescue 1 -DoRescueFrom 0 -Dag challenge.dagjob -CsdVersion $CondorVersion:' '7.9.1' 'Aug' '24' '2012' 'PRE-RELEASE-UWCS' '$ -Dagman /usr/bin/condor_dagman"
> environment     = _CONDOR_DAGMAN_LOG=challenge.dagjob.dagman.out;_CONDOR_MAX_DAGMAN_LOG=0
> queue
>
> tail -f challenge.0.dagjob
> # More information at: http://www.batchmake.org
> Universe       = vanilla
> Output         = bmGrid.0.out.txt
> Error          = bmGrid.0.error.txt
> Log            = bmGrid.0.log.txt
> Notification   = NEVER
> Executable    = /usr/bin/php
> Arguments     = "'--version'"
> Queue 1
>
> I hope this can help with debugging the problem...
>
> Thank you,
> Sorina
>
>
>  Thanks,
> Mike
>
>
> On Mon, Mar 3, 2014 at 11:16 AM, Sorina Camarasu Pop <
> sorina.pop at creatis.insa-lyon.fr> wrote:
>
>> Dear Midas users and developers,
>>
>> I am trying to configure Midas with the Challenge and BatchMake modules,
>> but I encounter problems when executing the condor_submit_dag command.
>>
>> The error printed by Condor when executing the condor_submit_dag command
>> using the Batchmake module looks like this: "DC_AUTHENTICATE:
>> authentication of <xxx.xxx.xxx.xxx:59888> did not result in a valid mapped
>> user name, which is required for this command (1112 QMGMT_WRITE_CMD), so
>> aborting."
>>
>> Nevertheless, if I execute exactly the same command line as apache in a
>> console, everything works fine. I do not understand where the
>> difference comes from.
>>
>> Do you know if there's any special configuration for Condor to work with
>> the Batchmake module?
>>
>> Thank you for your help,
>> Sorina
>>
>> --
>> Sorina Pop, PhD
>> CNRS Research Engineer
>> CREATIS
>> Tel : +33 (0)4 72 43 72 99
>>
>> _______________________________________________
>> Midas mailing list
>> Midas at public.kitware.com
>> http://public.kitware.com/cgi-bin/mailman/listinfo/midas
>>
>


-- 
Thanks,
Michael Grauer
R & D Engineer
Kitware, Inc.
919 969 6990 x322