[Midas] Problem with Midas + Batchmake + Condor

Sorina Camarasu Pop sorina.pop at creatis.insa-lyon.fr
Tue Mar 4 11:44:07 EST 2014


Hi Mike,

Thank you for your reply.
I had also found the first link you sent, but didn't manage to properly
configure Condor with password authentication.
I will follow your advice and contact the Condor mailing list.
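
For reference, the reduced test case suggested below could look roughly like this sketch. It is only an illustration under assumptions: the file names minimal.sub and minimal.dag, the node name A, and the two helper functions are made up for this example. The submit description mirrors the challenge.0.dagjob file quoted further down, and Python stands in for the single PHP file only to keep the sketch short. The script calls condor_submit_dag only if it is found on the PATH.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

# Submit description modelled on the challenge.0.dagjob quoted later in this
# thread; the file names and the node name "A" are placeholders for this sketch.
SUBMIT_TEXT = """\
Universe       = vanilla
Output         = minimal.out.txt
Error          = minimal.error.txt
Log            = minimal.log.txt
Notification   = NEVER
Executable     = /usr/bin/php
Arguments      = "'--version'"
Queue 1
"""

# A one-node DAG: node A runs the submit description above.
DAG_TEXT = "JOB A minimal.sub\n"


def write_minimal_dag(workdir):
    """Write minimal.sub and minimal.dag into workdir and return the DAG path."""
    workdir = Path(workdir)
    (workdir / "minimal.sub").write_text(SUBMIT_TEXT)
    dag_path = workdir / "minimal.dag"
    dag_path.write_text(DAG_TEXT)
    return dag_path


def submit(dag_path):
    """Run condor_submit_dag on dag_path; return None if Condor is not installed."""
    exe = shutil.which("condor_submit_dag")
    if exe is None:
        return None
    # Run from the DAG's own directory, as the web server would from its work dir.
    return subprocess.run(
        [exe, dag_path.name],
        cwd=dag_path.parent,
        capture_output=True,
        text=True,
    )


if __name__ == "__main__":
    dag = write_minimal_dag(tempfile.mkdtemp(prefix="condor_min_"))
    result = submit(dag)
    if result is None:
        print("condor_submit_dag not found on PATH; files written to", dag.parent)
    else:
        print("exit code:", result.returncode)
        print(result.stdout or result.stderr)
```

Running it once from a console as apache and once from a process started by httpd should show whether the failure depends on how the submitting process is launched, which is the comparison to describe to the Condor list.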

Thank you for your help!
Best regards,
Sorina

On 04/03/2014 16:21, Michael Grauer wrote:
> Hi Sorina,
>
> I never had to deal with the SHADOW_ALLOW_UNSAFE_REMOTE_EXEC property, 
> but I had been using Condor 7.4.4, and you are using a newer version.
>
> I Googled around a bit on your error message, and saw a couple posts 
> that might help.
>
> Also, looking at the top of your attached condor.sched.log file I see 
> (but haven't Googled for this)
>
> 3/03/14 18:09:56 authenticate_self_gss: acquiring self credentials 
> failed. Please check your Condor configuration file if this is a 
> server process. Or the user environment variable if this is a user 
> process.
>
>
> And there may be more helpful hints below that message in the file.
> Together these suggest an authentication configuration problem.
> Looking at these posts and checking the Condor reference for this
> configuration might help.
>
> https://www-auth.cs.wisc.edu/lists/htcondor-users/2013-February/msg00129.shtml
>
> http://comments.gmane.org/gmane.comp.distributed.condor.user/27728
>
>
> You can also email the Condor mailing list; I have done so in the past
> and the community has been quite helpful. Before you do this, I
> suggest you get this down to the simplest possible example, as all of
> the Midas/BatchMake stuff may just add confusion. Can you try a very
> simple example where you take a single PHP file and run
> condor_submit_dag from it in the same way that the challenge module
> does? If you can reproduce the problem with that test case, and the
> same submission succeeds when run as the apache user from a console,
> you will have a much easier problem to describe and get help with.
>
> Let us know how it goes, and good luck!
>
> Thanks,
> Mike
>
>
>
>
> On Tue, Mar 4, 2014 at 9:46 AM, Sorina Camarasu Pop
> <sorina.pop at creatis.insa-lyon.fr> wrote:
>
>
>     Hello again,
>
>     I've discovered an interesting config option within condor: the
>     SHADOW_ALLOW_UNSAFE_REMOTE_EXEC seems to allow shell calls via the
>     libc 'system()' function. Is it something any of you have already
>     used in order to allow calls with the Midas executor->exec ?
>
>     I tried turning it on and using the Condor shadow daemon, but I get
>     an error saying "Assertion ERROR on (job_ad_file)" at line 166 in
>     file shadow_v61_main.cpp ...
>
>     So before going into the trouble of trying to solve this error, I
>     was wondering if you know about this "shadow" config option and if
>     you can confirm it is necessary.
>
>     Best regards,
>     Sorina
>
>     On 03/03/2014 18:44, Sorina Camarasu Pop wrote:
>>
>>
>>     On 03/03/2014 18:06, Michael Grauer wrote:
>>>     Where did you see the message: "DC_AUTHENTICATE: authentication
>>>     of <xxx.xxx.xxx.xxx:59888> did not result in a valid mapped user
>>>     name, which is required for this command (1112 QMGMT_WRITE_CMD),
>>>     so aborting."
>>
>>     In the condor log : /home/condor/localcondor/log/SchedLog
>>
>>>     Was there any other output included there?
>>
>>     I copied parts of the log file into the attached file, which
>>     contains the output printed both when using batchmake and when
>>     running condor commands directly.
>>
>>>     Do you have a "condor" user on your VM?
>>
>>     Yes.
>>
>>>      When you successfully run jobs by doing "condor_submit_dag"
>>>     from the command line as the apache user,
>>
>>     apache   22773 29731  0 18:18 ?        00:00:00
>>     condor_scheduniv_exec.26.0 -f -l . -Lockfile
>>     challenge.dagjob.lock -AutoRescue 1 -DoRescueFrom 0 -Dag
>>     challenge.dagjob -CsdVersion $CondorVersion: 7.9.1 Aug 24 2012
>>     PRE-RELEASE-UWCS $ -Force -Dagman /bin/condor_dagman
>>
>>
>>>     when you watch your job run with ps or top, which user runs the
>>>     actual execution process (whatever job batchmake will run for
>>>     you) ?
>>
>>     When launching it with batchmake (through the web interface), I
>>     cannot find the corresponding condor process... I only see an
>>     httpd process run by apache...
>>
>>>
>>>     Can you include your "challenge.bms" script in an email?
>>
>>     Of course, here it is attached.
>>
>>>
>>>     Can you show me the output of "ls" from a directory where the
>>>     submit failed and then again from one where the submit
>>>     succeeded, at the end of the job processing run?
>>
>>     Failed :
>>     ls -la 52/
>>     total 56
>>     drwxrwxr-x  3 apache apache 4096  3 mars  18:25 .
>>     drwxr-xr-x 35 apache apache 4096  3 mars  18:25 ..
>>     -rw-r--r--  1 apache apache  140  3 mars  18:25 adminconfig.cfg
>>     -rw-r--r--  1 apache apache  355  3 mars  18:25 challenge.0.dagjob
>>     -rw-r--r--  1 apache apache  332  3 mars  18:25 challenge.1.dagjob
>>     -rw-r--r--  1 apache apache  564  3 mars  18:25 challenge.2.dagjob
>>     -rw-r--r--  1 apache apache  355  3 mars  18:25 challenge.3.dagjob
>>     lrwxrwxrwx  1 apache apache   56  3 mars  18:25 challenge.bms ->
>>     /var/www/miccai4/modules/challenge/library/challenge.bms
>>     -rw-r--r--  1 apache apache 1473  3 mars  18:25 challenge.config.bms
>>     -rw-r--r--  1 apache apache 1593  3 mars  18:25 challenge.dagjob
>>     -rw-r--r--  1 apache apache 1043  3 mars  18:25
>>     challenge.dagjob.condor.sub
>>     lrwxrwxrwx  1 apache apache   70  3 mars  18:25
>>     challenge_validator_app.bms ->
>>     /var/www/miccai4/modules/challenge/library/challenge_validator_app.bms
>>     drwxrwxr-x  4 apache apache 4096  3 mars  18:25 data
>>     lrwxrwxrwx  1 apache apache   50  3 mars  18:25 PHP.bmm ->
>>     /var/www/miccai4/modules/challenge/library/PHP.bmm
>>     -rw-r--r--  1 apache apache  138  3 mars  18:25 userconfig.cfg
>>     lrwxrwxrwx  1 apache apache   67  3 mars  18:25
>>     ValidateImageAveDist.bmm ->
>>     /var/www/miccai4/modules/challenge/library/ValidateImageAveDist.bmm
>>
>>
>>     OK (created by batchmake and relaunched by hand):
>>     -bash-4.2$ ls -la 48
>>     total 104
>>     drwxrwxr-x  3 apache apache  4096  3 mars  18:18 .
>>     drwxr-xr-x 35 apache apache  4096  3 mars  18:25 ..
>>     -rw-r--r--  1 apache apache   140  3 mars  18:09 adminconfig.cfg
>>     -rw-r--r--  1 apache apache     0  3 mars  18:13 bmGrid.0.error.txt
>>     -rw-r--r--  1 apache apache  1968  3 mars  18:18 bmGrid.0.log.txt
>>     -rw-r--r--  1 apache apache   148  3 mars  18:18 bmGrid.0.out.txt
>>     -rw-r--r--  1 apache apache   355  3 mars  18:09 challenge.0.dagjob
>>     -rw-r--r--  1 apache apache   332  3 mars  18:09 challenge.1.dagjob
>>     -rw-r--r--  1 apache apache   564  3 mars  18:09 challenge.2.dagjob
>>     -rw-r--r--  1 apache apache   355  3 mars  18:09 challenge.3.dagjob
>>     lrwxrwxrwx  1 apache apache    56  3 mars  18:09 challenge.bms ->
>>     /var/www/miccai4/modules/challenge/library/challenge.bms
>>     -rw-r--r--  1 apache apache  1473  3 mars  18:09 challenge.config.bms
>>     -rw-r--r--  1 apache apache  1593  3 mars  18:09 challenge.dagjob
>>     -rw-r--r--  1 apache apache  1042  3 mars  18:18
>>     challenge.dagjob.condor.sub
>>     -rw-r--r--  1 apache apache   610  3 mars  18:18
>>     challenge.dagjob.dagman.log
>>     -rw-r--r--  1 apache apache 16074  3 mars  18:18
>>     challenge.dagjob.dagman.out
>>     -rw-r--r--  1 apache apache   256  3 mars  18:18 challenge.dagjob.dot
>>     -rw-r--r--  1 apache apache     0  3 mars  18:18
>>     challenge.dagjob.lib.err
>>     -rw-r--r--  1 apache apache    29  3 mars  18:18
>>     challenge.dagjob.lib.out
>>     -rw-r--r--  1 apache apache   970  3 mars  18:18
>>     challenge.dagjob.nodes.log
>>     -rw-r--r--  1 apache apache   243  3 mars  18:18
>>     challenge.dagjob.rescue001
>>     -rw-r--r--  1 apache apache   243  3 mars  18:13
>>     challenge.dagjob.rescue001.old
>>     lrwxrwxrwx  1 apache apache    70  3 mars  18:09
>>     challenge_validator_app.bms ->
>>     /var/www/miccai4/modules/challenge/library/challenge_validator_app.bms
>>     drwxrwxr-x  4 apache apache  4096  3 mars  18:09 data
>>     lrwxrwxrwx  1 apache apache    50  3 mars  18:09 PHP.bmm ->
>>     /var/www/miccai4/modules/challenge/library/PHP.bmm
>>     -rw-r--r--  1 apache apache   138  3 mars  18:09 userconfig.cfg
>>     lrwxrwxrwx  1 apache apache    67  3 mars  18:09
>>     ValidateImageAveDist.bmm ->
>>     /var/www/miccai4/modules/challenge/library/ValidateImageAveDist.bmm
>>
>>>     I'm not sure what is going on, just trying to get more context...
>>>
>>>     I recall I ran into a problem where one machine was the
>>>     submitter, and there was a midas user there, with uid 100, and a
>>>     midas user on another machine (the execution node) with a uid
>>>     200, and I got what sounded like a similar message--I had to
>>>     make sure their uids were the same across machines to deal with
>>>     permissions across an NFS mount on both machines. This sounds
>>>     nothing like your problem, but I wanted to include it in case it
>>>     gives you any ideas.
>>
>>     Thank you for the hint.
>>     My problem seems to be similar, in the sense that it looks like a
>>     user problem. However, I cannot work out which two users are
>>     involved: apache and who else?
>>
>>     I noticed the following line in the condor log (the one attached):
>>     03/03/14 18:39:34 ATTEMPT_ACCESS: Switching to user uid: 48 gid: 48.
>>     uid 48 does correspond to apache. What surprises me is that the
>>     log prints "Switching to user uid: 48". Does that mean that until
>>     that moment it was executing as some other user?
>>
>>>
>>>     Can you explain more about the library issues you ran into
>>>     earlier that prevented you from running jobs?
>>
>>     I don't remember exactly, but I spent quite some time on that one
>>     too.
>>     In that case, jobs were submitted but stayed idle: if I
>>     remember correctly, a library problem was preventing one of the
>>     condor daemons from launching/executing correctly. I really don't
>>     think this could be connected...
>>
>>     Thank you,
>>     Sorina
>>
>>>
>>>
>>>
>>>     Thanks,
>>>     Mike
>>>
>>>
>>>
>>>
>>>
>>>     On Mon, Mar 3, 2014 at 11:50 AM, Sorina Camarasu Pop
>>>     <sorina.pop at creatis.insa-lyon.fr> wrote:
>>>
>>>         Hi Mike,
>>>
>>>         Thank you for your prompt reply.
>>>
>>>         On 03/03/2014 17:27, Michael Grauer wrote:
>>>>         Hi Sorina,
>>>>
>>>>         These are tough to track down.
>>>
>>>         I know, I've spent my afternoon on it...
>>>
>>>
>>>>         Can you tell me more about your environment?  Specifically,
>>>>         the 3 machines (possibly all the same machine) that are
>>>>         your condor submit, condor manager, and condor execute nodes?
>>>
>>>         I use the same machine (virtual machine configured as a dual
>>>         core) for my condor submit, condor manager, and condor
>>>         execute nodes.
>>>
>>>
>>>>         What operating system is your web server, and what version
>>>>         of Condor are you using?
>>>
>>>         Fedora 18.
>>>         For Condor, I had compiled the latest version available, but
>>>         had some library problems preventing me from launching any
>>>         job. I finally got it working with the version available
>>>         through yum install:
>>>         condor_version
>>>         $CondorVersion: 7.9.1 Aug 24 2012 PRE-RELEASE-UWCS $
>>>         $CondorPlatform: X86_64-Fedora_18 $
>>>
>>>
>>>
>>>>          Is your condor submit node the same as your web server
>>>>         (most likely yes)?
>>>
>>>         yes.
>>>
>>>
>>>>         Are you running your web server as the apache user (most
>>>>         likely yes),
>>>
>>>         Yes, I even printed out "whoami" to check that it really
>>>         runs as apache.
>>>
>>>
>>>>         and is it your web server that is calling the php code that
>>>>         results in condor_submit_dag (most likely yes, again)?
>>>
>>>         Yes.
>>>         I use the "standard" batchmake config, i.e. the
>>>         condorSubmitDag function from KWBatchmakeComponent.php
>>>
>>>
>>>>         Can you show the permissions and ownership of the temporary
>>>>         work directory where the condor_submit_dag command is executed?
>>>
>>>         ls -la
>>>         ...
>>>         drwxrwxr-x  3 apache apache 4096  3 mars  16:53 45
>>>         drwxrwxr-x  3 apache apache 4096  3 mars  17:41 46
>>>
>>>         -bash-4.2$ cd 46
>>>         -bash-4.2$ ls -la
>>>         total 92
>>>         drwxrwxr-x  3 apache apache 4096  3 mars  17:41 .
>>>         drwxr-xr-x 29 apache apache 4096  3 mars  17:40 ..
>>>         -rw-r--r--  1 apache apache  140  3 mars  17:40 adminconfig.cfg
>>>         -rw-r--r--  1 apache apache    0  3 mars  17:41
>>>         bmGrid.0.error.txt
>>>         lrwxrwxrwx  1 apache apache   56  3 mars  17:40
>>>         challenge.bms ->
>>>         /var/www/miccai4/modules/challenge/library/challenge.bms
>>>         ...
>>>
>>>
>>>
>>>>         When you tested as the apache user, did you do this test
>>>>         from the same temporary work directory that Midas/apache
>>>>         would have tried this from?
>>>
>>>         Yes, from folder
>>>         /var/www/miccai4/tmp/misc/batchmake/tmp/SSP/7/46
>>>         (drwxrwxr-x, owned by apache)
>>>
>>>
>>>>         Is there any more information in the logs or error logs
>>>>         generated by Condor in the temp work directory that you
>>>>         could share?
>>>
>>>         tail -f challenge.dagjob.condor.sub
>>>         # Note: default on_exit_remove expression:
>>>         # ( ExitSignal =?= 11 || (ExitCode =!= UNDEFINED && ExitCode
>>>         >=0 && ExitCode <= 2))
>>>         # attempts to ensure that DAGMan is automatically
>>>         # requeued by the schedd if it exits abnormally or
>>>         # is killed (e.g., during a reboot).
>>>         on_exit_remove  = ( ExitSignal =?= 11 || (ExitCode =!=
>>>         UNDEFINED && ExitCode >=0 && ExitCode <= 2))
>>>         copy_to_spool   = False
>>>         arguments       = "-f -l . -Lockfile challenge.dagjob.lock
>>>         -AutoRescue 1 -DoRescueFrom 0 -Dag challenge.dagjob
>>>         -CsdVersion $CondorVersion:' '7.9.1' 'Aug' '24' '2012'
>>>         'PRE-RELEASE-UWCS' '$ -Dagman /usr/bin/condor_dagman"
>>>         environment     =
>>>         _CONDOR_DAGMAN_LOG=challenge.dagjob.dagman.out;_CONDOR_MAX_DAGMAN_LOG=0
>>>         queue
>>>
>>>         tail -f challenge.0.dagjob
>>>         # More information at: http://www.batchmake.org
>>>         Universe       = vanilla
>>>         Output         = bmGrid.0.out.txt
>>>         Error          = bmGrid.0.error.txt
>>>         Log            = bmGrid.0.log.txt
>>>         Notification   = NEVER
>>>         Executable    = /usr/bin/php
>>>         Arguments     = "'--version'"
>>>         Queue 1
>>>
>>>         I hope this can help with debugging the problem...
>>>
>>>         Thank you,
>>>         Sorina
>>>
>>>
>>>>         Thanks,
>>>>         Mike
>>>>
>>>>
>>>>         On Mon, Mar 3, 2014 at 11:16 AM, Sorina Camarasu Pop
>>>>         <sorina.pop at creatis.insa-lyon.fr> wrote:
>>>>
>>>>             Dear Midas users and developers,
>>>>
>>>>             I am trying to configure Midas with the Challenge and
>>>>             BatchMake modules, but I encounter problems when
>>>>             executing the condor_submit_dag command.
>>>>
>>>>             The error printed by Condor when executing the
>>>>             condor_submit_dag command using the Batchmake module
>>>>             looks like this : "DC_AUTHENTICATE: authentication of
>>>>             <xxx.xxx.xxx.xxx:59888> did not result in a valid
>>>>             mapped user name, which is required for this command
>>>>             (1112 QMGMT_WRITE_CMD), so aborting."
>>>>
>>>>             Nevertheless, if I execute exactly the same command
>>>>             line as apache in a console, everything works fine.
>>>>             I do not understand where the difference comes from.
>>>>
>>>>             Do you know if there's any special configuration for
>>>>             Condor to work with the Batchmake module ?
>>>>
>>>>             Thank you for your help,
>>>>             Sorina
>>>>
>>>>             -- 
>>>>             Sorina Pop, PhD
>>>>             CNRS Research Engineer
>>>>             CREATIS
>>>>             Tel : +33 (0)4 72 43 72 99
>>>>
>>>>             _______________________________________________
>>>>             Midas mailing list
>>>>             Midas at public.kitware.com
>>>>             http://public.kitware.com/cgi-bin/mailman/listinfo/midas
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>         -- 
>>>         Sorina Pop, PhD
>>>         CNRS Research Engineer
>>>         CREATIS
>>>         Tel : +33 (0)4 72 43 72 99
>>>
>>>
>>>
>>>
>>>     -- 
>>>     Thanks,
>>>     Michael Grauer
>>>     R & D Engineer
>>>     Kitware, Inc.
>>>     919 969 6990 x322
>>>
>>>
>>
>>
>>     -- 
>>     Sorina Pop, PhD
>>     CNRS Research Engineer
>>     CREATIS
>>     Tel : +33 (0)4 72 43 72 99
>>
>>
>
>
>     -- 
>     Sorina Pop, PhD
>     CNRS Research Engineer
>     CREATIS
>     Tel : +33 (0)4 72 43 72 99
>
>
>
>
>
>
>
>


-- 
Sorina Pop, PhD
CNRS Research Engineer
CREATIS
Tel : +33 (0)4 72 43 72 99
