[CMake] cmake on multicore interix'en

Markus Duft markus.duft at salomon.at
Wed Feb 17 11:29:26 EST 2010


Brad King wrote:
> Markus Duft wrote:
>> cmake's implementation of how child processes are handled doesn't work
>> reliably on multicore interix. it seems that every other SIGCHLD is lost
> 
> Is this a known problem on that platform, independent of CMake?

it is independent of cmake, yes. it is not widely known, as (i guess)
i'm one of the very few people really _using_ this platform for
something productive (cross compiling to native win32 - hahaha - i know
- don't tell me that cmake supports win32, we have a huge bunch of auto*
based stuff that needs a POSIX env...). doing the bits to make cmake
cross compile from interix to win32 using parity (parity.sf.net) is next
on my agenda...

> 
> The ProcessUNIX.c implementation is for POSIX platforms, which clearly
> define SIGCHLD semantics.

yeah - interix is (supposed to be) POSIX compliant, and hey - it works
_most of the time_. the cause of my headaches is the few times it
doesn't ... and all this (both of my problems) happens only on multi-core
machines. i am in the process of reporting those issues currently, but
M$ support is something soo .... you know the deal ;)

> 
>> somewhere on the way. i (printf-)debugged cmake a little during
>> bootstrap, and it seems that at random points in time, SIGCHLD is lost,
> 
> Can you print out the state of signal masks?

how can i do that? i'm not really into that topic that much :) but i'll
read some man pages to figure it out.
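
For reference, a minimal sketch of one way to inspect the signal mask on a POSIX system. It assumes sigprocmask() behaves on Interix as POSIX describes; the function names signal_is_blocked and print_signal_mask are made up for illustration and are not part of ProcessUNIX.c.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <signal.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 if signo is blocked in the caller's
   signal mask, 0 if it is not, and -1 on error. */
int signal_is_blocked(int signo)
{
  sigset_t current;
  /* With a NULL "set" argument the mask is left unchanged and the
     current mask is only reported through the third argument. */
  if (sigprocmask(SIG_BLOCK, NULL, &current) != 0) {
    return -1;
  }
  return sigismember(&current, signo);
}

/* Hypothetical helper: print the blocked state of the signals that
   matter for child handling, tagged with a caller-supplied label. */
void print_signal_mask(FILE* out, const char* where)
{
  assert(out != NULL && where != NULL);
  fprintf(out, "[%s] SIGCHLD blocked=%d SIGPIPE blocked=%d\n", where,
          signal_is_blocked(SIGCHLD), signal_is_blocked(SIGPIPE));
}
```

Calling print_signal_mask(stderr, "before select") right before the blocking call, and again inside the code that forks the child, would show whether SIGCHLD is unexpectedly blocked at either point.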

> 
>> and cmake locks up in a select() call on the signal pipe (SIGCHLD is
>> lost, so nobody will write on the signal pipe).
> 
> The "signal pipe" approach is a standard way to implement race-free
> handling of SIGCHLD while blocking in select().
> 
>> i thought of introducing some lame timeout when select()ing the signal
>> pipe, then checking whether the process is still alive (wait()), and
>> again selecting if it is. what do you think?
> 
> If select() is broken (your second problem) then there is no point
> in pursuing this code path further.  Instead modify the polling
> code path to use a non-blocking waitpid() instead of looking at
> the signal pipe.

it seems that i'm not hit by the select problem, as there is already a
"select has lied" path somewhere in that code path that catches exactly
my select() problem.

but yes, maybe it would be easier to implement the waitpid() stuff in
the non-blocking code path. i'll have a look at that.
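
A rough sketch of what polling with a non-blocking waitpid() could look like, so that a lost SIGCHLD cannot cause a hang. This is not the kwsys code: poll_child_exit, demo_exit_code, and the sleep interval are assumptions chosen for illustration, and nanosleep() is used for the pause only because select() is under suspicion here.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: poll for the exit of child "pid" without relying
   on SIGCHLD. Returns 1 if the child was reaped (status filled in),
   0 if timeout_ms elapsed first, -1 on a waitpid() error. */
int poll_child_exit(pid_t pid, int* status, int timeout_ms, int interval_ms)
{
  int waited_ms = 0;
  assert(status != NULL && interval_ms > 0);
  for (;;) {
    /* WNOHANG makes waitpid() return 0 at once if the child still runs. */
    pid_t r = waitpid(pid, status, WNOHANG);
    if (r == pid) {
      return 1;
    }
    if (r < 0 && errno != EINTR) {
      return -1;
    }
    if (waited_ms >= timeout_ms) {
      return 0;
    }
    /* Sleep briefly before checking again. */
    struct timespec pause;
    pause.tv_sec = interval_ms / 1000;
    pause.tv_nsec = (long)(interval_ms % 1000) * 1000000L;
    nanosleep(&pause, NULL);
    waited_ms += interval_ms;
  }
}

/* Usage example: fork a child that exits with code 7 and reap it by
   polling. Returns the child's exit code, or -1 on failure. */
int demo_exit_code(void)
{
  int status = 0;
  pid_t pid = fork();
  if (pid < 0) {
    return -1;
  }
  if (pid == 0) {
    _exit(7);
  }
  if (poll_child_exit(pid, &status, 2000, 10) != 1) {
    return -1;
  }
  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The trade-off is latency and wakeups: the child's exit is noticed up to one interval late, and the process wakes periodically instead of sleeping until a signal arrives.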

> 
>> the second problem i have is regarding a broken select(). i tried to
>> work around it by setting KWSYSPE_USE_SELECT, which initially didn't
>> work, because the code seems b0rked. it seems that there is a wrong
>> timeout check in that code path.
> 
> IIRC that path was contributed for BeOS support which AFAIK is not
> really tested anymore.  However, it looks correct at a quick glance.
> 
>> first kwsysProcessGetTimeoutLeft is
>> called, like in the select() code path, but directly after that, the
>> timeoutLength members are checked separately once more.
> 
> The call to GetTimeoutLeft fills the members of timeoutLength.
> It also returns whether or not the timeout has already expired.
> The caller is supposed to use timeoutLength after the call.
> 
>> with this check it seems that all sub-processes "time out" immediately.
> 
> At process start time we store an absolute TimeoutTime using the
> starting wall clock time plus the process timeout length.  Later
> the GetTimeoutLeft subtracts the current time from the TimeoutTime.
> Print out the starting time, the computed TimeoutTime, and the
> timeoutLength that gets computed for each poll.

i'll have a look at that one too. thanks for all the work :)
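
The timeout arithmetic Brad describes can be sketched roughly as follows, assuming times are kept as struct timeval. The names deadline_from, timeout_left, and print_poll_times are hypothetical stand-ins, not the kwsys functions; only the idea (absolute deadline at start, subtract the current time at each poll, report expiry through the return value) comes from the description above.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/time.h>

/* Hypothetical helper: absolute deadline = start time + timeout length. */
struct timeval deadline_from(struct timeval start, long timeout_ms)
{
  struct timeval d = start;
  assert(timeout_ms >= 0);
  d.tv_sec += timeout_ms / 1000;
  d.tv_usec += (timeout_ms % 1000) * 1000;
  if (d.tv_usec >= 1000000) { /* carry microseconds into seconds */
    d.tv_usec -= 1000000;
    d.tv_sec += 1;
  }
  return d;
}

/* Hypothetical helper: fill *left with deadline - now. Returns 1 if the
   deadline has already passed (left is then zeroed), 0 otherwise. */
int timeout_left(struct timeval deadline, struct timeval now,
                 struct timeval* left)
{
  left->tv_sec = deadline.tv_sec - now.tv_sec;
  left->tv_usec = deadline.tv_usec - now.tv_usec;
  if (left->tv_usec < 0) { /* borrow one second */
    left->tv_usec += 1000000;
    left->tv_sec -= 1;
  }
  if (left->tv_sec < 0 || (left->tv_sec == 0 && left->tv_usec == 0)) {
    left->tv_sec = 0;
    left->tv_usec = 0;
    return 1;
  }
  return 0;
}

/* Diagnostic output for one poll: start, deadline, and remaining time. */
void print_poll_times(FILE* out, struct timeval start,
                      struct timeval deadline, struct timeval left)
{
  fprintf(out, "start=%ld.%06ld deadline=%ld.%06ld left=%ld.%06ld\n",
          (long)start.tv_sec, (long)start.tv_usec,
          (long)deadline.tv_sec, (long)deadline.tv_usec,
          (long)left.tv_sec, (long)left.tv_usec);
}
```

If every sub-process appears to time out immediately, printing these three values per poll should show whether the deadline is computed wrongly at start or whether the clock reading at poll time is off.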

Cheers, Markus

> 
> -Brad
