[CMake] Parallel builds do not work correctly when using "cmake -E copy" to copy files

Alan W. Irwin irwin at beluga.phys.uvic.ca
Wed Feb 6 22:44:38 EST 2008


On 2008-02-06 21:04-0500 Brad King wrote:

> Alan W. Irwin wrote:
>> On 2007-12-14 09:53-0800 Alan W. Irwin wrote:
>> 
>>> On 2007-12-14 10:32-0500 Brad King wrote:
>> 
>>>> CMake employs a 2-level make recursion system that is independent of
>>>> the directory structure.  The first level never builds anything...it
>>>> just evaluates target-level dependencies with phony targets.  That
>>>> determines the order in which targets must be built.  The second level
>>>> is the build.make for each target.  This is where file-level
>>>> dependencies are evaluated.
>>>> 
>>>> In your example the file1...fileN rules are showing up in target1's
>>>> build.make and target2's build.make but they should never be evaluated
>>>> in the second target.  They are pulled in through the additional_file
>>>> rule's dependencies on them (see below), but they should always be up
>>>> to date if target2 doesn't build until after target1 finishes.  Then
>>>> only the additional_file rule will be invoked.  However if there is no
>>>> dependency from target2->target1 then both build.make files may be
>>>> built simultaneously and you get race conditions causing the double
>>>> evaluations.
>>>> 
>>>> CMake traces through the dependencies of custom commands in each
>>>> target.  When it is constructing target2 it doesn't know that target1
>>>> will also provide rules for the files.  If you place the targets in
>>>> different directories it would not be able to make this extra
>>>> connection, but then the build would not work correctly unless you add
>>>> the target-level dependency.  Any further explanation here will just
>>>> duplicate my previous message so I'll stop.
>>> 
>>> That's fine.  Your combined explanation now makes sense and completely
>>> confirms my working hypothesis that the make recursion system of CMake
>>> is responsible for the parallel build issues I was encountering.  I hope
>>> I can work around these PLplot parallel build issues (note the double
>>> copy issue was only the most obvious one) by using extra target
>>> dependencies.  The problem is that parallel build issues tend to appear
>>> and disappear depending on load, the N level (for -j N), and hardware.
>>> Thus, even if a whole flock of PLplot developers confirm success for
>>> parallel builds, there could be some subtle dependency issue left that
>>> we have missed, and some user down the road is going to come up with a
>>> combination of load, N, and hardware that triggers the parallel build
>>> problem because of that dependency issue.  As a PLplot developer, I
>>> don't like being in such an uncertain position!
>> 
>> I thought it important to resurrect this two-month-old thread because
>> today I _finally_ got success (at least no obvious issues, see comment
>> below) with parallel builds of PLplot on my particular platform.  That's
>> the good news.
>> 
>> The bad news is it took so much effort.  PLplot is not that big a piece
>> of software, but there are a large number of different components with
>> complex dependencies between them.  Therefore I had several tries in the
>> two months to get parallel builds to work that failed miserably.  This
>> last successful effort of getting "make -j N" to work for many different
>> N values took at least several days of isolating the problem by
>> enabling/disabling various PLplot components until I was finally able to
>> find and fix the last two dependency issues that showed up on my system.
>> 
>> Even worse news is I caught the last problem only by accident.  That
>> problem only showed up intermittently for N = 4 for a very specific
>> PLplot configuration.  N = 2 and N = 8 never showed any problems for that
>> configuration for my two-processor hardware!  So from that experience it
>> is unlikely I caught all issues.
>> 
>> To help to sort out such difficult dependency issues with CMake (which
>> affect parallel builds on Unix systems and I understand also certain
>> kinds of builds on Windows), I have a feature request I would like to
>> discuss here before I make a formal feature request on the Kitware bug
>> system.
>
> I already made one for this:
>
> http://public.kitware.com/Bug/view.php?id=6285

It is great that you are considering automatically adding target-level
dependencies when two targets depend on the same file.  That new feature
would address the original issue that started this thread, and I am all in
favour of it for that reason.
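
For readers following the thread, here is a minimal sketch of the situation
Brad describes and of the manual workaround (an explicit target-level
dependency).  The file names file1.in, file1 and additional_file, the project
name, and the minimum CMake version are assumptions made for illustration;
this is not taken from the PLplot build system.

```cmake
cmake_minimum_required(VERSION 3.10)  # assumed; any recent version should do
project(copy_race_sketch NONE)

# Hypothetical file names, for illustration only.
set(src    ${CMAKE_CURRENT_SOURCE_DIR}/file1.in)
set(copied ${CMAKE_CURRENT_BINARY_DIR}/file1)
set(extra  ${CMAKE_CURRENT_BINARY_DIR}/additional_file)

# File-level rule: copy the input into the build tree.
add_custom_command(
  OUTPUT ${copied}
  COMMAND ${CMAKE_COMMAND} -E copy ${src} ${copied}
  DEPENDS ${src}
)

# File-level rule whose input is the copied file.  Because of this
# DEPENDS, the copy rule above is also pulled into target2's build.make.
add_custom_command(
  OUTPUT ${extra}
  COMMAND ${CMAKE_COMMAND} -E copy ${copied} ${extra}
  DEPENDS ${copied}
)

add_custom_target(target1 ALL DEPENDS ${copied})
add_custom_target(target2 ALL DEPENDS ${extra})

# Target-level ordering: target2 is not started until target1 has
# finished, so the copy rule is already up to date when target2's
# build.make is evaluated.
add_dependencies(target2 target1)
```

Without the final add_dependencies() call, nothing orders the two targets,
so a "make -j N" build may evaluate the copy rule in both at the same time.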

However, during my dependency hell I discovered other issues with the PLplot
dependencies, such as missing dependencies between custom commands.  Those
missing dependencies didn't matter for the non-parallel build case because
the order of the custom commands was deliberately chosen (back in our
autotools days, and simply copied to our CMake build system without much
thought) so that the files were built in the correct order, but of course
that ordering is not guaranteed for parallel builds.  So some sort of output
that highlights targets or files with few dependencies (which makes them
suspects for missing dependencies) is needed as well.  Bill's idea of adding
file dependencies to the graphviz output file would probably satisfy that
need since isolated files/targets would really stand out.
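
As an illustration of the kind of missing dependency meant here, the
following sketch chains two custom commands.  The file names input.txt,
stage1.txt and stage2.txt and the target name "stages" are hypothetical, and
the copy commands merely stand in for real generator steps.

```cmake
cmake_minimum_required(VERSION 3.10)  # assumed; any recent version should do
project(custom_command_chain_sketch NONE)

# Hypothetical file names, for illustration only.
set(input  ${CMAKE_CURRENT_SOURCE_DIR}/input.txt)
set(stage1 ${CMAKE_CURRENT_BINARY_DIR}/stage1.txt)
set(stage2 ${CMAKE_CURRENT_BINARY_DIR}/stage2.txt)

# First step: produce stage1 from the input.
add_custom_command(
  OUTPUT ${stage1}
  COMMAND ${CMAKE_COMMAND} -E copy ${input} ${stage1}
  DEPENDS ${input}
)

# Second step: produce stage2 from stage1.  The DEPENDS line records
# that relationship.  If it were left out, a serial build could still
# succeed thanks to the order in which the outputs are listed below,
# but a parallel build would be free to run this step first.
add_custom_command(
  OUTPUT ${stage2}
  COMMAND ${CMAKE_COMMAND} -E copy ${stage1} ${stage2}
  DEPENDS ${stage1}
)

add_custom_target(stages ALL DEPENDS ${stage1} ${stage2})
```

In a graph that included file-level dependencies, a stage2 rule lacking its
DEPENDS line would show up with no incoming edge from stage1, which is the
sort of isolated node that would stand out.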

Alan
__________________________
Alan W. Irwin

Astronomical research affiliation with Department of Physics and Astronomy,
University of Victoria (astrowww.phys.uvic.ca).

Programming affiliations with the FreeEOS equation-of-state implementation
for stellar interiors (freeeos.sf.net); PLplot scientific plotting software
package (plplot.org); the libLASi project (unifont.org/lasi); the Loads of
Linux Links project (loll.sf.net); and the Linux Brochure Project
(lbproject.sf.net).
__________________________

Linux-powered Science
__________________________

