MantisBT - CMake
View Issue Details
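ID: 0012904
Project: CMake
Category: CTest
View Status: public
Date Submitted: 2012-01-20 11:54
Last Update: 2013-05-06 09:32
Reporter: Casey B Goodlett
Assigned To: David Cole
Priority: normal
Severity: major
Reproducibility: always
Status: closed
Resolution: fixed
Target Version: CMake 2.8.11
Fixed in Version: CMake 2.8.11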
0012904: Parallel ctest starts too many tests when a test depends on another test that uses the RUN_SERIAL property
Parallel ctest can start an unbounded number of tests when a test that can run in parallel depends on a test that uses the RUN_SERIAL property. If not caught right away, the spawned tests will hang the test machine.

Here is the case I debugged:

Parallel level for ctest = 4
Current # tests running = 1

Try to start a new test A, which needs only 1 processor, so it should be able to start in parallel with the 1-processor test that is already running.
However, when trying to start test A, its dependencies are checked; instead of starting A, the test it depends on (test B) is started. Test B takes 4 processors because it is set to RUN_SERIAL=true.

The check on whether the test can be started only looks at the number of processors required by test A (cmCTestMultiProcessHandler.cxx:265), so test B is allowed to start in place of test A.

As a result, test B is started and there are now 5 processors in use even though the parallel level is supposed to be 4. This causes unsigned overflow when computing the number of tests to run in the next step (first line of cmCTestMultiProcessHandler::StartNextTests()).
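The overflow described above can be illustrated with a minimal C++ sketch. The function names and the exact arithmetic are assumptions made for illustration; this is not the code in cmCTestMultiProcessHandler.cxx. It only shows how an unsigned "free slots" subtraction wraps around once more processors are in use than the parallel level allows, and how clamping the result avoids that.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not CTest's real code): free slots computed as a
// plain unsigned subtraction. If running > parallelLevel the result wraps
// around to a huge value instead of going negative.
std::size_t freeSlotsUnguarded(std::size_t parallelLevel, std::size_t running)
{
  return parallelLevel - running;
}

// Hypothetical guarded variant: clamp to zero when the parallel level is
// already met or exceeded, so no further tests are considered startable.
std::size_t freeSlotsGuarded(std::size_t parallelLevel, std::size_t running)
{
  assert(parallelLevel > 0); // assumption: a parallel level of 0 is not used
  return running >= parallelLevel ? 0 : parallelLevel - running;
}
```

With a parallel level of 4 and 5 processors in use, the unguarded version yields the largest value a std::size_t can hold, which matches the "unbounded number of tests" symptom; the guarded version yields 0.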
I do not have a simple CMakeLists.txt that reproduces the problem. I can only reproduce it in CMake 2.8.4, before the switch from sort to stable_sort; after that switch the tests get reordered, which prevents this bug from being triggered. Based on code inspection, the bug appears to still exist in master, although I no longer have a reproducing case.
No tags attached.
Issue History
2012-01-20 11:54   Casey B Goodlett   New Issue
2012-08-11 21:09   David Cole   Status   new => backlog
2012-08-11 21:09   David Cole   Note Added: 0030353
2012-12-18 13:33   David Cole   Assigned To   => David Cole
2012-12-18 13:33   David Cole   Status   backlog => assigned
2012-12-18 13:34   David Cole   Target Version   => CMake 2.8.11
2012-12-18 14:02   David Cole   Note Added: 0031923
2012-12-26 16:12   David Cole   Note Added: 0031943
2012-12-27 17:39   David Cole   Note Added: 0031951
2012-12-27 17:39   David Cole   Status   assigned => resolved
2012-12-27 17:39   David Cole   Fixed in Version   => CMake 2.8.11
2012-12-27 17:39   David Cole   Resolution   open => fixed
2013-05-06 09:32   Robert Maynard   Note Added: 0032993
2013-05-06 09:32   Robert Maynard   Status   resolved => closed

Notes
(0030353)
David Cole   
2012-08-11 21:09   
Sending old, never assigned issues to the backlog.

(The age of the bug, plus the fact that it's never been assigned to anyone means that nobody is actively working on it...)

If an issue you care about is sent to the backlog when you feel it should have been addressed in a different manner, please bring it up on the CMake mailing list for discussion. Sign up for the mailing list here, if you're not already on it: http://www.cmake.org/mailman/listinfo/cmake

It's easy to re-activate a bug here if you can find a CMake developer who has the bandwidth to take it on, and ferry a fix through to our 'next' branch for dashboard testing.
(0031923)
David Cole   
2012-12-18 14:02   
We have resolved the "unbounded test launching" part of this problem by fixing the overflow problem with this commit, now in CMake 'next':

  http://cmake.org/gitweb?p=cmake.git;a=commitdiff;h=324780697c5020a027efdc24bd9cc2fc926a3546

Another commit remains to be done to prevent using too many processors in the first place... That one should be coming soon, along with some test code that may be used to verify that the issue still occurs, and that these commits fix the problems.
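As a rough illustration of what "prevent using too many processors in the first place" could mean, the sketch below applies the capacity check to the test that would actually be launched (after following unfinished dependencies) rather than to the test originally requested. All names here (TestInfo, pickTestToStart) are hypothetical, and this is not the change that went into CMake; it is only one way to express the idea from the report, under the assumption that a RUN_SERIAL test counts as needing the full parallel level.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>

// Hypothetical model of a pending test (not a CTest type).
struct TestInfo
{
  std::size_t processors;          // processors the test needs
  std::set<std::string> dependsOn; // dependencies that have not finished yet
};

// Follow unfinished dependencies from `name` to the test that would really
// be launched, then check *that* test against the free slots. Returns the
// name of the test to start, or an empty string if nothing may start now.
std::string pickTestToStart(const std::map<std::string, TestInfo>& tests,
                            std::string name, std::size_t parallelLevel,
                            std::size_t running)
{
  assert(parallelLevel > 0);
  std::set<std::string> visited; // guards against dependency cycles
  for (;;) {
    auto it = tests.find(name);
    if (it == tests.end() || !visited.insert(name).second) {
      return std::string(); // unknown test or cycle: start nothing
    }
    if (it->second.dependsOn.empty()) {
      const std::size_t freeSlots =
        running >= parallelLevel ? 0 : parallelLevel - running;
      return it->second.processors <= freeSlots ? name : std::string();
    }
    name = *it->second.dependsOn.begin(); // descend into first dependency
  }
}
```

In the scenario from the report (parallel level 4, one processor busy, A depends on B, B needs 4 processors), this check declines to start anything until the running test finishes, at which point B becomes eligible.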
(0031943)
David Cole   
2012-12-26 16:12   
Added test (and then fixed a problem with the test itself, for Visual Studio builds) with these two commits today:

  http://cmake.org/gitweb?p=cmake.git;a=commitdiff;h=6de7854c40ef9581b5efd5815ecc09b19bd0ce9e
  http://cmake.org/gitweb?p=cmake.git;a=commitdiff;h=bf51a9310dadaa9e31b2ffa455c82a2318cb99e6

If this test passes everywhere on all the Nightly dashboards tonight, I'll mark this issue as resolved tomorrow.
(0031951)
David Cole   
2012-12-27 17:39   
See previous notes for commit details.
(0032993)
Robert Maynard   
2013-05-06 09:32   
Closing resolved issues that have not been updated in more than 4 months.