[Insight-developers] Dashboard: Normalizing TIMEOUTS

Tue Jan 19 20:31:48 EST 2010

Hi Brad,

A) The problem:

      Maintainers of dashboard machines do not have a
      clear reference for how to set their TIMEOUT
      values.

      Is 10 minutes the right timeout value ?
      or is it 20 minutes ? or 2 minutes ?

      Of course it all depends on the machine's capabilities
      (number of cores, RAM, disk speed), the OS, the
      compiler, and the compilation flags used (e.g. Debug,
      Release...)

     If they under-estimate, many tests may fail just because
     they are not allowed to run long enough to complete,
     as was happening on some RogueResearch machines.

     If they over-estimate, then their machines may
     waste time running tests that are trapped in infinite
     loops (like the itkStatisticsAlgorithmTests in llvm), or
     that take unrealistic times to complete, like running the
     large-memory write test on a machine that has low RAM
     and ends up swapping to disk.

B)  A single TIMEOUT value for all tests doesn't represent
      the very large range of times that different individual tests
      in ITK require.

      E.g. some tests are expected to finish in 0.1 seconds
      while others require 20 minutes.

      However, we control them all with a single TIMEOUT.

      One size for all 1,742 different tests...

C)   I apologize for not having explained the idea clearly
       enough. I'm not suggesting that anyone "manually"
       define timeout factors for the tests. There is no need to
       do this manually. The process can certainly be automated.

       As you pointed out, one possible mechanism is to simply
       harvest the statistical data that CDash has already stored
       for every machine that contributes to the Dashboard.

       CDash already computes a mean and standard deviation
       for the amount of time that it takes to run every test, and this
       is done independently for every machine that contributes
       to the Dashboard. This is how CDash can evaluate and
       report TIMING failures per test.

       We could simply use that information in order to compute
       better-adjusted timeouts on a test-by-test basis.
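
       As a minimal sketch of that idea, assuming the per-test mean
       and standard deviation have already been fetched from CDash
       for one machine (how to fetch them is not shown here), a
       timeout could be set to the mean plus a few standard
       deviations. The function name, the test names, the factor k,
       and the one-second floor are all hypothetical choices for
       illustration, not CDash or CTest settings.

```python
import math


def suggest_timeouts(stats, k=3.0, floor_seconds=1.0):
    """Return a per-test timeout in whole seconds: mean + k * stddev.

    stats maps test name -> (mean_seconds, stddev_seconds).
    k and floor_seconds are illustrative assumptions, not CDash settings.
    """
    timeouts = {}
    for name, (mean, stddev) in stats.items():
        raw = mean + k * stddev
        # The floor keeps very fast tests from getting a near-zero timeout.
        timeouts[name] = math.ceil(max(raw, floor_seconds))
    return timeouts


# Made-up numbers for one machine; real values would come from CDash.
example_stats = {
    "fastTest": (0.1, 0.02),
    "slowRegistrationTest": (600.0, 45.0),
}
example_timeouts = suggest_timeouts(example_stats)
```

       The resulting values could then be applied test by test, for
       instance through CTest's per-test TIMEOUT property, instead
       of one global TIMEOUT.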

      Luis

-----------------------------------------------------------
On Tue, Jan 19, 2010 at 1:26 PM, Bradley Lowekamp
<blowekamp at mail.nih.gov> wrote:
> Hello Luis,
>
> Why? What is wrong with the current system and what are we trying to fix or accomplish?
>
> Having to manually define timeouts for all or even many tests sounds like a lot of maintenance, and it should be avoided. It would only work if it could be done automatically. It could be done as a batch analysis of the cdash records to solve the linear system of equations proposed by Luis.
>
> There are already some existing resources used in testing that could be better handled automatically: the memory used by the program and the number of threads. These can play into the number of parallel tests that can be run. Just last week, my system with 32GB was running out of memory running ctest in parallel.
>
> Brad
>
> On Jan 18, 2010, at 11:32 AM, Luis Ibanez wrote:
>
>> As you may have noticed, the standard practice
>> of using a single TIMEOUT number for all the
>> ~1,700 tests in ITK brings up the challenge of
>> defining what a good timeout value is for each
>> machine (and configuration: e.g. Release/Debug).
>>
>> The following proposal was raised in the past,
>> but we have not acted upon it:
>>
>> 1) Add to ITK one or two tests that can be considered
>>    a good benchmark for:
>>
>>        a) computation power
>>        b) input / output speed
>>
>> 2) Run those tests and use their timings as a
>>     base value that characterizes this machine.
>>
>> 3)  Define timeouts for all tests based
>>     on the values found in (2), multiplied by
>>     a factor.
>>
>>
>> Let's say that the computation benchmark takes
>> 2 seconds to run on the machine foobar.kitware,
>> then we can say that the DiffeomorphicDemons
>> registration test on the same machine should take
>>
>>           153 x (time of benchmark1 )
>>
>> (where the number "153" is a factor that we
>> will have to estimate for each test).
>>
>> CDash already does a similar thing with the
>> historical record of the computation time that
>> it takes to run every test on a given machine,
>> although this is done on the CDash server,
>> and therefore it happens too late to be used
>> as a TIMEOUT mark.
>>
>> Another interesting option could be for
>> a machine to get access to the historical
>> record that CDash has computed, and then
>> use those values as a base for computing
>> TIMEOUT at the moment of running ctest.
>>
>>
>>   What do people think of these options ?
>>
>>
>>         Luis