[Cdash] [Insight-developers] Dashboard: Normalizing TIMEOUTS

Luis Ibanez luis.ibanez at kitware.com
Wed Jan 20 19:12:58 UTC 2010


Hi Julien,


I added a feature request entry
to the MANTIS bug tracker:

http://public.kitware.com/Bug/view.php?id=10170


     Thanks


              Luis


---------------------------------------------------------
On Wed, Jan 20, 2010 at 10:14 AM, Julien Jomier
<julien.jomier at kitware.com> wrote:
> Luis,
>
> I like the idea very much, and it seems that the work Zack (in CC) did for
> running CTest in parallel might help. Basically, newer versions of CTest
> generate a file with the time each test took to run; then, on the next
> submission, CTest uses those times to schedule the parallel runs efficiently.
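
The scheduling idea in the paragraph above can be sketched as follows. This is
only an illustration, not CTest's actual implementation: it assumes a greedy
"longest recorded time first" assignment, and the function name, test names,
and timings are all hypothetical.

```python
import heapq


def schedule_longest_first(test_times, workers):
    """Greedily assign tests to workers, longest recorded time first.

    test_times maps a test name to the seconds it took on the previous run.
    Returns (plan, makespan): plan[w] lists the tests given to worker w, and
    makespan is the estimated wall-clock time of the whole run.
    """
    # Each heap entry is (accumulated seconds, worker index).
    heap = [(0.0, w) for w in range(workers)]
    heapq.heapify(heap)
    plan = [[] for _ in range(workers)]
    # Descending time; ties are broken by name so the result is deterministic.
    ordered = sorted(test_times.items(), key=lambda kv: (-kv[1], kv[0]))
    for name, seconds in ordered:
        load, w = heapq.heappop(heap)  # least-loaded worker so far
        plan[w].append(name)
        heapq.heappush(heap, (load + seconds, w))
    makespan = max(load for load, _ in heap)
    return plan, makespan


# Hypothetical timings (seconds) recorded on a previous run.
previous_times = {"testA": 120.0, "testB": 90.0, "testC": 60.0, "testD": 30.0}
plan, makespan = schedule_longest_first(previous_times, workers=2)
print(plan, makespan)
```

Starting the longest tests first keeps one slow test from being left to run
alone at the end while the other workers sit idle.
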
>
> The plan for the future is to have CTest download the test time history from
> CDash, which is consistent with your idea. I think it would be good to log a
> feature request (if not already documented) in either the CMake or the CDash
> bug tracker, and maybe point to a wiki page.
>
> Julien
>
> On 1/19/2010 8:31 PM, Luis Ibanez wrote:
>>
>> Hi Brad,
>>
>>
>> A) The problem:
>>
>>
>>       Maintainers of dashboard machines do not have a
>>       clear reference for how to set up their TIMEOUT
>>       values.
>>
>>       Is 10 minutes the right timeout value ?
>>       or is it 20 minutes ? or 2 minutes ?
>>
>>       Of course it all depends on the machine capabilities
>>       (number of cores, RAM, disk speed), the OS, the
>>       compiler, and the compilation flags used (e.g. Debug,
>>       Release...)
>>
>>
>>      If they under-estimate, many tests may fail just because
>>      they are not allowed to run long enough to complete,
>>      as was happening on some RogueResearch machines.
>>
>>      If they over-estimate, then their machines may
>>      waste time running tests that are trapped in infinite
>>      loops (like the itkStatisticsAlgorithmTests in llvm), or
>>      that take unrealistic times to complete, like running the
>>      large-memory write test on a machine that has low RAM
>>      and ends up swapping to disk.
>>
>>
>> B)  A single TIMEOUT value for all tests doesn't represent
>>       the very large range of times that different individual tests
>>       in ITK require.
>>
>>       E.g. some tests are expected to finish in 0.1 seconds
>>       while others require 20 minutes.
>>
>>       However, we control them all with a single TIMEOUT.
>>
>>       One size for all 1,742 different tests...
>>
>>
>> C)   I apologize for not having explained the idea clearly
>>        enough. I'm not suggesting that anyone "manually"
>>        define timeout factors for the tests.   There is no need to
>>        do this manually. The process can certainly be automated.
>>
>>        As you pointed out, one possible mechanism is to simply
>>        harvest the statistical data that CDash has already stored
>>        for every machine that contributes to the Dashboard.
>>
>>        CDash already computes a mean and standard deviation
>>        for the amount of time that it takes to run every test, and this
>>        is done independently for every machine that contributes
>>        to the Dashboard. This is how CDash can evaluate and
>>        report TIMING failures per test.
>>
>>        We could simply use that information in order to compute
>>        better-adjusted timeouts on a test-by-test basis.
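
A minimal sketch of how a per-test timeout could be derived from a mean and a
standard deviation. The rule "mean + k * stddev", the default k = 3, and the
one-second floor are assumptions made for illustration; the thread does not
say how CDash computes its statistics or what multiplier would be appropriate.

```python
import statistics


def suggested_timeout(history_seconds, k=3.0, floor_seconds=1.0):
    """Return mean + k * stddev of past run times, never below a floor.

    history_seconds: past run times (seconds) of one test on one machine.
    k and floor_seconds are illustrative knobs, not values from CDash.
    """
    mean = statistics.mean(history_seconds)
    # A single sample has no spread; treat its deviation as zero.
    if len(history_seconds) > 1:
        stdev = statistics.stdev(history_seconds)
    else:
        stdev = 0.0
    return max(floor_seconds, mean + k * stdev)


# Hypothetical history for one test on one machine: mean 12 s, stddev 2 s.
print(suggested_timeout([10.0, 12.0, 14.0]))
```

The floor keeps very fast tests (e.g. 0.1 seconds) from getting a timeout so
tight that ordinary jitter on the machine would trip it.
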
>>
>>
>>
>>       Luis
>>
>>
>> -----------------------------------------------------------
>> On Tue, Jan 19, 2010 at 1:26 PM, Bradley Lowekamp
>> <blowekamp at mail.nih.gov>  wrote:
>>>
>>> Hello Luis,
>>>
>>> Why? What is wrong with the current system and what are we trying to fix
>>> or accomplish?
>>>
>>> Having to manually define timeouts for all or even many tests sounds like
>>> a lot of maintenance, and it should be avoided. It would only work if it
>>> could be done automatically. It could be done as a batch analysis of the
>>> cdash records to solve the linear system of equations proposed by Luis.
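
One possible reading of that batch analysis, as a sketch: assume each recorded
run time t of a test is roughly f * b, where b is the benchmark time of the
machine it ran on, and estimate the factor f by least squares over all
machines. The model, the function name, and the sample numbers are assumptions
for illustration, not something specified in this thread.

```python
def fit_factor(samples):
    """Least-squares factor f minimizing sum((t - f * b) ** 2).

    samples: list of (benchmark_seconds, test_seconds) pairs, one per
    machine or per submission, for a single test.
    """
    numerator = sum(b * t for b, t in samples)
    denominator = sum(b * b for b, _ in samples)
    if denominator == 0:
        raise ValueError("benchmark times must not all be zero")
    return numerator / denominator


# Hypothetical records: (benchmark time, test time) on two machines.
records = [(2.0, 306.0), (4.0, 612.0)]
print(fit_factor(records))
```
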
>>>
>>> There are already some existing resources used in testing that could be
>>> better handled automatically: the memory used by the program, and the number
>>> of threads. These can play into the number of parallel tests that can be
>>> run. Just last week, my system with 32GB was running out of memory running
>>> ctest in parallel.
>>>
>>> Brad
>>>
>>> On Jan 18, 2010, at 11:32 AM, Luis Ibanez wrote:
>>>
>>>> As you may have noticed, the standard practice
>>>> of using a single TIMEOUT number for all the
>>>> ~1,700 tests in ITK brings up the challenge of
>>>> defining what a good timeout value is for each
>>>> machine (and configuration: e.g. Release/Debug).
>>>>
>>>> The following proposal was raised in the past,
>>>> but we have not acted upon it:
>>>>
>>>> 1) Add to ITK one or two tests that can be considered
>>>>    good benchmarks for:
>>>>
>>>>        a) computation power
>>>>        b) input / output speed
>>>>
>>>> 2) Run those tests and use their timings as a
>>>>     base value that characterizes this machine.
>>>>
>>>> 3)  Define timeouts for all tests based
>>>>     on the values found in (2), multiplied by
>>>>     a factor.
>>>>
>>>>
>>>> Let's say that the computation benchmark takes
>>>> 2 seconds to run on the machine  foobar.kitware,
>>>> then we can tell that the DiffeomorphicDemons
>>>> registration test on the same machine should take
>>>>
>>>>           153 x (time of benchmark1 )
>>>>
>>>> (where the number "153" is a factor that we
>>>> will have to estimate for each test).
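
The arithmetic of that example, as a small sketch. The 2-second benchmark and
the factor 153 come from the message above; the safety margin of 1.5 is an
added assumption, since the proposal gives an expected run time and does not
say how much headroom a timeout should leave.

```python
def expected_seconds(benchmark_seconds, factor):
    """Expected run time of a test: its factor times the benchmark time."""
    return benchmark_seconds * factor


def timeout_seconds(benchmark_seconds, factor, safety_margin=1.5):
    """Timeout with headroom; safety_margin is an illustrative assumption."""
    return expected_seconds(benchmark_seconds, factor) * safety_margin


# Benchmark takes 2 s on this machine; the test's factor is 153.
print(expected_seconds(2.0, 153))  # 2 * 153 = 306 seconds
print(timeout_seconds(2.0, 153))   # 306 * 1.5 = 459 seconds
```
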
>>>>
>>>> CDash already does a similar thing with the
>>>> historical record of the computation time that
>>>> it takes to run every test on a given machine,
>>>> although this is done on the CDash server,
>>>> and therefore it happens too late to be used
>>>> as a TIMEOUT mark.
>>>>
>>>> Another interesting option could be for
>>>> a machine to get access to the historical
>>>> record that CDash has computed, and then
>>>> use those values as a base for computing
>>>> TIMEOUT at the moment of running ctest.
>>>>
>>>>
>>>>   What do people think of these options ?
>>>>
>>>>
>>>>         Luis
>>>
>>>
>>
>


