[vtk-developers] cdash/gerrit emails about failing tests...

Marcus D. Hanwell marcus.hanwell at kitware.com
Wed Jan 30 14:44:34 EST 2013


On Wed, Jan 30, 2013 at 11:31 AM, Kyle Lutz <kyle.lutz at kitware.com> wrote:
> On Mon, Jan 28, 2013 at 3:41 PM, Marcus D. Hanwell
> <marcus.hanwell at kitware.com> wrote:
>> On Mon, Jan 28, 2013 at 11:43 AM, Sean McBride <sean at rogue-research.com> wrote:
>>> On Mon, 28 Jan 2013 10:44:08 -0500, David Cole said:
>>>
>>>>We could write tests that pass.
>>>>
>>>>Just sayin'
>
> +1 * inf
>
>>> That'd be nice too, but, practically speaking, given the state of the VTK dashboard over the years, I thought my suggestion might be easier, and of course not mutually exclusive with yours.
>>
>> It would be easier to make them green than to try to devise some
>> "baseline of failing tests".
>
> I disagree. It took me about 20 minutes to create a list of tests that
> fail on my machine and exclude them in my dashboard script. For some
> of the rendering test failures I have no idea where to even start, and
> fixing each one and pushing the fixes through Gerrit would most
> definitely take much more time.

It would take minutes to exclude them on each machine, no argument
here. If these tests are failing in most places, though, is there any
value in keeping them and excluding them on each machine? I know some
of the failing tests fall into that category, and they should probably
just be removed rather than tracked in machine-specific exclude lists.
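
For concreteness, a machine-specific exclude list amounts to something
like this in every dashboard script (a rough sketch; the per-site file
naming is just an assumption for illustration):

# Load a site-specific list of tests to skip, if such a file exists.
# The failing_tests_<site>.cmake naming scheme is hypothetical.
include("${CTEST_SCRIPT_DIRECTORY}/failing_tests_${CTEST_SITE}.cmake" OPTIONAL)

if(FAILING_TESTS)
  # Turn the list into a regex and hide those tests from this run.
  string(REPLACE ";" "|" FAILING_TESTS_REGEX "${FAILING_TESTS}")
  ctest_test(APPEND EXCLUDE "(${FAILING_TESTS_REGEX})")
else()
  ctest_test(APPEND)
endif()

Every machine that carries such a file is one more place the same stale
list has to be kept up to date.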

If a test is very flaky, why should we keep it? What you did and what
Sean proposed are quite different, though. Sean asked for what several
others have asked for: a baseline of failing tests relative to the
merge root of the topic, so that I only see the tests that my commit
caused to start failing...
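
Concretely, a script could run the suite once at the merge root, record
which tests failed, and exclude exactly those from the topic run. A
sketch of the idea (not an existing VTK or CDash facility; it leans on
the LastTestsFailed.log file that ctest writes after each run):

# First run: test the tree as of the topic's merge root.
ctest_start(Experimental)
ctest_test()

# ctest records failures as "<number>:<name>" lines in
# Testing/Temporary/LastTestsFailed.log; collect the test names.
set(baseline_regex "")
set(log "${CTEST_BINARY_DIRECTORY}/Testing/Temporary/LastTestsFailed.log")
if(EXISTS "${log}")
  file(STRINGS "${log}" baseline_failures)
  foreach(entry IN LISTS baseline_failures)
    string(REGEX REPLACE "^[0-9]+:" "" test_name "${entry}")
    list(APPEND baseline_regex "${test_name}")
  endforeach()
  string(REPLACE ";" "|" baseline_regex "${baseline_regex}")
endif()

# Second run, after switching the source tree to the topic branch:
# hide the baseline failures so only newly failing tests show up.
if(baseline_regex)
  ctest_test(APPEND EXCLUDE "(${baseline_regex})")
else()
  ctest_test(APPEND)
endif()

That would give everyone the "only my new failures" view without anyone
maintaining an exclude list by hand.
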
>
>> This is on my list of things to get to,
>> but I have gotten behind on some of these tasks. Patches are certainly
>> welcome, and if a test can't reliably pass it would be better to
>> rethink the test than to devise complex solutions to ignore it.
>>
>> If anyone has input on the tests that are failing (or patches to fix
>> them) that would be great. I will coordinate with Dave DeMarle and
>> others on setting aside some time for greening.
>
> I've attached the list of failing tests on my machine (Ubuntu 12.10,
> GCC 4.7) along with a bit of text about why they failed. It's a simple
> CMake file that creates a variable named FAILING_TESTS, which can then
> be passed to ctest_test() as follows:
>
> # loads the FAILING_TESTS variable
> include(vtk_failing_tests.cmake)
>
> # build regex for failing tests
> set(FAILING_TESTS_STRING "")
> string(REPLACE ";" "|" FAILING_TESTS_STRING "${FAILING_TESTS}")
>
> # run tests
> ctest_test(APPEND EXCLUDE "${FAILING_TESTS_STRING}")
>
> As these tests get fixed and start passing I will remove them from the
> list. Hopefully, over time, the list can be removed entirely. Until
> then, this allows for a "green" dashboard that is *much* more useful
> for identifying regressions.
>
If there is a common set, wouldn't it be more useful to add them to a
group of flaky tests, or remove them altogether? If we just want to
exclude tests on each dashboard we can certainly do that pretty
quickly, but we could also make that list smaller by figuring out
which ones fail on most dashboards and removing them.

For the 3-4 CDash@Home hosts we could add some exclude lists, but how
different is that from simply removing or disabling the tests? At least
then the developer doesn't have a false sense of security that a
feature is covered by a test when we have excluded it on most
dashboards.

I would prefer to remove or fix them, but I admittedly don't have time
right now and I leave on a trip tomorrow. We should discourage
developers from adding, or leaving enabled, tests that are very
fragile; they only serve to mask real bugs.

Marcus


