MantisBT - CDash
View Issue Details
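ID: 0008760 | Project: CDash | View Status: public | Date Submitted: 2009-03-18 16:14 | Last Update: 2009-03-30 15:02
Reporter: Roscoe A. Bartlett
Assigned To: Julien Jomier
Priority: normal | Severity: major | Reproducibility: have not tried
Status: closed | Resolution: fixed
Fixed in Version: 1.4
Summary: 0008760: CDash Dev: CDash build failure emails not being sent out for all dashboard-reported build errors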




From: Bartlett, Roscoe A
Sent: Wednesday, March 18, 2009 1:00 AM
To: Julien Jomier; Bill Hoffman
Cc: Perschbacher, Brent M; Willenbring, James M
Subject: Cdash email build failures not corresponding with dashboard reported build errors

Hello guys,

I am going over all of the 21 regular failed builds (not my build-test test builds) reported on the dashboard at:

    http://trilinos-dev.sandia.gov/cdash/index.php?project=Trilinos&date=2009-03-18&display=project

which are:

s909348.sandia.gov Darwin-Nightly-SERIAL_DEBUG 0 0 0 0 0.2 59 266 10.2 2009-03-17T02:43:10 MDT Sundance
s909348.sandia.gov Darwin-Nightly-MPI_RELEASE 0 0 0 0 0.2 56 196 6.1 2009-03-17T01:25:59 MDT Sundance
godel.sandia.gov Linux-Nightly-MPI_OPT_SHARED 0 0 0 0 0.1 54 294 4.1 0 0 47 0 2009-03-17T02:48:49 MDT Sundance
godel.sandia.gov Linux-Nightly-MPI_OPT 0 0 0 0 0.1 6 294 4.5 0 0 47 1.4 2009-03-17T00:52:40 MDT Sundance
godel.sandia.gov Linux-Nightly-MPI_DEBUG 0 0 0 0 0.1 5 294 10.1 0 4 43 11.5 2009-03-17T04:31:26 MDT Sundance
godel.sandia.gov Linux-Nightly-SERIAL_DEBUG 0 0 0 0 0.1 4 306 9 0 4 56 9.3 2009-03-17T01:48:40 MDT Sundance
s909348.sandia.gov Darwin-Nightly-SERIAL_DEBUG 0 0 0 0 0 3 0 0.1 3 0 0 0 2009-03-17T02:24:03 MDT ThreadPool
godel.sandia.gov Linux-Nightly-SERIAL_RELEASE 0 0 0 0 0.1 2 0 1.1 1 0 49 0 2009-03-17T02:59:42 MDT Teuchos
godel.sandia.gov Linux-Nightly-SERIAL_RELEASE 0 0 0 0 0 2 78 0.5 1 0 0 0 2009-03-17T03:26:06 MDT Moertel
godel.sandia.gov Linux-Nightly-SERIAL_DEBUG_ICPC 0 0 0 0 0.2 2 46 0.7 1 1 48 0.1 2009-03-17T05:07:04 MDT Teuchos
godel.sandia.gov Linux-Nightly-SERIAL_DEBUG 0 0 0 0 0 2 48 0.4 1 0 0 0 2009-03-17T01:41:55 MDT Moertel
godel.sandia.gov Linux-Nightly-SERIAL_DEBUG 0 0 0 0 0.1 2 0 0.7 1 0 49 0.1 2009-03-17T01:05:51 MDT Teuchos
godel.sandia.gov Linux-Nightly-MPI_OPT_SHARED 0 0 0 0 0.1 2 0 1.1 0 0 53 0.2 2009-03-17T02:11:36 MDT Teuchos
godel.sandia.gov Linux-Nightly-MPI_DEBUG 0 0 0 0 0.1 2 0 0.7 0 0 53 0.1 2009-03-17T03:49:06 MDT Teuchos
s909348.sandia.gov Darwin-Nightly-SERIAL_DEBUG 0 0 0 0 0 2 47 1 1 0 0 0 2009-03-17T02:31:58 MDT Moertel
s909348.sandia.gov Darwin-Nightly-SERIAL_DEBUG 0 0 0 0 0.2 2 263 3.5 2009-03-17T02:39:35 MDT MOOCHO
s909348.sandia.gov Darwin-Nightly-SERIAL_DEBUG 0 0 0 0 0.2 2 0 1.7 1 0 49 0 2009-03-17T01:35:50 MDT Teuchos
s909348.sandia.gov Darwin-Nightly-MPI_RELEASE 0 0 0 0 0 2 9 1 0 0 16 0.1 2009-03-17T00:21:35 MDT Sacado
godel.sandia.gov Linux-Nightly-SERIAL_RELEASE 0 0 0 0 0.1 1 306 4 0 0 60 0.5 2009-03-17T03:32:36 MDT Sundance
godel.sandia.gov Linux-Nightly-MPI_OPT_SHARED 0 0 0 0 0 1 2 0.1 0 0 1 0 2009-03-17T02:24:21 MDT Komplex
s909348.sandia.gov Darwin-Nightly-MPI_RELEASE 0 0 0 0 0 1 0 0.1 0 0 3 0 2009-03-17T01:04:00 MDT ThreadPool

and comparing them with the 13 notification emails I received this morning, which are:

FAILED (b=6, w=294): Trilinos/Sundance - Linux-Nightly-MPI_OPT - Nightly Tue 1:02 AM 16 KB
FAILED (b=54, w=294): Trilinos/Sundance - Linux-Nightly-MPI_OPT_SHARED - Nightly Tue 2:57 AM 16 KB
FAILED (b=5, w=294, t=4): Trilinos/Sundance - Linux-Nightly-MPI_DEBUG - Nightly Tue 4:58 AM 17 KB
FAILED (b=4, w=306, t=4): Trilinos/Sundance - Linux-Nightly-SERIAL_DEBUG - Nightly Tue 2:09 AM 17 KB
FAILED (b=3): Trilinos/ThreadPool - Darwin-Nightly-SERIAL_DEBUG - Nightly Tue 2:24 AM 15 KB
FAILED (b=2, w=9): Trilinos/Sacado - Darwin-Nightly-MPI_RELEASE - Nightly Tue 12:23 AM 16 KB
FAILED (b=2, w=78): Trilinos/Moertel - Linux-Nightly-SERIAL_RELEASE - Nightly Tue 3:27 AM 17 KB
FAILED (b=2, w=48): Trilinos/Moertel - Linux-Nightly-SERIAL_DEBUG - Nightly Tue 1:43 AM 17 KB
FAILED (b=2, w=47): Trilinos/Moertel - Darwin-Nightly-SERIAL_DEBUG - Nightly Tue 2:33 AM 18 KB
FAILED (b=2): Trilinos/Teuchos - Darwin-Nightly-SERIAL_DEBUG - Nightly Tue 1:38 AM 14 KB
FAILED (b=1, w=306): Trilinos/Sundance - Linux-Nightly-SERIAL_RELEASE - Nightly Tue 3:42 AM 16 KB
FAILED (b=1, w=2): Trilinos/Komplex - Linux-Nightly-MPI_OPT_SHARED - Nightly Tue 2:25 AM 14 KB
FAILED (b=1): Trilinos/ThreadPool - Darwin-Nightly-MPI_RELEASE - Nightly Tue 1:04 AM 14 KB

This means that 8 of the 21 expected notification emails were not sent out.

For example, the following Sundance build error reported on the dashboard:

s909348.sandia.gov Darwin-Nightly-SERIAL_DEBUG 0 0 0 0 0.1 59 266 10.2 2009-03-16T02:37:28 MDT Sundance

does not seem to have a corresponding notification email sent out.

One would expect this testing and error reporting system to have the following properties:

1) The testing system will correctly flag all configure, build, and test failures and report them on the dashboard (I can't verify this one way or the other right now, but I have no reason to believe that it does not).

2) All of the configure, build, and test failures flagged on the dashboard will result in corresponding error notification emails. If there are X build attempts reported on the dashboard that have failed builds, then X notification emails should go out (all evidence suggests that this is not currently happening).

3) If the dashboard reports Y failed configures/builds/tests, then the corresponding notification email should report Y failed configures/builds/tests (this appears to be true for the emails listed above).
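Properties 2) and 3) amount to a mechanical reconciliation between the dashboard rows and the notification email subjects. The Python sketch below shows one way such a check could be scripted; it is a hypothetical helper, not part of CDash. It assumes that a dashboard failure and an email can be matched on (build name, label), that the sixth numeric column of a pasted dashboard row is the build error count (after update files, update minutes, configure errors, configure warnings, and configure minutes), and that subjects follow the "FAILED (b=..., w=..., t=...): Trilinos/<label> - <build name> - ..." pattern seen above. Only four dashboard rows and two subjects are copied from the lists above, with the mail client's date and size columns trimmed.

```python
import re

# A subset of the dashboard rows listed above: site, build name, numeric
# columns, timestamp, time zone, label.
DASHBOARD_ROWS = [
    "s909348.sandia.gov Darwin-Nightly-SERIAL_DEBUG 0 0 0 0 0.2 59 266 10.2 2009-03-17T02:43:10 MDT Sundance",
    "godel.sandia.gov Linux-Nightly-MPI_OPT 0 0 0 0 0.1 6 294 4.5 0 0 47 1.4 2009-03-17T00:52:40 MDT Sundance",
    "s909348.sandia.gov Darwin-Nightly-SERIAL_DEBUG 0 0 0 0 0 3 0 0.1 3 0 0 0 2009-03-17T02:24:03 MDT ThreadPool",
    "godel.sandia.gov Linux-Nightly-SERIAL_RELEASE 0 0 0 0 0.1 2 0 1.1 1 0 49 0 2009-03-17T02:59:42 MDT Teuchos",
]

# A subset of the notification subjects listed above (date/size trimmed).
EMAIL_SUBJECTS = [
    "FAILED (b=6, w=294): Trilinos/Sundance - Linux-Nightly-MPI_OPT - Nightly",
    "FAILED (b=3): Trilinos/ThreadPool - Darwin-Nightly-SERIAL_DEBUG - Nightly",
]

# Assumption: the sixth numeric column is the build error count.
BUILD_ERROR_COLUMN = 5

SUBJECT_RE = re.compile(r"FAILED \(([^)]*)\): Trilinos/(\S+) - (\S+) - ")


def dashboard_build_errors(rows):
    """Map (build name, label) -> build error count for rows that have build errors."""
    failures = {}
    for row in rows:
        tokens = row.split()
        build_name, label = tokens[1], tokens[-1]
        # Drop site and build name, plus the trailing timestamp, time zone, and label.
        numbers = tokens[2:-3]
        errors = int(numbers[BUILD_ERROR_COLUMN])
        if errors > 0:
            failures[(build_name, label)] = errors
    return failures


def emailed_build_errors(subjects):
    """Map (build name, label) -> the b=N count reported in each notification subject."""
    reported = {}
    for subject in subjects:
        match = SUBJECT_RE.match(subject)
        if match is None:
            continue  # not a failure notification
        counts = dict(item.split("=") for item in match.group(1).split(", "))
        label, build_name = match.group(2), match.group(3)
        reported[(build_name, label)] = int(counts.get("b", "0"))
    return reported


def reconcile(rows, subjects):
    """Return (builds with no email, builds whose emailed count differs): properties 2) and 3)."""
    expected = dashboard_build_errors(rows)
    reported = emailed_build_errors(subjects)
    missing = sorted(key for key in expected if key not in reported)
    mismatched = sorted(
        key for key in expected if key in reported and reported[key] != expected[key]
    )
    return missing, mismatched


if __name__ == "__main__":
    missing, mismatched = reconcile(DASHBOARD_ROWS, EMAIL_SUBJECTS)
    for build_name, label in missing:
        print(f"no email: {label} {build_name}")
    print(f"count mismatches: {mismatched}")
```

With this subset, the check flags the Darwin-Nightly-SERIAL_DEBUG Sundance and Linux-Nightly-SERIAL_RELEASE Teuchos builds as having no email and finds no count mismatches, consistent with the comparison above.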

I suspect this is related to the issue we have seen where build failures are being ignored by the email notification system. By the way, I added 'update' to the CTEST_SUBMIT(...) command as suggested and ran a test build this morning, and it still did not send an email for the failed build. Therefore, this problem will not be fixed in tomorrow's results either.

Also, I quickly compared the test failures reported on the dashboard against the emails I received: 12 builds on the dashboard had failing tests, but I got only 10 notification emails reporting test failures. Therefore, it looks like two emails were not sent out.

Is this something that can be fixed very soon? We need to have this testing and reporting system working flawlessly for at least a solid week, with lots of verification, before we can even consider turning off our current testing system and officially switching to CMake as the default build and testing system. We can't start that clock until we have at least one day where everything seems to be working correctly.

Fixing these email notification problems continues to be the most critical issue for Trilinos, above all others, and must be addressed with all available resources.

I will be at the Trilinos Spring Developers Meeting almost all day tomorrow, so I may not be very responsive until the afternoon.

Thanks,

- Ross
No tags attached.
Issue History
Date Modified | Username | Field | Change
2009-03-18 16:14 | Roscoe A. Bartlett | New Issue |
2009-03-27 23:20 | Julien Jomier | Status | new => assigned
2009-03-27 23:20 | Julien Jomier | Assigned To | => Julien Jomier
2009-03-27 23:20 | Julien Jomier | Status | assigned => resolved
2009-03-27 23:20 | Julien Jomier | Fixed in Version | => 1.4
2009-03-27 23:20 | Julien Jomier | Resolution | open => fixed
2009-03-30 15:02 | Julien Jomier | Status | resolved => closed

There are no notes attached to this issue.