[Cdash] CDash causing high load and errors when deleting builds

Martin Apel martin.apel at simpack.de
Wed Oct 12 04:44:34 EDT 2011


Hi all,

I have been chasing a strange phenomenon with CDash for some weeks now
and have gathered a few facts:

The visible effect was that CDash regularly (but not reproducibly)
did not accept submissions from CMake. In most cases the Build.xml
file was accepted, but the subsequently submitted Configure.xml and
Test.xml were not. The clients had messages in their log files such as
"Operation too slow. Less than 1 bytes/sec transfered the last 120
seconds".

Searching for the cause, I found that the server machine (a Linux box
with Debian Lenny running CDash 1.8.2 on Apache 2.2.9) had a CPU load of
100 % for about three hours while deleting old builds during the night.
The process causing the load is mysqld, which performs nearly no I/O
but consumes a full CPU. During this time, it seems that network packets
were dropped.

After some more investigation I found the CDash configuration option
CDASH_ASYNCHRONOUS_SUBMISSION, which was previously set to false.
I set it to true a few days ago, and since then no submissions have been
lost. However, the high load caused by deleting those builds remains,
and last night the CDash log showed the following messages:

[2011-10-12T03:50:38][INFO][pid=24652](removeFirstBuilds): about to query for builds to remove
[2011-10-12T03:50:38][INFO][pid=24652](removeFirstBuilds): removing old buildid: 23097 projectid: 1
[2011-10-12T03:58:05][INFO][pid=24652](removeFirstBuilds): removing old buildid: 23099 projectid: 1
[2011-10-12T04:07:33][INFO][pid=24652](removeFirstBuilds): removing old buildid: 23092 projectid: 1
[2011-10-12T04:17:00][INFO][pid=24652](removeFirstBuilds): removing old buildid: 23094 projectid: 1
[2011-10-12T04:24:28][INFO][pid=24652](removeFirstBuilds): removing old buildid: 23103 projectid: 1
[2011-10-12T04:33:52][INFO][pid=24652](removeFirstBuilds): removing old buildid: 23098 projectid: 1
[2011-10-12T04:34:00][INFO][pid=24652](removeFirstBuilds): removing old buildid: 23119 projectid: 1
[2011-10-12T04:43:27][INFO][pid=24652](removeFirstBuilds): removing old buildid: 23120 projectid: 1
[2011-10-12T04:52:58][INFO][pid=24652](removeFirstBuilds): removing old buildid: 23121 projectid: 1
[2011-10-12T05:02:03][INFO][pid=24652](removeFirstBuilds): removing old buildid: 23123 projectid: 1
[2011-10-12T05:11:40][INFO][pid=24652](removeFirstBuilds): removing old buildid: 23122 projectid: 1
[2011-10-12T05:19:13][INFO][pid=24652](removeFirstBuilds): removing old buildid: 23124 projectid: 1
[2011-10-12T05:26:39][INFO][pid=24652](removeFirstBuilds): removing old buildid: 23131 projectid: 1
[2011-10-12T05:40:03][INFO][pid=24652](removeFirstBuilds): removing old buildid: 23127 projectid: 1
[2011-10-12T05:55:18][INFO][pid=24652](removeFirstBuilds): removing old buildid: 23126 projectid: 1
[2011-10-12T06:00:03][ERROR][pid=23419](taking lock: projectid=1, other processor pid='22563' apparently stalled, lastupdated='2011-10-12 03:51:34'): AcquireProcessingLock
[2011-10-12T06:00:03][ERROR][pid=23419](1 submission records assumed stalled, reset to status=0): ResetApparentlyStalledSubmissions
[2011-10-12T06:00:41][INFO][pid=22563](pid '22563' does not own lock anymore: abandoning loop...): ProcessSubmissions
[2011-10-12T06:00:41][ERROR][pid=22563](lock not released, unexpected pid mismatch: pid='23419' mypid='22563' - attempt to unlock a lock we don't own...): ReleaseProcessingLock
[2011-10-12T06:02:46][INFO][pid=24652](removeFirstBuilds): removing old buildid: 23128 projectid: 1
[2011-10-12T06:12:08][INFO][pid=24652](removeFirstBuilds): removing old buildid: 23125 projectid: 1
[2011-10-12T06:12:15][INFO][pid=24652](removeBuildsGroupwise): removing old buildid: 24891 projectid: 1
[2011-10-12T06:13:22][INFO][pid=24652](removeBuildsGroupwise): removing old buildid: 24893 projectid: 1
[2011-10-12T06:14:08][INFO][pid=24652](removeBuildsGroupwise): removing old buildid: 24892 projectid: 1
[2011-10-12T06:14:53][INFO][pid=24652](removeBuildsGroupwise): removing old buildid: 24894 projectid: 1
[2011-10-12T06:15:40][INFO][pid=24652](removeBuildsGroupwise): removing old buildid: 24896 projectid: 1
[2011-10-12T06:16:26][INFO][pid=24652](removeBuildsGroupwise): removing old buildid: 24897 projectid: 1
[2011-10-12T06:17:34][INFO][pid=24652](removeBuildsGroupwise): removing old buildid: 24899 projectid: 1
[2011-10-12T06:18:28][INFO][pid=24652](removeBuildsGroupwise): removing old buildid: 24898 projectid: 1
[2011-10-12T06:19:17][INFO][pid=24652](removeBuildsGroupwise): removing old buildid: 24901 projectid: 1
[2011-10-12T06:20:03][INFO][pid=24652](removeBuildsGroupwise): removing old buildid: 24900 projectid: 1
[2011-10-12T06:21:11][INFO][pid=24652](removeBuildsGroupwise): removing old buildid: 24902 projectid: 1
[2011-10-12T06:21:58][INFO][pid=24652](removeBuildsGroupwise): removing old buildid: 24908 projectid: 1
[2011-10-12T06:22:49][INFO][pid=24652](removeBuildsGroupwise): removing old buildid: 24903 projectid: 1
[2011-10-12T06:23:34][INFO][pid=24652](removeBuildsGroupwise): removing old buildid: 24904 projectid: 1
[2011-10-12T06:24:41][INFO][pid=24652](removeBuildsGroupwise): removing old buildid: 24906 projectid: 1
[2011-10-12T06:25:31][INFO][pid=24652](removeBuildsGroupwise): removing old buildid: 24905 projectid: 1
[2011-10-12T06:26:16][INFO][pid=24652](removeBuildsGroupwise): removing old buildid: 24907 projectid: 1
[2011-10-12T06:27:24][INFO][pid=24652](removeBuildsGroupwise): removing old buildid: 24909 projectid: 1
[2011-10-12T06:28:14][INFO][pid=24652](removeBuildsGroupwise): removing old buildid: 24912 projectid: 1
[2011-10-12T06:29:10][INFO][pid=24652](removeBuildsGroupwise): removing old buildid: 24911 projectid: 1
[2011-10-12T06:30:17][INFO][pid=24652](removeBuildsGroupwise): removing old buildid: 24910 projectid: 1
[2011-10-12T06:31:02][INFO][pid=24652](removeBuildsGroupwise): removing old buildid: 24914 projectid: 1
[2011-10-12T06:31:53][INFO][pid=24652](removeBuildsGroupwise): removing old buildid: 24913 projectid: 1
[2011-10-12T06:33:02][INFO][pid=24652](removeBuildsGroupwise): removing old buildid: 24915 projectid: 1
[2011-10-12T06:33:49][INFO][pid=24652](removeBuildsGroupwise): removing old buildid: 24917 projectid: 1
[2011-10-12T06:34:38][INFO][pid=24652](removeBuildsGroupwise): removing old buildid: 24919 projectid: 1
[2011-10-12T06:35:29][INFO][pid=24652](removeBuildsGroupwise): removing old buildid: 24918 projectid: 1
[2011-10-12T06:36:39][INFO][pid=24652](removeBuildsGroupwise): removing old buildid: 24916 projectid: 1
[2011-10-12T06:37:25][INFO][pid=24652](removeBuildsGroupwise): removing old buildid: 24920 projectid: 1
[2011-10-12T06:38:19][INFO][pid=24652](removeBuildsGroupwise): removing old buildid: 24921 projectid: 1
[2011-10-12T06:39:04][INFO][pid=24652](removeBuildsGroupwise): removing old buildid: 24922 projectid: 1
[2011-10-12T06:40:13][INFO][pid=24652](removeBuildsGroupwise): removing old buildid: 24923 projectid: 1
[2011-10-12T06:40:59][INFO][pid=24652](removeBuildsGroupwise): removing old buildid: 24924 projectid: 1
[2011-10-12T06:42:01][INFO][pid=24652](removeBuildsGroupwise): removing old buildid: 24927 projectid: 1
[2011-10-12T06:43:01][INFO][pid=24652](removeBuildsGroupwise): removing old buildid: 24926 projectid: 1
[2011-10-12T06:44:09][INFO][pid=24652](removeBuildsGroupwise): removing old buildid: 24925 projectid: 1
[2011-10-12T06:44:55][INFO][pid=24652](removeBuildsGroupwise): removing old buildid: 24930 projectid: 1
[2011-10-12T06:46:28][INFO][pid=24652](removeBuildsGroupwise): removing old buildid: 24928 projectid: 1
[2011-10-12T06:47:14][INFO][pid=24652](removeBuildsGroupwise): removing old buildid: 24929 projectid: 1
[2011-10-12T06:48:46][INFO][pid=24652](removeBuildsGroupwise): removing old buildid: 24931 projectid: 1
[2011-10-12T06:49:57][INFO][pid=24652](removeBuildsGroupwise): removing old buildid: 24934 projectid: 1
[2011-10-12T06:51:26][INFO][pid=24652](removeBuildsGroupwise): removing old buildid: 24932 projectid: 1
[2011-10-12T06:53:23][INFO][pid=24652](removeBuildsGroupwise): removing old buildid: 24936 projectid: 1
[2011-10-12T06:55:20][INFO][pid=24652](removeBuildsGroupwise): removing old buildid: 24933 projectid: 1
[2011-10-12T06:56:54][INFO][pid=24652](removeBuildsGroupwise): removing old buildid: 24937 projectid: 1
[2011-10-12T06:58:47][INFO][pid=24652](removeBuildsGroupwise): removing old buildid: 24949 projectid: 1
[2011-10-12T06:59:55][INFO][pid=24652](removeBuildsGroupwise): removing old buildid: 24951 projectid: 1
[2011-10-12T07:00:43][INFO][pid=24652](removeBuildsGroupwise): removing old buildid: 24950 projectid: 1
[2011-10-12T07:01:38][INFO][pid=24652](removeBuildsGroupwise): removing old buildid: 24952 projectid: 1
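
To put numbers on how long each deletion takes, the gaps between
consecutive "removing old buildid" lines can be computed from the log.
The following is only a minimal sketch: the names LINE_RE and
deletion_gaps are made up for illustration, the log format is assumed
to be exactly as in the excerpt above, and it assumes each message is
written before the deletion starts, so that the gap to the next message
approximates the time spent on that build.

```python
import re
from datetime import datetime

# Hypothetical helper, not part of CDash. Matches lines such as:
# [2011-10-12T03:50:38][INFO][pid=24652](removeFirstBuilds): removing old buildid: 23097 projectid: 1
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\[INFO\]\[pid=\d+\]"
    r"\((?P<func>\w+)\): removing old buildid: (?P<build>\d+)"
)


def deletion_gaps(lines):
    """Yield (function, buildid, seconds) for each removal line.

    seconds is the time until the next removal line, taken as an
    approximation of how long that build took to delete. The last
    removal line has no successor and is therefore not reported.
    """
    previous = None
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip lock messages and other unrelated lines
        stamp = datetime.strptime(match.group("ts"), "%Y-%m-%dT%H:%M:%S")
        if previous is not None:
            prev_stamp, prev_func, prev_build = previous
            yield prev_func, prev_build, (stamp - prev_stamp).total_seconds()
        previous = (stamp, match.group("func"), int(match.group("build")))


# The first lines of the excerpt above, used as sample input.
sample = [
    "[2011-10-12T03:50:38][INFO][pid=24652](removeFirstBuilds): about to query for builds to remove",
    "[2011-10-12T03:50:38][INFO][pid=24652](removeFirstBuilds): removing old buildid: 23097 projectid: 1",
    "[2011-10-12T03:58:05][INFO][pid=24652](removeFirstBuilds): removing old buildid: 23099 projectid: 1",
    "[2011-10-12T04:07:33][INFO][pid=24652](removeFirstBuilds): removing old buildid: 23092 projectid: 1",
]

for func, build, seconds in deletion_gaps(sample):
    print(f"{func} build {build}: {seconds:.0f} s")
# removeFirstBuilds build 23097: 447 s
# removeFirstBuilds build 23099: 568 s
```

Applied to the whole log file (for example by passing an open file
object instead of the sample list), this shows at a glance which builds
take minutes rather than seconds to remove.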

The timing of these messages corresponds exactly to the period of
continuously high load.
Does anybody have an idea what is going on there and why it takes ages
to delete those old builds?

Martin