[Cdash] CDash causing high load and errors when deleting builds

Julien Jomier julien.jomier at kitware.com
Thu Oct 13 10:22:14 UTC 2011


Hi Martin,

We have actually noticed the same thing on a couple of CDash instances.
The main issue is that deleting a build requires many table locks,
because a lot of rows have to be deleted from many tables. The current
CDash SVN uses a multi-delete mechanism, which takes more memory but
should speed up the delete.
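
To illustrate the general idea of batching deletes, here is a minimal
sketch. It is not CDash's actual code. It assumes that "multi-delete"
means removing the rows of several builds with one DELETE per table
instead of one pass per build, it uses Python's built-in sqlite3 module
as a stand-in for MySQL, and the table names are hypothetical.

```python
import sqlite3

# Hypothetical schema: these table and column names are illustrative only
# and are not taken from the CDash database layout.
CHILD_TABLES = ["build2test", "builderror", "configure"]


def make_db():
    """Create an in-memory database with five builds and three rows per child table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE build (id INTEGER PRIMARY KEY)")
    for table in CHILD_TABLES:
        conn.execute(f"CREATE TABLE {table} (buildid INTEGER, payload TEXT)")
    for build_id in range(1, 6):
        conn.execute("INSERT INTO build (id) VALUES (?)", (build_id,))
        for table in CHILD_TABLES:
            conn.executemany(
                f"INSERT INTO {table} VALUES (?, ?)", [(build_id, "x")] * 3
            )
    conn.commit()
    return conn


def delete_builds_batched(conn, build_ids):
    """Delete several builds with one DELETE per table; return the statement count."""
    build_ids = list(build_ids)
    if not build_ids:
        return 0
    placeholders = ",".join("?" * len(build_ids))
    statements = 0
    with conn:  # one transaction for the whole batch
        for table in CHILD_TABLES:
            conn.execute(
                f"DELETE FROM {table} WHERE buildid IN ({placeholders})", build_ids
            )
            statements += 1
        conn.execute(f"DELETE FROM build WHERE id IN ({placeholders})", build_ids)
        statements += 1
    return statements
```

With this shape the number of statements depends on the number of
tables, not on the number of builds being removed, at the cost of
touching more rows per statement.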

Do you know whether you are using MyISAM, InnoDB, or another SQL
storage engine?

Julien

On 12/10/2011 10:44, Martin Apel wrote:
> Hi all,
>
> I have been chasing a strange phenomenon with CDash for some weeks now
> and have gathered a few facts:
>
> The visible effect was that CDash regularly (but not reproducibly)
> failed to accept submissions from CMake. In most cases the Build.xml
> file was accepted, but the subsequently submitted Configure.xml and
> Test.xml were not. The clients had messages in their log files such as
> "Operation too slow. Less than 1 bytes/sec transfered the last 120
> seconds".
>
> Searching for the cause I found out that the server machine (a Linux box
> with Debian Lenny running CDash 1.8.2 on Apache 2.2.9) had a CPU load of
> 100 % for about three hours while deleting old builds during the night.
> The process causing the load is mysqld, which performs nearly no I/O,
> but consumes a full CPU. During this time, it seems that network packets
> were dropped.
>
> After some more investigation I found the CDash configuration option
> CDASH_ASYNCHRONOUS_SUBMISSION, which was previously set to false.
> I set this to true a few days ago, and since then no submissions have
> been lost. However, the high load caused by deleting those builds
> remains, and last night the CDash log showed the following messages:
>
> [2011-10-12T03:50:38][INFO][pid=24652](removeFirstBuilds): about to query for builds to remove
> [2011-10-12T03:50:38][INFO][pid=24652](removeFirstBuilds): removing old buildid: 23097 projectid: 1
> [2011-10-12T03:58:05][INFO][pid=24652](removeFirstBuilds): removing old buildid: 23099 projectid: 1
> [2011-10-12T04:07:33][INFO][pid=24652](removeFirstBuilds): removing old buildid: 23092 projectid: 1
> [2011-10-12T04:17:00][INFO][pid=24652](removeFirstBuilds): removing old buildid: 23094 projectid: 1
> [2011-10-12T04:24:28][INFO][pid=24652](removeFirstBuilds): removing old buildid: 23103 projectid: 1
> [2011-10-12T04:33:52][INFO][pid=24652](removeFirstBuilds): removing old buildid: 23098 projectid: 1
> [2011-10-12T04:34:00][INFO][pid=24652](removeFirstBuilds): removing old buildid: 23119 projectid: 1
> [2011-10-12T04:43:27][INFO][pid=24652](removeFirstBuilds): removing old buildid: 23120 projectid: 1
> [2011-10-12T04:52:58][INFO][pid=24652](removeFirstBuilds): removing old buildid: 23121 projectid: 1
> [2011-10-12T05:02:03][INFO][pid=24652](removeFirstBuilds): removing old buildid: 23123 projectid: 1
> [2011-10-12T05:11:40][INFO][pid=24652](removeFirstBuilds): removing old buildid: 23122 projectid: 1
> [2011-10-12T05:19:13][INFO][pid=24652](removeFirstBuilds): removing old buildid: 23124 projectid: 1
> [2011-10-12T05:26:39][INFO][pid=24652](removeFirstBuilds): removing old buildid: 23131 projectid: 1
> [2011-10-12T05:40:03][INFO][pid=24652](removeFirstBuilds): removing old buildid: 23127 projectid: 1
> [2011-10-12T05:55:18][INFO][pid=24652](removeFirstBuilds): removing old buildid: 23126 projectid: 1
> [2011-10-12T06:00:03][ERROR][pid=23419](taking lock: projectid=1, other processor pid='22563' apparently stalled, lastupdated='2011-10-12 03:51:34'): AcquireProcessingLock
> [2011-10-12T06:00:03][ERROR][pid=23419](1 submission records assumed stalled, reset to status=0): ResetApparentlyStalledSubmissions
> [2011-10-12T06:00:41][INFO][pid=22563](pid '22563' does not own lock anymore: abandoning loop...): ProcessSubmissions
> [2011-10-12T06:00:41][ERROR][pid=22563](lock not released, unexpected pid mismatch: pid='23419' mypid='22563' - attempt to unlock a lock we don't own...): ReleaseProcessingLock
> [2011-10-12T06:02:46][INFO][pid=24652](removeFirstBuilds): removing old buildid: 23128 projectid: 1
> [2011-10-12T06:12:08][INFO][pid=24652](removeFirstBuilds): removing old buildid: 23125 projectid: 1
> [2011-10-12T06:12:15][INFO][pid=24652](removeBuildsGroupwise): removing old buildid: 24891 projectid: 1
> [2011-10-12T06:13:22][INFO][pid=24652](removeBuildsGroupwise): removing old buildid: 24893 projectid: 1
> [2011-10-12T06:14:08][INFO][pid=24652](removeBuildsGroupwise): removing old buildid: 24892 projectid: 1
> [2011-10-12T06:14:53][INFO][pid=24652](removeBuildsGroupwise): removing old buildid: 24894 projectid: 1
> [2011-10-12T06:15:40][INFO][pid=24652](removeBuildsGroupwise): removing old buildid: 24896 projectid: 1
> [2011-10-12T06:16:26][INFO][pid=24652](removeBuildsGroupwise): removing old buildid: 24897 projectid: 1
> [2011-10-12T06:17:34][INFO][pid=24652](removeBuildsGroupwise): removing old buildid: 24899 projectid: 1
> [2011-10-12T06:18:28][INFO][pid=24652](removeBuildsGroupwise): removing old buildid: 24898 projectid: 1
> [2011-10-12T06:19:17][INFO][pid=24652](removeBuildsGroupwise): removing old buildid: 24901 projectid: 1
> [2011-10-12T06:20:03][INFO][pid=24652](removeBuildsGroupwise): removing old buildid: 24900 projectid: 1
> [2011-10-12T06:21:11][INFO][pid=24652](removeBuildsGroupwise): removing old buildid: 24902 projectid: 1
> [2011-10-12T06:21:58][INFO][pid=24652](removeBuildsGroupwise): removing old buildid: 24908 projectid: 1
> [2011-10-12T06:22:49][INFO][pid=24652](removeBuildsGroupwise): removing old buildid: 24903 projectid: 1
> [2011-10-12T06:23:34][INFO][pid=24652](removeBuildsGroupwise): removing old buildid: 24904 projectid: 1
> [2011-10-12T06:24:41][INFO][pid=24652](removeBuildsGroupwise): removing old buildid: 24906 projectid: 1
> [2011-10-12T06:25:31][INFO][pid=24652](removeBuildsGroupwise): removing old buildid: 24905 projectid: 1
> [2011-10-12T06:26:16][INFO][pid=24652](removeBuildsGroupwise): removing old buildid: 24907 projectid: 1
> [2011-10-12T06:27:24][INFO][pid=24652](removeBuildsGroupwise): removing old buildid: 24909 projectid: 1
> [2011-10-12T06:28:14][INFO][pid=24652](removeBuildsGroupwise): removing old buildid: 24912 projectid: 1
> [2011-10-12T06:29:10][INFO][pid=24652](removeBuildsGroupwise): removing old buildid: 24911 projectid: 1
> [2011-10-12T06:30:17][INFO][pid=24652](removeBuildsGroupwise): removing old buildid: 24910 projectid: 1
> [2011-10-12T06:31:02][INFO][pid=24652](removeBuildsGroupwise): removing old buildid: 24914 projectid: 1
> [2011-10-12T06:31:53][INFO][pid=24652](removeBuildsGroupwise): removing old buildid: 24913 projectid: 1
> [2011-10-12T06:33:02][INFO][pid=24652](removeBuildsGroupwise): removing old buildid: 24915 projectid: 1
> [2011-10-12T06:33:49][INFO][pid=24652](removeBuildsGroupwise): removing old buildid: 24917 projectid: 1
> [2011-10-12T06:34:38][INFO][pid=24652](removeBuildsGroupwise): removing old buildid: 24919 projectid: 1
> [2011-10-12T06:35:29][INFO][pid=24652](removeBuildsGroupwise): removing old buildid: 24918 projectid: 1
> [2011-10-12T06:36:39][INFO][pid=24652](removeBuildsGroupwise): removing old buildid: 24916 projectid: 1
> [2011-10-12T06:37:25][INFO][pid=24652](removeBuildsGroupwise): removing old buildid: 24920 projectid: 1
> [2011-10-12T06:38:19][INFO][pid=24652](removeBuildsGroupwise): removing old buildid: 24921 projectid: 1
> [2011-10-12T06:39:04][INFO][pid=24652](removeBuildsGroupwise): removing old buildid: 24922 projectid: 1
> [2011-10-12T06:40:13][INFO][pid=24652](removeBuildsGroupwise): removing old buildid: 24923 projectid: 1
> [2011-10-12T06:40:59][INFO][pid=24652](removeBuildsGroupwise): removing old buildid: 24924 projectid: 1
> [2011-10-12T06:42:01][INFO][pid=24652](removeBuildsGroupwise): removing old buildid: 24927 projectid: 1
> [2011-10-12T06:43:01][INFO][pid=24652](removeBuildsGroupwise): removing old buildid: 24926 projectid: 1
> [2011-10-12T06:44:09][INFO][pid=24652](removeBuildsGroupwise): removing old buildid: 24925 projectid: 1
> [2011-10-12T06:44:55][INFO][pid=24652](removeBuildsGroupwise): removing old buildid: 24930 projectid: 1
> [2011-10-12T06:46:28][INFO][pid=24652](removeBuildsGroupwise): removing old buildid: 24928 projectid: 1
> [2011-10-12T06:47:14][INFO][pid=24652](removeBuildsGroupwise): removing old buildid: 24929 projectid: 1
> [2011-10-12T06:48:46][INFO][pid=24652](removeBuildsGroupwise): removing old buildid: 24931 projectid: 1
> [2011-10-12T06:49:57][INFO][pid=24652](removeBuildsGroupwise): removing old buildid: 24934 projectid: 1
> [2011-10-12T06:51:26][INFO][pid=24652](removeBuildsGroupwise): removing old buildid: 24932 projectid: 1
> [2011-10-12T06:53:23][INFO][pid=24652](removeBuildsGroupwise): removing old buildid: 24936 projectid: 1
> [2011-10-12T06:55:20][INFO][pid=24652](removeBuildsGroupwise): removing old buildid: 24933 projectid: 1
> [2011-10-12T06:56:54][INFO][pid=24652](removeBuildsGroupwise): removing old buildid: 24937 projectid: 1
> [2011-10-12T06:58:47][INFO][pid=24652](removeBuildsGroupwise): removing old buildid: 24949 projectid: 1
> [2011-10-12T06:59:55][INFO][pid=24652](removeBuildsGroupwise): removing old buildid: 24951 projectid: 1
> [2011-10-12T07:00:43][INFO][pid=24652](removeBuildsGroupwise): removing old buildid: 24950 projectid: 1
> [2011-10-12T07:01:38][INFO][pid=24652](removeBuildsGroupwise): removing old buildid: 24952 projectid: 1
>
> The time of these messages corresponds exactly to the period of
> continuously high load.
> Does anybody have an idea what is going on there, and why it takes ages
> to delete those old builds?
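
The gaps between consecutive "removing old buildid" lines in the log
above can be measured with a short script. This is a minimal sketch; the
function name removal_intervals is made up for illustration, and it
assumes that the time until the next removal line approximates the time
spent on the build logged first. In the log above the removeFirstBuilds
entries are mostly 7 to 15 minutes apart, while the
removeBuildsGroupwise entries are roughly 45 seconds to 2 minutes apart.

```python
import re
from datetime import datetime

# Matches the "removing old buildid" lines in the format shown in the log above.
LINE_RE = re.compile(
    r"\[(?P<ts>[\d\-T:]+)\]\[INFO\]\[pid=\d+\]\((?P<func>\w+)\): "
    r"removing old buildid: (?P<buildid>\d+)"
)


def removal_intervals(lines):
    """Return (function, buildid, seconds until the next removal line) tuples."""
    events = []
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            ts = datetime.strptime(match.group("ts"), "%Y-%m-%dT%H:%M:%S")
            events.append((ts, match.group("func"), int(match.group("buildid"))))
    # Pair each event with its successor; the last event has no successor
    # and is therefore not reported.
    return [
        (func, buildid, (nxt[0] - ts).total_seconds())
        for (ts, func, buildid), nxt in zip(events, events[1:])
    ]


# First three removal lines from the log above.
sample = [
    "[2011-10-12T03:50:38][INFO][pid=24652](removeFirstBuilds): removing old buildid: 23097 projectid: 1",
    "[2011-10-12T03:58:05][INFO][pid=24652](removeFirstBuilds): removing old buildid: 23099 projectid: 1",
    "[2011-10-12T04:07:33][INFO][pid=24652](removeFirstBuilds): removing old buildid: 23092 projectid: 1",
]

for func, buildid, seconds in removal_intervals(sample):
    print(f"{func} buildid={buildid}: {seconds:.0f} s")
```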
>
> Martin


