[CDash] Optimizing build2test table

Matějů Miroslav, Ing. Mateju.Miroslav at azd.cz
Tue Aug 9 12:38:52 UTC 2016


Hi,

I have finally found the bottleneck of my CDash server. It is related to the build2test table as I expected. The performance limitation is caused by a suboptimal index.

Most types of queries to build2test table that can be found in CDash sources do filter the rows using WHERE buildid=… AND testid=… Such queries cannot be supported by existing separate indexes buildid and testid but a multi-column index buildid+testid will cover them. (See https://www.percona.com/blog/2009/09/19/multi-column-indexes-vs-index-merge/)

So I changed the buildid index to include also testid column on my testing CDash installation. The index size of build2test table rose by approx. 1 GiB to 5.6 GiB (while data size is 3.3 GiB). From that point, there are much less slow_queries<http://dev.mysql.com/doc/refman/5.7/en/slow-query-log.html> and any CDash submissions are handled much faster: The real time<http://stackoverflow.com/a/556411/711006> of my CTest run (with several builds and thousands of tests) using synchronous submission decreased from 152 min to 69 min and no submission exceeds the processing timeout (unlike before). The processing delay of an asynchronous submission series slightly decreased (max. 0.53 hours, compared to more than one hour).

Should I create a patch or is anyone more experienced in the CDash source able to take care of it?

I also have a few side notes which arose when I was examining the problem above:

There is no tag corresponding to the current release 2.2.3 in the official Git repo. I tried master branch but it looks very different from the 2.2.x releases.

The file monitor.php is causing most of the remaining slow queries. The function echo_project_data_sizes() contains pretty complicated queries and makes monitor.php practically useless (i.e. unresponsive) or even harmful when used on a busy DB: It’s slowing down even more and possibly causes exceeding of the processing timeout. This problem disappears when I comment out the call to echo_project_data_sizes(). This function should have its own page and possibly a warning message that it may take long to load and reduce performance.

Best regards,

Miroslav


Ing. Miroslav Matějů
Programmer Analyst

AŽD Praha s.r.o.
Technology Division
Research and Development
Žirovnická 2/3146, 106 17  Prague
Czech Republic
Phone: +420 267 287 476
Web: www.azd.cz<http://www.azd.cz/>

From: Zack Galbreath [mailto:zack.galbreath at kitware.com]
Sent: Thursday, May 19, 2016 4:33 PM
To: Matějů Miroslav, Ing.
Cc: cdash at public.kitware.com<mailto:cdash at public.kitware.com>
Subject: Re: [CDash] Duplicate rows in build2test

On Thu, May 19, 2016 at 9:40 AM, Matějů Miroslav, Ing. <Mateju.Miroslav at azd.cz<mailto:Mateju.Miroslav at azd.cz>> wrote:
Hi,

I am sending a few updates regarding the build2test problem. I copied my DB to a virtual machine and performed a few experiments to reduce the size of build2test table. I was experimenting with an older backup version with the size of 9.3 GiB. (The DB on my production server has 17.5 GiB currently while the single table build2test has 11.7 GiB.)

I tried to remove the duplicates using the method from https://stackoverflow.com/a/3312066/711006. First of all, I had to convert the table to MyISAM because of an InnoDB bug (https://stackoverflow.com/a/8053812/711006). The MyISAM version has about one half of the size of InnoDB version.
There were 1.9 million duplicate rows out of 67.6 M and the data size decreased slightly from 1.3 GiB to 1.2 GiB while creation of the new unique index caused the index grow from 2.3 GiB to 3.2 GiB. I have not tried to convert the table back to InnoDB yet but I would expect similar inflation of index. So I would not recommend to create the unique index anymore, at least until I check it with InnoDB.

I also found that the column status of type varchar(10) contains just three values. Even the command
    SELECT status FROM build2test PROCEDURE ANALYSE()\G
recommended to change the type to
    ENUM('failed','notrun','passed') NOT NULL

I tried it and the table size decreased from 9.3 GiB to 8.5 GiB reducing the size of both data and index, at least for InnoDB. However, the MyISAM version grew slightly (from 4.4 GiB to 5 GiB).

I am going to try more optimization methods if time allows.


Thanks for digging into this.  I certainly appreciate it.

Often times I find that you need to run OPTIMIZE TABLE to reclaim disk space after making big changes to the database.  Consider giving that a try if you haven't been doing so already.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://public.kitware.com/pipermail/cdash/attachments/20160809/bb0b9cdb/attachment.htm>


More information about the CDash mailing list