[CDash] [EXTERNAL] RE: slow label query

Wolfenbarger, Paul R prwolfe at sandia.gov
Mon May 8 16:56:22 UTC 2017


I found some time and set up a copy of our installation, modified the build2test table as referenced in your mail thread and noted 2 things


1.       I stopped getting duplicate entries in this table.  Since our size is in the 10’s of millions anyway that is a welcome change.

2.       Query times are not changed.  Running explain on that query still shows a write to a temporary table.

I will get a pull request together as I see this change as somewhat helpful, but it does not solve my initial problem

----------------------------------------------------------------------------------------
At the heart of science is an essential tension between two seemingly
contradictory attitudes -- an openness to new ideas, no matter how bizarre
or counterintuitive they may be, and the most ruthless skeptical scrutiny
of all ideas, old and new.  This is how deep truths are winnowed from deep
nonsense.  Of course, scientists make mistakes in trying to understand the
world, but there is a built-in error-correcting mechanism:  The collective
enterprise of creative thinking and skeptical thinking together keeps the
field on track.
-- Carl Sagan, "The Fine Art of Baloney Detection"

Paul R. Wolfenbarger, P.E., M.S.
Computational Simulation Infrastructure
DevOps Product Owner
Sandia National Laboratories
P.O. Box 5800 MS 0845
Albuquerque, NM  87185-0845
505-844-5458 phone
prwolfe at sandia.gov<mailto:prwolfe at sandia.gov>
----------------------------------------------------------------------------------------

From: "Matějů Miroslav, Ing." <Mateju.Miroslav at azd.cz>
Date: Monday, February 27, 2017 at 12:31 AM
To: "Wolfenbarger, Paul R" <prwolfe at sandia.gov>
Cc: "cdash at public.kitware.com" <cdash at public.kitware.com>
Subject: [EXTERNAL] RE: slow label query

Hi Paul,

As far as I know (but I am rather a C++ programmer like you than a DB expert), it is better to let MySQL do as much as possible and avoid the overhead of transfers and conversions between MySQL and PHP.

I think your issue is related to my problem with build2test table performance that I described in the mail thread Optimizing build2test table<http://public.kitware.com/pipermail/cdash/2016-August/001692.html>. Unfortunately, I haven’t been able to test and push the solution yet but it is quite easy to implement in an existing installation: Just add the testid column as the second column to the buildid index for the build2test table. I have done it using PhpMyAdmin on my production machine. It took about 2 hours (for more than 100 million rows) and the submissions (which were the main issue for me) begun to be processed about 3 times faster after that.

Hope this helps,

Ing. Miroslav Matějů
Programmer Analyst

AŽD Praha s.r.o.
Technology Division
Research and Development
Žirovnická 2/3146, 106 17  Prague
Czech Republic
Phone: +420 267 287 476
Web: www.azd.cz<http://www.azd.cz/>

From: CDash [mailto:cdash-bounces at public.kitware.com] On Behalf Of Wolfenbarger, Paul R
Sent: Thursday, February 23, 2017 5:35 PM
To: cdash at public.kitware.com
Subject: [CDash] slow label query

I am hoping for someone with more sql understanding than I have to help with an issue. In api/v1/index.php at line 1004 there is a label query that is causing a bottleneck in my installation

                $label_query_base =
                    "SELECT b2t.status, b2t.newstatus
                    FROM build2test AS b2t
                    INNER JOIN label2test AS l2t ON
                    (l2t.testid=b2t.testid AND l2t.buildid=b2t.buildid)
                    WHERE b2t.buildid = '$buildid' AND
                    l2t.labelid IN $label_ids";
                $label_filter_query = $label_query_base . $limit_sql;
                $labels_result = pdo_query($label_filter_query);

This is returning two non-indexed fields from build2test and joining on indexed fields and since my build2test and label2test tables are each about 43 million records, this takes 20 to 300 seconds each. Multipled by at least 40 builds in the loop and you can see my issue.

As a c++ programmer my first instinct was to split the query and make all the operations work on indexes. I did some queries at the command line and got sub-second returns from each query and the php would now look something like

                $testids = pdo_query(
                  “select testid from label2test
                   where label2test.buildid=$buildid
                   and label2test.labelid in $label_ids") ;
                $test_ids = implode("','", $testids) ;
                $label_filter_query =
                  "select status, newstatus
                   from build2test where buildid = $buildid
                   and testid in $test_ids" ;

However, from my reading it seems that multiple queries of this sort are frowned on in the sql world.  Before I set up a test instance to check this out do any of you have any insight into how to re-structure the query to get the kind of performance I need?

Thanks!

----------------------------------------------------------------------------------------
"If you understand what you're doing, you're not learning anything."
  -- A. L.

Paul R. Wolfenbarger, P.E., M.S.
Computational Simulation Infrastructure
DevOps Product Owner
Sandia National Laboratories
P.O. Box 5800 MS 0845
Albuquerque, NM  87185-0845
505-844-5458 phone
prwolfe at sandia.gov<mailto:prwolfe at sandia.gov>
----------------------------------------------------------------------------------------


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://public.kitware.com/pipermail/cdash/attachments/20170508/adf2050e/attachment.htm>


More information about the CDash mailing list