From Mateju.Miroslav at azd.cz Wed May 11 07:46:18 2016
From: Mateju.Miroslav at azd.cz (Matějů Miroslav, Ing.)
Date: Wed, 11 May 2016 11:46:18 +0000
Subject: [CDash] Duplicate rows in build2test
Message-ID: <856010006cd14dd5be99779cc2d04dce@azdexchstore3.AZD.LOCAL>

Hi,

I have discovered duplicate rows in the build2test table, which is the
biggest table in my CDash database. The query

    SELECT `buildid`, `testid`, COUNT(`buildid`) c FROM `build2test`
    GROUP BY `buildid`, `testid` HAVING c > 1 ORDER BY c DESC;

returned many entries, with the highest reported count being 11.
Shouldn't the pair buildid + testid be unique? I checked the data of a
few reported entries, and all the rows appeared to have the same values
in all columns.

The duplicates may come from wrongly configured submission processing:
the default CDASH_SUBMISSION_PROCESSING_TIME_LIMIT was too low, so CDash
tried to asynchronously process big submissions again and again. However,
the duplication could most likely have been prevented if the table had a
PRIMARY KEY (or UNIQUE constraint) defined. Currently it has none.

Could you recommend any actions I could take? Is it best to let
Autoremove take care of it (which is currently disabled in my case)?

Thanks in advance!
Miroslav Matějů
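The query above groups only on the key pair. To confirm that the
duplicates really are identical in every column, one can group on all
columns instead. A sketch, where the non-key columns (status, time) are
illustrative guesses rather than the actual build2test schema:

    -- Count rows that are identical in all listed columns; adjust the
    -- column list to match the real build2test schema.
    SELECT buildid, testid, status, time, COUNT(*) AS copies
    FROM build2test
    GROUP BY buildid, testid, status, time
    HAVING copies > 1
    ORDER BY copies DESC;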
From zack.galbreath at kitware.com Thu May 12 14:06:01 2016
From: zack.galbreath at kitware.com (Zack Galbreath)
Date: Thu, 12 May 2016 14:06:01 -0400
Subject: [CDash] Duplicate rows in build2test
In-Reply-To: <856010006cd14dd5be99779cc2d04dce@azdexchstore3.AZD.LOCAL>
References: <856010006cd14dd5be99779cc2d04dce@azdexchstore3.AZD.LOCAL>

On Wed, May 11, 2016 at 7:46 AM, Matějů Miroslav, Ing. <
Mateju.Miroslav at azd.cz> wrote:

> I have discovered duplicate rows in the build2test table, which is the
> biggest table in my CDash database. The query
>     SELECT `buildid`, `testid`, COUNT(`buildid`) c FROM `build2test`
>     GROUP BY `buildid`, `testid` HAVING c > 1 ORDER BY c DESC;
> returned many entries, with the highest reported count being 11.
> Shouldn't the pair buildid + testid be unique? I checked the data of a
> few reported entries, and all the rows appeared to have the same values
> in all columns.

I think you're right that a unique constraint would be appropriate here.
I'll run some experiments to see what the resulting Test.xml file looks
like when using ctest's --repeat-until-fail command-line option. That's
the only case I can think of where it might make sense for a single build
to contain multiple copies of the same test(s).

> The duplicates may come from wrongly configured submission processing:
> the default CDASH_SUBMISSION_PROCESSING_TIME_LIMIT was too low, so
> CDash tried to asynchronously process big submissions again and again.
> However, the duplication could most likely have been prevented if the
> table had a PRIMARY KEY (or UNIQUE constraint) defined. Currently it
> has none.

You're probably right. If the same Test.xml gets processed more than
once, you'll likely end up with duplicate rows in build2test (as you are
experiencing).

Another way this could occur is if you run the same Nightly build
multiple times per day. CDash used to automatically delete a nightly
build when a new one with the same name and timestamp was received. This
was a headache for parallel submission processing, so instead you now
have to mark such a build as 'done' for it to be automatically removed
when a new copy is submitted.

> Could you recommend any actions I could take? Is it best to let
> Autoremove take care of it (which is currently disabled in my case)?

Auto-remove will certainly help keep your database from growing without
bounds. If you're feeling adventurous, you could experiment with a unique
constraint on the build2test table. Otherwise, I'll try to look into this
as time permits.

From Mateju.Miroslav at azd.cz Thu May 19 09:40:09 2016
From: Mateju.Miroslav at azd.cz (Matějů Miroslav, Ing.)
Date: Thu, 19 May 2016 13:40:09 +0000
Subject: [CDash] Duplicate rows in build2test
References: <856010006cd14dd5be99779cc2d04dce@azdexchstore3.AZD.LOCAL>

Hi,

I am sending a few updates regarding the build2test problem. I copied my
DB to a virtual machine and performed a few experiments to reduce the
size of the build2test table. I experimented with an older backup with a
size of 9.3 GiB. (The DB on my production server is currently 17.5 GiB,
of which the build2test table alone takes 11.7 GiB.)

I tried to remove the duplicates using the method from
https://stackoverflow.com/a/3312066/711006. First of all, I had to
convert the table to MyISAM because of an InnoDB bug
(https://stackoverflow.com/a/8053812/711006). The MyISAM version is about
half the size of the InnoDB version.

There were 1.9 million duplicate rows out of 67.6 M, and the data size
decreased slightly from 1.3 GiB to 1.2 GiB, while creating the new unique
index caused the index to grow from 2.3 GiB to 3.2 GiB. I have not tried
converting the table back to InnoDB yet, but I would expect a similar
inflation of the index. So I would no longer recommend creating the
unique index, at least until I have checked it with InnoDB.

I also found that the status column, of type varchar(10), contains just
three values. Even the command

    SELECT status FROM build2test PROCEDURE ANALYSE()\G

recommended changing the type to

    ENUM('failed','notrun','passed') NOT NULL

I tried it, and the table size decreased from 9.3 GiB to 8.5 GiB,
reducing the size of both data and index, at least for InnoDB. However,
the MyISAM version grew slightly (from 4.4 GiB to 5 GiB).

I am going to try more optimization methods if time allows.

Best regards,
Miroslav Matějů
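The two changes tried above correspond roughly to the following
statements on MySQL 5.x. This is a sketch: uniq_build_test is an
arbitrary index name, and the ALTER IGNORE form used by the linked Stack
Overflow answer was removed in MySQL 5.7.4:

    -- Deduplicate and enforce uniqueness in one step: IGNORE makes MySQL
    -- drop rows that would violate the new unique index instead of
    -- failing. On MySQL 5.7+ the rough equivalent is an
    -- INSERT ... SELECT DISTINCT into a rebuilt copy of the table.
    ALTER IGNORE TABLE build2test
        ADD UNIQUE INDEX uniq_build_test (buildid, testid);

    -- The column change recommended by PROCEDURE ANALYSE():
    ALTER TABLE build2test
        MODIFY status ENUM('failed','notrun','passed') NOT NULL;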
From zack.galbreath at kitware.com Thu May 19 10:32:50 2016
From: zack.galbreath at kitware.com (Zack Galbreath)
Date: Thu, 19 May 2016 10:32:50 -0400
Subject: [CDash] Duplicate rows in build2test
References: <856010006cd14dd5be99779cc2d04dce@azdexchstore3.AZD.LOCAL>

On Thu, May 19, 2016 at 9:40 AM, Matějů Miroslav, Ing. <
Mateju.Miroslav at azd.cz> wrote:

> I am sending a few updates regarding the build2test problem. I copied
> my DB to a virtual machine and performed a few experiments to reduce
> the size of the build2test table. [...]
>
> I am going to try more optimization methods if time allows.

Thanks for digging into this. I certainly appreciate it.

Often I find that you need to run OPTIMIZE TABLE to reclaim disk space
after making big changes to the database. Consider giving that a try if
you haven't been doing so already.
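On MySQL, the suggestion above is a single statement. For InnoDB it is
implemented as a full table rebuild, so expect it to take a while and to
need temporary disk space on the order of the table size:

    -- Rebuild build2test and its indexes, returning pages freed by the
    -- dedup and column-type changes to the filesystem.
    OPTIMIZE TABLE build2test;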