[Insight-users] Kd-tree implementation in ITK is buggy and unreliable
Luis Ibanez
luis.ibanez at kitware.com
Mon Apr 28 08:21:49 EDT 2008
Hi Ali,
A quick update on the status of this problem:
Bugs were identified and fixed at the level of the
StatisticsAlgorithms:
* Partition
* QuickSelect
http://www.itk.org/cgi-bin/viewcvs.cgi/Code/Numerics/Statistics/itkStatisticsAlgorithm.txx?root=Insight&r1=1.19&r2=1.21&sortby=date
http://www.itk.org/cgi-bin/viewcvs.cgi/Code/Numerics/Statistics/itkKdTreeGenerator.txx?root=Insight&r1=1.14&r2=1.18&sortby=date
Explicit tests were added for
* QuickSelect
* KdTree
* KdTreeGenerator
(about 12 new tests).
These new tests are passing.
There are still side effects in the following (now failing) tests
* BayesianClassifierInitializeTest
* itkSampleMeanSjhiftClusteringFilterTest
* RBFTest1
* ScalarImageKmeansClassifierTest
They are currently timing-out, (probably due to infinite-loops)
We suspect that they are related to the WeightedCentroidKdTreeGenerator
(but that is only early speculation).
We are now tracking those issues...
Luis
--------------------
Luis Ibanez wrote:
>
> Hi Bill,
>
> Yeap, the problem was in the combination of QuickSelect and Partition.
>
> Both of them are defined in
>
> Insight/Code/Numerics/Statistics/itkStatisticsAlgorithms.txx
>
> Sorting the list before QuickSelect would only work when the measurement
> vector has a single dimension. Otherwise, a single sort can not possibly
> organize the list in all dimensions. We could call sort just before
> Partition, to sort the activeDimension, but this is an overkill, since
> Partition is indeed intended for parforming a partial sorting.
>
>
> I have replaced both the QuickSelect and Partition functions with
> implementations based on the description of the Partition and
> QuickSelect algorithms in the Wikipedia:
>
> http://en.wikipedia.org/wiki/Quickselect
>
> This works fine for KdTreeTest1 and KdTreeTest2.
>
>
> However, the test for the StatisticsAlgorithms itself fails (timesout)
> due to the fact that its list contains multiple entries with
> values equal to the partition value. A case that is not considered
> in the current algorithm. The current algorithm enters into an
> infinite loop in this case.
>
> I'm modifying the implementation for covering that case.
>
> --
>
> BTW, about 10 test more were added, for exercising the KdTree
> at different bucket sizes.
>
>
> To be revisited: The tests in Borland are producing floating-point
> overflows... not a good sign. There are possible
> other issues in the computation.
>
>
>
> Luis
>
>
>
> ---------------------
> Bill Lorensen wrote:
>
>> Luis,
>>
>> I think the main problem is that the samples need to be sorted. I added:
>> HeapSort<SubsampleType>(m_Subsample, partitionDimension, beginIndex,
>> endIndex);
>> just before the QuickSelect call and itkKdTreeTest1 passes
>> consistently.
>>
>> However, this is not a good fix. I think the QuickSelect code should
>> work. Probably the error is in Partition. It should group a list into
>> two parts, one less than the partition value and another greater than
>> or equal the value. Currently it looks like Partition assumes the list
>> is sorted.
>>
>> BTW, itkKdTreeTest2 produces a tree different from the one in wikipedia.
>>
>> HTH,
>> Bill
>>
>> On Fri, Apr 25, 2008 at 12:59 PM, Luis Ibanez
>> <luis.ibanez at kitware.com> wrote:
>>
>>> Hi Ali,
>>>
>>>
>>> An update on the KdTree segmentation fault
>>> (that may or may not be related to the problem that you observe).
>>>
>>>
>>> This test is running the example tree provided in the Wikipedia
>>> http://en.wikipedia.org/wiki/Kdtree
>>> http://en.wikipedia.org/wiki/Kdtree#Constructing_a_kd-tree
>>>
>>> We are feeding the set of points:
>>>
>>> (2,3), (5,4), (9,6), (4,7), (8,1), (7,2)
>>>
>>> and expecting to get the tree:
>>> http://upload.wikimedia.org/wikipedia/en/f/f1/Kd_tree.png
>>>
>>> (7,2)
>>> /\
>>> / \
>>> / \
>>> (5,4) (9,6)
>>> /\ \
>>> / \ \
>>> / \ \
>>> (2,3) (4,7) (8,1)
>>>
>>>
>>>
>>> Here is what we currently observe as output for different
>>> bucket sizes:
>>>
>>> -----
>>>
>>> Case BucketSize == 1:
>>>
>>> The segmentation fault of itkKdTreeTest2 with BucketSize==1
>>> happens because the recursive function:
>>>
>>> KdTreeGenerator::GenerateTreeLoop() (line 179)
>>>
>>> ends up calling itself indifinitely, until it overflows the stack.
>>> It enter in an infinite (recursive) loop calling:
>>>
>>> GenerateTreeLoop( 0, 0, level++)
>>> GenerateTreeLoop( 0, 2, level++)
>>> GenerateTreeLoop( 0, 0, level++)
>>> GenerateTreeLoop( 0, 2, level++)
>>> GenerateTreeLoop( 0, 0, level++)
>>> GenerateTreeLoop( 0, 2, level++)
>>> ....
>>>
>>> where "0" and "2" here are the indexes of the points in the sample.
>>>
>>> -----
>>>
>>> Case BucketSize == 2:
>>>
>>> The segmentation fault of itkKdTreeTest2 with BucketSize==2
>>> happens also because the recursive function:
>>>
>>> KdTreeGenerator::GenerateTreeLoop() (line 179)
>>>
>>> ends up calling itself indifinitely, until it overflows the stack.
>>> It enter in an infinite (recursive) loop calling:
>>>
>>> GenerateTreeLoop( 3, 3, level++)
>>> GenerateTreeLoop( 3, 6, level++)
>>> GenerateTreeLoop( 3, 3, level++)
>>> GenerateTreeLoop( 3, 6, level++)
>>> GenerateTreeLoop( 3, 3, level++)
>>> GenerateTreeLoop( 3, 6, level++)
>>> ....
>>>
>>> where "3" and "6" here are the indexes of the points in the sample.
>>>
>>> -----
>>>
>>> Case BucketSize == 3:
>>>
>>> When using a bucket size of 3, the function GenerateTreeLoop()
>>> is being called only once, but it generates the following tree
>>>
>>> NTN
>>> /\
>>> / \
>>> / \
>>> [(2,3),(4,7)] NTN
>>> /\
>>> / \
>>> / \
>>> [(8,1)] [ (7,2) (5,4) (9,6)]
>>>
>>>
>>> Where "NTN" stands for "non-terminal node".
>>> and the groups with parenthesis "[ ]" represent buckets.
>>>
>>>
>>> That is a first plane dividing the space at X=4.5
>>> One bucket on the left for points with X values < 4.5
>>> and on the right, a second plane dividing the space
>>> at Y = 1.5, to create on bucket on the left with (8,1)
>>> and on the right another bucket with points with Y values
>>> larger than 1.5, namely: [ (7,2) (5,4) (9,6)]
>>>
>>>
>>> -----
>>>
>>> Case BucketSize == 4:
>>>
>>> When using a bucket size of 4 , the function GenerateTreeLoop()
>>> is being called three times, and it generates the following tree
>>>
>>>
>>> NTN
>>> /\
>>> / \
>>> / \
>>> [(2,3),(4,7)] [(8,1) (7,2) (5,4) (9,6)]
>>>
>>>
>>> That is a single plane dividing the space at X=4.5
>>>
>>>
>>> ----------
>>>
>>> In either case,
>>> the partition is not what we could expect from a KdTree...
>>>
>>>
>>> Just to clarify, the real problem seems to be in the
>>> implementation of the algorithm in the KdTreeGenerator,
>>> not in the KdTree class itself...
>>>
>>>
>>>
>>> Luis
>>>
>>>
>>>
>>> -------------------
>>>
>>>
>>> Luis Ibanez wrote:
>>>
>>>> Hi Ali,
>>>>
>>>> Thanks for the additinonal information.
>>>>
>>>> Yeap, the algorithm should work regardless
>>>> of the bucket size. That one was a false alarm.
>>>>
>>>>
>>>> I'm tracking now the cases where itkKdTreeTest2 segfaults
>>>> with a bucket size == 2. Hopefully this will illuminate
>>>> at least part of the problem.
>>>>
>>>>
>>>> Thanks
>>>>
>>>>
>>>> Luis
>>>>
>>>>
>>>> ---------------
>>>> Ali - wrote:
>>>>
>>>>
>>>>
>>>>> Luis,
>>>>>
>>>>> I used a bucket size of 16 with about 100 particles. Like you
>>>>> mentioned,
>>>
>>>
>>> the algorithm must work fine independent of these parameters.
>>>
>>>>>
>>>>> -Ali
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>> Hi Ali,
>>>>>>
>>>>>>
>>>>>> Could you please report
>>>>>>
>>>>>>
>>>>>> 1) How many points (measurement vectors) are you
>>>>>> currently feeding into the KdTreeGenerator ?
>>>>>>
>>>>>> and
>>>>>>
>>>>>>
>>>>>> 2) What value of "BucketSize" did you use ?
>>>>>>
>>>>>>
>>>>>>
>>>>>> The problem reported in the bug
>>>>>> http://public.kitware.com/Bug/view.php?id=5082
>>>>>>
>>>>>> Seems to be simply the result of a poor choice
>>>>>> on the value of the bucket size parameter.
>>>>>> For the test case reported in that bug, increasing
>>>>>> the value from 16 to 30 solves the problem.
>>>>>>
>>>>>> We are wondering if the problem that you are
>>>>>> experiencing is also the result of poor parameter
>>>>>> setting.
>>>>>>
>>>>>>
>>>>>> Please let us know,
>>>>>>
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>>
>>>>>>
>>>>>> Luis
>>>>>>
>>>>>>
>>>>>>
>>>>>> ------------
>>>>>> Ali - wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> A quick search in the mailing list shows that, during the past half
>>>
>>>
>>> a decade, it has been reported many times that the kd-tree classes in
>>> ITK
>>> are buggy. Writing some library based on ITK, it took me a few days
>>> to find
>>> out that the bug in my library is actually introduced by the kd-tree
>>> implementation in ITK which FINDS THE WRONG NEAREST NEIGHBOUR.
>>>
>>>>>>> I have no idea, against many warnings from the users, why the bug
>>>
>>>
>>> has not been resolved yet. In the case it is difficult to address the
>>> bug,
>>> one option is to wrap many other existing implementations, personally I
>>> switched to libkdtree++.
>>>
>>>>>>> _________________________________________________________________
>>>>>>> 100's of prizes to be won at BigSnapSearch.com
>>>
>>>
>>> http://www.bigsnapsearch.com
>>>
>>>>>>> _______________________________________________
>>>>>>> Insight-users mailing list
>>>>>>> Insight-users at itk.org
>>>>>>> http://www.itk.org/mailman/listinfo/insight-users
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>> _________________________________________________________________
>>>>> 100's of prizes to be won at BigSnapSearch.com
>>>
>>>
>>> http://www.bigsnapsearch.com
>>>
>>>>
>>>>
>>> _______________________________________________
>>> Insight-users mailing list
>>> Insight-users at itk.org
>>> http://www.itk.org/mailman/listinfo/insight-users
>>>
>>
>>
>
More information about the Insight-users
mailing list