[vtk-developers] Garbage collection slowness
Clinton Stimpson
clinton at elemtech.com
Tue May 13 10:39:35 EDT 2008
Could this also be related to some garbage-collection-related performance
issues I've seen in the vtkPainter architecture when rendering a
single multiblock data set with thousands of polydatas? Adding a
push/pop around the whole render process speeds it up some.
Clint
Berk Geveci wrote:
> Hi Hank,
>
> I think you are on the right track. I have been suspecting the same
> thing for a while now but have not had the time to profile it. So I am
> glad you are tracking it down :-) I suggest using a real profiler. I
> have been very happy with Shark on the Mac.
>
> I had a conversation with Ken Martin this morning about this. If the
> garbage collection is the problem, he suggested that we deal with the
> reference loop in the pipeline (between producer and consumer) without
> the garbage collector. We can probably do that the old-fashioned way
> by overriding Unregister(). That way, when the garbage collector
> runs, it will only have to process the other cycles, and that should be
> very fast. So the final code may look like:
>
> vtkGarbageCollector::DeferredCollectionPush();
> for (int i = 0; i < 20000; ++i)
>   {
>   filt->SetInput(input[i]); // input[i-1] goes away immediately here
>   filt->Update();
>   output[i] = filt->GetOutput()->NewInstance();
>   output[i]->ShallowCopy(filt->GetOutput());
>   input[i]->Delete();
>   }
> vtkGarbageCollector::DeferredCollectionPop(); // this should not do much
>
> -berk
>
> On Tue, May 13, 2008 at 9:47 AM, Hank Childs <childs3 at llnl.gov> wrote:
>
>> Hi Berk & Brad,
>>
>> Thanks very much for your responses; your interest is much appreciated.
>>
>> To answer one of Brad's questions: I am having trouble reproducing the
>> problem with a minimal test program, which probably means something. I
>> will continue working on this.
>>
>> I got interested in the garbage collector because I was doing "poor man's
>> profiling" (see footnote [1] below for more explanation) and I kept
>> observing that garbage collection was dominating.
>>
>> To answer Berk's question, yes, I was unclear. Here is the general setup:
>> We are very memory conscious, so we dereference the inputs to a filter after
>> the filter has executed (see footnote [2] below for more explanation). So:
>>
>> for (int i = 0; i < 20000; ++i)
>>   {
>>   filt->SetInput(input[i]);
>>   filt->Update();
>>   output[i] = filt->GetOutput()->NewInstance();
>>   output[i]->ShallowCopy(filt->GetOutput());
>>   }
>>
>> for (int i = 0; i < 20000; ++i)
>>   input[i]->Delete();
>>
>> I found that the Delete() calls were taking a huge amount of time. So I
>> wrapped them with:
>>
>> vtkGarbageCollector::DeferredCollectionPush();
>> for (int i = 0; i < 20000; ++i)
>>   input[i]->Delete();
>> vtkGarbageCollector::DeferredCollectionPop();
>>
>> and found that this improved the situation (supporting Brad's claim that
>> it should be faster), but it was still taking a long time. I have also
>> seen variation in the run times, which suggests that my OS may play a role
>> here too, likely because of lots of small allocations and deallocations.
>>
>> I don't think that I was very clear in my last email, so I want to try
>> again. My only evidence against garbage collection is that my poor man's
>> profiling shows that my program is doing garbage collection the majority of
>> the time.
>>
>> Instead of raising the garbage collection issue, the tack I should have
>> taken is to ask why it took 47s to delete the output when it took only
>> 20s to create it and execute the filters. Of course, it would greatly
>> help my cause if I could produce a simple reproducer that people could
>> sink their teeth into. So, again, I'll continue pursuing that...
>>
>> Best,
>> Hank
>>
>> [1] "Poor man's profiling" means attaching a debugger at regular
>> intervals and seeing where the work is happening ... I have had problems
>> getting profilers to run on big software projects, and I find this
>> approach to be somewhat effective.
>>
>> [2] As you all know, there is a tradeoff between reusing cached results
>> (what VTK does by default) and keeping memory low (what I am doing
>> manually). Of course, VTK does a good job of minimizing the overhead for
>> reusing cached results by often sharing references between input and output.
>> Regardless, there is often memory associated with the input that is not
>> needed in the output. For what I'm working on, harvesting that memory is
>> worthwhile. Also, I mitigate the loss of reusing cached results somewhat by
>> keeping a cache for all I/O (... and I have found that I/O is often the
>> bottleneck).
>>
>>
>>
>> On May 13, 2008, at 6:15 AM, Berk Geveci wrote:
>>
>>
>>
>>> Hi Hank,
>>>
>>> Where is the big loop over 20000 items happening? Around the Push/Pop
>>> or inside them?
>>>
>>> -berk
>>>
>>> On Mon, May 12, 2008 at 6:35 PM, Hank Childs <childs3 at llnl.gov> wrote:
>>>
>>>
>>>> Hello VTK Developers!
>>>>
>>>> I am running in serial and am setting up about 20000 pipelines on my
>>>> serial process for about 20000 chunks of data.
>>>>
>>>> The runtime has gotten disproportionately large with the large number
>>>> of chunks, and I believe that garbage collection is at least partly to
>>>> blame.
>>>>
>>>> For example, if I:
>>>> 1) call vtkGarbageCollector::DeferredCollectionPush()
>>>> 2) execute three filters (filters that find external faces and remove
>>>> ghost data) and,
>>>> 3) call vtkGarbageCollector::DeferredCollectionPop()
>>>>
>>>> then: the three filters take about 20s total and the
>>>> DeferredCollectionPop takes about 47s.
>>>>
>>>> One conclusion that I drew from the fast execution of the three
>>>> filters is that iterating through the data is relatively quick.
>>>> Restated, I ruled out thrashing through memory as the reason the
>>>> garbage collector is taking 47s.
>>>>
>>>> Also, I should disclose that I am managing the execution manually. The
>>>> best way to describe it would be that I have one instance of filter A,
>>>> one instance of filter B, and one instance of filter C, and that I
>>>> route all 20K data sets through filter A to make 20K new data sets,
>>>> then route those 20K new data sets through B, and so on. Also, I know
>>>> that the alternative is to call "Update()" 20K times, once for each
>>>> chunk, but I'd prefer not to go down that route, for reasons I can
>>>> explain if necessary.
>>>>
>>>> So: can anyone point me to some words of wisdom about a way to manage
>>>> my data objects so that garbage collection is faster?
>>>>
>>>> Best regards,
>>>> Hank
>>>> _______________________________________________
>>>> vtk-developers mailing list
>>>> vtk-developers at vtk.org
>>>> http://www.vtk.org/mailman/listinfo/vtk-developers
>>