[vtk-developers] Garbage collection slowness
Clinton Stimpson
clinton at elemtech.com
Tue May 13 10:39:35 EDT 2008
Could this also be related to some garbage-collection-related performance
issues I've seen in the vtkPainter architecture when rendering a
single multiblock data set with thousands of polydatas? Adding a
push/pop around the whole render process speeds it up some.
Clint
Berk Geveci wrote:
> Hi Hank,
>
> I think you are on the right track. I have been suspecting the same
> thing for a while now but have not had the time to profile it. So I am
> glad you are tracking it down :-) I suggest using a real profiler. I
> have been very happy with Shark on the Mac.
>
> I had a conversation with Ken Martin this morning about this. If the
> garbage collection is the problem, he suggested that we deal with the
> reference loop in the pipeline (between producer and consumer) without
> the garbage collector. We can probably do that the old-fashioned way
> by overriding Unregister(). That way, when the garbage collector
> runs, it will only have to process the other cycles, and that should be
> very fast. So the final code may look like:
>
> vtkGarbageCollector::DeferredCollectionPush();
> for (int i = 0; i < 20000; ++i)
>   {
>   filt->SetInput(input[i]); // input[i-1] goes away immediately here
>   filt->Update();
>   output[i] = filt->GetOutput()->NewInstance();
>   output[i]->ShallowCopy(filt->GetOutput());
>   input[i]->Delete();
>   }
> vtkGarbageCollector::DeferredCollectionPop(); // this should not do much
>
> -berk
>
> On Tue, May 13, 2008 at 9:47 AM, Hank Childs <childs3 at llnl.gov> wrote:
>
>> Hi Berk & Brad,
>>
>> Thanks very much for your responses; your interest is much appreciated.
>>
>> To answer one of Brad's questions: I am having trouble reproducing the
>> problem with a minimal test program, which probably means something. I
>> will continue working on this.
>>
>> I got interested in the garbage collector because I was doing "poor man's
>> profiling" (see footnote [1] below for more explanation) and I kept
>> observing that garbage collection was dominating.
>>
>> To answer Berk's question, yes, I was unclear. Here is the general setup:
>> We are very memory conscious, so we dereference the inputs to a filter after
>> the filter has executed (see footnote [2] below for more explanation). So:
>>
>> for (int i = 0; i < 20000; ++i)
>>   {
>>   filt->SetInput(input[i]);
>>   filt->Update();
>>   output[i] = filt->GetOutput()->NewInstance();
>>   output[i]->ShallowCopy(filt->GetOutput());
>>   }
>>
>> for (int i = 0; i < 20000; ++i)
>>   input[i]->Delete();
>>
>> I found that the Delete() calls were taking a huge amount of time. So I
>> wrapped them with:
>>
>> vtkGarbageCollector::DeferredCollectionPush();
>> for (int i = 0; i < 20000; ++i)
>>   input[i]->Delete();
>> vtkGarbageCollector::DeferredCollectionPop();
>>
>> and found that this improved the situation (supporting Brad's claim that
>> it should be faster), but it was still taking a long time. I have also
>> seen variation in the run times, which suggests that my OS may play a role
>> here too, likely because of lots of small allocations and deallocations.
>>
>> I don't think that I was very clear in my last email, so I want to try
>> again. My only evidence against garbage collection is that my poor man's
>> profiling shows that my program is doing garbage collection the majority of
>> the time.
>>
>> Instead of raising the garbage collection issue, the tack I should have
>> taken is to ask why it took 47s to delete the output when it took only
>> 20s to create it and execute the filters. Of course, it would greatly
>> help my cause if I could produce a simple reproducer that people could
>> sink their teeth into. So, again, I'll continue pursuing that...
>>
>> Best,
>> Hank
>>
>> [1] "Poor man's profiling" means attaching a debugger at regular
>> intervals and seeing where the work is happening ... I have had problems
>> getting profilers to run on big software projects, and I find this
>> approach to be somewhat effective.
>>
>> [2] As you all know, there is a tradeoff between reusing cached results
>> (what VTK does by default) and keeping memory low (what I am doing
>> manually). Of course, VTK does a good job of minimizing the overhead for
>> reusing cached results by often sharing references between input and output.
>> Regardless, there is often memory associated with the input that is not
>> needed in the output. For what I'm working on, harvesting that memory is
>> worthwhile. Also, I mitigate the loss of reusing cached results somewhat by
>> keeping a cache for all I/O (... and I have found that I/O is often the
>> bottleneck).
>>
>>
>>
>> On May 13, 2008, at 6:15 AM, Berk Geveci wrote:
>>
>>
>>
>>> Hi Hank,
>>>
>>> Where is the big loop over 20000 items happening? Around the Push/Pop
>>> or inside them?
>>>
>>> -berk
>>>
>>> On Mon, May 12, 2008 at 6:35 PM, Hank Childs <childs3 at llnl.gov> wrote:
>>>
>>>
>>>> Hello VTK Developers!
>>>>
>>>> I am running in serial and am setting up about 20000 pipelines on my
>>>> serial process for about 20000 chunks of data.
>>>>
>>>> The runtime has gotten disproportionately large with the large number
>>>> of chunks, and I believe that garbage collection is at least partly to
>>>> blame.
>>>>
>>>> For example, if I:
>>>> 1) call vtkGarbageCollector::DeferredCollectionPush()
>>>> 2) execute three filters (filters that find external faces and remove
>>>> ghost data) and,
>>>> 3) call vtkGarbageCollector::DeferredCollectionPop()
>>>>
>>>> then: the three filters take about 20s total and the
>>>> DeferredCollectionPop takes about 47s.
>>>>
>>>> One conclusion that I drew from the fast execution of the three
>>>> filters is that iterating through the data is relatively quick.
>>>> Restated, I ruled out thrashing through memory as the reason the
>>>> garbage collector is taking 47s.
>>>>
>>>> Also, I should disclose that I am managing the execution manually. The
>>>> best way to describe it would be that I have one instance of filter A,
>>>> one instance of filter B, and one instance of filter C, and that I
>>>> route all 20K data sets through filter A to make 20K new data sets,
>>>> then route those 20K new data sets through B, and so on. Also, I know
>>>> that the alternative is to call "Update()" 20K times, once for each
>>>> chunk, but I'd prefer not to go down that route, for reasons I can
>>>> explain if necessary.
>>>>
>>>> So: can anyone point me to some words of wisdom about a way to manage
>>>> my data objects so that garbage collection is faster?
>>>>
>>>> Best regards,
>>>> Hank
>>>> _______________________________________________
>>>> vtk-developers mailing list
>>>> vtk-developers at vtk.org
>>>> http://www.vtk.org/mailman/listinfo/vtk-developers
>>