[vtk-developers] Garbage collection slowness

Berk Geveci berk.geveci at kitware.com
Tue May 13 10:21:43 EDT 2008


Hi Hank,

I think you are on the right track. I have been suspecting the same
thing for a while now but have not had the time to profile it. So I am
glad you are tracking it down :-) I suggest using a real profiler. I
have been very happy with Shark on the Mac.

I had a conversation with Ken Martin this morning about this. If the
garbage collection is the problem, he suggested that we deal with the
reference loop in the pipeline (between producer and consumer) without
the garbage collector. We can probably do that the old-fashioned way
by overriding UnRegister(). This way, when the garbage collector
runs, it will only have to process the other cycles, and that should be
very fast. So the final code may look like:

vtkGarbageCollector::DeferredCollectionPush();
for (int i = 0; i < 20000; ++i)
{
  filt->SetInput(input[i]); // input[i-1] goes away immediately here
  filt->Update();
  output[i] = filt->GetOutput()->NewInstance();
  output[i]->ShallowCopy(filt->GetOutput());
  input[i]->Delete();
}
vtkGarbageCollector::DeferredCollectionPop(); // this should not do much
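The old-fashioned approach Ken and Berk describe is to let UnRegister() notice that the only references left are the two halves of the producer/consumer loop and break it on the spot, so the collector never has to find that cycle. A minimal sketch, assuming two hypothetical classes, `Producer` and `Output`, with a toy intrusive reference count (not vtkObjectBase; the `Register()`/`UnRegister()` names only mirror VTK's, and unlike VTK's they take no argument):

```cpp
#include <cassert>

// Hypothetical intrusive reference counting; a stand-in, NOT vtkObjectBase.
class RefCounted {
public:
  virtual ~RefCounted() = default;
  void Register() { ++refCount_; }
  virtual void UnRegister() {
    assert(refCount_ > 0);
    if (--refCount_ == 0) delete this;
  }
  int GetReferenceCount() const { return refCount_; }

protected:
  int refCount_ = 1;  // the creator owns the first reference
};

int g_live = 0;  // live Producer/Output objects, used to detect leaks

class Producer;

// Data object holding a counted back-pointer to the producer that made it.
class Output : public RefCounted {
public:
  Output() { ++g_live; }
  ~Output() override { --g_live; }
  void UnRegister() override;
  Producer* producer = nullptr;
};

// Producer holding a counted pointer to its output: Producer <-> Output is a loop.
class Producer : public RefCounted {
public:
  Producer() : output_(new Output) {
    ++g_live;
    output_->producer = this;
    Register();  // the output's back-pointer counts as a reference
  }
  ~Producer() override { --g_live; }
  Output* GetOutput() const { return output_; }
  void ForgetOutput() { output_ = nullptr; }
  void UnRegister() override;

private:
  Output* output_;
};

void Producer::UnRegister() {
  // Last outside reference going away while only we hold the output:
  // the pair is garbage, so break the loop by hand.
  if (refCount_ == 2 && output_ && output_->GetReferenceCount() == 1) {
    Output* out = output_;
    output_ = nullptr;
    out->producer = nullptr;
    --refCount_;                    // the back-pointer we just cleared
    out->RefCounted::UnRegister();  // our reference to the output: frees it
  }
  RefCounted::UnRegister();         // the caller's reference
}

void Output::UnRegister() {
  // Symmetric case: the producer is kept alive only by our back-pointer.
  if (refCount_ == 2 && producer && producer->GetReferenceCount() == 1) {
    Producer* p = producer;
    producer = nullptr;
    p->ForgetOutput();
    --refCount_;                    // the producer's reference to us
    p->RefCounted::UnRegister();    // our back-pointer: frees the producer
  }
  RefCounted::UnRegister();         // the caller's reference
}

// Create a producer, hold its output too, then release both in the given order.
int RunDemo(bool releaseProducerFirst) {
  Producer* p = new Producer;  // refs on p: ours + output's back-pointer
  Output* out = p->GetOutput();
  out->Register();             // refs on out: producer's + ours
  if (releaseProducerFirst) {
    p->UnRegister();           // output still held by us: no break yet
    out->UnRegister();         // only the loop remains: both freed here
  } else {
    out->UnRegister();
    p->UnRegister();
  }
  return g_live;               // 0 means nothing leaked
}
```

Either release order frees both objects as soon as the last outside reference is dropped. The check only inspects two reference counts, whereas a general cycle collector has to find the loop by walking references.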

-berk

On Tue, May 13, 2008 at 9:47 AM, Hank Childs <childs3 at llnl.gov> wrote:
>
>  Hi Berk & Brad,
>
>  Thanks very much for your responses; your interest is much appreciated.
>
>  To answer one of Brad's questions, I am having trouble reproducing the
> problem with a minimal test program, which probably means something.  I will
> continue working on this.
>
>  I got interested in the garbage collector because I was doing "poor man's
> profiling" (see footnote [1] below for more explanation) and I kept
> observing that garbage collection was dominating.
>
>  To answer Berk's question, yes, I was unclear.  Here is the general setup:
> We are very memory conscious, so we dereference the inputs to a filter after
> the filter has executed (see footnote [2] below for more explanation).  So:
>
>  for (int i = 0; i < 20000; ++i)
>  {
>    filt->SetInput(input[i]);
>    filt->Update();
>    output[i] = filt->GetOutput()->NewInstance();
>    output[i]->ShallowCopy(filt->GetOutput());
>  }
>
>  for (int i = 0; i < 20000; ++i)
>    input[i]->Delete();
>
>  I found that the Delete() calls were taking a huge amount of time.  So I
> wrapped them with:
>
>  vtkGarbageCollector::DeferredCollectionPush();
>  for (int i = 0; i < 20000; ++i)
>    input[i]->Delete();
>  vtkGarbageCollector::DeferredCollectionPop();
>
>  and found that that improved the situation (supporting Brad's claim that it
> should be faster), but it was still taking a long time.  I have also seen
> variation in the run times, which suggests that the OS may play a role here
> too, likely through lots of small allocations and deallocations.
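Deferring collection batches the work rather than removing it. A hypothetical, much-simplified model (not vtkGarbageCollector's implementation; the real calls are vtkGarbageCollector::DeferredCollectionPush() and DeferredCollectionPop()) shows the shape: while the push depth is positive, candidates are only queued, and every queued check runs when the outermost Pop() brings the depth back to zero. That matches where the thread sees the time go. The `DeferScope` RAII guard is an illustrative addition that keeps pushes and pops balanced on early returns.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical, much-simplified model of depth-counted deferred collection.
// It is NOT vtkGarbageCollector's implementation; it only shows when work runs.
class DeferredCollector {
public:
  void Push() { ++depth_; }

  void Pop() {
    assert(depth_ > 0 && "Pop() without a matching Push()");
    if (--depth_ == 0) {
      // The outermost Pop() pays for everything that was deferred.
      for (const auto& check : pending_) Run(check);
      pending_.clear();
    }
  }

  // Called when an object might now be garbage (e.g. part of a reference loop).
  void Candidate(std::function<void()> check) {
    if (depth_ > 0)
      pending_.push_back(std::move(check));  // deferred: just remember it
    else
      Run(check);                            // not deferred: check right away
  }

  std::size_t Pending() const { return pending_.size(); }
  std::size_t ChecksRun() const { return checksRun_; }

private:
  void Run(const std::function<void()>& check) {
    check();
    ++checksRun_;
  }

  int depth_ = 0;
  std::size_t checksRun_ = 0;
  std::vector<std::function<void()>> pending_;
};

// Illustrative RAII guard: keeps Push()/Pop() balanced even on early return.
class DeferScope {
public:
  explicit DeferScope(DeferredCollector& gc) : gc_(gc) { gc_.Push(); }
  ~DeferScope() { gc_.Pop(); }
  DeferScope(const DeferScope&) = delete;
  DeferScope& operator=(const DeferScope&) = delete;

private:
  DeferredCollector& gc_;
};

// Mirrors the loop above: n deletions inside one Push/Pop pair.
std::size_t DeferredDemo(std::size_t n, std::size_t* pendingBeforePop) {
  DeferredCollector gc;
  std::size_t processed = 0;
  {
    DeferScope scope(gc);
    for (std::size_t i = 0; i < n; ++i)
      gc.Candidate([&processed] { ++processed; });
    *pendingBeforePop = gc.Pending();  // queued, but nothing checked yet
    assert(processed == 0);
  }  // ~DeferScope -> Pop(): all n checks run here, in one batch
  return processed;
}
```

In this model Push/Pop saves work only if the checks themselves get cheaper in a batch; if each check is expensive, the total cost simply moves to Pop().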
>
>  I don't think that I was very clear in my last email, so I want to try
> again.  My only evidence against garbage collection is that my poor man's
> profiling shows that my program is doing garbage collection the majority of
> the time.
>
>  Instead of raising the garbage collection issue, the tack I should have
> taken is to ask why it was taking 47s to delete the output when creating it,
> including executing the filters, took only 20s.  Of course, I would
> greatly help my cause here if I could get a simple reproducer that people
> could sink their teeth into.  So, again, I'll continue pursuing that...
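One way to start on a reproducer is to time the create and delete phases separately without VTK at all, to test the small-allocation hypothesis in isolation. In this hypothetical sketch each `Chunk` is just many small heap allocations; the names and sizes are illustrative placeholders, not a model of any VTK data set. If the delete phase is already much slower than the create phase here, the allocator or OS is a likely suspect; if not, the extra time is more likely in the object teardown itself.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for one chunk: many small, separately allocated pieces.
struct Chunk {
  std::vector<std::unique_ptr<int[]>> pieces;
};

struct PhaseTimes {
  double createSeconds = 0.0;
  double deleteSeconds = 0.0;
};

// Times building numChunks chunks, then tearing them all down in a second
// pass (the same two-phase shape as the loops above).
PhaseTimes TimeCreateAndDelete(std::size_t numChunks, std::size_t piecesPerChunk) {
  using Clock = std::chrono::steady_clock;
  std::vector<std::unique_ptr<Chunk>> chunks;
  chunks.reserve(numChunks);

  const auto start = Clock::now();
  for (std::size_t i = 0; i < numChunks; ++i) {
    auto chunk = std::make_unique<Chunk>();
    chunk->pieces.reserve(piecesPerChunk);
    for (std::size_t j = 0; j < piecesPerChunk; ++j)
      chunk->pieces.push_back(std::make_unique<int[]>(8));  // one small allocation
    chunks.push_back(std::move(chunk));
  }
  const auto created = Clock::now();

  chunks.clear();  // frees every chunk and all of its pieces
  const auto deleted = Clock::now();
  assert(chunks.empty());

  PhaseTimes t;
  t.createSeconds = std::chrono::duration<double>(created - start).count();
  t.deleteSeconds = std::chrono::duration<double>(deleted - created).count();
  return t;
}
```

For example, `TimeCreateAndDelete(20000, 64)` roughly matches the chunk count in this thread; the piece count and size would need tuning toward the real data.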
>
>  Best,
>  Hank
>
>  [1] poor man's profiling means periodically attaching a debugger and
> seeing where the work is happening ... I have had problems getting
> profilers to run on big software projects and I find this to be somewhat
> effective.
>
>  [2] As you all know, there is a tradeoff between reusing cached results
> (what VTK does by default) and keeping memory low (what I am doing
> manually).  Of course, VTK does a good job of minimizing the overhead for
> reusing cached results by often sharing references between input and output.
> Regardless, there is often memory associated with the input that is not
> needed in the output.  For what I'm working on, harvesting that memory is
> worthwhile.  Also, I mitigate the loss of reusing cached results somewhat by
> keeping a cache for all I/O (... and I have found that I/O is often the
> bottleneck).
>
>
>
>  On May 13, 2008, at 6:15 AM, Berk Geveci wrote:
>
>
> > Hi Hank,
> >
> > Where is the big loop over 20000 items happening? Around the Push/Pop
> > or inside them?
> >
> > -berk
> >
> > On Mon, May 12, 2008 at 6:35 PM, Hank Childs <childs3 at llnl.gov> wrote:
> >
> > >
> > >  Hello VTK Developers!
> > >
> > >  I am running in serial and am setting up about 20000 pipelines on my
> > > serial process for about 20000 chunks of data.
> > >
> > >  The runtime has gotten disproportionately large with the large number of
> > > chunks, and I believe that garbage collection is at least partly to blame.
> > >
> > >  For example, if I:
> > >  1) call vtkGarbageCollector::DeferredCollectionPush(),
> > >  2) execute three filters (filters that find external faces and remove
> > >     ghost data), and
> > >  3) call vtkGarbageCollector::DeferredCollectionPop(),
> > >
> > >  then the three filters take about 20s total and the
> > > DeferredCollectionPop() takes about 47s.
> > >
> > >  One conclusion that I drew from the fast execution of the three filters
> > > is that iterating through the data is relatively quick.  Restated, I ruled
> > > out thrashing through memory as the reason the garbage collector is taking
> > > 47s.
> > >
> > >  Also, I should disclose that I am managing the execution manually.  The
> > > best way to describe it would be that I have one instance of filter A, one
> > > instance of filter B, and one instance of filter C, and that I route all
> > > 20K data sets through filter A to make 20K new data sets, then route those
> > > 20K new data sets through B, and so on.  Also, I know that the alternative
> > > is to call "Update()" 20K times, one for each chunk, but I'd prefer not to
> > > go down that route, for reasons I can explain if necessary.
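The breadth-first routing described above (one instance per filter, every chunk through A, then every result through B, and so on) can be sketched without VTK. In this hypothetical version `Data`, `Stage`, and `RunStages` are illustrative names, `std::shared_ptr` stands in for VTK reference counting, and each input is released as soon as its stage output exists, which is the eager release Berk's loop aims for rather than a separate deletion pass.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical chunk payload; a real pipeline would hold VTK data sets.
struct Data {
  std::vector<double> values;
};
using DataPtr = std::shared_ptr<const Data>;

// One reusable "filter instance" per stage, like filters A, B and C.
using Stage = std::function<DataPtr(const Data&)>;

// Breadth-first execution: every chunk goes through stage A, then every
// A output goes through stage B, and so on. Each input reference is dropped
// as soon as its output exists, so the input is freed right away if nobody
// else holds it.
std::vector<DataPtr> RunStages(std::vector<DataPtr> chunks,
                               const std::vector<Stage>& stages) {
  for (const Stage& stage : stages) {
    std::vector<DataPtr> next;
    next.reserve(chunks.size());
    for (DataPtr& in : chunks) {
      assert(in && "null chunk");
      next.push_back(stage(*in));
      in.reset();  // release this pipeline's reference to the input now
    }
    chunks = std::move(next);
  }
  return chunks;
}
```

Taking `chunks` by value lets a caller pass it with `std::move`, giving up its own references so the inputs can actually be freed stage by stage.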
> > >
> > >  So: can anyone point me to some words of wisdom about a way to manage my
> > > data objects so that garbage collection is faster?
> > >
> > >  Best regards,
> > >  Hank
> > >
> > >
> >
>
>


