[vtk-developers] zero-copy mixed language support in vtkDataArray

Burlen Loring burlen.loring at gmail.com
Thu Jan 23 14:56:36 EST 2014


Hi Berk,

I'm happy that you're working on this too! I think it's an important, 
language agnostic, issue for interfacing to VTK in general.

Assuming that your solution is perfect, one downside I see is that its 
very python specific. I think that the benefit to adding a 
PointerFreeEvent that users could respond to would solve the memory 
management issues associated with passing data to VTK by pointer in a 
very general language agnostic way. It also fits nicely in VTK's 
event/observer pattern, it's a small change requiring no new api, and 
the way I implemented it doesn't impact performance for those not 
passing pointers through SetArray api. Those are my main points.

> My opinion is that passing ownership of an array allocated in a 
> different code to VTK is usually a dangerous thing - the developers of 
> the other code can easily add a deallocation method somewhere without 
> noticing that VTK owns it. They are usually not aware of what VTK 
> does, specially in in situ type application.
I'd say that it's a dangerous thing given the way VTK is currently 
implemented. I don't think it has to be though. The issue is that doing 
this safely will require some coordination between VTK and the owner of 
the memory. Currently there's no way for the owner to know when VTK is 
finished with the data and it's safe to deallocate. Adding 
PointerFreeEvent, which could be used to alert the owner of the memory 
that VTK is finished and the memory could be safely deallocated, 
provides the path for the coordination needed to resolve this issue.

For example, the PointerFreeEvent and event/observer pattern allows VTK 
to interface to any arbitrary reference counting implementation through 
a small piece of user provided glue code, namely their specific 
implementation of vtkCommand. The python numpy case makes a great 
illustration of this, and I wrote a small application to accompany the 
VTK patch for the purposes of this discussion. eg see 
vtkPointerFreeEventObserver in vtkTestZeroCopyPython.cxx and its use in 
the addScalar function exposed to python. 
https://github.com/burlen/TestZeroCopyPython/tree/vtk-command-callback

Here's how I see this in detail:

* internal to VTK what makes passing data directly by pointer dangerous 
is that the memory backing the data can disappear/go out of scope/etc 
before VTK is done with it. external to VTK, aside from some very simple 
cases, it's not easy to know when VTK is finished with the data and thus 
safe to deallocate. These issues are language agnostic. Leveraging VTK's 
event/observer pattern with a new PointerFreeEvent provides a language 
agnostic notification path allowing these issues to be managed effectively.

* Another factor of importance when dealing with large data is that you 
want to release what ever data you can as soon as you can to reduce 
memory pressure. You don't want to have large vtk data arrays around for 
the life of the application when they're not needed. When passing a 
pointer in to VTK, the underlying data should be release in response to 
3 events 1) the vtk array is resized, 2) the vtk array is deleted, 3) 
SetArray is called again with a new pointer. The new PointerFreeEvent, 
invoked at these 3 spots, allows the owner to deallocate at the right 
time, and not sooner, in a language agnostic way.

* the owner of the data passed may not have direct access to VTK objects 
or even know about VTK. eg an in-situ library that doesn't expose VTK 
objects in its api. The event/observer pattern with the new 
PointerFreeEvent provides a flexible path way for the owner to be 
notified that VTK is finished. It's easy to use this in the glue code 
hiding VTK's implementation from the owner.

* I also think that in the python numpy case, VTK should manage the py 
object ref count invisible to the user in it's glue code so that it's 
inherently safe no matter what the user does, or if the numpy obect goes 
out of scope. I've been looking at VTK's python wrappings and this and 
would be straight forward to implement using an observer to the new 
PointerFreeEvent (that's potentially the topic of a follow up patch 
assuming this ever gets accepted).

I'm not familiar with your PV solution, but for now I'll assume that it 
addresses all of the above correctly. however these issues are not 
specific to python. Given that the event/observer pattern and 
PointerFreeEvent is a small change adding no new api, doesn't impact 
performance, and would solve the issue in a language agnostic manner, 
would it not be a better choice?

> I am a little confused about why this is needed for Fortran arrays. 
> Why couldn't you set VTK to not delete the array and leave it to the 
> owner of the array to take care of deletion?
the primary issue is that, in non-trivial cases, the owner doesn't know 
when VTK is finished with the data. This issue is not limited to python 
and fortran, it's an issue any time data is passed to VTK by pointer and 
should not be free'd with free or delete[]. One could even imagine a 
high performance c++ app internally managing a pool of memory on its own 
where free/delete[] would not be used yet memory passed to VTK would 
still need to be reclaimed in a timely manner.

> I already have this working in ParaView's Python using observers and 
> the fact that a Python observer holds on to the function object, which 
> though a closure can hold a reference to the array. This works as long 
> as Python is finalized after the VTK array is deleted.
Cool. Where can I find the implementation? In your solution what happens 
when the array is resized? what happens when a new pointer is passed 
through SetVoidArray? Your last statement makes me worry that data may 
be held in memory until python is finalized, if so that would be a show 
stopper, as we have a large number of very large arrays being exchanged 
over a potentially long run time and need them to be released as quickly 
as possible.Even if your solution were otherwise perfect, I still think 
a non-python specific solution would be better.

Burlen

On 01/23/2014 05:49 AM, Berk Geveci wrote:
> Hey guys,
>
> Please let me take a look at this before merging it. I have already 
> done similar things. In fact, this is not needed for Python. I already 
> have this working in ParaView's Python using observers and the fact 
> that a Python observer holds on to the function object, which though a 
> closure can hold a reference to the array. This works as long as 
> Python is finalized after the VTK array is deleted.
>
> Also, I am a little confused about why this is needed for Fortran 
> arrays. Why couldn't you set VTK to not delete the array and leave it 
> to the owner of the array to take care of deletion? My opinion is that 
> passing ownership of an array allocated in a different code to VTK is 
> usually a dangerous thing - the developers of the other code can 
> easily add a deallocation method somewhere without noticing that VTK 
> owns it. They are usually not aware of what VTK does, specially in in 
> situ type application.
>
> Best,
> -berk
>
>
>
> On Wed, Jan 22, 2014 at 6:00 PM, Burlen Loring 
> <burlen.loring at gmail.com <mailto:burlen.loring at gmail.com>> wrote:
>
>     Hi Guys,
>
>     I've refactored the patch as David suggested adding
>     vtkCommand::PointerFreeEvent which is fired when the pointer
>     passed to SetArray is no longer needed and the delete method is
>     VTK_DATA_ARRAY_CALLBACK. Would you mind taking another look at it?
>     http://review.source.kitware.com/#/c/14072/5
>
>
>     Burlen
>
>     On 01/19/2014 03:43 PM, David Gobbi wrote:
>
>         I meant vtkCommand, not vtkCallback.  Invoke a FreeEvent and
>         catch it
>         on the other side with a vtkCommand.
>
>         On Sun, Jan 19, 2014 at 4:41 PM, David Gobbi
>         <david.gobbi at gmail.com <mailto:david.gobbi at gmail.com>> wrote:
>
>             Hi Burlen,
>
>             If that trick won't work, then maybe the easiest thing is
>             to make the
>             vtkDataArray generate a specific event (e.g. create a new
>             event called
>             a FreeEvent) that is called when the memory is freed.
>              That way you
>             are taking advantage of the existing vtkCallback support,
>             which is
>             already wrapped by all of the wrapper languages.
>
>                David
>
>             On Sun, Jan 19, 2014 at 4:22 PM, Burlen Loring
>             <burlen.loring at gmail.com <mailto:burlen.loring at gmail.com>>
>             wrote:
>
>                 thanks, that's a neat trick!
>
>                 it certainly keeps one safe : the numpy object is
>                 around while the array is.
>                 I don't think that this will do the right thing if the
>                 vtk array is resized,
>                 or if SetVoidPointer is called again. In both cases,
>                 the vtk data array is
>                 still alive and well, but the numpy array's ref count
>                 should be decremented.
>
>                 a python only solution wont work for me here, because
>                 I'm working in the
>                 C/C++ "glue layer" sitting in between the python
>                 app(s) (which aren't mine)
>                 and a legacy VTK based app. VTK objects aren't used in
>                 the app's python API
>                 at all,  only lists, tuples, and I'm adding support
>                 for numpy arrays. Using
>                 VTK objects in the API is not an option.
>
>
>                 On 1/19/2014 8:43 AM, David Gobbi wrote:
>
>                     Hi Burlen,
>
>                     There might be a simpler way to get VTK to free
>                     the numpy array,
>                     that won't require any changes to the VTK code at
>                     all: just add
>                     the numpy array to the VTK array as an attribute.
>                      Here is some
>                     example code:
>
>                     == BEGIN EXAMPLE ==
>
>                     # An example of using numpy arrays in VTK
>                     import vtk
>                     import numpy
>                     import weakref
>                     import gc
>
>                     # Use a numpy array as a VTK array
>                     z = numpy.arange(0,10,1,float)
>                     a = vtk.vtkDoubleArray()
>                     a.SetVoidArray(z,10,1)
>
>                     # Make the VTK array track the numpy array
>                     a.array = z
>
>                     # Create weakrefs to check for deletion
>                     rz = weakref.ref(z)
>                     ra = weakref.ref(a)
>
>                     # Decref the numpy array, it won't be deleted yet
>                     del z
>                     gc.collect()
>                     if rz() == None:
>                           print "numpy array is deleted (#1)"
>
>                     # Decref the VTK array, numpy array will be decref'd
>                     del a
>                     gc.collect()
>                     if rz() == None:
>                           print "numpy array is deleted (#2)"
>
>                     == END EXAMPLE ==
>
>                     This works because the attributes of a VTK-Python
>                     object are kept
>                     alive for as long as the corresponding VTK-C++
>                     object exists.
>
>                         David
>
>                     On Sun, Jan 19, 2014 at 12:09 AM, Burlen Loring
>                     <burlen.loring at gmail.com
>                     <mailto:burlen.loring at gmail.com>>
>                     wrote:
>
>                         To illustrate the callback in action I wrote
>                         an example where data is
>                         transferred to VTK from numpy ndarrays. I hope
>                         this will help. README
>                         file
>                         explains the example.
>                         https://github.com/burlen/TestZeroCopyPython
>
>
>                                 OK. as fas as VTK is concerned
>                                 callbackData is intended to be a key
>                                 for use by the callback, and VTK does
>                                 nothing with it beyond passing
>                                 it to the callback. I'll add a note
>                                 about this to clarify.
>
>                             Thanks. The problem with callback context
>                             ownership is that if it's a
>                             one-time, the callback "owns" the context,
>                             but if it's a notification
>                             kind of thing, the *notifier* owns the
>                             context (since the callback may
>                             never be called) and needs an additional
>                             function to free the context.
>
>                         I'm not following your complaint here. We're
>                         concerned about a pointer to
>                         some array that must be kept alive while VTK
>                         uses it but must be released
>                         when VTK is done. In this situation VTK "no
>                         longer needs the data" only
>                         ever
>                         once. and just as VTK always will call free
>                         when the pointer is passed in
>                         through the existing SetArray w/
>                         VTK_DATA_ARRAY_FREE flag. VTK will
>                         always
>                         call the callback, either when the
>                         vtkDataArray is resized or when it is
>                         Delete'd. The callback and callback data are
>                         pointers, as far as VTK is
>                         concerned they're just values, and VTK need
>                         not worry about free'ing
>                         them.
>                         Note that this doesn't prevent one from
>                         passing an instance of an
>                         arbitrary
>                         class in callbackData. In that case the
>                         instance should be deleted in the
>                         callback.
>
>
>                                 Works for me, although I'm not sure
>                                 NULL callbackData is ever
>                                 valid/usefull.
>
>                             As an example, if you have a LocalFree or
>                             whatever on Windows, you just
>                             need the array parameter, so NULL
>                             callbackData is needed.
>
>                         NULL callbackData would generally not work
>                         out. the callback needs
>                         non-NULL
>                         callbackData passed to it in order to identify
>                         the memory to free. In
>                         your
>                         example callbackData is the array pointer
>                         itself, not NULL. This is not a
>                         problem.
>
>                                 Where array
>                             and callbackData might be needed is
>                             something like where you need the
>                             array, but callbackData is the arena to
>                             free it from. Currently, you
>                             only get one or the other.
>
>                         actually I think currently you could get both
>                         without issue. don't forget
>                         callbackData is very flexible, it can be
>                         (point to) anything. In the case
>                         you're describing: write a class containing
>                         the arena and the array. pass
>                         a
>                         heap allocated instance of your class as
>                         callbackData when you call
>                         SetArray. When VTK calls the callback, in
>                         addition to releasing the array
>                         data, delete the instance to your class.
>
>                         Would you be happier with a polymorphic
>                         reference counted functor, rather
>                         than a callback? Initially I thought this
>                         would be overkill, but I'd be
>                         willing to go that way if people are against
>                         the callback idea.
>
>                         Burlen
>
>
>                         On 01/17/2014 09:19 PM, Ben Boeckel wrote:
>
>                             On Fri, Jan 17, 2014 at 15:21:42 -0800,
>                             Burlen Loring wrote:
>
>                                 Yes, my intention is that for now
>                                 they'd be used from glue code
>                                 rather than exposed directly. In the
>                                 future the API could be exposed
>                                 in the wrapped languages if there were
>                                 a compelling use case for it.
>                                 Until then it's not worth the effort
>                                 since I doubt VTK wrapping codes
>                                 would handle it correctly.
>
>                             That's valid; just making sure :) .
>
>                                 OK. as fas as VTK is concerned
>                                 callbackData is intended to be a key
>                                 for use by the callback, and VTK does
>                                 nothing with it beyond passing
>                                 it to the callback. I'll add a note
>                                 about this to clarify.
>
>                             Thanks. The problem with callback context
>                             ownership is that if it's a
>                             one-time, the callback "owns" the context,
>                             but if it's a notification
>                             kind of thing, the *notifier* owns the
>                             context (since the callback may
>                             never be called) and needs an additional
>                             function to free the context.
>
>                                 Works for me, although I'm not sure
>                                 NULL callbackData is ever
>                                 valid/usefull.
>
>                             As an example, if you have a LocalFree or
>                             whatever on Windows, you just
>                             need the array parameter, so NULL
>                             callbackData is needed. Where array
>                             and callbackData might be needed is
>                             something like where you need the
>                             array, but callbackData is the arena to
>                             free it from. Currently, you
>                             only get one or the other.
>
>                             --Ben
>
>
>     _______________________________________________
>     Powered by www.kitware.com <http://www.kitware.com>
>
>     Visit other Kitware open-source projects at
>     http://www.kitware.com/opensource/opensource.html
>
>     Follow this link to subscribe/unsubscribe:
>     http://www.vtk.org/mailman/listinfo/vtk-developers
>
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://public.kitware.com/pipermail/vtk-developers/attachments/20140123/42770e35/attachment-0002.html>


More information about the vtk-developers mailing list