[vtk-developers] zero-copy mixed language support in vtkDataArray

Fri Jan 24 13:20:25 EST 2014

Berk, not at all! Your clarifications, opinions, and code are all very 
helpful! David's and Ben's too. I tend to be stubborn to a fault, and 
should give up more easily.

Burlen

On 01/23/2014 04:44 PM, Berk Geveci wrote:
> Sounds good. Thanks for the work and I apologize if I was too negative...
>
> -berk
>
> On Thu, Jan 23, 2014 at 6:40 PM, Burlen Loring <burlen.loring at gmail.com> wrote:
>> OK. Let's abandon this patch then. If developers are careful, things should
>> work by observing the DeleteEvent alone. The resize-after-SetArray and
>> back-to-back SetArray cases are unlikely to arise under normal
>> circumstances. I only wanted to cover these cases to make it bulletproof and
>> efficient, as in those cases the data could be freed before the delete
>> event occurs.
>>
>>
>> On 01/23/2014 01:40 PM, Berk Geveci wrote:
>>> Here is a code snippet:
>>>
>>> def MakeObserver(numpy_array):
>>>       "Internal function used to attach a numpy array to a vtk array"
>>>       def Closure(caller, event):
>>>           foo = numpy_array
>>>       return Closure
>>>
>>> vtkarray = numpy_support.numpy_to_vtk(array)
>>> vtkarray.SetName(name)
>>> # This makes the VTK array carry a reference to the numpy array.
>>> vtkarray.AddObserver('DeleteEvent', MakeObserver(array))
>>>
>>> See numpy_support for numpy_to_vtk but it essentially uses
>>> SetVoidArray() with the buffer coming from numpy. This guarantees that
>>> the numpy array sticks around until the VTK array is deleted. If the
>>> original reference to the numpy array is released and then the VTK
>>> array is deleted, the numpy array will be deleted. If either sticks
>>> around, the numpy array will stick around. Note that this will crash
>>> if the VTK object is deleted after Python is finalized. But that's
>>> fine because if Python is finalized, it will delete the numpy array
>>> anyway and make the VTK pointer a dangling one.
>>>
>>> See dataset_adapter.py in ParaView for the full implementation.
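
The keep-alive mechanism described above can be sketched without VTK or numpy. This is only an illustration under stated assumptions: `Buffer` and `FakeVtkArray` are hypothetical stand-ins (not VTK or numpy classes), and the only behavior borrowed from the thread is that the array object holds a strong reference to each observer callable.

```python
import gc
import weakref


class Buffer:
    """Hypothetical stand-in for a numpy array (supports weak references)."""

    def __init__(self, values):
        self.values = list(values)


class FakeVtkArray:
    """Hypothetical stand-in for a VTK array that stores its observers."""

    def __init__(self):
        self._observers = []

    def AddObserver(self, event, callback):
        # Assumption taken from the thread: the observer callable is held
        # by a strong reference for as long as the array object exists.
        self._observers.append((event, callback))


def make_observer(buffer):
    """Return a callback whose closure holds a reference to buffer."""

    def closure(caller, event):
        _ = buffer  # referencing buffer captures it in the closure

    return closure


buf = Buffer(range(10))
buf_ref = weakref.ref(buf)

vtk_like = FakeVtkArray()
vtk_like.AddObserver("DeleteEvent", make_observer(buf))

# Drop the original reference: the closure still keeps the buffer alive.
del buf
gc.collect()
alive_after_dropping_buffer = buf_ref() is not None

# Drop the array: observer, closure, and buffer are released together.
del vtk_like
gc.collect()
alive_after_dropping_array = buf_ref() is not None
```

In this sketch the callback body never needs to run; the reference captured by the closure is what ties the buffer's lifetime to the array's.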
>>>
>>> This does not support array resizing but I don't see any reason why it
>>> should. Any data coming from a simulation should be treated as
>>> read-only by VTK anyway. Filters will never try to resize or change an
>>> array coming as input.
>>>
>>> I agree that the implementation that you are suggesting will support a
>>> lot more use cases. However, I am not convinced that there is a need
>>> to add code to VTK and make it harder to maintain to support
>>> hypothetical use cases. The in situ use case is pretty
>>> straightforward:
>>>
>>> - Simulation allocates data structures for its own use
>>> - Data structures are passed to VTK by reference in some sort of adaptor
>>>   code
>>> - VTK does in situ processing - it treats those data structures as
>>>   read only. It may produce new data and will delete those internally
>>> - Simulation code continues and, if it wants, deletes its own data
>>>   structures. At this point VTK will have dangling pointers but who
>>>   cares
>>> - Repeat until simulation is done
>>>
>>> This use case is simply handled by using SetVoidArray() without any
>>> delete option. No need to change VTK.
>>>
>>> The reason I implemented the Python code is a different use case. It
>>> was to support Python programmable filters that return new numpy
>>> arrays that VTK takes the ownership of. It is handled by the
>>> dataset_adapter.py without having to change VTK.
>>>
>>> To do anything beyond this, I'd like to see some actual use cases that
>>> fall within VTK's design and "mission statement".
>>>
>>> Best,
>>> -berk
>>>
>>>
>>>
>>> On Thu, Jan 23, 2014 at 2:56 PM, Burlen Loring <burlen.loring at gmail.com> wrote:
>>>> Hi Berk,
>>>>
>>>> I'm happy that you're working on this too! I think it's an important,
>>>> language-agnostic issue for interfacing to VTK in general.
>>>>
>>>> Assuming that your solution is perfect, one downside I see is that it's
>>>> very Python-specific. I think that adding a PointerFreeEvent that users
>>>> could respond to would solve the memory management issues associated
>>>> with passing data to VTK by pointer in a very general, language-agnostic
>>>> way. It also fits nicely in VTK's event/observer pattern, it's a small
>>>> change requiring no new API, and the way I implemented it doesn't impact
>>>> performance for those not passing pointers through the SetArray API.
>>>> Those are my main points.
>>>>
>>>>
>>>> My opinion is that passing ownership of an array allocated in a different
>>>> code to VTK is usually a dangerous thing - the developers of the other
>>>> code can easily add a deallocation method somewhere without noticing that
>>>> VTK owns it. They are usually not aware of what VTK does, especially in
>>>> in situ type applications.
>>>>
>>>> I'd say that it's a dangerous thing given the way VTK is currently
>>>> implemented. I don't think it has to be though. The issue is that doing
>>>> this safely will require some coordination between VTK and the owner of
>>>> the memory. Currently there's no way for the owner to know when VTK is
>>>> finished with the data and it's safe to deallocate. Adding
>>>> PointerFreeEvent, which could be used to alert the owner of the memory
>>>> that VTK is finished and the memory could be safely deallocated, provides
>>>> the path for the coordination needed to resolve this issue.
>>>>
>>>> For example, the PointerFreeEvent and event/observer pattern allow VTK to
>>>> interface to any arbitrary reference counting implementation through a
>>>> small piece of user-provided glue code, namely their specific
>>>> implementation of vtkCommand. The Python numpy case makes a great
>>>> illustration of this, and I wrote a small application to accompany the
>>>> VTK patch for the purposes of this discussion. E.g., see
>>>> vtkPointerFreeEventObserver in vtkTestZeroCopyPython.cxx and its use in
>>>> the addScalar function exposed to Python.
>>>> https://github.com/burlen/TestZeroCopyPython/tree/vtk-command-callback
>>>>
>>>> Here's how I see this in detail:
>>>>
>>>> * Internal to VTK, what makes passing data directly by pointer dangerous
>>>> is that the memory backing the data can disappear/go out of scope/etc.
>>>> before VTK is done with it. External to VTK, aside from some very simple
>>>> cases, it's not easy to know when VTK is finished with the data and thus
>>>> when it is safe to deallocate. These issues are language agnostic.
>>>> Leveraging VTK's event/observer pattern with a new PointerFreeEvent
>>>> provides a language-agnostic notification path allowing these issues to
>>>> be managed effectively.
>>>>
>>>> * Another factor of importance when dealing with large data is that you
>>>> want to release whatever data you can as soon as you can to reduce memory
>>>> pressure. You don't want to have large VTK data arrays around for the
>>>> life of the application when they're not needed. When passing a pointer
>>>> in to VTK, the underlying data should be released in response to 3
>>>> events: 1) the VTK array is resized, 2) the VTK array is deleted, 3)
>>>> SetArray is called again with a new pointer. The new PointerFreeEvent,
>>>> invoked at these 3 spots, allows the owner to deallocate at the right
>>>> time, and not sooner, in a language-agnostic way.
>>>>
>>>> * The owner of the data passed may not have direct access to VTK objects
>>>> or even know about VTK, e.g. an in situ library that doesn't expose VTK
>>>> objects in its API. The event/observer pattern with the new
>>>> PointerFreeEvent provides a flexible pathway for the owner to be notified
>>>> that VTK is finished. It's easy to use this in the glue code hiding VTK's
>>>> implementation from the owner.
>>>>
>>>> * I also think that in the Python numpy case, VTK should manage the
>>>> Python object's ref count in its glue code, invisibly to the user, so
>>>> that it's inherently safe no matter what the user does, or if the numpy
>>>> object goes out of scope. I've been looking at VTK's Python wrappings and
>>>> this would be straightforward to implement using an observer for the new
>>>> PointerFreeEvent (that's potentially the topic of a follow-up patch,
>>>> assuming this ever gets accepted).
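
The three release points listed above (resize, delete, and a second SetArray call) can be illustrated with a small pure-Python sketch. Everything here is an assumption made for illustration: `ObservedArray` and its methods are hypothetical and are not VTK API, and "PointerFreeEvent" is the event proposed in this thread, modeled as a plain string key.

```python
class ObservedArray:
    """Hypothetical stand-in for a data array that borrows external memory."""

    def __init__(self):
        self._observers = {}
        self._external = None  # buffer supplied by the caller, not owned here
        self._storage = []     # storage owned by this object

    def add_observer(self, event, callback):
        self._observers.setdefault(event, []).append(callback)

    def _release_external(self):
        # Notify the owner exactly once per borrowed buffer.
        if self._external is not None:
            borrowed, self._external = self._external, None
            for callback in self._observers.get("PointerFreeEvent", []):
                callback(self, "PointerFreeEvent", borrowed)

    def set_array(self, buffer):
        self._release_external()  # a new pointer replaces the old one
        self._external = buffer

    def resize(self, n):
        # Copy into owned storage, then give the borrowed buffer back.
        source = self._external if self._external is not None else self._storage
        data = list(source)
        self._release_external()
        self._storage = (data + [0] * n)[:n]

    def delete(self):
        self._release_external()  # the array itself is going away


released = []
arr = ObservedArray()
arr.add_observer("PointerFreeEvent",
                 lambda caller, event, buf: released.append(buf))

first, second, third = [1, 2, 3], [4, 5, 6], [7, 8]
arr.set_array(first)
arr.set_array(second)  # releases first
arr.resize(5)          # copies second into owned storage, releases second
arr.set_array(third)
arr.delete()           # releases third
arr.delete()           # nothing left to release
```

The owner's callback sees each borrowed buffer exactly once, at the earliest point where the array no longer needs it.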
>>>>
>>>> I'm not familiar with your ParaView solution, but for now I'll assume
>>>> that it addresses all of the above correctly. However, these issues are
>>>> not specific to Python. Given that the event/observer pattern and
>>>> PointerFreeEvent is a small change adding no new API, doesn't impact
>>>> performance, and would solve the issue in a language-agnostic manner,
>>>> would it not be a better choice?
>>>>
>>>>
>>>> I am a little confused about why this is needed for Fortran arrays. Why
>>>> couldn't you set VTK to not delete the array and leave it to the owner of
>>>> the array to take care of deletion?
>>>>
>>>> The primary issue is that, in non-trivial cases, the owner doesn't know
>>>> when VTK is finished with the data. This issue is not limited to Python
>>>> and Fortran; it's an issue any time data is passed to VTK by pointer and
>>>> should not be freed with free or delete[]. One could even imagine a
>>>> high-performance C++ app internally managing a pool of memory on its own
>>>> where free/delete[] would not be used, yet memory passed to VTK would
>>>> still need to be reclaimed in a timely manner.
>>>>
>>>>
>>>> I already have this working in ParaView's Python using observers and the
>>>> fact that a Python observer holds on to the function object, which through
>>>> a closure can hold a reference to the array. This works as long as Python
>>>> is finalized after the VTK array is deleted.
>>>>
>>>> Cool. Where can I find the implementation? In your solution what happens
>>>> when the array is resized? What happens when a new pointer is passed
>>>> through SetVoidArray? Your last statement makes me worry that data may be
>>>> held in memory until Python is finalized. If so, that would be a show
>>>> stopper, as we have a large number of very large arrays being exchanged
>>>> over a potentially long run time and need them to be released as quickly
>>>> as possible. Even if your solution were otherwise perfect, I still think
>>>> a non-Python-specific solution would be better.
>>>>
>>>> Burlen
>>>>
>>>>
>>>> On 01/23/2014 05:49 AM, Berk Geveci wrote:
>>>>
>>>> Hey guys,
>>>>
>>>> Please let me take a look at this before merging it. I have already done
>>>> similar things. In fact, this is not needed for Python. I already have
>>>> this working in ParaView's Python using observers and the fact that a
>>>> Python observer holds on to the function object, which through a closure
>>>> can hold a reference to the array. This works as long as Python is
>>>> finalized after the VTK array is deleted.
>>>>
>>>> Also, I am a little confused about why this is needed for Fortran arrays.
>>>> Why couldn't you set VTK to not delete the array and leave it to the
>>>> owner of the array to take care of deletion? My opinion is that passing
>>>> ownership of an array allocated in a different code to VTK is usually a
>>>> dangerous thing - the developers of the other code can easily add a
>>>> deallocation method somewhere without noticing that VTK owns it. They are
>>>> usually not aware of what VTK does, especially in in situ type
>>>> applications.
>>>>
>>>> Best,
>>>> -berk
>>>>
>>>>
>>>>
>>>> On Wed, Jan 22, 2014 at 6:00 PM, Burlen Loring <burlen.loring at gmail.com> wrote:
>>>>> Hi Guys,
>>>>>
>>>>> I've refactored the patch as David suggested, adding
>>>>> vtkCommand::PointerFreeEvent, which is fired when the pointer passed to
>>>>> SetArray is no longer needed and the delete method is
>>>>> VTK_DATA_ARRAY_CALLBACK. Would you mind taking another look at it?
>>>>> http://review.source.kitware.com/#/c/14072/5
>>>>>
>>>>>
>>>>> Burlen
>>>>>
>>>>> On 01/19/2014 03:43 PM, David Gobbi wrote:
>>>>>> I meant vtkCommand, not vtkCallback.  Invoke a FreeEvent and catch it
>>>>>> on the other side with a vtkCommand.
>>>>>>
>>>>>> On Sun, Jan 19, 2014 at 4:41 PM, David Gobbi <david.gobbi at gmail.com> wrote:
>>>>>>> Hi Burlen,
>>>>>>>
>>>>>>> If that trick won't work, then maybe the easiest thing is to make the
>>>>>>> vtkDataArray generate a specific event (e.g. create a new event called
>>>>>>> a FreeEvent) that is called when the memory is freed.  That way you
>>>>>>> are taking advantage of the existing vtkCallback support, which is
>>>>>>> already wrapped by all of the wrapper languages.
>>>>>>>
>>>>>>>      David
>>>>>>>
>>>>>>> On Sun, Jan 19, 2014 at 4:22 PM, Burlen Loring <burlen.loring at gmail.com> wrote:
>>>>>>>> Thanks, that's a neat trick!
>>>>>>>>
>>>>>>>> It certainly keeps one safe: the numpy object is around while the
>>>>>>>> array is. I don't think that this will do the right thing if the VTK
>>>>>>>> array is resized, or if SetVoidArray is called again. In both cases,
>>>>>>>> the VTK data array is still alive and well, but the numpy array's ref
>>>>>>>> count should be decremented.
>>>>>>>>
>>>>>>>> A Python-only solution won't work for me here, because I'm working in
>>>>>>>> the C/C++ "glue layer" sitting in between the Python app(s) (which
>>>>>>>> aren't mine) and a legacy VTK-based app. VTK objects aren't used in
>>>>>>>> the app's Python API at all, only lists and tuples, and I'm adding
>>>>>>>> support for numpy arrays. Using VTK objects in the API is not an
>>>>>>>> option.
>>>>>>>>
>>>>>>>>
>>>>>>>> On 1/19/2014 8:43 AM, David Gobbi wrote:
>>>>>>>>> Hi Burlen,
>>>>>>>>>
>>>>>>>>> There might be a simpler way to get VTK to free the numpy array,
>>>>>>>>> that won't require any changes to the VTK code at all: just add
>>>>>>>>> the numpy array to the VTK array as an attribute.  Here is some
>>>>>>>>> example code:
>>>>>>>>>
>>>>>>>>> == BEGIN EXAMPLE ==
>>>>>>>>>
>>>>>>>>> # An example of using numpy arrays in VTK
>>>>>>>>> import vtk
>>>>>>>>> import numpy
>>>>>>>>> import weakref
>>>>>>>>> import gc
>>>>>>>>>
>>>>>>>>> # Use a numpy array as a VTK array
>>>>>>>>> z = numpy.arange(0,10,1,float)
>>>>>>>>> a = vtk.vtkDoubleArray()
>>>>>>>>> a.SetVoidArray(z,10,1)
>>>>>>>>>
>>>>>>>>> # Make the VTK array track the numpy array
>>>>>>>>> a.array = z
>>>>>>>>>
>>>>>>>>> # Create weakrefs to check for deletion
>>>>>>>>> rz = weakref.ref(z)
>>>>>>>>> ra = weakref.ref(a)
>>>>>>>>>
>>>>>>>>> # Decref the numpy array, it won't be deleted yet
>>>>>>>>> del z
>>>>>>>>> gc.collect()
>>>>>>>>> if rz() is None:
>>>>>>>>>     print("numpy array is deleted (#1)")
>>>>>>>>>
>>>>>>>>> # Decref the VTK array, numpy array will be decref'd
>>>>>>>>> del a
>>>>>>>>> gc.collect()
>>>>>>>>> if rz() is None:
>>>>>>>>>     print("numpy array is deleted (#2)")
>>>>>>>>>
>>>>>>>>> == END EXAMPLE ==
>>>>>>>>>
>>>>>>>>> This works because the attributes of a VTK-Python object are kept
>>>>>>>>> alive for as long as the corresponding VTK-C++ object exists.
>>>>>>>>>
>>>>>>>>>       David
>>>>>>>>>
>>>>>>>>> On Sun, Jan 19, 2014 at 12:09 AM, Burlen Loring <burlen.loring at gmail.com> wrote:
>>>>>>>>>> To illustrate the callback in action I wrote an example where data
>>>>>>>>>> is transferred to VTK from numpy ndarrays. I hope this will help.
>>>>>>>>>> The README file explains the example.
>>>>>>>>>> https://github.com/burlen/TestZeroCopyPython
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>> OK. As far as VTK is concerned, callbackData is intended to be a
>>>>>>>>>>>> key for use by the callback, and VTK does nothing with it beyond
>>>>>>>>>>>> passing it to the callback. I'll add a note about this to clarify.
>>>>>>>>>>> Thanks. The problem with callback context ownership is that if
>>>>>>>>>>> it's a one-time call, the callback "owns" the context, but if it's
>>>>>>>>>>> a notification kind of thing, the *notifier* owns the context
>>>>>>>>>>> (since the callback may never be called) and needs an additional
>>>>>>>>>>> function to free the context.
>>>>>>>>>>>
>>>>>>>>>> I'm not following your complaint here. We're concerned about a
>>>>>>>>>> pointer to some array that must be kept alive while VTK uses it but
>>>>>>>>>> must be released when VTK is done. In this situation VTK "no longer
>>>>>>>>>> needs the data" only ever once. And just as VTK will always call
>>>>>>>>>> free when the pointer is passed in through the existing SetArray
>>>>>>>>>> with the VTK_DATA_ARRAY_FREE flag, VTK will always call the
>>>>>>>>>> callback, either when the vtkDataArray is resized or when it is
>>>>>>>>>> Delete'd. The callback and callback data are pointers; as far as
>>>>>>>>>> VTK is concerned they're just values, and VTK need not worry about
>>>>>>>>>> freeing them. Note that this doesn't prevent one from passing an
>>>>>>>>>> instance of an arbitrary class in callbackData. In that case the
>>>>>>>>>> instance should be deleted in the callback.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>> Works for me, although I'm not sure NULL callbackData is ever
>>>>>>>>>>>> valid/useful.
>>>>>>>>>>> As an example, if you have a LocalFree or whatever on Windows, you
>>>>>>>>>>> just need the array parameter, so NULL callbackData is needed.
>>>>>>>>>> NULL callbackData would generally not work out. The callback needs
>>>>>>>>>> non-NULL callbackData passed to it in order to identify the memory
>>>>>>>>>> to free. In your example callbackData is the array pointer itself,
>>>>>>>>>> not NULL. This is not a problem.
>>>>>>>>>>
>>>>>>>>>>> Where array and callbackData might be needed is something like
>>>>>>>>>>> where you need the array, but callbackData is the arena to free it
>>>>>>>>>>> from. Currently, you only get one or the other.
>>>>>>>>>> Actually I think currently you could get both without issue. Don't
>>>>>>>>>> forget callbackData is very flexible; it can be (point to)
>>>>>>>>>> anything. In the case you're describing: write a class containing
>>>>>>>>>> the arena and the array. Pass a heap-allocated instance of your
>>>>>>>>>> class as callbackData when you call SetArray. When VTK calls the
>>>>>>>>>> callback, in addition to releasing the array data, delete the
>>>>>>>>>> instance of your class.
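
The "class containing the arena and the array" idea can be sketched as follows. This is a rough Python analogue under stated assumptions, not the C++ patch under review: `Arena`, `FreeContext`, and `free_callback` are hypothetical names, and the final direct call stands in for the array invoking its callback with the callbackData value.

```python
class Arena:
    """Hypothetical memory pool that hands out and reclaims buffers."""

    def __init__(self):
        self.live = {}  # id(buffer) -> buffer, for buffers still checked out

    def allocate(self, nbytes):
        buf = bytearray(nbytes)
        self.live[id(buf)] = buf
        return buf

    def release(self, buf):
        self.live.pop(id(buf), None)


class FreeContext:
    """Bundles everything the callback needs: the arena and the array."""

    def __init__(self, arena, array):
        self.arena = arena
        self.array = array


def free_callback(callback_data):
    # Return the array to its arena, then drop the context's references.
    # In C++ this is where the heap-allocated context would be deleted.
    callback_data.arena.release(callback_data.array)
    callback_data.array = None
    callback_data.arena = None


arena = Arena()
buf = arena.allocate(16)
ctx = FreeContext(arena, buf)

# Stand-in for the consumer invoking the callback once it is finished.
free_callback(ctx)
```

Because the context carries both pieces, the callback signature only needs a single opaque argument.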
>>>>>>>>>>
>>>>>>>>>> Would you be happier with a polymorphic reference-counted functor,
>>>>>>>>>> rather than a callback? Initially I thought this would be overkill,
>>>>>>>>>> but I'd be willing to go that way if people are against the
>>>>>>>>>> callback idea.
>>>>>>>>>>
>>>>>>>>>> Burlen
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 01/17/2014 09:19 PM, Ben Boeckel wrote:
>>>>>>>>>>> On Fri, Jan 17, 2014 at 15:21:42 -0800, Burlen Loring wrote:
>>>>>>>>>>>> Yes, my intention is that for now they'd be used from glue code
>>>>>>>>>>>> rather than exposed directly. In the future the API could be
>>>>>>>>>>>> exposed in the wrapped languages if there were a compelling use
>>>>>>>>>>>> case for it. Until then it's not worth the effort since I doubt
>>>>>>>>>>>> VTK wrapping codes would handle it correctly.
>>>>>>>>>>> That's valid; just making sure :) .
>>>>>>>>>>>
>>>>>>>>>>>> OK. As far as VTK is concerned, callbackData is intended to be a
>>>>>>>>>>>> key for use by the callback, and VTK does nothing with it beyond
>>>>>>>>>>>> passing it to the callback. I'll add a note about this to clarify.
>>>>>>>>>>> Thanks. The problem with callback context ownership is that if
>>>>>>>>>>> it's a one-time call, the callback "owns" the context, but if it's
>>>>>>>>>>> a notification kind of thing, the *notifier* owns the context
>>>>>>>>>>> (since the callback may never be called) and needs an additional
>>>>>>>>>>> function to free the context.
>>>>>>>>>>>
>>>>>>>>>>>> Works for me, although I'm not sure NULL callbackData is ever
>>>>>>>>>>>> valid/useful.
>>>>>>>>>>> As an example, if you have a LocalFree or whatever on Windows, you
>>>>>>>>>>> just need the array parameter, so NULL callbackData is needed.
>>>>>>>>>>> Where array and callbackData might be needed is something like
>>>>>>>>>>> where you need the array, but callbackData is the arena to free it
>>>>>>>>>>> from. Currently, you only get one or the other.
>>>>>>>>>>>
>>>>>>>>>>> --Ben