[vtk-developers] zero-copy mixed language support in vtkDataArray

Burlen Loring burlen.loring at gmail.com
Thu Jan 23 18:40:19 EST 2014


OK, let's abandon this patch then. If developers are careful, things
should work observing the DeleteEvent alone. The resize-after-SetArray
and multiple back-to-back SetArray call cases are unlikely to arise
under normal circumstances. I only wanted to cover these cases to make
it bulletproof and efficient, as in those cases the data could be freed
before the delete event occurs.

On 01/23/2014 01:40 PM, Berk Geveci wrote:
> Here is a code snippet:
>
> from vtk.util import numpy_support
>
> def MakeObserver(numpy_array):
>     "Internal function used to attach a numpy array to a vtk array"
>     def Closure(caller, event):
>         # Referencing numpy_array makes the closure capture it.
>         foo = numpy_array
>     return Closure
>
> # array (a numpy array) and name are assumed to be defined by the caller.
> vtkarray = numpy_support.numpy_to_vtk(array)
> vtkarray.SetName(name)
> # This makes the VTK array carry a reference to the numpy array.
> vtkarray.AddObserver('DeleteEvent', MakeObserver(array))
>
> See numpy_support for numpy_to_vtk but it essentially uses
> SetVoidArray() with the buffer coming from numpy. This guarantees that
> the numpy array sticks around until the VTK array is deleted. If the
> original reference to the numpy array is released and then the VTK
> array is deleted, the numpy array will be deleted. If either sticks
> around, the numpy array will stick around. Note that this will crash
> if the VTK object is deleted after Python is finalized. But that's
> fine because if Python is finalized, it will delete the numpy array
> anyway and make the VTK pointer a dangling one.
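
The lifetime rule described above can be checked without VTK. The following is only a sketch: FakeVtkArray and Payload are hypothetical stand-ins for the VTK array and the numpy array, and the only assumption is that the array object keeps its observers alive, as AddObserver is described as doing.

```python
import gc
import weakref


class Payload:
    """Hypothetical stand-in for a numpy array."""


class FakeVtkArray:
    """Hypothetical stand-in for a VTK array: it only stores its observers."""

    def __init__(self):
        self._observers = []

    def AddObserver(self, event, callback):
        # Holding the callback also holds everything its closure captured.
        self._observers.append((event, callback))


def make_observer(payload):
    def closure(caller, event):
        _ = payload  # referencing payload makes the closure capture it
    return closure


payload = Payload()
payload_ref = weakref.ref(payload)
vtk_like = FakeVtkArray()
vtk_like.AddObserver('DeleteEvent', make_observer(payload))

# Drop the original reference: the observer's closure still holds the payload.
del payload
gc.collect()
alive_after_dropping_payload = payload_ref() is not None

# Drop the array: its observers go with it, and so does the payload.
del vtk_like
gc.collect()
alive_after_dropping_array = payload_ref() is not None
```

In this model the payload survives the first del and is reclaimed after the second, which matches the "if either sticks around, the numpy array will stick around" behaviour.
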
>
> See dataset_adapter.py in ParaView for the full implementation.
>
> This does not support array resizing but I don't see any reason why it
> should. Any data coming from a simulation should be treated as
> read-only by VTK anyway. Filters will never try to resize or change an
> array coming as input.
>
> I agree that the implementation that you are suggesting will support a
> lot more use cases. However, I am not convinced that there is a need
> to add code to VTK and make it harder to maintain to support
> hypothetical use cases. The in situ use case is pretty
> straightforward:
>
> - Simulation allocates data structures for its own use
> - Data structures are passed to VTK by reference in some sort of adaptor code
> - VTK does in situ processing - it treats those data structures as
> read only. It may produce new data and will delete those internally
> - Simulation code continues and, if it wants, deletes its own data
> structures. At this point VTK will have dangling pointers, but who
> cares
> - Repeat until simulation is done
>
> This use case is simply handled by using SetVoidArray() without any
> delete option. No need to change VTK.
>
> The reason I implemented the Python code is a different use case. It
> was to support Python programmable filters that return new numpy
> arrays that VTK takes the ownership of. It is handled by the
> dataset_adapter.py without having to change VTK.
>
> To do anything beyond this, I'd like to see some actual use cases that
> fall within VTK's design and "mission statement".
>
> Best,
> -berk
>
>
>
> On Thu, Jan 23, 2014 at 2:56 PM, Burlen Loring <burlen.loring at gmail.com> wrote:
>> Hi Berk,
>>
>> I'm happy that you're working on this too! I think it's an important,
>> language agnostic, issue for interfacing to VTK in general.
>>
>> Assuming that your solution is perfect, one downside I see is that it's very
>> Python-specific. I think that adding a PointerFreeEvent that users could
>> respond to would solve the memory management issues associated with passing
>> data to VTK by pointer in a very general, language-agnostic way.
>> It also fits nicely in VTK's event/observer pattern, it's a small change
>> requiring no new api, and the way I implemented it doesn't impact
>> performance for those not passing pointers through SetArray api. Those are
>> my main points.
>>
>>
>> My opinion is that passing ownership of an array allocated in a different
>> code to VTK is usually a dangerous thing - the developers of the other code
>> can easily add a deallocation method somewhere without noticing that VTK
>> owns it. They are usually not aware of what VTK does, especially in in situ
>> type applications.
>>
>> I'd say that it's a dangerous thing given the way VTK is currently
>> implemented. I don't think it has to be though. The issue is that doing this
>> safely will require some coordination between VTK and the owner of the
>> memory. Currently there's no way for the owner to know when VTK is finished
>> with the data and it's safe to deallocate. Adding PointerFreeEvent, which
>> could be used to alert the owner of the memory that VTK is finished and the
>> memory could be safely deallocated, provides the path for the coordination
>> needed to resolve this issue.
>>
>> For example, the PointerFreeEvent and event/observer pattern allows VTK to
>> interface to any arbitrary reference counting implementation through a small
>> piece of user provided glue code, namely their specific implementation of
>> vtkCommand. The python numpy case makes a great illustration of this, and I
>> wrote a small application to accompany the VTK patch for the purposes of
>> this discussion. E.g., see vtkPointerFreeEventObserver in
>> vtkTestZeroCopyPython.cxx and its use in the addScalar function exposed to
>> python.
>> https://github.com/burlen/TestZeroCopyPython/tree/vtk-command-callback
>>
>> Here's how I see this in detail:
>>
>> * internal to VTK what makes passing data directly by pointer dangerous is
>> that the memory backing the data can disappear/go out of scope/etc before
>> VTK is done with it. external to VTK, aside from some very simple cases,
>> it's not easy to know when VTK is finished with the data and thus safe to
>> deallocate. These issues are language agnostic. Leveraging VTK's
>> event/observer pattern with a new PointerFreeEvent provides a language
>> agnostic notification path allowing these issues to be managed effectively.
>>
>> * Another factor of importance when dealing with large data is that you want
>> to release whatever data you can as soon as you can to reduce memory
>> pressure. You don't want to have large vtk data arrays around for the life
>> of the application when they're not needed. When passing a pointer in to
>> VTK, the underlying data should be released in response to 3 events: 1) the
>> vtk array is resized, 2) the vtk array is deleted, 3) SetArray is called
>> again with a new pointer. The new PointerFreeEvent, invoked at these 3
>> spots, allows the owner to deallocate at the right time, and not sooner, in
>> a language agnostic way.
>>
>> * The owner of the data passed may not have direct access to VTK objects or
>> even know about VTK, e.g. an in-situ library that doesn't expose VTK objects
>> in its API. The event/observer pattern with the new PointerFreeEvent
>> provides a flexible pathway for the owner to be notified that VTK is
>> finished. It's easy to use this in the glue code hiding VTK's implementation
>> from the owner.
>>
>> * I also think that in the python numpy case, VTK should manage the py
>> object ref count invisibly to the user in its glue code so that it's
>> inherently safe no matter what the user does, or if the numpy object goes out
>> of scope. I've been looking at VTK's python wrappings, and this would be
>> straightforward to implement using an observer to the new PointerFreeEvent
>> (that's potentially the topic of a follow-up patch, assuming this ever gets
>> accepted).
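
The three release points listed above (resize, delete, SetArray called again) can be modelled in a few lines. This is a sketch under stated assumptions, not the patch itself: ZeroCopyArrayModel and its method names are hypothetical, and the on_pointer_free callback plays the role of an observer on the proposed PointerFreeEvent.

```python
class ZeroCopyArrayModel:
    """Hypothetical model of an array that borrows an external buffer."""

    def __init__(self):
        self._buffer = None
        self._borrowed = False
        self._on_pointer_free = None

    def set_array(self, buffer, on_pointer_free):
        # Release point 3: a new pointer replaces the one currently held.
        self._release()
        self._buffer = buffer
        self._borrowed = True
        self._on_pointer_free = on_pointer_free

    def resize(self, n):
        # Release point 1: resizing copies into memory the array owns itself.
        old = self._buffer or []
        new = list(old[:n]) + [0.0] * max(0, n - len(old))
        self._release()
        self._buffer = new

    def delete(self):
        # Release point 2: the array itself goes away.
        self._release()
        self._buffer = None

    def _release(self):
        # Notify the owner exactly once per borrowed buffer.
        if self._borrowed:
            callback, buffer = self._on_pointer_free, self._buffer
            self._borrowed = False
            self._on_pointer_free = None
            callback(buffer)


released = []
a = [1.0, 2.0, 3.0]
b = [4.0, 5.0]
arr = ZeroCopyArrayModel()
arr.set_array(a, released.append)
arr.set_array(b, released.append)  # a is handed back to its owner here
arr.resize(4)                      # b is handed back here, before any delete
arr.delete()                       # nothing is borrowed any more
```

With only a delete notification, both a and b would stay pinned until arr.delete(); with a notification at all three points each buffer is handed back as soon as the array stops using it.
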
>>
>> I'm not familiar with your PV solution, but for now I'll assume that it
>> addresses all of the above correctly. However, these issues are not specific
>> to python. Given that the event/observer pattern and PointerFreeEvent is a
>> small change adding no new api, doesn't impact performance, and would solve
>> the issue in a language agnostic manner, would it not be a better choice?
>>
>>
>> I am a little confused about why this is needed for Fortran arrays. Why
>> couldn't you set VTK to not delete the array and leave it to the owner of
>> the array to take care of deletion?
>>
>> The primary issue is that, in non-trivial cases, the owner doesn't know when
>> VTK is finished with the data. This issue is not limited to Python and
>> Fortran; it's an issue any time data is passed to VTK by pointer and should
>> not be freed with free or delete[]. One could even imagine a high
>> performance C++ app internally managing a pool of memory on its own where
>> free/delete[] would not be used yet memory passed to VTK would still need to
>> be reclaimed in a timely manner.
>>
>>
>> I already have this working in ParaView's Python using observers and the
>> fact that a Python observer holds on to the function object, which through a
>> closure can hold a reference to the array. This works as long as Python is
>> finalized after the VTK array is deleted.
>>
>> Cool. Where can I find the implementation? In your solution, what happens
>> when the array is resized? What happens when a new pointer is passed through
>> SetVoidArray? Your last statement makes me worry that data may be held in
>> memory until Python is finalized. If so, that would be a showstopper, as we
>> have a large number of very large arrays being exchanged over a potentially
>> long run time and need them to be released as quickly as possible. Even if
>> your solution were otherwise perfect, I still think a non-Python-specific
>> solution would be better.
>>
>> Burlen
>>
>>
>> On 01/23/2014 05:49 AM, Berk Geveci wrote:
>>
>> Hey guys,
>>
>> Please let me take a look at this before merging it. I have already done
>> similar things. In fact, this is not needed for Python. I already have this
>> working in ParaView's Python using observers and the fact that a Python
>> observer holds on to the function object, which through a closure can hold a
>> reference to the array. This works as long as Python is finalized after the
>> VTK array is deleted.
>>
>> Also, I am a little confused about why this is needed for Fortran arrays.
>> Why couldn't you set VTK to not delete the array and leave it to the owner
>> of the array to take care of deletion? My opinion is that passing ownership
>> of an array allocated in a different code to VTK is usually a dangerous
>> thing - the developers of the other code can easily add a deallocation
>> method somewhere without noticing that VTK owns it. They are usually not
>> aware of what VTK does, especially in in situ type applications.
>>
>> Best,
>> -berk
>>
>>
>>
>> On Wed, Jan 22, 2014 at 6:00 PM, Burlen Loring <burlen.loring at gmail.com>
>> wrote:
>>> Hi Guys,
>>>
>>> I've refactored the patch as David suggested adding
>>> vtkCommand::PointerFreeEvent which is fired when the pointer passed to
>>> SetArray is no longer needed and the delete method is
>>> VTK_DATA_ARRAY_CALLBACK. Would you mind taking another look at it?
>>> http://review.source.kitware.com/#/c/14072/5
>>>
>>>
>>> Burlen
>>>
>>> On 01/19/2014 03:43 PM, David Gobbi wrote:
>>>> I meant vtkCommand, not vtkCallback.  Invoke a FreeEvent and catch it
>>>> on the other side with a vtkCommand.
>>>>
>>>> On Sun, Jan 19, 2014 at 4:41 PM, David Gobbi <david.gobbi at gmail.com>
>>>> wrote:
>>>>> Hi Burlen,
>>>>>
>>>>> If that trick won't work, then maybe the easiest thing is to make the
>>>>> vtkDataArray generate a specific event (e.g. create a new event called
>>>>> a FreeEvent) that is called when the memory is freed.  That way you
>>>>> are taking advantage of the existing vtkCallback support, which is
>>>>> already wrapped by all of the wrapper languages.
>>>>>
>>>>>     David
>>>>>
>>>>> On Sun, Jan 19, 2014 at 4:22 PM, Burlen Loring <burlen.loring at gmail.com>
>>>>> wrote:
>>>>>> Thanks, that's a neat trick!
>>>>>>
>>>>>> It certainly keeps one safe: the numpy object is around while the array
>>>>>> is. I don't think that this will do the right thing if the vtk array is
>>>>>> resized, or if SetVoidArray is called again. In both cases, the vtk data
>>>>>> array is still alive and well, but the numpy array's ref count should be
>>>>>> decremented.
>>>>>>
>>>>>> A python-only solution won't work for me here, because I'm working in the
>>>>>> C/C++ "glue layer" sitting in between the python app(s) (which aren't
>>>>>> mine) and a legacy VTK based app. VTK objects aren't used in the app's
>>>>>> python API at all, only lists, tuples, and I'm adding support for numpy
>>>>>> arrays. Using VTK objects in the API is not an option.
>>>>>>
>>>>>>
>>>>>> On 1/19/2014 8:43 AM, David Gobbi wrote:
>>>>>>> Hi Burlen,
>>>>>>>
>>>>>>> There might be a simpler way to get VTK to free the numpy array,
>>>>>>> that won't require any changes to the VTK code at all: just add
>>>>>>> the numpy array to the VTK array as an attribute.  Here is some
>>>>>>> example code:
>>>>>>>
>>>>>>> == BEGIN EXAMPLE ==
>>>>>>>
>>>>>>> # An example of using numpy arrays in VTK
>>>>>>> import vtk
>>>>>>> import numpy
>>>>>>> import weakref
>>>>>>> import gc
>>>>>>>
>>>>>>> # Use a numpy array as a VTK array
>>>>>>> z = numpy.arange(0,10,1,float)
>>>>>>> a = vtk.vtkDoubleArray()
>>>>>>> a.SetVoidArray(z,10,1)
>>>>>>>
>>>>>>> # Make the VTK array track the numpy array
>>>>>>> a.array = z
>>>>>>>
>>>>>>> # Create weakrefs to check for deletion
>>>>>>> rz = weakref.ref(z)
>>>>>>> ra = weakref.ref(a)
>>>>>>>
>>>>>>> # Decref the numpy array, it won't be deleted yet
>>>>>>> del z
>>>>>>> gc.collect()
>>>>>>> if rz() is None:
>>>>>>>     print("numpy array is deleted (#1)")
>>>>>>>
>>>>>>> # Decref the VTK array, numpy array will be decref'd
>>>>>>> del a
>>>>>>> gc.collect()
>>>>>>> if rz() is None:
>>>>>>>     print("numpy array is deleted (#2)")
>>>>>>>
>>>>>>> == END EXAMPLE ==
>>>>>>>
>>>>>>> This works because the attributes of a VTK-Python object are kept
>>>>>>> alive for as long as the corresponding VTK-C++ object exists.
>>>>>>>
>>>>>>>      David
>>>>>>>
>>>>>>> On Sun, Jan 19, 2014 at 12:09 AM, Burlen Loring
>>>>>>> <burlen.loring at gmail.com>
>>>>>>> wrote:
>>>>>>>> To illustrate the callback in action I wrote an example where data is
>>>>>>>> transferred to VTK from numpy ndarrays. I hope this will help. The README
>>>>>>>> file explains the example. https://github.com/burlen/TestZeroCopyPython
>>>>>>>>
>>>>>>>>
>>>>>>>>>> OK. As far as VTK is concerned, callbackData is intended to be a key
>>>>>>>>>> for use by the callback, and VTK does nothing with it beyond passing
>>>>>>>>>> it to the callback. I'll add a note about this to clarify.
>>>>>>>>> Thanks. The problem with callback context ownership is that if it's a
>>>>>>>>> one-time, the callback "owns" the context, but if it's a notification
>>>>>>>>> kind of thing, the *notifier* owns the context (since the callback may
>>>>>>>>> never be called) and needs an additional function to free the context.
>>>>>>>>>
>>>>>>>> I'm not following your complaint here. We're concerned about a pointer to
>>>>>>>> some array that must be kept alive while VTK uses it but must be released
>>>>>>>> when VTK is done. In this situation VTK "no longer needs the data" only
>>>>>>>> ever once, just as VTK will always call free when the pointer is passed in
>>>>>>>> through the existing SetArray w/ VTK_DATA_ARRAY_FREE flag. VTK will always
>>>>>>>> call the callback, either when the vtkDataArray is resized or when it is
>>>>>>>> Delete'd. The callback and callback data are pointers; as far as VTK is
>>>>>>>> concerned they're just values, and VTK need not worry about freeing them.
>>>>>>>> Note that this doesn't prevent one from passing an instance of an arbitrary
>>>>>>>> class in callbackData. In that case the instance should be deleted in the
>>>>>>>> callback.
>>>>>>>>
>>>>>>>>
>>>>>>>>>> Works for me, although I'm not sure NULL callbackData is ever
>>>>>>>>>> valid/useful.
>>>>>>>>> As an example, if you have a LocalFree or whatever on Windows, you just
>>>>>>>>> need the array parameter, so NULL callbackData is needed.
>>>>>>>> NULL callbackData would generally not work out. The callback needs non-NULL
>>>>>>>> callbackData passed to it in order to identify the memory to free. In your
>>>>>>>> example callbackData is the array pointer itself, not NULL. This is not a
>>>>>>>> problem.
>>>>>>>>
>>>>>>>>> Where array and callbackData might be needed is something like where you
>>>>>>>>> need the array, but callbackData is the arena to free it from. Currently,
>>>>>>>>> you only get one or the other.
>>>>>>>> Actually I think currently you could get both without issue. Don't forget
>>>>>>>> callbackData is very flexible, it can be (point to) anything. In the case
>>>>>>>> you're describing: write a class containing the arena and the array. Pass a
>>>>>>>> heap allocated instance of your class as callbackData when you call
>>>>>>>> SetArray. When VTK calls the callback, in addition to releasing the array
>>>>>>>> data, delete the instance of your class.
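
The arena-plus-array idea can be sketched as follows. Everything here is hypothetical and for illustration only: Arena, FreeContext and free_callback are not VTK names, and in the C++ glue layer the last step of the callback would be a delete of the heap-allocated context rather than clearing references.

```python
class Arena:
    """Hypothetical memory pool that hands out and reclaims buffers."""

    def __init__(self):
        self.live = set()

    def allocate(self, n):
        buf = bytearray(n)
        self.live.add(id(buf))
        return buf

    def free(self, buf):
        self.live.discard(id(buf))


class FreeContext:
    """Bundles everything the callback needs: the arena and the array."""

    def __init__(self, arena, array):
        self.arena = arena
        self.array = array


def free_callback(callback_data):
    # Return the array to its arena, then drop the context's references.
    callback_data.arena.free(callback_data.array)
    callback_data.arena = None
    callback_data.array = None


arena = Arena()
buf = arena.allocate(16)
context = FreeContext(arena, buf)  # this is what would be passed as callbackData
live_before = len(arena.live)
free_callback(context)             # what the array would call when it is finished
live_after = len(arena.live)
```

A single opaque callbackData value is enough because it can point at a structure of any size; the callback is the only code that needs to know its layout.
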
>>>>>>>>
>>>>>>>> Would you be happier with a polymorphic reference counted functor, rather
>>>>>>>> than a callback? Initially I thought this would be overkill, but I'd be
>>>>>>>> willing to go that way if people are against the callback idea.
>>>>>>>>
>>>>>>>> Burlen
>>>>>>>>
>>>>>>>>
>>>>>>>> On 01/17/2014 09:19 PM, Ben Boeckel wrote:
>>>>>>>>> On Fri, Jan 17, 2014 at 15:21:42 -0800, Burlen Loring wrote:
>>>>>>>>>> Yes, my intention is that for now they'd be used from glue code
>>>>>>>>>> rather than exposed directly. In the future the API could be exposed
>>>>>>>>>> in the wrapped languages if there were a compelling use case for it.
>>>>>>>>>> Until then it's not worth the effort since I doubt VTK wrapping codes
>>>>>>>>>> would handle it correctly.
>>>>>>>>> That's valid; just making sure :) .
>>>>>>>>>
>>>>>>>>>> OK. As far as VTK is concerned, callbackData is intended to be a key
>>>>>>>>>> for use by the callback, and VTK does nothing with it beyond passing
>>>>>>>>>> it to the callback. I'll add a note about this to clarify.
>>>>>>>>> Thanks. The problem with callback context ownership is that if it's a
>>>>>>>>> one-time, the callback "owns" the context, but if it's a notification
>>>>>>>>> kind of thing, the *notifier* owns the context (since the callback may
>>>>>>>>> never be called) and needs an additional function to free the context.
>>>>>>>>>
>>>>>>>>>> Works for me, although I'm not sure NULL callbackData is ever
>>>>>>>>>> valid/useful.
>>>>>>>>> As an example, if you have a LocalFree or whatever on Windows, you just
>>>>>>>>> need the array parameter, so NULL callbackData is needed. Where array
>>>>>>>>> and callbackData might be needed is something like where you need the
>>>>>>>>> array, but callbackData is the arena to free it from. Currently, you
>>>>>>>>> only get one or the other.
>>>>>>>>>
>>>>>>>>> --Ben
>>>
>>
