[vtk-developers] zero-copy mixed language support in vtkDataArray

Berk Geveci berk.geveci at kitware.com
Thu Jan 23 19:44:50 EST 2014


Sounds good. Thanks for the work and I apologize if I was too negative...

-berk

On Thu, Jan 23, 2014 at 6:40 PM, Burlen Loring <burlen.loring at gmail.com> wrote:
> OK. Let's abandon this patch then. If developers are careful, things should
> work by observing the DeleteEvent alone. The resize-after-SetArray and
> back-to-back SetArray cases are unlikely to arise under normal
> circumstances. I only wanted to cover these cases to make it bulletproof and
> efficient, as in those cases the data could be freed before the delete
> event occurs.
>
>
> On 01/23/2014 01:40 PM, Berk Geveci wrote:
>>
>> Here is a code snippet:
>>
>> from vtk.util import numpy_support
>>
>> def MakeObserver(numpy_array):
>>      "Internal function used to attach a numpy array to a vtk array"
>>      def Closure(caller, event):
>>          # Referencing numpy_array makes the closure hold on to it.
>>          foo = numpy_array
>>      return Closure
>>
>> # `array` is an existing numpy array; `name` is the array's name.
>> vtkarray = numpy_support.numpy_to_vtk(array)
>> vtkarray.SetName(name)
>> # This makes the VTK array carry a reference to the numpy array.
>> vtkarray.AddObserver('DeleteEvent', MakeObserver(array))
>>
>> See numpy_support for numpy_to_vtk but it essentially uses
>> SetVoidArray() with the buffer coming from numpy. This guarantees that
>> the numpy array sticks around until the VTK array is deleted. If the
>> original reference to the numpy array is released and then the VTK
>> array is deleted, the numpy array will be deleted. If either sticks
>> around, the numpy array will stick around. Note that this will crash
>> if the VTK object is deleted after Python is finalized. But that's
>> fine because if Python is finalized, it will delete the numpy array
>> anyway and make the VTK pointer a dangling one.
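
A minimal sketch of the closure mechanism described above, in plain Python with no VTK dependency. `Buffer` and `FakeVtkArray` are hypothetical stand-ins (for a numpy array and a VTK array respectively), not real VTK classes; the only point illustrated is that an observer function which closes over an object keeps that object alive for as long as the observed object holds the function.

```python
import gc
import weakref


class Buffer:
    """Stand-in for a numpy array (hypothetical)."""


class FakeVtkArray:
    """Stand-in for a VTK array that stores observers by event name."""

    def __init__(self):
        self._observers = {}

    def AddObserver(self, event, callback):
        self._observers.setdefault(event, []).append(callback)


def make_observer(buffer):
    # The inner function closes over `buffer`, so the function object
    # holds a strong reference to it for as long as the function exists.
    def closure(caller, event):
        _ = buffer
    return closure


buf = Buffer()
buf_ref = weakref.ref(buf)
vtk_like = FakeVtkArray()
vtk_like.AddObserver("DeleteEvent", make_observer(buf))

# Dropping the original reference does not free the buffer ...
del buf
gc.collect()
alive_after_dropping_buffer = buf_ref() is not None

# ... but dropping the array (and with it the observer) does.
del vtk_like
gc.collect()
alive_after_dropping_array = buf_ref() is not None
```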
>>
>> See dataset_adapter.py in ParaView for the full implementation.
>>
>> This does not support array resizing but I don't see any reason why it
>> should. Any data coming from a simulation should be treated as
>> read-only by VTK anyway. Filters will never try to resize or change an
>> array coming as input.
>>
>> I agree that the implementation that you are suggesting will support a
>> lot more use cases. However, I am not convinced that there is a need
>> to add code to VTK and make it harder to maintain to support
>> hypothetical use cases. The in situ use case is pretty
>> straightforward:
>>
>> - Simulation allocates data structures for its own use
>> - Data structures are passed to VTK by reference in some sort of adaptor
>> code
>> - VTK does in situ processing - it treats those data structures as
>> read only. It may produce new data and will delete those internally
>> - Simulation code continues and, if it wants, deletes its own data
>> structures. At this point VTK will have dangling pointers but who
>> cares
>> - Repeat until simulation is done
>>
>> This use case is simply handled by using SetVoidArray() without any
>> delete option. No need to change VTK.
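
The in situ loop listed above can be sketched without VTK, assuming the standard library's `array` and `memoryview` as stand-ins for the simulation buffer and the by-reference adaptor. `in_situ_mean` is a hypothetical analysis step; a real adaptor would hand the pointer to VTK via SetVoidArray() instead.

```python
from array import array


def in_situ_mean(view):
    """Read-only analysis step: never resizes or writes to `view`."""
    return sum(view) / len(view)


# The simulation allocates and owns its data.
field = array("d", [0.0] * 4)
means = []

for step in range(3):
    # The simulation updates its own buffer in place.
    for i in range(len(field)):
        field[i] = float(step + i)

    # The adaptor passes the buffer by reference: memoryview does not copy,
    # and toreadonly() (Python 3.8+) rejects accidental writes.
    view = memoryview(field).toreadonly()
    means.append(in_situ_mean(view))

    # Release the view before the simulation reuses or frees the buffer.
    view.release()
```

The processing side never owns the memory here, so nothing needs to be notified when the simulation later frees or overwrites `field`.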
>>
>> The reason I implemented the Python code is a different use case. It
>> was to support Python programmable filters that return new numpy
>> arrays that VTK takes the ownership of. It is handled by the
>> dataset_adapter.py without having to change VTK.
>>
>> To do anything beyond this, I'd like to see some actual use cases that
>> fall within VTK's design and "mission statement".
>>
>> Best,
>> -berk
>>
>>
>>
>> On Thu, Jan 23, 2014 at 2:56 PM, Burlen Loring <burlen.loring at gmail.com>
>> wrote:
>>>
>>> Hi Berk,
>>>
>>> I'm happy that you're working on this too! I think it's an important,
>>> language-agnostic issue for interfacing to VTK in general.
>>>
>>> Assuming that your solution is perfect, one downside I see is that it's
>>> very Python-specific. I think that adding a PointerFreeEvent that users
>>> could respond to would solve the memory management issues associated
>>> with passing data to VTK by pointer in a very general, language-agnostic
>>> way. It also fits nicely in VTK's event/observer pattern, it's a small
>>> change requiring no new API, and the way I implemented it doesn't impact
>>> performance for those not passing pointers through the SetArray API.
>>> Those are my main points.
>>>
>>>
>>> My opinion is that passing ownership of an array allocated in a different
>>> code to VTK is usually a dangerous thing - the developers of the other
>>> code can easily add a deallocation method somewhere without noticing that
>>> VTK owns it. They are usually not aware of what VTK does, especially in
>>> in situ type applications.
>>>
>>> I'd say that it's a dangerous thing given the way VTK is currently
>>> implemented. I don't think it has to be though. The issue is that doing
>>> this safely will require some coordination between VTK and the owner of
>>> the memory. Currently there's no way for the owner to know when VTK is
>>> finished with the data and it's safe to deallocate. Adding
>>> PointerFreeEvent, which could be used to alert the owner of the memory
>>> that VTK is finished and the memory could be safely deallocated, provides
>>> the path for the coordination needed to resolve this issue.
>>>
>>> For example, the PointerFreeEvent and event/observer pattern allow VTK to
>>> interface to any arbitrary reference counting implementation through a
>>> small piece of user-provided glue code, namely their specific
>>> implementation of vtkCommand. The Python numpy case makes a great
>>> illustration of this, and I wrote a small application to accompany the VTK
>>> patch for the purposes of this discussion. E.g., see
>>> vtkPointerFreeEventObserver in vtkTestZeroCopyPython.cxx and its use in
>>> the addScalar function exposed to Python.
>>> https://github.com/burlen/TestZeroCopyPython/tree/vtk-command-callback
>>>
>>> Here's how I see this in detail:
>>>
>>> * Internal to VTK, what makes passing data directly by pointer dangerous
>>> is that the memory backing the data can disappear/go out of scope/etc.
>>> before VTK is done with it. External to VTK, aside from some very simple
>>> cases, it's not easy to know when VTK is finished with the data and thus
>>> when it is safe to deallocate. These issues are language agnostic.
>>> Leveraging VTK's event/observer pattern with a new PointerFreeEvent
>>> provides a language-agnostic notification path allowing these issues to
>>> be managed effectively.
>>>
>>> * Another factor of importance when dealing with large data is that you
>>> want to release whatever data you can as soon as you can to reduce memory
>>> pressure. You don't want to have large VTK data arrays around for the
>>> life of the application when they're not needed. When passing a pointer
>>> into VTK, the underlying data should be released in response to 3 events:
>>> 1) the VTK array is resized, 2) the VTK array is deleted, 3) SetArray is
>>> called again with a new pointer. The new PointerFreeEvent, invoked at
>>> these 3 spots, allows the owner to deallocate at the right time, and not
>>> sooner, in a language-agnostic way.
>>>
>>> * The owner of the data passed may not have direct access to VTK objects
>>> or even know about VTK, e.g. an in situ library that doesn't expose VTK
>>> objects in its API. The event/observer pattern with the new
>>> PointerFreeEvent provides a flexible pathway for the owner to be notified
>>> that VTK is finished. It's easy to use this in the glue code, hiding
>>> VTK's implementation from the owner.
>>>
>>> * I also think that in the Python numpy case, VTK should manage the
>>> Python object's ref count in its glue code, invisibly to the user, so
>>> that it's inherently safe no matter what the user does or if the numpy
>>> object goes out of scope. I've been looking at VTK's Python wrappings and
>>> this would be straightforward to implement using an observer to the new
>>> PointerFreeEvent (that's potentially the topic of a follow-up patch,
>>> assuming this ever gets accepted).
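
The three release points named in the list above (resize, delete, and a second SetArray call) can be sketched in plain Python. `SketchArray` and its method names are hypothetical and are not VTK's API; the sketch only shows the proposed notification order, with the observer receiving the released buffer and the reason for the release.

```python
class SketchArray:
    """Hypothetical stand-in for a data array; not VTK's actual class."""

    def __init__(self):
        self._external = None   # buffer passed in by the owner
        self._internal = None   # storage allocated by the array itself
        self._observers = []

    def add_observer(self, callback):
        self._observers.append(callback)

    def _release_external(self, reason):
        # Notify only when an externally owned buffer is attached, so
        # arrays that never receive a pointer pay nothing.
        if self._external is not None:
            old, self._external = self._external, None
            for callback in self._observers:
                callback(old, reason)

    def set_array(self, buffer):
        self._release_external("set_array")
        self._external = buffer

    def resize(self, n):
        self._release_external("resize")
        self._internal = [0.0] * n

    def delete(self):
        self._release_external("delete")
        self._internal = None


freed = []
arr = SketchArray()
arr.add_observer(lambda buf, reason: freed.append((buf, reason)))

arr.set_array("A")
arr.set_array("B")   # "A" is released
arr.resize(8)        # "B" is released
arr.set_array("C")
arr.delete()         # "C" is released
```

Each buffer is reported exactly once, at the first moment the array stops using it, which is what lets the owner reclaim memory before the array itself is destroyed.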
>>>
>>> I'm not familiar with your PV solution, but for now I'll assume that it
>>> addresses all of the above correctly. However, these issues are not
>>> specific to Python. Given that the event/observer pattern and
>>> PointerFreeEvent is a small change adding no new API, doesn't impact
>>> performance, and would solve the issue in a language-agnostic manner,
>>> would it not be a better choice?
>>>
>>>
>>> I am a little confused about why this is needed for Fortran arrays. Why
>>> couldn't you set VTK to not delete the array and leave it to the owner of
>>> the array to take care of deletion?
>>>
>>> The primary issue is that, in non-trivial cases, the owner doesn't know
>>> when VTK is finished with the data. This issue is not limited to Python
>>> and Fortran; it's an issue any time data is passed to VTK by pointer and
>>> should not be freed with free or delete[]. One could even imagine a
>>> high-performance C++ app internally managing a pool of memory on its own,
>>> where free/delete[] would not be used yet memory passed to VTK would
>>> still need to be reclaimed in a timely manner.
>>>
>>>
>>> I already have this working in ParaView's Python using observers and the
>>> fact that a Python observer holds on to the function object, which through
>>> a closure can hold a reference to the array. This works as long as Python
>>> is finalized after the VTK array is deleted.
>>>
>>> Cool. Where can I find the implementation? In your solution, what happens
>>> when the array is resized? What happens when a new pointer is passed
>>> through SetVoidArray? Your last statement makes me worry that data may be
>>> held in memory until Python is finalized. If so, that would be a show
>>> stopper, as we have a large number of very large arrays being exchanged
>>> over a potentially long run time and need them to be released as quickly
>>> as possible. Even if your solution were otherwise perfect, I still think
>>> a non-Python-specific solution would be better.
>>>
>>> Burlen
>>>
>>>
>>> On 01/23/2014 05:49 AM, Berk Geveci wrote:
>>>
>>> Hey guys,
>>>
>>> Please let me take a look at this before merging it. I have already done
>>> similar things. In fact, this is not needed for Python. I already have
>>> this working in ParaView's Python using observers and the fact that a
>>> Python observer holds on to the function object, which through a closure
>>> can hold a reference to the array. This works as long as Python is
>>> finalized after the VTK array is deleted.
>>>
>>> Also, I am a little confused about why this is needed for Fortran arrays.
>>> Why couldn't you set VTK to not delete the array and leave it to the
>>> owner of the array to take care of deletion? My opinion is that passing
>>> ownership of an array allocated in a different code to VTK is usually a
>>> dangerous thing - the developers of the other code can easily add a
>>> deallocation method somewhere without noticing that VTK owns it. They are
>>> usually not aware of what VTK does, especially in in situ type
>>> applications.
>>>
>>> Best,
>>> -berk
>>>
>>>
>>>
>>> On Wed, Jan 22, 2014 at 6:00 PM, Burlen Loring <burlen.loring at gmail.com>
>>> wrote:
>>>>
>>>> Hi Guys,
>>>>
>>>> I've refactored the patch as David suggested adding
>>>> vtkCommand::PointerFreeEvent which is fired when the pointer passed to
>>>> SetArray is no longer needed and the delete method is
>>>> VTK_DATA_ARRAY_CALLBACK. Would you mind taking another look at it?
>>>> http://review.source.kitware.com/#/c/14072/5
>>>>
>>>>
>>>> Burlen
>>>>
>>>> On 01/19/2014 03:43 PM, David Gobbi wrote:
>>>>>
>>>>> I meant vtkCommand, not vtkCallback.  Invoke a FreeEvent and catch it
>>>>> on the other side with a vtkCommand.
>>>>>
>>>>> On Sun, Jan 19, 2014 at 4:41 PM, David Gobbi <david.gobbi at gmail.com>
>>>>> wrote:
>>>>>>
>>>>>> Hi Burlen,
>>>>>>
>>>>>> If that trick won't work, then maybe the easiest thing is to make the
>>>>>> vtkDataArray generate a specific event (e.g. create a new event called
>>>>>> a FreeEvent) that is called when the memory is freed.  That way you
>>>>>> are taking advantage of the existing vtkCallback support, which is
>>>>>> already wrapped by all of the wrapper languages.
>>>>>>
>>>>>>     David
>>>>>>
>>>>>> On Sun, Jan 19, 2014 at 4:22 PM, Burlen Loring
>>>>>> <burlen.loring at gmail.com>
>>>>>> wrote:
>>>>>>>
>>>>>>> Thanks, that's a neat trick!
>>>>>>>
>>>>>>> It certainly keeps one safe: the numpy object is around while the
>>>>>>> array is. I don't think that this will do the right thing if the VTK
>>>>>>> array is resized, or if SetVoidArray is called again. In both cases,
>>>>>>> the VTK data array is still alive and well, but the numpy array's ref
>>>>>>> count should be decremented.
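
The limitation raised here can be illustrated in plain Python. `Buffer` and `Holder` are hypothetical stand-ins for a numpy array and a Python-wrapped VTK array; the sketch assumes only that swapping the underlying pointer leaves Python attributes untouched.

```python
import gc
import weakref


class Buffer:
    """Stand-in for a numpy array (hypothetical)."""


class Holder:
    """Stand-in for a Python-wrapped VTK array that accepts attributes."""

    def __init__(self):
        self.pointer = None

    def set_void_array(self, buffer):
        # Only the "pointer" is swapped; attributes are left alone.
        self.pointer = id(buffer)


first, second = Buffer(), Buffer()
first_ref = weakref.ref(first)

holder = Holder()
holder.set_void_array(first)
holder.array = first            # the attribute trick

holder.set_void_array(second)   # the holder no longer uses `first` ...
del first
gc.collect()
stale_still_alive = first_ref() is not None   # ... but it is still pinned

holder.array = second           # manual rebinding releases the old buffer
gc.collect()
released_after_rebind = first_ref() is None
```

Without a notification from the array side, the caller has to remember to rebind or clear the attribute whenever the pointer changes.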
>>>>>>>
>>>>>>> A Python-only solution won't work for me here, because I'm working in
>>>>>>> the C/C++ "glue layer" sitting in between the Python app(s) (which
>>>>>>> aren't mine) and a legacy VTK-based app. VTK objects aren't used in
>>>>>>> the app's Python API at all, only lists and tuples, and I'm adding
>>>>>>> support for numpy arrays. Using VTK objects in the API is not an
>>>>>>> option.
>>>>>>>
>>>>>>>
>>>>>>> On 1/19/2014 8:43 AM, David Gobbi wrote:
>>>>>>>>
>>>>>>>> Hi Burlen,
>>>>>>>>
>>>>>>>> There might be a simpler way to get VTK to free the numpy array,
>>>>>>>> that won't require any changes to the VTK code at all: just add
>>>>>>>> the numpy array to the VTK array as an attribute.  Here is some
>>>>>>>> example code:
>>>>>>>>
>>>>>>>> == BEGIN EXAMPLE ==
>>>>>>>>
>>>>>>>> # An example of using numpy arrays in VTK
>>>>>>>> import vtk
>>>>>>>> import numpy
>>>>>>>> import weakref
>>>>>>>> import gc
>>>>>>>>
>>>>>>>> # Use a numpy array as a VTK array
>>>>>>>> z = numpy.arange(0,10,1,float)
>>>>>>>> a = vtk.vtkDoubleArray()
>>>>>>>> a.SetVoidArray(z,10,1)
>>>>>>>>
>>>>>>>> # Make the VTK array track the numpy array
>>>>>>>> a.array = z
>>>>>>>>
>>>>>>>> # Create weakrefs to check for deletion
>>>>>>>> rz = weakref.ref(z)
>>>>>>>> ra = weakref.ref(a)
>>>>>>>>
>>>>>>>> # Decref the numpy array, it won't be deleted yet
>>>>>>>> del z
>>>>>>>> gc.collect()
>>>>>>>> if rz() is None:
>>>>>>>>     print("numpy array is deleted (#1)")
>>>>>>>>
>>>>>>>> # Decref the VTK array, numpy array will be decref'd
>>>>>>>> del a
>>>>>>>> gc.collect()
>>>>>>>> if rz() is None:
>>>>>>>>     print("numpy array is deleted (#2)")
>>>>>>>>
>>>>>>>> == END EXAMPLE ==
>>>>>>>>
>>>>>>>> This works because the attributes of a VTK-Python object are kept
>>>>>>>> alive for as long as the corresponding VTK-C++ object exists.
>>>>>>>>
>>>>>>>>      David
>>>>>>>>
>>>>>>>> On Sun, Jan 19, 2014 at 12:09 AM, Burlen Loring
>>>>>>>> <burlen.loring at gmail.com>
>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> To illustrate the callback in action I wrote an example where data
>>>>>>>>> is transferred to VTK from numpy ndarrays. I hope this will help.
>>>>>>>>> The README file explains the example.
>>>>>>>>> https://github.com/burlen/TestZeroCopyPython
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>> OK. As far as VTK is concerned, callbackData is intended to be a
>>>>>>>>>>> key
>>>>>>>>>>> for use by the callback, and VTK does nothing with it beyond
>>>>>>>>>>> passing
>>>>>>>>>>> it to the callback. I'll add a note about this to clarify.
>>>>>>>>>>
>>>>>>>>>> Thanks. The problem with callback context ownership is that if
>>>>>>>>>> it's
>>>>>>>>>> a
>>>>>>>>>> one-time, the callback "owns" the context, but if it's a
>>>>>>>>>> notification
>>>>>>>>>> kind of thing, the *notifier* owns the context (since the callback
>>>>>>>>>> may
>>>>>>>>>> never be called) and needs an additional function to free the
>>>>>>>>>> context.
>>>>>>>>>>
>>>>>>>>> I'm not following your complaint here. We're concerned about a
>>>>>>>>> pointer to some array that must be kept alive while VTK uses it but
>>>>>>>>> must be released when VTK is done. In this situation VTK "no longer
>>>>>>>>> needs the data" only ever once. Just as VTK will always call free
>>>>>>>>> when the pointer is passed in through the existing SetArray with
>>>>>>>>> the VTK_DATA_ARRAY_FREE flag, VTK will always call the callback,
>>>>>>>>> either when the vtkDataArray is resized or when it is Delete'd. The
>>>>>>>>> callback and callback data are pointers; as far as VTK is
>>>>>>>>> concerned they're just values, and VTK need not worry about freeing
>>>>>>>>> them. Note that this doesn't prevent one from passing an instance
>>>>>>>>> of an arbitrary class in callbackData. In that case the instance
>>>>>>>>> should be deleted in the callback.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>> Works for me, although I'm not sure NULL callbackData is ever
>>>>>>>>>>> valid/useful.
>>>>>>>>>>
>>>>>>>>>> As an example, if you have a LocalFree or whatever on Windows, you
>>>>>>>>>> just
>>>>>>>>>> need the array parameter, so NULL callbackData is needed.
>>>>>>>>>
>>>>>>>>> NULL callbackData would generally not work out. The callback needs
>>>>>>>>> non-NULL callbackData passed to it in order to identify the memory
>>>>>>>>> to free. In your example callbackData is the array pointer itself,
>>>>>>>>> not NULL. This is not a problem.
>>>>>>>>>
>>>>>>>>>>      Where array
>>>>>>>>>> and callbackData might be needed is something like where you need
>>>>>>>>>> the
>>>>>>>>>> array, but callbackData is the arena to free it from. Currently,
>>>>>>>>>> you
>>>>>>>>>> only get one or the other.
>>>>>>>>>
>>>>>>>>> Actually, I think currently you could get both without issue. Don't
>>>>>>>>> forget callbackData is very flexible; it can be (point to)
>>>>>>>>> anything. In the case you're describing: write a class containing
>>>>>>>>> the arena and the array. Pass a heap-allocated instance of your
>>>>>>>>> class as callbackData when you call SetArray. When VTK calls the
>>>>>>>>> callback, in addition to releasing the array data, delete the
>>>>>>>>> instance of your class.
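
A rough sketch of the "class containing the arena and the array" idea, written in Python for brevity although the discussion concerns C++ glue code. `Arena`, `FreeContext`, and `free_callback` are all hypothetical names; the array side is modelled as nothing more than a stored (callback, callback data) pair.

```python
class Arena:
    """Hypothetical allocator that hands out and reclaims named blocks."""

    def __init__(self):
        self.live = set()

    def allocate(self, name):
        self.live.add(name)
        return name

    def release(self, block):
        self.live.discard(block)


class FreeContext:
    """Bundles everything the callback needs: the arena and the block."""

    def __init__(self, arena, block):
        self.arena = arena
        self.block = block


def free_callback(callback_data):
    # Return the block to its arena. In C++ the callback would also
    # delete the context instance; Python drops it automatically.
    callback_data.arena.release(callback_data.block)


arena = Arena()
block = arena.allocate("pressure")
context = FreeContext(arena, block)

# The array side only stores the (callback, callback data) pair ...
registered = (free_callback, context)

# ... and invokes it once when it is done with the data.
callback, data = registered
callback(data)
```

Because the context carries both pieces, the callback signature can stay a single opaque pointer while still supporting arena-style deallocation.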
>>>>>>>>>
>>>>>>>>> Would you be happier with a polymorphic reference-counted functor,
>>>>>>>>> rather than a callback? Initially I thought this would be overkill,
>>>>>>>>> but I'd be willing to go that way if people are against the
>>>>>>>>> callback idea.
>>>>>>>>>
>>>>>>>>> Burlen
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 01/17/2014 09:19 PM, Ben Boeckel wrote:
>>>>>>>>>>
>>>>>>>>>> On Fri, Jan 17, 2014 at 15:21:42 -0800, Burlen Loring wrote:
>>>>>>>>>>>
>>>>>>>>>>> Yes, my intention is that for now they'd be used from glue code
>>>>>>>>>>> rather than exposed directly. In the future the API could be
>>>>>>>>>>> exposed
>>>>>>>>>>> in the wrapped languages if there were a compelling use case for
>>>>>>>>>>> it.
>>>>>>>>>>> Until then it's not worth the effort since I doubt VTK wrapping
>>>>>>>>>>> codes
>>>>>>>>>>> would handle it correctly.
>>>>>>>>>>
>>>>>>>>>> That's valid; just making sure :) .
>>>>>>>>>>
>>>>>>>>>>> OK. As far as VTK is concerned, callbackData is intended to be a
>>>>>>>>>>> key
>>>>>>>>>>> for use by the callback, and VTK does nothing with it beyond
>>>>>>>>>>> passing
>>>>>>>>>>> it to the callback. I'll add a note about this to clarify.
>>>>>>>>>>
>>>>>>>>>> Thanks. The problem with callback context ownership is that if
>>>>>>>>>> it's
>>>>>>>>>> a
>>>>>>>>>> one-time, the callback "owns" the context, but if it's a
>>>>>>>>>> notification
>>>>>>>>>> kind of thing, the *notifier* owns the context (since the callback
>>>>>>>>>> may
>>>>>>>>>> never be called) and needs an additional function to free the
>>>>>>>>>> context.
>>>>>>>>>>
>>>>>>>>>>> Works for me, although I'm not sure NULL callbackData is ever
>>>>>>>>>>> valid/useful.
>>>>>>>>>>
>>>>>>>>>> As an example, if you have a LocalFree or whatever on Windows, you
>>>>>>>>>> just
>>>>>>>>>> need the array parameter, so NULL callbackData is needed. Where
>>>>>>>>>> array
>>>>>>>>>> and callbackData might be needed is something like where you need
>>>>>>>>>> the
>>>>>>>>>> array, but callbackData is the arena to free it from. Currently,
>>>>>>>>>> you
>>>>>>>>>> only get one or the other.
>>>>>>>>>>
>>>>>>>>>> --Ben