[VTK ARB] Discussion for next meeting

Fri Aug 13 12:33:03 EDT 2010

These are great discussion points. I'm looking forward to a lively exchange.

On Fri, Aug 13, 2010 at 10:50 AM, Berk Geveci <berk.geveci at kitware.com> wrote:
>> The change to remove SetInput() is very invasive. I hope we can
>> quantify and justify that such a drastic change will have a positive
>> impact on VTK maintenance and developer productivity. Otherwise, the
>> impact on customers and "outside" developers cannot be justified.
>
> Let me try to give some concrete examples and driving use cases. First,
> to clarify, my goal is to simplify VTK, including parts that are
> currently accessible only to experts: the pipeline, parallel
> processing, client-server computing, the rendering kits, etc. I
> believe that by making more of the lower-level VTK code accessible to
> developers, we can attract more researchers who are currently building
> their own frameworks for their research. I hope that this will help us
> move VTK into the future. Whether or not this justifies breaking
> backward compatibility is probably best discussed during our meeting.
>
> 1. Separation of "data model" and "execution model":
>
> As I mentioned here
> http://paraview.org/Wiki/VTK/Modularization_Proposal, we would like to
> reorganize VTK's kits such that people can depend on minimal subsets
> of VTK. One of the driving use cases behind this is to create a
> self-contained "data model" kit within VTK. It turns out that VTK's
> data model is very successful and there are more and more applications
> using it. Some of these applications are not what we originally
> expected. For example, there is discussion about using VTK's data
> model as the interface for IO libraries. Think of a scientific
> simulation running on a supercomputer. Currently, to write their
> output, such simulations use libraries like HDF5 and NetCDF. These
> libraries provide a very low-level, array-based API. This makes it
> impossible to optimize the IO based on the data structure (for example,
> to merge unstructured partitions before writing). So it would be much
> better if the IO API were a data model and a middle IO layer knew about
> this data model (the underlying IO implementation could still be HDF5). Such
> applications want to depend on only a small subset of VTK. Significant
> code bloat is not acceptable. It is not possible to separate the data
> object classes from the algorithm and executive classes without
> removing the producer from the data object. Without the producer,
> SetInput() cannot be implemented. Of course, I lied. I can come up
> with hacks that keep the producer while still not making the data
> object class depend on the algorithm class, but they conflict with the
> "simplify" goal.
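
A minimal sketch of the dependency direction described in point 1, using hypothetical class names rather than VTK's actual classes: the data object carries no producer pointer, so the "data model" part compiles without ever naming the algorithm side, while the "execution model" part depends on the data model and never the reverse. SetInputData() here borrows the name floated in point 2 and is an assumption, not an existing VTK 5.x method.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// ---- hypothetical "data model" kit: no reference to algorithms or executives ----
class DataObject {
 public:
  explicit DataObject(std::vector<double> values) : values_(std::move(values)) {}
  const std::vector<double>& Values() const { return values_; }

 private:
  std::vector<double> values_;  // note: no producer pointer
};

// ---- hypothetical "execution model" kit: depends on the data model only ----
class Algorithm {
 public:
  virtual ~Algorithm() = default;

  // Hand the algorithm a data object directly; no pipeline link is created.
  void SetInputData(std::shared_ptr<const DataObject> input) { input_ = std::move(input); }

  // Run the algorithm on the current input; with no input, the output is empty.
  void Update() {
    if (input_) {
      output_ = Execute(*input_);
    } else {
      output_.reset();
    }
  }

  std::shared_ptr<const DataObject> GetOutput() const { return output_; }

 protected:
  virtual std::shared_ptr<const DataObject> Execute(const DataObject& in) = 0;

 private:
  std::shared_ptr<const DataObject> input_;
  std::shared_ptr<const DataObject> output_;
};

// Example (hypothetical) algorithm: multiply every value by a constant factor.
class ScaleFilter : public Algorithm {
 public:
  explicit ScaleFilter(double factor) : factor_(factor) {}

 protected:
  std::shared_ptr<const DataObject> Execute(const DataObject& in) override {
    std::vector<double> out;
    out.reserve(in.Values().size());
    for (double v : in.Values()) out.push_back(v * factor_);
    return std::make_shared<DataObject>(std::move(out));
  }

 private:
  double factor_;
};
```

Because DataObject never names Algorithm, the top half could live in its own library. The cost is the one described above: nothing can walk from a data object back to its producer, which is exactly what SetInput() relies on.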
>
> 2. Using another filter inside a filter:
>
> If I want to delegate some of my algorithm's work to another filter
> inside RequestData(), currently I have to do this:
>
> vtkDataObject* copy = my_input->NewInstance();
> copy->ShallowCopy(my_input);
> internal_filter->SetInput(copy);
> copy->Delete(); // internal_filter now holds its own reference
> internal_filter->Update();
> // do something with internal_filter->GetOutput();
>
> because if I do
>
> internal_filter->SetInput(my_input);
> internal_filter->Update();
>
> the internal filter ends up connected to my filter's input, and
> Update() tries to propagate a new pipeline request while another
> pipeline request is still in progress. Really bad. And really hard to
> explain to junior algorithm developers, so this is a very, very common
> mistake. If the data object did not refer to its producer, and if we
> had a substitute for SetInput() that takes a data object without
> connecting the pipeline, the following would work very nicely:
>
> internal_filter->SetInputData(my_input); // or whatever this method is called
> internal_filter->Update();
>
> In the past, I spent quite a lot of time trying to figure out how to
> fix this without breaking backward compatibility. I couldn't come up
> with a good solution. Of course, we can create convenience functions
> that do that shallow copy, but that still leaves a path that is wrong
> but looks right.
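
A toy model of the trap in point 2, with hypothetical types that only mimic the shape of the problem (this is not VTK's executive logic): because the data object remembers its producer, calling SetInput() from inside the outer filter's RequestData() quietly wires the internal filter to the upstream source, so a second request reaches that source while the outer request is still running. With an input-data style setter, the source sees only the one request.

```cpp
#include <cassert>

struct Algorithm;

// Toy data object that, like a VTK 5 data object, remembers its producer.
struct Data {
  Algorithm* producer = nullptr;
  int value = 0;
};

// Toy algorithm (hypothetical; not VTK's vtkAlgorithm/executive).
struct Algorithm {
  Data output;
  Algorithm* upstream = nullptr;  // pipeline connection, if any
  const Data* input = nullptr;    // the data RequestData() will read
  int requestsSeen = 0;           // update requests that reached this algorithm

  Algorithm() { output.producer = this; }
  Algorithm(const Algorithm&) = delete;
  Algorithm& operator=(const Algorithm&) = delete;
  virtual ~Algorithm() = default;

  // Old style: silently connects to whatever produced `d`.
  void SetInput(const Data* d) { upstream = d->producer; input = d; }
  // Proposed style: use `d` as-is, without connecting the pipeline.
  void SetInputData(const Data* d) { upstream = nullptr; input = d; }

  void Update() {
    ++requestsSeen;
    if (upstream) upstream->Update();  // propagate the request upstream first
    RequestData();
  }
  virtual void RequestData() = 0;
};

struct Source : Algorithm {
  void RequestData() override { output.value = 10; }
};

struct AddOne : Algorithm {
  void RequestData() override { output.value = input->value + 1; }
};

// Outer filter that delegates part of its work to an internal filter.
struct Outer : Algorithm {
  bool useInputData;
  explicit Outer(bool inputData) : useInputData(inputData) {}
  void RequestData() override {
    AddOne internal;
    if (useInputData) {
      internal.SetInputData(input);
    } else {
      internal.SetInput(input);  // looks right, but wires `internal` to our upstream
    }
    internal.Update();
    output.value = internal.output.value * 2;
  }
};
```

Both variants compute the same value here only because the toy source ignores request meta-data; the point is the extra, nested request that reaches the upstream source in the SetInput() case.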
>
> In fact, I was discussing this during lunch at Kitware yesterday and
> one of the more senior developers (I won't name names) thought that
> SetInput() did not connect the pipeline and could be used internally.
>
> 3. Getting rid of the RequestDataObject pass:
>
> The RequestDataObject pass sucks. It requires some readers to go and
> read a bunch of data to satisfy it. It requires that we jump through
> hoops in dealing with composite and temporal datasets. It is totally
> unnecessary. Data objects are not used to propagate meta-data, so why
> should they exist before RequestData()? In fact, it is bad that they
> do, because junior algorithm developers access the data object and
> store stuff in it. Then the executive wipes the output at the
> beginning of RequestData(), and now someone more senior has to explain
> why you shouldn't access the output in RequestInformation(). Getting
> rid of the pass is easy. But that means that GetOutput() will return
> NULL until RequestData() has been called. Therefore, the output can no
> longer be used to connect pipelines.
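
A toy sketch of what point 3 implies, assuming connections are made through a stable output-port handle instead of through the output data object. The method names SetInputConnection() and GetOutputPort() are borrowed from VTK 5's vtkAlgorithm, but every class body below is a hypothetical simplification: the output data object is created only when RequestData() runs, so GetOutput() is null before the first Update(), yet the pipeline can still be wired up in advance.

```cpp
#include <cassert>
#include <memory>

struct Data { int value = 0; };

class Algorithm;

// Stable handle to an algorithm's output; it exists before any data does.
struct OutputPort { Algorithm* owner; };

class Algorithm {
 public:
  Algorithm() : port_{this} {}
  Algorithm(const Algorithm&) = delete;
  Algorithm& operator=(const Algorithm&) = delete;
  virtual ~Algorithm() = default;

  OutputPort* GetOutputPort() { return &port_; }
  // Connect to an upstream port rather than to a data object.
  void SetInputConnection(OutputPort* port) { upstream_ = port; }
  // With no RequestDataObject pass, this stays null until Update() has run.
  Data* GetOutput() { return output_.get(); }

  void Update() {
    const Data* in = nullptr;
    if (upstream_ != nullptr) {
      upstream_->owner->Update();  // pull the request through the connection
      in = upstream_->owner->GetOutput();
    }
    output_ = std::make_unique<Data>();  // output is (re)created at execution time
    RequestData(in, *output_);
  }

 protected:
  virtual void RequestData(const Data* in, Data& out) = 0;

 private:
  OutputPort port_;
  OutputPort* upstream_ = nullptr;
  std::unique_ptr<Data> output_;
};

class Source : public Algorithm {
 protected:
  void RequestData(const Data*, Data& out) override { out.value = 5; }
};

class Doubler : public Algorithm {
 protected:
  void RequestData(const Data* in, Data& out) override { out.value = in ? in->value * 2 : 0; }
};
```

Because the output object is replaced on every Update(), any pointer grabbed from GetOutput() before execution would be stale anyway, which is the same reason storing things in the output during RequestInformation() goes wrong.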
>
> I can come up with more use cases, but these are the most important
> ones. Also, I would like to reiterate that we can fix many instances
> of SetInput() with some sort of script. If we decide to move forward
> with this change, I'd propose that we deprecate SetInput() in
> 5.10 (to be released in November) and remove it in 6.0 (to be released
> in May 2011 or so).
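
On the "some sort of script" idea: a minimal line-by-line sketch, assuming the data-object setter ends up named SetInputData() (the name floated in point 2) and that pipeline connections move to the existing SetInputConnection()/GetOutputPort() pair. The function MigrateSetInput() is hypothetical, and a regex cannot see types, so anything outside the two simple shapes below is left unchanged for a human to review.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical migration helper: rewrites the two most common SetInput()
// idioms on one line of source text and leaves everything else untouched.
std::string MigrateSetInput(const std::string& line) {
  // a->SetInput(b->GetOutput())  ==>  a->SetInputConnection(b->GetOutputPort())
  static const std::regex connect(R"((\w+)->SetInput\(\s*(\w+)->GetOutput\(\)\s*\))");
  std::string out =
      std::regex_replace(line, connect, "$1->SetInputConnection($2->GetOutputPort())");

  // a->SetInput(data)  ==>  a->SetInputData(data), but only when the argument
  // does not mention GetOutput (e.g. GetOutput(0) is left for manual review).
  static const std::regex plain(R"(->SetInput\((?![^)]*GetOutput))");
  return std::regex_replace(out, plain, "->SetInputData(");
}
```

For example, `mapper->SetInput(reader->GetOutput());` becomes `mapper->SetInputConnection(reader->GetOutputPort());`, while `internal_filter->SetInput(copy);` becomes `internal_filter->SetInputData(copy);`. A real script would also need to report the lines it skipped.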
>
> -berk
>