[VTK ARB] Discussion for next meeting

Berk Geveci berk.geveci at kitware.com
Fri Aug 13 10:50:52 EDT 2010


> The change to remove SetInput() is very invasive. I hope we can
> quantify and justify that such a drastic change will have positive
> impact on VTK maintenance and developer productivity. Otherwise, the
> impact on customers and "outside" developers cannot be justified.

Let me try to give some concrete examples and driving use cases. First
to clarify, my goal is to simplify VTK, including parts that are
currently accessible only to experts: the pipeline, parallel
processing, client-server computing, the rendering kits, and so on. I
believe that by making more of the lower level VTK code accessible to
developers, we can attract more researchers who are currently building
their own frameworks for their research. I hope that this will help us
move VTK to the future. Whether or not this justifies breaking
backward compatibility is probably best discussed during our meeting.

1. Separation of "data model" and "execution model":

As I mentioned here
http://paraview.org/Wiki/VTK/Modularization_Proposal, we would like to
reorganize VTK's kits such that people can depend on minimal subsets
of VTK. One of the driving use cases behind this is to create a
self-contained "data model" kit within VTK. It turns out that VTK's
data model is very successful and there are more and more applications
using it. Some of these applications are not what we originally
expected. For example, there is discussion about using VTK's data
model as the interface for IO libraries. Think of a scientific
simulation running on a supercomputer. Currently, to write their
output, such simulations use libraries like HDF5 and NetCDF. These
libraries provide a very low-level, array-based API. This makes it
impossible to optimize the IO based on the data structure (for example
to merge unstructured partitions before writing). So it would be much
better if the IO API were a data model and the middle IO layer knew
about this data model (the underlying IO implementation can still be
HDF5). Such
applications want to depend on only a small subset of VTK. Significant
code bloat is not acceptable. It is not possible to separate the data
object classes from the algorithm and executive classes without
removing the producer from the data object. Without the producer,
SetInput() cannot be implemented. Of course, I lied. I can come up
with hacks that keep the producer while still keeping the data object
class from depending on the algorithm class, but this conflicts with
the "simplify" goal.
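
To make the "merge unstructured partitions before writing" example a
bit more concrete, here is a minimal sketch. ToyPartition and
MergePartitions are hypothetical names invented for this illustration,
not VTK classes or functions, and the layout is heavily simplified. The
point is only that an IO layer needs to know which array holds point
indices before it can merge correctly; a purely array-based API has no
way to know that.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, heavily simplified unstructured partition (not a VTK class).
struct ToyPartition {
  std::vector<double> points;      // x0, y0, z0, x1, y1, z1, ...
  std::vector<std::size_t> cells;  // point indices, local to this partition
};

// Concatenate partitions into one. Because this layer knows that "cells"
// holds point indices, it can shift them by the number of points already
// merged. This sketch does not detect or remove duplicate boundary points.
ToyPartition MergePartitions(const std::vector<ToyPartition>& parts) {
  ToyPartition merged;
  for (const ToyPartition& p : parts) {
    assert(p.points.size() % 3 == 0);  // three coordinates per point
    const std::size_t offset = merged.points.size() / 3;
    merged.points.insert(merged.points.end(),
                         p.points.begin(), p.points.end());
    for (std::size_t id : p.cells) {
      merged.cells.push_back(id + offset);
    }
  }
  return merged;
}
```

A writer built on top of something like this could emit one merged
block instead of many small ones, while the bytes on disk could still
go through HDF5 or NetCDF.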

2. Using another filter inside a filter:

If I want to delegate some of my algorithm's work to another filter
inside RequestData(), currently I have to do this:

vtkDataObject* copy = my_input->NewInstance();
copy->ShallowCopy(my_input);
internal_filter->SetInput(copy);
internal_filter->Update();
// do something with internal_filter->GetOutput();

because if I do

internal_filter->SetInput(my_input);
internal_filter->Update();

the internal filter ends up being connected to my filter's input, and
Update() tries to propagate a pipeline request from inside another
pipeline request. Really bad. And really hard to explain to junior
algorithm developers. So this is a very common mistake. If the data object
does not refer to its producer and if we have a substitute for
SetInput() that takes a data object without connecting the pipeline,
the following would work very nicely:

internal_filter->SetInputData(my_input); // or whatever this method is called
internal_filter->Update();

I spent quite a lot of time trying to figure out how to fix this
without breaking backwards compatibility in the past. I couldn't come
up with a good solution. Of course, we can create convenience
functions that do that shallow copy but that still leaves a path that
is wrong but looks right.
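
For anyone who wants to see the failure mode in isolation, here is a
toy model of it. Everything below (ToyData, ToyFilter,
DelegatingFilter, the global counters) is hypothetical and written
only for this illustration; it is not how the VTK executives are
implemented. It only assumes that a data object carries a pointer to
its producer and that SetInput() follows that pointer.

```cpp
#include <cassert>
#include <memory>
#include <vector>

struct ToyFilter;

// Hypothetical data object that still remembers who produced it.
struct ToyData {
  std::vector<double> values;
  ToyFilter* producer = nullptr;
};

// Upstream requests that were issued while some filter was executing.
static int g_nestedRequests = 0;
static int g_executingDepth = 0;

struct ToyFilter {
  std::shared_ptr<ToyData> input;
  std::shared_ptr<ToyData> output;
  bool connected = false;

  ToyFilter() : output(std::make_shared<ToyData>()) { output->producer = this; }
  virtual ~ToyFilter() = default;

  // Old-style behavior: taking a data object also connects to its producer.
  void SetInput(std::shared_ptr<ToyData> data) {
    assert(data != nullptr);
    connected = (data->producer != nullptr);
    input = std::move(data);
  }
  // Proposed behavior: take the data as-is, with no pipeline connection.
  void SetInputData(std::shared_ptr<ToyData> data) {
    assert(data != nullptr);
    connected = false;
    input = std::move(data);
  }
  void Update() {
    if (connected) {
      if (g_executingDepth > 0) ++g_nestedRequests;  // request inside a request
      input->producer->Update();
    }
    ++g_executingDepth;
    RequestData();
    --g_executingDepth;
  }
  // Default: act as a source when there is no input, else pass data through.
  virtual void RequestData() {
    output->values = input ? input->values : std::vector<double>{1.0, 2.0, 3.0};
  }
};

// A filter that delegates its work to an internal filter in RequestData().
struct DelegatingFilter : ToyFilter {
  bool useSetInputData = false;
  void RequestData() override {
    ToyFilter internal;
    if (useSetInputData) internal.SetInputData(input);
    else internal.SetInput(input);
    internal.Update();
    output->values = internal.output->values;
  }
};

// Runs source -> delegating filter and reports how many nested requests occurred.
int CountNestedRequests(bool useSetInputData) {
  g_nestedRequests = 0;
  ToyFilter source;
  DelegatingFilter outer;
  outer.useSetInputData = useSetInputData;
  outer.SetInput(source.output);
  outer.Update();
  return g_nestedRequests;
}
```

In this toy, CountNestedRequests(false) reports one nested upstream
request (the internal filter reaches back to the source from inside the
outer filter's RequestData()), while CountNestedRequests(true) reports
none.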

In fact, I was discussing this during lunch at Kitware yesterday and
one of the more senior developers (I won't name names) thought that
SetInput() did not connect the pipeline and could be used internally.

3. Getting rid of the RequestDataObject pass:

RequestDataObject pass sucks. It requires that some readers go and
read a bunch of data to satisfy it. It requires that we jump through
hoops in dealing with composite and temporal datasets. It is totally
unnecessary. Data objects are not used to propagate meta-data, so why
should they be there before RequestData()? In fact, it is bad that
they are because junior algorithm developers access the data object
and store stuff in them. Then the executive wipes the output in the
beginning of RequestData() and now someone more senior has to explain
why you shouldn't access the output in RequestInformation(). Getting
rid of it is easy. But that means that GetOutput() will return NULL
unless RequestData() is called. Therefore, output cannot be used to
connect pipelines anymore.
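
A rough sketch of what that implies, again with made-up toy classes
(ToyAlgorithm, ToySource, ToyDoubler are hypothetical, and the
SetInputConnection() here takes the upstream algorithm directly, which
is a simplification of connecting through an output port). The output
does not exist until the algorithm has executed, so the connection has
to be made between algorithms rather than through the output object.

```cpp
#include <cassert>
#include <memory>
#include <vector>

struct ToyData {
  std::vector<int> values;
};

class ToyAlgorithm {
public:
  virtual ~ToyAlgorithm() = default;

  // Connect to the upstream algorithm itself, not to its output object.
  void SetInputConnection(ToyAlgorithm* upstream) { upstream_ = upstream; }

  void Update() {
    const ToyData* in = nullptr;
    if (upstream_ != nullptr) {
      upstream_->Update();
      in = upstream_->GetOutput();
    }
    output_.reset(new ToyData);  // the output is created only at execution time
    RequestData(in, *output_);
  }

  // Returns nullptr until Update() has run at least once.
  const ToyData* GetOutput() const { return output_.get(); }

protected:
  virtual void RequestData(const ToyData* in, ToyData& out) = 0;

private:
  ToyAlgorithm* upstream_ = nullptr;
  std::unique_ptr<ToyData> output_;
};

class ToySource : public ToyAlgorithm {
protected:
  void RequestData(const ToyData*, ToyData& out) override {
    out.values = {1, 2, 3};
  }
};

class ToyDoubler : public ToyAlgorithm {
protected:
  void RequestData(const ToyData* in, ToyData& out) override {
    assert(in != nullptr);  // this filter needs an upstream connection
    for (int v : in->values) out.values.push_back(2 * v);
  }
};

bool OutputIsNullBeforeUpdate() {
  ToySource source;
  return source.GetOutput() == nullptr;
}

std::vector<int> RunToyPipeline() {
  ToySource source;
  ToyDoubler doubler;
  doubler.SetInputConnection(&source);
  doubler.Update();
  return doubler.GetOutput()->values;
}
```

Note that there is nothing for a junior developer to scribble into
before execution, because there is no output object to get hold of.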

I can come up with more use cases but these are the most important
ones. Also, I would like to reiterate that we can fix many instances
of SetInput() with some sort of script. If we decide to move forward
with this change, I'd propose that we make SetInput() deprecated in
5.10 (to be released in November) and remove it in 6.0 (to be released
in May 2011 or so).

-berk


