[Paraview-developers] minimizing memory for reader (or catalyst)

Fri May 12 13:07:53 EDT 2017

I am reworking a reader module to see how I can improve memory usage and performance by using internal caching, and I would like to check whether my concept has major flaws or other things to worry about. The final target will be in-situ usage, but I'll be practicing a bit first with the ParaView reader module.

I have concluded that the basic simulation data do not lend themselves to normal zero-copy techniques, and I'm not completely convinced that the vtkMappedUnstructuredGrid approach will work either (I need to deal with possible on-the-fly polyhedral decomposition too). My approach is therefore something like an inside-out version of vtkMappedUnstructuredGrid:

1) Query my simulation to obtain all the sizing data.
2) Allocate a vtkUnstructuredGrid with the appropriate sizing and pass the underlying VTK data contents back for filling in. This is most easily done by the simulation, since it knows its own data structures, and it keeps the ABI connection minimal. Only the size of vtkIdType is needed by the simulation itself, and the interface code is templated on variants of that.
3) Finally update the vtkUnstructuredGrid

   E.g.,
     // ... query simulation for sizing

     // get pointers for/from unstructured grid:
     vtkSmartPointer<vtkCellArray> cells = vtkmesh->GetCells();
     cells->GetData()->SetNumberOfTuples(nConnectivity);
     // ... other arrays and sizing

     // wrap the WritePointer with a pass-through list-container (no allocation)
     UList<vtkIdType> ul_cells
     (
         cells->WritePointer(sizing.nFieldCells(), sizing.nConnectivity()),
         sizing.nConnectivity()
     );

     // fill with contents in a form that VTK would expect
     sizing.populateInternal(myMesh, ul_cellTypes, ul_cells, ul_cellLocations, ul_faces, ul_faceLocations, myMapping);

     // update VTK side of things:
     vtkmesh->SetCells(cellTypes, cellLocations, cells, faceLocations, faces);

This seems to work OK and is more-or-less an en bloc alternative to vtkMappedUnstructuredGrid.

The next stage is where it gets interesting, or at least where I get quite confused. Assuming that the mesh doesn't change very often during a simulation, I would like to re-use the VTK grid entirely. For this, I'm using a hashtable with a (std::string, vtkSmartPointer<vtkUnstructuredGrid>) key/value pair, which handles deletion nicely.
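
The caching idea can be sketched in isolation roughly as follows. This is only a stand-alone illustration, not my actual reader code: Grid and GridCache are hypothetical names, and std::shared_ptr is used as a stand-in for vtkSmartPointer<vtkUnstructuredGrid> so that the example compiles without VTK.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for vtkUnstructuredGrid (not a VTK class).
struct Grid
{
    long nCells = 0;
};

// Hypothetical cache: maps a block name to a reference-counted grid.
// In the real reader the value type would be
// vtkSmartPointer<vtkUnstructuredGrid> instead of std::shared_ptr<Grid>.
class GridCache
{
    std::unordered_map<std::string, std::shared_ptr<Grid>> table_;

public:
    // Return the cached grid for 'name', allocating it on first use.
    // 'created' reports whether a new grid had to be allocated.
    std::shared_ptr<Grid> findOrCreate(const std::string& name, bool& created)
    {
        assert(!name.empty());

        auto iter = table_.find(name);
        created = (iter == table_.end());
        if (created)
        {
            iter = table_.emplace(name, std::make_shared<Grid>()).first;
        }
        return iter->second;
    }

    // Drop an entry. The grid itself is only destroyed once no other
    // holder (e.g. an output object) still references it.
    bool erase(const std::string& name)
    {
        return table_.erase(name) > 0;
    }

    std::size_t size() const
    {
        return table_.size();
    }
};
```

The point of returning the smart pointer by value is that the cache and the caller each hold one reference, so removing the cache entry does not invalidate a grid that has already been handed out.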

On the first time through, I create the VTK mesh and store it as a smart pointer in the hashtable before attaching it to the ParaView output:
    vtkMultiBlockDataSet* output = vtkMultiBlockDataSet::SafeDownCast(outputVector->GetInformationObject(0)->Get(vtkMultiBlockDataSet::DATA_OBJECT()));
    ...
    block->SetBlock(datasetNo, dataset);

I assume that this simply increases the reference count. Or does ParaView make a deep copy of the data returned on the output vector?
On further calls, I can simply do the same type of thing (adding to the output object), but with the grid retrieved from the hashtable cache.

So what happens when I start modifying the cached values (presuming that ParaView has a shallow copy of the data)?
For example, there are topology changes that I would like to handle. If I access my pointer from the cache, make the modifications and then add it back to the output vector, what state is the data in? Am I modifying (corrupting) data that is currently in use by ParaView? Or does ParaView hold off until I complete the action? During this modification time, do I have two copies of the data, or are they always shallow copies referencing the same data?
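
To make the question concrete, here is a stand-alone sketch of the two behaviours I can imagine. It deliberately does not use VTK: Grid and Output are hypothetical stand-ins, and std::shared_ptr replaces vtkSmartPointer. Whether ParaView's output really behaves like the Output type here (holding only a reference) is exactly the assumption I would like confirmed.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-in for vtkUnstructuredGrid (not a VTK class).
struct Grid
{
    std::vector<long> connectivity;
};

// Hypothetical stand-in for the reader output.
// Assumption: it stores a reference to the grid, not a deep copy.
struct Output
{
    std::shared_ptr<Grid> block;
};

// Option A: modify the cached grid in place.
// Every holder of the same pointer sees the change immediately,
// including while the modification is only half finished.
void modifyInPlace(const std::shared_ptr<Grid>& cached)
{
    assert(cached);
    cached->connectivity.push_back(42);
}

// Option B: build a modified copy and replace the cache entry with it.
// Holders of the old pointer keep consistent (old) data until they are
// handed the new grid, at the cost of two copies being alive meanwhile.
std::shared_ptr<Grid> modifyCopy(std::shared_ptr<Grid>& cached)
{
    assert(cached);
    auto fresh = std::make_shared<Grid>(*cached);
    fresh->connectivity.push_back(42);
    cached = fresh;
    return fresh;
}
```

With option A there is only ever one copy of the data, but nothing protects a consumer that is still reading it. With option B the old grid stays untouched until the last reference to it is dropped.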

Any good pointers to understanding this would be very greatly appreciated! I've scoured the wiki and other locations, but without much success.

Cheers,
/mark