[vtk-developers] Time Parallelism + vtkFileCachePipeline + vtkNetCDFReader
John Biddiscombe
biddisco at cscs.ch
Sat Sep 24 11:50:35 EDT 2005
I've been working on parallelising vtkNetCDFReader for use in ParaView
and have some questions regarding pieces/pipelines and time variables.
I'm noticing that an individual dataset is generally not huge, but when 1600 time steps are present the total amount of data starts to add up. The reader currently manages extents using vtkExtentTranslator, which does a very nice job of partitioning the dataset into blocks, and the data is being split across processors nicely. However, the reader only reads the current time step, which seems like a terrible waste of resources when 8 nodes could read 100 time steps each and still have room to spare.
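For reference, the piece splitting is just the standard vtkExtentTranslator usage; a minimal sketch (the grid size and the numberOfNodes/thisNodeRank names are placeholders, not the reader's actual code):

  #include "vtkExtentTranslator.h"

  // Split the whole extent into per-node blocks (placeholder values).
  vtkExtentTranslator *translator = vtkExtentTranslator::New();
  translator->SetWholeExtent(0, 255, 0, 255, 0, 127); // hypothetical grid
  translator->SetNumberOfPieces(numberOfNodes);       // e.g. 8 processors
  translator->SetPiece(thisNodeRank);                 // 0 .. numberOfNodes-1
  translator->SetGhostLevel(0);
  translator->PieceToExtent();                        // compute the sub-extent
  int pieceExtent[6];
  translator->GetExtent(pieceExtent);                 // block this node reads
  translator->Delete();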
Some time ago I put together a convenient filter called vtkFileCachePipelineFilter, which is a wrapper around a sub-pipeline. To save myself writing too much in this email, I've put a couple of slides (actually several almost-duplicated ones) here:
ftp://ftp.cscs.ch/out/biddisco/vtk/vtkFileCachePipeline.ppt
The idea is very simple. Every time you run your program (e.g. a demo of something nice), you spend ages waiting for the reader/contour/glyph etc. to execute, and then you render it. In my code I tend to save the results to disk and put some #ifdef statements in to allow myself to re-execute the pipeline if I want to regenerate the data (e.g. with new parameters). I eventually got fed up with the maintenance this involves, and so created a pipeline wrapper which caches the data (the terminology I use is: shallow cache = on disk, deep cache = in memory). When you rerun the program, the cache filter intercepts the pipeline update, checks the disk, and if the cache file is there, loads it in (and optionally keeps it in memory). The file name can be set in the cache filter in a sprintf(%) style, so that the cache for parameter X is stored in one file, and if parameter Y is requested it gets cached in another file, and so on. I've added an option to use time variables as the parameter, so when running an animation the whole series of time data gets cached (memory permitting), and it looks lovely from the user's viewpoint. The first time they run it the data gets loaded, but once the animation goes back to step zero it's all in memory and all is well.
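To make the idea concrete, the core of the cache check amounts to something like the following. This is only a rough sketch: FileNamePattern, SubPipeline and output are hypothetical members, not the real filter's API, and I'm using the XML polydata reader/writer purely as an example cache format.

  #include "vtkXMLPolyDataReader.h"
  #include "vtkXMLPolyDataWriter.h"
  #include <vtksys/SystemTools.hxx>

  // Build the cache file name from the current parameter (e.g. time step).
  char fileName[1024];
  sprintf(fileName, this->FileNamePattern, timeStep);   // e.g. "cache_%04d.vtp"
  if (vtksys::SystemTools::FileExists(fileName))
    {
    // Shallow cache hit: load the previously computed result from disk.
    vtkXMLPolyDataReader *reader = vtkXMLPolyDataReader::New();
    reader->SetFileName(fileName);
    reader->Update();
    output->ShallowCopy(reader->GetOutput());
    reader->Delete();
    }
  else
    {
    // Cache miss: execute the wrapped sub-pipeline and write its result out.
    this->SubPipeline->Update();
    output->ShallowCopy(this->SubPipeline->GetOutput());
    vtkXMLPolyDataWriter *writer = vtkXMLPolyDataWriter::New();
    writer->SetFileName(fileName);
    writer->SetInput(this->SubPipeline->GetOutput());
    writer->Write();
    writer->Delete();
    }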
Now, in actual fact the filter is extremely simple and does very little, but now that I'm putting together the parallelism in the netCDF reader, I see an opportunity to do something a bit nicer and combine the time-variable caching with the parallel streaming.
My initial thought is that the hierarchical dataset (the netCDF reader is now a subclass of vtkHierarchicalDataSetAlgorithm) could be populated with N datasets (each representing a time step), and the one that is passed downstream to the rest of the pipeline is selected by the time step variable/parameter. I'd then want to use the animation within ParaView to step through the data more rapidly than before. (Combining this with the caching ability, so that downstream filters could do contours/glyphs etc. and have those results cached, would actually be more useful in many cases than the raw data.)
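A very rough sketch of what I have in mind inside the reader, assuming the SetDataSet(level, index, data) style of access on vtkHierarchicalDataSet; ReadTimeStep and requestedTimeStep are hypothetical placeholders, not existing code:

  #include "vtkHierarchicalDataSet.h"
  #include "vtkImageData.h"

  // One block per time step; only one block is handed downstream.
  vtkHierarchicalDataSet *allTimes = vtkHierarchicalDataSet::New();
  allTimes->SetNumberOfLevels(1);
  for (int t = 0; t < numberOfTimeSteps; ++t)
    {
    vtkImageData *step = this->ReadTimeStep(t);  // hypothetical reader helper
    allTimes->SetDataSet(0, t, step);            // level 0, index = time step
    step->Delete();
    }
  // Downstream filters only ever see the block for the requested time.
  output->ShallowCopy(allTimes->GetDataSet(0, requestedTimeStep));
  allTimes->Delete();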
I'm told that others are working on similar ideas. If anyone has any documents on this subject that I might be able to read, I'm going on holiday in a couple of days and would welcome some extra material. In particular, is there already a mechanism for splitting data across time as well as across spatial extents?
Many thanks
JB
--
John Biddiscombe, email:biddisco @ cscs.ch
http://www.cscs.ch/about/BJohn.php
CSCS, Swiss National Supercomputing Centre | Tel: +41 (91) 610.82.07
Via Cantonale, 6928 Manno, Switzerland | Fax: +41 (91) 610.82.82