[Paraview] Is VTU File Reading Parallelized?

Berk Geveci berk.geveci at kitware.com
Fri May 30 11:30:47 EDT 2008


> When the consumer requests information about the data (e.g. file reader), it
> is frequently the case that the file must be opened on all processes,
> information extracted and passed back down the pipeline. In many cases,
> opening a huge file (on a shared NFS filesystem - yikes), querying the
> information about point/cell data, number of cells etc is not necessary for
> every node, when later on, node X will be told only to get piece N anyway.
>
> NB. The paraview GUI ignores the information from all nodes other than
> process 0 anyway - so even sending it to the other nodes, seems pointless :)

OK, so we have two distinct use cases:

1. Get information to be displayed by the application
2. Get information that the downstream filters consume

You are right that for (1) we should not open the file on anything
other than the first node. For (2), all nodes need some meta-information to
function. For example, if the reader produces structured data, all
nodes need WHOLE_EXTENT(). For imaging filters, they need the scalar
type, number of components etc. It is possible to write a parallel
reader such that it opens the file in the first RequestInformation()
on the first node, reads and caches all necessary information (so that
subsequent calls to RequestInformation() can return the cached
information and not open the file again) and also broadcasts it to the
other nodes. Such a reader would need to get the global controller to
do that. Be careful if you decide to make a change: it is very
important that your algorithm does not call any methods that cause
its MTime to change during or between any of the pipeline changes.
This may cause serious issues including deadlock due to asymmetric
execution among different processes.
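
The read-once, cache, and broadcast idea described above can be
sketched roughly as follows. This is only an illustration in plain
Python, not VTK code: FakeController, CachingReader, simulate() and the
metadata values are all hypothetical stand-ins, and the "broadcast" is
simulated through a shared dict instead of MPI. In a real reader the
cache would also have to be stored without touching the algorithm's
MTime.

```python
from dataclasses import dataclass


@dataclass
class FakeController:
    """Hypothetical stand-in for a multi-process controller (not a VTK class)."""
    rank: int
    shared: dict  # simulates the communication channel between ranks

    def broadcast(self, value, root=0):
        # The root publishes the value; every rank (root included) returns it.
        if self.rank == root:
            self.shared["meta"] = value
        return self.shared["meta"]


class CachingReader:
    """Hypothetical reader: only rank 0 opens the file, and only once."""

    def __init__(self, controller, path):
        self.controller = controller
        self.path = path
        self.cached_meta = None  # plain attribute, no modification time involved
        self.file_opens = 0      # counts simulated file opens on this rank

    def _read_meta_from_file(self):
        # Placeholder for the expensive open-and-scan step; values are made up.
        self.file_opens += 1
        return {"whole_extent": (0, 63, 0, 63, 0, 31),
                "scalar_type": "float",
                "components": 1}

    def request_information(self):
        # First call: rank 0 reads, then all ranks take part in the broadcast.
        # Later calls: return the cached copy without opening the file again.
        if self.cached_meta is None:
            meta = None
            if self.controller.rank == 0:
                meta = self._read_meta_from_file()
            self.cached_meta = self.controller.broadcast(meta, root=0)
        return self.cached_meta


def simulate(num_ranks, path="data.vtu"):
    """Run the readers one rank at a time, rank 0 first (simulation only)."""
    shared = {}
    readers = [CachingReader(FakeController(r, shared), path)
               for r in range(num_ranks)]
    for reader in readers:
        reader.request_information()
    return readers
```

Calling simulate(4) and then request_information() again on any reader
should leave file_opens at 1 on rank 0 and 0 everywhere else. Note that
every rank enters the broadcast on its first call, which is the symmetry
the warning about deadlock is concerned with.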

> What I was wondering, is why is UPDATE_PIECE and NUMBER_OF_PIECES not set in
> RequestInformation? I've struggled with this for years and never understood
> why it is (seemingly deliberately) not set. In most readers, the file is
> opened, scanned and information set during RequestInformation. I have on
> occasion, sent the information from node 0 to the other nodes using mpi
> rather than have them open the file themselves. Often the information
> depends on the node number and so lacking this information is strange.

The fact that ParaView, by default, sets UPDATE_PIECE to local process
id and NUMBER_OF_PIECES to number of processes does not mean it is
always so. When streaming, NUMBER_OF_PIECES is number of procs *
number of streaming passes. This is something I have struggled with as
well. My conclusion is that if you want to know the number of processes
and your process id, get it from your controller (make sure to give
the user the option to get the controller in case we start using MPI
groups in the future). Do not assume that number of processes ==
number of pieces. Of course, parallel algorithms that need the whole
data (distributed) to function, will not be available when streaming
and they can assume that number of pieces == number of processes (for
example streamtracer).
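
To make the arithmetic concrete, here is a small sketch in plain
Python. number_of_pieces() follows the statement above (procs *
streaming passes). pieces_for_process() is purely an assumption made
for illustration: the actual assignment of pieces to processes per
pass is not specified here, so the interleaved layout below is
hypothetical.

```python
def number_of_pieces(num_procs, num_passes=1):
    """Pieces requested overall: processes times streaming passes."""
    return num_procs * num_passes


def pieces_for_process(rank, num_procs, num_passes=1):
    """Piece indices one process handles across all passes.

    Assumed layout (hypothetical): in pass s, process `rank` is asked
    for piece s * num_procs + rank.
    """
    return [s * num_procs + rank for s in range(num_passes)]


def pieces_equal_processes(num_procs, num_passes=1):
    """True only when the 'pieces == processes' shortcut would hold."""
    return number_of_pieces(num_procs, num_passes) == num_procs
```

With 4 processes and no streaming (1 pass) the shortcut holds. With 3
streaming passes there are 12 pieces, so a reader that uses
NUMBER_OF_PIECES as the process count would be wrong by a factor of 3.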

-berk

