[Paraview] Programmable filter in parallel

David E DeMarle dave.demarle at kitware.com
Thu Aug 11 11:33:59 EDT 2011


You should end up with one multiblock dataset on each processor, each
of which should have eight children. On any given processor, seven of
those children will be NULL and the remaining one will be unique to
that processor. Use UPDATE_PIECE (and possibly the local process id) to
figure out which of the eight children each processor should fill in.
The vtkCompositeDataPipeline that ParaView uses expects and knows how
to handle that structure, so filters downstream should have no problem
with it.
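
For example, a Programmable Source whose output type is set to
vtkMultiBlockDataSet could build that structure along these lines (a
minimal sketch -- the eight-block count and the modulo assignment of
blocks to pieces are assumptions for illustration, not requirements):

from paraview import vtk

executive = self.GetExecutive()
outInfo = executive.GetOutputInformation(0)
piece = outInfo.Get(executive.UPDATE_PIECE_NUMBER())
numPieces = outInfo.Get(executive.UPDATE_NUMBER_OF_PIECES())

nblocks = 8
output = self.GetOutput()
output.SetNumberOfBlocks(nblocks)
for i in range(nblocks):
    if i % numPieces == piece:
        # this rank owns block i; read or build the real data here
        output.SetBlock(i, vtk.vtkStructuredGrid())
    else:
        # leave the children owned by other ranks NULL
        output.SetBlock(i, None)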

And no, these aren't stupid questions. They are described fairly well
in the most recent Kitware books and courses, but otherwise the
information is widely scattered around the ParaView wiki, the Kitware
Source magazine, and the mailing list archives.

David E DeMarle
Kitware, Inc.
R&D Engineer
28 Corporate Drive
Clifton Park, NY 12065-8662
Phone: 518-371-3971 x109



On Thu, Aug 11, 2011 at 11:11 AM, Tim Gallagher
<tim.gallagher at gatech.edu> wrote:
> David,
>
> Thanks for your response. It's much clearer how it all works, but I'm still unsure how it fits together.
>
> I don't actually need to know the interprocess links -- I have a list of blocks to read, and that list needs to be split over the processors. So each processor needs to identify itself and the total number of processors, but that's all. I can definitely do that with mpi4py; I was unaware that would work inside the filter, and I didn't know paraview.vtk.parallel existed.
>
> I'm not actually splitting the structured data; I'm splitting the vtkMultiBlockDataSet, so each processor is responsible for populating a portion of the dataset. For instance, in serial, when a file with (say) 8 blocks is read, we end up with one vtkMultiBlockDataSet with 8 vtkStructuredData objects inside it. If I have a parallel reader (with 8 processes), I have a hunch I'll end up with 8 vtkMultiBlockDataSets with one vtkStructuredData object under each. Is this correct? Will this cause problems for other filters downstream? If, just for fun, I wanted to merge it so that each processor still retains only its block, but they share a common parent vtkMultiBlockDataSet, is that possible?
>
> I appreciate your help with this. Maybe these are stupid questions answered somewhere else, but I can't seem to find them!
>
> Tim
>
>
> ----- Original Message -----
> From: "David E DeMarle" <dave.demarle at kitware.com>
> To: gtg085x at mail.gatech.edu
> Cc: "ParaView list" <paraview at paraview.org>
> Sent: Thursday, August 11, 2011 9:54:24 AM
> Subject: Re: [Paraview] Programmable filter in parallel
>
> ParaView tries to do no aggregation other than rendering onto the same
> screen. Each processor is told what portion it is responsible for via
> the UPDATE_EXTENT or UPDATE_PIECE/UPDATE_NUMBER_OF_PIECES keys and is
> supposed to produce only what it is asked for. (See
> http://paraview.org/Wiki/Writing_ParaView_Readers for more of the
> story.)
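>
> For instance, inside a Programmable Filter script the piece keys can be
> read from the executive. A minimal sketch, assuming the standard
> vtkStreamingDemandDrivenPipeline keys:
>
> executive = self.GetExecutive()
> outInfo = executive.GetOutputInformation(0)
> # which piece this rank must produce, out of how many
> piece = outInfo.Get(executive.UPDATE_PIECE_NUMBER())
> numPieces = outInfo.Get(executive.UPDATE_NUMBER_OF_PIECES())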
>
> Filters that need cross communication to work properly (beyond what
> they can get from ghost cells) do so by accessing the
> vtkMultiProcessController that connects all of the nodes in the server
> (or sometimes via MPI directly but that isn't recommended).
>
> Try the following for two ways of getting hold of the interprocess links.
>
> # Option 1: the vtkMultiProcessController that connects the server processes
> import paraview.vtk.parallel
> #print(dir(paraview.vtk.parallel))
> #print(dir(paraview.vtk.parallel.vtkMultiProcessController))
> controller = paraview.vtk.parallel.vtkMultiProcessController.GetGlobalController()
> print(controller.GetLocalProcessId())
> print(controller.GetNumberOfProcesses())
>
> # Option 2: MPI directly via mpi4py (works, but not the recommended route)
> from mpi4py import MPI
> #print(dir(MPI))
> #help(MPI)
> print(MPI.COMM_WORLD.Get_rank())
> print(MPI.COMM_WORLD.Get_size())
>
> Note also that there is a "feature" in the Python Programmable Filter
> that comes into play with structured data: by default, structured data
> is not split at all. If you want structured data to actually be
> processed in parallel, you need to put this code in your Programmable
> Filter.
>
> from paraview import vtk
>
> # install a real extent translator so the whole extent gets split
> # into per-processor sub-extents
> executive = self.GetExecutive()
> executive.SetExtentTranslator(executive.GetOutputInformation(0),
>                               vtk.vtkExtentTranslator())
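>
> With the translator in place, each rank should see a different
> sub-extent of the whole extent. A quick check (a sketch; assumes the
> usual UPDATE_EXTENT key is populated by the time the script runs):
>
> outInfo = self.GetExecutive().GetOutputInformation(0)
> # should print a different (xmin, xmax, ymin, ymax, zmin, zmax) per rank
> print(outInfo.Get(self.GetExecutive().UPDATE_EXTENT()))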
>
>
> David E DeMarle
> Kitware, Inc.
> R&D Engineer
> 28 Corporate Drive
> Clifton Park, NY 12065-8662
> Phone: 518-371-3971 x109
>
>
>
> On Wed, Aug 3, 2011 at 11:09 AM, Tim Gallagher <tim.gallagher at gatech.edu> wrote:
>> I guess I sort of answered my own question -- the entire script runs on each processor, so I ended up with 8 copies of my data in memory (or I would have, had my system not crashed after filling the 12 GB of RAM and 20 GB of swap space).
>>
>> So is there some way to query the processor information? Probably something in the RequestInformation script -- find out how many processors there are, and then the programmable filter determines, based on the processor ID and the number of processors, what section of the data to load.
>>
>> In that case, how does the aggregation of the data work? The exact pipeline is:
>>
>> DataObjectGenerator("MB{}")
>> ProgrammableFilter
>>
>> In serial, the PF appends blocks to the input and passes that through to the output. In parallel, that same pipeline would create an MB{} on each CPU that gets filled with that CPU's data, but at the end of this step I would want a single MB{} object, not NCPU MB{}'s.
>>
>> Hopefully that makes sense... I've never used PV in parallel, so I'm not sure how it all works.
>>
>> Tim
>>
>> ----- Original Message -----
>> From: "Tim Gallagher" <tim.gallagher at gatech.edu>
>> To: "ParaView list" <paraview at paraview.org>
>> Sent: Wednesday, August 3, 2011 9:24:25 AM
>> Subject: [Paraview] Programmable filter in parallel
>>
>> Hi,
>>
>> I know many of the built-in readers/filters already work in parallel, but how does one write a parallel programmable filter?
>>
>> Our data files are XDMF and split into blocks of data. We have a single XDMF file that, when read, pulls in all the blocks and generates a vtkMultiBlockDataSet (this works with the built-in XDMF reader).
>>
>> However, each block has some ghost cells around it that are needed to do the CellDataToPointData interpolation. For large numbers of blocks, this creates far too many grid points for our machines to load. So, I've written a programmable filter that does:
>>
>> start with empty vtkMultiBlockDataset
>> for each block in restart file
>>   read block file with XDMFReader
>>   CellDataToPointData
>>   strip off the extra layers of cells
>>   append to output vtkMultiBlockDataset
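>>
>> As a rough, purely illustrative sketch of that loop in the Programmable
>> Filter (the file names and block count below are made up, and it assumes
>> the XDMF reader and vtkCellDataToPointData are importable in the
>> filter's Python environment):
>>
>> from paraview import vtk
>>
>> block_files = ["block_%d.xmf" % i for i in range(8)]  # hypothetical names
>> output = self.GetOutput()
>> output.SetNumberOfBlocks(len(block_files))
>> for i, fname in enumerate(block_files):
>>     reader = vtk.vtkXdmfReader()          # assumption: wrapped/importable here
>>     reader.SetFileName(fname)
>>     c2p = vtk.vtkCellDataToPointData()    # interpolate using the ghost cells
>>     c2p.SetInputConnection(reader.GetOutputPort())
>>     c2p.Update()
>>     block = c2p.GetOutput()
>>     # ... strip the extra ghost-cell layers here (e.g. extract a shrunken
>>     # VOI from the structured block) ...
>>     output.SetBlock(i, block)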
>>
>> If I run this in parallel, what exactly is parallel? Is the reading and CD2PD done in parallel on each block? Is none of it parallel? Ideally, I would have the loop over blocks done in parallel, but I don't know how to indicate that in the programmable filter (if it's possible).
>>
>> Any advice would be great,
>>
>> Tim