[Paraview] Programmable filter in parallel
David E DeMarle
dave.demarle at kitware.com
Fri Aug 12 17:35:01 EDT 2011
Howdy Sean,
When you set the extent translator, does each processor not get a
different update extent?
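
For example, something along these lines in the filter's Script should show
what each rank is being asked for (a minimal, untested sketch; it just reads
the UPDATE_EXTENT key off the output information):

executive = self.GetExecutive()
outInfo = executive.GetOutputInformation(0)
# the sub-extent this processor has been asked to produce
updateExtent = [executive.UPDATE_EXTENT().Get(outInfo, i) for i in xrange(6)]
print updateExtent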
David E DeMarle
Kitware, Inc.
R&D Engineer
28 Corporate Drive
Clifton Park, NY 12065-8662
Phone: 518-371-3971 x109
On Fri, Aug 12, 2011 at 12:40 PM, Sean Ziegeler
<sean.ziegeler at nrlssc.navy.mil> wrote:
> I appear to be running into the same problem with my programmable filter.
> Since the Transform filter cannot scale rectilinear data, I wrote the
> following programmable filter to do it:
>
> zscale = 0.001
> pdi = self.GetInput()
> pdo = self.GetOutput()
> pdo.ShallowCopy(pdi)
> zsi = pdi.GetZCoordinates()
> zso = vtk.vtkDoubleArray()
> zso.DeepCopy(zsi)
> zss = zso.GetSize()
> for i in xrange(zss):
>     zso.SetValue(i, zsi.GetValue(i)*zscale)
> pdo.SetZCoordinates(zso)
>
> Obviously, I need to update it to use the newer input/output names and numpy
> arrays for speed, but it does work in serial. However, it appears to
> duplicate every point on every processor in parallel. I've been poring over
> the docs and experimenting, but I've yet to find a way to use UPDATE_EXTENT
> properly in parallel with rectilinear data. Any ideas?
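>
> The numpy version will presumably look something like this (untested sketch;
> it assumes the GetInputDataObject/GetOutputDataObject accessors and
> paraview.numpy_support are available in this version):
>
> from paraview import numpy_support
>
> zscale = 0.001
> pdi = self.GetInputDataObject(0, 0)
> pdo = self.GetOutputDataObject(0)
> pdo.ShallowCopy(pdi)
> # pull the z coordinates into numpy, scale them, and push them back
> zs = numpy_support.vtk_to_numpy(pdi.GetZCoordinates())
> pdo.SetZCoordinates(numpy_support.numpy_to_vtk(zs * zscale, deep=1))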
>
> Thanks,
> Sean
>
> On 08/11/11 10:33, David E DeMarle wrote:
>>
>> You should end up with one multiblock dataset on each processor, and each
>> of those should have eight children. On any given processor, seven of those
>> children will be NULL and the remaining one will be unique to that
>> processor. Use UPDATE_PIECE and possibly the local process id to figure out
>> which of the eight children the processor should fill in. The
>> vtkCompositeDataPipeline that ParaView uses expects and knows how
>> to handle that structure, so filters downstream should have no problem
>> handling it.
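>>
>> In the programmable filter that could look roughly like this (untested
>> sketch; filling in each block is whatever your per-block reading does):
>>
>> executive = self.GetExecutive()
>> outInfo = executive.GetOutputInformation(0)
>> piece = executive.UPDATE_PIECE_NUMBER().Get(outInfo)
>> numPieces = executive.UPDATE_NUMBER_OF_PIECES().Get(outInfo)
>>
>> output = self.GetOutput()  # vtkMultiBlockDataSet
>> output.SetNumberOfBlocks(8)
>> for b in range(8):
>>     if b % numPieces == piece:
>>         grid = vtk.vtkStructuredGrid()
>>         # ... read or generate block b here ...
>>         output.SetBlock(b, grid)
>>     # the other children stay NULL on this processor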
>>
>> And no, these aren't stupid questions. They are described fairly well
>> in the most recent Kitware books and courses, but otherwise the
>> information is widely scattered around the ParaView wiki, the Kitware
>> Source magazine, and the mailing list archives.
>>
>> David E DeMarle
>> Kitware, Inc.
>> R&D Engineer
>> 28 Corporate Drive
>> Clifton Park, NY 12065-8662
>> Phone: 518-371-3971 x109
>>
>>
>>
>> On Thu, Aug 11, 2011 at 11:11 AM, Tim Gallagher
>> <tim.gallagher at gatech.edu> wrote:
>>>
>>> David,
>>>
>>> Thanks for your response. It's much clearer how it all works, but I'm
>>> still unsure how it fits together.
>>>
>>> I don't actually need to know the interprocess links -- I have a list of
>>> blocks to read, and that list needs to be split over the processors. So each
>>> processor needs to identify itself and the total number of procs, but that's
>>> all. I can definitely do that with mpi4py; I was unaware that would
>>> work inside the filter, and I didn't know paraview.vtk.parallel existed.
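>>>
>>> Concretely, I'm thinking of something like this inside the filter
>>> (hypothetical file names, untested):
>>>
>>> from mpi4py import MPI
>>>
>>> rank = MPI.COMM_WORLD.Get_rank()
>>> nprocs = MPI.COMM_WORLD.Get_size()
>>> # made-up list of per-block files; each rank takes every nprocs-th one
>>> block_files = ["block_%04d.xmf" % i for i in range(8)]
>>> my_files = [f for i, f in enumerate(block_files) if i % nprocs == rank]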
>>>
>>> I'm not actually splitting the structured data; I'm splitting the
>>> vtkMultiBlockDataSet. So each processor is responsible for populating a
>>> portion of the dataset. For instance, in serial when the file (say, with 8
>>> blocks) is read, we end up with one vtkMultiBlockDataset with 8
>>> vtkStructuredData's inside it. If I have a parallel reader (with 8
>>> processes), I have a hunch I'll end up with 8 vtkMultiBlockDataSet's with
>>> one vtkStructuredData under each. Is this correct? Will this cause problems
>>> for other filters downstream? If, for fun, I wanted to merge it such that
>>> each processor still only retains its block, but they share a common parent
>>> vtkMultiBlockDataSet, is that possible?
>>>
>>> I appreciate your help with this. Maybe these are stupid questions
>>> answered somewhere else, but I can't seem to find them!
>>>
>>> Tim
>>>
>>>
>>> ----- Original Message -----
>>> From: "David E DeMarle"<dave.demarle at kitware.com>
>>> To: gtg085x at mail.gatech.edu
>>> Cc: "ParaView list"<paraview at paraview.org>
>>> Sent: Thursday, August 11, 2011 9:54:24 AM
>>> Subject: Re: [Paraview] Programmable filter in parallel
>>>
>>> ParaView tries to do no aggregation other than rendering onto the same
>>> screen. Each processor is told what portion it is responsible for via
>>> the UPDATE_EXTENT or UPDATE_PIECE/UPDATE_NUMBER_OF_PIECES keys and is
>>> supposed to produce only what it is asked for. (See
>>> http://paraview.org/Wiki/Writing_ParaView_Readers for more of the
>>> story.)
>>>
>>> Filters that need cross-process communication to work properly (beyond what
>>> they can get from ghost cells) do so by accessing the
>>> vtkMultiProcessController that connects all of the nodes in the server
>>> (or sometimes via MPI directly but that isn't recommended).
>>>
>>> Try the following for two means of getting a hold of the interprocess
>>> links.
>>> import paraview.vtk.parallel
>>> #print(dir(paraview.vtk.parallel))
>>> #print(dir(paraview.vtk.parallel.vtkMultiProcessController))
>>> controller = paraview.vtk.parallel.vtkMultiProcessController.GetGlobalController()
>>> print controller.GetLocalProcessId()
>>> print controller.GetNumberOfProcesses()
>>>
>>> from mpi4py import MPI
>>> #print(dir(MPI))
>>> #print(help(MPI))
>>> print MPI.COMM_WORLD.Get_rank()
>>> print MPI.COMM_WORLD.Get_size()
>>>
>>> Note also that there is a "feature" in the python programmable filter
>>> that comes into play with structured data. That feature says that
>>> structured data is not split at all by default. If you want structured
>>> data to actually be parallel, you need to put this code in your python
>>> programmable filter.
>>>
>>> from paraview import util
>>>
>>> self.GetExecutive().SetExtentTranslator(
>>>     self.GetExecutive().GetOutputInformation(0), vtk.vtkExtentTranslator())
>>>
>>>
>>> David E DeMarle
>>> Kitware, Inc.
>>> R&D Engineer
>>> 28 Corporate Drive
>>> Clifton Park, NY 12065-8662
>>> Phone: 518-371-3971 x109
>>>
>>>
>>>
>>> On Wed, Aug 3, 2011 at 11:09 AM, Tim Gallagher<tim.gallagher at gatech.edu>
>>> wrote:
>>>>
>>>> I guess I sort of answered my own question -- the entire script runs on
>>>> each processor, so I ended up with 8 copies of my data in memory (or I would
>>>> have, had my system not crashed after filling the 12 GB of RAM and 20 GB of
>>>> swap space).
>>>>
>>>> So is there some way to query the processor information? Probably
>>>> something in the RequestInformation script -- find out how many processors
>>>> there are, and then the programmable filter determines, based on processor ID
>>>> and the number of processors, what section of the data to load.
>>>>
>>>> In that case, how does the aggregation of the data work? The exact
>>>> pipeline is:
>>>>
>>>> DataObjectGenerator("MB{}")
>>>> ProgrammableFilter
>>>>
>>>> In serial, the PF appends blocks into the input and passes that through
>>>> to the output. In parallel, that same pipeline would create an MB{} on each
>>>> CPU that gets filled with that CPU's data, but at the end of this step I
>>>> would want a single MB{} object, not NCPU MB{}'s.
>>>>
>>>> Hopefully that makes sense... I've never used PV in parallel, so I'm not
>>>> sure how it all works.
>>>>
>>>> Tim
>>>>
>>>> ----- Original Message -----
>>>> From: "Tim Gallagher"<tim.gallagher at gatech.edu>
>>>> To: "ParaView list"<paraview at paraview.org>
>>>> Sent: Wednesday, August 3, 2011 9:24:25 AM
>>>> Subject: [Paraview] Programmable filter in parallel
>>>>
>>>> Hi,
>>>>
>>>> I know many of the built-in readers/filters already work in parallel,
>>>> but how does one write a parallel programmable filter?
>>>>
>>>> Our data files are XDMF and split into blocks of data. We have a single
>>>> XDMF file that, when read, pulls in all the blocks and generates a
>>>> vtkMultiBlockDataSet (this works with the built-in XDMF reader).
>>>>
>>>> However, each block has some ghost cells around it that are needed to do
>>>> the CellDataToPointData interpolation. For large numbers of blocks, this
>>>> creates far too many grid points for our machines to load. So, I've written
>>>> a programmable filter that does:
>>>>
>>>> start with empty vtkMultiBlockDataSet
>>>> for each block in restart file:
>>>>     read block file with XDMFReader
>>>>     CellDataToPointData
>>>>     strip off the extra layers of cells
>>>>     append to output vtkMultiBlockDataSet
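>>>>
>>>> In the filter script that loop looks roughly like this (simplified and
>>>> untested; file names are made up, and it assumes each block file yields a
>>>> single structured grid with a one-cell ghost layer on every face):
>>>>
>>>> block_files = ["block_%04d.xmf" % i for i in range(8)]  # made-up names
>>>> output = self.GetOutput()  # vtkMultiBlockDataSet
>>>> for b, fname in enumerate(block_files):
>>>>     reader = vtk.vtkXdmfReader()
>>>>     reader.SetFileName(fname)
>>>>     reader.Update()
>>>>     ni, nj, nk = reader.GetOutput().GetDimensions()
>>>>     c2p = vtk.vtkCellDataToPointData()
>>>>     c2p.SetInputConnection(reader.GetOutputPort())
>>>>     # drop the one-cell ghost layer on each face of the block
>>>>     strip = vtk.vtkExtractGrid()
>>>>     strip.SetInputConnection(c2p.GetOutputPort())
>>>>     strip.SetVOI(1, ni - 2, 1, nj - 2, 1, nk - 2)
>>>>     strip.Update()
>>>>     output.SetBlock(b, strip.GetOutput())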
>>>>
>>>> If I run this in parallel, what exactly is parallel? Is the reading and
>>>> CD2PD done in parallel on each block? Is none of it parallel? Ideally, I
>>>> would have the loop over blocks done in parallel, but I don't know how to
>>>> indicate that in the programmable filter (if it's possible).
>>>>
>>>> Any advice would be great,
>>>>
>>>> Tim