[Paraview] Non-blocking coprocessing

Tue Oct 25 17:02:03 EDT 2016

Hi Andy,

I'll take a look at the in transit approach. I'm essentially extracting a slice every N steps and collecting it as a single slice on a single processor each time, accumulating the slices as I go. Then every M*N steps, I want that single processor to do an expensive operation and save the output.
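
For reference, the cadence described above could be sketched roughly as follows. This is only an illustration: `SliceAccumulator`, `OnStep`, `TakeBatch` and `Pending` are hypothetical names, the `std::vector<double>` stands in for whatever the gathered slice really is, and steps are assumed to be numbered from 1 with N and M both positive.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: keeps one slice every n steps and signals when m
// slices have accumulated, i.e. every m*n steps.
class SliceAccumulator {
public:
  SliceAccumulator(int n, int m) : n_(n), m_(m) {}

  // Call once per simulation step. Returns true when the batch is full and
  // the expensive operation should run on the collecting processor.
  bool OnStep(int step, const std::vector<double>& slice) {
    if (step % n_ != 0) return false;  // not an extraction step
    slices_.push_back(slice);          // copy, so the caller may reuse its buffer
    return static_cast<int>(slices_.size()) == m_;
  }

  // Hands the accumulated slices to the caller and resets for the next batch.
  std::vector<std::vector<double>> TakeBatch() {
    std::vector<std::vector<double>> batch;
    batch.swap(slices_);
    return batch;
  }

  std::size_t Pending() const { return slices_.size(); }

private:
  int n_;
  int m_;
  std::vector<std::vector<double>> slices_;
};
```

With N = 2 and M = 3, `OnStep` first returns true at step 6, at which point `TakeBatch` yields the three slices from steps 2, 4 and 6.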

So if the in transit approach you mentioned would work well for that, I'll give it a shot. I'm doing this on some SGI and Cray machines, and I don't know whether they have special facilities for this like the ones you mentioned at NERSC.

Thanks,

Tim

________________________________
From: Andy Bauer <andy.bauer at kitware.com>
Sent: Tuesday, October 25, 2016 4:43 PM
To: Ufuk Utku Turuncoglu (BE)
Cc: Gallagher, Timothy P; paraview at paraview.org
Subject: Re: [Paraview] Non-blocking coprocessing

Hi Tim,

This may be better done as an in transit setup, so that the processes are independent. With Catalyst, I'd worry about all of the other Catalyst ranks waiting for global rank 0 to finish its work before they return control to the simulation. Depending on the system you're on, you could do this communication through file IO, for example on something like Cori at NERSC with its burst buffers.

If you want to go down the in transit path, let me know and I can see about digging up some scripts that I had for that.

Best,
Andy

On Tue, Oct 25, 2016 at 3:57 AM, Ufuk Utku Turuncoglu (BE) <u.utku.turuncoglu at be.itu.edu.tr> wrote:
Hi Tim,

I am not sure whether non-blocking communication is supported by ParaView or Catalyst, but I think assigning an extra core for the global reduction is possible. You could use MPI communication for this purpose. Have a look at the following code of mine, an overloaded coprocessorinitializewithpython function. As you can see, it also takes the MPI communicator, which allows a pool of processors (or cores) to be used for co-processing. In my case, it runs smoothly without any problems. I hope it helps.

--ufuk

// Fortran-callable Catalyst initialization that takes an explicit MPI communicator.
// g_coprocessor and g_coprocessorData are globals defined elsewhere in the adaptor.
extern "C" void my_coprocessorinitializewithpython_(int *fcomm, const char* pythonScriptName, const char strarr[][255], int *size) {
  if (pythonScriptName != NULL) {
    if (!g_coprocessor) {
      g_coprocessor = vtkCPProcessor::New();
      // convert the Fortran communicator handle into a C MPI_Comm
      MPI_Comm handle = MPI_Comm_f2c(*fcomm);
      // stack-allocated wrapper: it lives exactly as long as the handle it
      // points to, and nothing is left to delete afterwards
      vtkMPICommunicatorOpaqueComm Comm(&handle);
      g_coprocessor->Initialize(Comm);
      vtkSmartPointer<vtkCPPythonScriptPipeline> pipeline = vtkSmartPointer<vtkCPPythonScriptPipeline>::New();
      pipeline->Initialize(pythonScriptName);
      g_coprocessor->AddPipeline(pipeline);
      //pipeline->FastDelete();
    }

    if (!g_coprocessorData) {
      g_coprocessorData = vtkCPDataDescription::New();
      // must be input port for all model components and for all dimensions
      for (int i = 0; i < *size; i++) {
        g_coprocessorData->AddInput(strarr[i]);
        std::cout << "adding input port [" << i << "] = " << strarr[i] << std::endl;
      }
    }
  }
}

On 25/10/16 01:56, Gallagher, Timothy P wrote:

Hello again!

I'm looking at using coprocessing for something that may take a while to actually compute, so I would like to do it in a non-blocking fashion. Essentially, I am going to extract data from the simulation into some numpy arrays (so once copied, the original data in the pipeline can change) and then send it to the root processor to do some global operations.

The global operations may take some time (not minutes, but longer than I want my simulation to wait for them to complete). Is there a way to do part of the pipeline in a non-blocking fashion, where the script calls a function that will write out a data file when processing finishes, and then returns control to the simulation before the function completes? Will I have to do something in native Python, like spawn a new thread to do the function call, or is there a way to do it with how ParaView operates?
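
As a rough illustration of the "spawn a thread and return" idea, here is the same pattern on the C++ side using only the standard library. This is a sketch, not something Catalyst provides: `ExpensiveReduce` and `LaunchInBackground` are hypothetical names, and the background task is assumed to touch only its own copy of the data, never the live pipeline objects.

```cpp
#include <cassert>
#include <future>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical stand-in for the expensive global operation.
double ExpensiveReduce(const std::vector<double>& data) {
  return std::accumulate(data.begin(), data.end(), 0.0);
}

// Takes ownership of the copied data, starts the work on another thread and
// returns immediately. The caller must keep the returned future alive and
// eventually call get() or wait() on it.
std::future<double> LaunchInBackground(std::vector<double> copied) {
  return std::async(std::launch::async, ExpensiveReduce, std::move(copied));
}
```

One caveat with this design: a future returned by `std::async` blocks in its destructor until the task finishes, so discarding the return value would make the call blocking again. Storing the future and checking it at the next coprocessing step avoids that.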

On a related note, I may not want the root processor of the coprocessing to run any simulation code. If I am running my simulation on N cores, is it possible to have N+1 cores running the coprocessor pipeline, where the extra core receives the global data reduction from the N cores and does the crunching? Or am I starting to ask for too much there?
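
One possible way to picture the N+1 layout is sketched below. It assumes the job is launched on N+1 MPI ranks and that the last rank is the dedicated analysis rank; `Role`, `RoleForRank` and `SplitColor` are hypothetical names, and the MPI calls themselves are left out so the snippet stays self-contained.

```cpp
#include <cassert>

// Hypothetical layout: ranks 0..N-1 run the simulation, rank N only analyzes.
enum class Role { Simulation, Analysis };

Role RoleForRank(int world_rank, int world_size) {
  return world_rank == world_size - 1 ? Role::Analysis : Role::Simulation;
}

// Color value for MPI_Comm_split: ranks that share a color end up in the same
// sub-communicator, so the N simulation ranks get a communicator of their own.
int SplitColor(int world_rank, int world_size) {
  return RoleForRank(world_rank, world_size) == Role::Simulation ? 0 : 1;
}
```

The simulation ranks would then pass `SplitColor(rank, size)` as the color argument of `MPI_Comm_split` and run the solver on the resulting communicator, while the communicator spanning all N+1 ranks would be the one handed to the coprocessor. Whether the pipeline behaves well with a rank that contributes no simulation data is something that would need to be tested.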

Thanks as always,

Tim

_______________________________________________
Powered by www.kitware.com

Visit other Kitware open-source projects at http://www.kitware.com/opensource/opensource.html

Please keep messages on-topic and check the ParaView Wiki at: http://paraview.org/Wiki/ParaView

Search the list archives at: http://markmail.org/search/?q=ParaView

Follow this link to subscribe/unsubscribe:
http://public.kitware.com/mailman/listinfo/paraview