[Paraview] paraview - client-server
pat marion
pat.marion at kitware.com
Fri Apr 30 10:03:34 EDT 2010
Hey Burlen, on the bug report page for 10283, I think you need to fix the
command line you are testing with:
$ ssh remote cmd1 && cmd2
will execute cmd1 on remote and cmd2 locally. It should be:
$ ssh remote "cmd1 && cmd2"
Pat
On Fri, Apr 30, 2010 at 9:12 AM, pat marion <pat.marion at kitware.com> wrote:
> I have applied your patch. I agree that paraview should explicitly close
> the child process. But... what I am pointing out is that calling
> QProcess::close() does not help in this situation. What I am saying is
> that, even when paraview does kill the process, any commands run by ssh on
> the other side of the netpipe will be orphaned by sshd. Are you sure you
> can't reproduce it?
>
>
> $ ssh localhost sleep 1d
> $ < press control-c >
> $ pidof sleep
> $ # sleep is still running
>
> Pat
>
>
> On Fri, Apr 30, 2010 at 2:08 AM, burlen <burlen.loring at gmail.com> wrote:
>
>> Hi Pat,
>>
>> From my point of view the issue is philosophical, because practically
>> speaking I couldn't reproduce the orphans without doing something a little
>> odd, namely ssh ... && sleep 1d. Although the fact that a user reported it
>> suggests that it may occur in the real world as well. The question is this:
>> should an application explicitly clean up resources it allocates, or should
>> an application rely on the user not only knowing that there is the potential
>> for a resource leak but also knowing enough to do the right thing to avoid
>> it (e.g. ssh -tt ...)? In my opinion, as a matter of principle, if PV spawns a
>> process it should explicitly clean it up and there should be no way it can
>> become an orphan. In this case the fact that the orphan can hold ports open
>> is particularly insidious, because further connection attempts on that port
>> fail with no helpful error information. Also it is not very difficult to
>> clean up a spawned process. What it comes down to is a little bookkeeping
>> to hang on to the QProcess handle and a few lines of code called from
>> the pqCommandServerStartup destructor to make certain it's cleaned up. This is
>> from the patch I submitted when I filed the bug report.
>>
>> +  // close running process
>> +  if (this->Process->state()==QProcess::Running)
>> +    {
>> +    this->Process->close();
>> +    }
>> +  // free the object
>> +  delete this->Process;
>> +  this->Process=NULL;
>>
>> I think if the cluster admins out there knew which ssh options
>> (GatewayPorts etc) are important for ParaView to work seamlessly, then they
>> might be willing to open them up. It's my impression that the folks that
>> build clusters want tools like PV to be easy to use, but they don't
>> necessarily know all the ins and outs of configuring and running PV.
>>
>> Thanks for looking at this again! The -tt option to ssh is indeed a good
>> find.
>>
>> Burlen
>>
>> pat marion wrote:
>>
>>> Hi all!
>>>
>>> I'm bringing this thread back- I have learned a couple new things...
>>>
>>> -----------------------
>>> No more orphans:
>>>
>>> Here is an easy way to create an orphan:
>>>
>>> $ ssh localhost sleep 1d
>>> $ <press control c>
>>>
>>> The ssh process is cleaned up, but sshd orphans the sleep process. You
>>> can avoid this by adding '-t' to ssh:
>>>
>>> $ ssh -t localhost sleep 1d
>>>
>>> Works like a charm! But then there is another problem... try this
>>> command from paraview (using QProcess) and it still leaves an orphan, doh!
>>> Go back and re-read ssh's man page and you have the solution, use '-t'
>>> twice: ssh -tt
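>>>
>>> For example, repeating the test above with the pseudo-tty forced should
>>> leave nothing behind (sleep and localhost are just stand-ins, as before):
>>>
>>> $ ssh -tt localhost sleep 1d
>>> $ <press control c>
>>> $ pidof sleep
>>> $ # no output: the forced tty lets sshd signal sleep when ssh goes away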
>>>
>>> -------------------------
>>> GatewayPorts and portfwd workaround:
>>>
>>> In this scenario we have 3 machines: workstation, service-node, and
>>> compute-node. I want to ssh from workstation to service-node and submit a
>>> job that will run pvserver on compute-node. When pvserver starts on
>>> compute-node I want it to reverse connect to service-node and I want
>>> service-node to forward the connection to workstation. So here I go:
>>>
>>> $ ssh -R11111:localhost:11111 service-node qsub start_pvserver.sh
>>>
>>> Oops, the qsub command returns immediately and closes my ssh tunnel.
>>> Let's pretend that the scheduler doesn't provide an easy way to keep the
>>> command alive, so I have resorted to using 'sleep 1d'. So here I go, using
>>> -tt to prevent orphans:
>>>
>>> $ ssh -tt -R11111:localhost:11111 service-node "qsub start_pvserver.sh
>>> && sleep 1d"
>>>
>>> Well, this will only work if GatewayPorts is enabled in sshd_config on
>>> service-node. If GatewayPorts is not enabled, the ssh tunnel will only
>>> accept connections from localhost; it will not accept a connection from
>>> compute-node. We can ask the sysadmin to enable GatewayPorts, or we could
>>> use portfwd. You can run portfwd on service-node to forward port 22222 to
>>> port 11111, then have compute-node connect to service-node:22222. So your
>>> job script would launch pvserver like this:
>>>
>>> pvserver -rc -ch=service-node -sp=22222
>>>
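>>> As a rough sketch (the mpirun line and PBS directives are just
>>> placeholders, adjust for your scheduler and MPI), start_pvserver.sh
>>> might look something like:
>>>
>>> #!/bin/bash
>>> #PBS -l nodes=1
>>> # runs on compute-node; reverse connect through the portfwd listener
>>> # on service-node port 22222
>>> mpirun pvserver -rc -ch=service-node -sp=22222
>>>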
>>> Problem solved! Also convenient: we can use portfwd to replace 'sleep
>>> 1d'. So the final command, executed by paraview client:
>>>
>>> ssh -tt -R 11111:localhost:11111 service-node "qsub start_pvserver.sh &&
>>> portfwd -g -c fwd.cfg"
>>>
>>> Where fwd.cfg contains:
>>>
>>> tcp { 22222 { => localhost:11111 } }
>>>
>>>
>>> Hope this helps!
>>>
>>> Pat
>>>
>>> On Fri, Feb 12, 2010 at 7:06 PM, burlen <burlen.loring at gmail.com> wrote:
>>>
>>>
>>> Incidentally, this brings up an interesting point about
>>> ParaView with client/server. It doesn't try to clean up its
>>> child processes, AFAIK. For example, if you set up this ssh
>>> tunnel inside the ParaView GUI (e.g., using a command instead
>>> of a manual connection), and you cancel the connection, it
>>> will leave the ssh running. You have to track down the ssh
>>> process and kill it yourself. It's a minor thing, but it can
>>> also prevent future connections if you don't realize there's a
>>> zombie ssh that kept your ports open.
>>>
>>> I attempted to reproduce this on my Kubuntu 9.10, Qt 4.5.2 system, with
>>> slightly different results, which may be Qt/distro/OS specific.
>>>
>>> On my system, as long as the process ParaView spawns finishes on
>>> its own there is no problem. That's usually how one would expect
>>> things to work out, since when the client disconnects the server
>>> closes, followed by ssh. But you are right that PV never
>>> explicitly kills or otherwise cleans up after the process it
>>> starts. So if the spawned process for some reason doesn't finish,
>>> orphan processes are introduced.
>>>
>>> I was able to produce orphan ssh processes by giving the PV client a
>>> server startup command that doesn't finish, e.g.
>>>
>>> ssh ... pvserver ... && sleep 100d
>>>
>>> I get the situation you described, which prevents further
>>> connections on the same ports. Once PV tries and fails to connect
>>> on the open ports, there is a crash soon after.
>>>
>>> I filed a bug report with a patch:
>>> http://www.paraview.org/Bug/view.php?id=10283
>>>
>>>
>>>
>>> Sean Ziegeler wrote:
>>>
>>> Most batch systems have an option to wait until the job is
>>> finished before the submit command returns. I know PBS uses
>>> "-W block=true" and that SGE and LSF have similar options (but
>>> I don't recall the precise flags).
>>>
>>> If your batch system doesn't provide that, I'd recommend
>>> adding some shell scripting to loop through checking the queue
>>> for job completion and not return until it's done. The sleep
>>> thing would work, but wouldn't exit when the server finishes,
>>> leaving the ssh tunnels (and other things like portfwd if you
>>> put them in your scripts) lying around.
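>>>
>>> For PBS, such a wrapper could look roughly like this (a sketch only;
>>> how qstat reports finished jobs varies from site to site):
>>>
>>> #!/bin/bash
>>> jobid=$(qsub start_pvserver.sh)
>>> # keep this session (and its ssh tunnel) alive until the job
>>> # disappears from the queue
>>> while qstat "$jobid" > /dev/null 2>&1; do
>>>     sleep 30
>>> done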
>>>
>>> Incidentally, this brings up an interesting point about
>>> ParaView with client/server. It doesn't try to clean up its
>>> child processes, AFAIK. For example, if you set up this ssh
>>> tunnel inside the ParaView GUI (e.g., using a command instead
>>> of a manual connection), and you cancel the connection, it
>>> will leave the ssh running. You have to track down the ssh
>>> process and kill it yourself. It's a minor thing, but it can
>>> also prevent future connections if you don't realize there's a
>>> zombie ssh that kept your ports open.
>>>
>>>
>>> On 02/08/10 21:03, burlen wrote:
>>>
>>> I am curious to hear what Sean has to say.
>>>
>>> But, say the batch system returns right away after the job
>>> is submitted,
>>> I think we can doctor the command so that it will live for
>>> a while
>>> longer, what about something like this:
>>>
>>> ssh -R XXXX:localhost:YYYY remote_machine
>>> "submit_my_job.sh && sleep
>>> 100d"
>>>
>>>
>>> pat marion wrote:
>>>
>>> Hey just checked out the wiki page, nice! One
>>> question, wouldn't this
>>> command hang up and close the tunnel after submitting
>>> the job?
>>> ssh -R XXXX:localhost:YYYY remote_machine submit_my_job.sh
>>> Pat
>>>
>>> On Mon, Feb 8, 2010 at 8:12 PM, pat marion
>>> <pat.marion at kitware.com> wrote:
>>>
>>> Actually I didn't write the notes at the hpc.mil link.
>>>
>>> Here is something- and maybe this is the problem that Sean refers
>>> to- in some cases, when I have set up a reverse ssh tunnel from
>>> login node to workstation (command executed from workstation), then
>>> the forward does not work when the compute node connects to the
>>> login node. However, if I have the compute node connect to the
>>> login node on port 33333, then use portfwd to forward that to
>>> localhost:11111, where the ssh tunnel is listening on port 11111,
>>> it works like a charm. The portfwd tricks it into thinking the
>>> connection is coming from localhost and allows the ssh tunnel to
>>> work. Hope that made a little sense...
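>>>
>>> For reference, the portfwd rule for that setup would be something
>>> along these lines:
>>>
>>> tcp { 33333 { => localhost:11111 } }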
>>>
>>> Pat
>>>
>>>
>>> On Mon, Feb 8, 2010 at 6:29 PM, burlen
>>> <burlen.loring at gmail.com> wrote:
>>>
>>> Nice, thanks for the clarification. I am guessing that
>>> your
>>> example should probably be the recommended approach rather
>>> than the portfwd method suggested on the PV wiki. :) I
>>> took
>>> the initiative to add it to the Wiki. KW let me know
>>> if this
>>> is not the case!
>>>
>>>
>>> http://paraview.org/Wiki/Reverse_connection_and_port_forwarding#Reverse_connection_over_an_ssh_tunnel
>>>
>>>
>>>
>>> Would you mind taking a look to be sure I didn't miss
>>> anything
>>> or bollix it up?
>>>
>>> The sshd config options you mentioned may be why your method
>>> doesn't work on the Pleiades system, either that or there is a
>>> firewall between the front ends and compute nodes. In either
>>> case I doubt the NAS sys admins are going to reconfigure for
>>> me :) So at least for now I'm stuck with the two-hop ssh
>>> tunnels and interactive batch jobs. If there were some way to
>>> script the ssh tunnel in my batch script I would be golden...
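>>>
>>> (For the record, the two hops amount to chaining two forward
>>> tunnels, roughly like this, with placeholder hostnames:
>>>
>>> $ ssh -L 11111:localhost:11111 front-end
>>> front-end$ ssh -L 11111:localhost:11111 compute-node
>>>
>>> after which the client connects to localhost:11111 as usual.)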
>>>
>>> By the way, I put the details of the two-hop ssh tunnel on the
>>> wiki as well, and a link to Pat's hpc.mil notes. I don't dare
>>> try to summarize them since I've never used portfwd and it
>>> refuses to compile on both my workstation and the cluster.
>>>
>>> Hopefully putting these notes on the Wiki will save future
>>> ParaView users some time and headaches.
>>>
>>>
>>> Sean Ziegeler wrote:
>>>
>>> Not quite- the pvsc calls ssh with both the tunnel options
>>> and the commands to submit the batch job. You don't even
>>> need a pvsc; it just makes the interface fancier. As long
>>> as you or PV executes something like this from your
>>> machine:
>>> ssh -R XXXX:localhost:YYYY remote_machine submit_my_job.sh
>>>
>>> This means that port XXXX on remote_machine will be the
>>> port to which the server must connect. Port YYYY (e.g.,
>>> 11111) on your client machine is the one on which PV
>>> listens. You'd have to tell the server (in the batch
>>> submission script, for example) the name of the node and
>>> port XXXX to which to connect.
>>>
>>> One caveat that might be causing you problems: port
>>> forwarding (and "gateway ports" if the server is running
>>> on a different node than the login node) must be enabled
>>> in the remote_machine's sshd_config. If not, no ssh
>>> tunnels will work at all (see: man ssh and man
>>> sshd_config). That's something that an administrator
>>> would need to set up for you.
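>>>
>>> For reference, the sshd_config settings in question are typically
>>> these two (OpenSSH option names; your installation may differ):
>>>
>>> AllowTcpForwarding yes
>>> GatewayPorts yes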
>>>
>>> On 02/08/10 12:26, burlen wrote:
>>>
>>> So to be sure about what you're saying: your .pvsc script ssh's to the
>>> front end and submits a batch job, and when it's scheduled, your batch
>>> script creates a -R style tunnel and starts pvserver using PV reverse
>>> connection? Or are you using portfwd or a second ssh session to
>>> establish the tunnel?
>>>
>>> If you're doing this all from your .pvsc script
>>> without a second ssh
>>> session and/or portfwd that's awesome! I haven't been
>>> able to script
>>> this, something about the batch system prevents the
>>> tunnel created
>>> within the batch job's ssh session from working. I
>>> don't know if that's
>>> particular to this system or a general fact of life
>>> about batch systems.
>>>
>>> Question: How are you creating the tunnel in your
>>> batch script?
>>>
>>> Sean Ziegeler wrote:
>>>
>>> Both ways will work for me in most cases, i.e. a
>>> "forward" connection
>>> with ssh -L or a reverse connection with ssh -R.
>>>
>>> However, I find that the reverse method is more
>>> scriptable. You can
>>> set up a .pvsc file that the client can load and
>>> will call ssh with
>>> the appropriate options and commands for the
>>> remote host, all from the
>>> GUI. The client will simply wait for the reverse
>>> connection from the
>>> server, whether it takes 5 seconds or 5 hours for
>>> the server to get
>>> through the batch queue.
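>>>
>>> A stripped-down .pvsc for the reverse case might look something like
>>> the sketch below; the exact schema and the csrc:// resource string
>>> should be double-checked against the ParaView server configuration
>>> docs, and the ssh arguments are just the command from above:
>>>
>>> <Servers>
>>>   <Server name="remote-reverse" resource="csrc://localhost:11111">
>>>     <CommandStartup>
>>>       <Command exec="ssh" timeout="0" delay="5">
>>>         <Arguments>
>>>           <Argument value="-tt"/>
>>>           <Argument value="-R"/>
>>>           <Argument value="11111:localhost:11111"/>
>>>           <Argument value="remote_machine"/>
>>>           <Argument value="submit_my_job.sh"/>
>>>         </Arguments>
>>>       </Command>
>>>     </CommandStartup>
>>>   </Server>
>>> </Servers>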
>>>
>>> Using the forward connection method, if the server
>>> isn't started soon
>>> enough, the client will attempt to connect and
>>> then fail. I've always
>>> had to log in separately, wait for the server to
>>> start running, then
>>> tell my client to connect.
>>>
>>> -Sean
>>>
>>> On 02/06/10 12:58, burlen wrote:
>>>
>>> Hi Pat,
>>>
>>> My bad. I was looking at the PV wiki, and thought you were talking about
>>> doing this without an ssh tunnel, using only port forwarding and
>>> paraview's --reverse-connection option. Now that I am reading your
>>> hpc.mil post I see what you mean :)
>>>
>>> Burlen
>>>
>>>
>>> pat marion wrote:
>>>
>>> Maybe I'm misunderstanding what you mean
>>> by local firewall, but
>>> usually as long as you can ssh from your
>>> workstation to the login node
>>> you can use a reverse ssh tunnel.
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>
>