[Paraview] paraview - client-server
pat marion
pat.marion at kitware.com
Fri Apr 30 09:12:25 EDT 2010
I have applied your patch. I agree that paraview should explicitly close the
child process. But what I am pointing out is that calling QProcess::close()
does not help in this situation: even when paraview does kill the ssh
process, any commands run by ssh on the remote side of the connection will
be orphaned by sshd. Are you sure you can't reproduce it?
$ ssh localhost sleep 1d
$ < press control-c >
$ pidof sleep
$ # sleep is still running
Pat
On Fri, Apr 30, 2010 at 2:08 AM, burlen <burlen.loring at gmail.com> wrote:
> Hi Pat,
>
> From my point of view the issue is philosophical, because practically
> speaking I couldn't reproduce the orphans without doing something a little
> odd, namely ssh ... && sleep 1d. Although the fact that a user reported it
> suggests that it may occur in the real world as well. The question is this:
> should an application explicitly clean up the resources it allocates, or
> should it rely on the user not only knowing that there is the potential for
> a resource leak, but also knowing enough to do the right thing to avoid it
> (e.g. ssh -tt ...)? In my opinion, as a matter of principle, if PV spawns a
> process it should explicitly clean it up, and there should be no way it can
> become an orphan. In this case the fact that the orphan can hold ports open
> is particularly insidious, because a further connection attempt on that port
> fails with no helpful error information. Also, it is not very difficult to
> clean up a spawned process. What it comes down to is a little bookkeeping to
> hang on to the QProcess handle, plus a few lines of code called from the
> pqCommandServerStartup destructor to make certain it's cleaned up. This is
> from the patch I submitted when I filed the bug report.
>
> +  // close the running process
> +  if (this->Process->state() == QProcess::Running)
> +    {
> +    this->Process->close();
> +    }
> +  // free the object
> +  delete this->Process;
> +  this->Process = NULL;
>
> I think if the cluster admins out there knew which ssh options
> (GatewayPorts etc.) are important for ParaView to work seamlessly, then
> they might be willing to open them up. It's my impression that the folks
> who build clusters want tools like PV to be easy to use, but they don't
> necessarily know all the ins and outs of configuring and running PV.
>
> Thanks for looking at this again! The -tt option to ssh is indeed a good
> find.
>
> Burlen
>
> pat marion wrote:
>
>> Hi all!
>>
>> I'm bringing this thread back: I have learned a couple of new things...
>>
>> -----------------------
>> No more orphans:
>>
>> Here is an easy way to create an orphan:
>>
>> $ ssh localhost sleep 1d
>> $ <press control c>
>>
>> The ssh process is cleaned up, but sshd orphans the sleep process. You
>> can avoid this by adding '-t' to ssh:
>>
>> $ ssh -t localhost sleep 1d
>>
>> Works like a charm! But then there is another problem: run the same
>> command from paraview (using QProcess) and it still leaves an orphan,
>> doh! Go back and re-read ssh's man page and you have the solution: use
>> '-t' twice, i.e. ssh -tt. A single -t allocates a tty only when ssh's
>> own stdin is a tty, which it is not under QProcess; repeating the
>> option forces tty allocation.
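>>
>> Repeating the earlier test with -tt shows the difference; with a tty
>> allocated, the interrupt is forwarded and the remote sleep dies with
>> the connection:
>>
>> $ ssh -tt localhost sleep 1d
>> $ <press control c>
>> $ pidof sleep
>> $ # no output - sleep is gone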
>>
>> -------------------------
>> GatewayPorts and portfwd workaround:
>>
>> In this scenario we have 3 machines: workstation, service-node, and
>> compute-node. I want to ssh from workstation to service-node and submit a
>> job that will run pvserver on compute-node. When pvserver starts on
>> compute-node I want it to reverse connect to service-node and I want
>> service-node to forward the connection to workstation. So here I go:
>>
>> $ ssh -R11111:localhost:11111 service-node qsub start_pvserver.sh
>>
>> Oops, the qsub command returns immediately and closes my ssh tunnel.
>> Let's pretend that the scheduler doesn't provide an easy way to keep the
>> command alive, so I have resorted to using 'sleep 1d'. So here I go, using
>> -tt to prevent orphans:
>>
>> $ ssh -tt -R11111:localhost:11111 service-node "qsub start_pvserver.sh && sleep 1d"
>>
>> Well, this will only work if GatewayPorts is enabled in sshd_config on
>> service-node. If GatewayPorts is not enabled, the ssh tunnel will only
>> accept connections from localhost; it will not accept a connection from
>> compute-node. We can ask the sysadmin to enable GatewayPorts, or we can
>> use portfwd: run portfwd on service-node to forward port 22222 to port
>> 11111, then have compute-node connect to service-node:22222. So your
>> job script would launch pvserver like this:
>>
>> pvserver -rc -ch=service-node -sp=22222
>>
>> Problem solved! Also conveniently, we can use portfwd to replace 'sleep
>> 1d'. So the final command, executed by the paraview client:
>>
ssh -tt -R 11111:localhost:11111 service-node "qsub start_pvserver.sh && portfwd -g -c fwd.cfg"
>>
>> Where fwd.cfg contains:
>>
>> tcp { 22222 { => localhost:11111 } }
>>
>>
>> Hope this helps!
>>
>> Pat
>>
>> On Fri, Feb 12, 2010 at 7:06 PM, burlen <burlen.loring at gmail.com> wrote:
>>
>>
>> Incidentally, this brings up an interesting point about ParaView with
>> client/server. It doesn't try to clean up its child processes, AFAIK.
>> For example, if you set up this ssh tunnel inside the ParaView GUI
>> (e.g., using a command instead of a manual connection), and you cancel
>> the connection, it will leave the ssh running. You have to track down
>> the ssh process and kill it yourself. It's a minor thing, but it can
>> also prevent future connections if you don't realize there's a zombie
>> ssh that kept your ports open.
>>
>> I attempted to reproduce this on my Kubuntu 9.10, Qt 4.5.2 system, with
>> slightly different results, which may be Qt/distro/OS specific.
>>
>> On my system, as long as the process ParaView spawns finishes on its
>> own there is no problem. That's usually how one would expect things to
>> work out, since when the client disconnects the server closes, followed
>> by ssh. But you are right that PV never explicitly kills or otherwise
>> cleans up after the process it starts. So if the spawned process for
>> some reason doesn't finish, orphan processes are introduced.
>>
>> I was able to produce orphan ssh processes by giving the PV client a
>> server startup command that doesn't finish, e.g.
>>
>> ssh ... pvserver ... && sleep 100d
>>
>> I get the situation you described, which prevents further connections
>> on the same ports. Once PV tries and fails to connect on the open
>> ports, there is a crash soon after.
>>
>> I filed a bug report with a patch:
>> http://www.paraview.org/Bug/view.php?id=10283
>>
>>
>>
>> Sean Ziegeler wrote:
>>
>> Most batch systems have an option to wait until the job is
>> finished before the submit command returns. I know PBS uses
>> "-W block=true" and that SGE and LSF have similar options (but
>> I don't recall the precise flags).
>>
>> If your batch system doesn't provide that, I'd recommend adding some
>> shell scripting that loops, checking the queue for job completion, and
>> doesn't return until the job is done. The sleep thing would work, but
>> it wouldn't exit when the server finishes, leaving the ssh tunnels (and
>> other things like portfwd, if you put them in your scripts) lying
>> around.
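>>
>> A rough sketch of such a polling loop for PBS (a template only; the
>> job script name is hypothetical, and qstat's exact exit behavior
>> varies between schedulers and sites):
>>
>> JOBID=$(qsub start_pvserver.sh)
>> # qstat exits non-zero once the job is no longer known to the queue
>> while qstat "$JOBID" > /dev/null 2>&1; do
>>     sleep 30
>> done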
>>
>> Incidentally, this brings up an interesting point about ParaView with
>> client/server. It doesn't try to clean up its child processes, AFAIK.
>> For example, if you set up this ssh tunnel inside the ParaView GUI
>> (e.g., using a command instead of a manual connection), and you cancel
>> the connection, it will leave the ssh running. You have to track down
>> the ssh process and kill it yourself. It's a minor thing, but it can
>> also prevent future connections if you don't realize there's a zombie
>> ssh that kept your ports open.
>>
>>
>> On 02/08/10 21:03, burlen wrote:
>>
>> I am curious to hear what Sean has to say.
>>
>> But say the batch system returns right away after the job is
>> submitted; I think we can doctor the command so that it lives a while
>> longer. What about something like this:
>>
>> ssh -R XXXX:localhost:YYYY remote_machine "submit_my_job.sh && sleep 100d"
>>
>>
>> pat marion wrote:
>>
>> Hey, just checked out the wiki page, nice! One question: wouldn't this
>> command hang up and close the tunnel after submitting the job?
>>
>> ssh -R XXXX:localhost:YYYY remote_machine submit_my_job.sh
>>
>> Pat
>>
>> On Mon, Feb 8, 2010 at 8:12 PM, pat marion <pat.marion at kitware.com> wrote:
>>
>> Actually I didn't write the notes at the hpc.mil link.
>>
>> Here is something, and maybe this is the problem that Sean refers to:
>> in some cases, when I have set up a reverse ssh tunnel from the login
>> node to the workstation (command executed from the workstation), the
>> forward does not work when the compute node connects to the login
>> node. However, if I have the compute node connect to the login node on
>> port 33333, then use portfwd to forward that to localhost:11111, where
>> the ssh tunnel is listening on port 11111, it works like a charm.
>> portfwd makes the connection appear to come from localhost, which
>> allows the ssh tunnel to accept it. Hope that made a little sense...
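>>
>> In portfwd's config syntax, that forward on the login node would be
>> something like (a sketch; ports as described above):
>>
>> tcp { 33333 { => localhost:11111 } }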
>>
>> Pat
>>
>>
>> On Mon, Feb 8, 2010 at 6:29 PM, burlen <burlen.loring at gmail.com> wrote:
>>
>> Nice, thanks for the clarification. I am guessing that your example
>> should probably be the recommended approach rather than the portfwd
>> method suggested on the PV wiki. :) I took the initiative to add it to
>> the Wiki. KW let me know if this is not the case!
>>
>>
>> http://paraview.org/Wiki/Reverse_connection_and_port_forwarding#Reverse_connection_over_an_ssh_tunnel
>>
>>
>>
>> Would you mind taking a look to be sure I didn't miss anything or
>> bollix it up?
>>
>> The sshd config options you mentioned may be why your method doesn't
>> work on the Pleiades system; either that, or there is a firewall
>> between the front ends and the compute nodes. In either case I doubt
>> the NAS sysadmins are going to reconfigure for me :) So at least for
>> now I'm stuck with the two-hop ssh tunnels and interactive batch jobs.
>> If there were some way to script the ssh tunnel in my batch script I
>> would be golden...
>>
>> By the way, I put the details of the two-hop ssh tunnel on the wiki as
>> well, along with a link to Pat's hpc.mil notes. I don't dare try to
>> summarize them, since I've never used portfwd and it refuses to
>> compile both on my workstation and on the cluster.
>>
>> Hopefully putting these notes on the Wiki will save future
>> ParaView users some time and headaches.
>>
>>
>> Sean Ziegeler wrote:
>>
>> Not quite: the pvsc calls ssh with both the tunnel options and the
>> commands to submit the batch job. You don't even need a pvsc; it just
>> makes the interface fancier. As long as you or PV executes something
>> like this from your machine:
>>
>> ssh -R XXXX:localhost:YYYY remote_machine submit_my_job.sh
>>
>> This means that port XXXX on remote_machine will be the
>> port to which the server must connect. Port YYYY (e.g.,
>> 11111) on your client machine is the one on which PV
>> listens. You'd have to tell the server (in the batch
>> submission script, for example) the name of the node and
>> port XXXX to which to connect.
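>>
>> For instance (a sketch using pvserver's reverse-connection flags), the
>> batch script might start the server with:
>>
>> pvserver -rc -ch=remote_machine -sp=XXXX
>>
>> Here -rc requests the reverse connection, -ch names the host the
>> server should connect back to, and -sp gives the port.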
>>
>> One caveat that might be causing you problems: port forwarding (and
>> "gateway ports", if the server is running on a different node than the
>> login node) must be enabled in the remote_machine's sshd_config. If
>> not, no ssh tunnels will work at all (see man ssh and man
>> sshd_config). That's something an administrator would need to set up
>> for you.
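>>
>> Concretely, that means something like the following in sshd_config on
>> remote_machine (stock OpenSSH option names; see man sshd_config):
>>
>> AllowTcpForwarding yes
>> GatewayPorts yes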
>>
>> On 02/08/10 12:26, burlen wrote:
>>
>> So to be sure about what you're saying: Your .pvsc
>> script ssh's to the
>> front end and submits a batch job which when it's
>> scheduled , your batch
>> script creates a -R style tunnel and starts pvserver
>> using PV reverse
>> connection. ? or are you using portfwd or a second ssh
>> session to
>> establish the tunnel ?
>>
>> If you're doing this all from your .pvsc script
>> without a second ssh
>> session and/or portfwd that's awesome! I haven't been
>> able to script
>> this, something about the batch system prevents the
>> tunnel created
>> within the batch job's ssh session from working. I
>> don't know if that's
>> particular to this system or a general fact of life
>> about batch systems.
>>
>> Question: How are you creating the tunnel in your
>> batch script?
>>
>> Sean Ziegeler wrote:
>>
>> Both ways will work for me in most cases, i.e. a "forward" connection
>> with ssh -L or a reverse connection with ssh -R.
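>>
>> For example, with XXXX a port on the remote host and YYYY the port the
>> client uses locally (a sketch of the two forms):
>>
>> $ ssh -L YYYY:localhost:XXXX remote_machine  # forward: the client connects to its own localhost:YYYY
>> $ ssh -R XXXX:localhost:YYYY remote_machine  # reverse: the server connects to remote_machine:XXXX, which tunnels back to the client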
>>
>> However, I find that the reverse method is more
>> scriptable. You can
>> set up a .pvsc file that the client can load and
>> will call ssh with
>> the appropriate options and commands for the
>> remote host, all from the
>> GUI. The client will simply wait for the reverse
>> connection from the
>> server, whether it takes 5 seconds or 5 hours for
>> the server to get
>> through the batch queue.
>>
>> Using the forward connection method, if the server
>> isn't started soon
>> enough, the client will attempt to connect and
>> then fail. I've always
>> had to log in separately, wait for the server to
>> start running, then
>> tell my client to connect.
>>
>> -Sean
>>
>> On 02/06/10 12:58, burlen wrote:
>>
>> Hi Pat,
>>
>> My bad. I was looking at the PV wiki, and thought you were talking
>> about doing this without an ssh tunnel, using only port forwarding and
>> paraview's --reverse-connection option. Now that I am reading your
>> hpc.mil post I see what you mean :)
>>
>> Burlen
>>
>>
>> pat marion wrote:
>>
>> Maybe I'm misunderstanding what you mean by local firewall, but
>> usually, as long as you can ssh from your workstation to the login
>> node, you can use a reverse ssh tunnel.