[Paraview] paraview - client-server

pat marion pat.marion at kitware.com
Fri Apr 30 10:03:34 EDT 2010


Hey Burlen, on the bug report page for 10283, I think you need to fix the
command line you are testing with:

$ ssh remote cmd1 && cmd2

will execute cmd1 on remote and cmd2 locally.  It should be:

$ ssh remote "cmd1 && cmd2"

Pat

On Fri, Apr 30, 2010 at 9:12 AM, pat marion <pat.marion at kitware.com> wrote:

> I have applied your patch.  I agree that paraview should explicitly close
> the child process.  But... what I am pointing out is that calling
> QProcess::close() does not help in this situation.  What I am saying is
> that, even when paraview does kill the process, any commands run by ssh on
> the other side of the netpipe will be orphaned by sshd.  Are you sure you
> can't reproduce it?
>
>
> $ ssh localhost sleep 1d
> $ < press control-c >
> $ pidof sleep
> $ # sleep is still running
>
> Pat
>
>
> On Fri, Apr 30, 2010 at 2:08 AM, burlen <burlen.loring at gmail.com> wrote:
>
>> Hi Pat,
>>
>> From my point of view the issue is philosophical, because practically
>> speaking I couldn't reproduce the orphans without doing something a little
>> odd, namely ssh ... && sleep 1d. That said, the fact that a user reported it
>> suggests that it may occur in the real world as well. The question is this:
>> should an application explicitly clean up resources it allocates, or should
>> it rely on the user not only knowing that there is the potential for a
>> resource leak but also knowing enough to do the right thing to avoid it
>> (e.g. ssh -tt ...)? In my opinion, as a matter of principle, if PV spawns a
>> process it should explicitly clean it up and there should be no way it can
>> become an orphan. In this case the fact that the orphan can hold ports open
>> is particularly insidious, because further connection attempts on that port
>> fail with no helpful error information. Also, it is not very difficult to
>> clean up a spawned process. What it comes down to is a little bookkeeping
>> to hang on to the QProcess handle and a few lines of code called from the
>> pqCommandServerStartup destructor to make certain it's cleaned up. This is
>> from the patch I submitted when I filed the bug report.
>>
>> +    // close running process
>> +    if (this->Process->state()==QProcess::Running)
>> +      {
>> +      this->Process->close();
>> +      }
>> +    // free the object
>> +    delete this->Process;
>> +    this->Process=NULL;
>>
>> I think if the cluster admins out there knew which ssh options
>> (GatewayPorts etc.) are important for ParaView to work seamlessly, then they
>> might be willing to open them up. It's my impression that the folks who
>> build clusters want tools like PV to be easy to use, but they don't
>> necessarily know all the ins and outs of configuring and running PV.
>>
>> Thanks for looking at this again! The -tt option to ssh is indeed a good
>> find.
>>
>> Burlen
>>
>> pat marion wrote:
>>
>>> Hi all!
>>>
>>> I'm bringing this thread back- I have learned a couple new things...
>>>
>>> -----------------------
>>> No more orphans:
>>>
>>> Here is an easy way to create an orphan:
>>>
>>>   $ ssh localhost sleep 1d
>>>   $ <press control c>
>>>
>>> The ssh process is cleaned up, but sshd orphans the sleep process.  You
>>> can avoid this by adding '-t' to ssh:
>>>
>>>  $ ssh -t localhost sleep 1d
>>>
>>> Works like a charm!  But then there is another problem... try the same
>>> command from paraview (using QProcess) and it still leaves an orphan, doh!
>>> Go back and re-read ssh's man page and you have the solution: use '-t'
>>> twice, i.e. ssh -tt.
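>>>
>>> (A quick way to see the same difference outside of paraview -- just a
>>> sketch, assuming a local sshd -- is to deny ssh a terminal on stdin,
>>> which is roughly the situation under QProcess:
>>>
>>>  $ ssh -t localhost sleep 1d < /dev/null    # no pty allocated; sleep gets orphaned
>>>  $ ssh -tt localhost sleep 1d < /dev/null   # pty forced; killing ssh should kill sleep
>>>
>>> With -tt, sshd hangs up the remote session when the connection drops, so
>>> the sleep should go away along with ssh.)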
>>>
>>> -------------------------
>>> GatewayPorts and portfwd workaround:
>>>
>>> In this scenario we have 3 machines: workstation, service-node, and
>>> compute-node.  I want to ssh from workstation to service-node and submit a
>>> job that will run pvserver on compute-node.  When pvserver starts on
>>> compute-node I want it to reverse connect to service-node and I want
>>> service-node to forward the connection to workstation.  So here I go:
>>>
>>>   $ ssh -R11111:localhost:11111 service-node qsub start_pvserver.sh
>>>
>>> Oops, the qsub command returns immediately and closes my ssh tunnel.
>>>  Let's pretend that the scheduler doesn't provide an easy way to keep the
>>> command alive, so I have resorted to using 'sleep 1d'.  So here I go, using
>>> -tt to prevent orphans:
>>>
>>>  $ ssh -tt -R11111:localhost:11111 service-node "qsub start_pvserver.sh
>>> && sleep 1d"
>>>
>>> Well, this will only work if GatewayPorts is enabled in sshd_config on
>>> service-node.  If GatewayPorts is not enabled, the ssh tunnel will only
>>> accept connections from localhost; it will not accept a connection from
>>> compute-node.  We can ask the sysadmin to enable GatewayPorts, or we could
>>> use portfwd.  You can run portfwd on service-node to forward port 22222 to
>>> port 11111, then have compute-node connect to service-node:22222.  So your
>>> job script would launch pvserver like this:
>>>
>>>  pvserver -rc -ch=service-node -sp=22222
>>>
>>> Problem solved!  Also conveniently, we can use portfwd to replace 'sleep
>>> 1d'.  So the final command, executed by the paraview client, is:
>>>
>>>  ssh -tt -R 11111:localhost:11111 service-node "qsub start_pvserver.sh &&
>>> portfwd -g -c fwd.cfg"
>>>
>>> Where fwd.cfg contains:
>>>
>>>  tcp { 22222 { => localhost:11111 } }
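>>>
>>> For reference, the start_pvserver.sh submitted above could be as simple
>>> as the sketch below; the scheduler directives, mpi launcher, and process
>>> counts are placeholders you would adapt to your site:
>>>
>>>  #!/bin/bash
>>>  #PBS -l nodes=2:ppn=8
>>>  #PBS -l walltime=04:00:00
>>>  # reverse connect to service-node:22222; portfwd on service-node
>>>  # forwards that to the ssh tunnel listening on localhost:11111
>>>  mpiexec -n 16 pvserver -rc -ch=service-node -sp=22222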
>>>
>>>
>>> Hope this helps!
>>>
>>> Pat
>>>
>>> On Fri, Feb 12, 2010 at 7:06 PM, burlen <burlen.loring at gmail.com> wrote:
>>>
>>>
>>>        Incidentally, this brings up an interesting point about
>>>        ParaView with client/server.  It doesn't try to clean up its
>>>        child processes, AFAIK.  For example, if you set up this ssh
>>>        tunnel inside the ParaView GUI (e.g., using a command instead
>>>        of a manual connection), and you cancel the connection, it
>>>        will leave the ssh running.  You have to track down the ssh
>>>        process and kill it yourself.  It's a minor thing, but it can
>>>        also prevent future connections if you don't realize there's a
>>>        zombie ssh that kept your ports open.
>>>
>>>    I attempted to reproduce on my kubuntu 9.10, qt 4.5.2 system, with
>>>    slightly different results, which may be qt/distro/os specific.
>>>
>>>    On my system, as long as the process ParaView spawns finishes on
>>>    its own there is no problem. That's usually how one would expect
>>>    things to work out, since when the client disconnects the server
>>>    closes, followed by ssh. But you are right that PV never
>>>    explicitly kills or otherwise cleans up after the process it
>>>    starts, so if the spawned process for some reason doesn't finish,
>>>    orphan processes are introduced.
>>>
>>>    I was able to produce orphan ssh processes by giving the PV client a
>>>    server startup command that doesn't finish, e.g.
>>>
>>>      ssh ... pvserver ... && sleep 100d
>>>
>>>    I get the situation you described, which prevents further
>>>    connections on the same ports. Once PV tries and fails to connect
>>>    on the open ports, there is a crash soon after.
>>>
>>>    I filed a bug report with a patch:
>>>    http://www.paraview.org/Bug/view.php?id=10283
>>>
>>>
>>>
>>>    Sean Ziegeler wrote:
>>>
>>>        Most batch systems have an option to wait until the job is
>>>        finished before the submit command returns.  I know PBS uses
>>>        "-W block=true" and that SGE and LSF have similar options (but
>>>        I don't recall the precise flags).
>>>
>>>        If your batch system doesn't provide that, I'd recommend
>>>        adding some shell scripting to loop through checking the queue
>>>        for job completion and not return until it's done.  The sleep
>>>        thing would work, but wouldn't exit when the server finishes,
>>>        leaving the ssh tunnels (and other things like portfwd if you
>>>        put them in your scripts) lying around.
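>>>
>>>        Something like this is what I have in mind -- just a sketch,
>>>        assuming PBS-style qsub/qstat (other schedulers spell these
>>>        commands differently):
>>>
>>>          # submit, then poll until the scheduler no longer lists the job
>>>          JOBID=$(qsub start_pvserver.sh)
>>>          while qstat "$JOBID" > /dev/null 2>&1; do
>>>              sleep 30
>>>          done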
>>>
>>>        Incidentally, this brings up an interesting point about
>>>        ParaView with client/server.  It doesn't try to clean up its
>>>        child processes, AFAIK.  For example, if you set up this ssh
>>>        tunnel inside the ParaView GUI (e.g., using a command instead
>>>        of a manual connection), and you cancel the connection, it
>>>        will leave the ssh running.  You have to track down the ssh
>>>        process and kill it yourself.  It's a minor thing, but it can
>>>        also prevent future connections if you don't realize there's a
>>>        zombie ssh that kept your ports open.
>>>
>>>
>>>        On 02/08/10 21:03, burlen wrote:
>>>
>>>            I am curious to hear what Sean has to say.
>>>
>>>            But say the batch system returns right away after the job
>>>            is submitted;
>>>            I think we can doctor the command so that it will live for
>>>            a while
>>>            longer. What about something like this:
>>>
>>>            ssh -R XXXX:localhost:YYYY remote_machine
>>>            "submit_my_job.sh && sleep
>>>            100d"
>>>
>>>
>>>            pat marion wrote:
>>>
>>>                Hey, just checked out the wiki page, nice! One
>>>                question: wouldn't this
>>>                command hang up and close the tunnel after submitting
>>>                the job?
>>>                ssh -R XXXX:localhost:YYYY remote_machine submit_my_job.sh
>>>                Pat
>>>
>>>                On Mon, Feb 8, 2010 at 8:12 PM, pat marion
>>>                <pat.marion at kitware.com> wrote:
>>>
>>>                Actually I didn't write the notes at the hpc.mil link.
>>>
>>>                Here is something- and maybe this is the problem that
>>>                Sean refers
>>>                to- in some cases, when I have set up a reverse ssh
>>>                tunnel from
>>>                login node to workstation (command executed from
>>>                workstation) then
>>>                the forward does not work when the compute node
>>>                connects to the
>>>                login node. However, if I have the compute node
>>>                connect to the
>>>                login node on port 33333, then use portfwd to forward
>>>                that to
>>>                localhost:11111, where the ssh tunnel is listening on
>>>                port 11111,
>>>                it works like a charm. The portfwd tricks it into
>>>                thinking the
>>>                connection is coming from localhost and allows the ssh
>>>                tunnel to
>>>                work. Hope that made a little sense...
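>>>
>>>                For what it's worth, the portfwd config for that trick
>>>                would be roughly (just a sketch, ports as above):
>>>
>>>                  tcp { 33333 { => localhost:11111 } }
>>>
>>>                run on the login node with something like
>>>                portfwd -g -c fwd.cfg.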
>>>
>>>                Pat
>>>
>>>
>>>                On Mon, Feb 8, 2010 at 6:29 PM, burlen
>>>                <burlen.loring at gmail.com> wrote:
>>>
>>>                Nice, thanks for the clarification. I am guessing that
>>>                your
>>>                example should probably be the recommended approach rather
>>>                than the portfwd method suggested on the PV wiki. :) I
>>>                took
>>>                the initiative to add it to the Wiki. KW let me know
>>>                if this
>>>                is not the case!
>>>
>>>
>>> http://paraview.org/Wiki/Reverse_connection_and_port_forwarding#Reverse_connection_over_an_ssh_tunnel
>>>
>>>
>>>
>>>                Would you mind taking a look to be sure I didn't miss
>>>                anything
>>>                or bollix it up?
>>>
>>>                The sshd config options you mentioned may be why your
>>>                method
>>>                doesn't work on the Pleiades system; either that, or
>>>                there is a
>>>                firewall between the front ends and compute nodes. In
>>>                either
>>>                case I doubt the NAS sys admins are going to
>>>                reconfigure for
>>>                me :) So at least for now I'm stuck with the two hop ssh
>>>                tunnels and interactive batch jobs. If there were
>>>                some way to
>>>                script the ssh tunnel in my batch script I would be
>>>                golden...
>>>
>>>                By the way I put the details of the two hop ssh tunnel
>>>                on the
>>>                wiki as well, and a link to Pat's hpc.mil notes. I don't
>>>                dare try to summarize them since I've never used portfwd
>>>                and it refuses to compile on both my workstation and the
>>>                cluster.
>>>
>>>                Hopefully putting these notes on the Wiki will save future
>>>                ParaView users some time and headaches.
>>>
>>>
>>>                Sean Ziegeler wrote:
>>>
>>>                Not quite- the pvsc calls ssh with both the tunnel options
>>>                and the commands to submit the batch job. You don't even
>>>                need a pvsc; it just makes the interface fancier. As long
>>>                as you or PV executes something like this from your
>>>                machine:
>>>                ssh -R XXXX:localhost:YYYY remote_machine submit_my_job.sh
>>>
>>>                This means that port XXXX on remote_machine will be the
>>>                port to which the server must connect. Port YYYY (e.g.,
>>>                11111) on your client machine is the one on which PV
>>>                listens. You'd have to tell the server (in the batch
>>>                submission script, for example) the name of the node and
>>>                port XXXX to which to connect.
>>>
>>>                One caveat that might be causing you problems: port
>>>                forwarding (and "gateway ports" if the server is running
>>>                on a different node than the login node) must be enabled
>>>                in the remote_machine's sshd_config. If not, no ssh
>>>                tunnels will work at all (see: man ssh and man
>>>                sshd_config). That's something that an administrator
>>>                would need to set up for you.
>>>
>>>                On 02/08/10 12:26, burlen wrote:
>>>
>>>                So to be sure about what you're saying: your .pvsc
>>>                script ssh's to the
>>>                front end and submits a batch job, and when it's
>>>                scheduled your batch
>>>                script creates a -R style tunnel and starts pvserver
>>>                using PV reverse
>>>                connection? Or are you using portfwd or a second ssh
>>>                session to
>>>                establish the tunnel?
>>>
>>>                If you're doing this all from your .pvsc script
>>>                without a second ssh
>>>                session and/or portfwd that's awesome! I haven't been
>>>                able to script
>>>                this; something about the batch system prevents the
>>>                tunnel created
>>>                within the batch job's ssh session from working. I
>>>                don't know if that's
>>>                particular to this system or a general fact of life
>>>                about batch systems.
>>>
>>>                Question: How are you creating the tunnel in your
>>>                batch script?
>>>
>>>                Sean Ziegeler wrote:
>>>
>>>                Both ways will work for me in most cases, i.e. a
>>>                "forward" connection
>>>                with ssh -L or a reverse connection with ssh -R.
>>>
>>>                However, I find that the reverse method is more
>>>                scriptable. You can
>>>                set up a .pvsc file that the client can load and
>>>                will call ssh with
>>>                the appropriate options and commands for the
>>>                remote host, all from the
>>>                GUI. The client will simply wait for the reverse
>>>                connection from the
>>>                server, whether it takes 5 seconds or 5 hours for
>>>                the server to get
>>>                through the batch queue.
>>>
>>>                Using the forward connection method, if the server
>>>                isn't started soon
>>>                enough, the client will attempt to connect and
>>>                then fail. I've always
>>>                had to log in separately, wait for the server to
>>>                start running, then
>>>                tell my client to connect.
>>>
>>>                -Sean
>>>
>>>                On 02/06/10 12:58, burlen wrote:
>>>
>>>                Hi Pat,
>>>
>>>                My bad. I was looking at the PV wiki, and
>>>                thought you were talking about
>>>                doing this without an ssh tunnel and using
>>>                only port forwarding and
>>>                paraview's --reverse-connection option. Now
>>>                that I am reading your
>>>                hpc.mil post I see what you mean :)
>>>
>>>                Burlen
>>>
>>>
>>>                pat marion wrote:
>>>
>>>                Maybe I'm misunderstanding what you mean
>>>                by local firewall, but
>>>                usually as long as you can ssh from your
>>>                workstation to the login node
>>>                you can use a reverse ssh tunnel.
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>
>