[Paraview] Issues with PVSB 5.2 and OSMesa support

Chuck Atkins chuck.atkins at kitware.com
Tue Feb 7 16:03:50 EST 2017


Hi Michel,

> Indeed, I built PVSB 5.2 with the intel 2016.2.181 and intelmpi 5.1.3.181
> compilers, then ran the resulting pvserver on Haswell CPU nodes (Intel
> E5-2680v3), which support AVX2 instructions.  So this fits exactly the
> known issue you mentioned in your email.
>
Yep, that'll do it.  The problem is due to a bug in the Intel compiler
performing over-aggressive vectorized code generation.  I'm not sure
whether it's fixed in >= 17, but I definitely know it's broken in <= 16.x.
GALLIUM_DRIVER=SWR is going to give you the best performance in this
situation anyway, and it is the recommended osmesa driver on x86_64 CPUs.
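
For example, a minimal sketch (the pvserver path and process count are
placeholders; adapt them to your site's launch setup):

    # Select the OpenSWR software rasterizer instead of llvmpipe
    export GALLIUM_DRIVER=swr
    mpirun -np 24 /path/to/pvserver --server-port=11111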


> Exporting the GALLIUM_DRIVER env variable to swr then leads to
> an interesting behavior. With the swr driver, the good news is that I can
> connect my pvserver built in release mode without crashing.
>
Great!


> For the record, the llvmpipe driver compiled in release mode crashes
> during the client/server connection, whereas the llvmpipe driver compiled
> in debug mode works fine.
>
This lines up with the issue being bad vectorization, since the compiler
won't be doing most of those optimizations in a debug build.
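
If you need llvmpipe specifically, one possible workaround (untested, and
assuming the classic icc/icpc compilers) is to rebuild Mesa in release mode
with auto-vectorization disabled, e.g.:

    # Keep -O2 optimizations but disable icc's auto-vectorizer
    export CFLAGS="-O2 -no-vec"
    export CXXFLAGS="-O2 -no-vec"

That trades some llvmpipe performance for sidestepping the bad code
generation.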

> However, our PBS scheduler quickly killed my interactive job because the
> virtual memory was exhausted, which was puzzling.  Increasing the number
> of cores requested for my job and keeping some of them idle allowed me to
> increase the available memory, at the cost of wasted CPU resources.
>
I suspect the problem is a massive oversubscription of threads by swr.
The default behavior of swr is to use all available CPU cores on the node.
However, when running multiple MPI processes per node, they have no way of
knowing about each other.  So if you've got 24 cores per node and run 24
pvservers, you'll end up with 24^2 = 576 rendering threads on a node; not
so great.  You can control this with the KNOB_MAX_WORKER_THREADS
environment variable.  Typically you'll want to set it to the number of
cores per node divided by the number of processes per node.  So if your
node has 24 cores and you run 24 processes per node, then set
KNOB_MAX_WORKER_THREADS to 1, but if you're running 4 processes per node,
then set it to 6; you get the idea.  That should address the virtual
memory problem.  It's a balance, since typically rendering will perform
better with fewer processes per node and more threads per process, but the
filters, like Contour, parallelize at the MPI level and work better with a
higher process count.  You'll need to find the right balance for your use
case depending on whether it's render-heavy or pipeline-processing heavy.
A job-script sketch follows.
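
As a concrete sketch, here is what a PBS Pro job script could look like on
hypothetical 24-core nodes with 4 processes per node (the directives,
paths, and counts are placeholders):

    #!/bin/bash
    #PBS -l select=4:ncpus=24:mpiprocs=4
    # 24 cores / 4 MPI processes per node = 6 SWR worker threads each
    export GALLIUM_DRIVER=swr
    export KNOB_MAX_WORKER_THREADS=6
    # 4 nodes x 4 processes per node = 16 ranks; whether the environment
    # propagates to the ranks depends on your MPI (Intel MPI forwards it
    # by default; others may need flags like -x or -genv)
    mpirun -np 16 /path/to/pvserver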



> Would you also know if this known issue with the llvmpipe driver will be
> fixed in PV 5.3 (granting that the swr driver should be faster on Intel
> CPUs, provided it does not exhaust the available memory)?
>

It's actually an Intel compiler bug and not a ParaView (or even Mesa, for
that matter) issue, so probably not.  It may be fixed in future releases of
icc, but I wouldn't know without testing it.


- Chuck