[ITK-dev] [ITK] New Highly Parallel Build System, the POWER8
Chuck Atkins
chuck.atkins at kitware.com
Fri Apr 24 09:58:54 EDT 2015
>
> Thanks for running and posting those performance numbers. It sadly seems
> like 1:1 is most frequently the most efficient use of CPU cycles.
>
Often, yes, but not always. We used this same system for benchmarking a
new multi-threaded iso contour algorithm using vtkSMPTools backed by Intel
TBB that Will Schroeder is publishing a paper on. We used this system and
a 2 x 18-core + HT Intel machine. Interestingly, when measuring parallel
efficiency (speedup vs number of cores), the POWER8 measured the same
efficiency (actually slightly better) at 2:1 as the Intel machine did at
1:1.
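
For reference, the efficiency metric mentioned above can be sketched as
follows. This is only an illustration, assuming efficiency is defined as
speedup divided by the number of physical cores; the function name and the
timings in the usage note are hypothetical, not measurements from either
machine.

```python
def parallel_efficiency(t_serial, t_parallel, n_cores):
    """Return speedup / n_cores; 1.0 means perfect linear scaling.

    t_serial   -- wall time of the single-threaded run (seconds)
    t_parallel -- wall time of the multi-threaded run (seconds)
    n_cores    -- number of physical cores used, regardless of how many
                  SMT threads were scheduled on each core
    """
    if t_serial <= 0 or t_parallel <= 0 or n_cores <= 0:
        raise ValueError("timings and core count must be positive")
    speedup = t_serial / t_parallel
    return speedup / n_cores
```

With made-up timings, parallel_efficiency(120.0, 10.0, 20) gives a speedup
of 12 on 20 cores, i.e. an efficiency of 0.6. Normalising by cores rather
than by threads is what makes a 2:1 run on one machine comparable to a 1:1
run on another.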
> It's interesting to see how this architecture scales with a large number
> of processes, while each core is designed for 8 lighter weight threads it
> seems.
>
Even if the threads are "heavy" you can still get good gains from it since
it's more about the details of what the threads are doing. The SMT
(simultaneous multi-threading) hardware basically lets all parts of a core
be utilised simultaneously by multiple threads, i.e. one thread can be using
the FPU while another thread is using the ALU at the same time. Having a
bunch of threads all trying to use the same resource on a core is where SMT
breaks down, which is why it's often a detriment in HPC. However, if
you're memory bound instead of compute bound, then while some threads are
waiting for memory access, other threads that already have their data in
cache can churn away on the compute cycles. It's all a balancing game.
> I was hoping to run a similar performance test on lhcp-rh6 with 80 virtual
> cores: 4 sockets, each with 10 cores + hyper-threading. Unfortunately I need
> to use ninja more as my timing results appear to be from cached
> compilations and not actually running the compiler.
>
Is it Ninja doing this or ccache? I often find shared build machines are
configured with ccache by default by symlinking it in place of /usr/bin/gcc
and /usr/bin/g++. You can likely bypass it by locating the full path to
the actual compilers and passing those to CMake, avoiding ccache
entirely.
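
A rough sketch of that check, in case it helps. It assumes the ccache
wrapper is a symlink that resolves to a binary named "ccache" and that a
real compiler with the same name sits further down PATH; neither is
guaranteed on any given machine (the real compiler may be installed under
a versioned name instead). The helper name resolve_compiler is
hypothetical.

```python
import os


def resolve_compiler(name, search_path=None):
    """Return the first executable `name` on the search path that does not
    resolve to ccache, or None if there is no such entry."""
    if search_path is None:
        search_path = os.environ.get("PATH", "")
    for directory in search_path.split(os.pathsep):
        if not directory:
            continue
        candidate = os.path.join(directory, name)
        if not (os.path.isfile(candidate) and os.access(candidate, os.X_OK)):
            continue
        # Follow symlinks to see what would actually be executed.
        real = os.path.realpath(candidate)
        if os.path.basename(real) == "ccache":
            continue  # this entry is a ccache wrapper, keep looking
        return candidate
    return None


if __name__ == "__main__":
    cc = resolve_compiler("gcc")
    cxx = resolve_compiler("g++")
    if cc and cxx:
        # CMAKE_C_COMPILER / CMAKE_CXX_COMPILER are standard CMake variables;
        # set them when configuring a fresh build directory.
        print(f"cmake -DCMAKE_C_COMPILER={cc} -DCMAKE_CXX_COMPILER={cxx} <source-dir>")
    else:
        print("no non-ccache gcc/g++ found on PATH")
```

The compiler choice is cached at first configure, so the printed command is
meant for an empty build directory rather than an existing one.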
Also, I moved my NUMA distribution script into my personal git repo if
anybody feels like tinkering with it:
https://github.com/chuckatkins/miscelaneous-scripts/blob/master/scripts/spread_numa.sh
- Chuck