[ITK-dev] [ITK] New Highly Parallel Build System, the POWER8
Chuck Atkins
chuck.atkins at kitware.com
Thu Apr 23 18:21:55 EDT 2015
In case anybody's interested, here's the "spread_numa.sh" script I use to
evenly distribute across NUMA domains and bind to CPU cores:
----------BEGIN spread_numa.sh----------
#!/bin/bash
# Evenly spread a command across NUMA domains for a given number of CPU cores

function spread()
{
  NUM_CORES=$1
  shift

  # Use this wicked awk script to parse the numactl hardware layout and
  # select an equal number of cores from each NUMA domain, evenly spaced
  # across each domain
  SPREAD="$(numactl -H | sed -n 's|.*cpus: \(.*\)|\1|p' |
    awk -v NC=${NUM_CORES} -v ND=${NUMA_DOMAINS} 'BEGIN{CPD=NC/ND} {S=NF/CPD;
      for(C=0;C<CPD;C++){F0=C*S; F1=(F0==int(F0)?F0:int(F0)+1)+1;
      printf("%d", $F1); if(!(NR==ND && C==CPD-1)){printf(",")} } }')"

  echo Executing: numactl --physcpubind=${SPREAD} "$@"
  numactl --physcpubind=${SPREAD} "$@"
}

# Check command arguments
if [ $# -lt 2 ]
then
  echo "Usage: $0 [NUM_CORES_TO_USE] [cmd [arg1] ... [argn]]"
  exit 1
fi

# Determine the total number of CPU cores
MAX_CORES=$(numactl -s | sed -n 's|physcpubind: \(.*\)|\1|p' | wc -w)

# Determine the total number of NUMA domains
NUMA_DOMAINS=$(numactl -H | sed -n 's|available: \([0-9]*\).*|\1|p')

# Verify the number of cores is sane
NUM_CORES=$1
shift
if [ "$NUM_CORES" -gt "$MAX_CORES" ]
then
  echo "WARNING: $NUM_CORES cores is out of bounds. Setting to $MAX_CORES cores."
  NUM_CORES=$MAX_CORES
fi

# Round up to the next multiple of the NUMA domain count
if [ $((NUM_CORES%NUMA_DOMAINS)) -ne 0 ]
then
  TMP=$(( ((NUM_CORES/NUMA_DOMAINS) + 1) * NUMA_DOMAINS ))
  echo "WARNING: $NUM_CORES core(s) are not evenly divided across $NUMA_DOMAINS NUMA domains. Setting to $TMP."
  NUM_CORES=$TMP
fi

echo "Using ${NUM_CORES}/${MAX_CORES} cores across ${NUMA_DOMAINS} NUMA domains"
spread ${NUM_CORES} "$@"
----------END spread_numa.sh----------
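
For anyone who wants to see what the awk selection does without a NUMA
machine at hand, here is a small sketch that runs the same awk program on a
made-up layout. The two-domain, 8-CPUs-per-domain layout and the pick_cores
helper name are assumptions for illustration only; they are not output from
the POWER8 box. The END block that prints a trailing newline is also added
just for display.

```shell
#!/bin/bash
# Hypothetical "cpus:" lists, one line per NUMA domain, standing in for what
# the sed filter in spread_numa.sh extracts from "numactl -H".
layout='0 1 2 3 4 5 6 7
8 9 10 11 12 13 14 15'

# pick_cores TOTAL_CORES NUM_DOMAINS  (hypothetical helper, layout on stdin)
# Same awk logic as spread_numa.sh: take TOTAL_CORES/NUM_DOMAINS evenly
# spaced CPUs from each line and join them with commas.
pick_cores() {
  awk -v NC="$1" -v ND="$2" 'BEGIN{CPD=NC/ND} {S=NF/CPD;
    for(C=0;C<CPD;C++){F0=C*S; F1=(F0==int(F0)?F0:int(F0)+1)+1;
    printf("%d", $F1); if(!(NR==ND && C==CPD-1)){printf(",")} } }
    END{printf("\n")}'
}

# 4 cores over 2 domains: 2 per domain, stride of 4 within each domain
printf '%s\n' "$layout" | pick_cores 4 2   # prints 0,4,8,12
```

The resulting list is what would be handed to numactl --physcpubind. When
the stride is not a whole number, the field index is rounded up, so the
spacing is only approximately even.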
- Chuck
On Thu, Apr 23, 2015 at 4:57 PM, Chuck Atkins <chuck.atkins at kitware.com>
wrote:
> (re-sent for the rest of the dev list)
> Hi Bradley,
>
> It's pretty fast. The interesting numbers are for 20, 40, 80, and 160,
> which correspond to 1:1, 2:1, 4:1, and 8:1 thread-to-core ratios. Starting
> from the already configured ITKLinuxPOWER8 currently being built, I did a
> ninja clean and then "time ninja -jN". Watching the CPU load for 20, 40,
> and 80 threads though, I see a fair amount of both process migration and
> unbalanced thread distribution, i.e. for -j20 I'll often see 2 cores with 6
> or 8 threads and the rest with only 1 or 2. So in addition to the -jN
> settings, I also ran 20, 40, and 80 threads using numactl with fixed
> binding to physical CPU cores to evenly distribute the threads across cores
> and prevent thread migration. See timings below in seconds:
>
> Threads                 Real        User        Sys   Total CPU Time
> 20                  1037.097   19866.685    429.796        20296.481
> *(Numa Bind) 20*     915.910   16290.589    319.017        16609.606
> 40                   713.772   26953.663    556.960        27510.623
> (Numa Bind) 40       641.924   22442.685    432.379        22875.064
> 80                   588.357   40970.439    822.944        41793.383
> *(Numa Bind) 80*     538.801   35366.297    637.922        36004.219
> 160                  572.492   62542.901   1289.864        63832.765
> (Numa Bind) 160      549.742   61864.666   1242.975        63107.641
>
>
> So it seems like core binding cuts wall-clock time by roughly 8-12% at 20,
> 40, and 80 threads, and by about 4% at 160. And while clearly the
> core-locked 4:1 gave us the best time, looking at the total CPU time
> (user+sys) the 1:1 looks to be the most efficient for actual cycles used.
>
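
The effect of binding can be checked directly from the "Real" column of the
timings above. The following is only a sketch of that arithmetic: the
numbers are copied from the table, while the variable and function names
(timings, reduction) are made up for this example.

```shell
#!/bin/bash
# Wall-clock ("Real") seconds copied from the timing table.
# Columns: threads, plain -jN, numactl-bound
timings='20 1037.097 915.910
40 713.772 641.924
80 588.357 538.801
160 572.492 549.742'

# Percent reduction in wall-clock time from binding: 100 * (plain - bound) / plain
reduction() {
  awk '{ printf("%d threads: %.1f%% less wall-clock time with binding\n",
                $1, 100 * ($2 - $3) / $2) }'
}

printf '%s\n' "$timings" | reduction
# 20 threads: 11.7% less wall-clock time with binding
# 40 threads: 10.1% less wall-clock time with binding
# 80 threads: 8.4% less wall-clock time with binding
# 160 threads: 4.0% less wall-clock time with binding
```

The gain shrinks as the thread count approaches the 160 hardware threads,
which fits the idea that there is less room for the scheduler to place
things badly once every hardware thread is busy anyway.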
> It's interesting to watch how the whole system gets used up for most of
> the build but everything gets periodically gated on a handful of linker
> processes. And of course, it's always cool to see a screen cap of htop
> with a whole boatload of cores at 100%.
>
>
> - Chuck
>
> On Thu, Apr 23, 2015 at 10:01 AM, Bradley Lowekamp <blowekamp at mail.nih.gov
> > wrote:
>
>> Matt,
>>
>> I'd love to explore the build performance of this system.
>>
>> Any chance you could run clean builds of ITK on this system with
>> 20,40,60,80,100,120,140 and 160 processes and record the timings?
>>
>> I am very curious how this unique system scales with multiple
>> heavyweight processes, as its design appears to be uniquely suited to
>> lighter-weight multi-threading.
>>
>> Thanks,
>> Brad
>>
>> On Apr 22, 2015, at 11:51 PM, Matt McCormick <matt.mccormick at kitware.com>
>> wrote:
>>
>> > Hi folks,
>> >
>> > With thanks to Chuck Atkins and FSF France, we have a new build on the
>> > dashboard [1] for the IBM POWER8 [2] system. This is a PowerPC64
>> > system with 20 cores and 8 threads per core -- a great system where we
>> > can test and improve ITK parallel computing performance!
>> >
>> >
>> > To generate a test build on Gerrit, add
>> >
>> > request build: power8
>> >
>> > in a review's comments.
>> >
>> >
>> > There are currently some build warnings and test failures that should
>> > be addressed before we will be able to use the system effectively. Any
>> > help here is appreciated.
>> >
>> > Thanks,
>> > Matt
>> >
>> >
>> > [1]
>> https://open.cdash.org/index.php?project=Insight&date=2015-04-22&filtercount=1&showfilters=1&field1=site/string&compare1=63&value1=gcc112
>> >
>> > [2] https://en.wikipedia.org/wiki/POWER8