Speedbuilding LLVM/Clang in 2 minutes on ARM

Frederic Cambus May 12, 2021 [LLVM] [Compilers] [Toolchains]

This post is the AArch64 counterpart of my "Speedbuilding LLVM/Clang in 5 minutes" article.

After publishing and sharing the previous post URL with some friends on IRC, I was asked if I wanted to try doing the same on a 160-core ARM machine. Finding out what my answer was is left as an exercise to the reader :-)

The system I'm using for this experiment is a BM.Standard.A1.160 bare-metal machine from Oracle Cloud, which has a dual-socket motherboard with two 80-core Ampere Altra CPUs, for a total of 160 cores, and 1024 GB of RAM. This is, to the best of my knowledge, the fastest AArch64 server machine available at this time.

The system is running Oracle Linux Server 8.3 with up-to-date packages and kernel.

The full result of cat /proc/cpuinfo is available here.

uname -a
Linux benchmarks 5.4.17-2102.201.3.el8uek.aarch64 #2 SMP Fri Apr 23 09:42:46 PDT 2021 aarch64 aarch64 aarch64 GNU/Linux

Let's start by installing required packages:

dnf in clang git lld

Unfortunately the CMake version available in the packages repository (3.11.4) is too old to build the main branch of the LLVM Git repository, and Ninja is not available either.
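
As a rough sketch, a version check like the following can tell in advance whether a packaged CMake is recent enough. The 3.13.4 minimum is an assumption on my part (the post only says that 3.11.4 is too old), the helper name version_ge is made up for illustration, and the comparison relies on sort -V from GNU coreutils:

```shell
# Hypothetical helper: succeeds when version $1 is greater than or equal
# to version $2. Relies on `sort -V` (version sort) from GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# 3.13.4 is an assumed minimum, not a value taken from the post.
required="3.13.4"
packaged="3.11.4"

if version_ge "$packaged" "$required"; then
    echo "CMake $packaged is recent enough"
else
    echo "CMake $packaged is older than $required"
fi
```

The same helper could be fed the first line of cmake --version once a newer CMake is installed.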

Let's bootstrap Pkgsrc to build and install them:

git clone https://github.com/NetBSD/pkgsrc.git
cd pkgsrc/bootstrap
./bootstrap --make-jobs=160 --unprivileged

===> bootstrap started: Wed May 12 12:23:34 GMT 2021
===> bootstrap ended:   Wed May 12 12:26:08 GMT 2021

We then need to add ~/pkg/bin and ~/pkg/sbin to the PATH:

export PATH=$PATH:$HOME/pkg/bin:$HOME/pkg/sbin
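
If the export line ends up in a shell startup file, it will be appended again on every nested shell. A minimal sketch of an idempotent variant follows; path_append is a hypothetical helper name, not something Pkgsrc provides:

```shell
# Hypothetical helper: append a directory to PATH only if it is not
# already present, so sourcing this file twice does not duplicate entries.
path_append() {
    case ":$PATH:" in
        *":$1:"*) ;;               # already in PATH, nothing to do
        *) PATH="$PATH:$1" ;;
    esac
}

path_append "$HOME/pkg/bin"
path_append "$HOME/pkg/sbin"
export PATH
```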

For faster Pkgsrc builds, we can edit ~/pkg/etc/mk.conf and add:

MAKE_JOBS=              160

Let's build and install CMake and Ninja:

cd ~/pkgsrc/devel/cmake
bmake install package clean clean-depends

cd ~/pkgsrc/devel/ninja-build
bmake install package clean clean-depends

The compiler used for the builds is Clang 10.0.1:

clang --version
clang version 10.0.1 (Red Hat 10.0.1-1.0.1.module+el8.3.0+7827+89335dbf)
Target: aarch64-unknown-linux-gnu
Thread model: posix
InstalledDir: /bin

Regarding linkers, we are using GNU ld and GNU Gold from binutils 2.30, and LLD 10.0.1.

GNU ld version 2.30-79.0.1.el8
GNU gold (version 2.30-79.0.1.el8) 1.15
LLD 10.0.1 (compatible with GNU linkers)

For all the following runs, I'm building from the Git repository main branch commit cf4610d27bbb5c3a744374440e2fdf77caa12040. The build directory is of course fully erased between each run.

commit cf4610d27bbb5c3a744374440e2fdf77caa12040
Author: Victor Huang <wei.huang@ibm.com>
Date:   Wed May 12 10:56:54 2021 -0500
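
The exact commands used to erase the build directory between runs are not shown in this post. One possible sketch, with a hypothetical helper name and the build path assumed from the commands used later on:

```shell
# Hypothetical helper: remove a build directory and recreate it empty,
# so that no object files or CMake cache survive from the previous run.
fresh_build_dir() {
    rm -rf -- "$1" && mkdir -p -- "$1"
}

# Assumed usage, with the ramdisk layout described below:
# fresh_build_dir /mnt/ramdisk/llvm-project/build
```

Removing the whole directory, rather than running a clean target, also discards CMakeCache.txt, so options from a previous configuration cannot leak into the next one.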

I'm not sure what the underlying storage is, but with 1 TB of RAM there is no reason not to use a ramdisk.

mkdir /mnt/ramdisk
mount -t tmpfs -o size=32g tmpfs /mnt/ramdisk
cd /mnt/ramdisk

To get a baseline, let's do a full release build on this machine:

cd llvm-project
mkdir build
cd build

cmake   -DCMAKE_C_COMPILER=clang \
        -DCMAKE_CXX_COMPILER=clang++ \
        -DCMAKE_BUILD_TYPE=Release \
        -DLLVM_ENABLE_PROJECTS=clang \
        ../llvm

time make -j160
real    7m3.226s
user    403m28.362s
sys     6m41.331s

By default, CMake generates Makefiles. As documented in the "Getting Started with the LLVM System" tutorial, most LLVM developers use Ninja.

Let's switch to generating Ninja build files, and using ninja to build:

cmake   -DCMAKE_C_COMPILER=clang \
        -DCMAKE_CXX_COMPILER=clang++ \
        -DCMAKE_BUILD_TYPE=Release \
        -DLLVM_ENABLE_PROJECTS=clang \
        -GNinja ../llvm

time ninja
[4182/4182] Linking CXX executable bin/c-index-test

real    4m20.403s
user    427m27.118s
sys     7m2.320s
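
A quick way to read the two timing blocks so far is to divide CPU time (user + sys) by wall-clock time (real), which gives a rough estimate of how many cores were busy on average. This is only a sketch: the function names are made up for illustration, and the ratio says nothing about where the idle time comes from.

```shell
# Convert a duration as printed by `time` (e.g. "7m3.226s") to seconds.
to_seconds() {
    echo "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# Average number of busy cores: (user + sys) / real.
# Arguments: real, user, sys, each in the "XmY.ZZZs" format.
busy_cores() {
    awk -v r="$(to_seconds "$1")" \
        -v u="$(to_seconds "$2")" \
        -v s="$(to_seconds "$3")" \
        'BEGIN { printf "%.1f\n", (u + s) / r }'
}

echo "make:  $(busy_cores 7m3.226s 403m28.362s 6m41.331s)"
echo "ninja: $(busy_cores 4m20.403s 427m27.118s 7m2.320s)"
```

With the figures above, this comes out at about 58 cores for make -j160 and about 100 for ninja, which is consistent with the Ninja build keeping more of the 160 cores occupied.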

[Screenshot: htop]

By default, GNU ld is used for linking. Let's switch to using gold:

cmake   -DCMAKE_C_COMPILER=clang \
        -DCMAKE_CXX_COMPILER=clang++ \
        -DCMAKE_BUILD_TYPE=Release \
        -DLLVM_ENABLE_PROJECTS=clang \
        -DLLVM_USE_LINKER=gold \
        -GNinja ../llvm

time ninja
[4182/4182] Linking CXX executable bin/c-index-test

real    4m1.062s
user    427m1.648s
sys     6m58.282s

LLD has been a viable option for some years now. Let's use it:

cmake   -DCMAKE_C_COMPILER=clang \
        -DCMAKE_CXX_COMPILER=clang++ \
        -DCMAKE_BUILD_TYPE=Release \
        -DLLVM_ENABLE_PROJECTS=clang \
        -DLLVM_USE_LINKER=lld \
        -GNinja ../llvm

time ninja
[4182/4182] Linking CXX executable bin/clang-scan-deps

real    3m58.476s
user    428m3.807s
sys     7m14.418s

Using GNU gold instead of GNU ld results in noticeably faster builds (about 19 seconds here), and switching to LLD shaves a few more seconds off the build.

If we want to build faster, we can make some compromises and start stripping the build by removing some components.

Let's start by disabling additional architecture support:

cmake   -DCMAKE_C_COMPILER=clang \
        -DCMAKE_CXX_COMPILER=clang++ \
        -DCMAKE_BUILD_TYPE=Release \
        -DLLVM_ENABLE_PROJECTS=clang \
        -DLLVM_USE_LINKER=lld \
        -DLLVM_TARGETS_TO_BUILD="AArch64" \
        -GNinja ../llvm

time ninja
[3195/3195] Linking CXX executable bin/c-index-test

real    3m10.312s
user    326m54.898s
sys     5m24.770s

We can verify the resulting Clang binary only supports AArch64 targets:

bin/clang --print-targets
  Registered Targets:
    aarch64    - AArch64 (little endian)
    aarch64_32 - AArch64 (little endian ILP32)
    aarch64_be - AArch64 (big endian)
    arm64      - ARM64 (little endian)
    arm64_32   - ARM64 (little endian ILP32)
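
For scripted builds, the same check can be automated instead of read by eye. A minimal sketch, assuming the output format shown above and using a hypothetical function name:

```shell
# Hypothetical helper: read `clang --print-targets` output on stdin and
# succeed only if every registered target starts with "aarch64" or "arm64".
only_aarch64() {
    awk '
        /Registered Targets:/ { next }                  # skip the header
        NF > 0 && $1 !~ /^(aarch64|arm64)/ { bad = 1 }  # any other target
        END { exit bad + 0 }
    '
}

# Assumed usage from the build directory:
# bin/clang --print-targets | only_aarch64 && echo "AArch64 only"
```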

Let's go further and disable the static analyzer and the ARC Migration Tool:

cmake   -DCMAKE_C_COMPILER=clang \
        -DCMAKE_CXX_COMPILER=clang++ \
        -DCMAKE_BUILD_TYPE=Release \
        -DLLVM_ENABLE_PROJECTS=clang \
        -DLLVM_USE_LINKER=lld \
        -DLLVM_TARGETS_TO_BUILD="AArch64" \
        -DCLANG_ENABLE_STATIC_ANALYZER=OFF \
        -DCLANG_ENABLE_ARCMT=OFF \
        -GNinja ../llvm

time ninja
[3146/3146] Creating library symlink lib/libclang-cpp.so

real    3m6.474s
user    319m25.914s
sys     5m20.924s

Let's disable building some LLVM tools and utils:

cmake   -DCMAKE_C_COMPILER=clang \
        -DCMAKE_CXX_COMPILER=clang++ \
        -DCMAKE_BUILD_TYPE=Release \
        -DLLVM_ENABLE_PROJECTS=clang \
        -DLLVM_USE_LINKER=lld \
        -DLLVM_TARGETS_TO_BUILD="AArch64" \
        -DCLANG_ENABLE_STATIC_ANALYZER=OFF \
        -DCLANG_ENABLE_ARCMT=OFF \
        -DLLVM_BUILD_TOOLS=OFF \
        -DLLVM_BUILD_UTILS=OFF \
        -GNinja ../llvm

time ninja
[2879/2879] Creating library symlink lib/libclang-cpp.so

real    2m59.659s
user    298m47.482s
sys     4m57.430s

Compared to the previous build, the following binaries were not built: FileCheck, count, lli-child-target, llvm-jitlink-executor, llvm-PerfectShuffle, not, obj2yaml, yaml2obj, and yaml-bench.

We are reaching the end of our journey here. At this point, we are done stripping out things.

Let's disable optimizations and do a last run:

cmake   -DCMAKE_C_COMPILER=clang \
        -DCMAKE_CXX_COMPILER=clang++ \
        -DCMAKE_BUILD_TYPE=Release \
        -DLLVM_ENABLE_PROJECTS=clang \
        -DLLVM_USE_LINKER=lld \
        -DLLVM_TARGETS_TO_BUILD="AArch64" \
        -DCLANG_ENABLE_STATIC_ANALYZER=OFF \
        -DCLANG_ENABLE_ARCMT=OFF \
        -DLLVM_BUILD_TOOLS=OFF \
        -DLLVM_BUILD_UTILS=OFF \
        -DCMAKE_CXX_FLAGS_RELEASE="-O0" \
        -GNinja ../llvm

time ninja
[2879/2879] Linking CXX executable bin/c-index-test

real    2m37.003s
user    231m53.133s
sys     4m56.675s

So this is it: this machine can build a full LLVM/Clang release build in a bit less than four minutes, and a stripped-down build with optimizations disabled in two minutes. Two minutes. This is absolutely mind-blowing... The future is now!
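
To put all the runs side by side, the wall-clock times can be converted to seconds and compared with the first (Makefiles + GNU ld) build. The sketch below only re-uses the "real" figures reported in this post; the run labels and the summarize function are my own naming.

```shell
# Read "label real-time" pairs on stdin and print each run's duration in
# seconds, plus its speedup relative to the first line (the baseline).
summarize() {
    awk '
    {
        split($2, t, "[ms]")        # "7m3.226s" -> t[1]="7", t[2]="3.226"
        secs = t[1] * 60 + t[2]
        if (NR == 1) base = secs
        printf "%-18s %8.3fs %6.2fx\n", $1, secs, base / secs
    }'
}

summarize <<'EOF'
make-ld            7m3.226s
ninja-ld           4m20.403s
ninja-gold         4m1.062s
ninja-lld          3m58.476s
aarch64-only       3m10.312s
no-analyzer-arcmt  3m6.474s
no-tools-utils     2m59.659s
O0                 2m37.003s
EOF
```

The last column is simply the baseline time divided by the run's time, so the first row is 1.00x by construction.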
