Speedbuilding LLVM/Clang in 3 minutes on Power10

Frederic Cambus March 28, 2024 [LLVM] [Compilers] [Toolchains]

This post is the Power10 counterpart of my "Speedbuilding LLVM/Clang in 5 minutes" and "Speedbuilding LLVM/Clang in 2 minutes on ARM" articles.

The system I'm using for this experiment is an IBM POWER10 9043-MRX (E1050) server with a total of 24 cores and 192 threads, and 2 TB of RAM.

The system is running AlmaLinux 9.3 with up-to-date packages and kernel.

The full result of cat /proc/cpuinfo is available here.

uname -a
Linux benchmarks 5.14.0-284.11.1.el9_2.ppc64le #1 SMP Tue May 9 09:51:51 UTC 2023 ppc64le ppc64le ppc64le GNU/Linux

The compiler used for the builds is Clang 16.0.6:

clang --version
clang version 16.0.6 (Red Hat 16.0.6-1.el9)
Target: ppc64le-redhat-linux-gnu
Thread model: posix
InstalledDir: /usr/bin

Regarding linkers, we are using GNU ld and GNU gold from binutils 2.35, LLD 16.0.6, and mold 2.30.0.

GNU ld version 2.35.2-42.el9_3.1
GNU gold (version 2.35.2-42.el9_3.1) 1.16
LLD 16.0.6 (compatible with GNU linkers)
mold 2.30.0 (compatible with GNU ld)
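Version banners like the ones above can be collected with a small loop. This is a minimal sketch: the executable names (`ld.bfd`, `ld.gold`, `ld.lld`, `mold`) are the ones these linkers are commonly installed under and are an assumption about the system, and `linker_versions` is a hypothetical helper name.

```shell
# Hypothetical helper: print the first line of each linker's version banner.
# The executable names below are assumptions; adjust them to your system.
linker_versions() {
    for linker in ld.bfd ld.gold ld.lld mold; do
        if command -v "$linker" >/dev/null 2>&1; then
            # Keep only the first line, which carries the version string.
            "$linker" --version | sed -n '1p'
        else
            echo "$linker: not found"
        fi
    done
}

linker_versions
```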

For all the following runs, I'm building commit d7975c9d93fb4a69c0bd79d7d5b3f6be77a25c73 from the main branch of the Git repository. The build directory is of course fully erased between runs.

commit d7975c9d93fb4a69c0bd79d7d5b3f6be77a25c73
Author: Alexey Bataev <a.bataev@outlook.com>
Date:   Thu Mar 28 10:35:15 2024 -0400
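Erasing the build directory between runs is easy to script. The helper below is only a sketch of one way to do it: the function name `fresh_build_dir` and the default `build` path are illustrative choices of mine.

```shell
# Hypothetical helper: recreate an empty build directory before each run.
fresh_build_dir() {
    dir="${1:-build}"
    # Refuse obviously dangerous targets before calling rm -rf.
    case "$dir" in
        "/"|"."|"..") echo "refusing to erase '$dir'" >&2; return 1 ;;
    esac
    # Remove any previous contents, then create the directory again.
    rm -rf -- "$dir" && mkdir -p -- "$dir"
}
```

It would be called from the llvm-project checkout as `fresh_build_dir build && cd build` before each cmake invocation.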

To get a baseline, let's do a full release build on this machine:

cd llvm-project
mkdir build
cd build

cmake   -DCMAKE_C_COMPILER=clang \
        -DCMAKE_CXX_COMPILER=clang++ \
        -DCMAKE_BUILD_TYPE=Release \
        -DLLVM_ENABLE_PROJECTS=clang \
        ../llvm

time make -j192
real    6m24.963s
user    558m22.641s
sys     6m36.038s
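The -j192 value matches the 192 hardware threads of this server. On another machine, the job count can be derived rather than hard-coded. A minimal sketch, assuming `nproc` from GNU coreutils is available; the `njobs` variable name is mine, and the command is only printed here, not executed:

```shell
# Derive the parallel job count from the number of available hardware threads.
njobs="$(nproc)"

# Print the resulting make invocation instead of running it.
echo "make -j${njobs}"
```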

By default, CMake generates Makefiles. As documented in the "Getting Started with the LLVM System" tutorial, most LLVM developers use Ninja.

Let's switch to generating Ninja build files, and using ninja to build:

cmake   -DCMAKE_C_COMPILER=clang \
        -DCMAKE_CXX_COMPILER=clang++ \
        -DCMAKE_BUILD_TYPE=Release \
        -DLLVM_ENABLE_PROJECTS=clang \
        -GNinja ../llvm

time ninja
[4996/4996] Linking CXX executable bin/c-index-test

real    4m18.966s
user    646m50.702s
sys     7m4.562s

[Image: htop output during the build]

By default, GNU ld is used for linking. Let's switch to using gold:

cmake   -DCMAKE_C_COMPILER=clang \
        -DCMAKE_CXX_COMPILER=clang++ \
        -DCMAKE_BUILD_TYPE=Release \
        -DLLVM_ENABLE_PROJECTS=clang \
        -DLLVM_USE_LINKER=gold \
        -GNinja ../llvm

time ninja
[4996/4996] Linking CXX executable bin/c-index-test

real    4m16.043s
user    644m52.475s
sys     6m22.136s

LLD has been a viable option for some years now. Let's use it:

cmake   -DCMAKE_C_COMPILER=clang \
        -DCMAKE_CXX_COMPILER=clang++ \
        -DCMAKE_BUILD_TYPE=Release \
        -DLLVM_ENABLE_PROJECTS=clang \
        -DLLVM_USE_LINKER=lld \
        -GNinja ../llvm

time ninja
[4996/4996] Linking CXX executable bin/c-index-test

real    4m6.797s
user    644m10.316s
sys     7m19.764s

Since I wrote the previous posts in the series, mold has reached maturity and gained PowerPC support. Let's try it:

cmake   -DCMAKE_C_COMPILER=clang \
        -DCMAKE_CXX_COMPILER=clang++ \
        -DCMAKE_BUILD_TYPE=Release \
        -DLLVM_ENABLE_PROJECTS=clang \
        -DLLVM_USE_LINKER=mold \
        -GNinja ../llvm

time ninja
[4996/4996] Linking CXX executable bin/c-index-test

real    4m4.206s
user    642m24.880s
sys     6m23.151s

Using GNU gold instead of GNU ld results in slightly faster builds, and switching to LLD and then mold shaves a few more seconds off the build. For the remainder of the article, I will stick to using LLD as the linker.
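To put the linker differences in perspective, the wall-clock times above can be converted to seconds and compared against the Ninja + GNU ld run. This is a small sketch using POSIX shell and awk; the `to_seconds` helper is a name I made up for illustration.

```shell
# Convert a `time`-style duration such as 4m18.966s into seconds.
to_seconds() {
    echo "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# Wall-clock ("real") times reported in the runs above.
baseline="$(to_seconds 4m18.966s)"   # Ninja + GNU ld
for run in gold=4m16.043s lld=4m6.797s mold=4m4.206s; do
    name="${run%%=*}"
    secs="$(to_seconds "${run#*=}")"
    # Share of wall-clock time saved relative to the GNU ld run.
    awk -v n="$name" -v s="$secs" -v b="$baseline" \
        'BEGIN { printf "%-5s %8.3fs  %.1f%% less time than GNU ld\n", n, s, (b - s) / b * 100 }'
done
```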

If we want to build faster, we can make some compromises and start stripping the build by removing some components.

Let's start by disabling additional architecture support:

cmake   -DCMAKE_C_COMPILER=clang \
        -DCMAKE_CXX_COMPILER=clang++ \
        -DCMAKE_BUILD_TYPE=Release \
        -DLLVM_ENABLE_PROJECTS=clang \
        -DLLVM_USE_LINKER=lld \
        -DLLVM_TARGETS_TO_BUILD="PowerPC" \
        -GNinja ../llvm

time ninja
[3787/3787] Linking CXX executable bin/c-index-test

real    3m17.436s
user    476m8.062s
sys     4m57.820s

We can verify the resulting Clang binary only supports PowerPC targets:

bin/clang --print-targets

  Registered Targets:
    ppc32   - PowerPC 32
    ppc32le - PowerPC 32 LE
    ppc64   - PowerPC 64
    ppc64le - PowerPC 64 LE

Let's go further and disable the static analyzer and the ARC Migration Tool:

cmake   -DCMAKE_C_COMPILER=clang \
        -DCMAKE_CXX_COMPILER=clang++ \
        -DCMAKE_BUILD_TYPE=Release \
        -DLLVM_ENABLE_PROJECTS=clang \
        -DLLVM_USE_LINKER=lld \
        -DLLVM_TARGETS_TO_BUILD="PowerPC" \
        -DCLANG_ENABLE_STATIC_ANALYZER=OFF \
        -DCLANG_ENABLE_ARCMT=OFF \
        -GNinja ../llvm

time ninja
[3717/3717] Linking CXX executable bin/c-index-test

real    3m16.444s
user    462m24.103s
sys     4m48.255s

Let's disable building some LLVM tools and utils:

cmake   -DCMAKE_C_COMPILER=clang \
        -DCMAKE_CXX_COMPILER=clang++ \
        -DCMAKE_BUILD_TYPE=Release \
        -DLLVM_ENABLE_PROJECTS=clang \
        -DLLVM_USE_LINKER=lld \
        -DLLVM_TARGETS_TO_BUILD="PowerPC" \
        -DCLANG_ENABLE_STATIC_ANALYZER=OFF \
        -DCLANG_ENABLE_ARCMT=OFF \
        -DLLVM_BUILD_TOOLS=OFF \
        -DLLVM_BUILD_UTILS=OFF \
        -GNinja ../llvm

time ninja
[3324/3324] Linking CXX executable bin/c-index-test

real    3m11.458s
user    429m11.170s
sys     4m15.618s

We are reaching the end of our journey here. At this point, we are done stripping out things.

Unlike the previous builds done in 2021 on x86 and ARM, disabling optimizations by building with the "-O0" flag consistently results in slower build times on this server.
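For anyone wanting to repeat the "-O0" experiment, here is one possible way to configure it. This is an assumption on my part and not necessarily the exact invocation behind the result above: it overrides the standard CMake variables `CMAKE_C_FLAGS_RELEASE` and `CMAKE_CXX_FLAGS_RELEASE`, and `configure_o0` is a hypothetical wrapper name.

```shell
# Hypothetical wrapper: the stripped-down configuration from above, with the
# Release optimization flags replaced by -O0. This is an assumed approach.
# -DNDEBUG is kept because CMake's default Release flags include it.
configure_o0() {
    cmake   -DCMAKE_C_COMPILER=clang \
            -DCMAKE_CXX_COMPILER=clang++ \
            -DCMAKE_BUILD_TYPE=Release \
            -DCMAKE_C_FLAGS_RELEASE="-O0 -DNDEBUG" \
            -DCMAKE_CXX_FLAGS_RELEASE="-O0 -DNDEBUG" \
            -DLLVM_ENABLE_PROJECTS=clang \
            -DLLVM_USE_LINKER=lld \
            -DLLVM_TARGETS_TO_BUILD="PowerPC" \
            -DCLANG_ENABLE_STATIC_ANALYZER=OFF \
            -DCLANG_ENABLE_ARCMT=OFF \
            -DLLVM_BUILD_TOOLS=OFF \
            -DLLVM_BUILD_UTILS=OFF \
            -GNinja "${1:-../llvm}"
}
```

Running `configure_o0` from an empty build directory, followed by `time ninja`, would then give a number to compare against the optimized builds.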

So this is it: this machine can build a full LLVM/Clang release build in a bit more than four minutes, and a stripped-down build in just over three minutes.
