Frederic Cambus


Speedbuilding LLVM/Clang in 5 minutes

Toolchains

This post is a spiritual successor to my "Building LLVM on OpenBSD/loongson" article, in which I retraced my attempts to build LLVM 3.7.1 on MIPS64 in a RAM-constrained environment.

After reading the excellent "Make LLVM fast again", I wanted to revisit the topic and see how fast I could build a recent version of LLVM and Clang on modern x86 server hardware.

The system I'm using for this experiment is a CCX62 instance from Hetzner, which has 48 dedicated vCPUs and 192 GB of RAM. This is the fastest machine available in their cloud offering at the moment.

The system is running Fedora 34 with up-to-date packages and kernel.

The full result of cat /proc/cpuinfo is available here.

uname -a
Linux benchmarks 5.11.18-300.fc34.x86_64 #1 SMP Mon May 3 15:10:32 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

Let's start by installing the required packages:

dnf in clang cmake git lld ninja-build

The compiler used for the builds is Clang 12.0.0, as packaged by Fedora (a 12.0.0-rc1 build):

clang --version
clang version 12.0.0 (Fedora 12.0.0-0.3.rc1.fc34)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /usr/bin

Regarding linkers, we are using GNU ld and GNU gold from binutils 2.35.1, and LLD 12.0.0:

GNU ld version 2.35.1-41.fc34
GNU gold (version 2.35.1-41.fc34) 1.16
LLD 12.0.0 (compatible with GNU linkers)

For all the following runs, I'm building from commit 831cf15ca6892e2044447f8dc516d76b8a827f1e on the main branch of the Git repository. The build directory is, of course, fully erased between runs.
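A small shell helper can reset the build directory so that every timed run starts from the same state. This is a sketch only: the function name fresh_build is hypothetical, and the commented usage assumes an existing llvm-project clone:

```shell
#!/usr/bin/env bash
# Hypothetical helper: remove and recreate a build directory so that no CMake
# cache or object files carry over from one timed run to the next.
fresh_build() {
    local dir="${1:-build}"    # defaults to ./build, as used in this post
    rm -rf -- "$dir"
    mkdir -p -- "$dir"
}

# Assumed usage, from the root of an existing llvm-project clone:
#   git checkout 831cf15ca6892e2044447f8dc516d76b8a827f1e
#   fresh_build build && cd build
```

Calling it before each cmake invocation is equivalent to the manual erase described above.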

commit 831cf15ca6892e2044447f8dc516d76b8a827f1e
Author: David Spickett <david.spickett@linaro.org>
Date:   Wed May 5 11:49:35 2021 +0100

To get a baseline, let's do a full release build on this machine:

cd llvm-project
mkdir build
cd build

cmake   -DCMAKE_C_COMPILER=clang \
        -DCMAKE_CXX_COMPILER=clang++ \
        -DCMAKE_BUILD_TYPE=Release \
        -DLLVM_ENABLE_PROJECTS=clang \
        ../llvm

time make -j48
real    11m19.852s
user    436m30.619s
sys     12m5.724s
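The user time is summed across all cores. Dividing it by the real (wall-clock) time therefore gives a rough idea of how many cores were busy on average. Below is a minimal sketch, assuming bash and awk and the XmY.YYYs format printed by bash's time keyword; the function names are hypothetical:

```shell
#!/usr/bin/env bash
# Convert a bash `time` value such as "11m19.852s" into seconds.
to_seconds() {
    awk -v t="$1" 'BEGIN {
        split(t, parts, "m")        # parts[1] = minutes, parts[2] = "19.852s"
        sub(/s$/, "", parts[2])     # drop the trailing "s"
        printf "%.3f\n", parts[1] * 60 + parts[2]
    }'
}

# Average number of busy cores: user time divided by wall-clock time.
# Usage: avg_parallelism REAL USER
avg_parallelism() {
    local real user
    real=$(to_seconds "$1")
    user=$(to_seconds "$2")
    awk -v r="$real" -v u="$user" 'BEGIN { printf "%.1f\n", u / r }'
}

avg_parallelism 11m19.852s 436m30.619s
```

For the baseline, this prints 38.5. On average, roughly 38 of the 48 vCPUs were busy running user code.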

By default, CMake generates Makefiles. As documented in the "Getting Started with the LLVM System" tutorial, most LLVM developers use Ninja.

Let's switch to generating Ninja build files, and using ninja to build:

cmake   -DCMAKE_C_COMPILER=clang \
        -DCMAKE_CXX_COMPILER=clang++ \
        -DCMAKE_BUILD_TYPE=Release \
        -DLLVM_ENABLE_PROJECTS=clang \
        -GNinja ../llvm

time ninja
[4182/4182] Generating ../../bin/llvm-readelf

real    10m13.755s
user    452m16.034s
sys     12m7.584s
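Note that ninja was invoked without -j. When no job count is given, ninja derives one from the number of available CPUs, and ninja -h prints the default it would use on the current system. The function below mirrors that heuristic as it appears in ninja's source: 2 jobs for 0 or 1 CPU, 3 for 2 CPUs, otherwise CPUs + 2. Treat it as an assumption rather than a reference implementation. The function name is hypothetical:

```shell
#!/usr/bin/env bash
# Sketch of ninja's default job count (assumed heuristic, not ninja's code):
#   0-1 CPUs -> 2 jobs, 2 CPUs -> 3 jobs, otherwise CPUs + 2.
ninja_default_jobs() {
    local cpus="${1:-$(nproc)}"
    if [ "$cpus" -le 1 ]; then
        echo 2
    elif [ "$cpus" -eq 2 ]; then
        echo 3
    else
        echo $((cpus + 2))
    fi
}

ninja_default_jobs 48
```

On this 48 vCPU instance, that would mean 50 parallel jobs, slightly more than in the make -j48 baseline.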

[Screenshot: htop during the build]

By default, GNU ld is used for linking. Let's switch to using gold:

cmake   -DCMAKE_C_COMPILER=clang \
        -DCMAKE_CXX_COMPILER=clang++ \
        -DCMAKE_BUILD_TYPE=Release \
        -DLLVM_ENABLE_PROJECTS=clang \
        -DLLVM_USE_LINKER=gold \
        -GNinja ../llvm

time ninja
[4182/4182] Generating ../../bin/llvm-readelf

real    10m13.405s
user    451m35.029s
sys     11m57.649s

LLD has been a viable option for some years now. Let's use it:

cmake   -DCMAKE_C_COMPILER=clang \
        -DCMAKE_CXX_COMPILER=clang++ \
        -DCMAKE_BUILD_TYPE=Release \
        -DLLVM_ENABLE_PROJECTS=clang \
        -DLLVM_USE_LINKER=lld \
        -GNinja ../llvm

time ninja
[4182/4182] Generating ../../bin/llvm-readelf

real    10m12.710s
user    451m12.444s
sys     12m12.634s

During tests on smaller build machines, I had observed that using GNU gold or LLD instead of GNU ld resulted in noticeably faster builds. This doesn't seem to be the case on this machine: the three runs finished within about a second of each other (10m13.755s with GNU ld, 10m13.405s with gold, and 10m12.710s with LLD). LLD comes out slightly ahead, but by a margin that is likely within run-to-run noise.
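Since the three linker runs differ only in LLVM_USE_LINKER, they can be scripted as a loop. This is a sketch with several assumptions:

- The function name linker_sweep and the DRY_RUN variable are hypothetical.
- Each configuration gets its own build-<linker> directory instead of reusing a single build directory.
- It is run from the root of the llvm-project checkout.

```shell
#!/usr/bin/env bash
# Hypothetical helper: configure and build once per linker, each in its own
# directory. With DRY_RUN=1 the commands are printed instead of executed.
linker_sweep() {
    local linker dir args
    for linker in default gold lld; do
        dir="build-$linker"
        args=(-S llvm -B "$dir" -GNinja
              -DCMAKE_C_COMPILER=clang
              -DCMAKE_CXX_COMPILER=clang++
              -DCMAKE_BUILD_TYPE=Release
              -DLLVM_ENABLE_PROJECTS=clang)
        # "default" leaves LLVM_USE_LINKER unset, so the toolchain default
        # (GNU ld on this system) is used.
        if [ "$linker" != default ]; then
            args+=("-DLLVM_USE_LINKER=$linker")
        fi
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "cmake ${args[*]}"
            echo "ninja -C $dir"
        else
            rm -rf -- "$dir"
            cmake "${args[@]}" && time ninja -C "$dir"
        fi
    done
}
```

Running DRY_RUN=1 linker_sweep first is a cheap way to check the generated commands before starting three full builds.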

If we want to build faster, we can make some compromises and start stripping the build by removing some components.

Let's start by building only the X86 backend, disabling support for all other target architectures:

cmake   -DCMAKE_C_COMPILER=clang \
        -DCMAKE_CXX_COMPILER=clang++ \
        -DCMAKE_BUILD_TYPE=Release \
        -DLLVM_ENABLE_PROJECTS=clang \
        -DLLVM_USE_LINKER=lld \
        -DLLVM_TARGETS_TO_BUILD="X86" \
        -GNinja ../llvm

time ninja
[3196/3196] Generating ../../bin/llvm-readelf

real    7m55.531s
user    344m56.462s
sys     8m53.970s

We can verify that the resulting Clang binary only supports x86 targets:

bin/clang --print-targets
  Registered Targets:
    x86    - 32-bit X86: Pentium-Pro and above
    x86-64 - 64-bit X86: EM64T and AMD64
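For scripted runs, the same check can be automated by parsing the --print-targets output. This is a sketch: the function name is hypothetical, and it assumes the output format shown above, with a "Registered Targets:" header followed by one indented "name - description" line per target:

```shell
#!/usr/bin/env bash
# Read `clang --print-targets` output on stdin and succeed only if at least
# one target is listed and every registered target name starts with "x86".
only_x86_targets() {
    awk '
        /Registered Targets:/ { in_list = 1; next }
        in_list && NF > 0 {
            count++
            if ($1 !~ /^x86/) bad++
        }
        END { if (count > 0 && bad == 0) exit 0; exit 1 }
    '
}

# Usage: bin/clang --print-targets | only_x86_targets && echo "x86 only"
```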

Let's go further and disable the static analyzer and the ARC Migration Tool:

cmake   -DCMAKE_C_COMPILER=clang \
        -DCMAKE_CXX_COMPILER=clang++ \
        -DCMAKE_BUILD_TYPE=Release \
        -DLLVM_ENABLE_PROJECTS=clang \
        -DLLVM_USE_LINKER=lld \
        -DLLVM_TARGETS_TO_BUILD="X86" \
        -DCLANG_ENABLE_STATIC_ANALYZER=OFF \
        -DCLANG_ENABLE_ARCMT=OFF \
        -GNinja ../llvm

time ninja
[3147/3147] Generating ../../bin/llvm-readelf

real    7m42.299s
user    334m47.916s
sys     8m44.704s
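To confirm which options actually ended up in a configured build directory, one can look them up in CMakeCache.txt, where CMake stores cache entries as NAME:TYPE=VALUE lines. This is a sketch with a hypothetical function name:

```shell
#!/usr/bin/env bash
# Print the value of a CMake cache entry.
# Usage: cache_value NAME [path/to/CMakeCache.txt]
cache_value() {
    local name="$1" cache="${2:-CMakeCache.txt}"
    awk -v key="$name" '
        index($0, key ":") == 1 {     # line starts with "NAME:"
            sub(/^[^=]*=/, "")        # keep everything after the first "="
            print
            exit
        }
    ' "$cache"
}

# Assumed usage, from the build directory:
#   cache_value CLANG_ENABLE_STATIC_ANALYZER
#   cache_value CLANG_ENABLE_ARCMT
```

An empty result means the option is not in the cache at all, which usually points to a typo in the -D flag.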

Let's disable building some LLVM tools and utils:

cmake   -DCMAKE_C_COMPILER=clang \
        -DCMAKE_CXX_COMPILER=clang++ \
        -DCMAKE_BUILD_TYPE=Release \
        -DLLVM_ENABLE_PROJECTS=clang \
        -DLLVM_USE_LINKER=lld \
        -DLLVM_TARGETS_TO_BUILD="X86" \
        -DCLANG_ENABLE_STATIC_ANALYZER=OFF \
        -DCLANG_ENABLE_ARCMT=OFF \
        -DLLVM_BUILD_TOOLS=OFF \
        -DLLVM_BUILD_UTILS=OFF \
        -GNinja ../llvm

time ninja
[2880/2880] Generating ../../bin/llvm-readelf

real    7m21.016s
user    315m42.127s
sys     8m9.377s

Compared to the previous build, the following binaries were not built: FileCheck, count, lli-child-target, llvm-jitlink-executor, llvm-PerfectShuffle, not, obj2yaml, yaml2obj, and yaml-bench.
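A list like this can be obtained by keeping the previous build directory around and comparing the contents of both bin/ directories. This is a sketch: the function name is hypothetical, and build-prev and build-next are placeholder directory names:

```shell
#!/usr/bin/env bash
# List files present in the first bin/ directory but missing from the second.
# Both listings are sorted with the C locale, as comm expects.
# Usage: missing_binaries build-prev/bin build-next/bin
missing_binaries() {
    LC_ALL=C comm -23 <(ls -1 "$1" | LC_ALL=C sort) \
                      <(ls -1 "$2" | LC_ALL=C sort)
}
```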

We are reaching the end of our journey here: at this point, we are done stripping things out.

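Finally, let's disable optimizations and do one last run. CMAKE_CXX_FLAGS_RELEASE only applies to C++ sources, which make up the bulk of LLVM and Clang. The resulting binaries are unoptimized, so the compiler built this way will be much slower: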

cmake   -DCMAKE_C_COMPILER=clang \
        -DCMAKE_CXX_COMPILER=clang++ \
        -DCMAKE_BUILD_TYPE=Release \
        -DLLVM_ENABLE_PROJECTS=clang \
        -DLLVM_USE_LINKER=lld \
        -DLLVM_TARGETS_TO_BUILD="X86" \
        -DCLANG_ENABLE_STATIC_ANALYZER=OFF \
        -DCLANG_ENABLE_ARCMT=OFF \
        -DLLVM_BUILD_TOOLS=OFF \
        -DLLVM_BUILD_UTILS=OFF \
        -DCMAKE_CXX_FLAGS_RELEASE="-O0" \
        -GNinja ../llvm

time ninja
[2880/2880] Linking CXX executable bin/c-index-test

real    5m37.225s
user    253m18.515s
sys     9m2.413s
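To put the final result in perspective, the script below tabulates the real (wall-clock) times reported above against the make -j48 baseline. It uses only the timings from this post, and the labels are shorthand for each configuration:

```shell
#!/usr/bin/env bash
# Wall-clock ("real") times of the runs above, in order; the first line is
# the baseline that speedups are computed against.
summarize_runs() {
    awk -F'|' '
        {
            split($2, p, "m"); sub(/s$/, "", p[2])   # "11m19.852s" -> 11, 19.852
            secs = p[1] * 60 + p[2]
            if (NR == 1) base = secs
            printf "%-26s %8.1f s  %4.2fx\n", $1, secs, base / secs
        }' <<'EOF'
make -j48 (baseline)|11m19.852s
ninja|10m13.755s
ninja + gold|10m13.405s
ninja + lld|10m12.710s
X86 target only|7m55.531s
no static analyzer/ARCMT|7m42.299s
no tools/utils|7m21.016s
-O0|5m37.225s
EOF
}

summarize_runs
```

With these numbers, the last configuration builds about twice as fast as the baseline (2.02x).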

That's it. Five minutes. Don't try this at home :-)