LLVM was imported into the OpenBSD ports tree back in 2008, and happily lived there for a long while before being imported into the source tree at the g2k16 hackathon in 2016. I previously wrote about this in “The state of toolchains in OpenBSD” last year.

As mentioned in my previous article, we do not use the upstream build system to build LLVM in the base system, but hand-written BSD Makefiles instead. Importing CMake into the base system was not an option, because of the size of the project and the large dependency chain it requires for building. As a drawback, the build is slower than it could be if we were able to take advantage of a more modern build system.

Nowadays, Clang is the default compiler on the amd64, arm64, armv7, i386, macppc, octeon, powerpc64, and riscv64 platforms. It is also available in the sparc64 base system.

But then, why do we still need LLVM in the ports tree? As an aside, for those wondering why we need a compiler in the base system in the first place, Julio Merino wrote about this in his “Compilers in the (BSD) base system” post.

In the OpenBSD base system, we only build the LLVM backend needed for the architecture being built, so on amd64 and i386 we build LLVM’s X86 backend. The mapping we do between OpenBSD’s MACHINE_ARCH and LLVM_ARCH values can be found in gnu/usr.bin/clang/Makefile.arch.

Note that we also build the AMDGPU backend on platforms requiring it.
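
To make the idea of that mapping concrete, here is a minimal shell sketch of what an architecture-to-backend lookup looks like. It is not a copy of Makefile.arch: the function name llvm_backend_for is hypothetical, only the amd64 and i386 pairing comes from the text above, and the remaining pairings are assumptions based on LLVM's backend names.

```shell
# Illustrative only: NOT the contents of gnu/usr.bin/clang/Makefile.arch.
# llvm_backend_for is a hypothetical helper. Only amd64/i386 -> X86 is
# stated in the article; every other pairing below is an assumption.
llvm_backend_for() {
	case "$1" in
	amd64|i386)		echo X86 ;;
	aarch64)		echo AArch64 ;;	# assumption
	arm)			echo ARM ;;	# assumption
	mips64|mips64el)	echo Mips ;;	# assumption
	powerpc|powerpc64)	echo PowerPC ;;	# assumption
	riscv64)		echo RISCV ;;	# assumption
	sparc64)		echo Sparc ;;	# assumption
	*)
		# Unknown architecture: report on stderr and fail.
		echo "unknown MACHINE_ARCH: $1" >&2
		return 1
		;;
	esac
}

# Example: look up the backend for an amd64 machine.
llvm_backend_for amd64
```

The point of the sketch is only that each architecture resolves to exactly one backend, which is what keeps the base compiler small. The real Makefile.arch remains the authoritative reference.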

On an amd64 machine, the registered targets for the base compiler are:

$ clang --print-targets
  Registered Targets:
    amdgcn - AMD GCN GPUs
    r600   - AMD GPUs HD2XXX-HD6XXX
    x86    - 32-bit X86: Pentium-Pro and above
    x86-64 - 64-bit X86: EM64T and AMD64

And the ones for Clang installed from ports are:

$ clang-13 --print-targets
  Registered Targets:
    aarch64    - AArch64 (little endian)
    aarch64_32 - AArch64 (little endian ILP32)
    aarch64_be - AArch64 (big endian)
    amdgcn     - AMD GCN GPUs
    arm        - ARM
    arm64      - ARM64 (little endian)
    arm64_32   - ARM64 (little endian ILP32)
    armeb      - ARM (big endian)
    avr        - Atmel AVR Microcontroller
    bpf        - BPF (host endian)
    bpfeb      - BPF (big endian)
    bpfel      - BPF (little endian)
    hexagon    - Hexagon
    lanai      - Lanai
    mips       - MIPS (32-bit big endian)
    mips64     - MIPS (64-bit big endian)
    mips64el   - MIPS (64-bit little endian)
    mipsel     - MIPS (32-bit little endian)
    msp430     - MSP430 [experimental]
    nvptx      - NVIDIA PTX 32-bit
    nvptx64    - NVIDIA PTX 64-bit
    ppc32      - PowerPC 32
    ppc32le    - PowerPC 32 LE
    ppc64      - PowerPC 64
    ppc64le    - PowerPC 64 LE
    r600       - AMD GPUs HD2XXX-HD6XXX
    riscv32    - 32-bit RISC-V
    riscv64    - 64-bit RISC-V
    sparc      - Sparc
    sparcel    - Sparc LE
    sparcv9    - Sparc V9
    systemz    - SystemZ
    thumb      - Thumb
    thumbeb    - Thumb (big endian)
    wasm32     - WebAssembly 32-bit
    wasm64     - WebAssembly 64-bit
    x86        - 32-bit X86: Pentium-Pro and above
    x86-64     - 64-bit X86: EM64T and AMD64
    xcore      - XCore
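
For readers who want to compare the two listings without eyeballing them, the following is a small sketch using awk, sort and comm. The helper names target_names and extra_targets are hypothetical, and the sketch assumes the output format shown above (a "Registered Targets:" header followed by one target per line).

```shell
# Read `--print-targets` style output on stdin and print only the target
# names: skip blank lines and the "Registered Targets:" header, keep the
# first field, and sort in the C locale so comm(1) accepts the result.
target_names() {
	awk 'NF && $1 != "Registered" { print $1 }' | LC_ALL=C sort
}

# Print the targets found in the second saved listing but not in the first.
# Both arguments are files holding saved `--print-targets` output.
extra_targets() {
	a=$(mktemp) || return 1
	b=$(mktemp) || { rm -f "$a"; return 1; }
	target_names < "$1" > "$a"
	target_names < "$2" > "$b"
	# -13 suppresses lines unique to the first file and lines common to both.
	LC_ALL=C comm -13 "$a" "$b"
	rm -f "$a" "$b"
}
```

After saving the output of clang --print-targets to base.txt and the output of clang-13 --print-targets to ports.txt, running extra_targets base.txt ports.txt prints the backends that are only available in the port.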

The devel/llvm port is built using CMake and Ninja, resulting in more efficient builds. On top of building all available LLVM backends, we also build:

  • The Clang Static Analyzer and its companion tool scan-build
  • Clang utilities (clang-format and clang-* tools)
  • LLVM binary utilities (llvm-ar, llvm-as, llvm-objcopy, llvm-objdump, etc.)
  • Tools to process code coverage data (llvm-profdata and llvm-cov)
  • Various other tools such as llc, lli, llvm-mc, llvm-mca, etc.

So in essence, we try to keep the base system LLVM somewhat minimal, and build additional features and tooling in the port version. This solution has worked well for us so far.

One last thing to note: we only build one version of LLVM in ports, which is kept in sync with the base version, so we do not ship packages for older (or newer) versions of LLVM.