Toolchains adventures - Q4 2021

Frederic Cambus January 03, 2022 [LLVM] [Compilers] [Toolchains]

This is the third post in my toolchains adventures series. Please read the introduction and the Q3 2021 report if you want to get more context about this journey.

The fourth quarter of 2021 started out in the best possible way, as I was granted commit access to the LLVM project on October 1st.

During the first part of October, I committed a couple of micro-optimizations to several compiler drivers along with small improvements in various places, as highlighted in the commit list at the end of this post.

At the end of the month, I attended the OpenBSD h2k21 hackathon in Gouveia, Portugal.

During the hackathon, I spent some time building LLVM from our base system to take measurements and evaluate whether it would make sense to build our toolchain with ThinLTO optimizations enabled. While full LTO builds would be out of the question, as our developers regularly build snapshots of the base system (often on laptops), ThinLTO typically achieves a good compromise between optimization and resource usage.

Unfortunately, my experiment didn't prove conclusive, and I quickly grew tired of waiting hours between each run to check the results. I used LLVM 11.1.0 at the time, and retesting more recently with LLVM 13.0.0 on a 4-CPU virtual machine with 16 GB of RAM gave similar results. Running time make -j4 in /usr/src/gnu/usr.bin/clang after applying modifications to enable building with ThinLTO resulted in a 7.3% increase in build time. Then, using the newly built ThinLTO-optimized toolchain, I rebuilt an optimized LLVM again, and that build was only 1.1% faster than the previous run.
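The percentages above are relative changes in wall-clock build time. A minimal Python sketch of that calculation follows; the function name and the timings are placeholders chosen for illustration, not the values measured during these runs.

```python
def percent_change(baseline, measured):
    """Relative change of `measured` against `baseline`, in percent.

    A positive result means the measured build took longer.
    """
    return (measured - baseline) / baseline * 100.0


# Placeholder wall-clock times in seconds (hypothetical, not the real timings).
stock_build = 1000.0
thinlto_build = 1073.0

print(f"{percent_change(stock_build, thinlto_build):+.1f}%")
```

Feeding it the "real" field reported by time for each run would be one way to compare builds consistently.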

These preliminary benchmarks, which only measure build time, make me think there is little point in enabling ThinLTO alone at this time, and that it should be coupled with PGO (profile-guided optimization) to be worth considering.

For the record, I used the following diff to rebuild LLVM in base:

Index: gnu/usr.bin/clang/Makefile.inc
===================================================================
RCS file: /cvs/src/gnu/usr.bin/clang/Makefile.inc,v
retrieving revision 1.25
diff -u -p -r1.25 Makefile.inc
--- gnu/usr.bin/clang/Makefile.inc	21 Aug 2021 03:00:02 -0000	1.25
+++ gnu/usr.bin/clang/Makefile.inc	24 Oct 2021 16:18:15 -0000
@@ -46,6 +46,11 @@ CXXFLAGS+=	-fomit-frame-pointer
 NOPIE_FLAGS=	-fPIE
 .endif
 
+# ThinLTO
+.if ${MACHINE_ARCH} == "amd64"
+CXXFLAGS+=	-flto=thin
+.endif
+
 CPPFLAGS+=	-D__STDC_LIMIT_MACROS -D__STDC_CONSTANT_MACROS \
 		-D__STDC_FORMAT_MACROS

The next thing I did was check the usage of .gnu.warning.* sections in our C library. For an introduction to these sections, please refer to this article.

At the time of writing, libc functions for which we have .gnu.warning.* sections are:

.gnu.warning.strcpy:
	strcpy() is almost always misused, please use strlcpy()
.gnu.warning.stpcpy:
	stpcpy() is dangerous; do not use it
.gnu.warning.wcscat:
	wcscat() is almost always misused, please use wcslcat()
.gnu.warning.sprintf:
	sprintf() is often misused, please use snprintf()
.gnu.warning.tempnam:
	tempnam() possibly used unsafely; consider using mkstemp()
.gnu.warning.vsprintf:
	vsprintf() is often misused, please use vsnprintf()
.gnu.warning.mktemp:
	mktemp() possibly used unsafely; consider using mkstemp()
.gnu.warning.strcat:
	strcat() is almost always misused, please use strlcat()
.gnu.warning.wcscpy:
	wcscpy() is almost always misused, please use wcslcpy()
.gnu.warning.rand_r:
	rand_r() is not random, it is deterministic.
.gnu.warning.rand:
	rand() may return deterministic values, is that what you want?
.gnu.warning.getwd:
	getwd() possibly used unsafely; consider using getcwd()
.gnu.warning.random:
	random() may return deterministic values, is that what you want?
.gnu.warning.tmpnam:
	tmpnam() possibly used unsafely; consider using mkstemp()

Support for emitting linker warnings when using a symbol for which a .gnu.warning.symbol section exists is implemented in GNU linkers (ld and gold), but currently not in LLVM's LLD linker. Since we switched to LLD as the default linker on most OpenBSD architectures, those warnings are no longer emitted for a majority of users.

I thus sent a diff to remove mentions of ld warning messages from the mktemp(3), tmpnam(3), and tempnam(3) manual pages, but it was suggested that we should instead try to get LLD to support this feature. After discussing the matter with other developers during h2k21, this is indeed the consensus.

On the last day of the hackathon, I packaged elfcat, which is a neat ELF visualizer generating interactive HTML files from ELF binaries.

In November, I built gwcheck, a small tool to display .gnu.warning.* section names in ELF objects along with their content, in order to check which other projects use them. So far, aside from OpenBSD, it turned out that FreeBSD, NetBSD, and DragonFly all use these sections in their libc, and that glibc, Newlib, diet libc, and uClibc do as well. I then added a comment about my findings in the LLVM bug tracker issue about adding support in LLD to generate linker warnings when encountering them.
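To give a rough idea of what such a check involves, here is a minimal Python sketch that walks the section header table of an ELF image and collects the .gnu.warning.* sections. It is not gwcheck's code: the function name is hypothetical, and the sketch assumes a 64-bit little-endian ELF file, ignoring archives and extended section numbering.

```python
import struct


def gnu_warning_sections(data):
    """Return {section_name: message} for .gnu.warning.* sections.

    `data` holds the bytes of a 64-bit little-endian ELF file
    (an assumption of this sketch, checked below).
    """
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if data[4] != 2 or data[5] != 1:
        raise ValueError("sketch only handles ELFCLASS64 / ELFDATA2LSB")

    # ELF64 header: e_shoff lives at offset 0x28; e_shentsize, e_shnum
    # and e_shstrndx are the last three 16-bit fields, starting at 0x3A.
    (e_shoff,) = struct.unpack_from("<Q", data, 0x28)
    e_shentsize, e_shnum, e_shstrndx = struct.unpack_from("<HHH", data, 0x3A)

    def section(index):
        # Elf64_Shdr: sh_name, sh_type, sh_flags, sh_addr, sh_offset,
        # sh_size, sh_link, sh_info, sh_addralign, sh_entsize.
        fields = struct.unpack_from(
            "<IIQQQQIIQQ", data, e_shoff + index * e_shentsize
        )
        return fields[0], fields[4], fields[5]  # sh_name, sh_offset, sh_size

    # The section name string table is itself a section.
    _, str_off, str_size = section(e_shstrndx)
    strtab = data[str_off:str_off + str_size]

    found = {}
    for i in range(e_shnum):
        name_off, off, size = section(i)
        end = strtab.index(b"\0", name_off)
        name = strtab[name_off:end].decode()
        if name.startswith(".gnu.warning."):
            message = data[off:off + size].rstrip(b"\0")
            found[name] = message.decode(errors="replace")
    return found
```

Calling it on the bytes of a shared library or relocatable object and printing each name with its message would give output in the same spirit as the list shown earlier for our libc.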

Regarding LLVM itself, I added support in llvm-readobj for reading ELF core notes for both OpenBSD and NetBSD. Notes generated in those core files provide additional information about the kernel state and CPU registers. These notes are described in the core(5) manual pages for each of those operating systems. Here is a link to the OpenBSD version, and here is one for the NetBSD counterpart.
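For readers unfamiliar with the format, ELF notes are a sequence of records, each made of three 32-bit words (name size, descriptor size, type) followed by the name and the descriptor. The Python sketch below splits a raw note segment into such records. It is only an illustration and not the llvm-readobj implementation: the function name is hypothetical, and it assumes little-endian encoding with 4-byte padding of both name and descriptor.

```python
import struct


def parse_notes(blob):
    """Split raw PT_NOTE bytes into (name, type, desc) tuples.

    Assumes little-endian encoding and 4-byte padding (sketch only).
    """
    def align4(n):
        return (n + 3) & ~3

    notes = []
    pos = 0
    while pos + 12 <= len(blob):
        # Fixed-size note header: n_namesz, n_descsz, n_type.
        namesz, descsz, ntype = struct.unpack_from("<III", blob, pos)
        pos += 12
        # The name is NUL-terminated and padded to a 4-byte boundary.
        name = blob[pos:pos + namesz].rstrip(b"\0").decode(errors="replace")
        pos += align4(namesz)
        # The descriptor is opaque here; its layout depends on name and type.
        desc = blob[pos:pos + descsz]
        pos += align4(descsz)
        notes.append((name, ntype, desc))
    return notes
```

Interpreting each descriptor is where the operating system specifics come in, which is what the core(5) manual pages document.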

I do not have much to report in Pkgsrc land for this quarter; the only toolchain-related commit I had the chance to make was updating the mold linker to version 1.0.0.

That's all for now. I absolutely would like to continue exploring the topic, but I feel there is only so much I can do in my free time. Maybe I should start considering working in the field full-time?

LLVM commits:

2021-12-20  f6ba5c4  [llvm-readobj] Check ELFType value first when checking for OpenBSD notes
2021-11-29  878ff1f  [llvm-readobj] Add support for machine-independent NetBSD ELF core notes
2021-11-24  69deb13  [clang][scan-build] Use cc/c++ instead of gcc/g++ on FreeBSD
2021-11-02  6503117  [llvm-readobj] Add support for reading OpenBSD ELF core notes
2021-10-30  6ecd4a4  [clang][scan-build] Use uname -s to detect the operating system
2021-10-21  b471e25  [clang] Support __float128 on DragonFlyBSD
2021-10-21  9635b29  [docs] Fix broken link rendering in the LLVM Coding Standards
2021-10-16  4d7c7d8  [docs] Mention DragonFlyBSD as a supported platform for LLVM
2021-10-15  ecef035  [Driver][NetBSD] Use Triple reference instead of ToolChain.getTriple()
2021-10-14  8ecbcd0  [Driver][Darwin] Use T reference instead of getToolChain().getTriple()
2021-10-14  f7a3214  [Driver][WebAssembly] Use ToolChain reference instead of getToolChain()
2021-10-09  6417260  [Driver][OpenBSD] Use ToolChain reference instead of getToolChain()
2021-10-08  1f90b36  [Driver][NetBSD] Use ToolChain reference instead of getToolChain()
2021-10-06  f0ffff4  [CMake] Fix typo in error message for LLD in bootstrap builds
