Profile-guided optimization (PGO) is a well-known compiler optimization technique. In PGO, runtime profiles from a program's executions are used by the compiler to make better decisions about inlining and code layout. This leads to improved performance and reduced code size.
PGO can be deployed to your application or library with the following steps:

1. Identify a representative workload.
2. Collect profiles.
3. Use the profiles in a Release build.
Step 1: Identify a Representative Workload
First, identify a representative benchmark or workload for your application. This is a critical step: the profiles collected from the workload determine which regions of the code are hot and which are cold. When using the profiles, the compiler will perform aggressive optimizations and inlining in the hot regions. The compiler may also reduce the code size of cold regions at the cost of some performance.
Identifying a good workload is also useful for tracking performance in general.
Step 2: Collect Profiles
Profile collection involves three steps:

- building native code with instrumentation,
- running the instrumented app on the device and generating profiles, and
- merging/post-processing the profiles on the host.
Create Instrumented Build
The profiles are collected by running the workload from step 1 on an instrumented build of the application. To generate an instrumented build, add -fprofile-generate to the compiler and linker flags. This flag should be controlled by a separate build variable since it is not needed during a default build.
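For illustration, here is a minimal sketch of what the instrumented compile and link commands could look like when invoking the NDK's clang directly; the source/output file names and the API level in the target triple are placeholders, and in a real project the flag would instead be added to the build system's compiler and linker flags behind such a build variable.

# Sketch only: file names and the android30 API level are placeholders.
CLANG=$NDK/toolchains/llvm/prebuilt/linux-x86_64/bin/clang++

# Compile with instrumentation.
$CLANG --target=aarch64-linux-android30 -O2 -fprofile-generate \
    -c workload.cpp -o workload.o

# Link with -fprofile-generate as well so the profiling runtime gets linked in.
$CLANG --target=aarch64-linux-android30 -fprofile-generate \
    -shared workload.o -o libworkload.so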
Generate Profiles
Next, run the instrumented app on the device to generate profiles. Profiles are collected in memory while the instrumented binary runs and are written to a file at exit. However, functions registered with atexit are not called in an Android app; the app simply gets killed. The application/workload therefore has to do extra work to set a path for the profile file and then explicitly trigger a profile write.
- To set the profile file path, call __llvm_profile_set_filename(PROFILE_DIR "/default-%m.profraw"). The %m pattern is useful when there are multiple shared libraries: %m expands to a unique module signature for each library, resulting in a separate profile per library. See here for other useful pattern specifiers. PROFILE_DIR is a directory that is writable from the app; see the demo for detecting this directory at runtime.
- To explicitly trigger a profile write, call the __llvm_profile_write_file function.
extern "C" {
extern int __llvm_profile_set_filename(const char*);
extern int __llvm_profile_write_file(void);
}
#define PROFILE_DIR "<location-writable-from-app>"
void workload() {
// ...
// run workload
// ...
// set path and write profiles after workload execution
__llvm_profile_set_filename(PROFILE_DIR "/default-%m.profraw");
__llvm_profile_write_file();
return;
}
NB: Generating the profile file is simpler if the workload is a standalone binary: just set the LLVM_PROFILE_FILE environment variable to %t/default-%m.profraw before running the binary.
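For example, a standalone benchmark could be pushed to the device and run roughly as follows; the binary name and the /data/local/tmp paths are placeholders, and an explicit writable path is used here in place of %t (which expands to $TMPDIR).

# Sketch only: my_benchmark and the device paths are placeholders.
adb push my_benchmark /data/local/tmp/
adb shell chmod +x /data/local/tmp/my_benchmark
adb shell 'LLVM_PROFILE_FILE=/data/local/tmp/default-%m.profraw /data/local/tmp/my_benchmark'
adb shell ls /data/local/tmp/            # note the generated .profraw file name
adb pull /data/local/tmp/default-<module-signature>.profraw .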
Post-process Profiles
The profile files are in the .profraw format. They must first be fetched from
the device using adb pull
. After fetch, use the llvm-profdata
utility in
the NDK to convert from .profraw
to .profdata
, which can then be passed to the
compiler.
$NDK/toolchains/llvm/prebuilt/linux-x86_64/bin/llvm-profdata \
merge --output=pgo_profile.profdata \
<list-of-profraw-files>
Use the llvm-profdata and clang from the same NDK release to avoid a version mismatch in the profile file format.
Step 3: Use the Profiles to Build the Application
Use the profile from the previous step during a release build of your application by passing -fprofile-use=<>.profdata to the compiler and linker. The profiles can be used even as the code evolves; the Clang compiler can tolerate slight mismatches between the source and the profiles.
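As a rough sketch mirroring the instrumented build above (file names are placeholders; a build system would normally add these flags when the instrumentation variable is off):

# Sketch only: pass the merged profile to both the compile and link steps.
CLANG=$NDK/toolchains/llvm/prebuilt/linux-x86_64/bin/clang++

$CLANG --target=aarch64-linux-android30 -O2 -fprofile-use=pgo_profile.profdata \
    -c workload.cpp -o workload.o
$CLANG --target=aarch64-linux-android30 -fprofile-use=pgo_profile.profdata \
    -shared workload.o -o libworkload.so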
NB: In general, for most libraries, the profiles are common across architectures. For example, profiles generated from an arm64 build of the library can be used for all architectures. The caveat is that if there are architecture-specific code paths in the library (arm vs x86, or 32-bit vs 64-bit), separate profiles should be used for each such configuration.
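For instance, assuming the library has no architecture-specific code paths, the same merged profile could be reused when compiling for each ABI (reusing the CLANG variable from the sketch above; target triples and file names are placeholders):

# Sketch only: one merged profile shared across two ABIs.
$CLANG --target=aarch64-linux-android30 -O2 -fprofile-use=pgo_profile.profdata \
    -c workload.cpp -o arm64-v8a/workload.o
$CLANG --target=armv7a-linux-androideabi30 -O2 -fprofile-use=pgo_profile.profdata \
    -c workload.cpp -o armeabi-v7a/workload.o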
Putting it all together
https://github.com/DanAlbert/ndk-samples/tree/pgo/pgo shows an end-to-end demo of using PGO from an app. It provides additional details that are only touched on in this doc.
- The CMake build rules show how to set up a CMake variable that builds native code with instrumentation. When the build variable is not set, native code is optimized using previously generated PGO profiles.
- In an instrumented build, pgodemo.cpp writes the profiles after workload execution.
- A writable location for the profiles is obtained at runtime in MainActivity.kt using applicationContext.cacheDir.toString().
- To pull profiles from the device without requiring adb root, use the adb recipe here.