15 Jun 2010 conrad   » (Journeyer)

Speeding up cross-compiling with ccache and distcc on Debian

The conventional way of doing embedded development is to cross-compile everything and then copy it onto the target, but working natively allows you to use "normal" tools and workflows. We want to issue commands directly to a shell on the development board or phone prototype, and speed up the compilation step by distributing it to a faster machine such as your workstation. This isn't the usual way to do things, but I like working this way, and here's how to make it work faster.

This article explains how to configure a Debian PC host and a Debian target system so that compilation done on the target invokes the cross-compiler on the host. The advantage of this approach is faster compile times. Note that it does not speed up other parts of the build, such as source configuration (which can be slow for packages using GNU autotools), linking or installation.

We assume that a full Debian system is available for development on the target: packages can be built natively using gcc and a full toolchain (binutils, ld etc.), and tools such as automake, autoconf, libtool, version control systems etc. are available.

The setup we work with uses Debian on both the host PC and the target. The examples use debian-sh4 on the target, with the sh4-linux-gnu-gcc cross-compiler installed on the build host. For other target architectures, simply replace every instance of sh4-linux-gnu- with the appropriate prefix, e.g. arm-linux-gnueabi-.

In this article, commands executed natively on the target device use the prompt target$ (or target# for commands that must be run as root), and commands executed on the x86 build host use the prompt host$ (or host# as root).

The first step is to ensure you can build software natively on the target. For a single C file:

target$ gcc hello.c -o hello

and for autotools projects:

target$ ./configure
target$ make

ccache

Next, install ccache:

  
target# apt-get install ccache

ccache keeps a cache of compiled object files, so that the same compilation does not need to be repeated. The cache lives outside your source tree, so it persists across invocations of 'make clean'. ccache compares the preprocessed source (along with the compiler and its options), so a source file is recompiled only when it or any of the headers it includes has changed. The usual way to use ccache is simply to set your C compiler to "ccache gcc":

  
target$ ccache gcc hello.c -o hello

and for autotools projects:

target$ CC="ccache gcc" ./configure
target$ make

Debian's ccache package also installs wrappers named after the compilers in /usr/lib/ccache, so if you put that directory ahead of /usr/bin in your PATH, ccache is used for native builds whenever gcc is invoked. That is worth setting up, but it is not necessary for this setup with distcc.

An aside about compiler naming

Before we move on to cross-compiling, it's important to realize that the native compiler is also available under its full architecture-prefixed name:

  
target$ ls -l /usr/bin/sh4-linux-gnu-gcc
lrwxrwxrwx 1 root root 7 Mar 17 01:45 /usr/bin/sh4-linux-gnu-gcc -> gcc-4.4

The binary called sh4-linux-gnu-gcc does the same thing on both the host and target: you can simply think of it as a program that takes in a C file and produces an sh4 binary:


                +-------------------+ 
    C source -> | sh4-linux-gnu-gcc | -> sh4 binary
                +-------------------+ 

The distinction between "native" and "cross-" compiling is then just a matter of what machine you are running this compiler program on. If you run sh4-linux-gnu-gcc on an x86 machine, you are cross-compiling, but if you run sh4-linux-gnu-gcc on an sh4 machine then you are just compiling. Of course the compiler binaries are different; the point is that a shell script which calls the compiler by its full name would work without modification on either machine.
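One way to see this for yourself is GCC's -dumpmachine option, which prints the target triplet a compiler generates code for. The script below is a sketch; the PREFIX variable and the target_of helper are names made up for this example. Run on the target, both names should report sh4-linux-gnu; run on an x86 build host, plain gcc should report the host's own triplet while sh4-linux-gnu-gcc still reports sh4-linux-gnu.

```shell
#!/bin/sh
# Print the target triplet a compiler generates code for,
# or "not installed" if no compiler of that name is on PATH.
target_of() {
    if command -v "$1" >/dev/null 2>&1; then
        "$1" -dumpmachine
    else
        echo "not installed"
    fi
}

# The cross prefix defaults to the sh4 triplet used in this article;
# override it for other architectures, e.g. PREFIX=arm-linux-gnueabi-
PREFIX=${PREFIX:-sh4-linux-gnu-}

for cc in gcc "${PREFIX}gcc"; do
    printf '%s -> %s\n' "$cc" "$(target_of "$cc")"
done
```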

distcc

distcc lets you run the compiler on a different, faster machine. This involves running a server (distccd) on that machine, and it is far easier to set up than it might seem.

First, ensure that we can cross-compile on the build host:

  
host$ sh4-linux-gnu-gcc hello.c -o hello
host$ file hello
hello: ELF 32-bit LSB executable, Renesas SH, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.18, not stripped

Next, we install distcc on the build host:

  
host# apt-get install distcc

To activate the server and tell it what clients to allow, edit /etc/default/distcc:

  
STARTDISTCC="true"
ALLOWEDNETS="127.0.0.1 10.0.0.0/16"

and restart it:

  
host# /etc/init.d/distcc restart

You can check that it is running:

  
host# netstat -pant | grep distcc
tcp 0 0 10.0.0.1:3632 0.0.0.0:* LISTEN 16142/distccd

To confirm that compilation is really running on the host, watch the distccd log file in a separate window:

  
host# tail -f /var/log/distccd.log

Then, on the client (i.e. the target system), we also install distcc:

  
target# apt-get install distcc

We do not need to modify the distcc configuration on the target, as it will not be running the server, so Debian's defaults are fine. However, we do need to set an environment variable specifying which machine(s) to compile on; replace host below with the build host's name or IP address:

  
target$ export DISTCC_HOSTS='host'

You run distcc in a similar manner to ccache, by prefixing the compiler command. Note that distcc only distributes compilation, not linking, so here we run just the compile step:

  
target$ distcc sh4-linux-gnu-gcc -c hello.c

This should turn up in the host's distcc logs:

  
host# tail -f /var/log/distccd.log
distccd[16390] (dcc_job_summary) client: 10.0.1.103:45983 COMPILE_OK exit:0 sig:0 core:0 ret:0 time:46ms sh4-linux-gnu-gcc hello.c

And back on the target, we have the hello.o file which was generated by the sh4-linux-gnu-gcc cross-compiler on the build host:

  
target$ ls -l *.o
total 16
-rw-r--r-- 1 conrad conrad 884 Jun 11 07:28 hello.o
target$ file hello.o
hello.o: ELF 32-bit LSB relocatable, Renesas SH, version 1 MathCoPro/FPU/MAU Required (SYSV), not stripped

The C file was transferred over the network to the host, where distccd invoked the cross-compiler and then sent the results back to the target. The end result is the same as if sh4-linux-gnu-gcc had been run directly on the target, but we avoided using the slower CPU of the target system.
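distccd looks up the compiler by the name the client sends, which is why the full sh4-linux-gnu-gcc name matters here: with a plain distcc gcc, the build host would (unless your distcc version rewrites compiler names) run its own x86 gcc and send back x86 objects. A quick guard against that mistake is to check the architecture file(1) reports for the returned object. The helper below is a sketch; check_object_arch is a hypothetical name, and "Renesas SH" is simply the string that file printed for hello.o above.

```shell
#!/bin/sh
# Check that an object file was built for the expected architecture.
# check_object_arch is a hypothetical helper name used for this sketch.
#   $1: object file to inspect
#   $2: text expected in file(1)'s description, e.g. "Renesas SH"
check_object_arch() {
    if file "$1" | grep -qF -- "$2"; then
        echo "$1: OK ($2)"
    else
        echo "$1: unexpected architecture: $(file -b "$1")" >&2
        return 1
    fi
}

# Example, on the target after a distcc compile:
#   distcc sh4-linux-gnu-gcc -c hello.c
#   check_object_arch hello.o "Renesas SH"
```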

To take full advantage of distcc, you can run distccd on several build hosts and list all their names in the DISTCC_HOSTS environment variable on the target. Then use e.g. "make -j 10" to run multiple compiles in parallel; each one is farmed out to one of the build hosts.
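For example, with three build hosts the target's environment might look like the sketch below. The host names build1 to build3 are placeholders, and the /4 suffix, which caps the number of jobs sent to each host, follows the HOST/LIMIT syntax described in distcc(1). The --show-hosts and -j options print the host list and the overall job limit in the distcc versions we know of; check your installed man page before relying on them.

```shell
#!/bin/sh
# Placeholder host names; "/4" limits each host to 4 concurrent jobs.
export DISTCC_HOSTS='build1/4 build2/4 build3/4'

# Show which hosts distcc will use and how many jobs it can run at once.
distcc --show-hosts
distcc -j

# Run up to 12 compiles in parallel (3 hosts x 4 jobs each).
# distcc runs non-compile steps such as linking locally on the target.
make -j12 CC="distcc sh4-linux-gnu-gcc"
```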

Combining ccache and distcc

You can quite simply put these two tools together by setting CCACHE_PREFIX to "distcc" before calling ccache, rather than chaining the commands as ccache distcc sh4-linux-gnu-gcc:

target$ export CCACHE_PREFIX="distcc"
target$ ccache sh4-linux-gnu-gcc -c hello.c

(Thanks to Joel Rosdahl for the correction.)

The first time we run this, the code is cross-compiled on the build host and sent back to the target, and ccache stores the result. The second time, ccache notices that it already has a stored copy of the output hello.o and uses that rather than calling the compiler. (ccache treats sh4-linux-gnu-gcc as the compiler, and only prepends distcc when it actually has to compile.)
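To watch this happen, you can reset ccache's statistics, compile the same file twice and then look at the counters. This sketch assumes distcc and ccache are installed on the target, DISTCC_HOSTS is set as described earlier, and hello.c is in the current directory. Only the first compile should add a new line to the host's distccd.log.

```shell
#!/bin/sh
# Run on the target. Assumes DISTCC_HOSTS already names the build host.
export CCACHE_PREFIX="distcc"

ccache -z                             # reset the statistics counters
ccache sh4-linux-gnu-gcc -c hello.c   # miss: compiled on the build host via distcc
ccache sh4-linux-gnu-gcc -c hello.c   # hit: hello.o is taken from the local cache
ccache -s                             # expect one miss and one hit
```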

For autotools projects, you can simply do the following before calling ./configure:

target$ export CCACHE_PREFIX="distcc"
target$ export CC="ccache sh4-linux-gnu-gcc"

The ./configure step then writes Makefiles that compile with ccache, so the rest of your build (e.g. make -j 10) works as normal, without any new settings or other changes to your workflow.

For more discussion of combining distcc with ccache, see the distcc(1) man page.

Summary

By combining both ccache and distcc we can:

  1. avoid redundant compilations, and
  2. distribute required compilations to a faster build host.
The result is faster build times, which speeds up your development cycle and allows you to work more efficiently on the target system itself.

Syndicated 2010-06-15 00:00:00 (Updated 2010-06-17 03:56:52) from Conrad Parker
