Sayebackup.sh – deduplicating backups with rsync
By popular request, I’m putting up a polished version of the backup script that we’ve been using over the years at Lanedo to back up our systems remotely. The script uses a special feature of rsync(1) v2.6.4 to create backups that share storage space with previous backups by hard-linking files.
The various options needed for rsync and ssh to minimize transfer bandwidth over the Internet, time-stamping for the backups and handling of several rsync oddities warranted encapsulation of the logic into a dedicated script.
The following example creates two consecutive backups and displays their sizes.
$ sayebackup.sh -i ~/.ssh/id_examplecom email@example.com:mydir # create backup as bak-.../mydir
$ sayebackup.sh -i ~/.ssh/id_examplecom firstname.lastname@example.org:mydir # create second bak-2012...-snap/
$ ls -l # show all the backups that have been created
drwxrwxr-x 3 user group 4096 Dez 1 03:16 bak-2012-12-01-03:16:50-snap
drwxrwxr-x 3 user group 4096 Dez 1 03:17 bak-2012-12-01-03:17:12-snap
lrwxrwxrwx 1 user group 28 Dez 1 03:17 bak-current -> bak-2012-12-01-03:17:12-snap
$ du -sh bak-* # the second backup is smaller due to hard links
Usage: sayebackup.sh [options] sources...
  --inc           make reverse incremental backup
  --dry           run and show rsync with --dry-run option
  --help          print usage summary
  -C <dir>        backup directory (default: '.')
  -E <exclfile>   file with rsync exclude list
  -l <account>    ssh user name to use (see ssh(1) -l)
  -i <identity>   ssh identity key file to use (see ssh(1) -i)
  -P <sshport>    ssh port to use on the remote system
  -L <linkdest>   hardlink dest files from <linkdest>/
  -o <prefix>     output directory name (default: 'bak')
  -q, --quiet     suppress progress information
  -c              perform checksum based file content comparisons
  -x              disable crossing of filesystem boundaries
  --version       script and rsync versions
This script creates full or reverse incremental backups using the
rsync(1) command. Backup directory names contain the date and time
of each backup run to allow sorting and selective pruning.
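Because the timestamp is part of each directory name, a plain text sort is also a chronological sort, which makes pruning easy to script. The following is only a minimal sketch: `prune_backups` and its keep count are hypothetical and not part of sayebackup.sh, and it assumes the default 'bak' prefix and the '-snap' suffix seen in the listing above.

```shell
#!/bin/sh
# Hypothetical helper, not part of sayebackup.sh: keep the newest $2 backups
# in directory $1 and delete the older ones. Assumes names of the form
# bak-YYYY-MM-DD-HH:MM:SS-snap, which sort chronologically as plain text.
prune_backups() {
    dir=$1
    keep=$2
    # Count snapshot directories; the bak-current symlink is not matched
    # because it lacks the -snap suffix.
    total=$(cd "$dir" && ls -d bak-*-snap 2>/dev/null | wc -l)
    excess=$((total - keep))
    [ "$excess" -gt 0 ] || return 0
    # Oldest names sort first, so the first $excess entries are removed.
    (cd "$dir" && ls -d bak-*-snap | sort | head -n "$excess") |
    while IFS= read -r old; do
        rm -rf -- "$dir/$old"
    done
}
```

Removing an old backup this way does not damage newer ones: file data shared through hard links stays on disk until its last link is gone. Keep at least one backup so that the '*-current' symlink keeps a valid target.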
At the end of each successful backup run, a symlink '*-current' is
updated to always point at the latest backup. To reduce remote file
transfers, the '-L' option can be used (possibly multiple times) to
specify existing local file trees from which files will be
hard-linked into the backup.
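The symlink update described above could look roughly like this. It is a sketch under assumptions, not the script's actual code: `update_current` is a hypothetical name, and the '-snap' suffix is taken from the example listing.

```shell
#!/bin/sh
# Hypothetical helper: repoint <prefix>-current at the newest snapshot.
update_current() {
    dir=$1      # backup directory (corresponds to the -C option)
    prefix=$2   # output name prefix (corresponds to -o, default 'bak')
    # Timestamped names sort chronologically, so the last one is the newest.
    newest=$(cd "$dir" && ls -d "$prefix"-*-snap 2>/dev/null | sort | tail -n 1)
    [ -n "$newest" ] || return 1
    # -n keeps ln from following the old link into the directory it points
    # at; GNU and BSD ln both accept this option.
    ln -sfn "$newest" "$dir/$prefix-current"
}
```

The link target is stored as a relative name, so the whole backup directory can be moved without breaking the link.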
Upon each invocation, a new backup directory is created that contains
all files of the source system. Hard links are created to files of
previous backups where possible, so extra storage space is only required
for contents that changed between backups.
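To illustrate the hard-linking idea, here is a minimal sketch of a single backup run built on rsync's --link-dest option. It is not the code of sayebackup.sh: `make_snapshot` is a hypothetical name, the source is a local directory for simplicity, and the ssh options, exclude lists and error handling of the real script are left out.

```shell
#!/bin/sh
# Hypothetical sketch of one backup run; not taken from sayebackup.sh.
make_snapshot() {
    src=$1    # source tree (a local path here)
    dir=$2    # directory that holds the backups
    new="bak-$(date +%Y-%m-%d-%H:%M:%S)-snap"
    if [ -d "$dir/bak-current" ]; then
        # Resolve the symlink to the absolute path of the previous backup.
        prev=$(cd "$dir/bak-current" && pwd -P)
        # Files unchanged since the previous backup become hard links to it
        # instead of new copies.
        rsync -a --link-dest="$prev" "$src/" "$dir/$new/"
    else
        # First run: nothing to link against, so copy everything.
        rsync -a "$src/" "$dir/$new/"
    fi &&
    ln -sfn "$new" "$dir/bak-current"
}
```

Running it twice on an unchanged tree, for example `make_snapshot ~/mydir /backups` (both paths are placeholders), should yield two complete directory trees whose files share the same inodes. Two runs within the same second would collide on the directory name, which this sketch does not guard against.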
In incremental mode, the most recent backup is always a full backup,
while the previous full backup is degraded to a reverse incremental
backup, which only contains differences between the current and the
RSYNC_BINARY Environment variable used to override the rsync binary path.
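An override like RSYNC_BINARY is commonly implemented with a default-value parameter expansion. The wrapper below is an assumed pattern to show the idea, not a copy of what sayebackup.sh does; `rsync_cmd` is a hypothetical name.

```shell
#!/bin/sh
# Assumed pattern, not copied from sayebackup.sh: use the binary named in
# RSYNC_BINARY if it is set and non-empty, otherwise 'rsync' from PATH.
rsync_cmd() {
    "${RSYNC_BINARY:-rsync}" "$@"
}
```

With the real script, the variable would be set for a single invocation, e.g. `RSYNC_BINARY=/opt/rsync/bin/rsync sayebackup.sh --version`, where the path is a placeholder for wherever an alternative rsync build is installed.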