Secure high-performance backup
Designing a backup system is a complex task: it requires balancing several factors, among them safety, performance, ease of use and price. It’s often recommended to combine multiple installations that balance these factors differently, so that recovery remains possible if a disaster happens. For example, one installation that is safe but complex to use can be combined with one that is less safe but easy to use. This guide presents a backup installation tailored for a specific usage and designed with a specific balance between these factors in mind.
Our backup installation is:
- As safe as possible without being completely offline. With ransomware on the rise, one solution is to keep a human-operated offline backup. However appropriate, that solution isn’t viable in our case. As a middle ground, our solution employs an independent backup server that pulls the data (instead of pushing, see below) automatically (i.e. without human intervention) using Rsync.
- Incremental and space efficient. We implemented incremental backups using file system snapshots. Snapshots avoid duplicating data and are a native feature of the Btrfs file system employed here.
- Fast recovery for large backups. Our solution is usable from gigabytes to dozens of terabytes. With Btrfs, every incremental backup can be accessed with the same performance, which allows for fast recovery.
- Longstanding technologies. Our solution combines the file system Btrfs integrated in the Linux kernel and Rsync using the Bash script btrfs-backup.
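To illustrate the snapshot-based incremental scheme, here is a minimal sketch; it is not the btrfs-backup implementation. It assumes the synchronized data lives in a Btrfs subvolume (the hypothetical `/plus/backup/main/current`) and that dated read-only snapshots are kept in a hypothetical `/plus/backup/main/snapshots` directory. With `DRY_RUN=1` (the default) the command is only printed, not executed.

```shell
#!/usr/bin/env bash
# Sketch only: the subvolume layout and names below are assumptions,
# not the layout used by btrfs-backup.

DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# snapshot_after_sync <subvolume> <snapshot-dir> [timestamp]
# Takes a read-only snapshot of the subvolume. A snapshot shares all
# unchanged blocks with its source, so it initially costs almost no space.
snapshot_after_sync() {
    local subvol="$1" snapdir="$2"
    local stamp="${3:-$(date +%Y-%m-%d_%H%M%S)}"
    run btrfs subvolume snapshot -r "$subvol" "$snapdir/$stamp"
}

snapshot_after_sync /plus/backup/main/current /plus/backup/main/snapshots
```

Each snapshot appears as a complete directory tree, which is why any past backup can be browsed or restored directly.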
Pushing data is the most common backup strategy, for example by sending files to cloud-based storage or by using software such as Restic to do so. However, these solutions suffer from a major flaw: if the server pushing the data (main in the schema below) becomes compromised, the problem is likely to propagate to the backup server. Alternatively, the data can be pulled from the main server onto the backup server. In this configuration, a compromised main server can’t directly compromise the backup server operating system. If the main server becomes compromised, compromised data might be pulled by the backup server, but older incremental copies will remain available for recovery.
In this solution, the backup software (in green in the schema) must be installed and running on the backup server:
- the backup server must be a server with an OS and can’t be static cloud storage,
- the backup server must have access to the main server to get the data. To mitigate the risk of giving the backup server full access to the main server:
- a backup-specific user is created with no special rights,
- a copy of the rsync program on the main server, owned by the backup user, is given read-only access to all the main server data using Linux capabilities.
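The pull described above could look roughly like the following sketch, run on the backup server. The `rsyncr` user and the `/var/lib/rsync-readcap/rsync` path are the ones used later in this guide; the host name `main.example.org`, the source path and the destination directory are placeholders, and the exact options used by btrfs-backup may differ. With `DRY_RUN=1` (the default) the command is only printed.

```shell
#!/usr/bin/env bash
# Sketch only: host name, source path and destination are placeholders.

DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# pull_from_main <main-host> <source-path> <destination-dir>
# Connects as the unprivileged rsyncr user and asks the remote side to run
# the capability-enabled rsync copy instead of the default rsync binary.
pull_from_main() {
    local host="$1" src="$2" dest="$3"
    run rsync -a --delete --numeric-ids \
        -e ssh \
        --rsync-path=/var/lib/rsync-readcap/rsync \
        "rsyncr@${host}:${src}" "$dest"
}

pull_from_main main.example.org /home/ /plus/backup/main/current/home/
```

Because the connection is initiated from the backup server, the main server needs no credentials for the backup server at all.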
On the backup server:
- Installed OS. Preferably Debian installed following these instructions. Instructions in this guide can be applied to any modern Linux distribution.
- Btrfs volume
Note: In the setup described here, the Btrfs volumes are not encrypted, based on the expectation that the physical security of the backup server is equivalent to that of the main server (which is also unencrypted).
- The backup software btrfs-backup
On the main server:
- Installed OS. Preferably Arch Linux installed following these instructions. To use another Linux distribution, the rsync-readcap package needs to be adapted to that distribution. Contributions are welcome.
- Rsync copy with the CAP_DAC_READ_SEARCH capability, installed using the rsync-readcap package from the AUR.
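As a rough illustration of this requirement, the sketch below checks whether a line of `getcap` output reports the read capability. It is a hypothetical helper, not part of rsync-readcap. It accepts both the `=ep` and `+ep` notations; on a live system the line would come from `getcap` run against the rsync copy.

```shell
#!/usr/bin/env bash
# Sketch only: has_read_capability is a hypothetical helper name.

# has_read_capability <getcap-output-line>
# Succeeds when the line lists cap_dac_read_search as effective and permitted.
has_read_capability() {
    printf '%s\n' "$1" | grep -Eq 'cap_dac_read_search[=+]ep'
}

# Example using the output format shown later in this guide.
line='/var/lib/rsync-readcap/rsync cap_dac_read_search=ep'
if has_read_capability "$line"; then
    echo "capability present"
else
    echo "capability missing"
fi
```

CAP_DAC_READ_SEARCH bypasses read and directory-search permission checks only, which is why the copy can read all data without being able to modify it.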
- Prepare the Btrfs volume.
apt-get install btrfs-progs
- Create a single partition on each drive occupying 100% of the available space (two drives are used in this example):
parted /dev/sda mklabel gpt
parted /dev/sda mkpart primary 1MiB 100% print
parted /dev/sdb mklabel gpt
parted /dev/sdb mkpart primary 1MiB 100% print
- Format the Btrfs volume named backup, combining the two hard drives.
mkfs.btrfs -L backup --nodesize 32k --data single --metadata raid1 /dev/sda1 /dev/sdb1
Note: In this setup, data has no redundancy (metadata has RAID1); consequently, drives of different sizes can be used together, but if one fails, the backup will be lost. A higher nodesize of 32k is selected to decrease fragmentation (see the mkfs.btrfs manual).
- Create the backup directory
mkdir /plus
mkdir /plus/backup
- Add mounting point in
LABEL=backup /plus/backup btrfs defaults
- Mount using
- Install Rsync
apt-get install rsync
- Install btrfs-backup
- Download the latest release
- From the downloaded archive, copy the btrfs-backup-X.X.X/backup Bash script to
- Build the rsync-readcap package and install it manually:
pacman -U rsync-readcap-0.1-1-any.pkg.tar.zst
- Reinstall (or install) Rsync. That will create a copy of the rsync executable with the cap_dac_read_search+ep capabilities (using a pacman hook):
pacman -S rsync
- To confirm a copy of the rsync executable has been created with adequate capabilities:
$ getcap /var/lib/rsync-readcap/rsync
/var/lib/rsync-readcap/rsync cap_dac_read_search=ep
- Set up an SSH key for login from the backup server to the main server:
- On backup server, create key as root:
ssh-keygen -t ed25519
- On main server (the rsyncr user is created by the rsync-readcap package), create the .ssh directory in the rsyncr user home directory:
mkdir /var/lib/rsync-readcap/.ssh
chown rsyncr: /var/lib/rsync-readcap/.ssh
chmod 700 /var/lib/rsync-readcap/.ssh
- Copy the Ed25519 public key from /root/.ssh/id_ed25519.pub (backup) to new
- Change permissions of
chown rsyncr: /var/lib/rsync-readcap/.ssh/authorized_keys
chmod 600 /var/lib/rsync-readcap/.ssh/authorized_keys
- On the backup server, connect manually to the main server:
- Write configuration file for btrfs-backup using these examples in
- Test the backup configuration manually using:
/root/backup/backup snap -c config.conf
- Optional. Install a systemd timer to run the backup script periodically. Copy backup_main.service and backup_main.timer to /etc/systemd/system (main is an example; you can replace it with your backup name). Start and enable the timer:
systemctl enable backup_main.timer
systemctl start backup_main.timer
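Since the unit names depend on the backup name, the timer step can be parameterized. The following is a minimal sketch under that assumption; `enable_backup_timer` is a hypothetical helper, and the unit files themselves are the ones shipped with btrfs-backup. With `DRY_RUN=1` (the default) the commands are only printed.

```shell
#!/usr/bin/env bash
# Sketch only: enable_backup_timer is a hypothetical helper name.

DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# enable_backup_timer <backup-name>
# Reloads systemd so it sees the copied unit files, enables and starts the
# timer in one step, then lists it to show the next scheduled run.
enable_backup_timer() {
    local name="$1"
    run systemctl daemon-reload
    run systemctl enable --now "backup_${name}.timer"
    run systemctl list-timers "backup_${name}.timer"
}

enable_backup_timer main
```

Here `systemctl enable --now` is equivalent to running `enable` followed by `start`.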
Since a reliable backup depends on healthy hard drives, consider installing Failing Disk Reporter (FDR). FDR monitors the hard drives (using SMART) and sends notifications when a drive is failing. FDR can report issues to Matrix or Slack.
Install smartmontools (apt-get install smartmontools), then follow these instructions.
The Btrfs volume is set up here without redundancy of the data. The RAID 5 and 6 implementations in Btrfs are not yet considered stable. Consequently, if one drive fails, the Btrfs volume is broken and the backup is lost.
If one drive is failing/fails:
- Replace the failing drive
- Following the Backup server section, Prepare (step 1) and Format (step 2) the Btrfs volume
- Reboot. At the next scheduled backup, a fresh complete snapshot will be created.
To add a drive to the backup Btrfs volume:
- Create a single partition on the drive occupying 100% of the available space
parted /dev/sdc mklabel gpt
parted /dev/sdc mkpart primary 1MiB 100% print
- Add the partition to the Btrfs volume using btrfs-device:
btrfs device add -f /dev/sdc1 /plus/backup
Use the -f option if the drive isn’t empty, to force overwriting what was on the hard drive.
- Finally, balance the data between the added and existing drives using btrfs-balance:
btrfs balance start --bg /plus/backup
To display how balanced your data is between drives:
btrfs device usage /plus/backup
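Because the balance runs in the background, it can be useful to wait for it to finish before checking the result. The sketch below polls `btrfs balance status` for that purpose. It assumes that btrfs-progs reports an idle volume with a message containing "No balance found", which should be verified against the installed version. Both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: the "No balance found" message is an assumption about the
# output of btrfs-progs; check it against your installed version.

# balance_finished <status-text>
# Succeeds when the status text reports that no balance is running.
balance_finished() {
    case "$1" in
        *"No balance found"*) return 0 ;;
        *) return 1 ;;
    esac
}

# wait_for_balance <mount-point> [poll-seconds]
# Polls the balance status until it is idle, then prints per-device usage.
wait_for_balance() {
    local mnt="$1" delay="${2:-60}"
    until balance_finished "$(btrfs balance status "$mnt" 2>&1)"; do
        sleep "$delay"
    done
    btrfs device usage "$mnt"
}
```

On the backup server it would be called as `wait_for_balance /plus/backup`, as root.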