Secure high-performance backup

Introduction

Designing a backup system is a complex task that requires balancing several factors, among them safety, performance, ease of use and price. It’s often recommended to combine several installations that strike different balances between these factors, so that recovery remains possible if a disaster happens: for example, one installation that is safe but complex to use alongside one that is less safe but easy to use. This guide presents a backup installation tailored to a specific use case and designed around one particular balance of these factors.

Our backup installation is:

As safe as possible without being completely offline. With ransomware on the rise, one solution is to keep a human-operated offline backup. However appropriate, such a solution isn’t viable for our use. As a middle ground, our solution relies on an independent backup server that automatically (i.e. without human intervention) pulls the data using Rsync, instead of having the data pushed to it (see below).
Incremental and space efficient. Incremental backups are implemented with file system snapshots, a native feature of the Btrfs file system employed here. Snapshots share unchanged data with the state they were taken from instead of duplicating it.
Fast recovery for large backups. Our solution is usable from gigabytes to dozens of terabytes. Since every Btrfs snapshot is a complete file tree, accessing any incremental backup is as fast as accessing the latest one, which allows for fast recovery.
Longstanding technologies. Our solution combines Btrfs, a file system integrated into the Linux kernel, with Rsync, driven by the Bash script btrfs-backup.
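To illustrate how snapshots make each run incremental, here is a minimal sketch assuming a hypothetical layout in which rsync keeps a subvolume named current up to date and every run then freezes it into a dated, read-only snapshot. It is not taken from btrfs-backup, whose actual layout may differ; the function name snapshot_backup and the directory names are hypothetical.

```shell
#!/usr/bin/env bash
# Minimal sketch, not btrfs-backup's implementation.
# Assumed (hypothetical) layout:
#   <base>/current       subvolume refreshed in place by each rsync pull
#   <base>/<timestamp>   read-only snapshots, one per run
snapshot_backup() {
  local base="$1"
  local stamp
  stamp="$(date +%Y-%m-%d_%H%M%S)"
  # -r makes the snapshot read-only, so later rsync runs cannot modify it.
  # The snapshot initially shares all its data with <base>/current
  # (copy-on-write); space is only consumed as current later diverges.
  btrfs subvolume snapshot -r "${base}/current" "${base}/${stamp}"
}
```

For example, snapshot_backup /plus/backup/main would be called right after each pull. Under this assumed layout, the current subvolume has to exist beforehand, e.g. created once with btrfs subvolume create /plus/backup/main/current.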

Safer backup by pulling data

Pushing data is the most common backup strategy: for example, sending files to cloud-based storage, either directly or with software such as Restic. However, this approach suffers from a major flaw. If the server pushing the data (main in the schema below) is compromised, the problem is likely to propagate to the backup server, because the main server holds the credentials needed to write to it. Alternatively, the data can be pulled from the main server onto the backup server. In this configuration, a compromised main server can’t directly compromise the backup server’s operating system. Compromised data might still be pulled by the backup server, but older incremental copies remain available for recovery.

Schema: pull vs push

In this solution, the backup software (in green in the schema) must be installed and running on the backup server:

- the backup server must be a machine running an OS; it can’t be static cloud storage,
- the backup server must have access to the main server to get the data. To mitigate the risk of giving the backup server full access to the main server:
  - a backup-specific user is created with no special rights,
  - a copy of the rsync program on the main server, owned by the backup user, is given read-only access to all the main server data using Linux capabilities.
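In practice, the pull boils down to the backup server running rsync against the main server over SSH. The sketch below is a hypothetical helper, not btrfs-backup’s code: it assumes the rsyncr user and the /var/lib/rsync-readcap/rsync copy installed on the main server (see the Main server section), and uses rsync’s --rsync-path option so that the remote side runs the capability-enabled copy instead of the default rsync. The option set is an assumption to adapt.

```shell
#!/usr/bin/env bash
# Hypothetical helper: pull the main server's root file system into a
# directory on the backup server. Runs on the backup server, as root.
pull_from_main() {
  local host="$1" dest="$2"
  # --archive          keep permissions, ownership, times and symlinks
  # --hard-links       keep hard links as links instead of copies
  # --numeric-ids      keep the main server's UIDs/GIDs, no name mapping
  # --one-file-system  skip other mounts such as /proc and /sys
  # --delete           drop files deleted on the main server
  #                    (older snapshots still hold them)
  # --rsync-path       run the CAP_DAC_READ_SEARCH copy on the main server
  rsync --archive --hard-links --numeric-ids --one-file-system --delete \
    --rsync-path=/var/lib/rsync-readcap/rsync \
    "rsyncr@${host}:/" "${dest}/"
}
```

For example: pull_from_main main.example.org /plus/backup/main/current (host name and destination are placeholders). Because of --one-file-system, any other partition on the main server, such as a separate /home, needs its own source path.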

Setup overview

Install

On the backup server:

- Installed OS. Preferably Debian, installed following these instructions. The instructions in this guide can be applied to any modern Linux distribution.
- Btrfs volume.
  Note: In the setup described here, the Btrfs volumes are not encrypted, on the expectation that the physical security of the backup server is equivalent to that of the main server (which is also unencrypted).
- The backup software btrfs-backup.

On the main server:

- Installed OS. Preferably Arch Linux, installed following these instructions. To use another Linux distribution, the rsync-readcap package needs to be adapted to that distribution. Contributions are welcome.
- A copy of Rsync with the CAP_DAC_READ_SEARCH capability, installed using the rsync-readcap package from the AUR.

Installation

Backup server

Prepare the Btrfs volume.

Install btrfs-progs:
```
apt-get install btrfs-progs
```

Create a single partition occupying all the available space on each drive (this example uses two drives, /dev/sda and /dev/sdb):

```
parted /dev/sda mklabel gpt
parted /dev/sda mkpart primary 1MiB 100% print
parted /dev/sdb mklabel gpt
parted /dev/sdb mkpart primary 1MiB 100% print
```

Format the Btrfs volume, labelled backup, combining the two drives:
```
mkfs.btrfs -L backup --nodesize 32k --data single --metadata raid1 /dev/sda1 /dev/sdb1
```
Note: In this setup, data has no redundancy (only metadata uses RAID1). Consequently, drives of different sizes can be used together, but if one fails, the backup is lost. A nodesize of 32k, larger than the 16k default, is selected to decrease fragmentation (see the mkfs.btrfs manual).
Create the backup directory /plus/backup:
```
mkdir /plus
mkdir /plus/backup
```

Add the mount point to /etc/fstab (see the fstab documentation):

```
LABEL=backup              /plus/backup           btrfs        defaults
```

Mount the volume using mount -a.
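Before going further, it can be worth checking that the volume is mounted and that its profiles match the mkfs.btrfs options used above (single data, RAID1 metadata). A minimal sketch; the function name check_backup_volume is hypothetical, and it relies on findmnt from util-linux and on the line format printed by btrfs filesystem df.

```shell
#!/usr/bin/env bash
# Hypothetical check: is a Btrfs volume mounted at the given path, with
# "single" data and RAID1 metadata?
check_backup_volume() {
  local mnt="${1:-/plus/backup}"
  local fstype df
  # -M restricts findmnt to a file system mounted exactly at $mnt.
  fstype="$(findmnt -n -o FSTYPE -M "$mnt")"
  if [[ "$fstype" != "btrfs" ]]; then
    echo "no Btrfs volume mounted at $mnt" >&2
    return 1
  fi
  # "btrfs filesystem df" prints lines such as "Data, single: total=..."
  # and "Metadata, RAID1: total=...".
  df="$(btrfs filesystem df "$mnt")" || return 1
  grep -q '^Data, single:' <<<"$df" && grep -q '^Metadata, RAID1:' <<<"$df"
}
```

Usage: check_backup_volume && echo "backup volume OK".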
Install Rsync:
```
apt-get install rsync
```
Install btrfs-backup:
1. Download the latest release.
2. Create the /root/backup directory.
3. From the downloaded archive, copy the btrfs-backup-X.X.X/backup Bash script to /root/backup.

Main server

Build the rsync-readcap package and install it manually:
```
pacman -U rsync-readcap-0.1-1-any.pkg.tar.zst
```
Reinstall (or install) Rsync. A pacman hook then creates a copy of the rsync executable as /var/lib/rsync-readcap/rsync with the cap_dac_read_search+ep capability:
```
pacman -S rsync
```
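The content of the rsync-readcap hook is not reproduced in this guide. As an assumption, its effect can be approximated by the commands below, which copy the rsync binary, make it owned by the backup user and executable only by its owner, and grant it CAP_DAC_READ_SEARCH. The function name make_readcap_copy is hypothetical and the real hook may differ; on Arch Linux, setcap is provided by the libcap package.

```shell
#!/usr/bin/env bash
# Hypothetical approximation of what the rsync-readcap pacman hook achieves;
# run as root on the main server. The real hook may differ.
make_readcap_copy() {
  local src="${1:-/usr/bin/rsync}"
  local dst="${2:-/var/lib/rsync-readcap/rsync}"
  # Copy rsync, owned by the backup user and executable only by its owner
  # (mode 0500), so other local users cannot run the privileged copy.
  install -o rsyncr -g rsyncr -m 0500 "$src" "$dst"
  # CAP_DAC_READ_SEARCH bypasses file read and directory search permission
  # checks but grants no write access; "+ep" = effective and permitted.
  setcap cap_dac_read_search+ep "$dst"
}
```

The copy does not follow rsync upgrades by itself, so these steps have to be repeated after every upgrade, which is exactly what a pacman hook automates.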

To confirm that the copy of the rsync executable has been created with the expected capability:

```
$ getcap /var/lib/rsync-readcap/rsync
/var/lib/rsync-readcap/rsync cap_dac_read_search=ep
```

Configuration

Set up an SSH key so that the backup server can log in to the main server:
- On the backup server, create a key as root:
```
ssh-keygen -t ed25519
```
- On the main server (the rsyncr user is created by the rsync-readcap package):
  1. Create the .ssh directory in the rsyncr user’s home directory:
```
mkdir /var/lib/rsync-readcap/.ssh
chown rsyncr: /var/lib/rsync-readcap/.ssh
chmod 700 /var/lib/rsync-readcap/.ssh
```
  2. Copy the Ed25519 public key from /root/.ssh/id_ed25519.pub (backup server) into a new file, /var/lib/rsync-readcap/.ssh/authorized_keys (main server).
  3. Set the owner and permissions of authorized_keys:
```
chown rsyncr: /var/lib/rsync-readcap/.ssh/authorized_keys
chmod 600 /var/lib/rsync-readcap/.ssh/authorized_keys
```
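With its default StrictModes setting, sshd checks the modes and ownership of the user’s files and home directory before accepting a login, so a wrong mode typically shows up later as a puzzling key rejection. A quick check of the two paths created above might look like this; the function name check_ssh_perms is hypothetical and it relies on GNU stat’s -c format option.

```shell
#!/usr/bin/env bash
# Hypothetical check, run as root on the main server.
check_ssh_perms() {
  local dir="${1:-/var/lib/rsync-readcap/.ssh}"
  # GNU stat: %U = owner name, %a = permissions in octal.
  [[ "$(stat -c '%U %a' "$dir")" == "rsyncr 700" ]] &&
    [[ "$(stat -c '%U %a' "$dir/authorized_keys")" == "rsyncr 600" ]]
}
```

Usage: check_ssh_perms && echo "permissions OK".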
On the backup server, connect manually to the main server once, to check the login and accept the main server’s host key:
```
ssh rsyncr@[main_server_ip_or_name]
```
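Since the scheduled backup runs without anyone at the keyboard, it can also help to check that a non-interactive login works and that the capability-enabled rsync copy runs. A minimal sketch; the function name check_main_access is hypothetical and the host name is a placeholder.

```shell
#!/usr/bin/env bash
# Hypothetical check, run as root on the backup server.
check_main_access() {
  local host="$1"
  local banner
  # BatchMode=yes makes ssh fail instead of prompting for a password or a
  # host key confirmation, which mimics an unattended backup run.
  banner="$(ssh -o BatchMode=yes "rsyncr@${host}" \
    /var/lib/rsync-readcap/rsync --version)" || return 1
  # Keep only the first line, which names the rsync and protocol versions.
  printf '%s\n' "${banner%%$'\n'*}"
}
```

For example, check_main_access main.example.org should print a line starting with "rsync  version"; a failure means the key, the host key or the rsync copy needs attention.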
Write a configuration file for btrfs-backup in /root/backup/, using these examples.
Test the backup configuration manually using:
```
/root/backup/backup snap -c config.conf
```
Optional: install a systemd timer to run the backup script periodically. Copy backup_main.service and backup_main.timer to /etc/systemd/system (main is an example; replace it with the name of your backup). Then enable and start the timer:
```
systemctl enable backup_main.timer
systemctl start backup_main.timer
```
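The backup_main.service and backup_main.timer files provided with btrfs-backup are the reference. To show what such a pair typically contains, here is a hypothetical sketch that writes a oneshot service and a daily timer; the unit contents, the daily schedule, the working directory and the function name write_backup_units are assumptions, not the project’s files.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a service/timer pair for btrfs-backup.
# Usage: write_backup_units <name> <config> [unit_dir]
write_backup_units() {
  local name="$1" conf="$2" dir="${3:-/etc/systemd/system}"
  # Oneshot service: runs one backup and exits.
  cat > "${dir}/backup_${name}.service" <<EOF
[Unit]
Description=btrfs-backup run for ${name}
Wants=network-online.target
After=network-online.target

[Service]
Type=oneshot
WorkingDirectory=/root/backup
ExecStart=/root/backup/backup snap -c ${conf}
EOF
  # Timer: activates the service of the same name once a day;
  # Persistent=true catches up on a run missed while the server was off.
  cat > "${dir}/backup_${name}.timer" <<EOF
[Unit]
Description=Daily btrfs-backup run for ${name}

[Timer]
OnCalendar=daily
Persistent=true

[Install]
WantedBy=timers.target
EOF
}
```

After writing the units, systemctl list-timers backup_main.timer shows when the next run is scheduled.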

Maintenance

Failing Disk Reporter (FDR)

Since a reliable backup depends on healthy hard drives, consider installing Failing Disk Reporter (FDR). FDR monitors the hard drives (using SMART) and sends notifications to Matrix or Slack when a drive is failing.

Install smartmontools (apt-get install smartmontools), then follow these instructions.
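Independently of FDR, smartctl (from smartmontools) can also be queried by hand. A minimal sketch, with a hypothetical function name, that flags any drive whose overall SMART health self-assessment does not pass; FDR’s own logic is not reproduced here.

```shell
#!/usr/bin/env bash
# Hypothetical manual check; run as root. Usage: check_smart /dev/sda /dev/sdb
check_smart() {
  local dev status=0
  for dev in "$@"; do
    # smartctl -H prints the overall health self-assessment: ATA drives
    # report "... test result: PASSED", SCSI/SAS drives
    # "SMART Health Status: OK".
    if smartctl -H "$dev" | grep -Eq 'test result: PASSED|SMART Health Status: OK'; then
      echo "$dev: healthy"
    else
      echo "$dev: check this drive" >&2
      status=1
    fi
  done
  return "$status"
}
```

A drive that does not report SMART health at all is also flagged, which errs on the side of caution.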

Failing drive

The Btrfs volume is set up here without data redundancy. The RAID 5 and 6 implementations in Btrfs are not yet considered stable. Consequently, if one drive fails, the Btrfs volume is broken and the backup is lost.

If a drive is failing or has failed:

1. Replace the failing drive.
2. Following the Backup server section, repeat the partitioning and formatting steps for the Btrfs volume.
3. Reboot. At the next scheduled backup, a fresh complete snapshot will be created.
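Besides SMART, Btrfs keeps per-device error counters that complement it when deciding whether a drive is failing. A minimal sketch, with a hypothetical function name, that prints only the non-zero counters of the volume.

```shell
#!/usr/bin/env bash
# Hypothetical check: print the non-zero Btrfs error counters of a volume.
# Returns 0 if at least one counter is non-zero, 1 otherwise.
btrfs_has_errors() {
  local mnt="${1:-/plus/backup}"
  # Each line of "btrfs device stats" looks like
  #   [/dev/sda1].write_io_errs    0
  # so the second field is the counter value.
  btrfs device stats "$mnt" |
    awk 'NF == 2 && $2 != 0 { print; found = 1 } END { exit !found }'
}
```

For example, btrfs_has_errors && echo "errors recorded on /plus/backup". Once investigated, the counters can be reset with btrfs device stats -z /plus/backup.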

Add physical space

To add a drive to the backup Btrfs volume (in this example /dev/sdc):

Create a single partition occupying all the available space on the drive:

```
parted /dev/sdc mklabel gpt
parted /dev/sdc mkpart primary 1MiB 100% print
```

Add the partition to the Btrfs volume using btrfs-device:
```
btrfs device add -f /dev/sdc1 /plus/backup
```
The -f option forces overwriting any file system already present on the partition; it is only needed if the drive isn’t empty.
Finally, balance the data between the added and existing drives using btrfs-balance:
```
btrfs balance start --bg /plus/backup
```
To display how the data is distributed between the drives:
```
btrfs device usage /plus/backup
```
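Because the balance runs in the background (--bg), a script that needs to wait for it can poll btrfs balance status. A minimal sketch; the function name and the 60 second default interval are assumptions, and it relies on the wording of the btrfs-progs status messages ("No balance found on ..." when idle, "Balance on ... is ..." otherwise), which is worth checking against your version.

```shell
#!/usr/bin/env bash
# Hypothetical helper: block until no balance is running on the volume.
# Usage: wait_for_balance [mount_point] [interval_seconds]
wait_for_balance() {
  local mnt="${1:-/plus/backup}" interval="${2:-60}" out
  while true; do
    out="$(btrfs balance status "$mnt" 2>&1)"
    # An idle volume reports "No balance found on '<path>'".
    [[ "$out" == *"No balance found"* ]] && return 0
    # A running (or paused) balance reports "Balance on '<path>' is ...";
    # anything else is treated as an error.
    [[ "$out" == *"Balance on"* ]] || { printf '%s\n' "$out" >&2; return 1; }
    sleep "$interval"
  done
}
```

Note that a paused balance keeps this loop waiting until the balance is resumed or cancelled.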