Backup Systems using ZFS

ZFS is much touted for its stupendous performance and resiliency, but few people make good use of it for backups. The key to using ZFS for backups is its snapshot system, together with its ability to send the differences between two snapshots to a remote system.

The process is very simple:

  • Create snapshot 1
  • Send the entire snapshot 1 to the remote backup server
  • Time passes
  • Create snapshot 2
  • Send only the differences between snapshot 1 and snapshot 2 to the remote server
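The steps above correspond to the standard `zfs snapshot`, `zfs send` and `zfs receive` commands. The following Python sketch only builds the command lines rather than running them. The dataset `tank/data`, the host `backup.example.com`, the remote dataset `backup/data` and the snapshot names are hypothetical, and piping the stream over `ssh` is an assumption about how the backup server is reached.

```python
import shlex
from typing import Optional


def snapshot_cmd(dataset: str, snap: str) -> str:
    """Build the 'create a snapshot' step, e.g. zfs snapshot tank/data@snap1."""
    return shlex.join(["zfs", "snapshot", f"{dataset}@{snap}"])


def send_cmd(dataset: str, snap: str, host: str, target: str,
             base: Optional[str] = None) -> str:
    """Build a pipeline that streams a snapshot to `zfs receive` over ssh.

    Without `base` the entire snapshot is sent (the first send). With `base`,
    `zfs send -i` emits only the differences between `base` and `snap`; the
    remote `target` must already hold the `base` snapshot.
    """
    send = ["zfs", "send"]
    if base is not None:
        send += ["-i", f"{dataset}@{base}"]
    send.append(f"{dataset}@{snap}")
    # Assumption: the backup server is reached over ssh (hypothetical host).
    recv = ["ssh", host, "zfs", "receive", target]
    return f"{shlex.join(send)} | {shlex.join(recv)}"


if __name__ == "__main__":
    # Hypothetical dataset, host and target names.
    ds, host, tgt = "tank/data", "backup.example.com", "backup/data"
    print(snapshot_cmd(ds, "snap1"))
    # zfs snapshot tank/data@snap1
    print(send_cmd(ds, "snap1", host, tgt))
    # zfs send tank/data@snap1 | ssh backup.example.com zfs receive backup/data
    # ... time passes ...
    print(snapshot_cmd(ds, "snap2"))
    print(send_cmd(ds, "snap2", host, tgt, base="snap1"))
    # zfs send -i tank/data@snap1 tank/data@snap2 | ssh backup.example.com zfs receive backup/data
```

Printing the commands first makes the sequence easy to check before running it by hand or from a scheduler. On the receiving side, `zfs receive -F` can be used to roll the target back to its most recent snapshot before receiving, in case it has been modified since the last transfer.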

The remote server now has an up-to-date copy of the local filesystem, including both snapshots, which can be used for a rollback if needed.

How is this different from using rsync? There are three standout points.

  • Rsync needs to scan the entire filesystem to find changed files. This puts a strain on the server when there are many directories, subdirectories and files. ZFS does not need to ‘search’ for the changes; it already ‘knows’ about them as part of how the filesystem works.
  • To update a file that has changed, rsync needs to read the file and calculate sliding window (rolling) checksums to identify which parts of the file need updating. This is much more efficient than sending the entire contents of a large file, but it still consumes CPU and disk resources. Again, ZFS does not have to search for the changed parts of files; it just ‘knows’.
  • When applying changes to a file, rsync by default builds a complete new copy of the file at the destination. This saves network bandwidth compared with sending the whole file, but for large files it still costs disk I/O and temporary space. ZFS, on the other hand, just sends the changed blocks, and the receiving side adjusts the pointers in the filesystem so that the new blocks become part of the changed file.
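The way ZFS ‘knows’ what has changed can be illustrated with a deliberately simplified model. Each ZFS block pointer records the transaction group (txg) in which its block was written, and because ZFS is copy-on-write, rewriting a data block also rewrites every block above it up to the root. An incremental send can therefore walk down from the root and skip any subtree whose birth txg is not newer than the earlier snapshot. The Python sketch below is a toy model of that pruning idea only, not ZFS's on-disk format or its actual code; the `Block` class, block names and txg numbers are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Block:
    """Toy block (hypothetical): `birth` is the txg in which it was written."""
    name: str
    birth: int
    children: List["Block"] = field(default_factory=list)


def changed_leaves(node: Block, since_txg: int, read: List[str]) -> List[str]:
    """Return data blocks written after `since_txg`, pruning unchanged subtrees.

    Copy-on-write guarantees a parent's birth txg is at least that of any of
    its children, so an old parent means nothing below it has changed.
    """
    # The birth txg is held in the parent's pointer, so an unchanged subtree
    # is rejected without reading any of its blocks.
    if node.birth <= since_txg:
        return []
    read.append(node.name)  # only blocks born after the snapshot are read
    if not node.children:
        return [node.name]  # a changed data block goes into the stream
    out: List[str] = []
    for child in node.children:
        out.extend(changed_leaves(child, since_txg, read))
    return out


if __name__ == "__main__":
    SNAPSHOT_TXG = 100  # hypothetical txg at which snapshot 1 was taken
    # One data block (data-B2) was rewritten later, in txg 150, which also
    # rewrote its parents ind-B and root.
    tree = Block("root", 150, [
        Block("ind-A", 100, [Block("data-A1", 90), Block("data-A2", 100)]),
        Block("ind-B", 150, [Block("data-B1", 80), Block("data-B2", 150),
                             Block("data-B3", 60)]),
    ])
    read: List[str] = []
    print(changed_leaves(tree, SNAPSHOT_TXG, read))  # ['data-B2']
    print(read)  # ['root', 'ind-B', 'data-B2']
```

Only three of the eight blocks are read: the unchanged subtree under `ind-A` is rejected from its parent's pointer alone. An rsync-style comparison, by contrast, has to examine every file to reach the same answer.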

Because ZFS is more efficient than rsync, particularly because it does not need to search for changes, a ZFS setup allows more frequent updates to the backup system, e.g. hourly or every 10 minutes, rather than the daily or four-hourly runs typical with rsync.

Still to come: the process of setting up remote ZFS backups.
