We want to allow our users to back up the data they have in their Persistence. USB sticks are fragile and small: they break and get lost. We have also had bugs in Tails that led to the (apparent) loss of this data (#10976 and broken upgrades).

We benchmarked tons of existing backup solutions in Linux and the situation is pretty poor.

On top of that, the data in Persistence contains both files owned by the amnesia user and data owned by root (e.g. the Wi-Fi passwords in nm-system-connections or the configuration itself in persistence.conf). This mix of Unix users and permissions makes it harder to find a solution that works out of the box. See also the Full goal below.

So, as a start, let's focus on this user scenario and solve it as well and as cheaply as possible:

S1: As a Tails user, I want to create another Tails with all my data so that I can recover from a lost or broken Tails.

Other user scenarios are possible for backups but I propose to ignore them for now. For example:

  • S2: My house burns down and I lose all my devices.
  • S3: I want to cross a border with no data on me.
  • S4: I want to recover a file that I had some months ago.

As the bare minimum for S1, I need a backup that is:

  • Fresh. If backups are tedious and slow to make and update, people will only update them infrequently and won't have fresh backups when they need them. We should make it as easy as possible to update backups.

    Deja Dup is interesting in this regard: as a user, I am notified to update my backups regularly, and backups start automatically and in the background as soon as I plug in my backup device.

  • Offline. Online backups can be useful in different scenarios (S2 and S3) but they are also slower to restore and more complicated to design as they depend on the kind of cloud or online storage solution the user has access to.

Ideally, the backup for S1 should also be:

  • Fast and simple. As a user, I might be panicking because I think that I have lost all my data, and any extra hoop that I have to jump through can add to that panic.

  • Full. The data stored in Persistence that is not owned by the amnesia user might not be super critical to lose (e.g. Wi-Fi passwords), but providing a full backup that deals with all the data, independently of its Unix user, goes along with the goal of being Fast and simple.

Clone Persistence in Tails Installer

Backup scenarios involve 3 experiences:

  • Creation
  • Update
  • Recovery

Recovery

In the case of S1, recovery means being able to use Tails again with all my data after my USB stick breaks.

The easiest possible recovery scenario for S1 is to already have all my data in another Tails, my "backup Tails". If I lose my Tails, I can start using my backup Tails right away.

Better UX

To prevent people from confusing their current Tails with their backup Tails, the backup Tails could be aware that it is a backup Tails and display some warning when first started. Otherwise, changes made on the backup Tails would be overwritten when the backup is updated.

Implementation note:

  • This could be a flag stored in the backup Persistence and that triggers a warning when opened in Tails Greeter.
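
A minimal sketch of what such a check could look like, assuming a hypothetical flag file named backup-tails at the root of the Persistence (the file name and the exact integration point in Tails Greeter are assumptions, not an existing mechanism):

    # Hypothetical check run after unlocking the Persistence, e.g. by Tails Greeter.
    PERSISTENCE=/live/persistence/TailsData_unlocked
    if [ -e "$PERSISTENCE/backup-tails" ]; then
        # Warn that this is a backup Tails: local changes will be overwritten
        # the next time the backup is updated from the main Tails.
        zenity --warning --text="This is your backup Tails. Changes made here will be overwritten when you next update your backup."
    fi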

Creation

The creation experience could be to clone my Tails using Tails Installer and choose to also clone the Persistence.

If the backup USB stick is new, Tails is installed and the Persistence is copied. If the backup USB stick already has Tails installed, both the Tails system on it and the backup of my Persistence are updated.

Before the backup Persistence is created, the user is prompted for a passphrase for the LUKS volume of the backup Persistence.

Implementation note:

  • It might be possible to reuse the same LUKS header to create the backup Persistence without prompting for a passphrase.

    At first glance:

    • To create the backup Persistence, cryptsetup luksHeaderBackup/luksHeaderRestore should work.
    • To unlock the backup Persistence, sudo dmsetup table --showkeys TailsData_unlocked should dump the master key from memory and cryptsetup --master-key-file should unlock with the master key.
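
An untested sketch of those two steps, assuming the current Persistence is on /dev/sda2 and mapped as TailsData_unlocked, and the backup partition is /dev/sdb2 (device names, file names, and the exact cryptsetup invocations are assumptions to be verified):

    # Give the backup partition the same LUKS header (and thus the same
    # passphrase and master key) as the current Persistence.
    cryptsetup luksHeaderBackup /dev/sda2 --header-backup-file /tmp/TailsData-header.img
    cryptsetup luksHeaderRestore /dev/sdb2 --header-backup-file /tmp/TailsData-header.img

    # Dump the master key of the already unlocked Persistence from memory
    # (it is the 5th field of the crypt target line, as a hex string),
    # convert it to a binary key file, and use it to unlock the backup
    # without prompting. In Tails, /tmp lives in RAM only.
    dmsetup table --showkeys TailsData_unlocked | awk '{ print $5 }' | xxd -r -p > /tmp/mk.bin
    cryptsetup --master-key-file /tmp/mk.bin luksOpen /dev/sdb2 TailsDataBackup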

Update

Updating the backup regularly (and maybe being reminded to do so). On the backend, this could use rsync to copy only the files that have changed.

Before the backup Persistence is updated, the user is prompted for a passphrase.

Updating also updates the system partition of the backup Tails.

Implementation notes:

  • We want to operate at the file system level to speed things up. We need a tool that allows copying EVERYTHING from the file system (ACL, extended attributes, etc.).

    See anarcat's rsync oneliner: a study of a complex commandline for how to do that with rsync; a sketch is included after this list.

  • We have to make sure that applications that could be problematic if backed up while running are shut down (eg. Thunderbird) while not bothering the user for others (eg. NetworkManager).
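
A sketch of what such a copy could look like, based on anarcat's rsync oneliner; the source and destination mount points are examples and the exact option set still needs to be reviewed:

    # Copy the whole Persistence to the unlocked backup Persistence, preserving
    # ownership, permissions, timestamps, hard links (-H), ACLs (-A), extended
    # attributes (-X) and sparse files (-S), and deleting files that no longer
    # exist in the source.
    rsync -aHAXS --numeric-ids --delete \
        /live/persistence/TailsData_unlocked/ \
        /media/amnesia/TailsDataBackup/

The trailing slash on the source matters: it makes rsync copy the contents of the directory rather than the directory itself.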

Better UX

The update experience could be improved, like Deja Dup does, by:

  • Displaying automatic reminders

  • Starting the update automatically when the backup Tails is plugged in.

Discussion

Pros

  • Maybe the fastest and simplest recovery experience possible.

  • The creation experience is pretty simple as well: all I need is a USB stick that is as big as my current Tails and it will fit all my data. Tails Installer doesn't have to display more options.

  • The same experience can be used to migrate to a bigger or faster USB stick. (#14624)

Cons

  • It won't solve any of the other scenarios. Building backup logic into Tails Installer might also not be a good investment if we want to solve the other scenarios later on. For example, the recovery experiences for each of S2, S3, and S4 would be completely different.

Future

Once we have a solid solution for S1, we might be able to reuse some of its UX and code to solve scenarios S2, S3, and S4.

They would require:

  • Remote storage
  • Backup history
  • Include and exclude lists

We should keep these in mind while solving S1 and find a sweet spot between solving S1 and building foundations to support other backup scenarios in the future.

User research

Vietnam

From Understanding Internet Freedom: Vietnam's Digital Activists by SecondMuse:

  • Bloggers are more focused on mitigating risk around the storage of data rather than the transmission of that data. All of the bloggers have experienced harassment or arrest by police and security officials, and many of them have had their devices taken away as a result. They view the confiscation of devices as the highest point of risk associated with their work.

  • Tool security design must assume the likelihood of a device being confiscated by authorities as an essential part of the threat model. Physical devices are commonly confiscated by authorities, and bloggers have widely cited the risks posed by confiscation due to the information they keep on their devices. Carefully consider this aspect of the threat model when designing security features for a tool.

  • Protection of devices and hardware: The bloggers were unanimously concerned about the possibility that arrest might result in the confiscation of their computers or mobile devices, making any information saved on those devices vulnerable. Notwithstanding, only a few bloggers mentioned behaviors addressing this potential vulnerability, including [...]. Just one blogger discussed putting sensitive information on an encrypted hard disk that he stored in a safe place.

From Understanding Internet Freedom: Tunisia's Journalists and Bloggers:

  • Backing up documents to protect the data and themselves: Preventing data loss was one primary motivation for backing up information, and particularly sensitive documents. One group of journalists also highlighted the practice of making physical copies of sensitive documents and keeping them in many different places. In presenting their security strategy for a blogger releasing sensitive information, that group emphasized “contacting other acquaintances to [let them] know that if anything happens to the blogger, they will be publishing [the information] all over the internet and making a big scandal.”

Archive of the tests of existing backup solutions

These are the tests of existing backup solutions done until 2019, kept here for reference.

GNOME Backups (Deja Dup)

Pros:

  • Has a simple interface for configuring backups. The nicest we could find, and with a clear mission statement.
  • Restoration of single files or folders is integrated in Files.
  • Notifications to start backups periodically.
  • Supports different kinds of remote: FTP, SSH, WebDAV, and Windows share are displayed in the graphical interface, duplicity supports many more.

Cons:

  • Uses duplicity as back end with the following drawbacks:
    • It forces full backups now and then, making them unexpectedly much slower than regular incremental ones.
    • Restoring a single file is slow.
    • Impossible to navigate the file system of the latest or of past increments without doing a full restoration.
  • No feedback whatsoever on the progress of backups when they are started automatically.
  • Doesn't know how to ask for the root password to back up files it cannot access.
  • If files are mounted or linked in a different place, you can only restore from Nautilus to the original places backed up by Deja Dup. That's a problem for dotfiles, which are single files symlinked into $HOME: you can't easily get both the Nautilus integration and the full restoration.
  • You can only add folders to be backed up or ignored by Deja-Dup (and not single files).
  • Fails to restore symlinks.

See #9888.

Tar + GPG

A simple approach might be to combine tar and GPG:

tar cjf - . | gpg --cipher-algo AES -c - > /home/amnesia/YYYY-MM-DD-backup.tbz2.gpg

Initial implementation might be easy but it would probably end up being quite a big piece of custom code.
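
For completeness, restoring such an archive would be roughly the reverse pipeline (the destination path is an example, and this has to run as root to restore root-owned files):

    # Decrypt the archive and unpack it into the target Persistence,
    # preserving permissions.
    gpg -d /home/amnesia/YYYY-MM-DD-backup.tbz2.gpg | tar xpjf - -C /live/persistence/TailsData_unlocked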

Duplicity

http://duplicity.nongnu.org/

It supports something that's basically "tar | gpg" for the first iteration, and it also leaves room for future improvements, thanks to its support for incremental and remote backups. Also, it allows users to restore or inspect their backups outside of Tails, without having to manually decipher yet another backup file format.
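
For reference, the command-line usage is roughly as follows (paths are examples); duplicity decides on its own whether to do a full or an incremental backup:

    # Back up the Persistent folder to an already decrypted backup drive,
    # encrypting symmetrically with a passphrase instead of a GnuPG key.
    PASSPHRASE="..." duplicity /home/amnesia/Persistent file:///media/amnesia/backup

    # Restore a single file from the latest backup.
    PASSPHRASE="..." duplicity restore --file-to-restore Documents/notes.txt \
        file:///media/amnesia/backup /home/amnesia/restored-notes.txt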

Cons

  • Duplicity creates tons of messy files on the file system.
  • It requires users to do a full backup from time to time when using incremental backups.

Loopback LUKS

Pros

  • One file per backup.

Cons

  • We still need to find another tool to create the device and copy the files (a rough sketch of the manual steps follows this list).
  • Maybe backups done this way would be much bigger than duplicity backups.
  • Duplicity supports incremental backups even if they have some limitations.
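
A rough sketch of the manual steps a tool would have to automate (file name, size, and mapping name are examples):

    # Create a 4 GiB container file and format it as LUKS (asks for a passphrase).
    fallocate -l 4G /media/amnesia/backup/tails-backup.luks
    cryptsetup luksFormat /media/amnesia/backup/tails-backup.luks

    # Open it, create a file system, copy the data, and close it again.
    cryptsetup luksOpen /media/amnesia/backup/tails-backup.luks tails-backup
    mkfs.ext4 /dev/mapper/tails-backup
    mount /dev/mapper/tails-backup /mnt
    rsync -aHAXS /live/persistence/TailsData_unlocked/ /mnt/
    umount /mnt
    cryptsetup luksClose tails-backup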

Obnam

Obnam only supports encrypting to a GnuPG key, so this would require another layer of encryption (such as a LUKS container file, or something else) if we want to support encrypting with a passphrase (and I think we should).

grsync + encrypted device

grsync is a GUI for the renowned and rock-solid rsync, which is already shipped in Tails.

Grsync is packaged for Debian.

The documentation for the creation of the encrypted device is already written.

It has a rocket-science interface, basically displaying the complexity of the command-line options in a graphical way.

How to test?

  • create an encrypted device.
  • install the grsync package.
  • paste those lines in a .grsync file, then double-click on it. (grsync first asks you to confirm the profile you want to use. Just click on "Open".)
  • define the destination (i.e. your encrypted device)
  • test wildly! and please report your results

Pros

  • not much to code
  • grsync can easily be preconfigured, possibly with multiple profiles
  • this solution separates backup and encryption work
  • allows remote backups

Features to request

  • grsync should check if enough space is available on the destination before running. Update: rsync 3.1.0 introduces a --preallocate option. (Tails 2.6 ships rsync 3.1.1.)
  • grsync should ask for confirmation when the option "Delete on the destination" is activated
  • when the user double-clicks on a .grsync file, a window appears to confirm which file to open. This may be confusing.

Misc

  • some files are normally not readable by rsync (for example persistence.conf, apt/*). Grsync can bypass that with the option "Run as superuser"; we should investigate the consequences of using such an option. We still have the possibility to ignore those files: we would then just have to add --exclude="apt" in the preconfiguration file.
  • decide if we activate the option --delete by default.
  • test restoration (see File → Invert source and destination). Then, at least, check files permissions.
  • test backup + restoration with symlinks and hardlinks in the Persistent folder.
  • possibly test remote backups.
  • see the thread on tails-dev

rdup

https://github.com/miekg/rdup

rdup separates the logic of backing up from the actual copying. It supports filters to compress and encrypt individual files (typically using gpg or mcrypt) as well as path names and can do both full as well as incremental backups.

Pros

  • simple and small command line programs
  • more sophisticated than tar+gpg and probably addresses many of the corner cases that would otherwise have to be handled by increasingly complicated scripts
  • in Debian Squeeze / Wheezy / testing

Cons

  • still requires a script of some sort to drive it
  • probably requires a gui to make it simple to use

borgbackup

https://borgbackup.readthedocs.io/en/stable/index.html

Borg is the perfect backup back end. It supports increments, encryption, data deduplication, local and remote backups, and mounting backups as FUSE file systems. It is also way faster than obnam, which advertises similar properties. But it doesn't have a graphical user interface.

Packages for borgbackup are in Jessie Backports and in Stretch.
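
For reference, a typical borg workflow looks roughly like this (the repository path and archive name are examples):

    # Create an encrypted repository on the backup drive (asks for a passphrase).
    borg init --encryption=repokey /media/amnesia/backup/borg

    # Add a new, deduplicated increment of the Persistence data.
    borg create /media/amnesia/backup/borg::persistence-{now} /live/persistence/TailsData_unlocked

    # Browse any increment as a read-only FUSE file system.
    borg mount /media/amnesia/backup/borg /mnt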

restic

https://restic.github.io/

restic looks very similar to borgbackup. It is a small CLI tool for incremental, authenticated, and confidential backups of files.

It is not clear where the tools differ and it would be nice to have a comparison of both tools.

Packages for restic are in Stretch.
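
For reference, the equivalent restic workflow (the repository path is an example):

    # Create an encrypted repository on the backup drive (asks for a passphrase).
    restic -r /media/amnesia/backup/restic init

    # Back up the Persistence data; increments and deduplication are handled automatically.
    restic -r /media/amnesia/backup/restic backup /live/persistence/TailsData_unlocked

    # Browse the snapshots as a FUSE file system.
    restic -r /media/amnesia/backup/restic mount /mnt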

Duplicati

  • Mono application (.NET).
  • Unofficial Debian package. I found no reference to "duplicati" on bugs.debian.org.
  • The client side has both:
    • a command-line interface
    • a web server interface running on localhost:8200: https://duplicati.readthedocs.io/en/latest/03-using-the-graphical-user-interface/. I could pierce a hole in our firewall to access this from Firefox ESR in Tails but then I couldn't make it connect to an SSH server. I'm not sure how this would integrate with Tails in general but it doesn't seem the right user interface for our users.
  • The backend supports many types of remote storage: https://duplicati.readthedocs.io/en/latest/05-storage-providers/.
  • The backend supports incremental backups and deduplication.
  • I didn't manage to make the SSH backend work from the command line (I tried torsocks), but while doing so I could see in Onion Circuits that Duplicati is calling back home in several ways: checking for updates and reporting usage stats.

spideroak

SpiderOak is a commercial online encrypted backup service. Reading their website might be useful for the user research part of this project.

Script by a2

Other solutions

  • sbackup, Simple Backup: unmaintained since 2008.

  • Lucky Backup: seems very oldish and not really active.

  • Back In Time which has a GNOME frontend. It does snapshots with hardlinks to reduce space. Can do local and SSH as remote.