Backing up and restoring Workbench

Backing up Data Science & AI Workbench protects your data in case of accidents (deletion of important data) or technical issues (failed hard drive). You can back up at any time, but refer to your company’s Disaster Recovery policy for best practices.

Do not attempt to restore backup files created from a different version of Workbench. To upgrade your version of Workbench, see Upgrading Workbench.

Anaconda recommends the use of managed persistence to ensure open sessions and deployments are captured by the backup process. If you are not using managed persistence, have all users save their work, stop any open sessions and deployments, and log out of the platform during the backup process.

The backup/restore script supports synchronizing your production cluster to a “hot” backup cluster at periodic intervals. This is commonly used for Disaster Recovery. To learn more about this process, please speak with our Integration Team.

Obtaining backup/restore tools

The ae5-conda environment contains all the tools you need to back up and restore Workbench; for more information, see Administration server.

Download the environment installer file.
If you have previously installed the ae5-conda environment, it is a good idea to update the ae5_backup_restore package to the latest version before continuing. You can update the package by running the command:
conda update ae5_backup_restore

Install and activate the environment by running the following commands:

chmod +x ae5-conda-latest-Linux-x86_64.sh
bash ae5-conda-latest-Linux-x86_64.sh
source ~/ae5-conda/bin/activate

Verify your installation by running the following command:
```
ae_backup.sh -h
```
If your terminal returns the usage help text, then your installation of the backup/restore script was successful! You are now ready to run the backup script.

Run the backup script

Run the ae_backup.sh script to create backup files of your cluster in the current directory:

bash ae_backup.sh

Or specify a destination for your backup files:

bash ae_backup.sh /your/file/path/here

The backup script creates two tarball files:

ae5_config_db_YYYYMMDDHHMM.tar.gz
ae5_data_YYYYMMDDHHMM.tar.gz

YYYYMMDDHHMM is the format for the timestamp of your backup data.

The ae5_config_db file stores your Kubernetes resources and Postgres data.

The ae5_data file stores your /opt/anaconda/storage data.

By default, the backup script does not back up the package repository. To include it in the data tarball, use the -r, --repository option.
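
Because the YYYYMMDDHHMM timestamp sorts lexically in chronological order, the newest tarball in a directory can be picked by file name alone. The following is a minimal sketch; `latest_backup` is a hypothetical helper name, not part of the ae5-conda tools.

```shell
# Hypothetical helper: print the newest backup tarball for a given prefix.
# $1: directory holding the backup files
# $2: file prefix, either ae5_config_db or ae5_data
latest_backup() {
    lb_newest=""
    # Globs expand in sorted order, so the last match has the latest timestamp.
    for lb_file in "$1"/"$2"_[0-9]*.tar.gz; do
        [ -e "$lb_file" ] || continue   # skip the literal pattern when nothing matches
        lb_newest=$lb_file
    done
    [ -n "$lb_newest" ] && printf '%s\n' "$lb_newest"
}
```

For example, `latest_backup /your/file/path/here ae5_config_db` prints the path of the most recent config tarball, or nothing if none exists.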

Backup command line options

Option	Description
`-h` `--help`	Prints help and exits.
`-d <DIR>` `--ae-data <DIR>`	Changes the location of the Workbench storage. The default location is `/opt/anaconda/storage`. Do not update the location of Workbench storage on Gravity clusters!
`-b <DIR>` `--backup-dir <DIR>`	Changes the location where the backup files are saved. The default location is the current directory. Use this option when the space in the current directory is insufficient to hold the backup.
`-s` `--skip-clean`	Prevents the removal of intermediate files generated during the backup process. This is useful for informational or debugging purposes.
`-c` `--config-db`	The script will only create the config/postgres tarball without a data tarball. This is useful if combined with an alternate mechanism for taking snapshots or backups of the data.
`-r` `--repository`	Includes the full package repository in the data tarball. By default, this is not included because the repository is typically large and incompressible.
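
Since the -b option exists for cases where the current directory is too small, it can help to compare free space against the size of the storage directory before starting. The sketch below uses only standard `df`, `du`, and `awk`; the helper names are hypothetical, and the uncompressed storage size is only a rough estimate of what the backup needs.

```shell
# Hypothetical helpers for a pre-backup disk space check.

# Print the free space, in KiB, of the filesystem holding directory $1.
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Print the on-disk size, in KiB, of directory $1.
used_kb() {
    du -sk "$1" | awk '{ print $1 }'
}

# Succeed when destination $2 has at least as much free space as $1 occupies.
has_room() {
    [ "$(free_kb "$2")" -ge "$(used_kb "$1")" ]
}
```

A possible use is `has_room /opt/anaconda/storage /your/file/path/here && bash ae_backup.sh -b /your/file/path/here`, which only starts the backup when the check passes.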

Restore from backup data

The restore script requires that both backup files come from the same run of the backup script. Do not attempt to combine files that were created by different backups.

Run the restore script to restore your cluster from previously-created backup data:

bash ae_restore.sh ae5_config_db_YYYYMMDDHHMM.tar.gz ae5_data_YYYYMMDDHHMM.tar.gz
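
One way to guard against mixing files from different backups is to compare the timestamps embedded in the two file names before restoring. This is a sketch with hypothetical helper names; matching timestamps are a sanity check on the naming convention described above, not proof that the files came from the same run.

```shell
# Hypothetical helper: print the timestamp portion of a backup file name,
# for example 202401011230 for ae5_data_202401011230.tar.gz.
backup_timestamp() {
    bt_name=${1##*_}                      # drop everything up to the last underscore
    printf '%s\n' "${bt_name%.tar.gz}"    # drop the .tar.gz suffix
}

# Succeed only when both file names carry the same timestamp.
same_backup_run() {
    [ "$(backup_timestamp "$1")" = "$(backup_timestamp "$2")" ]
}
```

With `cfg` and `data` set to the two file paths, `same_backup_run "$cfg" "$data" && bash ae_restore.sh "$cfg" "$data"` runs the restore only when the names agree.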

Restoration modes

The restore script has three different modes for data restoration that can be used to customize how Workbench is restored.

Restoring to the original host

In this mode, all resources are restored from backup, except for the base ingress specification.

This mode is used when a clean reinstall of an existing cluster has been performed and you wish to perform a full restoration from backup. User workloads (deployments, sessions, jobs) are restored, but they are placed in a paused state. The script provides instructions on how to unpause user workloads once the administrator is satisfied that the restoration has completed successfully.

Restoring to a different host without a hostname change

In this mode, only some resources are restored, as described below.

Restored data:

Kubernetes secrets (non-ssl)
User/Project Data
Postgres

Non-restored data:

Hostname
SSL certificates
Configmaps
Ingress
Kubernetes resources for user workload

This mode is used if you wish to restore the backup to a separate existing cluster for inspection. By preserving the cluster’s native configuration, the operation of the cluster is preserved but disconnected from the source.

Restoring to a different host, but with a hostname change

This mode fully restores all resources, including the deployments and scheduled jobs. The ingress is also updated in this case to reflect the new hostname. This is used if you need to replace a faulty master node with a hot backup that was already running under a different hostname.

Restoration command line options

Option	Description
`-h` `--help`	Prints help and exits.
`-d <DIR>` `--ae-data <DIR>`	Changes the location of the Workbench storage. Default: /opt/anaconda/storage. Should not be changed when used on a Gravity cluster.
`-b <DIR>` `--backup-dir <DIR>`	Changes the location where backup files are found. Default: current directory. Use when space in the current directory is insufficient.
`-s` `--skip-clean`	Prevents the removal of intermediate files generated during the restore process. Useful for debugging or informational purposes.
`-u` `--update-hostname`	Allows the hostname to be modified. Automatically triggers `--restore-certs` and `--restore-configmap` when supplied. If this option is not supplied, the existing SSL certificates and configmap are used.
`--restore-certs`	Restores SSL certificates from a backup, even if the hostname does not change.
`--restore-configmap`	Restores the system’s configmap from a backup, even if the hostname does not change.
`-c` `--config-only`	Only restores configuration data (SSL, secrets, configmaps, and so on) without modifying the Postgres database and data.
`--db-version=`	Specifies the PostgreSQL version to use when patching the database StatefulSet during a restore. The version must already exist in the same registry as the current PostgreSQL image. If not, the restore process will fail until the image is available.
`-w` `--wait`	Waits for system pods to stabilize before exiting the script.
`-p` `--pause`	Leaves the cluster in a paused state upon completion of the restore process.
`-y` `--yes`	Skips confirmation prompts during restore. Use with caution.

Bring your own Kubernetes

Customer-supplied Kubernetes clusters (non-Gravity) can take advantage of this backup/restore script. However, the backup/restore process is slightly different.

When taking a backup, you will need to supply the -c, --config-db command line option, as the backup script will only be able to capture your Workbench configuration data. This will not capture user/project data, and you will need to ensure you are taking regular backups of your provided storage solution. This includes the Persistent Volumes used for both anaconda-storage and anaconda-persistence that were configured at install time.

When restoring from a backup, you will need to supply the -c, --config-only command line option, as the restore script will only be able to restore your Workbench configuration data. This will not restore user/project data, and you will need to ensure you have also restored a backup of your provided storage solution.
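
As an illustration of the backup side on a customer-supplied cluster, the sketch below builds a config-only backup command. The `byok_backup` wrapper is hypothetical, and combining --config-db with --backup-dir is an assumption based on the options table above. The function only prints the command so that it can be reviewed first; storage volumes still need their own backup mechanism.

```shell
# Hypothetical wrapper: print the config-only backup command for a
# customer-supplied cluster. Remove "echo" to actually run the command.
# $1: directory where the config/postgres tarball should be written
byok_backup() {
    echo bash ae_backup.sh --config-db --backup-dir "$1"
}
```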
