Backing up Data Science & AI Workbench protects your data against accidents, such as the deletion of important data, and technical issues, such as a failed hard drive. You can back up at any time, but refer to your company’s Disaster Recovery policy for best practices.

Do not attempt to restore backup files created from a different version of Workbench. To upgrade your version of Workbench, see Upgrading Workbench.

Anaconda recommends the use of managed persistence to ensure open sessions and deployments are captured by the backup process. If you are not using managed persistence, have all users save their work, stop any open sessions and deployments, and log out of the platform during the backup process.

The backup/restore script supports synchronizing your production cluster to a “hot” backup cluster at periodic intervals. This is commonly used for Disaster Recovery. To learn more about this process, please speak with our Integration Team.

Obtaining backup/restore tools

The ae5-conda environment contains all the tools you need to back up and restore Workbench; for more information, see Administration server.

  1. Download the environment installer file.

  2. Install and activate the environment by running the following commands:

    chmod +x ae5-conda-latest-Linux-x86_64.sh
    bash ae5-conda-latest-Linux-x86_64.sh
    source ~/ae5-conda/bin/activate
    
  3. Verify your installation by running the following command:

    ae_backup.sh -h
    

    If your terminal returns the usage help text, then your installation of the backup/restore script was successful! You are now ready to run the backup script.
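As a complement to the manual check above, the following is a minimal sketch of a helper that confirms the scripts are on your PATH after activating the environment. The function name check_tools is hypothetical and not part of the ae5-conda environment.

```shell
#!/bin/sh
# Hypothetical helper: report any required tool that is not on PATH.
# Returns 0 when every named tool is found, 1 otherwise.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Typical call after `source ~/ae5-conda/bin/activate`:
#   check_tools ae_backup.sh ae_restore.sh && echo "backup/restore tools found"
```

If a tool is reported missing, re-run the activation command from step 2 in the same shell before trying again.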

Run the backup script

Run the ae_backup.sh script to create backup files of your cluster in the current directory:

bash ae_backup.sh

Or specify a destination for your backup files:

bash ae_backup.sh /your/file/path/here

The backup script creates two tarball files:

ae5_config_db_YYYYMMDDHHMM.tar.gz
ae5_data_YYYYMMDDHHMM.tar.gz
  • YYYYMMDDHHMM is the format for the timestamp of your backup data.
  • The ae5_config_db file stores your Kubernetes resources and Postgres data.
  • The ae5_data file stores your /opt/anaconda/storage data.
  • The backup script does not back up the package repository.
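After a full backup completes, it can be worth confirming that both tarballs exist and are readable before relying on them. The sketch below is one way to do that; verify_backup_dir is a hypothetical helper, and it assumes a full backup (a run with --config-db produces no data tarball, so this check would report it as missing).

```shell
#!/bin/sh
# Hypothetical helper: check that DIR holds at least one ae5_config_db and
# one ae5_data tarball, and that each one passes a gzip integrity test.
verify_backup_dir() {
    dir=$1
    status=0
    for prefix in ae5_config_db ae5_data; do
        found=0
        for f in "$dir"/"$prefix"_*.tar.gz; do
            [ -e "$f" ] || continue        # glob matched nothing
            found=1
            if ! gzip -t "$f" 2>/dev/null; then
                echo "corrupt archive: $f" >&2
                status=1
            fi
        done
        if [ "$found" -eq 0 ]; then
            echo "no $prefix tarball in $dir" >&2
            status=1
        fi
    done
    return "$status"
}

# Example: verify_backup_dir /your/file/path/here
```

Note that gzip -t only verifies the compressed stream; it does not confirm that the contents can be restored.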

Backup command line options

  • -h, --help: Prints help and exits.
  • -d <DIR>, --ae-data <DIR>: Changes the location of the Workbench storage. The default location is /opt/anaconda/storage. Do not update the location of Workbench storage on Gravity clusters!
  • -b <DIR>, --backup-dir <DIR>: Changes the location where the backup files are saved. The default location is the current directory. Use this option when the space in the current directory is insufficient to hold the backup.
  • -s, --skip-clean: Prevents the removal of intermediate files generated during the backup process. This is useful for informational or debugging purposes.
  • -c, --config-db: The script will only create the config/postgres tarball without a data tarball. This is useful if combined with an alternate mechanism for taking snapshots or backups of the data.
  • -r, --repository: Includes the full package repository in the data tarball. By default, this is not included because the repository is typically large and incompressible.
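The options above can be combined. As an illustration, the sketch below checks for free space in a target directory before running the backup there with --backup-dir. The helper names has_free_kb and backup_to, the AE_BACKUP override variable, and the space threshold you pass in are all assumptions made for this example; the documentation does not state how much space a backup needs.

```shell
#!/bin/sh
# Hypothetical helper: return 0 if DIR has at least NEED_KB kilobytes free,
# based on the "Available" column of `df -Pk`.
has_free_kb() {
    dir=$1
    need_kb=$2
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge "$need_kb" ]
}

# Hypothetical wrapper: run the backup into DIR only if it has room.
# Extra arguments (for example --repository or --skip-clean) are passed
# through unchanged. Set AE_BACKUP=echo to print the arguments instead
# of running the real script.
backup_to() {
    dir=$1
    need_kb=$2
    shift 2
    if ! has_free_kb "$dir" "$need_kb"; then
        echo "not enough space in $dir" >&2
        return 1
    fi
    "${AE_BACKUP:-ae_backup.sh}" --backup-dir "$dir" "$@"
}

# Example: require roughly 50 GB free and include the package repository.
#   backup_to /mnt/backups 52428800 --repository
```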

Restore from backup data

The restore script requires that both backup files come from the same run of the backup script. Do not attempt to load files that were created by different backups.

Run the restore script to restore your cluster from previously-created backup data:

bash ae_restore.sh ae5_config_db_YYYYMMDDHHMM.tar.gz ae5_data_YYYYMMDDHHMM.tar.gz
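Because the two files must come from the same backup, a quick pre-flight check on the file names can catch a mismatched pair before the restore starts. The sketch below compares the timestamps embedded in the names. It assumes that both files from one backup run carry an identical timestamp, which the naming pattern suggests but this page does not state explicitly; backup_stamp and same_backup_run are hypothetical helper names.

```shell
#!/bin/sh
# Hypothetical helper: print the 12-digit timestamp embedded in a backup
# file name, or fail if the name does not match
# ae5_config_db_YYYYMMDDHHMM.tar.gz or ae5_data_YYYYMMDDHHMM.tar.gz.
backup_stamp() {
    name=$(basename "$1")
    stamp=$(printf '%s\n' "$name" | sed -n \
        -e 's/^ae5_config_db_\([0-9]\{12\}\)\.tar\.gz$/\1/p' \
        -e 's/^ae5_data_\([0-9]\{12\}\)\.tar\.gz$/\1/p')
    [ -n "$stamp" ] && printf '%s\n' "$stamp"
}

# Hypothetical helper: succeed only if both files carry the same timestamp.
same_backup_run() {
    a=$(backup_stamp "$1") || return 1
    b=$(backup_stamp "$2") || return 1
    [ "$a" = "$b" ]
}

# Example:
#   same_backup_run "$CONFIG_TARBALL" "$DATA_TARBALL" || echo "mismatched pair" >&2
```

A matching timestamp is only a name-level check; it does not prove that the contents of the two files belong together.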

Restoration modes

The restore script has three different modes for data restoration that can be used to customize how Workbench is restored.

Restoring to the original host

In this mode, all resources are restored from backup, except for the base ingress specification.

This mode is used after a clean reinstall of an existing cluster, when you want to perform a full restoration from backup. User workloads (deployments, sessions, and jobs) are restored in a paused state. The script provides instructions for unpausing them once the administrator is satisfied that the restoration completed successfully.

Restoring to a different host without a hostname change

In this mode, only some resources are restored, as described below.

Restored data:

  • Kubernetes secrets (non-ssl)
  • User/Project Data
  • Postgres

Non-restored data:

  • Hostname
  • SSL certificates
  • Configmaps
  • Ingress
  • Kubernetes resources for user workload

This mode is used if you wish to restore the backup to a separate existing cluster for inspection. Because the target cluster keeps its own configuration, it continues to operate normally while remaining disconnected from the source cluster.

Restoring to a different host, but with a hostname change

This mode fully restores all resources, including the deployments and scheduled jobs. The ingress is also updated in this case to reflect the new hostname. This is used if you need to replace a faulty master node with a hot backup that was already running under a different hostname.

Restoration command line options

  • -h, --help: Prints help and exits.
  • -d <DIR>, --ae-data <DIR>: Changes the location of the Workbench storage. Default: /opt/anaconda/storage. Should not be changed when used on a Gravity cluster.
  • -b <DIR>, --backup-dir <DIR>: Changes the location where backup files are found. Default: current directory. Use when space in the current directory is insufficient.
  • -s, --skip-clean: Prevents the removal of intermediate files generated during the restore process. Useful for debugging or informational purposes.
  • -u, --update-hostname: Allows the hostname to be modified. Automatically triggers --restore-certs and --restore-configmap when supplied. If this option is not supplied, the existing SSL certificates and configmap are used.
  • --restore-certs: Restores SSL certificates from a backup, even if the hostname does not change.
  • --restore-configmap: Restores the system’s configmap from a backup, even if the hostname does not change.
  • -c, --config-only: Only restores configuration data (SSL, secrets, configmaps, etc.) without modifying the Postgres database and data.
  • --db-version=: Specifies the PostgreSQL version to use when patching the database StatefulSet during a restore. The version must already exist in the same registry as the current PostgreSQL image. If not, the restore process will fail until the image is available.
  • -w, --wait: Waits for system pods to stabilize before exiting the script.
  • -p, --pause: Leaves the cluster in a paused state upon completion of the restore process.
  • -y, --yes: Skips confirmation prompts during restore. Use with caution.
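To illustrate how these options fit the hostname-change scenario, here is a minimal sketch of a wrapper that restores a backup pair with --update-hostname and --wait. The function name restore_with_new_hostname and the AE_RESTORE override variable are hypothetical, and placing the options before the two file names is an assumption; check ae_restore.sh -h for the exact usage on your version.

```shell
#!/bin/sh
# Hypothetical wrapper: restore a backup pair onto a host whose hostname
# differs from the source cluster.
# --update-hostname already triggers --restore-certs and --restore-configmap,
# so those are not repeated. --wait blocks until system pods stabilize.
# Assumption: options are given before the two tarball names.
# Set AE_RESTORE=echo to print the arguments instead of running the script.
restore_with_new_hostname() {
    config_tar=$1
    data_tar=$2
    "${AE_RESTORE:-ae_restore.sh}" --update-hostname --wait \
        "$config_tar" "$data_tar"
}

# Example:
#   restore_with_new_hostname ae5_config_db_YYYYMMDDHHMM.tar.gz \
#                             ae5_data_YYYYMMDDHHMM.tar.gz
```

The sketch deliberately leaves out --yes so that the script's confirmation prompts still appear.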

Bring your own Kubernetes

Customer-supplied Kubernetes clusters (non-Gravity) can also use this backup/restore script, but the process differs slightly.

When taking a backup, supply the -c (--config-db) command line option, because the backup script can only capture your Workbench configuration data. This does not capture user/project data, so you must also take regular backups of your own storage solution, including the Persistent Volumes used for anaconda-storage and anaconda-persistence that were configured at install time.

When restoring from a backup, supply the -c (--config-only) command line option, because the restore script can only restore your Workbench configuration data. This does not restore user/project data, so you must also restore a backup of your own storage solution.
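The two configuration-only invocations described above might be wrapped as follows. This is a sketch under stated assumptions: byok_backup, byok_restore, and the AE_BACKUP / AE_RESTORE override variables are hypothetical names, and passing only the config tarball to a --config-only restore is an assumption that this page does not confirm. Backing up and restoring the Persistent Volumes themselves is outside the scope of these helpers.

```shell
#!/bin/sh
# Hypothetical wrapper: configuration-only backup for a customer-supplied
# Kubernetes cluster, written to the directory given as the first argument.
byok_backup() {
    "${AE_BACKUP:-ae_backup.sh}" --config-db --backup-dir "$1"
}

# Hypothetical wrapper: configuration-only restore.
# Assumption: only the config tarball is passed, since no data tarball
# exists for a --config-db backup.
byok_restore() {
    "${AE_RESTORE:-ae_restore.sh}" --config-only "$1"
}

# Example:
#   byok_backup /your/file/path/here
#   byok_restore /your/file/path/here/ae5_config_db_YYYYMMDDHHMM.tar.gz
```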