Preparing a Gravity environment for Workbench
The resource requirements for a Kubernetes cluster depend on a number of factors, including the types of applications you plan to run, the number of users active at once, and the workloads you will manage within the cluster. Data Science & AI Workbench’s performance is tightly coupled with the health of your Kubernetes stack, so it is important to allocate enough resources to handle your users’ workloads. Generally speaking, your system should provide at least 1 CPU, 1GB of RAM, and 5GB of disk space for each project session or deployment.
To install Workbench successfully, your systems must meet or exceed the requirements listed below. Anaconda has created a pre-installation checklist to help you prepare for installation. The checklist verifies that your cluster has the necessary resources reserved and is ready for a Workbench installation. Anaconda’s Implementation team will review the checklist with you prior to your installation.
You can initially install Workbench on up to five nodes. Once initial installation is complete, you can add or remove nodes as needed. Anaconda recommends having one master and one worker node per cluster. For more information, see Adding and removing nodes.
For historical information and details regarding Anaconda’s policies related to Gravity, see our Gravity update policy.
Hardware requirements
Anaconda’s hardware recommendations ensure a reliable and performant Kubernetes cluster.
The following are minimum specifications for the master and worker nodes, as well as the entire cluster.
Master node | Minimum |
---|---|
CPU | 16 cores |
RAM | 64GB |
Disk space in /opt/anaconda | 500GB |
Disk space in /var/lib/gravity | 300GB |
Disk space in /tmp or $TMPDIR | 50GB |
- Disk space reserved for `/var/lib/gravity` is utilized as additional space to accommodate upgrades. Anaconda recommends having this available during installation.
- The `/var/lib/gravity` volume must be mounted on local storage. Core components of Kubernetes run from this directory, some of which are extremely intolerant of disk latency. Therefore, Network-Attached Storage (NAS) and Storage Area Network (SAN) solutions are not supported for this volume.
- Disk space reserved for `/opt/anaconda` is utilized for project and package storage (including mirrored packages).
- Anaconda recommends that you set up the `/opt/anaconda` and `/var/lib/gravity` partitions using Logical Volume Management (LVM) to provide the flexibility needed to accommodate easier future expansion.
- Currently, `/opt` and `/opt/anaconda` must be an `ext4` or `xfs` filesystem, and cannot be an NFS mountpoint. Subdirectories of `/opt/anaconda` may be mounted through NFS. For more information, see Mounting an external file share.
Installations of Workbench that utilize an `xfs` filesystem must support `d_type` file labeling to work properly. To support `d_type` file labeling, set `ftype=1` by running the following command prior to installing Workbench.
This command will erase all data on the specified device! Make sure you are targeting the correct device and that you have backed up any important data from it before proceeding.
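A sketch of the formatting step, assuming `/dev/sdb1` is a hypothetical device backing the mount point in question (substitute your actual device):

```bash
# Create an XFS filesystem with d_type support enabled (ftype=1)
# WARNING: this destroys all data on /dev/sdb1 (hypothetical device name)
sudo mkfs.xfs -n ftype=1 /dev/sdb1
```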
Worker node | Minimum |
---|---|
CPU | 16 cores |
RAM | 64GB |
Disk space in /var/lib/gravity | 300GB |
Disk space in /tmp or $TMPDIR | 50GB |
When installing Workbench on a system with multiple nodes, verify that the clock of each node is in sync with the others prior to installation. Anaconda recommends using the Network Time Protocol (NTP) to synchronize computer system clocks automatically over a network. For step-by-step instructions, see How to Synchronize Time with Chrony NTP in Linux.
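For example, on systemd-based distributions you can check each node’s synchronization status and install chrony if needed (the package and service names below assume RHEL/CentOS and vary by distribution):

```bash
# Check whether the system clock is synchronized on this node
timedatectl status

# Install and enable chrony (RHEL/CentOS package and service names assumed)
sudo yum install -y chrony
sudo systemctl enable --now chronyd
```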
Disk IOPS requirements
Master and worker nodes require a minimum of 3000 concurrent Input/Output Operations Per Second (IOPS).
Hard disk manufacturers report sequential IOPS, which are different from concurrent IOPS. On-premises installations require servers with disks that support a minimum of 50 sequential IOPS. Anaconda recommends using a Solid State Drive (SSD) or better.
Cloud performance requirements
Requirements for running Workbench in the cloud relate to compute power and disk performance. Make sure your chosen cloud platform meets these minimum specifications:
Anaconda recommends an instance type no smaller than `m4.4xlarge` for both master and worker nodes. You must have a minimum of 3000 IOPS.
Operating system requirements
Workbench currently supports the following Linux versions:
- RHEL/CentOS 7.x, 8.x
- Ubuntu 16.04
- SUSE 12 SP2, 12 SP3, 12 SP5 (requires setting `DefaultTasksMax=infinity` in `/etc/systemd/system.conf`)
Some versions of the RHEL 8.4 AMI on AWS are bugged due to a combination of a bad `ip rule` and the NetworkManager service. Remove the bad rule and disable the NetworkManager service prior to installation.
Security requirements
- If your Linux system utilizes an antivirus scanner, make sure the scanner excludes the `/var/lib/gravity` volume from its security scans.
- Installation requires that you have `sudo` access.
- Nodes running CentOS or RHEL must have Security-Enhanced Linux (SELinux) set to either `disabled` or `permissive` mode in the `/etc/selinux/config` file.
Check the status of SELinux by running the following command:
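For example (the `sestatus` utility ships with the SELinux policy tools on RHEL/CentOS):

```bash
# Report the current SELinux mode; it should be "disabled" or "permissive"
sestatus
```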
Kernel module requirements
Kubernetes relies on certain functionalities provided by the Linux kernel. The Workbench installer verifies that the following kernel modules (required for Kubernetes to function properly) are present, and notifies you if any are not loaded.
Linux Distribution | Version | Required Modules |
---|---|---|
CentOS | 7.2 | bridge, ebtable_filter, ebtables, iptable_filter, iptable_nat, overlay |
CentOS | 7.3-7.7, 8.0 | br_netfilter, ebtable_filter, ebtables, iptable_filter, iptable_nat, overlay |
RedHat Linux | 7.2 | bridge, ebtable_filter, ebtables, iptable_filter, iptable_nat |
RedHat Linux | 7.3-7.7, 8.0 | br_netfilter, ebtable_filter, ebtables, iptable_filter, iptable_nat, overlay |
Ubuntu | 16.04 | br_netfilter, ebtable_filter, ebtables, iptable_filter, iptable_nat, overlay |
SUSE | 12 SP2, 12 SP3, 12 SP5 | br_netfilter, ebtable_filter, ebtables, iptable_filter, iptable_nat, overlay |
Module Name | Purpose |
---|---|
bridge | Enables Kubernetes iptables-based proxy to operate |
br_netfilter | Enables Kubernetes iptables-based proxy to operate |
overlay | Enables the use of the overlay or overlay2 Docker storage driver |
ebtable_filter | Allows a service to communicate back to itself via internal load balancing when necessary |
ebtables | Allows a service to communicate back to itself via internal load balancing when necessary |
iptable_filter | Ensures the firewall rules set up by Kubernetes function properly |
iptable_nat | Ensures the firewall rules set up by Kubernetes function properly |
Verify a module is loaded by running the following command:
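For example, using `lsmod` with `br_netfilter` as a sample module name (substitute the module you are checking):

```bash
# List loaded kernel modules and filter for the one you are checking
lsmod | grep br_netfilter
```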
If the command produces output, the module is loaded.
If necessary, run the following command to load a module:
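For example, to load `br_netfilter` (substitute the module you need):

```bash
# Load the module into the running kernel
sudo modprobe br_netfilter
```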
If your system does not load modules at boot, you must run the following command—for each module—to ensure they are loaded on every reboot:
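A minimal sketch for systemd-based distributions, again using `br_netfilter` as the sample module (the file name under `/etc/modules-load.d/` is an arbitrary example):

```bash
# Ensure the module is loaded automatically on every boot
echo br_netfilter | sudo tee /etc/modules-load.d/br_netfilter.conf
```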
System control settings
Workbench requires the following Linux `sysctl` settings to function properly:
sysctl setting | Purpose |
---|---|
net.bridge.bridge-nf-call-iptables | Communicates with bridge kernel module to ensure Kubernetes iptables-based proxy operates |
net.bridge.bridge-nf-call-ip6tables | Communicates with bridge kernel module to ensure Kubernetes iptables-based proxy operates |
fs.may_detach_mounts | Allows the unmount operation to complete even if there are active references to the filesystem remaining |
net.ipv4.ip_forward | Required for internal load balancing between servers to work properly |
fs.inotify.max_user_watches | Set to 1048576 to improve cluster longevity |
If necessary, run the following command to enable a system control setting:
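For example, using `net.ipv4.ip_forward` as the sample setting:

```bash
# Apply the setting to the running kernel immediately
sudo sysctl -w net.ipv4.ip_forward=1
```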
To persist system settings on boot, run the following for each setting:
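A sketch assuming a distribution that reads `/etc/sysctl.d/` at boot (the `90-workbench.conf` file name is an arbitrary example):

```bash
# Write the setting to a sysctl configuration file and reload all settings
echo "net.ipv4.ip_forward = 1" | sudo tee /etc/sysctl.d/90-workbench.conf
sudo sysctl --system
```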
GPU requirements
Workbench requires that you install a supported version of the NVIDIA Compute Unified Device Architecture (CUDA) driver on the host OS of any GPU worker node.
Currently, Workbench supports the following CUDA driver versions:
- CUDA 10.2
- CUDA 11.2
- CUDA 11.4
- CUDA 11.6
Notify your Anaconda Implementation team member which CUDA version you intend to use, so they can provide the correct installer.
You can obtain the driver you need in a few different ways:
- Use the package manager or the NVIDIA runfile to download the file directly.
- For SLES, CentOS, and RHEL, you can get a supported driver using `rpm (local)` or `rpm (network)`.
- For Ubuntu, you can get a driver using `deb (local)` or `deb (network)`.
GPU deployments should use one of the following models:
- Tesla V100 (recommended)
- Tesla P100 (adequate)
Theoretically, Workbench will work with any GPU card compatible with the CUDA drivers, as long as they are properly installed. Other cards supported by CUDA `11.6` include:
- A-Series: NVIDIA A100, NVIDIA A40, NVIDIA A30, NVIDIA A10
- RTX-Series: RTX 8000, RTX 6000, NVIDIA RTX A6000, NVIDIA RTX A5000, NVIDIA RTX A4000, NVIDIA T1000, NVIDIA T600, NVIDIA T400
- HGX-Series: HGX A100, HGX-2
- T-Series: Tesla T4
- P-Series: Tesla P40, Tesla P6, Tesla P4
- K-Series: Tesla K80, Tesla K520, Tesla K40c, Tesla K40m, Tesla K40s, Tesla K40st, Tesla K40t, Tesla K20Xm, Tesla K20m, Tesla K20s, Tesla K20c, Tesla K10, Tesla K8
- M-Class: M60, M40 24GB, M40, M6, M4
Support for GPUs in Kubernetes is still a work in progress, and each cloud vendor provides different recommendations. For more information about GPUs, see Understanding GPUs.
Network requirements
Workbench requires the following network ports to be externally accessible:
These ports need to be externally accessible during installation only, and can be closed after completing the install process:
The following ports are used for cluster operation, and must be open internally, between cluster nodes:
Make sure that the firewall is permanently set to keep the required ports open, and will save these settings across reboots. Then restart the firewall to load your changed settings.
There are various tools you can use to configure firewalls and open the required ports, including `iptables`, `firewall-cmd`, `susefirewall2`, and more.
You’ll also need to update your firewall settings to ensure that the `10.244.0.0/16` pod subnet and `10.100.0.0/16` service subnet are accessible to every node in the cluster, and grant all nodes the ability to communicate via their primary interface. For example, if you’re using `iptables`:
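The following is a minimal sketch only; your interface names, chains, and existing rules will dictate the exact commands:

```bash
# Accept traffic from the default pod and service subnets on every node
sudo iptables -A INPUT -s 10.244.0.0/16 -j ACCEPT
sudo iptables -A INPUT -s 10.100.0.0/16 -j ACCEPT
```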
If you plan to use online package mirroring, allowlist the following domains in your network’s firewall settings:
- repo.anaconda.com
- anaconda.org
- conda.anaconda.org
- binstar-cio-packages-prod.s3.amazonaws.com
To use Workbench in conjunction with Anaconda Navigator in online mode, allowlist the following sites in your network’s firewall settings as well:
- https://repo.anaconda.com (or for older versions of Navigator and conda)
- https://conda.anaconda.org if any users will use conda-forge and other channels on Anaconda.org
- google-public-dns-a.google.com (8.8.8.8:53) to check internet connectivity with Google Public DNS
TLS/SSL certificate requirements
Workbench uses certificates to provide transport layer security for the cluster. Self-signed certificates are generated during the initial installation. Once installation is complete, you can configure the platform to use your organizational TLS/SSL certificates.
You can purchase certificates commercially, or generate them using your organization’s internal public key infrastructure (PKI) system. When using an internal PKI-signed setup, the CA certificate is inserted into the Kubernetes secret.
In either case, the configuration will include the following:
- A certificate for the root certificate authority (CA)
- An intermediate certificate chain
- A server certificate
- A certificate private key
For more information about TLS/SSL certificates, see Updating TLS/SSL certificates.
DNS requirements
Workbench assigns unique URL addresses to deployments by combining a dynamically generated universally unique identifier (UUID) with your organization’s domain name, like this: `https://uuid001.anaconda.yourdomain.com`.
This requires the use of wildcard DNS entries that apply to a set of domain names, such as `*.anaconda.yourdomain.com`.
For example, if you are using the domain name `anaconda.yourdomain.com` with a master node IP address of `12.34.56.78`, the DNS entries would be as follows:
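A sketch of those entries in BIND-style zone-file notation (the exact syntax depends on your DNS provider; the domain and IP address are the example values above):

```
anaconda.yourdomain.com.      IN  A  12.34.56.78
*.anaconda.yourdomain.com.    IN  A  12.34.56.78
```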
The wildcard subdomain’s DNS entry points to the Workbench master node.
The master node’s hostname and the wildcard domains must be resolvable with DNS from the master nodes, worker nodes, and the end user machines. To ensure the master node can resolve its own hostname, distribute any `/etc/hosts` entries to the gravity environment.
If `dnsmasq` is installed on the master node or any worker nodes, you’ll need to remove it from all nodes prior to installing Workbench.
Verify `dnsmasq` is disabled by running the following command:
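For example, on systemd-based systems:

```bash
# Check whether dnsmasq is installed, running, or enabled
sudo systemctl status dnsmasq
```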
If necessary, run the following commands to stop and disable `dnsmasq`:
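For example, on systemd-based systems:

```bash
# Stop dnsmasq and prevent it from starting again at boot
sudo systemctl stop dnsmasq
sudo systemctl disable dnsmasq
```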
Browser requirements
Workbench supports the following web browsers:
- Chrome 39+
- Firefox 49+
- Safari 10+
The minimum browser screen size for using the platform is 800 pixels wide and 600 pixels high.
Verifying system requirements
The installer performs pre-installation checks, and only allows installation to continue on nodes that are configured correctly and include the required kernel modules. If you want to perform the system check yourself prior to installation, you can run the following commands from the installer directory, `~/anaconda-enterprise-<VERSION>`, on your intended master and worker nodes:
To perform system checks on the master node, run the following command as the sudo or root user:
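A minimal sketch, assuming the bundled `gravity` binary in the installer directory and a hypothetical `ae-master` node profile name (confirm the exact profile name with your Anaconda Implementation team):

```bash
cd ~/anaconda-enterprise-<VERSION>
# Run Gravity's pre-flight checks against the master node profile
sudo ./gravity check --profile ae-master
```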
To perform system checks on a worker node, run the following command as the sudo or root user:
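Likewise for a worker, assuming a hypothetical `ae-worker` profile name:

```bash
cd ~/anaconda-enterprise-<VERSION>
# Run Gravity's pre-flight checks against the worker node profile
sudo ./gravity check --profile ae-worker
```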
If all of the system checks pass and all requirements are met, the output from the above commands will be empty. If the system checks fail and some requirements are not met, the output will indicate which system checks failed.
Pre-installation checklist
Anaconda has created this pre-installation checklist to help you verify that you have properly prepared your environment prior to installation. You can run the system verification checks to automatically verify many of the requirements for you.
System verification checks are not comprehensive, so make sure you manually verify the remaining requirements.