Deploying RAPIDS in the Cloud


RAPIDS’ GPU-accelerated data science tools can be deployed on all of the major clouds, allowing anyone to take advantage of the speed increases and TCO reductions that RAPIDS enables.

RAPIDS can be deployed in a number of ways, from hosted Jupyter notebooks to the major HPO services, all the way up to large-scale clusters via Dask or Kubernetes. Deploying on the cloud requires supported GPU instances. Each major cloud provider offers GPU instances that are supported by RAPIDS, with varying capabilities and price points; the charts below identify the major instance types of each cloud.

Cloud Providers

For the various deployment options on each cloud, as well as instructions and links to more details, please select the cloud provider you wish to deploy on.


Amazon Web Services

RAPIDS can be deployed on Amazon Web Services (AWS) in several ways:

  • Single EC2 instance
  • Cluster via Dask
  • Cluster via Kubernetes (EKS)
  • Amazon SageMaker

Cloud Provider   Inst. Type   Inst. Name      GPU Count   GPU Type   RAM per GPU   Perf. per GPU
AWS              G4           g4dn.xlarge     1           T4         16 GB         8.1 TFLOPS
AWS              G4           g4dn.8xlarge    1           T4         16 GB         8.1 TFLOPS
AWS              G4           g4dn.12xlarge   4           T4         16 GB         8.1 TFLOPS
AWS              G4           g4dn.16xlarge   1           T4         16 GB         8.1 TFLOPS
AWS              G4           g4dn.metal      8           T4         16 GB         8.1 TFLOPS
AWS              P3           p3.2xlarge      1           V100       16 GB         14.1 TFLOPS
AWS              P3           p3.8xlarge      4           V100       16 GB         14.1 TFLOPS
AWS              P3           p3.16xlarge     8           V100       16 GB         14.1 TFLOPS
AWS              P3           p3dn.24xlarge   8           V100       32 GB         14.1 TFLOPS


AWS Single Instance (EC2)

There are multiple ways you can deploy RAPIDS on a single instance, but the easiest is to use the RAPIDS docker image:

1. Initiate. Launch an instance of a type supported by RAPIDS. See the introduction section for a list of supported instance types. It is recommended to use an AMI that already includes the required NVIDIA drivers, such as the Amazon Linux 2 AMI with NVIDIA TESLA GPU Driver or the AWS Deep Learning AMI.

2. Credentials. Using the credentials supplied by AWS, log into the instance via SSH. For a short guide on launching your instance and accessing it, read the Getting Started with Amazon EC2 documentation.

3. Install. Install docker on the instance. This step is not required if you are using the AWS Deep Learning AMI.

4. Install. Pull and run the RAPIDS docker image. The container can be customized using the options provided on the Getting Started page of RAPIDS. An example is provided below:

>>> docker pull rapidsai/rapidsai:cuda10.2-runtime-ubuntu18.04
>>> docker run --gpus all --rm -it -p 8888:8888 -p 8787:8787 -p 8786:8786 \
    rapidsai/rapidsai:cuda10.2-runtime-ubuntu18.04

5. Test RAPIDS. Test it! The RAPIDS docker image starts a Jupyter notebook server automatically. You can log into it by browsing to the IP address provided by AWS on port 8888.
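Once you are logged in, a quick way to confirm that the GPU is usable is to run a small cuDF computation in a notebook cell; a minimal sketch:

>>> import cudf
>>> s = cudf.Series([1, 2, 3])  # allocated on the GPU
>>> print(s.sum())
6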


AWS Cluster via Dask

RAPIDS can be deployed on ECS using Dask’s dask-cloudprovider management tools. For more details, see our blog post on deploying on ECS.

1. Setup AWS credentials. First, you will need AWS credentials so that the AWS CLI and dask-cloudprovider can act on your behalf. If someone else manages your AWS account, you will need to get these keys from them. You can provide the credentials to dask-cloudprovider in a number of ways, but the easiest is to set up your local environment using the AWS command line tools:

>>> pip install awscli
>>> aws configure

2. Install dask-cloudprovider. To install, you will need to run the following:

>>> pip install dask-cloudprovider

3. Create an ECS cluster: In the AWS console, visit the ECS dashboard. From the “Clusters” section on the left hand side, click “Create Cluster”.

Make sure to select an EC2 Linux + Networking cluster so that we can specify our networking options.

Give the cluster a name, e.g. rapids-cluster.

Change the instance type to one that includes a RAPIDS-supported GPU (see the introduction section for a list of supported instance types). For this example, we will use p3.2xlarge, which comes with one NVIDIA V100 GPU.

In the networking section, select the default VPC and all the subnets available in that VPC.

All other options can be left at defaults. You can now click “create” and wait for the cluster creation to complete.

4. Create a Dask cluster:

Get the Amazon Resource Name (ARN) for the cluster you just created.
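If you have boto3 installed, you can also look the ARN up programmatically; a minimal sketch, filtering by the example cluster name used above:

>>> import boto3
>>> ecs = boto3.client("ecs")
>>> [arn for arn in ecs.list_clusters()["clusterArns"] if "rapids-cluster" in arn]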

Set AWS_DEFAULT_REGION environment variable to your default region:

export AWS_DEFAULT_REGION=[REGION]

[REGION] = the code for the region being used.

Create the ECSCluster object in your Python session:

>>> from dask_cloudprovider import ECSCluster
>>> cluster = ECSCluster(
                            cluster_arn=[CLUSTER_ARN],
                            n_workers=[NUM_WORKERS],
                            worker_gpu=[NUM_GPUS],
                            fargate_scheduler=True
                         )

[CLUSTER_ARN] = The ARN of an existing ECS cluster to use for launching tasks.
[NUM_WORKERS] = Number of workers to start on cluster creation.
[NUM_GPUS] = The number of GPUs to expose to the worker.

5. Test RAPIDS. Create a distributed client for our cluster:

>>> from dask.distributed import Client
>>> client = Client(cluster)

Load sample data and test the cluster!

>>> import dask, cudf, dask_cudf
>>> ddf = dask.datasets.timeseries()
>>> gdf = ddf.map_partitions(cudf.from_pandas)
>>> gdf.groupby('name').id.count().compute().head()
Out[34]:
Xavier 99495
Oliver 100251
Charlie 99354
Zelda 99709
Alice 100106
Name: id, dtype: int64

6. Cleanup. Your cluster will continue to run (and incur charges!) until you shut it down. You can either scale the number of nodes down to zero instances, or shut it down altogether. If you are planning to use the cluster again soon, it is probably preferable to reduce the nodes to zero.
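If you prefer to wind down from Python, a minimal sketch using the cluster and client objects from the previous steps (ECSCluster follows the standard Dask cluster-manager API):

>>> cluster.scale(0)  # stop all workers but keep the scheduler
>>> client.close()    # disconnect the client
>>> cluster.close()   # tear down the remaining Dask resources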


AWS Cluster via Kubernetes

RAPIDS can be deployed on AWS via AWS’s managed Kubernetes service (EKS) using Helm. More details can be found at our helm docs.

1. Install. Install and configure dependencies in your local environment: kubectl, helm, awscli, and eksctl.

2. Public Key. Create an SSH public key if you don’t have one.

3. Create your cluster:

>>> eksctl create cluster \
    --name [CLUSTER_NAME] \
    --version 1.14 \
    --region [REGION] \
    --nodegroup-name gpu-workers \
    --node-type [NODE_INSTANCE] \
    --nodes  [NUM_NODES] \
    --nodes-min 1 \
    --nodes-max [MAX_NODES] \
    --node-volume-size [NODE_SIZE] \
    --ssh-access \
    --ssh-public-key ~/path/to/id_rsa.pub \
    --managed

[CLUSTER_NAME] = Name of the EKS cluster. This will be auto-generated if not specified.
[NODE_INSTANCE] = Node instance type to be used. Select one of the instance types supported by RAPIDS; refer to the introduction section for a list.
[NUM_NODES] = Number of nodes to be created.
[MAX_NODES] = Maximum number of nodes the node group can scale to.
[NODE_SIZE] = Node volume size, in GB.

Update --ssh-public-key to point to the file where your public key is saved.

4. Install GPU addon:

>>> kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/1.0.0-beta4/nvidia-device-plugin.yml

5. Install RAPIDS helm repo:

>>> helm repo add rapidsai https://helm.rapids.ai
>>> helm repo update

6. Install helm chart:

>>> helm install rapidstest rapidsai/rapidsai

7. Access your cluster:

>>> kubectl get svc
NAME                 TYPE           CLUSTER-IP       EXTERNAL-IP   PORT(S)                       AGE
kubernetes           ClusterIP      10.100.0.1       <none>        443/TCP                       14m
rapidsai-jupyter     LoadBalancer   10.100.208.179   1.2.3.4       80:32332/TCP                  3m30s
rapidsai-scheduler   LoadBalancer   10.100.19.121    5.6.7.8       8786:31779/TCP,80:32011/TCP   3m30s

You can now visit the external IP of the rapidsai-jupyter service in your browser!
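You can also connect a Dask client from your local machine to the rapidsai-scheduler service; a minimal sketch, substituting the scheduler’s EXTERNAL-IP from the output above:

>>> from dask.distributed import Client
>>> client = Client("tcp://5.6.7.8:8786")  # EXTERNAL-IP of rapidsai-scheduler
>>> print(client)                          # lists the scheduler and workers once connected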


AWS SageMaker

RAPIDS also works with AWS SageMaker. We’ve written a detailed guide with examples of how to use SageMaker with RAPIDS, but the simplest version is:

1. Start. Start a SageMaker-hosted Jupyter notebook instance on AWS.

2. Clone. Clone the example repository which includes all required setup and some example data and code.

3. Run. Run the sagemaker-rapids.ipynb Jupyter notebook.

For more details, including how to run large-scale HPO jobs on SageMaker with RAPIDS, check out the detailed guide and examples.
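As a taste of what the guide covers, a custom training job reduces to a SageMaker Estimator pointed at a RAPIDS container; a minimal sketch assuming the SageMaker Python SDK v2, with [RAPIDS_IMAGE_URI] and [SAGEMAKER_ROLE] as illustrative placeholders:

>>> from sagemaker.estimator import Estimator
>>> est = Estimator(
        image_uri=[RAPIDS_IMAGE_URI],   # ECR image with RAPIDS installed
        role=[SAGEMAKER_ROLE],          # IAM role SageMaker assumes for the job
        instance_count=1,
        instance_type="ml.p3.2xlarge"   # one NVIDIA V100
    )
>>> est.fit()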



Microsoft Azure

RAPIDS can be deployed on Microsoft Azure via several methods:

  • Single instance (VM)
  • Cluster via Dask
  • Cluster via Kubernetes
  • Azure’s AzureML service

Cloud Provider   Inst. Type      Inst. Name   GPU Count   GPU Type   RAM per GPU   Perf. per GPU
Azure            NDs Series      ND6s         1           P40        24 GB         11.7 TFLOPS
Azure            NDs Series      ND12s        2           P40        24 GB         11.7 TFLOPS
Azure            NDs Series      ND24s        4           P40        24 GB         11.7 TFLOPS
Azure            NDs Series      ND24rs       4           P40        24 GB         11.7 TFLOPS
Azure            NCs v2 Series   NC6s v2      1           P100       16 GB         10.6 TFLOPS
Azure            NCs v2 Series   NC12s v2     2           P100       16 GB         10.6 TFLOPS
Azure            NCs v2 Series   NC24s v2     4           P100       16 GB         10.6 TFLOPS
Azure            NCs v2 Series   NC24rs v2    4           P100       16 GB         10.6 TFLOPS
Azure            NCs v3 Series   NC6s v3      1           V100       16 GB         14.1 TFLOPS
Azure            NCs v3 Series   NC12s v3     2           V100       16 GB         14.1 TFLOPS
Azure            NCs v3 Series   NC24s v3     4           V100       16 GB         14.1 TFLOPS
Azure            NCs v3 Series   NC24rs v3    4           V100       16 GB         14.1 TFLOPS
Azure            NDs v2 Series   ND40rs       8           V100       32 GB         14.1 TFLOPS


Azure Single Instance (VM)

There are multiple ways you can deploy RAPIDS on a single VM instance, but the easiest is to use the RAPIDS docker image:

1. Initiate VM. Launch a VM instance of a type supported by RAPIDS. See the introduction section for a list of supported instance types. It is recommended to use an image that already includes the required NVIDIA drivers, such as this one.

2. Credentials. Using the credentials supplied by Azure, log into the instance via SSH.

3. Docker Permissions. Set up docker user permissions.

4. Install. Pull and run the RAPIDS docker image. The container can be customized using the options provided on the Getting Started page of RAPIDS. An example is provided below:

>>> docker pull rapidsai/rapidsai:cuda10.2-runtime-ubuntu18.04
>>> docker run --gpus all --rm -it -p 8888:8888 -p 8787:8787 -p 8786:8786 \
    rapidsai/rapidsai:cuda10.2-runtime-ubuntu18.04

5. Test RAPIDS. Test it! The RAPIDS docker image starts a Jupyter notebook server automatically. You can log into it by browsing to the IP address provided by Azure on port 8888.
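As on AWS, a quick sanity check from a notebook cell confirms the GPU is usable; a minimal sketch:

>>> import cudf
>>> print(cudf.DataFrame({"a": [1, 2, 3]}).a.mean())
2.0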


Azure Cluster via Dask

RAPIDS can be deployed on a Dask cluster on Azure ML Compute using dask-cloudprovider.

1. Install. Install Azure tools (azure-cli).

2. Install dask-cloudprovider:

>>> pip install dask-cloudprovider

3. Config. Create your workspace config file (see the Azure docs for details).

4. Setup. Set up your Azure ML Workspace using the config file created in the previous step:

>>> from azureml.core import Workspace
>>> ws = Workspace.from_config()

5. Create the AzureMLCluster:

>>> from dask_cloudprovider import AzureMLCluster
>>> cluster = AzureMLCluster(ws)

6. Run Notebook. In a Jupyter notebook, the cluster object returns a widget that allows you to scale the cluster up and contains links to the JupyterLab session running on the headnode and to the Dask dashboard; these are forwarded to local ports for you, unless you are running on a remote Compute Instance.
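You can also drive the cluster from code rather than the widget; a minimal sketch:

>>> from dask.distributed import Client
>>> client = Client(cluster)           # connect to the AzureML-backed cluster
>>> client.get_versions(check=False)   # report package versions on scheduler and workers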


Azure Cluster via Kubernetes

RAPIDS can be deployed on a Kubernetes cluster on Azure using Helm. More details can be found at our helm docs.

1. Install. Install and configure dependencies on your local environment: kubectl, helm, and az (azure-cli).

2. Configure. Configure az and create a resource group if you don’t already have one.

>>> az login
>>> az group create --name [RESOURCE_GROUP] --location [REGION]

[RESOURCE_GROUP] = resource group to be created.
[REGION] = the location where the resource group should be created.

3. Create your cluster:

>>> az aks create \
    --resource-group [RESOURCE_GROUP] \
    --name [CLUSTER_NAME] \
    --node-vm-size [VM_SIZE] \
    --node-count [NUM_NODES]

[CLUSTER_NAME] = Name of the managed cluster.
[NUM_NODES] = Number of nodes in the Kubernetes node pool.
[VM_SIZE] = the size of the VM you would like to target. This must include a RAPIDS-compatible GPU. Ex. Standard_NC12

Please refer to the Microsoft Azure CLI documentation for more information.

4. Update your local kubectl config file:

>>> az aks get-credentials --resource-group [RESOURCE_GROUP] --name [CLUSTER_NAME]

5. Install. Install Kubernetes NVIDIA Device Plugin:

>>> helm repo add nvdp https://nvidia.github.io/k8s-device-plugin
>>> helm repo update
>>> helm install \
    --version=0.6.0 \
    --generate-name \
    nvdp/nvidia-device-plugin

6. Install RAPIDS helm repo:

>>> helm repo add rapidsai https://helm.rapids.ai
>>> helm repo update

7. Install helm chart:

>>> helm install rapidstest rapidsai/rapidsai

8. Access your cluster:

>>> kubectl get svc
NAME                 TYPE           CLUSTER-IP       EXTERNAL-IP   PORT(S)                       AGE
kubernetes           ClusterIP      10.100.0.1       <none>        443/TCP                       14m
rapidsai-jupyter     LoadBalancer   10.100.208.179   1.2.3.4       80:32332/TCP                  3m30s
rapidsai-scheduler   LoadBalancer   10.100.19.121    5.6.7.8       8786:31779/TCP,80:32011/TCP   3m30s

You can now visit the external IP of the rapidsai-jupyter service in your browser!


AzureML Service

RAPIDS can be deployed at scale using the Azure Machine Learning Service, and easily scales up to any size needed. We have written a detailed guide with helper scripts to get everything deployed, but the high-level procedure is:

1. Create. Create your Azure Resource Group.

2. Workspace. Within the Resource Group, create an Azure Machine Learning service Workspace.

3. Config. Within the Workspace, download the config.json file and verify that subscription_id, resource_group, and workspace_name are set correctly for your environment.

4. Quota. Within your Workspace, check your Usage + Quota to ensure you have enough quota to launch your desired cluster size.

5. Clone. From your local machine, clone the RAPIDS demonstration code and helper scripts.

6. Run Utility. Run the RAPIDS helper utility script to initialize the Azure Machine Learning service Workspace:

>>> ./start_azureml.py \
 --config=[CONFIG_PATH] \
 --vm_size=[VM_SIZE] \
 --node_count=[NUM_NODES]

[CONFIG_PATH] = the path to the config file you downloaded in step three.

7. Start. Open your browser to http://localhost:8888 and get started!

See the guide or GitHub for more details.



Google Cloud

RAPIDS can be used in Google Cloud in several different ways:

  • Single instance
  • Cluster via Dask (Dataproc)
  • Cluster via Kubernetes
  • On CloudAI

Cloud Provider   Inst. Type                   Inst. Name         GPU Count    GPU Type   RAM per GPU   Perf. per GPU
Google Cloud     GPU Compute Workload Addon   Any Machine Type   As desired   P4         8 GB          5.5 TFLOPS
Google Cloud     GPU Compute Workload Addon   Any Machine Type   As desired   P100       16 GB         10.6 TFLOPS
Google Cloud     GPU Compute Workload Addon   Any Machine Type   As desired   T4         16 GB         8.1 TFLOPS
Google Cloud     GPU Compute Workload Addon   Any Machine Type   As desired   V100       16 GB         14.1 TFLOPS
Google Cloud     A2                           TBD (in beta)      As desired   A100       40 GB         19.5 TFLOPS


Google Single Instance

RAPIDS can be deployed on Google Cloud as a single instance:

1. Create. Create a Project in your Google Cloud account.

2. Launch VM. Launch a VM with a RAPIDS-supported GPU; see the introduction section for a list of supported GPUs. We recommend using an image that already includes prerequisites such as drivers and docker, for example the NVIDIA GPU-Optimized Image for Deep Learning, ML & HPC VM image.

3. Drivers. Enter Y (Yes) when asked if you would like to download the latest NVIDIA drivers.

4. Permissions. Set up Docker user permissions.

5. Install. Pull and run the RAPIDS docker image. The container can be customized using the options provided on the Getting Started page of RAPIDS. An example is provided below:

>>> docker pull rapidsai/rapidsai:cuda10.2-runtime-ubuntu18.04-py3.7
>>> docker run --gpus all --rm -it -p 8888:8888 -p 8787:8787 -p 8786:8786 \
    rapidsai/rapidsai:cuda10.2-runtime-ubuntu18.04-py3.7

6. Test RAPIDS. The above command starts your docker container. To test the container, start a Python interpreter and import any one of the RAPIDS libraries, as shown below.
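For example, a minimal check from a Python interpreter inside the container:

>>> import cudf, cuml
>>> print(cudf.__version__, cuml.__version__)  # confirms the RAPIDS libraries are importable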


Google Cluster via Dask (Dataproc)

RAPIDS can be deployed on Google Cloud Dataproc using Dask. We have helper scripts and detailed instructions to help.

1. Create Dataproc cluster with Dask RAPIDS. Use the gcloud command to create a new cluster with the initialization actions below. Because an Anaconda version conflict makes script deployment slow on older images, we recommend using Dask with Dataproc 2.0+.

>>> export GCS_BUCKET=[BUCKET_NAME]
>>> export CLUSTER_NAME=[CLUSTER_NAME]
>>> export REGION=[REGION]
>>> export DASK_RUNTIME=[DASK_RUNTIME]
>>> gcloud dataproc clusters create $CLUSTER_NAME \
    --region $REGION \
    --image-version preview-ubuntu18 \
    --master-machine-type n1-standard-32 \
    --master-accelerator type=nvidia-tesla-t4,count=2 \
    --worker-machine-type n1-standard-32 \
    --worker-accelerator type=nvidia-tesla-t4,count=2 \
    --optional-components=ANACONDA \
    --initialization-actions gs://goog-dataproc-initialization-actions-${REGION}/gpu/install_gpu_driver.sh,gs://goog-dataproc-initialization-actions-${REGION}/dask/dask.sh,gs://goog-dataproc-initialization-actions-${REGION}/rapids/rapids.sh \
    --initialization-action-timeout=60m \
    --metadata gpu-driver-provider=NVIDIA,dask-runtime=${DASK_RUNTIME},rapids-runtime=DASK \
    --enable-component-gateway

[BUCKET_NAME] = name of the bucket to use.
[CLUSTER_NAME] = name of the cluster.
[REGION] = name of region where cluster is to be created.
[DASK_RUNTIME] = the Dask runtime; set to either yarn or standalone.

2. Run Dask RAPIDS Workload. Once the cluster has been created, the Dask scheduler listens for workers on port 8786, and its status dashboard is on port 8787 on the Dataproc master node. To connect to the Dask web interface, you will need to create an SSH tunnel as described in the Dataproc web interfaces documentation. You can also connect using the Dask Client Python API from a Jupyter notebook, or from a Python script or interpreter session.
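For example, with an SSH tunnel forwarding the scheduler’s port 8786 to your machine, a minimal connection sketch:

>>> from dask.distributed import Client
>>> client = Client("localhost:8786")            # through the SSH tunnel to the master node
>>> client.submit(lambda x: x + 1, 10).result()  # quick round-trip test
11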

For more, see our detailed instructions and helper scripts.


Google Cluster via Kubernetes

RAPIDS can be deployed in a Kubernetes cluster on GCP. For more information, see the detailed instructions and helm charts.

1. Install. Install and configure dependencies in your local environment: kubectl, helm, gcloud.

2. Configure cloud:

>>> gcloud init

3. Set your default compute zone:

>>> gcloud config set compute/zone [REGION]

4. Create the cluster:

>>> gcloud container clusters create \
    rapids \
    --machine-type n1-standard-4 \
    --accelerator type=[GPU_TYPE],count=[NUM_GPU] \
    --region [REGION] \
    --node-locations [NODE_REGION] \
    --num-nodes [NUM_NODES] \
    --min-nodes 0 \
    --max-nodes [MAX_NODES] \
    --enable-autoscaling

[GPU_TYPE] = the type of GPU. See the introduction section for a list of supported GPU types. Ex. nvidia-tesla-v100.
[NUM_GPU] = the number of GPUs.
[NODE_REGION] = The node locations to be used in the default regions. Ex. us-west1-b
[NUM_NODES] = number of nodes to be created in each of the cluster’s zones.
[MAX_NODES] = Maximum number of nodes to which the node pool specified by --node-pool (or default node pool if unspecified) can scale.

Example:

>>> gcloud container clusters create \
    rapids \
    --machine-type n1-standard-4 \
    --accelerator type=nvidia-tesla-v100,count=2 \
    --region us-west1 \
    --node-locations us-west1-a,us-west1-b \
    --num-nodes 1 \
    --min-nodes 0 \
    --max-nodes 4 \
    --enable-autoscaling

5. Update local kubectl:

>>> gcloud container clusters get-credentials rapids

6. Install the kubectl GPU add-on:

>>> kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/master/nvidia-driver-installer/cos/daemonset-preloaded.yaml

7. Install RAPIDS helm repo:

>>> helm repo add rapidsai https://helm.rapids.ai
>>> helm repo update

8. Install the helm chart:

>>> helm install rapidstest rapidsai/rapidsai

9. Access your cluster:

>>> kubectl get svc
NAME                 TYPE           CLUSTER-IP       EXTERNAL-IP   PORT(S)                       AGE
kubernetes           ClusterIP      10.100.0.1       <none>        443/TCP                       14m
rapidsai-jupyter     LoadBalancer   10.100.208.179   1.2.3.4       80:32332/TCP                  3m30s
rapidsai-scheduler   LoadBalancer   10.100.19.121    5.6.7.8       8786:31779/TCP,80:32011/TCP   3m30s

To run notebooks on jupyter in your browser, visit the external IP of rapidsai-jupyter.
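You can also attach a Dask client to the rapidsai-scheduler EXTERNAL-IP and confirm that every worker can use RAPIDS; a minimal sketch:

>>> from dask.distributed import Client
>>> client = Client("tcp://5.6.7.8:8786")               # EXTERNAL-IP of rapidsai-scheduler
>>> client.run(lambda: __import__("cudf").__version__)  # returns one version string per worker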


Google CloudAI

RAPIDS can be deployed on Google’s CloudAI platform. This deployment can range from a simple pre-made notebook (instructions below!) all the way up to a custom training container and HPO job. For more, see our detailed instructions and helper scripts.

1. Login. Log into your GCP console.

2. Select. Select AI-Platform, then Notebooks.

3. Create and Run. Select “Create new notebook” and choose the RAPIDS XGBoost variant (which comes with Conda installed):

  • Select ‘install gpu driver for me’
  • Select ‘customize’
  • Pick the CUDA variant you want (10.1, 10.0, etc.)
  • Select a GPU type
  • Select the number of GPUs
  • Launch your notebook service

4. Run Script. Once JupyterLab is running:

  • Open a new terminal
  • Copy the rapids-py37-kernel.sh GCP script into the local environment
  • Run the script
  • Once it completes, you will have a new Jupyter kernel called rapids_py37 with RAPIDS installed; a quick check is shown below
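A quick check from a notebook cell running on the new kernel; a minimal sketch:

>>> import cudf             # available in the rapids_py37 kernel
>>> print(cudf.__version__)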

For more details, or for other ways to deploy on Google CloudAI, see the detailed instructions and helper scripts.

