Get RAPIDS Now

Getting Started

The RAPIDS data science framework is a collection of libraries for executing end-to-end data science pipelines entirely on GPUs.

GPU Accelerated Data Science

RAPIDS uses optimized NVIDIA CUDA® primitives and high-bandwidth GPU memory to accelerate data preparation and machine learning. The goal of RAPIDS is not only to accelerate the individual parts of the typical data science workflow, but to accelerate the complete end-to-end workflow.

It is designed to have a familiar look and feel to data scientists working in Python. Here’s a code snippet where we read in a CSV file and output some descriptive statistics:

import cudf  # GPU DataFrame library with a pandas-like API

# Read the CSV file directly into GPU memory
df = cudf.read_csv('path/to/file.csv')

# Print the mean of each column, computed on the GPU
for column in df.columns:
    print(df[column].mean())
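Because cuDF mirrors the pandas API, the same calls work unchanged in pandas. The sketch below uses pandas so it runs without a GPU; on a RAPIDS system you would swap `pd` for `cudf`. The inline CSV data here is made up for illustration:

```python
import io
import pandas as pd  # cudf.read_csv and cudf.DataFrame mirror this API

# Inline CSV stands in for 'path/to/file.csv' so the example is self-contained
csv_data = io.StringIO("a,b\n1,10\n2,20\n3,30\n")
df = pd.read_csv(csv_data)

# Same per-column loop as the cuDF snippet above
means = {column: df[column].mean() for column in df.columns}
print(means)  # {'a': 2.0, 'b': 20.0}
```

On a RAPIDS install, replacing the `import` with `import cudf as pd` runs the identical code on the GPU.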

Test Drive RAPIDS Now

Jump right into a GPU-powered RAPIDS notebook, online, with either SageMaker Studio Lab or Colab (currently only supports RAPIDS v21.12):

Studio Lab

Colab

Installation Overview

In four steps, easily install RAPIDS on a local system or cloud instance with a CUDA enabled GPU for either Conda or Docker and then explore our user guides and examples.

Step 1: Provision A System

  • Check system requirements
  • Choose a cloud or local system

Step 2: Install Environment

  • Choose to use Conda or Docker
  • Choose to Build from source

Step 3: Install RAPIDS

  • Select and install RAPIDS libraries

Step 4: Learn More

  • Check out examples and user guides

Step 1: Provision A System

System Requirements

All provisioned systems need to be RAPIDS capable. Here’s what is required:

GPU: NVIDIA Pascal™ or better with compute capability 6.0+

OS: Ubuntu 18.04/20.04 or CentOS 7/8 with gcc/g++ 9.0+

CUDA & NVIDIA Drivers: one of the supported CUDA versions with its matching NVIDIA driver (see the release selector below for the current list)
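One way to verify the compute-capability requirement is to query `nvidia-smi`. This is a minimal sketch, assuming a driver recent enough that `nvidia-smi` supports the `compute_cap` query field; the helper functions are ours for illustration, not part of any RAPIDS tooling:

```python
import subprocess

MIN_COMPUTE_CAP = 6.0  # Pascal or newer, per the RAPIDS requirements


def meets_requirement(compute_cap: str, minimum: float = MIN_COMPUTE_CAP) -> bool:
    """True if a compute-capability string like '7.5' meets the minimum."""
    return float(compute_cap.strip()) >= minimum


def gpu_compute_caps() -> list:
    """Ask nvidia-smi for each GPU's compute capability.

    Requires a driver recent enough to support the compute_cap query field.
    """
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=compute_cap", "--format=csv,noheader"],
        text=True,
    )
    return [line.strip() for line in out.splitlines() if line.strip()]


# Checked against hardcoded values; call gpu_compute_caps() on a real system:
print(meets_requirement("7.5"))  # True  -> Turing is RAPIDS-capable
print(meets_requirement("5.2"))  # False -> Maxwell is not
```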

RAPIDS Cloud Systems

Learn how to deploy RAPIDS on
Cloud Service Providers

AWS

Azure ML

GCP

Paperspace

RAPIDS Local Systems

Aside from the system requirements, other considerations for best performance include:

  • SSD drive (NVMe preferred)
  • Approximately a 2:1 ratio of host RAM to total GPU memory (especially useful for Dask)
  • NVLink when using 2 or more GPUs
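The 2:1 host-RAM guideline above can be turned into a quick sizing check. A minimal sketch; the helper function and the example GPU sizes are ours, for illustration only:

```python
def recommended_host_ram_gb(total_gpu_mem_gb: float, ratio: float = 2.0) -> float:
    """Suggested host RAM given total GPU memory, using the ~2:1 rule of thumb."""
    return total_gpu_mem_gb * ratio


# e.g. two 24 GB GPUs -> 48 GB total GPU memory
print(recommended_host_ram_gb(48))  # 96.0
```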

We suggest taking a look at the sample workflow in our Docker container, which illustrates how straightforwardly a basic XGBoost model training and testing workflow runs with RAPIDS.

Step 2: Install Environment

For most installations, you will need a Conda or Docker environment installed for RAPIDS. Note: these examples are for Ubuntu; modify them as appropriate for CentOS or RHEL.

Conda

RAPIDS can use either a minimal conda installation with Miniconda or a full installation of Anaconda. Below is a quick installation guide using Miniconda.

1. Download and Run the Install Script. Copy the command below to download and run the Miniconda install script:

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh

2. Customize Conda and Run the Install. Use the terminal window to finish the installation. Note: we recommend enabling `conda init`.

3. Start Conda. Open a new terminal window, which should now show Conda initialized.

Build From Source

To build RAPIDS from source, check each library's README. For example, the cuDF README has details on source environment setup and build instructions. Further links are provided in the selector tool. If additional help is needed, reach out on our Slack channel.

Where is PIP?

Refer to this blog post for details on why pip is not currently supported. pip support may be added in future releases.

Docker

RAPIDS requires both Docker CE v19.03+ and nvidia-container-toolkit installed.

1. Download and Install. Copy the command below to download and install the latest Docker CE:

curl https://get.docker.com | sh

2. Install the Latest NVIDIA Docker. The following example is for Ubuntu:

distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
curl -s -L https://nvidia.github.io/libnvidia-container/experimental/$distribution/libnvidia-container-experimental.list | sudo tee /etc/apt/sources.list.d/libnvidia-container-experimental.list
sudo apt-get update
sudo apt-get install -y nvidia-docker2

3. Start Docker. In a new terminal window, run:

sudo service docker stop
sudo service docker start

4a. Test NVIDIA Docker:

docker run --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark

4b. Legacy Docker Users. Docker CE v18 and nvidia-docker2 users need the legacy runtime flag for compatibility: replace `docker run --gpus all` with `docker run --runtime=nvidia`.

Step 3: Install RAPIDS

RAPIDS is available in conda packages, docker images, and from source builds. Use the tool below to select your preferred method, packages, and environment to install RAPIDS. Certain combinations may not be possible and are dimmed automatically. Be sure you’ve met the required Prerequisites above and see the Next Steps below.
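For reference, the selector emits a single conda command. The sketch below shows the general shape of that command using the v21.12 release mentioned above; treat the channel list and version pins as placeholders and copy the exact command from the selector (this is a config fragment, not something to run verbatim):

```shell
# Illustrative only -- generate the exact command with the release selector.
# Creates a new environment with the RAPIDS metapackage from the rapidsai channel.
conda create -n rapids-21.12 -c rapidsai -c nvidia -c conda-forge \
    rapids=21.12 python=3.8 cudatoolkit=11.5
```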

Release Selector

Release
Arch
CUDA
Python
Method
Packages
Additional Packages
Image OS
Image Type
Image Options
Command

Step 4: Learn More

Once installation has been successful, explore the capabilities of RAPIDS with the provided notebooks, tutorials, and guides below.

RAPIDS on Conda

Get Example Notebooks

1. Install JupyterLab, if it or Jupyter Notebook is not already installed.

2. Get Notebooks. See links to the RAPIDS Notebooks and Community Notebooks below.

3. Run RAPIDS. Use Python directly or start JupyterLab as below:

jupyter-lab --allow-root --ip='0.0.0.0' --NotebookApp.token='**your token**'

4. Check out the RAPIDS tutorials and workflows examples.

5. Explore. See our integrations or install other favorite Data Science or Machine Learning libraries.

RAPIDS User Guide Repositories

Go to RAPIDS Notebooks or clone directly:

git clone https://github.com/rapidsai/notebooks.git
git submodule update --init --remote --no-single-branch --depth 1

Go To RAPIDS Community Notebooks or clone directly:

git clone https://github.com/rapidsai-community/notebooks-contrib.git

Go To Cloud ML Notebooks or clone directly:

git clone https://github.com/rapidsai/cloud-ml-examples.git

RAPIDS on Docker

Running a Multi-Node / Multi-GPU (MNMG) Environment

To start the container in an MNMG environment:

docker run -t -d --gpus all --shm-size=1g --ulimit memlock=-1 -v $PWD:/ws <container label>

The standard docker command may be sufficient, but the additional arguments ensure more stability. See the NCCL docs and UCX docs for more details on MNMG usage.

Start / Stop Jupyter Lab Notebooks

Either the standard single GPU or the modified MNMG Docker command above should auto-run a Jupyter Lab Notebook server. If it does not, or a restart is needed, run the following command within the Docker container to launch the notebook server:

bash /rapids/utils/start-jupyter.sh

If, for whatever reason, you need to shut down the Jupyter Lab server, use:

bash /rapids/utils/stop-jupyter.sh

NOTE: By default, JupyterLab runs on your host machine at port 8888.

Explore RAPIDS Demo Notebooks

RAPIDS demo notebooks can be found in the notebooks directory:

/rapids/notebooks/cuml (Machine Learning Algorithms)

/rapids/notebooks/cugraph (Graph Analytics)

/rapids/notebooks/cuspatial (Spatial Analytics)

/rapids/notebooks/cusignal (Signal Analytics)

/rapids/notebooks/clx (Cyber Security Log Analytics)

/rapids/notebooks/xgboost (XGBoost)

You can find more RAPIDS tutorials and workflow examples by cloning the RAPIDS Community Notebooks.

Advanced Usage

See the RAPIDS Container README for more information about using custom datasets. Docker Hub and NVIDIA GPU Cloud host RAPIDS containers, with a full list of available tags.