The RAPIDS data science framework is a collection of libraries for executing end-to-end data science pipelines entirely on the GPU.
RAPIDS uses optimized NVIDIA CUDA® primitives and high-bandwidth GPU memory to accelerate data preparation and machine learning. The goal of RAPIDS is not only to accelerate the individual parts of the typical data science workflow, but to accelerate the complete end-to-end workflow.
It is designed to have a familiar look and feel to data scientists working in Python. Here’s a code snippet where we read in a CSV file and output some descriptive statistics:
import cudf
df = cudf.read_csv('path/to/file.csv')
for column in df.columns:
    print(df[column].mean())
Jump right into a GPU-powered RAPIDS notebook online with SageMaker Studio Lab (free account required).
In four steps, easily install RAPIDS on a local system or cloud instance with a CUDA-enabled GPU for either Conda or Docker, and then explore our user guides and examples. Pip packages are also available here with experimental access!
All provisioned systems need to be RAPIDS-capable. Here's what is required:
GPU: NVIDIA Pascal™ or better with compute capability 6.0+ (a quick way to check is sketched just after this list)
OS: One of the following OS versions:
Ubuntu 18.04/20.04 or CentOS 7 / Rocky Linux 8 with gcc/g++ 9.0+
Windows 11 using WSL2 (see the separate install guide)
RHEL 7/8 support is provided through the CentOS 7 / Rocky Linux 8 builds/installs
CUDA & NVIDIA Drivers: a supported CUDA toolkit version with a compatible NVIDIA driver (see the release selector below for the current list)
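If you are unsure of your GPU's compute capability, one quick way to check from Python is via Numba, which RAPIDS environments ship with. This is an illustrative sketch, not part of the official install steps, and it assumes a working NVIDIA driver:

# Quick check that the GPU meets RAPIDS' compute capability requirement (6.0+).
# Assumes the NVIDIA driver is installed and Numba is available.
from numba import cuda

major, minor = cuda.get_current_device().compute_capability
print(f"Compute capability: {major}.{minor}")
assert (major, minor) >= (6, 0), "RAPIDS requires compute capability 6.0 or higher"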
Aside from the system requirements, other hardware and configuration choices also affect performance.
We suggest taking a look at the sample workflow in our Docker container, which shows how straightforward it is to run a basic XGBoost model training and testing workflow with RAPIDS.
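For a flavor of what that workflow looks like, here is a minimal, illustrative sketch of GPU-accelerated XGBoost training on a cuDF DataFrame. The data, column names, and parameters are hypothetical and are not the container's actual sample notebook:

# Illustrative sketch: training XGBoost on a cuDF DataFrame with GPU acceleration.
# The data and parameters are hypothetical, not the container's sample workflow.
import cudf
import xgboost as xgb

df = cudf.DataFrame({"feature": [0.1, 0.4, 0.6, 0.9], "label": [0, 0, 1, 1]})
dtrain = xgb.DMatrix(df[["feature"]], label=df["label"])
# Newer XGBoost releases use device="cuda" with tree_method="hist" instead of "gpu_hist".
params = {"objective": "binary:logistic", "tree_method": "gpu_hist"}
booster = xgb.train(params, dtrain, num_boost_round=10)
print(booster.predict(dtrain))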
For most installations, you will need a Conda or Docker environment installed for RAPIDS. Note that these examples are structured for installing on Ubuntu. Please modify them appropriately for CentOS / Rocky Linux. Windows 11 has a separate installation guide.
RAPIDS can use either a minimal conda installation with Miniconda or a full installation of Anaconda. Below is a quick installation guide using Miniconda.
1. Download and Run Install Script. Copy the command below to download and run the Miniconda install script:
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
2. Customize Conda and Run the Install. Use the terminal window to finish the installation. Note, we recommend enabling conda init.
3. Start Conda. Open a new terminal window, which should now show Conda initialized.
To build RAPIDS from source, check each library's README. For example, the cuDF README has details for source environment setup and build instructions. Further links are provided in the selector tool. If additional help is needed, reach out on our Slack channel.
Pip installation of RAPIDS is back! You can try our experimental pip packages here.
RAPIDS requires both Docker CE v19.03+ and the nvidia-container-toolkit to be installed.
1. Download and Install. Copy the command below to download and install the latest Docker CE Edition:
curl https://get.docker.com | sh
2. Install Latest NVIDIA Docker. For example, on Ubuntu:
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
curl -s -L https://nvidia.github.io/libnvidia-container/experimental/$distribution/libnvidia-container-experimental.list | sudo tee /etc/apt/sources.list.d/libnvidia-container-experimental.list
sudo apt-get update
sudo apt-get install -y nvidia-docker2
3. Start Docker. In a new terminal window, run:
sudo service docker stop
sudo service docker start
4a. Test NVIDIA Docker:
docker run --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark
4b. Legacy Docker Users. Docker CE v18 & nvidia-docker2 users will need to replace 'docker run --gpus all' with 'docker run --runtime=nvidia' for compatibility.
RAPIDS is available in conda packages, docker images, and from source builds. Use the tool below to select your preferred method, packages, and environment to install RAPIDS. Certain combinations may not be possible and are dimmed automatically. Be sure you’ve met the System Requirements above and see the Next Steps below.
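Whichever method you choose, a quick sanity check from Python confirms that RAPIDS is installed and can see the GPU (a minimal sketch):

# Minimal sanity check that cuDF is installed and can use the GPU.
import cudf

print(cudf.__version__)              # installed cuDF / RAPIDS version
print(cudf.Series([1, 2, 3]).sum())  # a trivial GPU computation; expect 6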
Once installation has been successful, explore the capabilities of RAPIDS with the provided notebooks, tutorials, and guides below.
1. Install Jupyter Lab, if it or Jupyter Notebook is not already installed.
2. Get Notebooks. See links to the RAPIDS Notebooks and Community Notebooks below.
3. Run RAPIDS. Use Python directly or start JupyterLab as below:
jupyter-lab --allow-root --ip='0.0.0.0' --NotebookApp.token='**your token**'
4. Check out the RAPIDS tutorials and workflows examples.
5. Explore. See our integrations or install other favorite Data Science or Machine Learning libraries.
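As a taste of the wider ecosystem, cuML mirrors the scikit-learn API. The snippet below is a minimal, illustrative sketch (the data is made up):

# Illustrative cuML sketch: a scikit-learn-style KMeans fit on GPU data.
import cudf
from cuml.cluster import KMeans

df = cudf.DataFrame({"x": [1.0, 1.1, 8.0, 8.2], "y": [0.9, 1.2, 7.9, 8.1]})
km = KMeans(n_clusters=2)
km.fit(df)
print(km.labels_)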
Go to RAPIDS Notebooks or clone directly:
git clone https://github.com/rapidsai/notebooks.git
git submodule update --init --remote --no-single-branch --depth 1
Go to RAPIDS Community Notebooks or clone directly:
git clone https://github.com/rapidsai-community/notebooks-contrib.git
Go to Cloud ML Notebooks or clone directly:
git clone https://github.com/rapidsai/cloud-ml-examples.git
To start the container in a multi-node multi-GPU (MNMG) environment:
docker run -t -d --gpus all --shm-size=1g --ulimit memlock=-1 -v $PWD:/ws <container label>
The standard docker command may be sufficient, but the additional arguments ensure more stability. See the NCCL docs and UCX docs for more details on MNMG usage.
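Inside such a container, multi-GPU work is typically driven through Dask. The following is a minimal sketch using dask-cuda and dask-cudf; the CSV path is a placeholder:

# Minimal multi-GPU sketch with dask-cuda: one worker per visible GPU.
# The CSV path is a placeholder.
import dask_cudf
from dask.distributed import Client
from dask_cuda import LocalCUDACluster

cluster = LocalCUDACluster()  # spins up one Dask worker per visible GPU
client = Client(cluster)

ddf = dask_cudf.read_csv("path/to/*.csv")
print(ddf.head())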
Either the standard single-GPU or the modified MNMG Docker command above should auto-run a JupyterLab notebook server. If it does not, or a restart is needed, run the following command within the Docker container to launch the notebook server:
bash /rapids/utils/start-jupyter.sh
If, for whatever reason, you need to shut down the Jupyter Lab server, use:
bash /rapids/utils/stop-jupyter.sh
NOTE: By default, JupyterLab runs on your host machine at port 8888.
RAPIDS demo notebooks can be found in the notebooks directory:
/rapids/notebooks/cuml (Machine Learning Algorithms)
/rapids/notebooks/cugraph (Graph Analytics)
/rapids/notebooks/cuspatial (Spatial Analytics)
/rapids/notebooks/cusignal (Signal Analytics)
/rapids/notebooks/clx (Cyber Security Log Analytics)
/rapids/notebooks/xgboost (XGBoost)
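For a taste of what the cuGraph notebooks cover, here is a minimal, illustrative PageRank sketch (the edge list is made up):

# Illustrative cuGraph sketch: PageRank on a tiny hand-made edge list.
import cudf
import cugraph

edges = cudf.DataFrame({"src": [0, 1, 2, 2], "dst": [1, 2, 0, 3]})
G = cugraph.Graph(directed=True)
G.from_cudf_edgelist(edges, source="src", destination="dst")
print(cugraph.pagerank(G))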
You can get more RAPIDS tutorials and workflow examples by git cloning the RAPIDS Community Notebooks.
See the RAPIDS Container README for more information about using custom datasets. Docker Hub and NVIDIA GPU Cloud host RAPIDS containers with a full list of available tags.