Learn how to use RAPIDS


Introduction to RAPIDS

The RAPIDS data science framework is a collection of libraries for executing end-to-end data science pipelines entirely on the GPU. It is designed to look and feel familiar to data scientists working in Python. Here’s a code snippet that reads a CSV file and prints the mean of each column.
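import cudf

# Read the CSV file into a cuDF DataFrame held in GPU memory
gdf = cudf.read_csv('path/to/file.csv')
# Print the mean of each column (assumes every column is numeric)
for column in gdf.columns:
    print(gdf[column].mean())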

RAPIDS uses optimized NVIDIA® CUDA® primitives and high-bandwidth GPU memory to accelerate data preparation and machine learning. The goal of RAPIDS is not only to accelerate the individual parts of the typical data science workflow, but to accelerate the complete end-to-end workflow.

We suggest looking at the sample workflow in our Docker container (described below), which shows how straightforward it is to train and test a basic XGBoost model in RAPIDS.

Get RAPIDS

RAPIDS is available as conda packages, pip packages, Docker images, and source builds. Use the tool below to select your preferred method, packages, and environment to install RAPIDS. Combinations that are not possible are dimmed automatically. Be sure to review the prerequisites section for details about the requirements for using RAPIDS.
 
Method: Conda, Docker, Pip, Source
Release: Stable (0.6), Nightly (0.7dev)
Packages: cuDF, cuML, cuDF & cuML
Linux: Ubuntu 16.04, Ubuntu 18.04, CentOS 7
Python: Python 3.6, Python 3.7
CUDA: CUDA 9.2, CUDA 10.0
Command: generated by the interactive tool (JavaScript must be enabled)
Labels used by the tool: Preferred, Beta

Using RAPIDS

Learn how to use RAPIDS with the method of your choice.


Conda Install

You can get a minimal conda installation with Miniconda or get the full installation with Anaconda.

For instructions on building a development conda environment, see the cuDF README. For conda install instructions for cuML, see the cuML README.


Docker Container

Run the following command within the Docker container started from the command above to launch the notebook server:

(rapids) root@container:/rapids/notebooks# bash utils/start-jupyter.sh

NOTE: This will run JupyterLab on your host machine at port 8888.
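
To confirm that the notebook server is actually listening, a small check like the following can help. It is a minimal sketch using only the Python standard library; the function name `is_port_open` and the choice of `localhost` as the host are assumptions for illustration, and only the port number 8888 comes from the note above.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within the timeout."""
    try:
        # create_connection raises OSError (refused, unreachable, timed out) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # 8888 is the port named in the note; "localhost" is an assumed host name
    print("JupyterLab reachable:", is_port_open("localhost", 8888))
```

A successful TCP connection only shows that something is listening on the port; it does not verify that the listener is JupyterLab.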

Use JupyterLab to Explore the Notebooks

Notebooks can be found in two directories within the container:

  • /rapids/notebooks/cuml - cuML demo notebooks
    • The data for these notebooks is pre-loaded in the container image and is decompressed by the notebooks when they run
  • /rapids/notebooks/mortgage - cuDF, Dask, XGBoost demo notebook
    • This notebook requires a download of the Mortgage Data; see notebook E2E.ipynb for more details
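
If you prefer a terminal to the JupyterLab file browser, the notebooks under those directories can be listed with a short script. This is a sketch using only the standard library; the helper name `find_notebooks` is hypothetical, and the root path `/rapids/notebooks` is inferred from the two directories listed above.

```python
from pathlib import Path


def find_notebooks(root):
    """Return sorted paths of all .ipynb files under root, skipping Jupyter checkpoint copies."""
    root = Path(root)
    if not root.is_dir():
        # Nothing to list, for example when run outside the container
        return []
    return sorted(
        p for p in root.rglob("*.ipynb")
        if ".ipynb_checkpoints" not in p.parts
    )


if __name__ == "__main__":
    # Assumed root, taken from the directory names listed above
    for notebook in find_notebooks("/rapids/notebooks"):
        print(notebook)
```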

Advanced Usage

See the RAPIDS Container README page for more information about using custom datasets. Docker Hub and NVIDIA GPU Cloud host the RAPIDS containers and list all available tags.


Pip Install

NOTE: Ubuntu 16.04’s python3 package is Python 3.5 by default. Follow the Ubuntu 16.04 Python install instructions to install Python 3.6 or 3.7. Refer to the cuDF README or cuML README for pip install instructions.


Build From Source

Check out the cuDF README or cuML README for instructions on building from source.

More Information


Prerequisites


Ubuntu 16.04 Python Install

By default, Ubuntu 16.04’s python3 package is Python 3.5, so you need to install Python 3.6 or 3.7 with the following steps:

For Python 3.6

apt-get install software-properties-common python-software-properties
add-apt-repository ppa:deadsnakes/ppa
apt update && apt install python3.6-dev

For Python 3.7

apt-get install software-properties-common python-software-properties
add-apt-repository ppa:deadsnakes/ppa
apt update && apt install python3.7-dev
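
After installing, it can be useful to confirm which interpreter is actually running. The sketch below checks the running Python against the two versions named on this page (3.6 and 3.7); the names `SUPPORTED_PYTHONS` and `is_supported_python` are hypothetical helpers, not part of RAPIDS.

```python
import sys

# Python versions listed on this page for the RAPIDS release it describes
SUPPORTED_PYTHONS = {(3, 6), (3, 7)}


def is_supported_python(version_info=sys.version_info):
    """Return True if the (major, minor) pair is one of the listed versions."""
    return tuple(version_info[:2]) in SUPPORTED_PYTHONS


if __name__ == "__main__":
    current = "{}.{}".format(*sys.version_info[:2])
    if is_supported_python():
        print("Python", current, "is one of the listed versions")
    else:
        print("Python", current, "is not listed; install 3.6 or 3.7 as described above")
```

Run it with the interpreter you intend to use for RAPIDS (for example, `python3.6 check.py`, where `check.py` is whatever name you save the script under), since the system `python3` on Ubuntu 16.04 may still point to 3.5.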

Documentation

Check out the cuDF, cuML, and XGBoost API docs.

Learn how to set up a multi-node cuDF and XGBoost data preparation and distributed training environment by following the mortgage data example notebook and scripts.