Deploy XGBoost on GPUs

Seamless Acceleration at Scale

XGBoost is a widely used gradient boosted decision trees (GBDT) machine learning package for regression, classification, and ranking problems. It’s written in C++ and NVIDIA CUDA® with wrappers for Python, R, Java, Julia, and several other popular languages. XGBoost now includes seamless, drop-in GPU acceleration, which significantly speeds up model training. Shorter training runs leave more time for tuning and iteration, which can lead to more accurate models.

The RAPIDS team works closely with the Distributed (Deep) Machine Learning Community (DMLC) XGBoost organization to upstream code and ensure that all components of the GPU-accelerated analytics ecosystem work smoothly together.

Getting Started

The project is well supported and documented by many tutorials, quick-start guides, and papers.

Try Now Online

Try out XGBoost now, with cuDF and other RAPIDS libraries.
Try with Colaboratory

Notebook Examples

To see how XGBoost integrates with cuDF, Dask, and the rest of the RAPIDS ecosystem, check out these RAPIDS notebooks, which walk through classification and regression examples.

See the Latest Docs

Access current installation instructions, guides, FAQs, and more in the latest documentation.

Read the original XGBoost paper

Take a deep dive into XGBoost’s algorithms with Tianqi Chen and Carlos Guestrin in their XGBoost Paper.

Dive into the XGBoost Algorithm

Learn about the XGBoost algorithms used on GPUs in these blogs from Rory Mitchell, a RAPIDS team member and core XGBoost contributor.
Gradient Boosting, Decision Trees and XGBoost with CUDA
Updates to the XGBoost GPU algorithms
Bias Variance Decompositions using XGBoost

Maximize XGBoost Performance on GPUs

Benchmarks Comparing CPUs and GPUs

XGBoost has integrated support to run across multiple GPUs, which can deliver even more significant performance improvements. For the 113-million-row airline dataset used in the gradient boosting machines (GBM) benchmarks suite, eight NVIDIA® Tesla® V100 GPUs completed training in 42.6 seconds, compared to over 39 minutes on eight CPUs—a 54.9X speedup.
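The quoted speedup follows directly from the two timings. Here is a quick check, treating "over 39 minutes" as exactly 39 minutes (an assumption, so the result is a lower bound):

```python
# Timings quoted above for the 113-million-row airline dataset.
gpu_seconds = 42.6       # eight V100 GPUs
cpu_seconds = 39 * 60    # "over 39 minutes", taken as exactly 39 (assumption)

speedup = cpu_seconds / gpu_seconds
print(f"speedup: {speedup:.1f}x")  # prints "speedup: 54.9x"
```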

Measure Your Performance

You can run GBM benchmarking scripts from this GitHub repository to measure performance on your own system and compare it to various GBM/GBDT implementations.
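Before running the full benchmark suite, a rough timing comparison on synthetic data can give a first impression. The sketch below is not the benchmark code from that repository: the helper names (time_call, train_once), the data shape, and the parameter values are all illustrative assumptions, and the measured time includes building the DMatrix.

```python
import time


def time_call(fn, repeats=3):
    """Return the best wall-clock time in seconds over `repeats` calls of fn()."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def train_once(tree_method, n_rows=100_000, n_cols=20, rounds=50):
    """Train a small binary classifier on random data with the given tree_method."""
    # Imported lazily so time_call can be used without these packages installed.
    import numpy as np
    import xgboost as xgb

    rng = np.random.default_rng(0)
    X = rng.random((n_rows, n_cols), dtype=np.float32)
    y = (X[:, 0] + X[:, 1] > 1.0).astype(np.float32)  # synthetic labels
    dtrain = xgb.DMatrix(X, label=y)
    params = {
        "tree_method": tree_method,
        "objective": "binary:logistic",
        "max_depth": 6,
    }
    xgb.train(params, dtrain, num_boost_round=rounds)
```

On a machine with a supported GPU, comparing time_call(lambda: train_once("hist")) with time_call(lambda: train_once("gpu_hist")) gives a rough CPU versus GPU figure. Small datasets tend to understate the GPU advantage, so treat the result as indicative only.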

Deploying Distributed XGBoost at Scale

It’s easy to work across multiple GPUs and multiple nodes with distributed Dask and Apache Spark.

Scale Out with Dask

To take advantage of multiple GPU-accelerated nodes, you can use XGBoost’s native Dask integration. This distributes data, builds DMatrix objects, and sets up cross-node communication to run XGBoost training on a cluster. This blog post covers the XGBoost Dask API in more detail, including usage and performance. The official XGBoost repository includes simple examples with distributed Dask and also more detailed API documentation.
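A minimal sketch of that workflow is shown below. It assumes you already have a Dask client connected to a GPU cluster, and that X and y are Dask collections (for example dask_cudf or dask.array objects). The function name, the parameter values, and the constant DASK_GPU_PARAMS are illustrative choices, not part of the XGBoost API.

```python
# Illustrative training parameters (assumed values, adjust for your problem).
DASK_GPU_PARAMS = {"tree_method": "gpu_hist", "objective": "reg:squarederror"}


def train_with_dask(client, X, y, num_boost_round=100):
    """Train on a Dask cluster and return (booster, history, predictions)."""
    # Imported lazily so this module loads even where xgboost is not installed.
    from xgboost import dask as dxgb

    # DaskDMatrix references the partitions on the workers that hold them.
    dtrain = dxgb.DaskDMatrix(client, X, y)

    output = dxgb.train(
        client,
        DASK_GPU_PARAMS,
        dtrain,
        num_boost_round=num_boost_round,
        evals=[(dtrain, "train")],
    )
    booster = output["booster"]   # the trained model
    history = output["history"]   # evaluation metrics per boosting round

    # Prediction is also distributed and returns a lazy Dask collection.
    predictions = dxgb.predict(client, booster, X)
    return booster, history, predictions
```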

Scale Out with Spark

XGBoost provides a Java API called XGBoost4J. As of release 1.2, GPU support is included in the pre-built xgboost4j-spark-gpu JARs.

The team is continuing to work on deeper integration into the Spark ecosystem. Learn more in this devblog post.

Use a Single Machine

With Dask-CUDA, running across multiple GPUs on a single machine is easy. Two lines of code can spin up a LocalCUDACluster and parallelize ETL as well as training. See the Dask-CUDA docs for more details.
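The two lines in question look roughly like the body of the function below. This is a sketch that assumes the dask-cuda and distributed packages are installed; the wrapper function name is hypothetical and exists only to keep the imports lazy.

```python
def start_local_gpu_cluster():
    """Start a local Dask cluster with one worker per visible GPU; return a client."""
    # Imported lazily so this module loads even without Dask-CUDA installed.
    from dask.distributed import Client
    from dask_cuda import LocalCUDACluster

    cluster = LocalCUDACluster()  # one worker process per GPU on this machine
    client = Client(cluster)      # route subsequent Dask work to that cluster
    return client
```

The returned client can then be passed to XGBoost's Dask functions, so the same training code runs on one machine or on a larger cluster.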

NOTE: Older versions of XGBoost supported a thread-based “single-node, multi-GPU” pattern with the n_gpus parameter. This parameter is now deprecated, and we encourage all users to shift to Dask or Spark for more scalable and maintainable multi-GPU training.

Download the Software

The RAPIDS team is developing GPU enhancements to open-source XGBoost, working closely with the DMLC/XGBoost organization to improve the larger ecosystem. Since RAPIDS is iterating ahead of upstream XGBoost releases, some enhancements will be available earlier from the RAPIDS branch, or from RAPIDS-provided installers.

Prerequisites for RAPIDS + XGBoost

For the latest prerequisites and supported versions, check out our Getting Started page.

Conda Install

The default RAPIDS conda metapackage includes a recent snapshot of XGBoost. This package is released on the same schedule as other RAPIDS packages and tested for full compatibility. You can find the latest install options with our RAPIDS Release Selector.

Docker Container

Install using Docker (the latest RAPIDS release). RAPIDS provides Docker images that include a recent version of GPU-accelerated XGBoost. Just follow the Docker installation instructions on our RAPIDS Release Selector page and you can start using XGBoost right away from a notebook or the command line.

PIP Install or Other Methods

Install using pip or other methods (the default upstream version). The default open-source XGBoost packages already include GPU support, Dask integration, and the ability to load data from a cuDF DataFrame. Follow the XGBoost instructions to install from source or use:

pip install xgboost

NOTE: Full RAPIDS integration first appeared in release 1.0 of XGBoost. Older pip packages will not include cuDF support.
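To confirm that an installed package is new enough, you can compare its version string against 1.0. The helper names below are hypothetical; the check relies only on the major version number.

```python
def supports_cudf(version):
    """Return True if an XGBoost version string is release 1.0 or newer."""
    major = int(version.split(".")[0])
    return major >= 1


def installed_xgboost_supports_cudf():
    """Check the XGBoost package installed in the current environment."""
    import xgboost as xgb  # imported lazily; requires xgboost to be installed

    return supports_cudf(xgb.__version__)
```

For example, supports_cudf("0.90") returns False, while supports_cudf("1.2.0") returns True.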

Configuring Your Code for GPUs

With only a few minor code changes, you’ll be training models on a supercharged XGBoost.

The Best Place to Start

If you haven’t developed your model yet, the best place to start is XGBoost’s Getting Started documentation. If you have existing code that trains models on CPU, converting it to run on GPUs is simple.

More Reference Materials

Similar configuration options apply to R, Java, and Julia wrappers. The XGBoost Documentation and XGBoost GPU Support pages contain much more information on configuring and running models and on GPU-specific options and algorithms.

Training a Model

When training a model with XGBoost, you have to specify a dictionary of training parameters. If you set the tree_method parameter to gpu_hist, XGBoost will run on your GPU.

For example, if your old code in Python looks like:

params = {'max_depth': 3, 'learning_rate': 0.1}
dtrain = xgb.DMatrix(X, y)
xgb.train(params, dtrain)

Change it to:

params = {'tree_method': 'gpu_hist', 'max_depth': 3, 'learning_rate': 0.1}
dtrain = xgb.DMatrix(X, y)
xgb.train(params, dtrain)
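The same change can be wrapped in a small helper so that CPU and GPU runs share one parameter dictionary. This is a sketch: gpu_params and train_on_gpu are hypothetical names, not XGBoost functions. The use_device_param branch reflects our understanding that XGBoost 2.0 and later prefer tree_method 'hist' together with device 'cuda' over gpu_hist; check the GPU Support page for the release you have installed.

```python
def gpu_params(cpu_params, use_device_param=False):
    """Return a copy of cpu_params that asks XGBoost to train on the GPU."""
    params = dict(cpu_params)  # copy, so the caller's dictionary is untouched
    if use_device_param:
        # Assumption: style preferred by XGBoost 2.0 and later.
        params["tree_method"] = "hist"
        params["device"] = "cuda"
    else:
        # Style described on this page.
        params["tree_method"] = "gpu_hist"
    return params


def train_on_gpu(X, y, cpu_params, num_boost_round=100, use_device_param=False):
    """Build a DMatrix from X and y and train with GPU-enabled parameters."""
    import xgboost as xgb  # imported lazily; requires xgboost and a supported GPU

    dtrain = xgb.DMatrix(X, label=y)
    params = gpu_params(cpu_params, use_device_param)
    return xgb.train(params, dtrain, num_boost_round=num_boost_round)
```

Calling gpu_params({'max_depth': 3, 'learning_rate': 0.1}) yields the same dictionary as the converted example above.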

Bridging DataFrames with XGBoost DMatrix

The RAPIDS team is contributing to the XGBoost project and integrating new features to better optimize GPU performance.

XGBoost DMatrix

The RAPIDS project has developed a seamless bridge between cuDF DataFrames, the primary data structure in RAPIDS, and DMatrix, XGBoost’s data structure. The DMatrix is built from the GPU DataFrame with no need to copy data through host memory. Starting in XGBoost 1.0, GPU data from CuPy and any other GPU array library that supports the __cuda_array_interface__ API can also be used directly to build a DMatrix.

How to Create a DMatrix

To create a DMatrix from a cuDF DataFrame, just pass the feature DataFrame and the label Series to the constructor:

import cudf
import xgboost as xgb
train_X_cudf = cudf.DataFrame(...)
train_y_cudf = cudf.Series(...)
dmatrix = xgb.DMatrix(train_X_cudf, label=train_y_cudf)

The package will automatically convert from cuDF’s format to XGBoost’s DMatrix format, keeping the data in GPU memory.
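The same constructor accepts GPU arrays that expose __cuda_array_interface__, such as CuPy arrays. The sketch below uses two hypothetical helpers: one checks for that attribute, and the other builds the DMatrix only when both inputs pass the check. It is aimed at array objects; cuDF DataFrames are passed to the constructor directly, as shown above.

```python
def is_gpu_array(obj):
    """Return True if obj exposes the __cuda_array_interface__ protocol."""
    return hasattr(obj, "__cuda_array_interface__")


def dmatrix_from_gpu_arrays(X, y):
    """Build a DMatrix from GPU arrays (for example CuPy arrays) X and y."""
    import xgboost as xgb  # imported lazily; requires xgboost with GPU support

    if not (is_gpu_array(X) and is_gpu_array(y)):
        raise TypeError("X and y must expose __cuda_array_interface__")
    return xgb.DMatrix(X, label=y)
```

With CuPy installed, dmatrix_from_gpu_arrays(cupy.random.rand(1000, 10), cupy.random.rand(1000)) would build a DMatrix from device arrays.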
