Deploy XGBoost on GPUs


Seamless Acceleration at Scale

XGBoost is a well-known gradient boosted decision trees (GBDT) machine learning package used to tackle regression, classification, and ranking problems. It’s written in C++ and NVIDIA CUDA® with wrappers for Python, R, Java, Julia, and several other popular languages. XGBoost now includes seamless, drop-in GPU acceleration, which significantly speeds up model training; the shorter training cycles leave more room for tuning, which can improve prediction accuracy.

The RAPIDS team works closely with the Distributed Machine Learning Common (DMLC) XGBoost organization to upstream code and ensure that all components of the GPU-accelerated analytics ecosystem work smoothly together.

Getting Started

The project is well supported and documented by many tutorials, quick-start guides, and papers.

Try It Now in Colab

Try out XGBoost now, with the basics of cuDF and other RAPIDS libraries, in our online XGBoost Colaboratory notebook.

Notebook Examples

To see how XGBoost integrates with cuDF, Dask, and the entire RAPIDS ecosystem, check out these RAPIDS notebooks which walk through classification and regression examples.

See the Latest Docs

Access current installation instructions, guides, FAQs, and more in the latest documentation.

Read the Original XGBoost Paper

Take a deep dive into XGBoost’s algorithms with Tianqi Chen and Carlos Guestrin in their XGBoost Paper.

Dive into the XGBoost Algorithm

Learn about the XGBoost algorithms used on GPUs in these blogs from Rory Mitchell, a RAPIDS team member and core XGBoost contributor.
Gradient Boosting, Decision Trees and XGBoost with CUDA
Updates to the XGBoost GPU algorithms
Bias Variance Decompositions using XGBoost

Maximize XGBoost Performance on GPUs


Benchmarks Comparing CPUs and GPUs

XGBoost has integrated support to run across multiple GPUs, which can deliver even more significant performance improvements. For the 113-million-row airline dataset used in the gradient boosting machines (GBM) benchmarks suite, eight NVIDIA® Tesla® V100 GPUs completed training in 42.6 seconds, compared to over 39 minutes on eight CPUs—a 54.9X speedup.
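The speedup figure is the ratio of the two wall-clock training times. A minimal sketch of that arithmetic, assuming the CPU run took exactly 39 minutes (the text only says "over 39 minutes"); the helper name speedup is hypothetical and used for illustration only:

```python
def speedup(baseline_seconds: float, accelerated_seconds: float) -> float:
    """Return how many times faster the accelerated run is than the baseline."""
    if accelerated_seconds <= 0:
        raise ValueError("accelerated_seconds must be positive")
    return baseline_seconds / accelerated_seconds


# Assumption: "over 39 minutes" is taken as exactly 39 minutes.
cpu_seconds = 39 * 60
# Reported training time on eight V100 GPUs.
gpu_seconds = 42.6

ratio = speedup(cpu_seconds, gpu_seconds)
print(f"{ratio:.1f}X")
```

The same helper can be applied to timings you collect on your own hardware.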

Measure Your Performance

You can run GBM benchmarking scripts from this GitHub repository to measure performance on your own system and compare it to various GBM/GBDT implementations.

Deploying Distributed XGBoost at Scale

It’s easy to work across multiple GPUs and multiple nodes with distributed Dask and Apache Spark.

Scale Out with Dask

To take advantage of multiple GPU-accelerated nodes, you can use XGBoost’s native Dask integration. This distributes data, builds DMatrix objects, and sets up cross-node communication to run XGBoost training on a cluster. The official XGBoost repository includes simple examples with distributed Dask and also more detailed API documentation.
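The steps described above (distribute the data, build the DMatrix, train across workers) might look roughly like the following sketch. It assumes XGBoost 1.0 or later, where the xgboost.dask module provides DaskDMatrix and train, and an already connected Dask client whose workers each own a GPU. The names DEFAULT_PARAMS and train_distributed are hypothetical, chosen for this example.

```python
# Illustrative defaults; 'gpu_hist' selects the GPU histogram algorithm.
DEFAULT_PARAMS = {"tree_method": "gpu_hist", "max_depth": 3, "learning_rate": 0.1}


def train_distributed(client, X, y, params=None, num_boost_round=100):
    """Train a booster on a Dask cluster.

    `client` is a dask.distributed.Client; `X` and `y` are Dask collections
    (for example Dask arrays or DataFrames) already partitioned across workers.
    """
    # Imported lazily so this file can be loaded on machines without XGBoost.
    from xgboost import dask as dxgb

    params = dict(DEFAULT_PARAMS if params is None else params)

    # DaskDMatrix keeps references to the distributed partitions instead of
    # gathering the data on one machine.
    dtrain = dxgb.DaskDMatrix(client, X, y)

    # train() returns a dict holding the fitted booster and the eval history.
    output = dxgb.train(
        client,
        params,
        dtrain,
        num_boost_round=num_boost_round,
        evals=[(dtrain, "train")],
    )
    return output["booster"], output["history"]
```

Calling train_distributed(client, X, y) returns the trained booster together with the per-round evaluation history. Check the official API documentation for the options supported by your installed version.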

Scale Out with Spark

The RAPIDS team is working with the community to build a distributed, open source XGBoost4J-Spark + RAPIDS package. More details coming soon.

Use a Single Machine

With Dask-CUDA, running across multiple GPUs on a single machine is easy. Two lines of code can spin up a LocalCUDACluster and parallelize ETL as well as training. See the Dask-CUDA docs for more details.
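As a rough illustration of those two lines, the sketch below wraps them in a hypothetical helper named start_local_gpu_client. It assumes the dask-cuda and distributed packages are installed and that at least one NVIDIA GPU is visible to the process.

```python
def start_local_gpu_client():
    """Start a single-machine Dask cluster and connect a client to it."""
    # Imported lazily so this file can be loaded on machines without GPU libraries.
    from dask.distributed import Client
    from dask_cuda import LocalCUDACluster

    # The two essential lines: by default, one worker is started per visible GPU.
    cluster = LocalCUDACluster()
    client = Client(cluster)
    return cluster, client
```

The returned client can then be handed to Dask-based ETL code and to XGBoost's Dask training functions.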

NOTE: Older versions of XGBoost supported a thread-based “single-node, multi-GPU” pattern with the n_gpus parameter. This parameter is now deprecated, and we encourage all users to shift to Dask or Spark for more scalable and maintainable multi-GPU training.

Download the Software

The RAPIDS team is developing GPU enhancements to open-source XGBoost, working closely with the DMLC/XGBoost organization to improve the larger ecosystem. Since RAPIDS is iterating ahead of upstream XGBoost releases, some enhancements will be available earlier from the RAPIDS branch, or from RAPIDS-provided installers.

Installation Prerequisites for RAPIDS + XGBoost


GPU: NVIDIA Pascal™ or better with compute capability 6.0+

CUDA & NVIDIA Drivers: One of the following supported versions:

      CUDA 9.2 with driver v396.37+
      CUDA 10.0 with driver v410.48+
      CUDA 10.1.2 with driver v418.87+

The latest RAPIDS package, which can be downloaded and installed in one of these ways:

Conda Install

Install using conda (the latest RAPIDS release). The RAPIDS conda channel includes an XGBoost package built with CUDA 9.2/10.0/10.1 and Python 3.6/3.7 versions. You can install it with:

> conda install -c rapidsai -c nvidia -c conda-forge \
        rapids-xgboost cudatoolkit=10.0

To install a different CUDA version, replace 10.0 in cudatoolkit=10.0 with the version you want (9.2, 10.0, or 10.1). To override the Python version that gets installed, add python=3.6 or python=3.7 to the install command.

Docker Container

Install using Docker (the latest RAPIDS release). RAPIDS provides Docker images that include a recent version of GPU-accelerated XGBoost. Just follow the Docker installation instructions on the Getting Started page and you can start using XGBoost right away from a notebook or the command line.

PIP Install or Other Methods

Install using pip or other methods (the default upstream version). The default open-source XGBoost packages already include GPU support. Follow the XGBoost instructions to install from source or use:

> pip install xgboost

NOTE: The pip packages and source installation methods currently install XGBoost version 0.90, which does not include some of the more recent contributions, such as cuDF integration. Those contributions have been merged into the master branch of XGBoost and will appear in pip packages starting with version 1.0.

Configuring Your Code for GPUs

With only a few minor code changes, you’ll be training models on a supercharged XGBoost.

The Best Place to Start

If you haven’t developed your model yet, the best place to start is XGBoost’s Getting Started documentation. If you have existing code that trains models on CPU, converting it to run on GPUs is simple.

More Reference Materials

Similar configuration options apply to R, Java, and Julia wrappers. The XGBoost Documentation and XGBoost GPU Support pages contain much more information on configuring and running models and on GPU-specific options and algorithms.

Training a Model

When training a model with XGBoost, you have to specify a dictionary of training parameters. If you set the tree_method parameter to gpu_hist, XGBoost will run on your GPU.

For example, if your old code in Python looks like:

> import xgboost as xgb
> params = {'max_depth': 3, 'learning_rate': 0.1}
> dtrain = xgb.DMatrix(X, y)
> xgb.train(params, dtrain)

Change it to:

> params = {'tree_method': 'gpu_hist', 'max_depth': 3,
>           'learning_rate': 0.1}
> dtrain = xgb.DMatrix(X, y)
> xgb.train(params, dtrain)
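If the same parameter dictionary is shared between CPU and GPU runs, one option is to derive the GPU variant from it rather than editing it by hand. The following is a minimal sketch under that assumption; with_gpu and train_on_gpu are hypothetical helper names, not part of XGBoost. Note also that XGBoost 2.0 and later prefer device='cuda' together with tree_method='hist' over 'gpu_hist'; the sketch keeps the 'gpu_hist' setting described here.

```python
def with_gpu(params):
    """Return a copy of `params` that selects the GPU histogram algorithm."""
    gpu_params = dict(params)  # copy, so the caller's CPU settings are untouched
    gpu_params["tree_method"] = "gpu_hist"
    return gpu_params


def train_on_gpu(X, y, params, num_boost_round=10):
    """Train a booster on the GPU from in-memory features X and labels y."""
    # Imported lazily so this file can be loaded on machines without XGBoost.
    import xgboost as xgb

    dtrain = xgb.DMatrix(X, label=y)
    return xgb.train(with_gpu(params), dtrain, num_boost_round=num_boost_round)


cpu_params = {"max_depth": 3, "learning_rate": 0.1}
gpu_params = with_gpu(cpu_params)
print(gpu_params)
```

Because with_gpu returns a copy, cpu_params stays valid for CPU-only machines, while train_on_gpu requires a CUDA-capable GPU and a GPU-enabled XGBoost build.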

Bridging DataFrames with XGBoost DMatrix

The RAPIDS team is contributing to the XGBoost project and integrating new features to better optimize GPU performance.

XGBoost DMatrix

The RAPIDS project is developing a seamless bridge between cuDF DataFrames, the primary data structure in RAPIDS, and DMatrix, XGBoost’s data structure. You can get the latest updates in this pull request on GitHub. While this patch is being integrated upstream, early adopters using the RAPIDS conda packages or Docker images can already build DMatrix objects directly from cuDF DataFrames.

How to Create a DMatrix

To create a DMatrix from a cuDF DataFrame, just pass the feature DataFrame and the label Series to the constructor:

> import cudf
> import xgboost as xgb
> train_X_cudf = cudf.DataFrame(...)
> train_y_cudf = cudf.Series(...)
> dmatrix = xgb.DMatrix(train_X_cudf, label=train_y_cudf)

The package will automatically convert from cuDF’s format to XGBoost’s DMatrix format, keeping the data in GPU memory.

Get Started with Dask