Integration With Other ML Algorithms
For other machine learning work on GPUs, the dask-cuml library provides access to the RAPIDS cuML package from Dask. RAPIDS cuML implements popular machine learning algorithms, including clustering, dimensionality reduction, and regression approaches, with high-performance GPU-based implementations, offering speedups of up to 100x over CPU-based approaches. cuML replicates the scikit-learn API, so it integrates well with projects like Dask that already include scikit-learn support. Currently, dask-cuml supports distributed clustering and regression algorithms, with new algorithms being added over time.
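Because cuML replicates the scikit-learn API, code written against scikit-learn maps almost directly onto the cuML equivalents. The sketch below shows the shared fit/predict pattern using scikit-learn on CPU; the commented import is an assumption about the distributed cuML package layout and would stand in on a GPU cluster.

```python
# Illustrative sketch of the scikit-learn estimator API that cuML mirrors.
# On a GPU cluster with dask-cuml, the import would instead be the
# distributed cuML estimator (package path is an assumption):
#   from cuml.dask.linear_model import LinearRegression
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data following y = 2x + 1
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([3.0, 5.0, 7.0, 9.0])

model = LinearRegression()
model.fit(X, y)

print(model.coef_[0], model.intercept_)  # slope close to 2.0, intercept close to 1.0
```

The point of the mirrored API is that swapping the import is most of the migration work: the fit/predict calls stay the same while execution moves to the GPU.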
The RAPIDS Notebooks Extended repository includes several end-to-end examples that use Dask for distributed, GPU-accelerated computation. Here are a few from the collection to get started with.
The introduction to Dask notebook shows how to get started with Dask using basic Python primitives like integers and strings.
The Introduction to XGBoost with RAPIDS notebook shows the acceleration you can gain by using GPUs with XGBoost in RAPIDS.
The Linear Regression with Dask+cuML notebook shows a simple example of how to get started with distributed machine learning.
The NYC Taxi End-to-End notebook uses trip data to predict New York City taxi fares (a regression problem).