import cudf
gdf = cudf.read_csv('path/to/file.csv')
for column in gdf.columns:
    print(gdf[column].mean())
RAPIDS uses optimized NVIDIA® CUDA® primitives and high-bandwidth GPU memory to accelerate data preparation and machine learning. The goal of RAPIDS is not only to accelerate the individual parts of the typical data science workflow, but to accelerate the complete end-to-end workflow.
We suggest that you take a look at the sample workflow in our Docker container (described below), which illustrates how straightforward basic XGBoost model training and testing is in RAPIDS.
Run the following command within the Docker container started from the command above to launch the notebook server:
(rapids) root@container:/rapids/notebooks# bash utils/start-jupyter.sh
NOTE: This will run JupyterLab on your host machine at port 8888.
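If the notebook server does not seem to respond, a quick reachability check can help separate a networking problem from a server that never started. The following is a minimal sketch using only the Python standard library; the function name `port_is_open` and its default host and timeout are illustrative assumptions, and only the port number 8888 comes from the note above.

```python
import socket


def port_is_open(host: str = "localhost", port: int = 8888, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Hypothetical helper: it only checks that something accepts TCP connections
    on the port, not that the listener is actually JupyterLab.
    """
    try:
        # create_connection raises OSError (refused, timed out, unreachable) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    state = "reachable" if port_is_open() else "not reachable"
    print(f"Port 8888 on localhost is {state}")
```

A successful connection only shows that the port is being served; opening the address in a browser is still the real test that JupyterLab is running.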
Notebooks can be found in two directories within the container:
/rapids/notebooks/cuml - cuML demo notebooks
/rapids/notebooks/mortgage - cuDF, Dask, XGBoost demo notebook; see E2E.ipynb for more details
By default, Ubuntu 16.04’s python3 package is Python 3.5, so you need to install Python 3.6 or 3.7 with the following steps:
For Python 3.6
apt-get install software-properties-common python-software-properties
add-apt-repository ppa:deadsnakes/ppa
apt update && apt install python3.6-dev
For Python 3.7
apt-get install software-properties-common python-software-properties
add-apt-repository ppa:deadsnakes/ppa
apt update && apt install python3.7-dev
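After installing, it can be worth confirming that the interpreter you actually launch meets the version requirement, since the system python3 may still point at Python 3.5. The sketch below is one way to check; the helper name `meets_minimum` and the `MINIMUM` constant are illustrative, with the 3.6 floor taken from the text above.

```python
import sys

# Assumption from the text above: Python 3.6 or 3.7 is required.
MINIMUM = (3, 6)


def meets_minimum(version=None, minimum=MINIMUM) -> bool:
    """Return True if `version` (default: the running interpreter) is at least `minimum`.

    Only the major and minor components are compared.
    """
    version = sys.version_info if version is None else version
    return tuple(version[:2]) >= tuple(minimum)


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    verdict = "OK" if meets_minimum() else "too old"
    print(f"Python {current}: {verdict}")
```

Run it with the interpreter you intend to use (for example, python3.6 rather than python3) so the check reflects the newly installed version.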
Learn how to set up a multi-node cuDF and XGBoost data preparation and distributed training environment by following the mortgage data example notebook and scripts.