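import cudf

# Load the CSV file into a GPU DataFrame
gdf = cudf.read_csv('path/to/file.csv')

# Print the mean of each column (assumes every column is numeric)
for column in gdf.columns:
    print(gdf[column].mean())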
RAPIDS uses optimized NVIDIA® CUDA® primitives and high-bandwidth GPU memory to accelerate data preparation and machine learning. The goal of RAPIDS is not only to accelerate the individual parts of the typical data science workflow, but to accelerate the complete end-to-end workflow.
We suggest taking a look at the sample workflow in our Docker container (described below), which illustrates how straightforward basic XGBoost model training and testing is in RAPIDS.
$ conda install -c numba -c conda-forge -c nvidia -c rapidsai -c defaults cudf=0.4.0
For instructions on building a development conda environment, see the cuDF README.
$ docker pull rapidsai/rapidsai:cuda9.2_ubuntu16.04
$ docker run --runtime=nvidia \
    --rm -it \
    -p 8888:8888 \
    -p 8787:8787 \
    -p 8786:8786 \
    rapidsai/rapidsai:cuda9.2_ubuntu16.04
jupyter@container:/rapids/notebooks/$ source activate rapids
(rapids) jupyter@container:/rapids/notebooks/$ bash utils/start-jupyter.sh
NOTE: This will run JupyterLab on port 8888 on your host machine.
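To confirm from the host that the container's published ports are reachable, a quick check along the following lines may help. This is a minimal sketch using only the Python standard library; the helper name `port_is_open` and the choice of `127.0.0.1` as the host are assumptions for illustration, and the port numbers are the ones published by the `docker run` command above.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Ports taken from the -p flags of the docker run command (hypothetical host).
    for port in (8888, 8787, 8786):
        state = "open" if port_is_open("127.0.0.1", port) else "closed"
        print(f"port {port}: {state}")
```

A port reported as closed usually means the service behind it has not been started yet, for example when `utils/start-jupyter.sh` has not been run inside the container.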
Notebooks can be found in two directories within the container:
/rapids/notebooks/cuml - cuML demo notebooks
cd /rapids/notebooks/cuml/data && gunzip mortgage.npy.gz
/rapids/notebooks/mortgage - cuDF, Dask, XGBoost demo notebook
E2E.ipynb for more details
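The decompression step listed for the cuML demo data can also be done from Python, for instance from a notebook cell. The sketch below uses only the standard library; the helper name `gunzip` and its `keep_original` parameter are hypothetical, and the archive path is the one from the command above.

```python
import gzip
import shutil
from pathlib import Path


def gunzip(src: Path, keep_original: bool = False) -> Path:
    """Decompress a .gz file next to itself and return the output path."""
    if src.suffix != ".gz":
        raise ValueError(f"expected a .gz file, got {src}")
    dest = src.with_suffix("")  # mortgage.npy.gz -> mortgage.npy
    with gzip.open(src, "rb") as fin, open(dest, "wb") as fout:
        shutil.copyfileobj(fin, fout)  # streams in chunks, so large files are fine
    if not keep_original:
        src.unlink()  # mirror the gunzip default of removing the archive
    return dest


if __name__ == "__main__":
    archive = Path("/rapids/notebooks/cuml/data/mortgage.npy.gz")
    if archive.exists():
        print(gunzip(archive))
```

Passing `keep_original=True` leaves the `.gz` file in place, which is convenient when the container is restarted with the same mounted data.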
See the RAPIDS Demo Container page for more information about using custom datasets.
Learn how to set up a multi-node cuDF and XGBoost data preparation and distributed training environment by following the Fannie Mae mortgage example notebook and scripts.