Open GPU Data Science


Accelerated Data Science

The RAPIDS suite of open source software libraries gives you the freedom to execute end-to-end data science and analytics pipelines entirely on GPUs. Learn more

Scale Out on GPUs

Seamlessly scale from GPU workstations to multi-GPU servers and multi-node clusters with Dask. Learn more about Dask
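The scale-out model behind Dask can be sketched in plain Python: partition the data, map a function over each partition (which Dask would schedule across GPUs or nodes), then reduce the partial results. This is a standard-library toy of that idea, not Dask's API; real RAPIDS code would use `dask_cudf` on actual GPU DataFrames.

```python
from concurrent.futures import ThreadPoolExecutor

def partial_mean(chunk):
    """Per-partition work: return (sum, count) so partials combine correctly."""
    return sum(chunk), len(chunk)

def distributed_mean(data, n_partitions=4):
    # Split the data into partitions, as Dask splits a large DataFrame
    chunks = [data[i::n_partitions] for i in range(n_partitions)]
    # Map the per-partition function; Dask would run these on workers
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(partial_mean, chunks))
    # Reduce the partial results into one answer
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count

print(distributed_mean(list(range(1, 101))))  # 50.5
```

Returning (sum, count) pairs rather than per-partition means is the key trick: means of unequal partitions cannot simply be averaged, but sums and counts reduce correctly.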

Python Integration

Accelerate your Python data science toolchain with minimal code changes and no new tools to learn.

Top Model Accuracy

Increase machine learning model accuracy by iterating on models faster and deploying them more frequently.

Reduced Training Time

Drastically improve your productivity with more interactive data science.

Open Source

RAPIDS is an open source project. Supported by NVIDIA, it also relies on Numba, Apache Arrow, and many other open source projects. Learn more

Getting Started

The RAPIDS data science framework is designed to have a familiar look and feel to data scientists working in Python. Here’s a code snippet where we read in a CSV file and output some descriptive statistics:

import cudf

# Load the CSV directly into GPU memory as a cuDF DataFrame
gdf = cudf.read_csv('path/to/file.csv')

# Print the mean of each column, computed on the GPU
for column in gdf.columns:
    print(gdf[column].mean())

Find more details in our Get Started section

Try Now In Colaboratory

Jump right into a GPU-powered RAPIDS notebook with Colaboratory for free. Go to example notebook

10 Minutes to cuDF

Modeled after 10 Minutes to Pandas, this is a short introduction to cuDF aimed mainly at new users. Go to guide

10 Minutes to Dask-XGBoost

A short introduction to XGBoost with a distributed CUDA DataFrame via Dask-cuDF. Go to guide

GDF Cheat Sheet

A handy PDF reference guide for handling GPU Data Frames (GDF) with cuDF. Download PDF

Example Notebooks

A Github repository with examples of XGBoost, cuML demos, cuGraph demos, and more. Go to repo

RAPIDS Community

RAPIDS is committed to being open source. We strive for a major release every 6 weeks (give or take). Below is a generalized release schedule. Learn more in our Road to 1.0 post

Release Schedule

v0.6 (MAR 2019): legacy
v0.7 (MAY 2019): stable
v0.8 (JUN 2019): nightly

RAPIDS APIs and Libraries

RAPIDS is open source, licensed under Apache 2.0, and spans multiple projects that range from GPU DataFrames to GPU-accelerated ML algorithms. It also provides native array_interface support, allowing Apache Arrow data to be pushed to deep learning frameworks. Learn more

Contributing

Whether you are new to RAPIDS, looking to help, or are part of the team, learn about our contributing guidelines on our contributing page. Go to Docs

cuDF API

GitHub / Docs / Change Log

cuDF is a Python GPU DataFrame library (built on the Apache Arrow columnar memory format) for loading, joining, aggregating, filtering, and otherwise manipulating data, all in a pandas-like API familiar to data scientists.
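Because cuDF deliberately mirrors the pandas API, a sketch of typical cuDF usage runs unchanged against pandas when no GPU is present. The fallback import below is purely illustrative; on a RAPIDS machine the cuDF branch is taken and the work happens in device memory.

```python
try:
    import cudf as xdf  # GPU path: DataFrames live in device memory
except ImportError:
    import pandas as xdf  # CPU fallback: cuDF mirrors the pandas API

# Build, join, and aggregate; the same calls work in both libraries
left = xdf.DataFrame({"key": ["a", "b", "a", "b"], "val": [1, 2, 3, 4]})
right = xdf.DataFrame({"key": ["a", "b"], "weight": [10, 20]})

merged = left.merge(right, on="key")         # join on the shared key
totals = merged.groupby("key")["val"].sum()  # aggregate per key
print(totals)
```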

cuML API

GitHub / Docs / Change Log

cuML is a suite of libraries that implement machine learning algorithms and mathematical primitive functions compatible with other RAPIDS projects, all in a scikit-learn-like API familiar to data scientists.
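cuML estimators follow scikit-learn conventions (`fit`/`predict`, `coef_`/`intercept_`). The sketch below tries cuML's linear regression and, when cuML is absent, falls back to the same ordinary-least-squares math in NumPy; the data and the fallback path are illustrative, not cuML internals.

```python
import numpy as np

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])  # generated from y = 2x + 1

try:
    from cuml.linear_model import LinearRegression  # GPU path
    model = LinearRegression().fit(X, y)
    coef, intercept = float(model.coef_[0]), float(model.intercept_)
except ImportError:
    # CPU fallback: closed-form least squares, the same model cuML fits on GPU
    A = np.hstack([X, np.ones((len(X), 1))])  # append a bias column
    coef, intercept = np.linalg.lstsq(A, y, rcond=None)[0]

print(round(coef, 3), round(intercept, 3))
```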

cuGraph API

GitHub / Docs / Change Log

cuGraph is a collection of graph analytics that process data in GPU DataFrames (GDF). cuGraph aims to provide a NetworkX-like API familiar to data scientists, so they can easily accelerate their workflows without going into the details of CUDA programming.
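cuGraph ingests graphs as GDF edge lists (source and destination columns). As a minimal illustration of an analytic over that representation, the sketch below computes vertex out-degrees with a groupby, falling back to pandas when cuDF is unavailable since the call is the same in both. Real cuGraph algorithms (PageRank, BFS, and so on) consume the same edge-list input; the graph here is made up.

```python
try:
    import cudf as xdf  # GPU path
except ImportError:
    import pandas as xdf  # CPU fallback with the same API

# A tiny directed graph as an edge list, the format cuGraph ingests
edges = xdf.DataFrame({
    "src": [0, 0, 1, 2, 2, 2],
    "dst": [1, 2, 2, 0, 1, 3],
})

# Out-degree: how many edges leave each source vertex
out_degree = edges.groupby("src")["dst"].count()
print(out_degree)
```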

nvStrings API

GitHub / Docs / Change Log

nvStrings, the Python bindings for cuStrings, provides a pandas-like API that will be familiar to data engineers and data scientists, so they can use it to easily accelerate their workflows without going into the details of CUDA programming.
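The kind of vectorized string work nvStrings accelerates can be shown with the pandas `.str` accessor, whose semantics it mirrors; this sketch therefore runs anywhere, while on a RAPIDS machine the equivalent nvStrings operations execute on device memory. The sample strings are illustrative.

```python
import pandas as pd

# Vectorized string operations of the kind nvStrings runs on the GPU
s = pd.Series(["GPU Data Science", "RAPIDS Suite", "cuDF and cuML"])

lowered = s.str.lower()          # lowercase every element at once
has_cu = s.str.contains("cu")    # case-sensitive substring test
print(lowered.tolist())
print(has_cu.tolist())
```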

libcudf LIB

GitHub / Docs / Change Log

libcudf is a C/C++ CUDA library for implementing standard dataframe operations. It is part of the cuDF repository.

RMM LIB

GitHub / Docs / Change Log

RAPIDS Memory Manager (RMM) is a central place for all device memory allocations in cuDF (C++ and Python) and other RAPIDS libraries. In addition, it is a replacement allocator for CUDA Device Memory (and CUDA Managed Memory) and a pool allocator to make CUDA device memory allocation / deallocation faster and asynchronous.
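The pool strategy RMM uses, reserving one large device allocation up front and handing out cheap sub-allocations from it, is what makes allocation fast. This pure-Python toy models that bookkeeping (a bump pointer over a fixed pool plus a free list for reuse); it is a conceptual sketch, not RMM's API.

```python
class ToyPoolAllocator:
    """Conceptual model of a memory pool: one big upfront reservation,
    cheap sub-allocations, and reuse of freed blocks."""

    def __init__(self, pool_size):
        self.pool_size = pool_size
        self.offset = 0          # bump pointer into the pool
        self.free_blocks = []    # (offset, size) blocks returned by free()

    def allocate(self, size):
        # First try to reuse a freed block that is large enough
        for i, (off, blk_size) in enumerate(self.free_blocks):
            if blk_size >= size:
                del self.free_blocks[i]
                return off
        # Otherwise bump-allocate from the unused tail of the pool
        if self.offset + size > self.pool_size:
            raise MemoryError("pool exhausted")
        off = self.offset
        self.offset += size
        return off

    def free(self, offset, size):
        # Freeing only records the block for reuse; the big pool stays put
        self.free_blocks.append((offset, size))

pool = ToyPoolAllocator(1024)
a = pool.allocate(256)   # offset 0
b = pool.allocate(256)   # offset 256
pool.free(a, 256)
c = pool.allocate(128)   # reuses the freed block at offset 0
print(a, b, c)
```

Because no call ever touches the underlying driver after the initial reservation, allocation and deallocation reduce to list operations, which is the source of RMM's speedup over raw cudaMalloc/cudaFree.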

Contributors

Anaconda
BlazingDB
Gunrock
NVIDIA
Quansight
Walmart Labs

Adopters

Chainer
Databricks
Graphistry
H2O.ai
IBM
Iguazio
Inria
MapR
OmniSci
PyTorch
Uber
Ursa Labs

Open Source

Apache Arrow
BlazingSQL
Dask
GoAI
Numba
scikit-learn
XGBoost