The RAPIDS suite of open source software libraries gives you the freedom to execute end-to-end data science and analytics pipelines entirely on GPUs.
Seamlessly scale from GPU workstations to multi-GPU servers and multi-node clusters with Dask.
Learn more about Dask
Accelerate your Python data science toolchain with minimal code changes and no new tools to learn.
Increase machine learning model accuracy by iterating on models faster and deploying them more frequently.
Drastically improve your productivity with more interactive data science.
Learn more about XGBoost
RAPIDS is an open source project. Supported by NVIDIA, it also relies on Numba, Apache Arrow, and many more open source projects.
The RAPIDS data science framework is designed to have a familiar look and feel for data scientists working in Python. Here’s a code snippet where we read in a CSV file and print the mean of each column:
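import cudf
# Read the CSV file into a GPU DataFrame
gdf = cudf.read_csv('path/to/file.csv')
# Print the mean of each column (assumes every column is numeric)
for column in gdf.columns:
    print(gdf[column].mean())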
Find more details in our Get Started section.
Jump right into a GPU-powered RAPIDS notebook with Colaboratory for free.
Modeled after 10 Minutes to pandas, this is a short introduction to cuDF geared mainly toward new users.
A short introduction to XGBoost with a distributed CUDA DataFrame via Dask-cuDF.
A GitHub repository with our introductory examples of XGBoost, cuML demos, cuGraph demos, and more.
A second GitHub repository with our extended collection of notebook examples.
RAPIDS is committed to open source. We strive for a six-week release schedule. Learn more in our Road To 1.0 post.
RAPIDS is open source, licensed under Apache 2.0, and spans multiple projects that range from GPU dataframes to GPU-accelerated ML algorithms. It also provides native array_interface support, allowing Apache Arrow data to be pushed to deep learning frameworks.
Whether you are new to RAPIDS, looking to help, or are part of the team, learn about our contributing guidelines on our contributing page.
cuDF is a Python GPU DataFrame library (built on the Apache Arrow columnar memory format) for loading, joining, aggregating, filtering, and otherwise manipulating data all in a pandas-like API familiar to data scientists.
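To make "columnar memory format" concrete, here is a minimal pure-Python sketch (not cuDF or pyarrow code) of how the Arrow format lays out a nullable integer column: a contiguous values buffer plus a validity bitmap in which bit i is set when slot i is non-null, packed least-significant bit first. The helper names `build_column`, `is_valid`, and `column_sum` are hypothetical, and buffer padding and alignment are omitted.

```python
from array import array


def build_column(values):
    """Pack a list of ints/None into an Arrow-style (values, validity) pair.

    Sketch of the layout only: real Arrow buffers are also padded/aligned.
    """
    data = array("q")  # contiguous buffer of 64-bit signed integers
    validity = bytearray((len(values) + 7) // 8)  # one bit per slot
    for i, v in enumerate(values):
        if v is None:
            data.append(0)  # null slots still occupy space; content is unspecified
        else:
            data.append(v)
            validity[i // 8] |= 1 << (i % 8)  # LSB-first bit numbering
    return data, validity


def is_valid(validity, i):
    """Return True if slot i holds a non-null value."""
    return bool(validity[i // 8] & (1 << (i % 8)))


def column_sum(data, validity):
    """Sum only the non-null slots, skipping nulls as a dataframe would."""
    return sum(data[i] for i in range(len(data)) if is_valid(validity, i))


data, validity = build_column([3, None, 5, 7, None, 1, 2, 4, 9])
print(list(data))                  # [3, 0, 5, 7, 0, 1, 2, 4, 9]
print(list(validity))              # [237, 1]
print(column_sum(data, validity))  # 31
```

Keeping each column in its own contiguous buffer is what lets a GPU scan one column with many threads at once, instead of striding through whole rows.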
libcudf is a C/C++ CUDA library for implementing standard dataframe operations. It is part of the cuDF repository.
cuML is a suite of libraries that implement machine learning algorithms and mathematical primitive functions that are compatible with other RAPIDS projects, all in a scikit-learn-like API familiar to data scientists.
cuGraph is a GPU-accelerated graph analytics library, with functionality like NetworkX, which is seamlessly integrated into the RAPIDS data science platform.
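PageRank is one of the analytics algorithms offered by both NetworkX and cuGraph. As an illustration of what such an algorithm computes (not of cuGraph's API or its GPU implementation), here is a CPU-only, pure-Python power-iteration PageRank. The function name `pagerank` here is our own, and the damping factor of 0.85 is an assumed, commonly used default.

```python
def pagerank(edges, alpha=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank on a directed graph given as (src, dst) pairs.

    alpha is the damping factor; rank held by dangling nodes (no out-edges)
    is spread evenly over all nodes so the ranks always sum to 1.
    """
    nodes = sorted({n for edge in edges for n in edge})
    n = len(nodes)
    out_links = {v: [] for v in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    rank = {v: 1.0 / n for v in nodes}  # start from the uniform distribution
    for _ in range(max_iter):
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        # Teleport term plus the evenly redistributed dangling rank.
        new_rank = {v: (1.0 - alpha) / n + alpha * dangling / n for v in nodes}
        # Each node splits its rank evenly across its out-links.
        for src in nodes:
            targets = out_links[src]
            for dst in targets:
                new_rank[dst] += alpha * rank[src] / len(targets)
        # Stop once the L1 change between iterations falls below tol.
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


ranks = pagerank([("a", "b"), ("b", "c"), ("c", "a"), ("d", "c")])
# "c" receives links from both "b" and "d", so it ranks highest.
print(max(ranks, key=ranks.get))  # c
```

Each iteration is a sparse matrix-vector product over the edge list, which is the kind of bulk, data-parallel work a GPU library can spread across thousands of threads.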
cuSpatial is an efficient C++ library accelerated on GPUs, with Python bindings that make it usable by the data science community. Compared to CPU-based implementations, cuSpatial provides significant GPU acceleration for common spatial and spatiotemporal operations such as point-in-polygon tests, distances between trajectories, and trajectory clustering.
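For readers unfamiliar with point-in-polygon tests, here is a scalar, CPU-only sketch of the standard even-odd ray-casting technique. It shows what the test computes for one point; it is not cuSpatial's API, and we do not claim cuSpatial uses this exact algorithm. The function name `point_in_polygon` is hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Even-odd ray-casting test: is (x, y) inside the polygon?

    polygon is a list of (x, y) vertices; the ring is closed implicitly.
    Points lying exactly on an edge may be classified either way.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can cross the ray.
        if (y1 > y) != (y2 > y):
            # x-coordinate where this edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # each crossing to the right flips parity
    return inside


square = [(0, 0), (4, 0), (4, 4), (0, 4)]
points = [(1, 1), (5, 2), (2, 3.5), (-1, -1)]
results = [point_in_polygon(px, py, square) for px, py in points]
print(results)  # [True, False, True, False]
```

Because every point is tested independently, the work maps naturally onto one GPU thread per point (or per point-polygon pair), which is where large speedups over a CPU loop come from.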
cuxfilter is a framework that connects web visualizations to GPU-accelerated crossfiltering. Inspired by the original JavaScript crossfilter library, it enables interactive, very fast multi-dimensional filtering of tabular datasets with 100 million+ rows via cuDF.
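The core crossfiltering rule is that each chart is recomputed from the rows passing every active filter except the filter on that chart's own dimension, so a user can still see and change the other options in the chart they are brushing. The pure-Python sketch below illustrates only that rule; it is not cuxfilter's API or implementation, and `crossfilter_counts` and the sample rows are hypothetical.

```python
from collections import Counter


def crossfilter_counts(rows, filters, dimension):
    """Histogram of `dimension` over the rows that pass every filter
    EXCEPT the one on `dimension` itself (crossfilter semantics)."""
    counts = Counter()
    for row in rows:
        if all(pred(row[dim]) for dim, pred in filters.items() if dim != dimension):
            counts[row[dimension]] += 1
    return counts


rows = [
    {"city": "NYC", "hour": 8},
    {"city": "NYC", "hour": 17},
    {"city": "SF", "hour": 8},
    {"city": "SF", "hour": 9},
    {"city": "LA", "hour": 17},
]
# Two active brushes: a city selection and a "morning only" hour range.
filters = {"city": lambda c: c in {"NYC", "SF"}, "hour": lambda h: h < 12}

print(dict(crossfilter_counts(rows, filters, "city")))  # {'NYC': 1, 'SF': 2}
print(dict(crossfilter_counts(rows, filters, "hour")))  # {8: 2, 17: 1, 9: 1}
```

Every brush change triggers this filter-and-aggregate pass for each chart over the full table, which is why running it on a GPU DataFrame matters at the 100-million-row scale mentioned above.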
Cyber Log Accelerators (CLX), also pronounced “clicks”, provides a collection of RAPIDS examples for security analysts, data scientists, and engineers to quickly get started applying RAPIDS and GPU acceleration to real-world cybersecurity use cases.
nvStrings, the Python bindings for cuStrings, provides a pandas-like API that will be familiar to data engineers and data scientists, so they can use it to easily accelerate their workflows without going into the details of CUDA programming.
RAPIDS Memory Manager (RMM) is a central place for all device memory allocations in cuDF (C++ and Python) and other RAPIDS libraries. In addition, it is a replacement allocator for CUDA Device Memory (and CUDA Managed Memory) and a pool allocator to make CUDA device memory allocation / deallocation faster and asynchronous.
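Why does a pool allocator make allocation faster? The expensive call to the underlying allocator happens once, up front; later requests are served by handing out pieces of that region and recycling freed pieces. The class below is a hypothetical pure-Python illustration of that idea. It manipulates integer offsets rather than real device memory, is not RMM's API, and omits coalescing of adjacent free blocks and stream-ordered (asynchronous) frees. The 256-byte alignment is an assumption chosen for illustration.

```python
class PoolAllocator:
    """Toy pool suballocator over a single pre-reserved region.

    Hands out (offset, size) blocks instead of real memory; freed blocks
    are kept for reuse by later requests of the same rounded size.
    """

    def __init__(self, pool_size, alignment=256):
        self.pool_size = pool_size  # reserved once, up front
        self.alignment = alignment
        self.next_offset = 0  # bump pointer into the untouched tail
        self.free_blocks = {}  # rounded size -> offsets freed earlier

    def _round_up(self, nbytes):
        return (nbytes + self.alignment - 1) // self.alignment * self.alignment

    def allocate(self, nbytes):
        size = self._round_up(nbytes)
        # Fast path: recycle a previously freed block of the same size.
        if self.free_blocks.get(size):
            return self.free_blocks[size].pop(), size
        # Otherwise carve a new block from the tail of the pool.
        if self.next_offset + size > self.pool_size:
            raise MemoryError("pool exhausted")
        offset = self.next_offset
        self.next_offset += size
        return offset, size

    def deallocate(self, block):
        offset, size = block
        # Keep the block in the pool instead of releasing it upstream.
        self.free_blocks.setdefault(size, []).append(offset)


pool = PoolAllocator(pool_size=1 << 20)
a = pool.allocate(1000)  # rounded up to 1024 bytes at offset 0
b = pool.allocate(300)   # rounded up to 512 bytes at offset 1024
pool.deallocate(a)
c = pool.allocate(900)   # same rounded size, so it reuses offset 0
print(a, b, c)           # (0, 1024) (1024, 512) (0, 1024)
```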
BlazingSQL is an open source project providing distributed SQL for analytics that enables the integration of enterprise data at scale. RAPIDS is actively contributing to BlazingSQL, and it integrates with RAPIDS cuDF, XGBoost, and RAPIDS cuML for GPU-accelerated data analytics and machine learning.
Learn more on our BlazingSQL page
Dask is an open source project providing advanced parallelism for analytics that enables performance at scale. RAPIDS is actively contributing to Dask, and it integrates with RAPIDS cuDF, XGBoost, and RAPIDS cuML for GPU-accelerated data analytics and machine learning.
Learn more on our Dask page
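Dask scales a dataframe by splitting it into partitions that can live on different workers or GPUs. Aggregations then have to be decomposed so that each partition does its share independently. The pure-Python sketch below shows one such decomposition for a mean (per-partition sum and count, then a combine step); it illustrates the idea and is not Dask's API or its actual task graph. The function name `partitioned_mean` is hypothetical.

```python
def partitioned_mean(partitions):
    """Mean of data split into partitions, computed in two phases.

    Map: each partition independently reduces to a (sum, count) pair, work
    that could run on separate workers or GPUs. Reduce: the small pairs
    are combined into one result.
    """
    partials = [(sum(part), len(part)) for part in partitions]  # map
    total = sum(s for s, _ in partials)  # reduce
    count = sum(c for _, c in partials)
    return total / count


parts = [[1, 2, 3], [4], [5, 6, 7, 8]]
print(partitioned_mean(parts))  # 4.5

# Averaging the per-partition means instead is wrong for unequal partitions:
naive = sum(sum(p) / len(p) for p in parts) / len(parts)
print(round(naive, 4))  # 4.1667
```

Only the tiny (sum, count) pairs travel between workers, so the expensive pass over the data stays local to each partition.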
XGBoost is a well-known gradient boosted decision trees (GBDT) machine learning package used to tackle regression, classification, and ranking problems. The RAPIDS team works closely with the Distributed Machine Learning Common (DMLC) XGBoost organization to upstream code and ensure that all components of the GPU-accelerated analytics ecosystem work together.
Learn more on our XGBoost page
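To show what "gradient boosted decision trees" means, here is a deliberately tiny pure-Python sketch: depth-1 trees (stumps) fitted one after another to the residuals of the current model for squared-error regression. This is the basic boosting loop only. XGBoost itself additionally uses second-order (gradient and Hessian) information, regularization, deeper trees, and GPU histogram algorithms, none of which is shown. The function names and the learning rate and round count are illustrative assumptions.

```python
def fit_stump(xs, residuals):
    """Fit a depth-1 regression tree (one threshold on x) by least squares.

    Each side of the split predicts the mean residual of its points.
    """
    best = None
    for threshold in sorted(set(xs))[:-1]:  # largest x would leave the right side empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((r - left_mean) ** 2 for r in left) + sum(
            (r - right_mean) ** 2 for r in right
        )
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:  # all x values identical: no split possible
        mean = sum(residuals) / len(residuals)
        return lambda x: mean
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def gradient_boost(xs, ys, n_rounds=50, learning_rate=0.3):
    """Gradient-boosted stumps for squared-error regression.

    For squared-error loss the negative gradient with respect to the current
    prediction is proportional to the residual y - prediction, so each round
    fits a stump to the residuals and adds a shrunken copy to the ensemble.
    """
    base = sum(ys) / len(ys)  # constant that minimizes squared error
    trees = []

    def predict(x):
        return base + learning_rate * sum(tree(x) for tree in trees)

    for _ in range(n_rounds):
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        trees.append(fit_stump(xs, residuals))
    return predict


xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.2, 0.9, 1.1, 3.0, 3.1, 2.9, 3.2]
model = gradient_boost(xs, ys)
mse = sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
print(round(model(2), 2), round(model(7), 2), round(mse, 4))
```

Most of the cost sits in `fit_stump`'s scan over candidate splits; GPU implementations speed up this split search by bucketing feature values into histograms.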
Coming soon: NVIDIA will be bringing RAPIDS to Apache Spark.
Learn more in our blog post.