The RAPIDS suite of open source software libraries and APIs lets you execute end-to-end data science and analytics pipelines entirely on GPUs. Licensed under Apache 2.0, RAPIDS is incubated by NVIDIA® based on extensive hardware and data science experience. RAPIDS uses NVIDIA CUDA® primitives for low-level compute optimization, and exposes GPU parallelism and high-bandwidth memory speed through user-friendly Python interfaces.
RAPIDS also focuses on common data preparation tasks for analytics and data science. This includes a familiar dataframe API that integrates with a variety of machine learning algorithms for end-to-end pipeline acceleration without paying typical serialization costs. RAPIDS also supports multi-node, multi-GPU deployments, enabling vastly accelerated processing and training on much larger datasets.
For RAPIDS logos, themes, branding, and other guides, take a look at our Branding and Guides page.
RAPIDS grew out of the Apache Arrow and GoAi projects, which are based on a columnar, in-memory data structure that delivers efficient and fast data interchange with the flexibility to support complex data models.
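To make "columnar" concrete, here is a minimal CPU-only sketch using just the Python standard library. It is an illustration of the layout idea, not Apache Arrow's actual format (which also defines validity bitmaps, offsets, and metadata); the field names and values are made up for the example.

```python
from array import array

# Row-oriented records: each record mixes fields of different types.
rows = [
    {"id": 1, "price": 9.5},
    {"id": 2, "price": 12.0},
    {"id": 3, "price": 7.25},
]

# Column-oriented layout: one contiguous, typed buffer per field.
# (Hypothetical field names; real columnar formats carry extra metadata.)
columns = {
    "id": array("q", (row["id"] for row in rows)),        # signed 64-bit integers
    "price": array("d", (row["price"] for row in rows)),  # 64-bit floats
}

# An aggregate over one column scans a single buffer and never touches "id".
total_price = sum(columns["price"])
print(total_price)
```

Because each column is a single typed buffer, it can be handed to another library or device as one block of memory, which is the property that makes this kind of layout attractive for data interchange.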
Libraries and APIs Overview
Some RAPIDS projects include cuDF, a pandas-like dataframe manipulation library; cuML, a collection of machine learning libraries that will provide GPU versions of algorithms available in scikit-learn; and cuGraph, a NetworkX-like accelerated graph analytics library. Development follows a 6-week release schedule, so new features and libraries are always on the way.
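As a rough illustration of what "pandas-like" means in practice, the sketch below writes one group-by aggregation against the shared dataframe API and runs it with cuDF when it is importable, or with pandas otherwise. The helper names (load_dataframe_module, mean_amount_by_city) and the sample data are hypothetical, and the fallback logic is an assumption made so the example also runs on machines without a GPU.

```python
import importlib


def load_dataframe_module():
    """Return cudf if it imports cleanly, else pandas, else None."""
    for name in ("cudf", "pandas"):
        try:
            return importlib.import_module(name)
        except Exception:  # package missing, or no usable GPU/driver for cudf
            continue
    return None


def mean_amount_by_city(xdf):
    """Group-by aggregation written once against the pandas-style API."""
    df = xdf.DataFrame(
        {
            "city": ["Austin", "Boston", "Austin", "Boston"],
            "amount": [10.0, 20.0, 30.0, 40.0],
        }
    )
    means = df.groupby("city")["amount"].mean()
    # cuDF results live in GPU memory; to_pandas() copies them back to the host.
    if hasattr(means, "to_pandas"):
        means = means.to_pandas()
    return means.to_dict()


xdf = load_dataframe_module()
result = mean_amount_by_city(xdf) if xdf is not None else None
print(xdf.__name__ if xdf is not None else "no dataframe library installed", result)
```

The point of the design is that the body of mean_amount_by_city does not change between the CPU and GPU paths; only the imported module does.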
Integration With Deep Learning Libraries
RAPIDS provides native array_interface support. This means data stored in Apache Arrow can be seamlessly pushed to deep learning frameworks that accept array_interface or work with DLPack, such as Chainer, MXNet, and PyTorch.
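The idea behind these interchange protocols is that a consumer receives a description of existing memory (type, shape, location) instead of a copy of the data. The sketch below is a CPU-only analogy using Python's built-in buffer protocol, which is not the array interface or DLPack themselves; the "producer" and "consumer" roles are illustrative assumptions.

```python
from array import array

# Producer: owns a contiguous buffer of 64-bit floats
# (stands in for a single dataframe column).
column = array("d", [1.0, 2.0, 3.0, 4.0])

# Consumer: receives a view through the buffer protocol; no bytes are copied.
view = memoryview(column)

# A write through the view is visible to the producer, which shows that
# both objects refer to the same underlying memory.
view[0] = 100.0
shared = column[0] == 100.0

# The view carries the metadata a consumer needs to interpret the raw memory.
description = {
    "format": view.format,
    "itemsize": view.itemsize,
    "shape": view.shape,
}
print(shared, description)
```

A GPU-side protocol plays the same role for device memory: the framework on the receiving end wraps the existing allocation rather than serializing and re-reading it.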
Our focus on Python allows RAPIDS to play well with most data science visualization libraries. For even greater performance, we are working towards deeper integration with these libraries since a native GPU in-memory data format provides high-performance, high-FPS data visualization capabilities, even with very large datasets.