Open GPU Data Science



The RAPIDS suite of open source software libraries gives you the freedom to execute end-to-end data science and analytics pipelines entirely on GPUs. RAPIDS is incubated by NVIDIA® based on years of accelerated data science experience. RAPIDS relies on NVIDIA CUDA® primitives for low-level compute optimization, and exposes GPU parallelism and high-bandwidth memory speed through user-friendly Python interfaces.

RAPIDS also focuses on common data preparation tasks for analytics and data science. This includes a familiar DataFrame API that integrates with a variety of machine learning algorithms for end-to-end pipeline acceleration without paying typical serialization costs. RAPIDS also includes support for multi-node, multi-GPU deployments, enabling vastly accelerated processing and training on much larger datasets.
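To make the "serialization costs" point concrete, here is a minimal CPU-only sketch that uses only the Python standard library. It is an analogy, not RAPIDS code: the `prices` column and both hand-off styles are illustrative assumptions. It contrasts passing a column between pipeline stages by serializing it (a full copy) with sharing the same in-memory buffer (no copy).

```python
import pickle
from array import array

# One column of float64 values stored contiguously, as in a columnar layout.
prices = array("d", [10.0, 20.0, 30.0, 40.0])

# Hand-off 1: serialize and deserialize, which copies every value.
copied = pickle.loads(pickle.dumps(prices))
copied[0] = -1.0  # only the copy changes; the original column is untouched

# Hand-off 2: share the same buffer through a memoryview (no copy is made).
shared = memoryview(prices)
shared[1] = 99.0  # writes through to the original column

print(prices.tolist())
```

The second hand-off is the idea behind keeping data in one shared in-memory format across pipeline stages: each stage reads the same buffer instead of paying for a copy at every boundary.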

End to end Performance Chart

The New GPU Data Science Pipeline

RAPIDS Pipeline Diagram

  • Apache Arrow: This is a columnar, in-memory data structure that delivers efficient and fast data interchange with flexibility to support complex data models.
  • cuDF: The RAPIDS cuDF library is a DataFrame manipulation library based on Apache Arrow that accelerates loading, filtering, and manipulation of data during data preparation for model training. The Python bindings of the core CUDA-accelerated DataFrame manipulation primitives mirror the pandas interface for seamless onboarding of pandas users.
  • cuML: RAPIDS cuML is a collection of GPU-accelerated machine learning libraries that will provide GPU versions of all machine learning algorithms available in scikit-learn.
  • cuGraph: This is a framework and collection of graph analytics libraries that seamlessly integrate into the RAPIDS data science platform.
  • Deep Learning Libraries: RAPIDS provides native array_interface support. This means data stored in Apache Arrow can be seamlessly pushed to deep learning frameworks that accept array_interface, such as PyTorch and Chainer.
  • Visualization Libraries (Coming Soon): RAPIDS will include tightly integrated data visualization libraries based on Apache Arrow. The native GPU in-memory data format provides high-performance, high-FPS data visualization, even with very large datasets.
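As a rough illustration of how cuDF mirrors the pandas interface, the sketch below runs the same filter-and-aggregate step with either library. The `trips` data and column names are made up for this example, and the fallback import is an assumption added so the code also runs on a machine without an NVIDIA GPU; it is not an official RAPIDS pattern.

```python
# Prefer cuDF when it is installed (it requires an NVIDIA GPU); otherwise
# fall back to pandas so the sketch still runs on a CPU-only machine.
try:
    import cudf as xdf
except ImportError:
    import pandas as xdf

# Hypothetical sample data, used only for illustration.
trips = xdf.DataFrame({
    "city": ["NYC", "SF", "NYC", "SF", "NYC"],
    "fare": [12.5, 30.0, 8.0, 22.0, 40.0],
})

# The same pandas-style calls work on both libraries.
long_trips = trips[trips["fare"] > 10.0]
totals = long_trips.groupby("city")["fare"].sum()

# cuDF results live in GPU memory; to_pandas() copies them to the host.
if hasattr(totals, "to_pandas"):
    totals = totals.to_pandas()

fare_by_city = totals.to_dict()
print(fare_by_city)
```

Only the import line differs between the two paths, which is the sense in which existing pandas code can be moved to the GPU with minimal changes.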

Features of RAPIDS

  • Hassle-Free Integration

    Accelerate your Python data science toolchain with minimal code changes and no new tools to learn.

  • Scaling Out on Any GPU

    Scale seamlessly from GPU workstations to multi-GPU servers and multi-node clusters.

  • Top Model Accuracy

    Increase machine learning model accuracy by iterating on models faster and deploying them more frequently.

  • Reduced Training Time

    Drastically improve your productivity with near-interactive data science.

  • Open Source

    The open-source software is customizable, extensible, and interoperable. It is supported by NVIDIA and built on Apache Arrow.


RAPIDS is for everyone: users, adopters, and contributors. If you're a data scientist, researcher, engineer, or developer using pandas, Dask, scikit-learn, or Spark on CPUs and looking for 50X end-to-end pipeline speedups at scale, look no further. Download RAPIDS and give it a run. RAPIDS is released under the Apache 2.0 open source license and is intended to be built upon and hardened in the community. While significant time and effort have been invested in making the platform usable and relevant, we need active contributors to help improve it and build its future.




Open Source