The RAPIDS cuDF ecosystem

Author

Marie-Hélène Burle

This section is a brief introduction to RAPIDS cuDF, an open-source library that makes it easy to run Python DataFrames on the GPU.

What is RAPIDS?

RAPIDS is an open-source collection of libraries from NVIDIA. It builds on Apache Arrow, a high-performance, open-source, language-agnostic columnar memory format, and on CUDA (which is not open-source).

RAPIDS includes cuML for machine learning, cuGraph and nx-cugraph for graph analytics, and cuDF for DataFrames.

RAPIDS for dataframes

Using cuDF

There are 3 ways to use cuDF:

	cuDF	pandas cuDF	Polars cuDF
API	Similar to pandas, with some differences	Exactly the same as pandas	Exactly the same as the Polars lazy API
Performance	Very good: operations run fully on GPU	Good: automatic fallbacks to CPU for unsupported operations can lead to costly transfers between CPU and GPU	The best: lazy execution plus automatic fallback to CPU
Installation	Install `cudf`	Install `cudf` (`cudf.pandas` is a module)	Install `polars` with the GPU engine extra (`polars[gpu]`)

Which one to use?

Unless you are stuck with existing code, pipelines, and workflows that you cannot change, Polars with the GPU engine is a great option. It gives you the best of both worlds: lazy execution with its advantages (out-of-core processing, better performance, lower memory footprint) and code running on GPU. If you are already using Polars with the lazy API on CPU, the code is virtually the same: you just need to pass `engine="gpu"` to `collect`, and you can customize the GPU engine for finer control if you want.

pandas cuDF will speed up your pandas code to some extent and at no coding cost (if you come from pandas), but it is the least performant option: the pandas code that can’t run on GPU falls back to CPU, and copying the data back and forth between host and device is costly. If you don’t know pandas, don’t learn it: learn Polars instead.
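For reference, this is roughly what "no coding cost" looks like in a script. It is a sketch only: the data is made up, and the check on the `cudf` package is an assumption added so that the script still runs as plain pandas on a machine without RAPIDS.

```python
import importlib.util

# Assumption: only enable the accelerator if the cudf package is installed;
# otherwise the script runs as ordinary pandas on CPU
if importlib.util.find_spec("cudf") is not None:
    import cudf.pandas

    # This must run before pandas is imported
    cudf.pandas.install()

import pandas as pd

# From here on, this is unmodified pandas code (toy data, hypothetical)
df = pd.DataFrame({"key": ["a", "b", "a", "b"], "value": [1, 2, 3, 4]})
totals = df.groupby("key")["value"].sum()

print(totals)
```

The same effect can be obtained without touching the script by running it with `python -m cudf.pandas script.py`, or with `%load_ext cudf.pandas` in a Jupyter notebook.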

cuDF runs the code fully on GPU and is a much better option, but it requires learning a slightly new API since it only supports a subset of pandas commands and differs from the pandas API in places.

In this course, we will focus on Polars with the GPU engine.

As of April 2026, this project is in open beta: it is available for testing before the official launch, but it is still in development and there could be breaking changes. The multi-GPU engine is particularly experimental at this point.

Not all data types and expressions are supported yet. For an up-to-date list of what is supported, have a look at the official documentation.