Polars GPU engine

Author

Marie-Hélène Burle

In this section, we look at the basics of using Polars on GPU with RAPIDS cuDF.

Polars

Polars is an online analytical processing (OLAP) query engine for DataFrames in Python.

It is a newer and faster alternative to pandas: it multithreads automatically, supports lazy evaluation, builds on Apache Arrow to store data in memory, and offers a clearer syntax and consistent handling of missing data.

You can find more information in our introductory course and webinar on Polars as well as our webinar comparing it to pandas.

Below is an example with data from the GBIF website (a free and open-access biodiversity database). The Southern African Bird Atlas Project 2 [1] dataset comes as a 12.3 GB CSV file.

Shape of the DataFrame: 25,687,526 rows, 50 columns.

I converted the CSV file to a (much!) better format: Apache Parquet, a binary, machine-optimized, column-oriented file format with efficient encoding and compression, ideal for large tabular data. I then copied it to the training cluster, in a directory we can all read from.

Because the data is large and our training cluster has little memory, trying to read the file into a Polars DataFrame will kill the kernel with an out-of-memory (OOM) error (you can try it to see what an OOM problem looks like in a Jupyter notebook: there is not a lot of feedback!).

The timing and result below are on my machine:

import polars as pl

df = pl.read_parquet('/project/def-sponsor00/data/sa_birds.parquet')

df
shape: (25_687_526, 50)
gbifID datasetKey occurrenceID kingdom phylum class order family genus species infraspecificEpithet taxonRank scientificName verbatimScientificName verbatimScientificNameAuthorship countryCode locality stateProvince occurrenceStatus individualCount publishingOrgKey decimalLatitude decimalLongitude coordinateUncertaintyInMeters coordinatePrecision elevation elevationAccuracy depth depthAccuracy eventDate day month year taxonKey speciesKey basisOfRecord institutionCode collectionCode catalogNumber recordNumber identifiedBy dateIdentified license rightsHolder recordedBy typeStatus establishmentMeans lastInterpreted mediaType issue
i64 str str str str str str str str str str str str str str str str str str str str f64 f64 str str str str str str str i64 i64 i64 i64 i64 str str str str str str str str str str str str str str str
3867289255 "906e6978-e292-4a8b-9c39-adf6bb… "urn:fiao:sabap2:fullprot:rid18… "Animalia" "Chordata" "Aves" "Passeriformes" "Muscicapidae" "Bradornis" "Bradornis pallidus" null "SPECIES" "Bradornis pallidus (J.W.von Mü… null null "ZA" null "Mpumalanga" "PRESENT" null "dd862d06-e6e9-4ab9-bc86-c875cc… -24.79125 31.457917 null null null null null null "2022-07-21" 21 7 2022 2492639 2492639 "HUMAN_OBSERVATION" "FIAO" "SABAP2" "urn:fiao:sabap2:fullprot:rid18… null "Mr L Hes" null "CC_BY_4_0" null "Mr L Hes" null null "2026-03-07T10:46:26.735Z" null "COORDINATE_ROUNDED;GEODETIC_DA…
2341252758 "906e6978-e292-4a8b-9c39-adf6bb… "urn:fiao:sabap2:fullprot:rid25… "Animalia" "Chordata" "Aves" "Charadriiformes" "Burhinidae" "Burhinus" "Burhinus capensis" null "SPECIES" "Burhinus capensis (M.H.K.Licht… null null "ZA" null "Limpopo" "PRESENT" null "dd862d06-e6e9-4ab9-bc86-c875cc… -23.874583 29.457917 null null null null null null "2011-01-07" 7 1 2011 2482097 2482097 "HUMAN_OBSERVATION" "FIAO" "SABAP2" "urn:fiao:sabap2:fullprot:rid25… null "Prof J Pretorius" null "CC_BY_4_0" null "Prof J Pretorius" null null "2026-03-07T10:46:53.556Z" null "COORDINATE_ROUNDED;GEODETIC_DA…
3867442137 "906e6978-e292-4a8b-9c39-adf6bb… "urn:fiao:sabap2:fullprot:rid18… "Animalia" "Chordata" "Aves" "Passeriformes" "Platysteiridae" "Batis" "Batis molitor" null "SPECIES" "Batis molitor (Kuster, 1836)" null null "ZA" null "Limpopo" "PRESENT" null "dd862d06-e6e9-4ab9-bc86-c875cc… -23.957917 31.124583 null null null null null null "2022-07-20" 20 7 2022 5231186 5231186 "HUMAN_OBSERVATION" "FIAO" "SABAP2" "urn:fiao:sabap2:fullprot:rid18… null "Mr P Verster" null "CC_BY_4_0" null "Mr P Verster" null null "2026-03-07T10:46:26.735Z" null "COORDINATE_ROUNDED;GEODETIC_DA…
2347570158 "906e6978-e292-4a8b-9c39-adf6bb… "urn:fiao:sabap2:fullprot:rid88… "Animalia" "Chordata" "Aves" "Coraciiformes" "Alcedinidae" "Halcyon" "Halcyon leucocephala" null "SPECIES" "Halcyon leucocephala (P.L.S.Mü… null null "ZA" null "Limpopo" "PRESENT" null "dd862d06-e6e9-4ab9-bc86-c875cc… -24.29125 30.624583 null null null null null null "2016-10-17" 17 10 2016 5228304 5228304 "HUMAN_OBSERVATION" "FIAO" "SABAP2" "urn:fiao:sabap2:fullprot:rid88… null "Mr R Hawkins" null "CC_BY_4_0" null "Mr R Hawkins" null null "2026-03-07T10:47:07.113Z" null "COORDINATE_ROUNDED;GEODETIC_DA…
3867442155 "906e6978-e292-4a8b-9c39-adf6bb… "urn:fiao:sabap2:fullprot:rid18… "Animalia" "Chordata" "Aves" "Passeriformes" "Malaconotidae" "Chlorophoneus" "Chlorophoneus sulfureopectus" null "SPECIES" "Chlorophoneus sulfureopectus (… null null "ZA" null "Limpopo" "PRESENT" null "dd862d06-e6e9-4ab9-bc86-c875cc… -22.374583 31.207917 null null null null null null "2022-07-18" 18 7 2022 5845131 5845131 "HUMAN_OBSERVATION" "FIAO" "SABAP2" "urn:fiao:sabap2:fullprot:rid18… null "Mr R Hawkins" null "CC_BY_4_0" null "Mr R Hawkins" null null "2026-03-07T10:46:26.736Z" null "COORDINATE_ROUNDED;GEODETIC_DA…
2342365420 "906e6978-e292-4a8b-9c39-adf6bb… "urn:fiao:sabap2:fullprot:rid35… "Animalia" "Chordata" "Aves" "Passeriformes" "Alaudidae" "Mirafra" "Mirafra africana" null "SPECIES" "Mirafra africana A.Smith, 1836" null null "ZA" null "North West" "PRESENT" null "dd862d06-e6e9-4ab9-bc86-c875cc… -26.874583 26.707917 null null null null null null "2012-02-24" 24 2 2012 9389539 9389539 "HUMAN_OBSERVATION" "FIAO" "SABAP2" "urn:fiao:sabap2:fullprot:rid35… null "Mrs W Strauss" null "CC_BY_4_0" null "Mrs W Strauss" null null "2026-03-07T10:46:22.798Z" null "COORDINATE_ROUNDED;GEODETIC_DA…
2345700071 "906e6978-e292-4a8b-9c39-adf6bb… "urn:fiao:sabap2:fullprot:rid68… "Animalia" "Chordata" "Aves" "Passeriformes" "Cisticolidae" "Apalis" "Apalis flavida" null "SPECIES" "Apalis flavida (Strickland, 18… null null "ZA" null "KwaZulu-Natal" "PRESENT" null "dd862d06-e6e9-4ab9-bc86-c875cc… -27.957917 32.374583 null null null null null null "2015-06-14" 14 6 2015 2492725 2492725 "HUMAN_OBSERVATION" "FIAO" "SABAP2" "urn:fiao:sabap2:fullprot:rid68… null "Mr E Marais" null "CC_BY_4_0" null "Mr E Marais" null null "2026-03-07T10:46:21.251Z" null "COORDINATE_ROUNDED;GEODETIC_DA…
2342366813 "906e6978-e292-4a8b-9c39-adf6bb… "urn:fiao:sabap2:fullprot:rid35… "Animalia" "Chordata" "Aves" "Passeriformes" "Turdidae" "Turdus" "Turdus olivaceus" null "SPECIES" "Turdus olivaceus Linnaeus, 176… null null "ZA" null "Western Cape" "PRESENT" null "dd862d06-e6e9-4ab9-bc86-c875cc… -34.124583 19.54125 null null null null null null "2012-02-27" 27 2 2012 9363452 9363452 "HUMAN_OBSERVATION" "FIAO" "SABAP2" "urn:fiao:sabap2:fullprot:rid35… null "Dr S Shearer" null "CC_BY_4_0" null "Dr S Shearer" null null "2026-03-07T10:46:22.798Z" null "COORDINATE_ROUNDED;GEODETIC_DA…
2345703248 "906e6978-e292-4a8b-9c39-adf6bb… "urn:fiao:sabap2:fullprot:rid68… "Animalia" "Chordata" "Aves" "Coraciiformes" "Alcedinidae" "Ceryle" "Ceryle rudis" null "SPECIES" "Ceryle rudis (Linnaeus, 1758)" null null "ZA" null "KwaZulu-Natal" "PRESENT" null "dd862d06-e6e9-4ab9-bc86-c875cc… -28.79125 31.957917 null null null null null null "2015-05-09" 9 5 2015 2475679 2475679 "HUMAN_OBSERVATION" "FIAO" "SABAP2" "urn:fiao:sabap2:fullprot:rid68… null "Mr JA Gouws" null "CC_BY_4_0" null "Mr JA Gouws" null null "2026-03-07T10:46:21.252Z" null "COORDINATE_ROUNDED;GEODETIC_DA…
2342367593 "906e6978-e292-4a8b-9c39-adf6bb… "urn:fiao:sabap2:fullprot:rid35… "Animalia" "Chordata" "Aves" "Passeriformes" "Cisticolidae" "Cisticola" "Cisticola juncidis" null "SPECIES" "Cisticola juncidis (Rafinesque… null null "ZA" null "Mpumalanga" "PRESENT" null "dd862d06-e6e9-4ab9-bc86-c875cc… -25.79125 30.124583 null null null null null null "2012-02-24" 24 2 2012 2492822 2492822 "HUMAN_OBSERVATION" "FIAO" "SABAP2" "urn:fiao:sabap2:fullprot:rid35… null "Mr G Lockwood" null "CC_BY_4_0" null "Mr G Lockwood" null null "2026-03-07T10:46:22.799Z" null "COORDINATE_ROUNDED;GEODETIC_DA…

From that dataset, I want a list of species from the genus Passer (Old World sparrows):

(
    df.filter(pl.col('genus') == 'Passer')
    .select(pl.col('species'))
    .unique()
)
shape: (5, 1)
species
str
"Passer griseus"
"Passer domesticus"
"Passer motitensis"
"Passer melanurus"
"Passer diffusus"

Now, we can time this query:

%%timeit

(
    df.filter(pl.col('genus') == 'Passer')
    .select(pl.col('species'))
    .unique()
)
234 ms ± 15.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Polars lazy API

One of the great strengths of Polars is its lazy API: when you create a LazyFrame, Polars doesn’t run the code eagerly, one operation at a time; instead, it creates a query plan (a graph) that only gets resolved when you collect the result (in the form of a classic Polars DataFrame).

This allows operations to be optimized and fused. It also avoids creating intermediate objects that take up memory. Finally, it makes it possible to run queries on datasets too big to fit in memory.

Here is what it looks like (this one runs without any issue on the training cluster!):

import polars as pl

df = pl.scan_parquet('/project/def-sponsor00/data/sa_birds.parquet')

(
    df.filter(pl.col('genus') == 'Passer')
    .select(pl.col('species'))
    .unique()
    .collect()
)
shape: (5, 1)
species
str
"Passer griseus"
"Passer diffusus"
"Passer motitensis"
"Passer domesticus"
"Passer melanurus"

We can time it too:

%%timeit

(
    df.filter(pl.col('genus') == 'Passer')
    .select(pl.col('species'))
    .unique()
    .collect()
)
79 ms ± 2.21 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

We get a speedup (a factor of about 3), but the main advantage is the reduced memory footprint.

When to use the lazy API?

  • Use it whenever you are dealing with large datasets.
  • It is the only option for working out of core (on data too big to fit in memory).

Here, it makes perfect sense to use the lazy API: the data is very big.

Should you always try to use the lazy API?

Yes! Unless you are dealing with tiny DataFrames, it is always advantageous to use the lazy API. It will speed computations up, save you memory, or both, depending on the situation. And as we just saw, it will allow you to run queries that you could not otherwise run with the available memory.

Polars on GPU

The GPU engine builds on the lazy API: Polars dispatches the query plan to RAPIDS cuDF for execution on the GPU (if possible). The collected result is returned in the form of a classic Polars DataFrame on the CPU:

df = pl.scan_parquet('/project/def-sponsor00/data/sa_birds.parquet')

(
    df.filter(pl.col('genus') == 'Passer')
    .select(pl.col('species'))
    .unique()
    .collect(engine='gpu')
)
shape: (5, 1)
species
str
"Passer domesticus"
"Passer motitensis"
"Passer diffusus"
"Passer melanurus"
"Passer griseus"

And the timing:

%%timeit

(
    df.filter(pl.col('genus') == 'Passer')
    .select(pl.col('species'))
    .unique()
    .collect(engine='gpu')
)
95.7 ms ± 378 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)

As you can see, the GPU version is actually slower here.

Did the query actually run on the GPU? (Remember that the Polars GPU engine will, by default, quietly fall back to the CPU for queries that are not yet supported on the GPU.)

One way to check is to run the query in verbose mode:

with pl.Config() as cfg:
    cfg.set_verbose(True)
    (
        df.filter(pl.col('genus') == 'Passer')
        .select(pl.col('species'))
        .unique()
        .collect(engine='gpu')
    )

No warning means that the query was run on the GPU.

We will discuss the other method in the next section.

When to use GPUs?

  • When the computations are intensive and can benefit from vast parallelization.
  • Simple queries on very large data won't benefit from GPUs: the cost of transferring the data to and from the GPU is large and the gain is small (particularly since Polars already runs computations on multiple threads, so there is already some parallelization).

Should you always try to use GPUs?

No! Benchmarks done by the Polars team show that queries heavy in grouped aggregations and joins benefit most from the GPU engine. By contrast, queries dominated by I/O show similar speeds on CPU and GPU (as we just saw).

Configurations

Default configuration

The default configuration works in most cases. To use it, as we saw, simply pass engine="gpu" to the collect method.

Configuration options

You can create a GPU engine with polars.GPUEngine and pass options to it.

Example:

  • Use the in-memory executor (default is streaming),
  • select device 1 (if you have at least 2 GPUs; default is 0),
  • raise an error if a query cannot run on GPU (default is to silently fall back to the CPU):
engine = pl.GPUEngine(
    executor="in-memory",
    device=1,
    raise_on_fail=True
)

You then pass this engine as the value of the engine argument of the collect method: <your-query-as-a-lazy-frame>.collect(engine=engine).

The second method to ensure that our computations ran on the GPU (and did not silently fall back to the CPU) is to change the configuration options.

Your turn:

Can you write the code that would test this?

Executors

Streaming

Streaming splits the data into partitions that are streamed through the query graph. Because it scales best and works very well on Parquet files, this is what you want to use for large data.

Single GPU

This is the default of the Polars GPU engine.

Using .collect(engine="gpu") as we did earlier is equivalent to creating either of the following engines:

engine=pl.GPUEngine()

# or the following with the default options
engine=pl.GPUEngine(executor="streaming", executor_options={"cluster": "single"})

and then using them with .collect(engine=engine).

You can pass additional options to the executor:

engine = pl.GPUEngine(
    executor_options={"max_rows_per_partition": 1_000_000}
)

When using Parquet files, you can also pass Parquet-specific options:

engine = pl.GPUEngine(
    parquet_options={
        'chunked': True,
        'chunk_read_limit': int(1e9),
        'pass_read_limit': int(4e9)
    }
)

Multiple GPUs

If you want to use multiple GPUs, you can create a distributed streaming executor:

engine = pl.GPUEngine(executor_options={"cluster": "distributed"})

In-memory

If you have small data that fits easily in memory, you can use the in-memory executor, which has less overhead. Be aware, however, that it will not scale well:

engine = pl.GPUEngine(executor="in-memory")

References

1. GBIF.org (2026) Occurrence download