Getting started

Author

Marie-Hélène Burle

Here are a few notes to get you started with Polars.

Installation

Personal computer

python -m venv ~/env                  # Create virtual env
source ~/env/bin/activate             # Activate virtual env
pip install --upgrade pip             # Update pip
pip install polars                    # Install Polars

Alliance clusters

Wheels are available for Polars on the Alliance clusters (always prefer wheels when possible):

python -m venv ~/env                  # Create virtual env
source ~/env/bin/activate             # Activate virtual env
pip install --upgrade pip --no-index  # Update pip from wheel
pip install polars --no-index         # Install Polars from wheel

Syntax

Overall, the syntax feels very similar to R’s dplyr from the tidyverse.

In particular, extracting data is not done by indexing, but with action verbs:

import polars as pl

df = pl.DataFrame(
    {
        "species": ["A", "B", "C"],
        "number": [87, 13, 4],
        "category": ["a", "b", "c"]
    }
)

df

shape: (3, 3)
┌─────────┬────────┬──────────┐
│ species ┆ number ┆ category │
│ ---     ┆ ---    ┆ ---      │
│ str     ┆ i64    ┆ str      │
╞═════════╪════════╪══════════╡
│ A       ┆ 87     ┆ a        │
│ B       ┆ 13     ┆ b        │
│ C       ┆ 4      ┆ c        │
└─────────┴────────┴──────────┘

df.filter(pl.col("number") > 20).select("category")

shape: (1, 1)
┌──────────┐
│ category │
│ ---      │
│ str      │
╞══════════╡
│ a        │
└──────────┘

Performance tips

Use lazy execution where possible

We already saw that you can lazily read files with pl.scan_csv instead of using pl.read_csv.

Another option is to use the lazy method.

Example:

df = pl.DataFrame({"foo": ["a", "b", "c"], "bar": [0, 1, 2]}).lazy()

Nothing is computed until you call the collect method, which runs the query and returns the results as a regular DataFrame.

Data file format

A good file format to store large datasets is Apache Parquet. It is a columnar format (values are stored together by column rather than by row, as is the case for CSV files), which allows better compression.

Migrating from Pandas

Read the migration guide: it will help you write Polars code rather than “literally translated” Pandas code that runs, but doesn’t make use of Polars’ strengths. The differences in style mostly come from the fact that Polars runs in parallel.