Polars subsets data with a syntax that differs markedly from the indexing used in pandas and other languages. It relies on action verbs, in a style very similar to that of R’s dplyr from the tidyverse.
import polars as pl

df = pl.read_csv("https://cdn.jsdelivr.net/npm/vega-datasets/data/disasters.csv")
df
shape: (803, 3)
Entity                    Year    Deaths
str                       i64     i64
"All natural disasters"   1900    1267360
"All natural disasters"   1901    200018
"All natural disasters"   1902    46037
"All natural disasters"   1903    6506
"All natural disasters"   1905    22758
…                         …       …
"Wildfire"                2013    35
"Wildfire"                2014    16
"Wildfire"                2015    67
"Wildfire"                2016    39
"Wildfire"                2017    75
Selecting rows
You can create a new DataFrame with a subset of rows matching some condition with polars.DataFrame.filter.
Let’s select the rows for the year 2001. To do so, we refer to the column Year by name with polars.col and keep the rows where the value in that column equals 2001:
Using polars.DataFrame.unique and polars.Series.to_list, you can get a list of all the types of natural disasters in this dataset (we can then sort the list with the standard list.sort method):
If you want to add the modified columns to the initial DataFrame (instead of selecting them), you use polars.DataFrame.with_columns. The naming works in the same way:
Notice that the rows are no longer in their original order. Not having to preserve the order makes the operation more efficient and does not affect later subsetting of the DataFrame. If you do want to keep the order, however, you can use the maintain_order parameter (at the cost of a slower operation):
Create a new DataFrame, ordered by year, that shows the total number of deaths for each year:
shape: (117, 2)
Year    Deaths
i64     i64
1900    1267360
1901    200018
1902    46037
1903    6506
1905    22758
…       …
2013    22225
2014    20882
2015    23893
2016    10201
2017    2087
Combining contexts
select, with_columns, filter, and group_by are called contexts in the Polars terminology (the data transformations performed in these contexts are called expressions).
Contexts can be combined. For instance, we can create a new DataFrame with the number of deaths for each decade: