import polars as pl
url = "https://cdn.jsdelivr.net/npm/vega-datasets/data/disasters.csv"
df = pl.read_csv(url)
type(df)
polars.dataframe.frame.DataFrame
Marie-Hélène Burle
When it comes to high-performance computing, one of the strengths of Polars is that it supports lazy evaluation. A lazy operation returns immediately with a LazyFrame, which describes the computation to perform rather than holding its result, so you can keep building on it without waiting for anything to run. Moreover, when you run queries on a LazyFrame, Polars builds a query graph and optimizes it before executing anything, much as a compiler optimizes code.
If you want to speed up your code, use lazy execution whenever possible.
Ideally, you want to use the lazy API from the start, when you read in the data.
In the previous examples, we used polars.read_csv to read our data. This returns a Polars DataFrame:
polars.dataframe.frame.DataFrame
Instead, you can use polars.scan_csv to create a LazyFrame:
There are scan functions for many of the file formats Polars can read, for example polars.scan_parquet and polars.scan_ndjson.
If you already have a DataFrame, you can create a LazyFrame from it with the polars.DataFrame.lazy method:
To get results from a LazyFrame, you use polars.LazyFrame.collect.
This won’t work because a LazyFrame has no attribute shape:
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[17], line 1
----> 1 df_lazy.filter(pl.col("Year") == 2001).shape

AttributeError: 'LazyFrame' object has no attribute 'shape'
You need to collect the result first:
collect turns your LazyFrame into a DataFrame, but it only does so on the subset needed for your query:
This allows you to work with data too big to fit in memory, as long as the subset your query needs does fit.