import polars as pl
s1 = pl.Series(range(5))
print(s1)
shape: (5,)
Series: '' [i64]
[
	0
	1
	2
	3
	4
]
Marie-Hélène Burle
Polars provides two fundamental data structures: series and data frames.
In Polars, series are one-dimensional and homogeneous (all elements have the same data type).
In other languages (e.g. R), such a data structure would be called a vector; pandas also calls it a Series.
Polars infers data types from the data: integers default to Int64 and floating-point numbers to Float64. To use a different type, specify it when creating the series:
Series can be named:
Data frames are two-dimensional and composed of named series of equal lengths. This means that data frames are heterogeneous, but that columns contain homogeneous data.
They can be created from several kinds of input, such as a dictionary or a NumPy ndarray:
shape: (3, 2)
┌──────┬────────┐
│ Name ┆ Colour │
│ ---  ┆ ---    │
│ str  ┆ str    │
╞══════╪════════╡
│ Bob  ┆ Red    │
│ Luc  ┆ Green  │
│ Lucy ┆ Blue   │
└──────┴────────┘
from datetime import date
df2 = pl.DataFrame(
    {
        "Date": [
            date(2024, 10, 1),
            date(2024, 10, 2),
            date(2024, 10, 3),
            date(2024, 10, 6)
        ],
        "Rain": [2.1, 0.5, 0.0, 1.8],
        "Cloud cover": [1, 1, 0, 2]
    }
)
print(df2)
shape: (4, 3)
┌────────────┬──────┬─────────────┐
│ Date       ┆ Rain ┆ Cloud cover │
│ ---        ┆ ---  ┆ ---         │
│ date       ┆ f64  ┆ i64         │
╞════════════╪══════╪═════════════╡
│ 2024-10-01 ┆ 2.1  ┆ 1           │
│ 2024-10-02 ┆ 0.5  ┆ 1           │
│ 2024-10-03 ┆ 0.0  ┆ 0           │
│ 2024-10-06 ┆ 1.8  ┆ 2           │
└────────────┴──────┴─────────────┘
shape: (2, 2)
┌──────────┬──────────┐
│ column_0 ┆ column_1 │
│ ---      ┆ ---      │
│ i64      ┆ i64      │
╞══════════╪══════════╡
│ 1        ┆ 2        │
│ 3        ┆ 4        │
└──────────┴──────────┘
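Because NumPy ndarrays are stored in memory by rows (row-major order) by default, Polars treats the first dimension of a two-dimensional array as rows: the first inner array fills the first row of the data frame, the second fills the second row, and so on. To fill the data frame by column instead, use the orient parameter: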
shape: (2, 2)
┌──────────┬──────────┐
│ column_0 ┆ column_1 │
│ ---      ┆ ---      │
│ i64      ┆ i64      │
╞══════════╪══════════╡
│ 1        ┆ 3        │
│ 2        ┆ 4        │
└──────────┴──────────┘
To specify column names, you can use the schema parameter:
To specify data types other than the defaults, pass a dictionary mapping column names to data types to the same schema parameter:
df6 = pl.DataFrame(
    {
        "Rain": [2.1, 0.5, 0.0, 1.8],
        "Cloud cover": [1, 1, 0, 2],
    },
    schema={"Rain": pl.Float32, "Cloud cover": pl.Int32}
)
print(df6)
shape: (4, 2)
┌──────┬─────────────┐
│ Rain ┆ Cloud cover │
│ ---  ┆ ---         │
│ f32  ┆ i32         │
╞══════╪═════════════╡
│ 2.1  ┆ 1           │
│ 0.5  ┆ 1           │
│ 0.0  ┆ 0           │
│ 1.8  ┆ 2           │
└──────┴─────────────┘