Marie-Hélène Burle
Polars provides two fundamental data structures: Series and DataFrames.
In Polars, Series are one-dimensional and homogeneous (all elements have the same data type).
In other languages such as R, such a data structure would be called a vector; pandas, like Polars, calls it a Series.

```python
import polars as pl

s1 = pl.Series(range(5))
s1
```

| i64 |
|---|
| 0 |
| 1 |
| 2 |
| 3 |
| 4 |
Polars infers the data type from the data. The defaults are Int64 for integers and Float64 for floating-point numbers, but you can specify another type:
Series can be named:
DataFrames are two-dimensional and composed of named Series of equal length. This means that DataFrames can be heterogeneous, but that columns contain homogeneous data.
They can be created from:
| Name | Colour |
|---|---|
| str | str |
| "Bob" | "Red" |
| "Luc" | "Green" |
| "Lucy" | "Blue" |
| Date | Rain | Cloud cover |
|---|---|---|
| date | f64 | i64 |
| 2024-10-01 | 2.1 | 1 |
| 2024-10-02 | 0.5 | 1 |
| 2024-10-03 | 0.0 | 0 |
| 2024-10-06 | 1.8 | 2 |
| column_0 | column_1 |
|---|---|
| i64 | i64 |
| 1 | 2 |
| 3 | 4 |
By default, Polars reads the first axis of a two-dimensional NumPy array as rows (which matches NumPy's default row-major memory layout), so each inner array fills in one row of the DataFrame. If you want to fill in the DataFrame by column instead, you use the orient parameter:
| column_0 | column_1 |
|---|---|
| i64 | i64 |
| 1 | 3 |
| 2 | 4 |
To specify column names, you can use the schema parameter:
| Var1 | Var2 |
|---|---|
| i64 | i64 |
| 1 | 2 |
| 3 | 4 |
To specify data types different from the defaults, you also use the schema parameter: