Data frame inspection

Author

Marie-Hélène Burle

Once we have a data frame, it is important to quickly get some basic information about it. In this section, we will see how to do so.

Let’s start by reading an online CSV file from a URL:

import polars as pl

df = pl.read_csv("https://raw.githubusercontent.com/razoumov/publish/master/jeopardy.csv")
print(df)
shape: (216_930, 7)
┌─────────────┬──────────┬───────────┬─────────────────┬─────────┬────────────────┬────────────────┐
│ Show Number ┆ Air Date ┆ Round     ┆ Category        ┆ Value   ┆ Question       ┆ Answer         │
│ ---         ┆ ---      ┆ ---       ┆ ---             ┆ ---     ┆ ---            ┆ ---            │
│ i64         ┆ str      ┆ str       ┆ str             ┆ str     ┆ str            ┆ str            │
╞═════════════╪══════════╪═══════════╪═════════════════╪═════════╪════════════════╪════════════════╡
│ 4680        ┆ 12/31/04 ┆ Jeopardy! ┆ HISTORY         ┆ $200    ┆ For the last 8 ┆ Copernicus     │
│             ┆          ┆           ┆                 ┆         ┆ years of his   ┆                │
│             ┆          ┆           ┆                 ┆         ┆ li…            ┆                │
│ 4680        ┆ 12/31/04 ┆ Jeopardy! ┆ ESPN's TOP 10   ┆ $200    ┆ No. 2: 1912    ┆ Jim Thorpe     │
│             ┆          ┆           ┆ ALL-TIME        ┆         ┆ Olympian;      ┆                │
│             ┆          ┆           ┆ ATHLETE…        ┆         ┆ football…      ┆                │
│ 4680        ┆ 12/31/04 ┆ Jeopardy! ┆ EVERYBODY TALKS ┆ $200    ┆ The city of    ┆ Arizona        │
│             ┆          ┆           ┆ ABOUT IT...     ┆         ┆ Yuma in this   ┆                │
│             ┆          ┆           ┆                 ┆         ┆ state…         ┆                │
│ 4680        ┆ 12/31/04 ┆ Jeopardy! ┆ THE COMPANY     ┆ $200    ┆ In 1963, live  ┆ McDonald's     │
│             ┆          ┆           ┆ LINE            ┆         ┆ on "The Art    ┆                │
│             ┆          ┆           ┆                 ┆         ┆ Link…          ┆                │
│ 4680        ┆ 12/31/04 ┆ Jeopardy! ┆ EPITAPHS &      ┆ $200    ┆ Signer of the  ┆ John Adams     │
│             ┆          ┆           ┆ TRIBUTES        ┆         ┆ Dec. of        ┆                │
│             ┆          ┆           ┆                 ┆         ┆ Indep., …      ┆                │
│ …           ┆ …        ┆ …         ┆ …               ┆ …       ┆ …              ┆ …              │
│ 4999        ┆ 5/11/06  ┆ Double    ┆ RIDDLE ME THIS  ┆ $2,000  ┆ This Puccini   ┆ Turandot       │
│             ┆          ┆ Jeopardy! ┆                 ┆         ┆ opera turns on ┆                │
│             ┆          ┆           ┆                 ┆         ┆ th…            ┆                │
│ 4999        ┆ 5/11/06  ┆ Double    ┆ "T" BIRDS       ┆ $2,000  ┆ In North       ┆ a titmouse     │
│             ┆          ┆ Jeopardy! ┆                 ┆         ┆ America this   ┆                │
│             ┆          ┆           ┆                 ┆         ┆ term is …      ┆                │
│ 4999        ┆ 5/11/06  ┆ Double    ┆ AUTHORS IN      ┆ $2,000  ┆ In Penny Lane, ┆ Clive Barker   │
│             ┆          ┆ Jeopardy! ┆ THEIR YOUTH     ┆         ┆ where this     ┆                │
│             ┆          ┆           ┆                 ┆         ┆ "Hel…          ┆                │
│ 4999        ┆ 5/11/06  ┆ Double    ┆ QUOTATIONS      ┆ $2,000  ┆ From Ft. Sill, ┆ Geronimo       │
│             ┆          ┆ Jeopardy! ┆                 ┆         ┆ Okla. he made  ┆                │
│             ┆          ┆           ┆                 ┆         ┆ t…             ┆                │
│ 4999        ┆ 5/11/06  ┆ Final     ┆ HISTORIC NAMES  ┆ None    ┆ A silent movie ┆ Grigori        │
│             ┆          ┆ Jeopardy! ┆                 ┆         ┆ title includes ┆ Alexandrovich  │
│             ┆          ┆           ┆                 ┆         ┆ …              ┆ Potemkin       │
└─────────────┴──────────┴───────────┴─────────────────┴─────────┴────────────────┴────────────────┘

Printing a few rows

Print the first rows (5 by default):

print(df.head())
shape: (5, 7)
┌─────────────┬──────────┬───────────┬────────────────────┬───────┬───────────────────┬────────────┐
│ Show Number ┆ Air Date ┆ Round     ┆ Category           ┆ Value ┆ Question          ┆ Answer     │
│ ---         ┆ ---      ┆ ---       ┆ ---                ┆ ---   ┆ ---               ┆ ---        │
│ i64         ┆ str      ┆ str       ┆ str                ┆ str   ┆ str               ┆ str        │
╞═════════════╪══════════╪═══════════╪════════════════════╪═══════╪═══════════════════╪════════════╡
│ 4680        ┆ 12/31/04 ┆ Jeopardy! ┆ HISTORY            ┆ $200  ┆ For the last 8    ┆ Copernicus │
│             ┆          ┆           ┆                    ┆       ┆ years of his li…  ┆            │
│ 4680        ┆ 12/31/04 ┆ Jeopardy! ┆ ESPN's TOP 10      ┆ $200  ┆ No. 2: 1912       ┆ Jim Thorpe │
│             ┆          ┆           ┆ ALL-TIME ATHLETE…  ┆       ┆ Olympian;         ┆            │
│             ┆          ┆           ┆                    ┆       ┆ football…         ┆            │
│ 4680        ┆ 12/31/04 ┆ Jeopardy! ┆ EVERYBODY TALKS    ┆ $200  ┆ The city of Yuma  ┆ Arizona    │
│             ┆          ┆           ┆ ABOUT IT...        ┆       ┆ in this state…    ┆            │
│ 4680        ┆ 12/31/04 ┆ Jeopardy! ┆ THE COMPANY LINE   ┆ $200  ┆ In 1963, live on  ┆ McDonald's │
│             ┆          ┆           ┆                    ┆       ┆ "The Art Link…    ┆            │
│ 4680        ┆ 12/31/04 ┆ Jeopardy! ┆ EPITAPHS &         ┆ $200  ┆ Signer of the     ┆ John Adams │
│             ┆          ┆           ┆ TRIBUTES           ┆       ┆ Dec. of Indep., … ┆            │
└─────────────┴──────────┴───────────┴────────────────────┴───────┴───────────────────┴────────────┘
print(df.head(2))
shape: (2, 7)
┌─────────────┬──────────┬───────────┬────────────────────┬───────┬───────────────────┬────────────┐
│ Show Number ┆ Air Date ┆ Round     ┆ Category           ┆ Value ┆ Question          ┆ Answer     │
│ ---         ┆ ---      ┆ ---       ┆ ---                ┆ ---   ┆ ---               ┆ ---        │
│ i64         ┆ str      ┆ str       ┆ str                ┆ str   ┆ str               ┆ str        │
╞═════════════╪══════════╪═══════════╪════════════════════╪═══════╪═══════════════════╪════════════╡
│ 4680        ┆ 12/31/04 ┆ Jeopardy! ┆ HISTORY            ┆ $200  ┆ For the last 8    ┆ Copernicus │
│             ┆          ┆           ┆                    ┆       ┆ years of his li…  ┆            │
│ 4680        ┆ 12/31/04 ┆ Jeopardy! ┆ ESPN's TOP 10      ┆ $200  ┆ No. 2: 1912       ┆ Jim Thorpe │
│             ┆          ┆           ┆ ALL-TIME ATHLETE…  ┆       ┆ Olympian;         ┆            │
│             ┆          ┆           ┆                    ┆       ┆ football…         ┆            │
└─────────────┴──────────┴───────────┴────────────────────┴───────┴───────────────────┴────────────┘

Print the last rows (5 by default; here we ask for 2):

print(df.tail(2))
shape: (2, 7)
┌─────────────┬──────────┬───────────┬─────────────────┬─────────┬────────────────┬────────────────┐
│ Show Number ┆ Air Date ┆ Round     ┆ Category        ┆ Value   ┆ Question       ┆ Answer         │
│ ---         ┆ ---      ┆ ---       ┆ ---             ┆ ---     ┆ ---            ┆ ---            │
│ i64         ┆ str      ┆ str       ┆ str             ┆ str     ┆ str            ┆ str            │
╞═════════════╪══════════╪═══════════╪═════════════════╪═════════╪════════════════╪════════════════╡
│ 4999        ┆ 5/11/06  ┆ Double    ┆ QUOTATIONS      ┆ $2,000  ┆ From Ft. Sill, ┆ Geronimo       │
│             ┆          ┆ Jeopardy! ┆                 ┆         ┆ Okla. he made  ┆                │
│             ┆          ┆           ┆                 ┆         ┆ t…             ┆                │
│ 4999        ┆ 5/11/06  ┆ Final     ┆ HISTORIC NAMES  ┆ None    ┆ A silent movie ┆ Grigori        │
│             ┆          ┆ Jeopardy! ┆                 ┆         ┆ title includes ┆ Alexandrovich  │
│             ┆          ┆           ┆                 ┆         ┆ …              ┆ Potemkin       │
└─────────────┴──────────┴───────────┴─────────────────┴─────────┴────────────────┴────────────────┘

Print random rows. This is very useful because the head and tail of a data frame may not be representative of the rest of the data (since the rows are drawn at random, your output will differ):

print(df.sample(4))
shape: (4, 7)
┌─────────────┬──────────┬───────────┬───────────────────┬───────┬───────────────────┬─────────────┐
│ Show Number ┆ Air Date ┆ Round     ┆ Category          ┆ Value ┆ Question          ┆ Answer      │
│ ---         ┆ ---      ┆ ---       ┆ ---               ┆ ---   ┆ ---               ┆ ---         │
│ i64         ┆ str      ┆ str       ┆ str               ┆ str   ┆ str               ┆ str         │
╞═════════════╪══════════╪═══════════╪═══════════════════╪═══════╪═══════════════════╪═════════════╡
│ 4885        ┆ 12/2/05  ┆ Jeopardy! ┆ COUNTRIES BY      ┆ $800  ┆ In Africa:        ┆ Gabon       │
│             ┆          ┆           ┆ CAPITAL           ┆       ┆ Libreville        ┆             │
│ 3733        ┆ 11/22/00 ┆ Jeopardy! ┆ THAT'S MY WEAPON  ┆ $300  ┆ In 1855 he began  ┆ Samuel Colt │
│             ┆          ┆           ┆                   ┆       ┆ mass producti…    ┆             │
│ 4925        ┆ 1/27/06  ┆ Double    ┆ AMERICANA         ┆ $800  ┆ During the        ┆ Disneyland  │
│             ┆          ┆ Jeopardy! ┆                   ┆       ┆ Fabulous '50s,    ┆             │
│             ┆          ┆           ┆                   ┆       ┆ this…             ┆             │
│ 3250        ┆ 10/23/98 ┆ Jeopardy! ┆ IT'S MY PARTY     ┆ $200  ┆ This country's    ┆ India       │
│             ┆          ┆           ┆                   ┆       ┆ controversial n…  ┆             │
└─────────────┴──────────┴───────────┴───────────────────┴───────┴───────────────────┴─────────────┘

Structure

Overview of the data frame and its structure:

df.glimpse()
Rows: 216930
Columns: 7
$ Show Number <i64> 4680, 4680, 4680, 4680, 4680, 4680, 4680, 4680, 4680, 4680
$ Air Date    <str> '12/31/04', '12/31/04', '12/31/04', '12/31/04', '12/31/04', '12/31/04', '12/31/04', '12/31/04', '12/31/04', '12/31/04'
$ Round       <str> 'Jeopardy!', 'Jeopardy!', 'Jeopardy!', 'Jeopardy!', 'Jeopardy!', 'Jeopardy!', 'Jeopardy!', 'Jeopardy!', 'Jeopardy!', 'Jeopardy!'
$ Category    <str> 'HISTORY', "ESPN's TOP 10 ALL-TIME ATHLETES", 'EVERYBODY TALKS ABOUT IT...', 'THE COMPANY LINE', 'EPITAPHS & TRIBUTES', '3-LETTER WORDS', 'HISTORY', "ESPN's TOP 10 ALL-TIME ATHLETES", 'EVERYBODY TALKS ABOUT IT...', 'THE COMPANY LINE'
$ Value       <str> '$200 ', '$200 ', '$200 ', '$200 ', '$200 ', '$200 ', '$400 ', '$400 ', '$400 ', '$400 '
$ Question    <str> "For the last 8 years of his life, Galileo was under house arrest for espousing this man's theory", 'No. 2: 1912 Olympian; football star at Carlisle Indian School; 6 MLB seasons with the Reds, Giants & Braves', 'The city of Yuma in this state has a record average of 4,055 hours of sunshine each year', 'In 1963, live on "The Art Linkletter Show", this company served its billionth burger', 'Signer of the Dec. of Indep., framer of the Constitution of Mass., second President of the United States', 'In the title of an Aesop fable, this insect shared billing with a grasshopper', "Built in 312 B.C. to link Rome & the South of Italy, it's still in use today", 'No. 8: 30 steals for the Birmingham Barons; 2,306 steals for the Bulls', 'In the winter of 1971-72, a record 1,122 inches of snow fell at Rainier Paradise Ranger Station in this state', 'This housewares store was named for the packaging its merchandise came in & was first displayed on'
$ Answer      <str> 'Copernicus', 'Jim Thorpe', 'Arizona', "McDonald's", 'John Adams', 'the ant', 'the Appian Way', 'Michael Jordan', 'Washington', 'Crate & Barrel'

This is similar to the str() function in R.

To print a list of the data types of each variable, you can use:

print(df.dtypes)
[Int64, String, String, String, String, String, String]

Note, however, that printing a Polars data frame already displays this information (along with the shape).

The schema of a Polars data frame maps the names of the variables (columns) to their data types:

print(df.schema)
Schema({'Show Number': Int64, 'Air Date': String, 'Round': String, 'Category': String, 'Value': String, 'Question': String, 'Answer': String})

Summary statistics

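Depending on your data, summary statistics are not always meaningful. Here, every column except Show Number is stored as a string, so the mean, standard deviation, and quartiles are null for those columns, and their min and max are simply the first and last values in lexicographic order: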

print(df.describe())
shape: (9, 8)
┌────────────┬────────────┬──────────┬────────────┬────────────┬─────────┬────────────┬────────────┐
│ statistic  ┆ Show       ┆ Air Date ┆ Round      ┆ Category   ┆ Value   ┆ Question   ┆ Answer     │
│ ---        ┆ Number     ┆ ---      ┆ ---        ┆ ---        ┆ ---     ┆ ---        ┆ ---        │
│ str        ┆ ---        ┆ str      ┆ str        ┆ str        ┆ str     ┆ str        ┆ str        │
│            ┆ f64        ┆          ┆            ┆            ┆         ┆            ┆            │
╞════════════╪════════════╪══════════╪════════════╪════════════╪═════════╪════════════╪════════════╡
│ count      ┆ 216930.0   ┆ 216930   ┆ 216930     ┆ 216778     ┆ 216930  ┆ 216928     ┆ 216898     │
│ null_count ┆ 0.0        ┆ 0        ┆ 0          ┆ 152        ┆ 0       ┆ 2          ┆ 32         │
│ mean       ┆ 4264.23851 ┆ null     ┆ null       ┆ null       ┆ null    ┆ null       ┆ null       │
│            ┆ 9          ┆          ┆            ┆            ┆         ┆            ┆            │
│ std        ┆ 1386.29633 ┆ null     ┆ null       ┆ null       ┆ null    ┆ null       ┆ null       │
│            ┆ 5          ┆          ┆            ┆            ┆         ┆            ┆            │
│ min        ┆ 1.0        ┆ 1/1/01   ┆ Double     ┆ A JIM      ┆ $1,000  ┆ " 'Cause   ┆  Hamlet    │
│            ┆            ┆          ┆ Jeopardy!  ┆ CARREY     ┆         ┆ I'm never  ┆            │
│            ┆            ┆          ┆            ┆ FILM       ┆         ┆ gonna stop ┆            │
│            ┆            ┆          ┆            ┆ FESTIVAL   ┆         ┆ …          ┆            │
│ 25%        ┆ 3349.0     ┆ null     ┆ null       ┆ null       ┆ null    ┆ null       ┆ null       │
│ 50%        ┆ 4490.0     ┆ null     ┆ null       ┆ null       ┆ null    ┆ null       ┆ null       │
│ 75%        ┆ 5393.0     ┆ null     ┆ null       ┆ null       ┆ null    ┆ null       ┆ null       │
│ max        ┆ 6300.0     ┆ 9/9/99   ┆ Tiebreaker ┆ ’70s     ┆ None    ┆ “You     ┆ “one     │
│            ┆            ┆          ┆            ┆ CINEMA     ┆         ┆ Can't Lose ┆ giant leap │
│            ┆            ┆          ┆            ┆            ┆         ┆ Me†      ┆ for        │
│            ┆            ┆          ┆            ┆            ┆         ┆            ┆ mankindâ…  │
└────────────┴────────────┴──────────┴────────────┴────────────┴─────────┴────────────┴────────────┘