# Load the Polars library and create a shorter name for it
import polars as pl
# The global confirmed cases are available in CSV format at the url:
= "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv"
url
# Polars allows to read in data from the web directly
= pl.read_csv(url) cases
DataFrames with Polars
Polars is a Python library built to manipulate data frames and time series.
For this section, we will use the Covid-19 data from the Johns Hopkins University CSSE repository.
You can visualize this data in a dashboard created by the Johns Hopkins University Center for Systems Science and Engineering.
Setup
First, we need to load the polars
library and read in the data from the web:
First look at the data
What does our data look like?
cases
Province/State | Country/Region | Lat | Long | 1/22/20 | 1/23/20 | 1/24/20 | 1/25/20 | 1/26/20 | 1/27/20 | 1/28/20 | 1/29/20 | 1/30/20 | 1/31/20 | 2/1/20 | 2/2/20 | 2/3/20 | 2/4/20 | 2/5/20 | 2/6/20 | 2/7/20 | 2/8/20 | 2/9/20 | 2/10/20 | 2/11/20 | 2/12/20 | 2/13/20 | 2/14/20 | 2/15/20 | 2/16/20 | 2/17/20 | 2/18/20 | 2/19/20 | 2/20/20 | 2/21/20 | 2/22/20 | 2/23/20 | … | 2/1/23 | 2/2/23 | 2/3/23 | 2/4/23 | 2/5/23 | 2/6/23 | 2/7/23 | 2/8/23 | 2/9/23 | 2/10/23 | 2/11/23 | 2/12/23 | 2/13/23 | 2/14/23 | 2/15/23 | 2/16/23 | 2/17/23 | 2/18/23 | 2/19/23 | 2/20/23 | 2/21/23 | 2/22/23 | 2/23/23 | 2/24/23 | 2/25/23 | 2/26/23 | 2/27/23 | 2/28/23 | 3/1/23 | 3/2/23 | 3/3/23 | 3/4/23 | 3/5/23 | 3/6/23 | 3/7/23 | 3/8/23 | 3/9/23 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
str | str | f64 | f64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | … | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 |
null | "Afghanistan" | 33.93911 | 67.709953 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | … | 208552 | 208669 | 208669 | 208621 | 208627 | 208704 | 208721 | 208771 | 208771 | 208943 | 208971 | 208982 | 209011 | 209036 | 209056 | 209072 | 209083 | 209084 | 209107 | 209153 | 209181 | 209181 | 209215 | 209230 | 209246 | 209274 | 209308 | 209322 | 209340 | 209358 | 209362 | 209369 | 209390 | 209406 | 209436 | 209451 | 209451 |
null | "Albania" | 41.1533 | 20.1683 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | … | 334177 | 334187 | 334203 | 334204 | 334211 | 334211 | 334211 | 334222 | 334229 | 334229 | 334234 | 334255 | 334255 | 334264 | 334264 | 334273 | 334291 | 334305 | 334314 | 334315 | 334336 | 334336 | 334345 | 334356 | 334373 | 334378 | 334380 | 334391 | 334408 | 334408 | 334427 | 334427 | 334427 | 334427 | 334427 | 334443 | 334457 |
null | "Algeria" | 28.0339 | 1.6596 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | … | 271385 | 271386 | 271394 | 271394 | 271394 | 271395 | 271399 | 271403 | 271406 | 271406 | 271409 | 271409 | 271409 | 271409 | 271421 | 271424 | 271424 | 271425 | 271425 | 271426 | 271428 | 271431 | 271432 | 271436 | 271439 | 271440 | 271440 | 271441 | 271448 | 271463 | 271469 | 271469 | 271477 | 271477 | 271490 | 271494 | 271496 |
null | "Andorra" | 42.5063 | 1.5218 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | … | 47839 | 47839 | 47850 | 47850 | 47850 | 47850 | 47850 | 47850 | 47860 | 47860 | 47860 | 47860 | 47860 | 47860 | 47860 | 47866 | 47866 | 47866 | 47866 | 47866 | 47866 | 47866 | 47866 | 47866 | 47866 | 47866 | 47866 | 47866 | 47875 | 47875 | 47875 | 47875 | 47875 | 47875 | 47875 | 47890 | 47890 |
null | "Angola" | -11.2027 | 17.8739 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | … | 105184 | 105184 | 105184 | 105184 | 105184 | 105184 | 105184 | 105184 | 105184 | 105184 | 105184 | 105184 | 105184 | 105184 | 105184 | 105184 | 105184 | 105184 | 105184 | 105184 | 105184 | 105205 | 105205 | 105205 | 105205 | 105205 | 105255 | 105255 | 105277 | 105277 | 105277 | 105277 | 105277 | 105277 | 105277 | 105288 | 105288 |
… | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … |
null | "West Bank and Gaza" | 31.9522 | 35.2332 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | … | 703228 | 703228 | 703228 | 703228 | 703228 | 703228 | 703228 | 703228 | 703228 | 703228 | 703228 | 703228 | 703228 | 703228 | 703228 | 703228 | 703228 | 703228 | 703228 | 703228 | 703228 | 703228 | 703228 | 703228 | 703228 | 703228 | 703228 | 703228 | 703228 | 703228 | 703228 | 703228 | 703228 | 703228 | 703228 | 703228 | 703228 |
null | "Winter Olympics 2022" | 39.9042 | 116.4074 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | … | 535 | 535 | 535 | 535 | 535 | 535 | 535 | 535 | 535 | 535 | 535 | 535 | 535 | 535 | 535 | 535 | 535 | 535 | 535 | 535 | 535 | 535 | 535 | 535 | 535 | 535 | 535 | 535 | 535 | 535 | 535 | 535 | 535 | 535 | 535 | 535 | 535 |
null | "Yemen" | 15.552727 | 48.516388 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | … | 11945 | 11945 | 11945 | 11945 | 11945 | 11945 | 11945 | 11945 | 11945 | 11945 | 11945 | 11945 | 11945 | 11945 | 11945 | 11945 | 11945 | 11945 | 11945 | 11945 | 11945 | 11945 | 11945 | 11945 | 11945 | 11945 | 11945 | 11945 | 11945 | 11945 | 11945 | 11945 | 11945 | 11945 | 11945 | 11945 | 11945 |
null | "Zambia" | -13.133897 | 27.849332 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | … | 340944 | 341261 | 341397 | 341573 | 341573 | 341690 | 341690 | 341936 | 342114 | 342114 | 342288 | 342288 | 342317 | 342317 | 342317 | 342317 | 342317 | 342317 | 342317 | 342724 | 342782 | 342831 | 342831 | 342831 | 342831 | 342831 | 342831 | 343012 | 343012 | 343079 | 343079 | 343079 | 343135 | 343135 | 343135 | 343135 | 343135 |
null | "Zimbabwe" | -19.015438 | 29.154857 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | … | 262324 | 262324 | 262324 | 262324 | 262324 | 262324 | 262324 | 263083 | 263083 | 263083 | 263083 | 263083 | 263083 | 263083 | 263642 | 263642 | 263642 | 263642 | 263642 | 263642 | 263642 | 263921 | 263921 | 263921 | 263921 | 263921 | 263921 | 263921 | 264127 | 264127 | 264127 | 264127 | 264127 | 264127 | 264127 | 264276 | 264276 |
# Quick summary of the data
cases.describe()
statistic | Province/State | Country/Region | Lat | Long | 1/22/20 | 1/23/20 | 1/24/20 | 1/25/20 | 1/26/20 | 1/27/20 | 1/28/20 | 1/29/20 | 1/30/20 | 1/31/20 | 2/1/20 | 2/2/20 | 2/3/20 | 2/4/20 | 2/5/20 | 2/6/20 | 2/7/20 | 2/8/20 | 2/9/20 | 2/10/20 | 2/11/20 | 2/12/20 | 2/13/20 | 2/14/20 | 2/15/20 | 2/16/20 | 2/17/20 | 2/18/20 | 2/19/20 | 2/20/20 | 2/21/20 | 2/22/20 | … | 2/1/23 | 2/2/23 | 2/3/23 | 2/4/23 | 2/5/23 | 2/6/23 | 2/7/23 | 2/8/23 | 2/9/23 | 2/10/23 | 2/11/23 | 2/12/23 | 2/13/23 | 2/14/23 | 2/15/23 | 2/16/23 | 2/17/23 | 2/18/23 | 2/19/23 | 2/20/23 | 2/21/23 | 2/22/23 | 2/23/23 | 2/24/23 | 2/25/23 | 2/26/23 | 2/27/23 | 2/28/23 | 3/1/23 | 3/2/23 | 3/3/23 | 3/4/23 | 3/5/23 | 3/6/23 | 3/7/23 | 3/8/23 | 3/9/23 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
str | str | str | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | … | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 |
"count" | "91" | "289" | 287.0 | 287.0 | 289.0 | 289.0 | 289.0 | 289.0 | 289.0 | 289.0 | 289.0 | 289.0 | 289.0 | 289.0 | 289.0 | 289.0 | 289.0 | 289.0 | 289.0 | 289.0 | 289.0 | 289.0 | 289.0 | 289.0 | 289.0 | 289.0 | 289.0 | 289.0 | 289.0 | 289.0 | 289.0 | 289.0 | 289.0 | 289.0 | 289.0 | 289.0 | … | 289.0 | 289.0 | 289.0 | 289.0 | 289.0 | 289.0 | 289.0 | 289.0 | 289.0 | 289.0 | 289.0 | 289.0 | 289.0 | 289.0 | 289.0 | 289.0 | 289.0 | 289.0 | 289.0 | 289.0 | 289.0 | 289.0 | 289.0 | 289.0 | 289.0 | 289.0 | 289.0 | 289.0 | 289.0 | 289.0 | 289.0 | 289.0 | 289.0 | 289.0 | 289.0 | 289.0 | 289.0 |
"null_count" | "198" | "0" | 2.0 | 2.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | … | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
"mean" | null | null | 19.718719 | 22.182084 | 1.927336 | 2.273356 | 3.266436 | 4.972318 | 7.33564 | 10.134948 | 19.307958 | 21.346021 | 28.50173 | 34.349481 | 41.653979 | 58.086505 | 68.813149 | 82.695502 | 95.653979 | 106.595156 | 119.031142 | 128.480969 | 138.968858 | 147.99654 | 155.065744 | 156.512111 | 208.941176 | 231.529412 | 238.944637 | 246.49827 | 253.539792 | 260.051903 | 261.782007 | 263.723183 | 265.903114 | 272.0 | … | 2.3219e6 | 2.3228e6 | 2.3236e6 | 2.3240e6 | 2.3243e6 | 2.3248e6 | 2.3255e6 | 2.3263e6 | 2.3272e6 | 2.3277e6 | 2.3281e6 | 2.3284e6 | 2.3289e6 | 2.3295e6 | 2.3304e6 | 2.3311e6 | 2.3318e6 | 2.3321e6 | 2.3324e6 | 2.3327e6 | 2.3333e6 | 2.3342e6 | 2.3349e6 | 2.3354e6 | 2.3356e6 | 2.3358e6 | 2.3362e6 | 2.3368e6 | 2.3375e6 | 2.3382e6 | 2.3388e6 | 2.3390e6 | 2.3392e6 | 2.3394e6 | 2.3398e6 | 2.3405e6 | 2.3411e6 |
"std" | null | null | 25.956609 | 77.870931 | 26.173664 | 26.270191 | 32.707271 | 45.523871 | 63.623197 | 85.724481 | 210.329649 | 211.628535 | 291.630499 | 345.759248 | 425.471558 | 661.171033 | 799.258224 | 985.038344 | 1160.785189 | 1304.872173 | 1472.006165 | 1598.378021 | 1747.162098 | 1870.449899 | 1966.785835 | 1967.121793 | 2838.127785 | 3202.384554 | 3310.754868 | 3424.390905 | 3530.604364 | 3630.11703 | 3650.660287 | 3674.805102 | 3687.812705 | 3771.362586 | … | 8.4392e6 | 8.4439e6 | 8.4473e6 | 8.4483e6 | 8.4491e6 | 8.4511e6 | 8.4544e6 | 8.4592e6 | 8.4639e6 | 8.4655e6 | 8.4669e6 | 8.4674e6 | 8.4695e6 | 8.4726e6 | 8.4776e6 | 8.4812e6 | 8.4843e6 | 8.4852e6 | 8.4857e6 | 8.4865e6 | 8.4900e6 | 8.4957e6 | 8.4993e6 | 8.5010e6 | 8.5014e6 | 8.5019e6 | 8.5037e6 | 8.5066e6 | 8.5113e6 | 8.5145e6 | 8.5180e6 | 8.5184e6 | 8.5186e6 | 8.5193e6 | 8.5216e6 | 8.5250e6 | 8.5278e6 |
"min" | "Alberta" | "Afghanistan" | -71.9499 | -178.1165 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | … | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
"25%" | null | null | 4.210484 | -23.0418 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | … | 14567.0 | 14567.0 | 14567.0 | 14567.0 | 14567.0 | 14567.0 | 14567.0 | 14567.0 | 14567.0 | 14567.0 | 14567.0 | 14567.0 | 14567.0 | 14567.0 | 14567.0 | 14567.0 | 14567.0 | 14567.0 | 14567.0 | 14567.0 | 14567.0 | 14567.0 | 14567.0 | 14567.0 | 14567.0 | 14567.0 | 14567.0 | 14567.0 | 14567.0 | 14567.0 | 14567.0 | 14567.0 | 14567.0 | 14567.0 | 14567.0 | 14567.0 | 14567.0 |
"50%" | null | null | 21.512583 | 20.9394 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | … | 103248.0 | 103248.0 | 103248.0 | 103248.0 | 103248.0 | 103248.0 | 103248.0 | 103248.0 | 103248.0 | 103248.0 | 103248.0 | 103248.0 | 103248.0 | 103248.0 | 103248.0 | 103248.0 | 103248.0 | 103248.0 | 103248.0 | 103248.0 | 103248.0 | 103248.0 | 103248.0 | 103248.0 | 103248.0 | 103248.0 | 103248.0 | 103248.0 | 103248.0 | 103248.0 | 103248.0 | 103248.0 | 103248.0 | 103248.0 | 103248.0 | 103248.0 | 103248.0 |
"75%" | null | null | 40.463667 | 90.3563 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | … | 1.049457e6 | 1.049537e6 | 1.04964e6 | 1.049729e6 | 1.049828e6 | 1.049944e6 | 1.049944e6 | 1.05e6 | 1.050127e6 | 1.050241e6 | 1.05033e6 | 1.050402e6 | 1.050402e6 | 1.050549e6 | 1.050621e6 | 1.050736e6 | 1.050824e6 | 1.050961e6 | 1.051079e6 | 1.051162e6 | 1.051241e6 | 1.051336e6 | 1.051457e6 | 1.051589e6 | 1.051732e6 | 1.051862e6 | 1.05193e6 | 1.051998e6 | 1.052122e6 | 1.052247e6 | 1.052382e6 | 1.052519e6 | 1.052664e6 | 1.052664e6 | 1.052926e6 | 1.053068e6 | 1.053213e6 |
"max" | "Zhejiang" | "Zimbabwe" | 71.7069 | 178.065 | 444.0 | 444.0 | 549.0 | 761.0 | 1058.0 | 1423.0 | 3554.0 | 3554.0 | 4903.0 | 5806.0 | 7153.0 | 11177.0 | 13522.0 | 16678.0 | 19665.0 | 22112.0 | 24953.0 | 27100.0 | 29631.0 | 31728.0 | 33366.0 | 33366.0 | 48206.0 | 54406.0 | 56249.0 | 58182.0 | 59989.0 | 61682.0 | 62031.0 | 62442.0 | 62662.0 | 64084.0 | … | 1.02479379e8 | 1.02561054e8 | 1.02599784e8 | 1.02603942e8 | 1.02606868e8 | 1.02631179e8 | 1.02676694e8 | 1.02760211e8 | 1.02842061e8 | 1.02855101e8 | 1.0285951e8 | 1.02862879e8 | 1.02886005e8 | 1.02934224e8 | 1.03023231e8 | 1.0308391e8 | 1.03131898e8 | 1.03134605e8 | 1.03136077e8 | 1.03138119e8 | 1.03198669e8 | 1.03308832e8 | 1.03365511e8 | 1.03378408e8 | 1.03381157e8 | 1.03382763e8 | 1.03399827e8 | 1.03443455e8 | 1.03533872e8 | 1.03589757e8 | 1.0364869e8 | 1.03650837e8 | 1.03646975e8 | 1.03655539e8 | 1.0369091e8 | 1.03755771e8 | 1.03802702e8 |
Of course, this value is meaningless for Lat
and Long
!
# Data types of the various columns
# cases.dtypes
# cases.info()
cases.shape
(289, 1147)
Cases per country by date
The dataset is a time series: this means that we have the cumulative numbers up to each date.
# Let's get rid of the latitude and longitude to simplify our data
= cases.drop(columns=['Lat', 'Long'])
simple simple
--------------------------------------------------------------------------- TypeError Traceback (most recent call last) Cell In[7], line 2 1 # Let's get rid of the latitude and longitude to simplify our data ----> 2 simple = cases.drop(columns=['Lat', 'Long']) 3 simple TypeError: DataFrame.drop() got an unexpected keyword argument 'columns'
Some countries (e.g. Australia) are split between several provinces or states so we will have to add the values of all their provinces/states to get their totals.
Here is how to make the sum for all Australian states:
Let’s first select all the data for Australia: we want all the rows for which the Country/Region
column is equal to Australia
.
First, we want to select the Country/Region
column. There are several ways to index in Polars.
When indexing columns, one can use square brackets directly after the DataFrame to index:
'Country/Region'] simple[
--------------------------------------------------------------------------- NameError Traceback (most recent call last) Cell In[8], line 1 ----> 1 simple['Country/Region'] NameError: name 'simple' is not defined
However, it is more efficient to use the .loc
or .iloc
methods.
- Use
.loc
when using labels or booleans:
# simple.loc[:, 'Country/Region']
- Use
.iloc
when using indices:
# simple.iloc[:, 1]
Country/Region
is the 2nd column, but indexing starts at 0 in Python.
Then we need a conditional to filter the rows for which the value is equal to Australia
:
# simple.loc[:, 'Country/Region'] == 'Australia'
Finally, we index, out of our entire data frame, the rows for which that condition returns True
:
# simple.loc[simple.loc[:, 'Country/Region'] == 'Australia']
Here we use .loc
to index based on a boolean array.
We can now make the sum for all of Australia for each day:
= simple.loc[simple.loc[:, 'Country/Region'] == 'Australia'].sum(numeric_only=True)
total_australia total_australia
--------------------------------------------------------------------------- NameError Traceback (most recent call last) Cell In[13], line 1 ----> 1 total_australia = simple.loc[simple.loc[:, 'Country/Region'] == 'Australia'].sum(numeric_only=True) 2 total_australia NameError: name 'simple' is not defined
We can do this for all countries by grouping them:
= simple.groupby('Country/Region').sum(numeric_only=True)
totals totals
--------------------------------------------------------------------------- NameError Traceback (most recent call last) Cell In[14], line 1 ----> 1 totals = simple.groupby('Country/Region').sum(numeric_only=True) 2 totals NameError: name 'simple' is not defined
Now, we can look at the totals for any date:
'6/12/21'] totals.loc[:,
--------------------------------------------------------------------------- NameError Traceback (most recent call last) Cell In[15], line 1 ----> 1 totals.loc[:, '6/12/21'] NameError: name 'totals' is not defined
To make it easier to read, let’s order those numbers by decreasing order:
'6/12/21'].sort_values(ascending=False) totals.loc[:,
--------------------------------------------------------------------------- NameError Traceback (most recent call last) Cell In[16], line 1 ----> 1 totals.loc[:, '6/12/21'].sort_values(ascending=False) NameError: name 'totals' is not defined
We can also index the data for a particular country by indexing a row instead of a column:
'Albania', :] totals.loc[
--------------------------------------------------------------------------- NameError Traceback (most recent call last) Cell In[17], line 1 ----> 1 totals.loc['Albania', :] NameError: name 'totals' is not defined
When indexing rows, this syntax can be simplified to:
'Albania'] totals.loc[
--------------------------------------------------------------------------- NameError Traceback (most recent call last) Cell In[18], line 1 ----> 1 totals.loc['Albania'] NameError: name 'totals' is not defined
Global totals
Now, what if we want to have the world totals for each day? We calculate the columns totals (i.e. the sum across countries):
sum() totals.
--------------------------------------------------------------------------- NameError Traceback (most recent call last) Cell In[19], line 1 ----> 1 totals.sum() NameError: name 'totals' is not defined
Your turn:
How many confirmed cases were there in Venezuela by March 10, 2021?