# Data types and structures

Author

Marie-Hélène Burle

This section covers the various data types and structures available in R.

## Summary of structures

| Dimension | Homogeneous   | Heterogeneous |
|-----------|---------------|---------------|
| 1D        | Atomic vector | List          |
| 2D        | Matrix        | Data frame    |
| 3D        | Array         |               |

## Atomic vectors

### With a single element

```r
a <- 2
a
```

```
[1] 2
```

```r
typeof(a)
```

```
[1] "double"
```

```r
str(a)
```

```
 num 2
```

```r
length(a)
```

```
[1] 1
```

```r
dim(a)
```

```
NULL
```

A vector has no `dim` attribute (hence the `NULL`). This distinguishes vectors from one-dimensional arrays, whose `dim` attribute is a single number equal to their length (so a one-element array has a `dim` of `1`).
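To see the difference concretely, here is a short sketch comparing a plain vector with the one-dimensional array that `array()` builds from it (the names `v` and `v_arr` are arbitrary):

```r
# A plain vector has no dim attribute
v <- c(5, 6, 7)
dim(v)            # NULL

# array() with no dim argument uses dim = length(data), giving a 1D array
v_arr <- array(v)
dim(v_arr)        # 3
is.array(v)       # FALSE
is.array(v_arr)   # TRUE

# For a single-element array, that dim is 1
dim(array(2))     # 1
```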

You might have noticed that `2` is a double (a double-precision floating-point number, the equivalent of `float` in Python or `double` in C). In R, numbers are doubles by default, even if you don’t type `2.0`. This avoids a quirk you can find in, for instance, Python, where `2` and `2.0` compare equal but have different types.

In Python:

```python
>>> 2 == 2.0
True
>>> type(2) == type(2.0)
False
>>> type(2)
<class 'int'>
>>> type(2.0)
<class 'float'>
```

In R:

```r
> 2 == 2.0
[1] TRUE
> typeof(2) == typeof(2.0)
[1] TRUE
> typeof(2)
[1] "double"
> typeof(2.0)
[1] "double"
```

To define an integer instead, append `L` to the number:

```r
b <- 2L
b
```

```
[1] 2
```

```r
typeof(b)
```

```
[1] "integer"
```

```r
mode(b)
```

```
[1] "numeric"
```

```r
str(b)
```

```
 int 2
```
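Because integers and doubles are distinct types, a few behaviours are worth knowing. A short sketch using only base R functions:

```r
# Integer and double values compare equal...
2L == 2               # TRUE
# ...but they are not identical, because their types differ
identical(2L, 2)      # FALSE
is.integer(2)         # FALSE
is.integer(2L)        # TRUE

# as.integer() truncates toward zero when converting a double
as.integer(3.9)       # 3

# Division with / always returns a double, even for two integers
typeof(5L / 2L)       # "double"
# Integer division with %/% keeps the integer type
typeof(5L %/% 2L)     # "integer"
5L %/% 2L             # 2
```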

There are six atomic vector types:

• logical
• integer
• double
• character
• complex
• raw
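Each of these six types can be produced from a literal or a conversion function. A minimal sketch, with one example value per type:

```r
# One example value for each atomic vector type
typeof(TRUE)          # "logical"
typeof(1L)            # "integer"
typeof(1)             # "double"
typeof("a")           # "character"
typeof(1+2i)          # "complex"
typeof(as.raw(255))   # "raw"
```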

### With multiple elements

```r
c <- c(2, 4, 1)
c
```

```
[1] 2 4 1
```

```r
typeof(c)
```

```
[1] "double"
```

```r
mode(c)
```

```
[1] "numeric"
```

```r
str(c)
```

```
 num [1:3] 2 4 1
```

```r
d <- c(TRUE, TRUE, NA, FALSE)
d
```

```
[1]  TRUE  TRUE    NA FALSE
```

```r
typeof(d)
```

```
[1] "logical"
```

```r
str(d)
```

```
 logi [1:4] TRUE TRUE NA FALSE
```

`NA` (“Not Available”) is a logical constant of length one. It is an indicator for a missing value.
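Since `NA` itself is logical, R also provides typed missing values for the other types, and `NA` propagates through most operations. A short sketch of how to work with it in base R:

```r
# NA on its own is logical...
typeof(NA)              # "logical"
# ...but typed missing values exist for other types
typeof(NA_integer_)     # "integer"
typeof(NA_real_)        # "double"
typeof(NA_character_)   # "character"

# Comparisons and arithmetic involving NA give NA
NA == NA                # NA
NA + 1                  # NA

# Use is.na() to test for missing values
is.na(c(TRUE, TRUE, NA, FALSE))   # FALSE FALSE TRUE FALSE

# Many summary functions accept na.rm = TRUE to drop NAs first
sum(c(1, NA, 3), na.rm = TRUE)    # 4
```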

Vectors are homogeneous, so all elements need to be of the same type.

If you combine elements of different types, R coerces all of them to the most flexible type present, following the order logical < integer < double < complex < character:

```r
e <- c("This is a string", 3, "test")
e
```

```
[1] "This is a string" "3"                "test"
```

```r
typeof(e)
```

```
[1] "character"
```

```r
str(e)
```

```
 chr [1:3] "This is a string" "3" "test"
```

```r
f <- c(TRUE, 3, FALSE)
f
```

```
[1] 1 3 0
```

```r
typeof(f)
```

```
[1] "double"
```

```r
str(f)
```

```
 num [1:3] 1 3 0
```

```r
g <- c(2L, 3, 4L)
g
```

```
[1] 2 3 4
```

```r
typeof(g)
```

```
[1] "double"
```

```r
str(g)
```

```
 num [1:3] 2 3 4
```

```r
h <- c("string", TRUE, 2L, 3.1)
h
```

```
[1] "string" "TRUE"   "2"      "3.1"
```

```r
typeof(h)
```

```
[1] "character"
```

```r
str(h)
```

```
 chr [1:4] "string" "TRUE" "2" "3.1"
```
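The coercion order can be checked one step at a time, and you can also convert explicitly with the `as.*()` family of functions. A minimal sketch (the `suppressWarnings()` call only hides the warning R emits when a conversion fails):

```r
# Implicit coercion moves to the more flexible type
typeof(c(TRUE, 1L))     # "integer"
typeof(c(1L, 2.5))      # "double"
typeof(c(2.5, 1i))      # "complex"
typeof(c(1i, "a"))      # "character"

# Explicit coercion with as.*() functions
as.numeric(c("1", "2.5"))   # 1.0 2.5
as.logical(c(0, 1, 2))      # FALSE TRUE TRUE

# A value that cannot be converted becomes NA (normally with a warning)
suppressWarnings(as.numeric("abc"))  # NA
```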

The binary operator `:` generates a sequence in steps of 1: `from:to` is equivalent to `seq(from, to)`. When `from` is a whole number, the result is an integer vector:

```r
i <- 1:5
i
```

```
[1] 1 2 3 4 5
```

```r
typeof(i)
```

```
[1] "integer"
```

```r
str(i)
```

```
 int [1:5] 1 2 3 4 5
```

```r
identical(2:8, seq(2, 8))
```

```
[1] TRUE
```
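`seq()` is more general than `:`, and `:` has a couple of behaviours that can surprise you. A short sketch using base R only:

```r
# seq() accepts a step size or a desired length
seq(2, 8, by = 2)             # 2 4 6 8
seq(0, 1, length.out = 5)     # 0.00 0.25 0.50 0.75 1.00

# `:` counts down when from > to
5:1                           # 5 4 3 2 1

# With a non-integer start, `:` returns a double vector
typeof(1.5:3)                 # "double"
1.5:3                         # 1.5 2.5

# seq_len(n) returns an empty vector for n = 0, whereas 1:0 counts down
seq_len(0)                    # integer(0)
1:0                           # 1 0
```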

## Matrices

```r
j <- matrix(1:12, nrow = 3, ncol = 4)
j
```

```
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12
```

```r
typeof(j)
```

```
[1] "integer"
```

```r
str(j)
```

```
 int [1:3, 1:4] 1 2 3 4 5 6 7 8 9 10 ...
```

```r
length(j)
```

```
[1] 12
```

```r
dim(j)
```

```
[1] 3 4
```

The default is `byrow = FALSE`, so the matrix is filled column by column. To fill it row by row, set this argument to `TRUE`:

```r
k <- matrix(1:12, nrow = 3, ncol = 4, byrow = TRUE)
k
```

```
     [,1] [,2] [,3] [,4]
[1,]    1    2    3    4
[2,]    5    6    7    8
[3,]    9   10   11   12
```
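The `byrow` argument only changes how the values are laid out; elements are still indexed by `[row, column]`, and storage remains column-major. A sketch that re-creates `j` and `k` from above:

```r
j <- matrix(1:12, nrow = 3, ncol = 4)
k <- matrix(1:12, nrow = 3, ncol = 4, byrow = TRUE)

# Elements are indexed with [row, column]
j[2, 3]     # 8
k[2, 3]     # 7

# Both matrices are stored column by column,
# so a single index walks down the columns
j[5]        # 5
k[5]        # 6

# Filling by row gives the same result as filling by column and transposing
identical(k, t(matrix(1:12, nrow = 4, ncol = 3)))  # TRUE
```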

## Arrays

```r
l <- array(as.double(1:24), c(3, 2, 4))
l
```

```
, , 1

     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6

, , 2

     [,1] [,2]
[1,]    7   10
[2,]    8   11
[3,]    9   12

, , 3

     [,1] [,2]
[1,]   13   16
[2,]   14   17
[3,]   15   18

, , 4

     [,1] [,2]
[1,]   19   22
[2,]   20   23
[3,]   21   24
```

```r
typeof(l)
```

```
[1] "double"
```

```r
str(l)
```

```
 num [1:3, 1:2, 1:4] 1 2 3 4 5 6 7 8 9 10 ...
```

```r
length(l)
```

```
[1] 24
```

```r
dim(l)
```

```
[1] 3 2 4
```
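Arrays are indexed with one position per dimension. A short sketch that re-creates `l` from above:

```r
l <- array(as.double(1:24), c(3, 2, 4))

# [row, column, slice]
l[2, 1, 3]      # 14

# An empty index selects everything along that dimension;
# taking a single slice drops that dimension, leaving a 3 x 2 matrix
dim(l[, , 1])   # 3 2

# A matrix is simply an array with two dimensions
is.array(matrix(1:12, nrow = 3))   # TRUE
```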

## Lists

```r
m <- list(2, 3)
m
```

```
[[1]]
[1] 2

[[2]]
[1] 3
```

```r
typeof(m)
```

```
[1] "list"
```

```r
str(m)
```

```
List of 2
 $ : num 2
 $ : num 3
```

```r
length(m)
```

```
[1] 2
```

```r
dim(m)
```

```
NULL
```

As with atomic vectors, lists have no `dim` attribute. Lists are in fact vectors too: generic (recursive) vectors, as opposed to atomic vectors.
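You can check that a list is a vector but not an atomic one, and see how single and double brackets differ. A short sketch that re-creates `m` from above:

```r
m <- list(2, 3)

# A list is a vector, but not an atomic vector
is.vector(m)    # TRUE
is.atomic(m)    # FALSE
is.list(m)      # TRUE

# Single brackets return a sub-list; double brackets return the element itself
typeof(m[1])    # "list"
typeof(m[[1]])  # "double"
```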

Lists can be heterogeneous:

```r
n <- list(2L, 3, c(2, 1), FALSE, "string")
n
```

```
[[1]]
[1] 2

[[2]]
[1] 3

[[3]]
[1] 2 1

[[4]]
[1] FALSE

[[5]]
[1] "string"
```

```r
typeof(n)
```

```
[1] "list"
```

```r
str(n)
```

```
List of 5
 $ : int 2
 $ : num 3
 $ : num [1:2] 2 1
 $ : logi FALSE
 $ : chr "string"
```

```r
length(n)
```

```
[1] 5
```
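List elements can also be named, which lets you extract them by name, and lists can contain other lists. A minimal sketch; the lists `p` and `q` and their element names are hypothetical:

```r
# A hypothetical named list mixing several types
p <- list(id = 2L, values = c(2, 1), done = FALSE, label = "string")

# Extract elements by name with $ or [[ ]]
p$values          # 2 1
p[["label"]]      # "string"
names(p)          # "id" "values" "done" "label"

# A list can contain another list
q <- list(inner = p, extra = 3)
q$inner$id        # 2
```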

## Data frames

Data frames contain tabular data. Under the hood, a data frame is a list of vectors.

```r
o <- data.frame(
  var = c(2.9, 3.1, 4.5)
)
o
```

```
  country var
```

```r
typeof(o)
```

```
[1] "list"
```

```r
str(o)
```

```
'data.frame':   3 obs. of  2 variables:
```

```r
length(o)
```

```
[1] 2
```

```r
dim(o)
```

```
[1] 3 2
```
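Because a data frame is a list of equal-length column vectors, list tools work on it directly. A minimal sketch with a hypothetical data frame `df`, assuming R 4.0 or later (where `data.frame()` keeps strings as character rather than converting them to factors):

```r
# A hypothetical data frame with two columns of different types
df <- data.frame(x = 1:3, y = c("a", "b", "c"))

# Each column is a vector; all columns have the same length
typeof(df$x)             # "integer"
typeof(df$y)             # "character"

# Because a data frame is a list, length() counts its columns
is.list(df)              # TRUE
length(df) == ncol(df)   # TRUE
nrow(df)                 # 3

# Removing the data.frame class exposes the underlying list
str(unclass(df))
```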