Data types and structures

Author

Marie-Hélène Burle

This section covers the various data types and structures available in R.

Summary of structures

Dimension   Homogeneous     Heterogeneous
1 d         Atomic vector   List
2 d         Matrix          Data frame
3 d         Array

Atomic vectors

With a single element

a <- 2
a
[1] 2
typeof(a)
[1] "double"
str(a)
 num 2
length(a)
[1] 1
dim(a)
NULL

A vector has no dim attribute (hence the NULL). This makes vectors different from one-dimensional arrays, which do have a dim attribute: it has length one and holds the number of elements in the array.
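To make the difference between a vector and a one-dimensional array concrete, here is a minimal sketch. The names v and arr are only illustrative.

```r
# A plain atomic vector has no dim attribute
v <- c(2, 4, 1)
dim(v)             # NULL
is.array(v)        # FALSE

# A one-dimensional array stores its length in dim
arr <- array(c(2, 4, 1), dim = 3)
dim(arr)           # 3
length(dim(arr))   # 1
is.array(arr)      # TRUE
```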

You might have noticed that 2 is a double (a double-precision floating-point number, called “float” in some other languages). In R, this is the default for numeric literals, even if you don’t type 2.0. This avoids the kind of surprise you can run into in, for instance, Python, where 2 and 2.0 have different types.

In Python:

>>> 2 == 2.0
True
>>> type(2) == type(2.0)
False
>>> type(2)
<class 'int'>
>>> type(2.0)
<class 'float'>

In R:

> 2 == 2.0
[1] TRUE
> typeof(2) == typeof(2.0)
[1] TRUE
> typeof(2)
[1] "double"
> typeof(2.0)
[1] "double"

To define an integer instead, add the suffix L to the number:

b <- 2L
b
[1] 2
typeof(b)
[1] "integer"
mode(b)
[1] "numeric"
str(b)
 int 2
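The distinction between integer and double rarely matters for comparisons, but it does show up in type checks and in the result of arithmetic. The following sketch uses only base R functions to illustrate this:

```r
# Type checks distinguish the two, but both are "numeric"
is.integer(2)      # FALSE
is.integer(2L)     # TRUE
is.numeric(2L)     # TRUE

# Values compare equal, yet the objects are not identical
2L == 2            # TRUE
identical(2L, 2)   # FALSE

# Arithmetic keeps integers only when every operand is an integer
typeof(2L + 1L)    # "integer"
typeof(2L + 1)     # "double"
typeof(2L / 1L)    # "double" (division always returns a double)
```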

There are six vector types:

  • logical
  • integer
  • double
  • character
  • complex
  • raw
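As a quick illustration of the six types listed above, this sketch builds one single-element vector of each type and asks R for its type. The names types and result are only illustrative.

```r
# One example value per atomic vector type
types <- list(
  logical   = TRUE,
  integer   = 2L,
  double    = 2.5,
  character = "a",
  complex   = 1+2i,
  raw       = as.raw(255)
)

# Apply typeof() to each element; the result is a named character vector
result <- vapply(types, typeof, character(1))
result
```

Each value in result matches the name of the element it came from.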

With multiple elements

c <- c(2, 4, 1)
c
[1] 2 4 1
typeof(c)
[1] "double"
mode(c)
[1] "numeric"
str(c)
 num [1:3] 2 4 1
d <- c(TRUE, TRUE, NA, FALSE)
d
[1]  TRUE  TRUE    NA FALSE
typeof(d)
[1] "logical"
str(d)
 logi [1:4] TRUE TRUE NA FALSE

NA (“Not Available”) is a logical constant of length one. It is an indicator for a missing value.
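A small sketch of how NA behaves in practice: on its own it is logical, but inside a vector it takes on the type of the other elements, and most computations that involve it return NA unless told to ignore missing values. The name x is only illustrative.

```r
typeof(NA)              # "logical"

x <- c(2.5, NA, 1)
typeof(x)               # "double"

# Detect missing values
is.na(x)                # FALSE  TRUE FALSE

# Missing values propagate through computations
mean(x)                 # NA
mean(x, na.rm = TRUE)   # 1.75
```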

Vectors are homogeneous, so all elements need to be of the same type.

If you combine elements of different types, R coerces them to the most flexible type present, following the order logical, integer, double, character:

e <- c("This is a string", 3, "test")
e
[1] "This is a string" "3"                "test"            
typeof(e)
[1] "character"
str(e)
 chr [1:3] "This is a string" "3" "test"
f <- c(TRUE, 3, FALSE)
f
[1] 1 3 0
typeof(f)
[1] "double"
str(f)
 num [1:3] 1 3 0
g <- c(2L, 3, 4L)
g
[1] 2 3 4
typeof(g)
[1] "double"
str(g)
 num [1:3] 2 3 4
h <- c("string", TRUE, 2L, 3.1)
h
[1] "string" "TRUE"   "2"      "3.1"   
typeof(h)
[1] "character"
str(h)
 chr [1:4] "string" "TRUE" "2" "3.1"
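The conversions above happen implicitly, but the same conversions can be requested explicitly with the as.*() family of base R functions. A minimal sketch follows. The name converted is illustrative, and suppressWarnings() is used only to keep the example quiet.

```r
# Moving towards a more flexible type always works
as.integer(c(TRUE, FALSE))   # 1 0
as.double(2L)                # 2
as.character(c(2L, 3L))      # "2" "3"

# Moving towards a less flexible type can fail:
# strings that are not numbers become NA (with a warning)
converted <- suppressWarnings(as.numeric(c("3", "test")))
converted                    # 3 NA
```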

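The binary operator : generates a regular sequence in steps of 1 and is equivalent to calling seq() with only a start and an end value. When the start value is a whole number, the result is a vector of integers: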

i <- 1:5
i
[1] 1 2 3 4 5
typeof(i)
[1] "integer"
str(i)
 int [1:5] 1 2 3 4 5
identical(2:8, seq(2, 8))
[1] TRUE
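A few more cases may help show where : and seq() differ. This sketch uses only base R. The by and length.out arguments of seq() are what : cannot express.

```r
# A non-integer start gives a double vector, still in steps of 1
1.5:4                       # 1.5 2.5 3.5
typeof(1.5:4)               # "double"

# Sequences can also run downwards
5:1                         # 5 4 3 2 1

# seq() allows other step sizes or a fixed number of elements
seq(2, 8, by = 2)           # 2 4 6 8
seq(0, 1, length.out = 5)   # 0.00 0.25 0.50 0.75 1.00
```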

Matrices

j <- matrix(1:12, nrow = 3, ncol = 4)
j
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12
typeof(j)
[1] "integer"
str(j)
 int [1:3, 1:4] 1 2 3 4 5 6 7 8 9 10 ...
length(j)
[1] 12
dim(j)
[1] 3 4

By default, byrow = FALSE, so the matrix is filled column by column. To fill it row by row, set this argument to TRUE:

k <- matrix(1:12, nrow = 3, ncol = 4, byrow = TRUE)
k
     [,1] [,2] [,3] [,4]
[1,]    1    2    3    4
[2,]    5    6    7    8
[3,]    9   10   11   12
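One way to see what a matrix is: it is an atomic vector with a dim attribute of length two. The sketch below builds the same matrices as above by hand. The names x and by_row are only illustrative.

```r
# Start from a plain integer vector and give it dimensions
x <- 1:12
dim(x) <- c(3, 4)
is.matrix(x)                                      # TRUE
identical(x, matrix(1:12, nrow = 3, ncol = 4))    # TRUE

# Filling by row gives the transpose of a column-filled 4 x 3 matrix
by_row <- matrix(1:12, nrow = 3, ncol = 4, byrow = TRUE)
all(by_row == t(matrix(1:12, nrow = 4, ncol = 3)))  # TRUE
```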

Arrays

l <- array(as.double(1:24), c(3, 2, 4))
l
, , 1

     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6

, , 2

     [,1] [,2]
[1,]    7   10
[2,]    8   11
[3,]    9   12

, , 3

     [,1] [,2]
[1,]   13   16
[2,]   14   17
[3,]   15   18

, , 4

     [,1] [,2]
[1,]   19   22
[2,]   20   23
[3,]   21   24
typeof(l)
[1] "double"
str(l)
 num [1:3, 1:2, 1:4] 1 2 3 4 5 6 7 8 9 10 ...
length(l)
[1] 24
dim(l)
[1] 3 2 4

Lists

m <- list(2, 3)
m
[[1]]
[1] 2

[[2]]
[1] 3
typeof(m)
[1] "list"
str(m)
List of 2
 $ : num 2
 $ : num 3
length(m)
[1] 2
dim(m)
NULL

Like atomic vectors, lists do not have a dim attribute by default. Lists are in fact vectors too: they are generic vectors, as opposed to atomic vectors, which is why their elements can be of any type.
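The relationship between lists and atomic vectors can be checked with a few base R predicates. A minimal sketch, in which the names lst and vec are only illustrative:

```r
lst <- list(2, 3)
vec <- c(2, 3)

# Both are vectors
is.vector(lst)   # TRUE
is.vector(vec)   # TRUE

# Only one of them is atomic, and only the other is a list
is.atomic(lst)   # FALSE
is.atomic(vec)   # TRUE
is.list(lst)     # TRUE
is.list(vec)     # FALSE
```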

Lists can be heterogeneous:

n <- list(2L, 3, c(2, 1), FALSE, "string")
n
[[1]]
[1] 2

[[2]]
[1] 3

[[3]]
[1] 2 1

[[4]]
[1] FALSE

[[5]]
[1] "string"
typeof(n)
[1] "list"
str(n)
List of 5
 $ : int 2
 $ : num 3
 $ : num [1:2] 2 1
 $ : logi FALSE
 $ : chr "string"
length(n)
[1] 5

Data frames

Data frames contain tabular data. Under the hood, a data frame is a list of vectors of equal length, with one vector per column.

o <- data.frame(
  country = c("Canada", "USA", "Mexico"),
  var = c(2.9, 3.1, 4.5)
)
o
  country var
1  Canada 2.9
2     USA 3.1
3  Mexico 4.5
typeof(o)
[1] "list"
str(o)
'data.frame':   3 obs. of  2 variables:
 $ country: chr  "Canada" "USA" "Mexico"
 $ var    : num  2.9 3.1 4.5
length(o)
[1] 2
dim(o)
[1] 3 2
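Because a data frame is a list of columns, the usual list tools work on it. The sketch below recreates the data frame from above so that it runs on its own. The name col_types is only illustrative, and the character type of the country column assumes R 4.0 or later, where strings are no longer converted to factors by default.

```r
o <- data.frame(
  country = c("Canada", "USA", "Mexico"),
  var = c(2.9, 3.1, 4.5)
)

is.list(o)      # TRUE
names(o)        # "country" "var"

# Extract a single column as a vector, as with any list
o[["var"]]      # 2.9 3.1 4.5
o$country

# Each column has its own type (assumes R >= 4.0 for "character")
col_types <- vapply(o, typeof, character(1))
col_types

# length() counts columns; nrow() and ncol() give both dimensions
nrow(o)         # 3
ncol(o)         # 2
```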