# Optimizations

Author

Marie-Hélène Burle

A lot of hardware is not the answer to poorly written code. Before considering parallelization, you should think about ways to optimize your code sequentially.

Why?

• Not all code can be parallelized.
• Parallelization is costly (waiting time to access a cluster or money).
• The optimization of the sequential code will also benefit the parallel code.

In many cases, writing better code will save you more computing time than parallelization.

In this section, we will cover several principles by playing with the programmatic implementation of the fizz buzz game.

## Toy example

Fizz buzz is a children game to practice divisions. Players take turn counting out loud while replacing:

• any number divisible by 3 with the word “Fizz”,
• any number divisible by 5 with the word “Buzz”,
• any number divisible by both 3 and 5 with the word “FizzBuzz”.

Let’s write functions to solve the game and time them to draw some general principles about more efficient code.

We will use `bench::mark()` to benchmark our solutions. Let’s load it:

``library(bench)``

## Pre-allocate memory

In this first function, we create an empty object `z` of class integer and of length `0` that will hold the result of a loop, then we run the loop and at each iteration, we add a new value to `z`:

``````f1 <- function(n) {
z <- integer()
for(i in 1:n) {
if(i %% 3 == 0 && i %% 5 == 0) {
z[i] <- "FizzBuzz"
} else if(i %% 3 == 0) {
z[i] <- "Fizz"
} else if(i %% 5 == 0) {
z[i] <- "Buzz"
} else {
z[i] <- i
}
}
z
}``````

The second function is very similar, but this time, we create an empty object `z` of class integer and of length matching the final length `z` will have after running the loop. This means that we are pre-allocating memory for the full vector before we run the loop instead of growing the vector at each iteration:

``````f2 <- function(n) {
z <- integer(n)
for(i in 1:n) {
if(i %% 3 == 0 && i %% 5 == 0) {
z[i] <- "FizzBuzz"
} else if(i %% 3 == 0) {
z[i] <- "Fizz"
} else if(i %% 5 == 0) {
z[i] <- "Buzz"
} else {
z[i] <- i
}
}
z
}``````

Let’s make sure that our functions work by testing it on a small number:

``f1(20)``
`````` [1] "1"        "2"        "Fizz"     "4"        "Buzz"     "Fizz"
[7] "7"        "8"        "Fizz"     "Buzz"     "11"       "Fizz"
[13] "13"       "14"       "FizzBuzz" "16"       "17"       "Fizz"
[19] "19"       "Buzz"    ``````
``f2(20)``
`````` [1] "1"        "2"        "Fizz"     "4"        "Buzz"     "Fizz"
[7] "7"        "8"        "Fizz"     "Buzz"     "11"       "Fizz"
[13] "13"       "14"       "FizzBuzz" "16"       "17"       "Fizz"
[19] "19"       "Buzz"    ``````

Now, let’s benchmark them for a large number:

``````n <- 1e5
mark(f1(n), f2(n))``````
``````Warning: Some expressions had a GC in every iteration; so filtering is
disabled.``````
``````# A tibble: 2 × 6
expression      min   median `itr/sec` mem_alloc `gc/sec`
<bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
1 f1(n)         136ms    137ms      7.18   16.55MB     25.1
2 f2(n)         124ms    126ms      7.89    1.15MB     31.6``````

`f2()` is consistently faster. While in this example the difference is very slight, pre-allocating the object that will hold the result of a loop before running the loop can make a huge difference.

## Aren’t loops a big ‘no no’ in R?

By now, you might be thinking: “Wait… aren’t loops a big ‘no no’ in R? I’ve always been told that they are slow and that one should always use functional programming! We are talking about optimization in this course and we are using loops?!?”

There are a lot of misconceptions around R loops. They can be very slow if you don’t pre-allocate memory. Otherwise they are almost always faster than functions (the `apply()` family or the tidyverse equivalent of the `purrr::map()` family). You can choose to use a functional programming approach for style and readability, but not for speed.

Let’s test it.

First we create a function:

``````fb <- function(n) {
if(n %% 3 == 0 && n %% 5 == 0) {
"FizzBuzz"
} else if(n %% 3 == 0) {
"Fizz"
} else if(n %% 5 == 0) {
"Buzz"
} else {
n
}
}``````

Then we pass it through `sapply()`. We can test that it works on a small number:

``sapply(1:20, fb)``
`````` [1] "1"        "2"        "Fizz"     "4"        "Buzz"     "Fizz"
[7] "7"        "8"        "Fizz"     "Buzz"     "11"       "Fizz"
[13] "13"       "14"       "FizzBuzz" "16"       "17"       "Fizz"
[19] "19"       "Buzz"    ``````

Finally, we compare the timing with that of `f2()`:

``mark(f2(n), sapply(1:n, fb))``
``````Warning: Some expressions had a GC in every iteration; so filtering is
disabled.``````
``````# A tibble: 2 × 6
expression           min   median `itr/sec` mem_alloc `gc/sec`
<bch:expr>      <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
1 f2(n)              140ms    143ms      6.95    1.15MB     31.3
2 sapply(1:n, fb)    172ms    190ms      5.34    3.29MB     28.5``````

As you can see, the loop is faster.

## Avoid unnecessary operations

### Example 1

Calling `z` as the last command in our function is the same as calling `return(z)`.

From the R documentation:

If the end of a function is reached without calling return, the value of the last evaluated expression is returned.

Now, what about using `print()` instead?

``````f3 <- function(n) {
z <- integer(n)
for(i in 1:n) {
if(i %% 3 == 0 && i %% 5 == 0) {
z[i] <- "FizzBuzz"
} else if(i %% 3 == 0) {
z[i] <- "Fizz"
} else if(i %% 5 == 0) {
z[i] <- "Buzz"
} else {
z[i] <- i
}
}
print(z)
}``````

Let’s benchmark it against `f2()`:

``mark(f2(n), f3(n))``
`````` [1] "1"        "2"        "Fizz"     "4"        "Buzz"     "Fizz"
[7] "7"        "8"        "Fizz"     "Buzz"     "11"       "Fizz"
[13] "13"       "14"       "FizzBuzz" "16"       "17"       "Fizz"
[19] "19"       "Buzz"     "Fizz"     "22"       "23"       "Fizz"
[25] "Buzz"     "26"       "Fizz"     "28"       "29"       "FizzBuzz"
[31] "31"       "32"       "Fizz"     "34"       "Buzz"     "Fizz"
[37] "37"       "38"       "Fizz"     "Buzz"     "41"       "Fizz"
...

Warning: Some expressions had a GC in every iteration; so filtering is
disabled.
# A tibble: 2 × 6
expression      min   median `itr/sec` mem_alloc `gc/sec`
<bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
1 f2(1e+05)  151.88ms 157.65ms     6.30     1.25MB     29.9
2 f3(1e+05)     3.25s    3.25s     0.308    1.04GB     26.8``````

What happened?

`print()` returns its argument, but it additionally prints it to the standard output. This is why the `mark()` function printed the output of `f3()` before printing the timings.

As you can see, printing takes a long time.

The code in this website is run by Quarto. Since, by default, RStudio will only print the first 1,000 results, the timing you will get for `f3()` in RStudio will be much less bad as it won’t include the time it takes to print the remaining 99,000 results.

If you are evaluating `f2()` on its own (e.g. `f2(20)`), the returned result will also be printed to standard output and both functions will be equivalent. However, if you are using the function in another context, printing becomes an unnecessary and timely operation and `f3()` would be a very bad option. `f3()` is thus not a good function.

Here is an example in which `f3()` would perform a totally unnecessary operation that `f2()` avoids:

``a <- f2(20)``

No unnecessary printing.

``a <- f3(20)``
`````` [1] "1"        "2"        "Fizz"     "4"        "Buzz"     "Fizz"
[7] "7"        "8"        "Fizz"     "Buzz"     "11"       "Fizz"
[13] "13"       "14"       "FizzBuzz" "16"       "17"       "Fizz"
[19] "19"       "Buzz"    ``````

Unnecessary printing.

For 1e5, the difference in timing between running an unnecessary printing vs not is a factor of 21!

Even worse would be to use:

``````f4 <- function(n) {
for(i in 1:n) {
if(i %% 3 == 0 && i %% 5 == 0) {
print("FizzBuzz")
} else if(i %% 3 == 0) {
print("Fizz")
} else if(i %% 5 == 0) {
print("Buzz")
} else {
print(i)
}
}
}``````

Here the difference in timing is a factor of 50…

### Example 2

One modulo operation and equality test can be removed by replacing `i %% 3 == 0 && i %% 5 == 0` by `i %% 15 == 0`. The difference isn’t huge, but there is a slight speedup:

``````f2bis <- function(n) {
z <- integer(n)
for(i in 1:n) {
if(i %% 15 == 0) {
z[i] <- "FizzBuzz"
} else if(i %% 3 == 0) {
z[i] <- "Fizz"
} else if(i %% 5 == 0) {
z[i] <- "Buzz"
} else {
z[i] <- i
}
}
z
}

mark(f2(n), f2bis(n))``````
``````Warning: Some expressions had a GC in every iteration; so filtering is
disabled.``````
``````# A tibble: 2 × 6
expression      min   median `itr/sec` mem_alloc `gc/sec`
<bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
1 f2(n)         121ms    125ms      7.98    1.25MB     28.7
2 f2bis(n)      116ms    117ms      8.45    1.22MB     30.4``````

### Example 3

But is `f2()` really the fastest function?

Louis Arsenault-Mahjoubi—who attended this workshop—found ways to get rid of several operations and get a speedup of 1.7 over `f2()`.

First, we can assign `1:n` to `z` instead of pre-allocating memory with an empty vector, thus rendering the assignment of `i` to `z[i]` unnecessary in the last else statement:

``````f_louis1 <- function(n) {
z <- 1:n
for(i in z) {
if(i %% 3 == 0 && i %% 5 == 0) {
z[i] <- "FizzBuzz"
} else if(i %% 3 == 0) {
z[i] <- "Fizz"
} else if(i %% 5 == 0) {
z[i] <- "Buzz"
}
}
z
}``````

This function works:

``f_louis1(20)``
`````` [1] "1"        "2"        "Fizz"     "4"        "Buzz"     "Fizz"
[7] "7"        "8"        "Fizz"     "Buzz"     "11"       "Fizz"
[13] "13"       "14"       "FizzBuzz" "16"       "17"       "Fizz"
[19] "19"       "Buzz"    ``````

… and is faster (speedup of 1.3):

``mark(f2bis(n), f_louis1(n))``
``````Warning: Some expressions had a GC in every iteration; so filtering is
disabled.``````
``````# A tibble: 2 × 6
expression       min   median `itr/sec` mem_alloc `gc/sec`
<bch:expr>  <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
1 f2bis(n)     118.1ms  121.3ms      8.18    1.17MB     27.8
2 f_louis1(n)   88.6ms   91.5ms     10.6     1.24MB     35.2``````

Then, we can prevent the repetitions of the modulo operations and equality tests by saving them to variables:

``````f_louis2 <- function(n) {
z <- 1:n
for(i in z) {
div3 <- (i %% 3 == 0)
div5 <- (i %% 5 == 0)
if(div3 && div5) {
z[i] <- "FizzBuzz"
} else if(div3) {
z[i] <- "Fizz"
} else if(div5) {
z[i] <- "Buzz"
}
}
z
}``````

This gets us an even greater speedup of 1.7:

``mark(f2bis(n), f_louis2(n))``
``````Warning: Some expressions had a GC in every iteration; so filtering is
disabled.``````
``````# A tibble: 2 × 6
expression       min   median `itr/sec` mem_alloc `gc/sec`
<bch:expr>  <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
1 f2bis(n)     127.1ms  136.1ms      7.15    1.15MB     32.2
2 f_louis2(n)   75.8ms   78.5ms     12.8     1.22MB     34.7``````

But it gets even better: we can actually get rid of the for loop!

``````f_louis3 <- function(n) {
z <- 1:n
div3 <- (z %% 3 == 0)
div5 <- (z %% 5 == 0)
z[which(div3)] <- "Fizz"
z[which(div5)] <- "Buzz"
z[which(div3 & div5)] <- "FizzBuzz"
z
}``````

Now we get a speedup of 5.5 compared to our best `f2` function:

``mark(f2bis(n), f_louis3(n))``
``````Warning: Some expressions had a GC in every iteration; so filtering is
disabled.``````
``````# A tibble: 2 × 6
expression       min   median `itr/sec` mem_alloc `gc/sec`
<bch:expr>  <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
1 f2bis(n)     119.4ms  122.5ms      8.06    1.17MB    27.4
2 f_louis3(n)   24.1ms   25.3ms     37.2      5.2MB     3.92``````

You can ensure that we still get the same result:

``f_louis3(20)``
`````` [1] "1"        "2"        "Fizz"     "4"        "Buzz"     "Fizz"
[7] "7"        "8"        "Fizz"     "Buzz"     "11"       "Fizz"
[13] "13"       "14"       "FizzBuzz" "16"       "17"       "Fizz"
[19] "19"       "Buzz"    ``````

## Replace costly operations where possible

Now imagine that we have a dataframe called `dat` with a first column called `datvar` filled with integers.

We want to write a function that will accept our dataframe as argument and play the fizz buzz game on the column `datvar`.

One could imagine the following function:

``````f5 <- function(dat) {
z <- integer(length(dat[[1]]))
for(i in seq_along(dat[[1]])) {
if(dat[[1]][i] %% 3 == 0 && dat[[1]][i] %% 5 == 0) {
z[i] <- "FizzBuzz"
} else if(dat[[1]][i] %% 3 == 0) {
z[i] <- "Fizz"
} else if(dat[[1]][i] %% 5 == 0) {
z[i] <- "Buzz"
} else {
z[i] <- dat[[1]][i]
}
}
z
}``````

Indexing a column from a dataframe in this fashion is a very costly operation. It is much more efficient to index with the name of the column:

``````f6 <- function(dat) {
z <- integer(length(dat\$datvar))
for(i in seq_along(dat\$datvar)) {
if(dat\$datvar[i] %% 3 == 0 && dat\$datvar[i] %% 5 == 0) {
z[i] <- "FizzBuzz"
} else if(dat\$datvar[i] %% 3 == 0) {
z[i] <- "Fizz"
} else if(dat\$datvar[i] %% 5 == 0) {
z[i] <- "Buzz"
} else {
z[i] <- dat\$datvar[i]
}
}
z
}``````

Now, let’s create a random dataframe to benchmark `f5()` and `f6()`:

``````set.seed(123)
dat <- data.frame(datvar = round(runif(n, 1, n)))
mark(f5(dat), f6(dat))``````
``````Warning: Some expressions had a GC in every iteration; so filtering is
disabled.``````
``````# A tibble: 2 × 6
expression      min   median `itr/sec` mem_alloc `gc/sec`
<bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
1 f5(dat)       1.99s    1.99s     0.503    1.26MB     22.6
2 f6(dat)    347.21ms 380.61ms     2.63     1.26MB     23.6``````

## Avoid repetitions of costly operations

This made a big difference (speedup of 5), but notice that we are indexing the column 6 times in our function. Let’s remove the repetition of this operation:

``````f7 <- function(dat) {
var <- dat\$datvar
z <- integer(length(var))
for(i in seq_along(var)) {
if(var[i] %% 3 == 0 && var[i] %% 5 == 0) {
z[i] <- "FizzBuzz"
} else if(var[i] %% 3 == 0) {
z[i] <- "Fizz"
} else if(var[i] %% 5 == 0) {
z[i] <- "Buzz"
} else {
z[i] <- var[i]
}
}
z
}``````

Let’s benchmark all 3 versions:

``mark(f5(dat), f6(dat), f7(dat))``
``````Warning: Some expressions had a GC in every iteration; so filtering is
disabled.``````
``````# A tibble: 3 × 6
expression      min   median `itr/sec` mem_alloc `gc/sec`
<bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
1 f5(dat)       1.79s    1.79s     0.557    1.15MB     30.1
2 f6(dat)    376.47ms 381.31ms     2.62     1.15MB     26.2
3 f7(dat)    161.25ms 165.38ms     5.94     1.25MB     17.8``````

`f7()` gave us another speedup of almost 3 over `f6()`. `f7()` runs 14 times faster than our initial function!

Indexing from a vector isn’t costly. There is thus no advantage at removing the repetition of that operation.