Automation

Author

Marie-Hélène Burle

One of the strengths of programming is the ability to automate tasks.

In this section, we will see how a loop can automate the creation of file names.

Let’s say that we now want to import data from five files, arc1.csv, …, arc5.csv, and create five data frames from them.

We need a character vector with the file names.

We could create it this way:

files <- c(
  "https://mint.westdri.ca/r/hss_data/arc1.csv",
  "https://mint.westdri.ca/r/hss_data/arc2.csv",
  "https://mint.westdri.ca/r/hss_data/arc3.csv",
  "https://mint.westdri.ca/r/hss_data/arc4.csv",
  "https://mint.westdri.ca/r/hss_data/arc5.csv"
)

This works, of course:

files
[1] "https://mint.westdri.ca/r/hss_data/arc1.csv"
[2] "https://mint.westdri.ca/r/hss_data/arc2.csv"
[3] "https://mint.westdri.ca/r/hss_data/arc3.csv"
[4] "https://mint.westdri.ca/r/hss_data/arc4.csv"
[5] "https://mint.westdri.ca/r/hss_data/arc5.csv"

But if we had 50 files instead of 5, this would be quite tedious. And if we had 500 files, it would be unrealistic. A better approach is to write a loop.

To store the results of a loop, we create an empty object before the loop starts and assign a value to one of its elements at each iteration. It is important to pre-allocate memory: by creating an empty object of the final size, the memory needed to hold it is requested once, and the object is then filled in while the loop runs. Without pre-allocation, the object has to grow as the loop runs, which means repeatedly requesting more memory and copying the existing values, and this is highly inefficient.
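To make the difference concrete, here is a minimal sketch contrasting the two approaches. The names `n`, `grown`, and `preallocated` are illustrative and not part of the lesson's data. Both loops produce the same values, but only the second one has its final size from the start.

```r
n <- 1000  # illustrative size, chosen arbitrarily

# Growing the vector: it starts empty and is extended inside the loop,
# so R may have to allocate a larger vector and copy the old values over.
grown <- character(0)
for (i in seq_len(n)) {
  grown[i] <- paste0("item", i)
}

# Pre-allocating: the vector has its final length before the loop starts,
# and each iteration only fills in one existing element.
preallocated <- character(n)
for (i in seq_len(n)) {
  preallocated[i] <- paste0("item", i)
}

# Both approaches give the same result
identical(grown, preallocated)
```

With a vector this small the time difference is unlikely to be noticeable; the cost of growing an object tends to matter as the number of iterations increases.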

So let’s create an empty vector of length 5 and of type character:

files <- character(5)

Now we can fill in our vector with the proper values using a loop:

for (i in 1:5) {
  files[i] <- paste0("https://mint.westdri.ca/r/hss_data/arc", i, ".csv")
}

This gives us the same result, but the big difference is that it is scalable:

files
[1] "https://mint.westdri.ca/r/hss_data/arc1.csv"
[2] "https://mint.westdri.ca/r/hss_data/arc2.csv"
[3] "https://mint.westdri.ca/r/hss_data/arc3.csv"
[4] "https://mint.westdri.ca/r/hss_data/arc4.csv"
[5] "https://mint.westdri.ca/r/hss_data/arc5.csv"
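As a sketch of that scalability, the number of files can be stored in a variable so that only one value changes when the collection grows. The names `n_files` and `base_url` are introduced here for illustration, and 500 is a hypothetical count; the loop is otherwise the one used above.

```r
n_files <- 500  # hypothetical number of files
base_url <- "https://mint.westdri.ca/r/hss_data/arc"

# Pre-allocate a character vector of the final length
files <- character(n_files)

# seq_along(files) yields 1, 2, ..., n_files
for (i in seq_along(files)) {
  files[i] <- paste0(base_url, i, ".csv")
}

length(files)
head(files, 2)
tail(files, 1)
```

Because `paste0()` is vectorised, `paste0(base_url, 1:n_files, ".csv")` would build the same vector without an explicit loop. The loop form is kept here because it is the pattern this section introduces.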

Now, if our files were not named following such a nice sequence, we would have to modify our loop a little. Below are two examples:

# Store the file numbers once instead of repeating the vector
ids <- c(3, 6, 9, 10, 14)
files <- character(length(ids))

for (i in seq_along(ids)) {
  files[i] <- paste0(
    "https://mint.westdri.ca/r/hss_data/arc",
    ids[i],
    ".csv"
  )
}

files
[1] "https://mint.westdri.ca/r/hss_data/arc3.csv" 
[2] "https://mint.westdri.ca/r/hss_data/arc6.csv" 
[3] "https://mint.westdri.ca/r/hss_data/arc9.csv" 
[4] "https://mint.westdri.ca/r/hss_data/arc10.csv"
[5] "https://mint.westdri.ca/r/hss_data/arc14.csv"
files <- character(5)

for (i in seq_along(c("_a", "_b", "_c", "_d", "_e"))) {
  files[i] <- paste0(
    "https://mint.westdri.ca/r/hss_data/arc",
    c("_a", "_b", "_c", "_d", "_e")[i],
    ".csv"
  )
}

files
[1] "https://mint.westdri.ca/r/hss_data/arc_a.csv"
[2] "https://mint.westdri.ca/r/hss_data/arc_b.csv"
[3] "https://mint.westdri.ca/r/hss_data/arc_c.csv"
[4] "https://mint.westdri.ca/r/hss_data/arc_d.csv"
[5] "https://mint.westdri.ca/r/hss_data/arc_e.csv"

If you had all the files in one local directory, an alternative approach would be to create a character vector of all the file names matching a regular expression.

In our case, with that directory as the working directory, we would use:

files <- list.files(pattern = "^arc\\d+\\.csv$")
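To see how the pattern behaves, here is a small self-contained sketch that builds a temporary directory with a few empty files and then lists the matching names. The directory name `arc_demo` and the file names are made up for the demonstration and are not part of the course data.

```r
# Create a temporary directory holding a few empty files (illustrative names)
demo_dir <- file.path(tempdir(), "arc_demo")
dir.create(demo_dir, showWarnings = FALSE)
invisible(file.create(
  file.path(demo_dir, c("arc1.csv", "arc2.csv", "arc10.csv", "notes.txt"))
))

# Keep only the names made of "arc", one or more digits, then ".csv"
files <- list.files(path = demo_dir, pattern = "^arc\\d+\\.csv$")
files
```

Here `notes.txt` is excluded because it does not match the pattern. Note that `list.files()` returns names in alphabetical rather than numeric order, so `arc10.csv` is listed before `arc2.csv`. Setting `full.names = TRUE` would return the paths including the directory, which is usually what a function reading the files needs.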