The future package

Author

Marie-Hélène Burle

The future package is a modern package that provides a simple and consistent API for evaluating futures in R, whatever the evaluation strategy.

Excellent backends have been built on top of it.

Classic parallel packages in R

In the previous section, we talked about various types of parallelism. R offers several options for running code with shared-memory or distributed parallelism.

Examples of options for shared-memory parallelism:

  • The foreach package with backends such as doMC, now also part of the doParallel package.
  • mclapply() and mcmapply() from the parallel package (part of the core distribution of R).
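As a minimal sketch of shared-memory parallelism with the parallel package, the snippet below squares a few numbers with mclapply(). The toy input and the choice of 2 cores are illustrative assumptions. Forking is not available on Windows, so the sketch falls back to a single core there.

```r
library(parallel)

# Toy input (illustrative)
x <- 1:8

# Forking is not supported on Windows: use 1 core there, 2 elsewhere
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

# mclapply() forks the current R process: each child inherits the
# whole workspace, so nothing has to be exported to the workers
res <- mclapply(x, function(i) i^2, mc.cores = n_cores)

unlist(res)
```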

Examples of options for distributed parallelism:

  • The foreach package with backends such as doSNOW, now also part of the doParallel package.
  • The suite of clusterApply() and par*apply() functions from the parallel package.
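For comparison, here is a sketch of the snow-style approach from the parallel package, assuming a local socket cluster of 2 workers. On a real distributed system, the host names of the nodes would be passed to makeCluster() instead. Because the workers are fresh R sessions, any object they need must be exported to them explicitly.

```r
library(parallel)

# Start a local socket (PSOCK) cluster with 2 R worker processes
cl <- makeCluster(2)

# Workers do not share the main session's workspace: export what they need
offset <- 10
clusterExport(cl, "offset", envir = environment())

# parLapply() splits the input across the workers
res <- parLapply(cl, 1:8, function(i) i + offset)

# Always release the worker processes when done
stopCluster(cl)

unlist(res)
```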

The parallel package is a merger of the former multicore package for shared-memory parallelism and of the snow package for distributed parallelism.

Similarly, the doParallel package is a merger of the doMC package, for use with foreach in shared memory, and of the doSNOW package, for use with foreach in distributed parallelism.
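A minimal sketch of foreach with the doParallel backend follows, assuming the foreach and doParallel packages are installed and using an illustrative cluster of 2 local workers. Registering an explicit cluster as done here is the snow-style route; calling registerDoParallel(cores = 2) instead selects the forking (doMC-style) route on Unix-like systems.

```r
library(parallel)
library(foreach)
library(doParallel)

# Create a local socket cluster with 2 workers and register it
# as the backend used by %dopar%
cl <- makeCluster(2)
registerDoParallel(cl)

# Each iteration is sent to the registered backend;
# .combine = c gathers the results into a single vector
res <- foreach(i = 1:4, .combine = c) %dopar% {
  i * 10
}

# Return to sequential execution and release the workers
registerDoSEQ()
stopCluster(cl)

res
```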

The future package

The future package opened up a new landscape in the world of parallel R by providing a simple and consistent API for the evaluation of futures sequentially, through shared-memory parallelism, or through distributed parallelism.

A future is an object that acts as an abstract representation for a value in the future. A future can be resolved (if the value has been computed) or unresolved. If the value is queried while the future is unresolved, the process is blocked until the future is resolved. Futures thus allow for asynchronous and parallel evaluations.
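The life cycle of a future can be sketched with the explicit API of the future package: future() creates it, resolved() checks its state without blocking, and value() blocks until it is resolved. The 2-second sleep and the use of 2 background workers are arbitrary choices for illustration.

```r
library(future)

# Assumption: 2 background R sessions are enough for this toy example
plan(multisession, workers = 2)

# future() creates the future; evaluation starts in a background session
f <- future({
  Sys.sleep(2)   # simulate a slow computation
  42
})

# resolved() queries the state of the future without blocking
resolved(f)

# value() blocks until the future is resolved, then returns its value
v <- value(f)
done <- resolved(f)   # TRUE: the future has now been resolved

plan(sequential)      # go back to sequential evaluation
v
```

While the sleep is running, the main session remains free to do other work, which is what makes futures useful for asynchronous evaluation.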

The evaluation strategy is set with the plan() function:

  • plan(sequential):
    Futures are evaluated sequentially in the current R session.

  • plan(multisession):
    Futures are evaluated by new R sessions spawned in the background (multi-processing in shared memory).

  • plan(multicore):
    Futures are evaluated in processes forked from the existing process (multi-processing in shared memory).

  • plan(cluster):
    Futures are evaluated on an ad-hoc cluster (distributed parallelism across multiple nodes).
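The point of plan() is that the code creating futures does not change when the strategy does. In the sketch below, the same helper (a hypothetical function named where()) reports which R process evaluated its future. plan(multicore) is left out because forking is not supported on Windows, and plan(cluster) is only shown as a comment because it needs real host names (the ones below are placeholders).

```r
library(future)

# Hypothetical helper: return the process ID of whoever evaluates the future
where <- function() {
  f <- future(Sys.getpid())
  value(f)
}

plan(sequential)
pid_seq <- where()          # evaluated in the current R session

plan(multisession, workers = 2)
pid_ms <- where()           # evaluated in a background R session

# plan(cluster, workers = c("node1", "node2"))  # placeholder host names

plan(sequential)            # back to sequential evaluation

c(main = Sys.getpid(), sequential = pid_seq, multisession = pid_ms)
```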

Consistency

To ensure a consistent behaviour across plans, all evaluations are done in a local environment:

library(future)

a <- 1

b %<-% {      # %<-% creates futures
  a <- 2
}

a
[1] 1
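To see that this behaviour does not depend on the plan, the same example can be run with futures evaluated in background R sessions (2 workers, an arbitrary choice). The expression now returns a explicitly so that the value received by b is visible.

```r
library(future)

# Same example, now evaluated in a background R session
plan(multisession, workers = 2)

a <- 1
b %<-% {
  a <- 2   # assigns to a local copy inside the future
  a        # the last value is what b receives
}

b   # 2
a   # still 1, exactly as with plan(sequential)

plan(sequential)
```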

The future ecosystem

Several great packages have been built on top of the future API.

  • The doFuture package allows foreach expressions to be parallelized with any of the future evaluation strategies.
  • Similarly, the future.apply package provides parallel versions of the *apply() functions that run on these strategies.
  • The furrr package provides a parallel version of purrr for those who prefer this approach to functional programming.
  • The future.callr package implements a future backend based on the callr package that resolves every future in a new R session. This removes any limitation on the number of background R parallel processes that can be active at the same time.
  • The future.batchtools package implements a future backend based on the batchtools package, which provides functions to interact with HPC job schedulers such as Slurm.
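As an illustration of how these packages plug into plan(), here is a sketch using future.apply, assuming the package is installed and using 2 background workers for the example. future_lapply() and future_sapply() behave like lapply() and sapply(), but evaluate the elements as futures under the current plan.

```r
library(future)
library(future.apply)

plan(multisession, workers = 2)

# Drop-in replacement for lapply(): elements are processed by the workers
res <- future_lapply(1:4, function(i) i^2)

# future_sapply() simplifies the result like sapply()
res2 <- future_sapply(1:4, function(i) i^2)

plan(sequential)
```

Switching to another strategy only requires changing the plan() call, not the future_lapply() code.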

In this course, we will cover foreach with doFuture in great detail to explain all the important concepts. After that, you will be able to use any of these backends easily.
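As a preview, a minimal foreach loop with doFuture might look like the sketch below. It assumes doFuture 1.0.0 or later, which provides the %dofuture% operator; older versions instead use registerDoFuture() together with %dopar%. The 2 background workers are an illustrative choice.

```r
library(future)
library(foreach)
library(doFuture)

plan(multisession, workers = 2)

# %dofuture% evaluates the iterations as futures under the current plan()
res <- foreach(i = 1:4, .combine = c) %dofuture% {
  sqrt(i)
}

plan(sequential)
res
```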