# Partitioning data with multidplyr

Author

Marie-Hélène Burle

The package `multidplyr` provides simple techniques to partition data across a set of workers (multicore parallelism) on the same or different nodes.

## Create a cluster of workers

Let’s load the `multidplyr` package:

``library(multidplyr)``

First, create a set of workers:

``````cl <- new_cluster(4)
cl``````
``4 session cluster [....]``
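
The number of workers is a choice left to you. As a minimal sketch, assuming you want to leave one core free for the main R session and never start more than 4 workers, the count can be derived from `parallel::detectCores()`. The names `n_cores`, `n_workers` and `cl_auto` are hypothetical and not part of `multidplyr`:

```r
library(multidplyr)

# detectCores() comes from the base 'parallel' package and may return NA
n_cores <- parallel::detectCores()

# Assumption: keep one core for the main session, cap at 4 workers,
# and always start at least 1 worker
n_workers <- max(1, min(4, n_cores - 1, na.rm = TRUE))

# Hypothetical name, to avoid clashing with `cl` used in this section
cl_auto <- new_cluster(n_workers)
cl_auto
```

Each worker is a separate R session, so starting more workers than available cores is unlikely to help.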

## Data assignment

There are multiple ways to assign data to the workers.

### Assign the same value to each worker

This is done with the `cluster_assign()` function:

``cluster_assign(cl, a = 1:4)``

To execute code on each worker and return the results (one list element per worker), use `cluster_call()`:

``cluster_call(cl, a)``
``````[[1]]
[1] 1 2 3 4

[[2]]
[1] 1 2 3 4

[[3]]
[1] 1 2 3 4

[[4]]
[1] 1 2 3 4``````
``````cluster_assign(cl, b = runif(4))
cluster_call(cl, b)``````
``````[[1]]
[1] 0.93146519 0.75181518 0.33158435 0.02970799

[[2]]
[1] 0.93146519 0.75181518 0.33158435 0.02970799

[[3]]
[1] 0.93146519 0.75181518 0.33158435 0.02970799

[[4]]
[1] 0.93146519 0.75181518 0.33158435 0.02970799``````
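
All four workers hold the same random numbers because `runif(4)` is evaluated once in the main session before the result is copied to the workers. The sketch below checks this and, for contrast, asks each worker to draw its own number by passing the call to `cluster_call()`. The names `b_copies` and `all_same` are hypothetical:

```r
library(multidplyr)

cl <- new_cluster(4)  # same setup as earlier in this section

# runif(4) runs once, locally; every worker receives a copy of the result
cluster_assign(cl, b = runif(4))
b_copies <- cluster_call(cl, b)

# Compare each worker's copy with the first worker's copy
all_same <- all(vapply(b_copies, identical, logical(1), b_copies[[1]]))
all_same

# Here the code itself is sent to the workers, so each worker
# draws its own random number in its own R session
cluster_call(cl, runif(1))
```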

### Assign different values to each worker

For this, use `cluster_assign_each()` instead:

``````cluster_assign_each(cl, c = 1:4)
cluster_call(cl, c)``````
``````[[1]]
[1] 1

[[2]]
[1] 2

[[3]]
[1] 3

[[4]]
[1] 4``````
``````cluster_assign_each(cl, d = runif(4))
cluster_call(cl, d)``````
``````[[1]]
[1] 0.8892167

[[2]]
[1] 0.09334862

[[3]]
[1] 0.614763

[[4]]
[1] 0.6986541``````
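
Per-worker values become useful once each worker computes with its own value. A minimal sketch, assuming a 4-worker cluster and, as in the examples above, a vector with one element per worker. The names `p` and `powers` are hypothetical:

```r
library(multidplyr)

cl <- new_cluster(4)  # same setup as earlier in this section

# Worker 1 gets p = 1, worker 2 gets p = 2, and so on
cluster_assign_each(cl, p = 1:4)

# Each worker evaluates 2^p with its own p; unlist() flattens the list of results
powers <- unlist(cluster_call(cl, 2^p))
powers
# 2 4 8 16
```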

### Partition vectors

`cluster_assign_partition()` splits a vector into chunks of roughly equal size and assigns one chunk to each worker:

``````cluster_assign_partition(cl, e = 1:10)
cluster_call(cl, e)``````
``````[[1]]
[1] 1 2 3

[[2]]
[1] 4 5

[[3]]
[1] 6 7

[[4]]
[1]  8  9 10``````
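
Once a vector is partitioned, each worker can process its own chunk and the main session can combine the partial results. A minimal sketch of this split-then-combine pattern, using a sum as the example computation. The names `partial` and `total` are hypothetical:

```r
library(multidplyr)

cl <- new_cluster(4)  # same setup as earlier in this section

# Split 1:10 across the workers
cluster_assign_partition(cl, e = 1:10)

# Each worker sums only its own chunk
partial <- cluster_call(cl, sum(e))

# Combine the per-worker sums in the main session
total <- sum(unlist(partial))
total == sum(1:10)
# TRUE
```

The combined result does not depend on how the vector was split, which makes this a convenient check that no element was lost or duplicated.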