Introduction to HPC in Julia

Author

Marie-Hélène Burle

Interactive sessions on clusters

When you launch a Jupyter session from a JupyterHub, you are running a Slurm job on a compute node. If you want to play for 8 hours in Jupyter, you are requesting an 8-hour job. Yet most of the time you spend in Jupyter goes to typing, running bits and pieces of code, or doing nothing at all. If you ask for GPUs, many CPUs, and lots of RAM, all of it will sit idle most of the time. This is a poor use of resources.

In addition, if you ask for lots of resources for a long time, you will have to wait for a while before they get allocated to you.

Lastly, you will go through your allocations quickly.

All of this applies equally to interactive sessions launched with salloc from an SSH session.

A better approach

A more efficient strategy is to develop and test your code on small samples, with few iterations, etc., either in an interactive job (launched with salloc from an SSH session on the cluster), on your own computer, or in Jupyter. Once you are confident that your code works, launch an sbatch job from an SSH session on the cluster to run the code as a script on all your data. This ensures that the heavy-duty resources you requested are actually put to use running your heavy calculations, and are not sitting idle while you are thinking, typing, etc.

Logging on to the cluster

Step 1: get the info

During the course, we will give you 3 pieces of information:

  • a link to a list of usernames,
  • the hostname for our temporary training cluster,
  • the password to access that cluster.

Step 2: claim a username

Add your first name or a pseudonym next to a free username on the list to claim it.

Your username is the name that was already on the list, NOT what you wrote next to it (which doesn't matter at all and only serves to signal that this username is now taken).

Your username will look like userxx (xx being 2 digits), with no space and no capital letter.

Step 3: run the ssh command

Linux users:   open the terminal emulator of your choice.
macOS users:   open “Terminal”.

Then type:

ssh userxx@hostname

and press Enter.

  • Replace userxx with your username (e.g. user09).
  • Replace hostname with the hostname we will give you on the day of the workshop.

When asked:

Are you sure you want to continue connecting (yes/no/[fingerprint])?

Answer: “yes”.

Windows users:   we suggest using the free version of MobaXterm, a program that comes with a terminal emulator and a graphical interface for SSH sessions.

Here is how to install MobaXterm:

  • download the “Installer edition” to your computer (green button to the right),
  • unzip the file,
  • double-click on the .msi file to launch the installation.

Here is how to log in with MobaXterm:

  • open MobaXterm,
  • click on Session (top left corner),
  • click on SSH (top left corner),
  • fill in the Remote host * box with the cluster hostname we gave you,
  • tick the box Specify username,
  • fill in the box with the username you selected (e.g. user09),
  • press OK,
  • when asked Are you sure you want to continue connecting (yes/no/[fingerprint])?, answer: “yes”.

Here is a live demo.

Step 4: enter the password

When prompted, enter the password we gave you.

You will not see anything happen as you type the password. This is normal and it is working, so keep on typing the password.

This is called blind typing and is a Linux security feature. It can be unsettling at first not to get any feedback while typing, as it really looks like nothing is happening. Type slowly and make sure not to make typos.

Then press Enter.

Am I logged in?

To know whether or not you are logged in, look at your prompt: it should look like the following (with your actual username):

[userxx@login1 ~]$

Troubleshooting

Problems logging in are almost always due to typos. If you cannot log in, retry slowly, entering your password carefully.

How do I log out?

You can log out by pressing Ctrl+d (or by typing exit).

Accessing Julia

Julia is made available through the module command of the Lmod tool. You can find the full documentation here; below are the subcommands you will need:

# get help on the module command
$ module help
$ module --help
$ module -h

# list modules that are already loaded
$ module list

# see which modules are available for Julia
$ module spider julia

# see how to load julia 1.3
$ module spider julia/1.3.0

# load julia 1.3 with the required gcc module first
# (the order is important)
$ module load gcc/7.3.0 julia/1.3.0

# you can see that we now have Julia loaded
$ module list

Copying files to the cluster

We will create a julia_job directory in ~/scratch, then copy our Julia scripts into it.

$ mkdir ~/scratch/julia_job
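Plain mkdir fails if the directory already exists, for instance when you rerun these steps. A minimal sketch of a rerunnable alternative, using the same path as above (the workdir variable name is just for illustration):

```shell
# -p creates missing parent directories and does not fail if the
# directory already exists, so this is safe to run more than once
workdir="$HOME/scratch/julia_job"
mkdir -p "$workdir"

# Confirm the directory is there
ls -ld "$workdir"
```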

Open a new terminal window and, from your local terminal (check the bash prompt to make sure you are not on the remote machine), run:

$ scp /local/path/to/sort.jl <username>@<hostname>:scratch/julia_job
$ scp /local/path/to/psort.jl <username>@<hostname>:scratch/julia_job

# enter password

Job scripts

We will not run an interactive Julia session on the cluster: we already have Julia scripts ready to run. All we need to do is write job scripts to submit to Slurm, the job scheduler used by the Alliance clusters.

We will create 2 scripts: one running Julia on a single core and one using as many cores as are available.

Your turn:

How many processors are there on our training cluster?
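One way to check from the command line is sketched below, assuming you run it in your SSH session on the cluster. nproc (GNU coreutils) reports the processing units available on the machine you are on, which here is the login node, not necessarily a compute node. sinfo, when present, lists CPUs per node for the nodes Slurm manages. The ncpu variable name is hypothetical.

```shell
# Processing units available to this process on the current machine
# (on a cluster this is the login node you are connected to)
ncpu=$(nproc)
echo "This machine reports $ncpu processing units"

# If Slurm is installed, list node names (%N) and CPUs per node (%c)
if command -v sinfo >/dev/null 2>&1; then
  sinfo -o "%N %c"
fi
```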

We can run Julia with multiple threads by running:

$ JULIA_NUM_THREADS=2 julia

or:

$ julia -t 2

Once in Julia, you can double-check that Julia does indeed have access to 2 threads by running:

Threads.nthreads()

Save your job scripts as ~/scratch/julia_job/job_julia1c.sh and ~/scratch/julia_job/job_julia2c.sh for one and two cores respectively.

Here is what our single-core Slurm script looks like:

#!/bin/bash
#SBATCH --job-name=julia1c          # job name
#SBATCH --time=00:01:00             # max walltime 1 min
#SBATCH --cpus-per-task=1           # number of cores
#SBATCH --mem=1000                  # max memory (default unit is megabytes)
#SBATCH --output=julia1c%j.out      # file name for the output
#SBATCH --error=julia1c%j.err       # file name for errors
# %j gets replaced with the job number

echo Running NON parallel script
julia sort.jl
echo Running parallel script on $SLURM_CPUS_PER_TASK core
julia -t $SLURM_CPUS_PER_TASK psort.jl
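SLURM_CPUS_PER_TASK is only set inside a Slurm job that requests --cpus-per-task, so if you test these lines on your own computer, julia -t would receive no thread count. A minimal sketch, assuming you want the same commands to work both inside and outside Slurm, falls back to one thread with a default parameter expansion (nthreads is a hypothetical variable name):

```shell
# Use the core count granted by Slurm, or 1 when not running under Slurm
nthreads=${SLURM_CPUS_PER_TASK:-1}
echo "Running parallel script on $nthreads thread(s)"

# Only launch Julia if it is on PATH and the script is in this directory
if command -v julia >/dev/null 2>&1 && [ -f psort.jl ]; then
  julia -t "$nthreads" psort.jl
fi
```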

Your turn:

Write the script for 2 cores.
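One possible solution, if you want to check your work, is sketched below. It writes job_julia2c.sh with a quoted heredoc ('EOF'), so $SLURM_CPUS_PER_TASK is written literally and only expanded when Slurm runs the job. Only the job name, the output and error file names, and --cpus-per-task differ from the single-core script. Run it from ~/scratch/julia_job.

```shell
# Write the two-core job script; the quoted 'EOF' prevents variable expansion now
cat > job_julia2c.sh << 'EOF'
#!/bin/bash
#SBATCH --job-name=julia2c          # job name
#SBATCH --time=00:01:00             # max walltime 1 min
#SBATCH --cpus-per-task=2           # number of cores
#SBATCH --mem=1000                  # max memory (default unit is megabytes)
#SBATCH --output=julia2c%j.out      # file name for the output
#SBATCH --error=julia2c%j.err       # file name for errors
# %j gets replaced with the job number

echo Running NON parallel script
julia sort.jl
echo Running parallel script on $SLURM_CPUS_PER_TASK cores
julia -t $SLURM_CPUS_PER_TASK psort.jl
EOF
```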

Now, we can submit our jobs to the cluster:

$ cd ~/scratch/julia_job
$ sbatch job_julia1c.sh
$ sbatch job_julia2c.sh
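Since %j in the job script is replaced by the job ID, it can help to capture that ID at submission time. The sketch below relies on the sbatch --parsable option, which prints only the job ID (followed by ;cluster on multi-cluster setups). The outfile_for helper is hypothetical and simply mirrors the --output pattern of the single-core script.

```shell
# Hypothetical helper: file name Slurm uses for --output=julia1c%j.out
outfile_for() {
  printf 'julia1c%s.out\n' "$1"
}

# Only submit when sbatch is available (i.e. on the cluster)
if command -v sbatch >/dev/null 2>&1 && [ -f job_julia1c.sh ]; then
  # --parsable prints "jobid" or "jobid;cluster"; keep the part before ";"
  jobid=$(sbatch --parsable job_julia1c.sh | cut -d ';' -f 1)
  echo "Submitted job $jobid, output in $(outfile_for "$jobid")"
fi
```

Once the job has finished, cat "$(outfile_for "$jobid")" shows what it printed.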

And we can check their status with:

$ sq      # This is an Alliance alias for `squeue -u $USER $@`

In the ST (state) column, PD stands for pending and R stands for running.
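Besides PD and R, squeue uses a few other compact state codes. The hypothetical state_name function below translates some common ones, based on the job state codes documented for squeue. The list is not exhaustive. Also note that finished jobs soon disappear from squeue; sacct reports them afterwards.

```shell
# Hypothetical helper: translate compact squeue job state codes into words
state_name() {
  case "$1" in
    PD) echo "pending" ;;
    R)  echo "running" ;;
    CG) echo "completing" ;;
    CD) echo "completed" ;;
    F)  echo "failed" ;;
    CA) echo "cancelled" ;;
    TO) echo "timeout" ;;
    *)  echo "unknown state: $1" ;;
  esac
}
```

For example, state_name PD prints pending.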