Running JAX

Author

Marie-Hélène Burle

This sections covers important considerations on how to use resources efficiently when training neural networks and instructions on how to run code during this course.

Resource efficient workflow

Code prototyping

Python being an interpreted language, it makes sense to prototype code in an interactive fashion. There are many options of this, including:

launching the Python REPL (which finally saw a little refresh with Python 3.13 but still remains very austere)
using the much more powerful IPython shell,
using the even more powerful ptpython (prompt toolkit) shell,
using the previous two combined (ptpython integrates with IPython thanks to its ptipython executable),
using Emacs as a Python IDE,
using JupyterLab.

Training a model requires a lot of resources. You might need multiple GPUs or entire nodes. It would be silly to have so much resource sit idle for hours while you are typing in a Jupyter notebook or thinking about your code.

The answer is to prototype code in an interactive environment at a very small scale (e.g. on a tiny subsample of data) until you have a program (a script) that works.

Scaling things up

Once you are confident that your code is good, you can scale it up on more hardware (not all at once, do multiple tests at increasingly larger scale so that you don’t wait for three weeks for results that don’t work).

For this part, it is best to SSH into a cluster and launch a batch Slurm job.

One of the great things about JAX is that the same code runs on any device so you can test the code on your machine on CPUs and then run it as is on a clusters on GPU.

Our workflow for this course

We will mostly use a JupyterHub during this course to play with snippets of code. When we get to really trying to train a model, we will log in to our training cluster via SSH and submit jobs to the Slurm scheduler.

Accessing our temporary JupyterHub

Go to the etherpad shared during the course to claim a username,
go to the URL of the JupyterHub for this course,
sign in with the username you claimed and the password we gave you,
leave OTP blank,
the server options are good as they are unless you want to bump the time a little (e.g. to 1.5h),
press start,
start a Python notebook: click on the button “Python 3” in the “Notebook” section (top row of buttons).

The packages for this course are already installed in the JupyterHub.

If you don’t need all the time you asked for after all, you should log out (the resources you are using on a cluster are shared amongst many people and when resources are allocated to you, they aren’t available to other people. So it is a good thing not to ask for unnecessary resources and have them sit idle when others could be using them).

To log out, click on “File” in the top menu and select “Log out” at the very bottom.

If you would like to make a change to the information you entered on the server option page after you have pressed “start”, log out in the same way, log back in, edit the server options, and press start again.

Note that this JupyterHub will be destroyed at the end of the course.

Logging in through SSH

Open a terminal emulator

Windows users: Install the free version of MobaXTerm and launch it.
macOS users: Launch Terminal.
Linux users: Open the terminal emulator of your choice.

Access the cluster through secure shell

Windows users

Follow the first 18% of this demo.

For “Remote host”, use the hostname we gave you.
Select the box “Specify username” and provide your username.

Note that the password is entered through blind typing, meaning that you will not see anything happening as you type it. This is a Linux feature. While it is a little disturbing at first, do know that it is working. Make sure to type it slowly to avoid typos, then press the “enter” key on your keyboard.

macOS and Linux users

In the terminal, run:

ssh <username>@<hostname>

Replace the username and hostname by their values.
For instance:

ssh user21@somecluster.c3.ca

You will be asked a question, answer “Yes”.

When prompted, type the password.

Troubleshooting

Problems logging in are almost always due to typos. If you cannot log in, retry slowly, entering your password carefully.