Installation and setup

Author

Marie-Hélène Burle

This section covers the installation of Flax, CLU (training loop helpers), Optax (optimizers and loss functions), Orbax (checkpointing), and dataset libraries on your computer and the Alliance clusters.

For this course, we will use a training cluster.

On your computer

Stable versions:

# flax: NN library
# clu: training loop helpers
# optax: optimizers & loss functions
# orbax-checkpoint: checkpointing
python -m pip install flax clu optax orbax-checkpoint

# Then install only the library you want to use to load datasets
# (datasets, torchvision, or tensorflow-datasets), for instance:
python -m pip install datasets

Latest versions:

python -m pip install git+https://github.com/google/flax.git \
    git+https://github.com/google/CommonLoopUtils \
    git+https://github.com/google-deepmind/optax.git \
    'git+https://github.com/google/orbax/#subdirectory=checkpoint'

# Then install only the library you want to use to load datasets
# (datasets, torchvision, or tfds-nightly), for instance:
python -m pip install datasets

The CPU version of JAX will get installed as a Flax dependency. If you need a different version (for instance, one with GPU support), install it separately.
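To confirm that the installation worked, a quick check along the following lines can help. This is a minimal sketch rather than part of the course material: the helper name `installed_versions` is made up for illustration, and the script relies only on the standard library module `importlib.metadata` to look up the distribution names used in the commands above.

```python
from importlib import metadata

# Distribution names as used in the pip commands above,
# plus jax and jaxlib, which come in as dependencies.
PACKAGES = ["flax", "clu", "optax", "orbax-checkpoint", "jax", "jaxlib"]


def installed_versions(packages):
    """Return {name: version string, or None if the package is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name:20s} {version or 'not installed'}")
```

If JAX is installed, running `python -c "import jax; print(jax.default_backend())"` should report `cpu` for the CPU version.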

On an Alliance cluster

Logging in through SSH

Open a terminal emulator

Windows users: Install the free version of MobaXterm and launch it.
macOS users: Launch Terminal.
Linux users: Open the terminal emulator of your choice.

Access the cluster through secure shell

Windows users

Follow the first 18% of this demo.

For “Remote host”, use the hostname we gave you.
Select the box “Specify username” and provide your username.

Note that the password is entered through blind typing: nothing appears on the screen as you type. This is a Linux feature. It is a little disconcerting at first, but it is working. Type slowly to avoid typos, then press the “Enter” key on your keyboard.

macOS and Linux users

In the terminal, run:

ssh <username>@<hostname>

Replace <username> and <hostname> with their actual values.
For instance:

ssh user21@somecluster.c3.ca

The first time you connect, you will be asked whether you want to continue connecting. Answer “yes”.

When prompted, type the password.

Note that the password is entered through blind typing: nothing appears on the screen as you type. This is a Linux feature. It is a little disconcerting at first, but it is working. Type slowly to avoid typos, then press the “Enter” key on your keyboard.

Troubleshooting

Problems logging in are almost always due to typos. If you cannot log in, retry slowly, entering your password carefully.

Install Flax

We already created a Python virtual environment and installed Flax in it to save time. The instructions for today thus differ from what you would normally do. Today's instructions come first, and the normal instructions follow for your future reference.

For today, the virtual environment lives under /project, which saves both time and space. All you have to do is activate it:

source /project/60055/env/bin/activate
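To check that the activation took effect, you can ask Python whether it is running inside a virtual environment. The following is a minimal sketch using only the standard library. The function name `in_virtualenv` is hypothetical, and the check applies to environments created with `python -m venv`.

```python
import sys


def in_virtualenv():
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment, while
    # sys.base_prefix points at the Python installation it was created from.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("interpreter prefix:", sys.prefix)
```

When the environment is active, the reported prefix should be the directory of that environment.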

Normally, you would set up your own environment as follows. First, look for available Python modules:

module spider python

Load the version of your choice:

module load python/3.11.5

Create a Python virtual environment:

python -m venv ~/env

Activate it:

source ~/env/bin/activate

Update pip from wheel:

python -m pip install --upgrade pip --no-index

Whenever a Python wheel for a package is available on the Alliance clusters, you should use it instead of downloading the package from PyPI. To do this, simply add the --no-index flag to the install command.

You can see whether a wheel is available with avail_wheels <package> or look at the list of available wheels.

Advantages of wheels:

  • they are compiled for the clusters' hardware,
  • they ensure that no dependencies are missing or conflicting,
  • they install much faster.

Install libraries from wheel:

# --no-index: install from wheels
# flax: NN library
# clu: training loop helpers
# optax: optimizers & loss functions
# orbax-checkpoint: checkpointing
python -m pip install --no-index flax clu optax orbax-checkpoint

# Then install only the library you want to use to load datasets
# (datasets, torchvision, or tensorflow-datasets), for instance:
python -m pip install --no-index datasets

Don’t forget the --no-index flag here: the wheels will save you from having to deal with the CUDA and cuDNN dependencies, making your life a lot easier.

Don’t mindlessly install all the dataset libraries: it is pointless to clutter your virtual environment with things you don’t need and, because they rely on similar dependencies, it can lead to dependency conflicts.
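If you are unsure which dataset libraries already sit in an environment, a small check like the one below can tell you without importing them. It is only a sketch: the function name `installed_dataset_libraries` and the mapping are made up for illustration, and the import names (`datasets`, `torchvision`, `tensorflow_datasets`) are assumed to be the modules provided by the three pip packages mentioned above.

```python
from importlib.util import find_spec

# pip name -> import name for the three dataset libraries mentioned above.
DATASET_LIBRARIES = {
    "datasets": "datasets",
    "torchvision": "torchvision",
    "tensorflow-datasets": "tensorflow_datasets",
}


def installed_dataset_libraries(candidates=DATASET_LIBRARIES):
    """Return the pip names of the candidate libraries that can be imported."""
    return [
        pip_name
        for pip_name, module in candidates.items()
        if find_spec(module) is not None
    ]


if __name__ == "__main__":
    found = installed_dataset_libraries()
    print("dataset libraries found:", ", ".join(found) or "none")
    if len(found) > 1:
        print("more than one is installed; consider keeping only the one you use")
```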