Installation and setup
This section covers the installation of Flax, CLU (training loop helpers), Optax (optimizers and loss functions), Orbax (checkpointing), and dataset libraries on your computer and the Alliance clusters.
For this course, we will use a training cluster.
On your computer
Stable versions:
# flax: NN library; clu: training loop helpers
# optax: optimizers & loss functions; orbax-checkpoint: checkpointing
python -m pip install flax clu optax orbax-checkpoint
# Only install the library you want to use to load datasets, e.g.:
python -m pip install datasets  # or torchvision, or tensorflow-datasets
Latest versions:
# Development versions of Flax, CLU, Optax, and Orbax, straight from GitHub
python -m pip install \
    git+https://github.com/google/flax.git \
    git+https://github.com/google/CommonLoopUtils \
    git+https://github.com/google-deepmind/optax.git \
    'git+https://github.com/google/orbax/#subdirectory=checkpoint'
# Only install the library you want to use to load datasets, e.g.:
python -m pip install datasets  # or torchvision, or tfds-nightly
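The CPU version of JAX gets installed as a Flax dependency. If you need a different JAX build (for instance with GPU support), install it separately, e.g. python -m pip install -U "jax[cuda12]" for NVIDIA GPUs.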
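To check that the installation worked and to see which backend JAX picked up, you could run something like the sketch below. The check_imports function is a hypothetical helper (not part of any of these libraries): it simply tries to import each module with the python3 on your PATH. jax.default_backend() returns "cpu" for the default CPU-only install.

```bash
#!/usr/bin/env bash
# Hypothetical helper: try to import each module with the python3 on PATH,
# print OK or MISSING for each one, and return 1 if any import failed.
check_imports() {
  local status=0 mod
  for mod in "$@"; do
    if python3 -c "import $mod" 2>/dev/null; then
      echo "OK       $mod"
    else
      echo "MISSING  $mod"
      status=1
    fi
  done
  return "$status"
}

# Note: the pip package orbax-checkpoint is imported as orbax.checkpoint
if check_imports jax flax optax orbax.checkpoint clu; then
  # Prints "cpu" for the CPU-only JAX that Flax pulls in by default
  python3 -c 'import jax; print("JAX backend:", jax.default_backend())'
else
  echo "Some libraries are missing from the active environment." >&2
fi
```

Inside an activated virtual environment, python and python3 point to the same interpreter, so this checks the same environment the install commands used.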
On an Alliance cluster
Logging in through SSH
Open a terminal emulator
Windows users: Install the free version of MobaXterm and launch it.
macOS users: Launch Terminal.
Linux users: Open the terminal emulator of your choice.
Access the cluster through secure shell
Windows users
Follow the first 18% of this demo.
For “Remote host”, use the hostname we gave you.
Select the box “Specify username” and provide your username.
Note that the password is entered through blind typing: nothing appears on screen as you type it. This is normal on Linux. It can be a little disturbing at first, but your input is being registered. Type slowly to avoid typos, then press the “enter” key on your keyboard.
macOS and Linux users
In the terminal, run:
ssh <username>@<hostname>
Replace <username> and <hostname> with your own values.
For instance:
ssh user21@somecluster.c3.ca
The first time you connect, SSH asks whether you want to continue connecting to this host: answer “yes”.
When prompted, type the password.
Note that the password is entered through blind typing: nothing appears on screen as you type it. This is normal on Linux. It can be a little disturbing at first, but your input is being registered. Type slowly to avoid typos, then press the “enter” key on your keyboard.
Troubleshooting
Problems logging in are almost always due to typos. If you cannot log in, retry slowly, entering your password carefully.
Install Flax
To save time and space, we already created a Python virtual environment under /project and installed Flax in it. The instructions for today thus differ from what you would normally do; the normal instructions are included for your future reference.
For today, all you have to do is activate the existing environment:
source /project/60055/env/bin/activate
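To confirm that the activation worked, one option is to compare Python's sys.prefix (the environment in use) with sys.base_prefix (the base installation): they only differ inside a virtual environment. A minimal sketch; the venv_active function name is our own, not a standard command.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit status 0) if the python3 on PATH runs
# inside a virtual environment. In a venv, sys.prefix points to the
# environment while sys.base_prefix points to the base Python installation.
venv_active() {
  python3 -c 'import sys; sys.exit(0 if sys.prefix != sys.base_prefix else 1)'
}

if venv_active; then
  echo "Active environment: $(python3 -c 'import sys; print(sys.prefix)')"
else
  echo "No virtual environment is active: run the source command first." >&2
fi
```

The activate script also sets the VIRTUAL_ENV variable, so echo "$VIRTUAL_ENV" is an even quicker check.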
Normal instructions (for future reference)
Look for available Python modules:
module spider python
Load the version of your choice:
module load python/3.11.5
Create a Python virtual environment:
python -m venv ~/env
Activate it:
source ~/env/bin/activate
Update pip from wheel:
python -m pip install --upgrade pip --no-index
Whenever a Python wheel for a package is available on the Alliance clusters, you should use it instead of downloading the package from PyPI. To do this, simply add the --no-index flag to the install command.
You can see whether a wheel is available with avail_wheels <package> or look at the list of available wheels.
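For instance, to check whether the libraries used here have wheels, you could loop over them with avail_wheels. This is a sketch: avail_wheels only exists on the Alliance clusters, so the hypothetical check_wheels function below prints a message and fails elsewhere. Wheel file names use underscores, so if a name such as orbax-checkpoint is not found, try the orbax_checkpoint spelling.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list the available wheels for each package given
# as an argument. Only works where avail_wheels exists (Alliance clusters).
check_wheels() {
  if ! command -v avail_wheels >/dev/null 2>&1; then
    echo "avail_wheels not found: run this on an Alliance cluster." >&2
    return 1
  fi
  local pkg
  for pkg in "$@"; do
    avail_wheels "$pkg"
  done
}

if ! check_wheels flax clu optax orbax-checkpoint; then
  echo "Could not list the wheels." >&2
fi
```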
Advantages of wheels:
- compiled for the clusters’ hardware,
- no missing or conflicting dependencies,
- much faster installation.
Install the libraries from wheels:
# flax: NN library; clu: training loop helpers
# optax: optimizers & loss functions; orbax-checkpoint: checkpointing
# --no-index: install from the cluster's wheels instead of PyPI
python -m pip install --no-index flax clu optax orbax-checkpoint
# Only install the library you want to use to load datasets, e.g.:
python -m pip install --no-index datasets  # or torchvision, or tensorflow-datasets
Don’t forget the --no-index flag here: the wheels will save you from having to deal with the CUDA and cuDNN dependencies, making your life a lot easier.
Don’t mindlessly install all the dataset libraries: it is pointless to clutter your virtual environment with packages you don’t need and, because these libraries rely on similar dependencies, it can lead to dependency conflicts.
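After installing the one dataset library you need, you can ask pip to look for conflicts: python -m pip check reports installed packages whose requirements are missing or incompatible. A minimal sketch, assuming you picked datasets (the install line is left as a comment); check_deps is a hypothetical wrapper name.

```bash
#!/usr/bin/env bash
# Install just one dataset library from the cluster wheels, for example:
#   python -m pip install --no-index datasets
# Hypothetical wrapper: run pip's dependency check on the active environment.
check_deps() {
  if python3 -m pip check; then
    echo "No dependency conflicts detected."
  else
    echo "pip found missing or conflicting dependencies (listed above)." >&2
    return 1
  fi
}

check_deps || echo "Consider removing dataset libraries you do not use." >&2
```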