Getting the data
To train models on the Alliance clusters with your own dataset, you first need to transfer it to the cluster.
In this section, I discuss several options.
Download data
The first step is to download the data.
If you can use curl or wget to do so, download it directly to the cluster. That is the best and simplest option.
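As a minimal sketch, assuming a dataset were available at a direct URL, a resumable download run on the cluster could look like the following. The function name fetch_dataset and the URL in the usage comment are placeholders, not part of any real data source.

```shell
#!/bin/sh
# Hypothetical helper: download a file from a direct URL, resuming if interrupted.
# Uses wget when it is available and falls back to curl otherwise.
fetch_dataset() {
    url=$1
    if command -v wget >/dev/null 2>&1; then
        wget -c "$url"            # -c: continue a partially downloaded file
    else
        curl -L -O -C - "$url"    # -L: follow redirects, -O: keep remote name, -C -: resume
    fi
}

# Usage (placeholder URL, replace with your own data source):
# fetch_dataset https://example.org/path/to/dataset.tar.gz
```

Resuming matters for large datasets: if the connection drops, rerunning the same command picks up where the download stopped instead of starting over.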
Unfortunately, the data source for our example does not provide a direct URL. Instead, you have to log in before being sent to Dropbox. The only option is therefore to download the data to your own machine first, then transfer it to the cluster.
Transfer data to clusters
Globus
Globus is by far the best method in this case. Our wiki explains how to use it on our clusters.
All clusters have a Globus collection name, listed in the table at the top of their respective wiki page.
Example:
The wiki page for the Fir cluster lists its Globus collection name and provides a direct link: alliancecan#fir-globus.
You also need to install Globus Connect Personal on your computer to create an endpoint for your machine.
During transfers, the endpoint needs to be active. On Linux, if you install the command-line version of Globus Connect Personal, the command is:
globusconnect -start
Once the transfer is complete, you receive an email similar to this one:
TASK DETAILS
Task ID: ce0c341d-d247-11f0-8471-0ed8a6a59ea5
Task Type: TRANSFER
Status: SUCCEEDED
Source: xxxxxxxxxxxxxxxxxxxxxxx
Destination: computecanada#cedar-globus & alliancecan#fir-globus
Label: n/a
Request Time: 2025-12-06 02:03:49.999679 (UTC)
Completion Time: 2025-12-06 02:15:44.150358 (UTC)
Files Transfered: 48562
Directories Transfered: 556
Bytes Transfered: 9950162482
Effective Speed: 13932861 Bytes per Second
Transfer Settings:
- verify file integrity after transfer
- transfer is not encrypted
- overwriting all files on destination
Alternative methods
If neither a direct download nor Globus is possible, there are other methods, although you might run into problems with large datasets.
A trick that will make any of the following commands involving the path to a remote cluster easier is to create an SSH config file as ~/.ssh/config.
~/.ssh/config
Host <name-of-your-choice>
    Hostname <hostname-address>
    User <username>
Many additional options can be added to this file to enable agent forwarding, persistent logins, etc.
Once you have created this file, you can use the <name-of-your-choice> in lieu of <username>@<hostname-address>. This makes commands a lot easier to type.
Example:
~/.ssh/config
Host fir
    Hostname fir.alliancecan.ca
    User jdoe
Now, you can replace jdoe@fir.alliancecan.ca with fir in all commands.
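As an illustration of the additional options mentioned earlier, the sketch below writes a sample configuration to a scratch file rather than to ~/.ssh/config, so nothing existing is overwritten. The host alias, username, and ControlPath pattern are placeholders. ForwardAgent, ControlMaster, ControlPath, and ControlPersist are standard OpenSSH client options, but whether they suit your setup is an assumption to check against your cluster's documentation.

```shell
#!/bin/sh
# Write a sample SSH client configuration to a scratch file (not ~/.ssh/config)
sample_config=$(mktemp)

cat > "$sample_config" <<'EOF'
Host fir
    Hostname fir.alliancecan.ca
    User jdoe
    # Forward your local SSH agent to the cluster
    ForwardAgent yes
    # Reuse one authenticated connection for later ssh/scp/sftp commands
    ControlMaster auto
    ControlPath ~/.ssh/cm-%r@%h:%p
    # Keep the shared connection open for 10 minutes after the last session
    ControlPersist 10m
EOF

echo "Sample configuration written to $sample_config"
# Once reviewed, copy the relevant lines into ~/.ssh/config
```

With a shared connection in place, subsequent scp or sftp commands to the same host reuse the existing login instead of authenticating again.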
Remote copies with scp
The Secure Copy Protocol (SCP) lets you copy files over the Secure Shell Protocol (SSH) with the scp utility. scp follows a syntax similar to that of the cp command.
Note that you need to run scp from your local machine (not from the cluster, as your firewall would block it).
Create a compressed archive
Before moving a dataset this way, create a compressed tar archive.
Example:
tar -zcvf nabirds.tar.gz nabirds
Copy from your machine
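Before copying, it can be worth checking that the archive is readable and recording a checksum to compare after the transfer. The sketch below builds a tiny stand-in dataset in a temporary directory so that it runs anywhere. The directory and file names are placeholders, and sha256sum is assumed to be available (it is part of GNU coreutils on Linux).

```shell
#!/bin/sh
set -e
workdir=$(mktemp -d)
cd "$workdir"

# Stand-in for a real dataset directory
mkdir -p demo_dataset/images
printf 'label,file\n' > demo_dataset/labels.csv
printf 'fake image bytes\n' > demo_dataset/images/0001.jpg

# Create the compressed archive (same flags as the tar command above, minus -v)
tar -zcf demo_dataset.tar.gz demo_dataset

# List the archive content without extracting it; tar fails on a corrupt archive
n_entries=$(tar -tzf demo_dataset.tar.gz | wc -l)
echo "Archive entries: $n_entries"

# Record a checksum next to the archive
sha256sum demo_dataset.tar.gz > demo_dataset.tar.gz.sha256

# After the transfer, run this in the directory that contains both files
sha256sum -c demo_dataset.tar.gz.sha256
```

If you copy the .sha256 file along with the archive, the last command can be run on the cluster to confirm that the archive arrived intact.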
# Copy a local file to your home directory on the cluster
scp /local/path/file <username>@<hostname-address>:
Example:
scp ~/data/nabirds.tar.gz jdoe@fir.alliancecan.ca:
Or, if John Doe has an SSH config file on their machine:
scp ~/data/nabirds.tar.gz fir:
# Copy a local file to some path on the cluster
scp /local/path/file <username>@<hostname-address>:/remote/path
Copy from the cluster
If you need to get your transformed dataset back, you can scp a tar archive back with (still from your machine):
# Copy a file from the cluster to some path on your machine
scp <username>@<hostname-address>:/remote/path/file /local/path
# Copy a file from the cluster to your current location on your machine
scp <username>@<hostname-address>:/remote/path/file .
You can also use wildcards to transfer multiple files:
# Copy all the Bash scripts from your cluster home dir to some local path
# (the remote path is quoted so that your local shell does not try to expand the wildcard)
scp "<username>@<hostname-address>:*.sh" /local/path
Copying directories
To copy a directory, you need to add the -r (recursive) flag:
scp -r /local/path/folder <username>@<hostname-address>:/remote/path
Copying for Windows users
MobaXterm users (on Windows) can copy files by dragging them between the local and remote machines in the GUI. Alternatively, they can use the download and upload buttons.
Uncompress archive
After the transfer, you can uncompress your archive with:
tar -xvzf nabirds.tar.gz
Interactive transfers with sftp
The Secure File Transfer Protocol (SFTP) is more sophisticated and allows additional operations. The sftp command provided by OpenSSH and other packages launches an SFTP client:
sftp <username>@<hostname-address>
Look at your prompt: your usual Bash/Zsh prompt has been replaced with sftp>.
From this prompt, you can access a number of SFTP commands. Type help for a list:
sftp> help
Available commands:
bye                           Quit sftp
cd path                       Change remote directory to 'path'
chgrp [-h] grp path           Change group of file 'path' to 'grp'
chmod [-h] mode path          Change permissions of file 'path' to 'mode'
chown [-h] own path           Change owner of file 'path' to 'own'
copy oldpath newpath          Copy remote file
cp oldpath newpath            Copy remote file
df [-hi] [path]               Display statistics for current directory or
                              filesystem containing 'path'
exit                          Quit sftp
get [-afpR] remote [local]    Download file
help                          Display this help text
lcd path                      Change local directory to 'path'
lls [ls-options [path]]       Display local directory listing
lmkdir path                   Create local directory
ln [-s] oldpath newpath       Link remote file (-s for symlink)
lpwd                          Print local working directory
ls [-1afhlnrSt] [path]        Display remote directory listing
lumask umask                  Set local umask to 'umask'
mkdir path                    Create remote directory
progress                      Toggle display of progress meter
put [-afpR] local [remote]    Upload file
pwd                           Display remote working directory
quit                          Quit sftp
reget [-fpR] remote [local]   Resume download file
rename oldpath newpath        Rename remote file
reput [-fpR] local [remote]   Resume upload file
rm path                       Delete remote file
rmdir path                    Remove remote directory
symlink oldpath newpath       Symlink remote file
version                       Show SFTP version
!command                      Execute 'command' in local shell
!                             Escape to local shell
?                             Synonym for help
As this list shows, you have access to a number of classic Unix commands such as cd, pwd, ls, etc. These commands will be executed on the remote machine.
In addition, there are a number of commands of the form l<command>. “l” stands for “local”.
These commands will be executed on your local machine.
For instance, ls will list the files in your current directory on the remote machine while lls (“local ls”) will list the files in your current directory on your computer.
This means that you are now able to navigate two file systems at once: your local machine and the remote machine.
Here are a few examples:
sftp> pwd # print remote working directory
sftp> lpwd # print local working directory
sftp> ls # list files in remote working directory
sftp> lls # list files in local working directory
sftp> cd # change the remote directory
sftp> lcd # change the local directory
sftp> put local_file # upload a file
sftp> get remote_file # download a file
Copying directories
To upload/download directories, you first need to create them in the destination, then copy the content with the -r (recursive) flag.
If you have a local directory called dir and you want to copy it to the cluster you need to run:
sftp> mkdir dir # First create the directory
sftp> put -r dir # Then copy the content
To terminate the session, press <Ctrl+D>.
Syncing
If, rather than copying the dataset once, you want to keep a directory in sync between your machine and the cluster, consider using rsync instead. See the Alliance wiki page on rsync for complete instructions.
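As a rough sketch of what such a synchronization could look like, the hypothetical helper below wraps rsync with a few commonly used flags. The function name and the paths in the usage comment are placeholders, and the Alliance wiki may recommend additional flags for specific filesystems.

```shell
#!/bin/sh
# Hypothetical helper: mirror a source directory into a destination with rsync.
# A trailing slash on the source means "the content of the directory".
sync_dataset() {
    src=$1
    dest=$2
    # -a: archive mode (recursive, preserves times, permissions, symlinks)
    # -v: verbose    -z: compress data during the transfer
    # --partial: keep partially transferred files so a rerun can resume
    rsync -avz --partial "$src" "$dest"
}

# Usage from your machine, with the "fir" alias defined in ~/.ssh/config:
# sync_dataset ~/data/nabirds/ fir:nabirds/
# Add -n (dry run) to the rsync call to preview changes without copying.
```

Unlike a one-off scp, rerunning the same command later only transfers the files that changed since the previous run.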
Windows line endings
On modern Mac operating systems and on Linux, lines in files are terminated with a newline (\n). On Windows, they are terminated with a carriage return + newline (\r\n).
When you transfer files between Windows and Linux (the cluster uses Linux), this creates a mismatch. Most modern software handles this correctly, but you may occasionally run into problems.
The solution is to convert a file from Windows line endings to Unix line endings with:
dos2unix file
To convert a file back to Windows line endings, run:
unix2dos file
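To check whether a file actually contains Windows line endings before converting it, one option is to search for carriage-return characters. The sketch below uses a hypothetical helper named has_crlf and two throwaway files created with printf; the file names are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the file contains at least one carriage return
has_crlf() {
    cr=$(printf '\r')
    grep -q "$cr" "$1"
}

tmpdir=$(mktemp -d)
printf 'line one\r\nline two\r\n' > "$tmpdir/windows.txt"   # CRLF line endings
printf 'line one\nline two\n' > "$tmpdir/unix.txt"          # LF line endings

for f in "$tmpdir/windows.txt" "$tmpdir/unix.txt"; do
    if has_crlf "$f"; then
        echo "$f: Windows line endings, consider running dos2unix"
    else
        echo "$f: Unix line endings"
    fi
done
```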