Automation & scripting in Bash for beginners

Author

Marie-Hélène Burle

This workshop will demystify the command line and get you started using Bash and Bash scripting.

Warning: you might find that working in the command line is actually really fun and addictive!

Background

What are Unix shells?

A Unix shell is a command line interpreter: the user enters commands as text, either interactively in the command line or in a script, and the shell passes them to the operating system.

Bash

Bash (Bourne Again SHell), released in 1989, is part of the GNU Project and is the default Unix shell on many systems (MacOS recently changed its default to zsh).

Other shells

Prior to Bash, the default was the Bourne shell (sh).

Another popular shell is zsh. It is largely, though not fully, compatible with Bash and extends its capabilities.

Another shell in the same family is the KornShell (ksh).

All these shells are quite similar. The C shell (csh), however, has a syntax modeled on the C programming language.

Bash is the most common shell and the one which makes the most sense to learn as a first Unix shell.

Why use a shell?

While automating GUI operations is really difficult, it is easy to rerun a script (a file with a number of commands). Unix shells thus allow the creation of reproducible workflows and the automation of repetitive tasks.

They are powerful tools for launching programs, modifying files, searching text, and combining commands.

They also make it possible to work on remote machines and HPC systems.

How we will use Bash today

Bash is a Unix shell. You thus need a Unix or Unix-like operating system.

We will connect to a remote HPC system via SSH (secure shell). HPC systems almost always run Linux.

Those on Linux or MacOS can alternatively use Bash directly on their machine. On MacOS, the default is now zsh (you can check by typing echo $SHELL in Terminal). zsh is not identical to Bash, but it handles the commands covered in this workshop in much the same way, so it is fine to use it instead. If you really want to use Bash, simply launch it by typing bash in Terminal.

Connecting to a remote HPC system via SSH

Usernames and password

We will give you a link to an etherpad during the workshop. Add your name next to a free username to claim it.

We will also give you the password for our training cluster. When prompted, enter it.

Note that you will not see any characters as you type the password: this is sometimes called blind typing, and it is a security feature of the password prompt rather than a malfunction. Type slowly and make sure not to make typos. It can be unsettling at first not to get any feedback while typing.

Linux and MacOS users

Linux users: open the terminal emulator of your choice.
MacOS users: open “Terminal”.

Then type:

ssh userxx@bashworkshop.c3.ca  # Replace userxx with your username (e.g. user09)

Windows users

We suggest using the free version of MobaXterm.

MobaXterm comes with a terminal emulator and a graphical interface for SSH sessions.

Open MobaXterm, click on “Session”, then “SSH”, and fill in the Remote host name and your username. Here is a live demo.

Bash: the basics

The prompt

In command-line interfaces, a command prompt is a sequence of characters indicating that the interpreter is ready to accept input. It can also provide some information (e.g. time, error types, username and hostname).

The Bash prompt is customizable. By default, it often gives the username and the hostname, and it typically ends with $.

Help on commands

Man pages:

man <command>

Man pages open in a pager (usually less).
Move down one page with the space bar and back up one page with the b key.
Quit the pager with the q key.

Help pages:

<command> --help

Inspect commands:

command -V <command>
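As a quick illustration of what command -V reports, the sketch below inspects cd, a command that is built into the shell itself. The variable name description is only chosen for this example.

```shell
# Ask the shell what kind of command cd is and keep the answer in a variable
description=$(command -V cd)

# Print the answer
echo "$description"
```

In Bash this reports that cd is a shell builtin, which explains why there is no separate cd program to look for on disk.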

Examples of commands

  • Print working directory: pwd
  • Change directory: cd
  • Print: echo
  • Print content of a file: cat
  • List: ls
  • Copy: cp
  • Move or rename: mv
  • Create a new directory: mkdir
  • Create a new file: touch
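The commands listed above can be chained into a short session. The following is a minimal sketch that works in a throwaway directory created with mktemp; the directory and file names (project, notes.txt, old_notes.txt) are made up for the example.

```shell
# Work in a temporary directory so that nothing important is touched
workdir=$(mktemp -d)
cd "$workdir"

mkdir project                    # Create a new directory
cd project                       # Move into it
pwd                              # Print the working directory
touch notes.txt                  # Create an empty file
echo "first line" > notes.txt    # Write text into the file (> redirects the output of echo)
cp notes.txt backup.txt          # Copy the file
mv backup.txt old_notes.txt      # Rename the copy
ls                               # List the files: notes.txt and old_notes.txt
cat old_notes.txt                # Print the content of the renamed copy: first line
```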

Keybindings

Clear the terminal (command clear) with C-l (this means: press the Ctrl and L keys at the same time).

Navigate command history with C-p and C-n (or up and down arrows).

You can auto-complete commands by pressing the tab key.

Bash scripting: the basics

Instead of typing commands one at a time directly in a terminal, you can write them down, one per line, in a text file called a script.

They will be run in the order in which they are written when you execute the script.

This is a great way to automate tasks: to rerun this sequence of commands, you simply have to rerun the script.

File name

Shell scripts, including Bash scripts, are usually given the extension sh (e.g. my_script.sh).

You can store scripts anywhere, but a common practice is to store them in a ~/bin directory.

Syntax

Shebang

Scripts can be written for any interpreter (e.g. Bash, Python, R). To tell the system which one to use, put a shebang (#!) followed by the path of the interpreter on the first line of the script.

To use Bash, start your scripts with:

#!/bin/bash

You may also encounter this notation:

#!/usr/bin/env bash

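This form looks for bash in the directories listed in your PATH rather than assuming it is installed at /bin/bash, which makes the script more portable. If you are curious, you can read the answers to this Stack Overflow question for the differences between the two.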

Comments

Anything to the right of # on a line is ignored by the interpreter and is for human consumption only.

# You can write full-line comments

pwd       # You can also write comments after commands

Executing scripts

There are two ways to execute a script:

bash my_script.sh
./my_script.sh  # The dot represents the current directory

In the latter case, you need to make sure that your script is executable by first running:

chmod u+x my_script.sh  # This makes the script executable by the user (i.e. you)
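To see both methods side by side, here is a small sketch that builds a script in a temporary directory and runs it each way. The file name hello.sh and its message are placeholders for this example.

```shell
# Create a two-line script in a temporary directory
workdir=$(mktemp -d)
cd "$workdir"
echo '#!/bin/bash' > hello.sh                       # > creates the file with this first line
echo 'echo "Hello from a script"' >> hello.sh       # >> appends a second line

bash hello.sh          # Method 1: works even without execute permission

chmod u+x hello.sh     # Give the user execute permission
./hello.sh             # Method 2: now works as well
```

With the first method, Bash reads the file as input, so the file does not need to be executable. With the second, the system runs the file itself and uses the shebang to pick the interpreter.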

Our first script

Open a text editor (e.g. nano) and type:

#!/bin/bash

echo "This is our first script."

Save and close the file.

Your turn:

Now run the script with one, then the other method.
What does this script do?

Variables

Declaring variables

You can declare a variable (i.e. a name that holds a value) with the = sign.

!! Make sure not to put spaces around the equal sign.

variable=Test
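As a short sketch of why the spaces matter (the variable names workshop and year are arbitrary):

```shell
workshop=Bash     # Correct: no spaces around =
year=1989         # The value is stored as text, even when it looks like a number

echo "$workshop was released in $year"    # prints: Bash was released in 1989

# workshop = Bash
# The line above is commented out because it is wrong: with spaces,
# Bash would try to run a command called "workshop" with two arguments.
```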

Quotes

Let’s experiment with quotes:

variable=This string is the value of the variable
echo $variable
bash: line 1: string: command not found

Oops…

variable="This string is the value of the variable"
echo $variable
This string is the value of the variable
variable='This string is the value of the variable'
echo $variable
This string is the value of the variable
variable='This string's the value of the variable'
echo $variable
bash: -c: line 1: unexpected EOF while looking for matching `''

Oops…

One solution to this is to use double quotes:

variable="This string's the value of the variable"
echo $variable
This string's the value of the variable

Alternatively, single quotes can be escaped:

variable='This string'"'"'s the value of the variable'
echo $variable
This string's the value of the variable

Admittedly, this last one is a little crazy. It is one way to include a single quote in a single-quoted string.

In the sequence '"'"', the first ' ends the first single-quoted string, the pair of " encloses a literal ', and the last ' starts the second single-quoted string. Bash then joins the three adjacent pieces into a single value.

Escaping double quotes is a lot easier and simply requires \".
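Here is a brief sketch of both kinds of escaping. The variable names and the first sentence are invented for the example; the second string is the one used above.

```shell
# A backslash before " keeps the double quote as part of the string
quoted="She said \"hello\" to the class"
echo "$quoted"        # prints: She said "hello" to the class

# A backslash also works for a single quote placed between two single-quoted pieces
apostrophe='This string'\''s the value of the variable'
echo "$apostrophe"    # prints: This string's the value of the variable
```

The second form follows the same idea as the '"'"' trick: close the single-quoted string, add one literal quote, then open a new single-quoted string.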

Expanding a variable’s value

To expand a variable (to access its value), you need to prefix its name with $:

variable=Test
echo variable
variable

Mmmm… not really what we want!

variable=Test
echo $variable
Test
variable=Test; echo "$variable"
Test

!! Single quotes don’t expand variables.

variable=Test; echo '$variable'
$variable
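Double quotes matter even when the output looks the same in the examples above. The sketch below uses a made-up value with several consecutive spaces to show the difference between an unquoted and a double-quoted expansion.

```shell
spaced="one   two"    # Three spaces between the two words

echo $spaced          # Unquoted: Bash splits the value into words, prints: one two
echo "$spaced"        # Double-quoted: the value is kept intact, prints: one   two
echo '$spaced'        # Single-quoted: no expansion at all, prints: $spaced
```

For this reason, writing "$variable" rather than $variable is usually the safer habit.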

Passing variables to a Bash script

Create a script called name.sh with the following content:

#!/bin/bash

echo "My name is $1."  # $1 refers to the first argument (value) passed to the script

You can now pass a value to this script with:

bash name.sh Marie
My name is Marie.

You can pass several variables to a script. Copy name.sh to name2.sh and edit name2.sh to look like the following:

#!/bin/bash

echo "My name is $1 and I am $2 years old."

Then run it with two arguments:

bash name2.sh Marie 43
My name is Marie and I am 43 years old.

You can also pass any number of arguments to a script. Save the following as script.sh ($@ expands to all the arguments):

#!/bin/bash

echo "$@"

Then run it:

bash script.sh argument1 argument2 argument3 argument4
argument1 argument2 argument3 argument4
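Arguments are separated by spaces, so a value that itself contains a space needs quotes. The sketch below recreates name2.sh in a temporary directory and calls it both ways; the name "Marie Curie" is only an illustrative value.

```shell
workdir=$(mktemp -d)

# Recreate name2.sh (single quotes keep $1 and $2 from being expanded here)
echo '#!/bin/bash' > "$workdir/name2.sh"
echo 'echo "My name is $1 and I am $2 years old."' >> "$workdir/name2.sh"

# With quotes, "Marie Curie" is a single argument ($1)
with_quotes=$(bash "$workdir/name2.sh" "Marie Curie" 43)
echo "$with_quotes"       # prints: My name is Marie Curie and I am 43 years old.

# Without quotes, Curie becomes $2 and 43 becomes $3 (which the script ignores)
without_quotes=$(bash "$workdir/name2.sh" Marie Curie 43)
echo "$without_quotes"    # prints: My name is Marie and I am Curie years old.
```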

Brace expansion

echo {1..5}
1 2 3 4 5
echo {01..10}
01 02 03 04 05 06 07 08 09 10
echo {1..5}.txt
1.txt 2.txt 3.txt 4.txt 5.txt
echo {r..v}
r s t u v
echo {file1,file2}.sh
file1.sh file2.sh

!! Make sure not to add a space after the comma.

touch {file1,file2}.sh
touch file{3..6}.sh
echo {list,of,strings}
list of strings
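Brace expressions can also be combined or placed in the middle of a word. A small sketch in Bash (the file names are invented, and the variable combos only exists to hold the result):

```shell
# Two brace expressions side by side produce every combination
echo {a,b}{1,2}                 # prints: a1 a2 b1 b2

# Braces can sit in the middle of a word
echo chapter{1..3}_draft.md     # prints: chapter1_draft.md chapter2_draft.md chapter3_draft.md

# The expanded list can be stored like any other text
combos=$(echo {a,b}{1,2})
```

Brace expansion only generates text: it does not check whether files with those names exist.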

Wildcards

Wildcards are a powerful way to apply a command to all the files whose names match a common pattern.

For instance, we can delete all the files we created earlier (file1.sh, file2.sh, etc.) with a single command:

rm file*.sh

!! Be very careful: rm is irreversible. Deleted files do not go to the trash: they are gone.
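Because rm cannot be undone, one cautious habit is to preview what a pattern matches before deleting. The sketch below does this in a temporary directory; the file names are placeholders.

```shell
workdir=$(mktemp -d)
cd "$workdir"
touch file1.sh file2.sh file3.sh notes.txt

echo file*.sh          # Preview the matches, prints: file1.sh file2.sh file3.sh
rm file*.sh            # Delete exactly those files

remaining=$(ls)        # Keep the list of what is left in a variable
echo "$remaining"      # prints: notes.txt
```

The shell replaces file*.sh with the matching names before the command runs, so echo shows exactly what rm would receive.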

Loops

To apply a set of commands to all the elements of a list, you can use for loops. The general structure is as follows:

for <variable> in <list>
do
    <statement1>
    <statement2>
    ...
done

Let’s create the script names.sh:

#!/bin/bash

for name in "$@"       # Quoting "$@" keeps arguments that contain spaces intact
do
    echo "$name"
done

Now let’s run it with a list of arguments:

bash names.sh Patrick Paul Marie Alex
Patrick
Paul
Marie
Alex
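Loops combine well with wildcards: the list can be a pattern that the shell expands to file names. Here is a minimal sketch that makes a backup copy of each matching file in a temporary directory. The file names and the .bak suffix are arbitrary choices for this example.

```shell
workdir=$(mktemp -d)
cd "$workdir"
touch sample1.txt sample2.txt sample3.txt

# sample*.txt expands to the three files created above
for file in sample*.txt
do
    cp "$file" "$file.bak"      # Copy each file to a name ending in .bak
    echo "Backed up $file"
done
```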

Your turn:

Compare the outputs of the following 2 scripts:

  • script1.sh:
#!/bin/bash

echo "$@"
  • script2.sh:
#!/bin/bash

for i in "$@"
do
    echo "$i"
done

How do you explain the difference between running:

bash script1.sh arg1 arg2 arg3

and running:

bash script2.sh arg1 arg2 arg3

Let’s put it all together to automate some task

This is a rather silly example, but bear with me and let’s imagine that it actually makes sense (of course, you don’t write that many thesis chapters so you would probably never automate these tasks…)

So… let’s imagine that each time you write a thesis chapter, you do the same things:

  • you create a directory with the name of the chapter,
  • you create a number of subdirectories (for your source code, your manuscript, your data, and your results),
  • you create a Python script in the source code directory,
  • you create a markdown document in your manuscript directory,
  • you put the whole thing under version control with Git,
  • you create a .gitignore file in which you put the data subdirectory.

Your turn:

Write a script that would do all this, then test the script.

Give it a try on your own before looking at the solution below…

Here is what the script looks like (let’s call it chapter.sh):

#!/bin/bash

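mkdir "$1"                     # Create the chapter directory
cd "$1" || exit 1              # Move into it (stop if that fails)
mkdir src data results ms      # Create the subdirectories
touch "src/$1.py" "ms/$1.md"   # Create the Python script and the markdown document
git init                       # Put the directory under version control
echo "data/" > .gitignore      # Tell Git to ignore the data subdirectory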

You then run the script:

bash chapter.sh chapter1

You can verify that all the files and directories got created with:

tree chapter1
chapter1/
├── data
├── ms
│   └── chapter1.md
├── results
└── src
    └── chapter1.py

and:

ls -aF chapter1
./  ../  data/  .git/  .gitignore  ms/  results/  src/

You can also verify the content of your .gitignore file with:

cat chapter1/.gitignore
data/
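The same steps can be wrapped in a for loop to set up several chapters at once. The following is only a sketch under a few assumptions: it works in a temporary directory, it uses full paths instead of cd, and it leaves out git init so that it runs even where Git is not installed.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

for chapter in chapter1 chapter2 chapter3
do
    mkdir "$chapter"
    mkdir "$chapter/src" "$chapter/data" "$chapter/results" "$chapter/ms"
    touch "$chapter/src/$chapter.py" "$chapter/ms/$chapter.md"
    echo "data/" > "$chapter/.gitignore"
done

ls chapter2/src       # prints: chapter2.py
```

To keep the version control step, git init "$chapter" could be added inside the loop.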

Resources

One very useful (although very dense) resource is the Bash manual.

You can also get information on Bash from within Bash with:

info bash

and:

man bash

There are also countless resources online. Don’t forget to search for anything you don’t know how to do: you will almost certainly find an answer on Stack Overflow or another Stack Exchange site.