Redirections & pipes

Author

Marie-Hélène Burle
Adapted from a Software Carpentry workshop

By default, commands that produce output print it to the terminal. This output can, however, be redirected elsewhere (e.g. to a file) or passed as the input of another command.

For this section, we will play with files created by The Carpentries.

You can download them into a zip file called bash.zip with:

wget https://bit.ly/bashfile -O bash.zip

You can then unzip that file with:

unzip bash.zip

You should now have a data-shell directory with a molecules subdirectory.

cd into it:

cd data-shell/molecules

Redirections

By default, commands that produce output print it to standard output, which is normally connected to the terminal. This is what we have been doing so far.

The output can however be redirected with the > sign. For instance, it can be redirected to a file, which is very handy if you want to save the result.

Example:

Let’s print the number of lines in each .pdb file in the molecules directory:

wc -l *.pdb
  20 gas_cubane.pdb
  12 gas_ethane.pdb
   9 gas_methane.pdb
  30 gas_octane.pdb
  21 gas_pentane.pdb
  15 gas_propane.pdb
 107 total

Your turn:

  • What does the wc command do?
  • What does the -l flag for this command do?
  • How did you find out?

To save this result into a file called lengths.txt, we run:

wc -l *.pdb > lengths.txt

Note that > always creates a new file. If a file called lengths.txt already exists, it will be overwritten. Be careful not to lose data this way!

If you don’t want to lose the content of the old file, you can append the output to it with >> instead. If lengths.txt doesn’t exist yet, >> creates it. If it already exists, the new content is added below the old one.
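
The difference between > and >> can be checked safely with a small experiment. The sketch below is only an illustration: the temporary directory and the file name demo.txt are made up for the example and are not part of the workshop data.

```shell
# Work in a throwaway directory so no real files are touched
tmpdir=$(mktemp -d)
demo="$tmpdir/demo.txt"   # hypothetical file name used only for this demo

echo "first run"  > "$demo"    # > creates demo.txt
echo "second run" > "$demo"    # > again: the file is overwritten, "first run" is lost
echo "third run" >> "$demo"    # >> appends below the existing content

# Show what is left in the file
content=$(cat "$demo")
echo "$content"

# Clean up
rm -r "$tmpdir"
```

The file ends up with two lines, "second run" followed by "third run": the second > replaced the first line, while >> kept what was already there.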

Your turn:

How can you make sure that you did create a file called lengths.txt?

Let’s print its content to the terminal:

cat lengths.txt
  20 gas_cubane.pdb
  12 gas_ethane.pdb
   9 gas_methane.pdb
  30 gas_octane.pdb
  21 gas_pentane.pdb
  15 gas_propane.pdb
 107 total

As you can see, it contains the output of the command wc -l *.pdb.

We can also print the content of the file in a modified form. For instance, we can sort it numerically:

sort -n lengths.txt
   9 gas_methane.pdb
  12 gas_ethane.pdb
  15 gas_propane.pdb
  20 gas_cubane.pdb
  21 gas_pentane.pdb
  30 gas_octane.pdb
 107 total

And we can redirect this new output to a new file:

sort -n lengths.txt > sorted.txt
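
The -n flag passed to sort above asks for a numeric comparison. Without it, sort compares lines as text, which gives a different order as soon as the numbers have different lengths. Here is a minimal sketch of the difference, using a made-up file numbers.txt in a temporary directory (neither is part of the workshop data):

```shell
# Throwaway directory so no real files are touched
tmpdir=$(mktemp -d)

# Three numbers, one per line (made-up sample data)
printf '10\n9\n2\n' > "$tmpdir/numbers.txt"

# Default sort: lines are compared as text, character by character,
# so "10" comes before "2" because "1" sorts before "2"
text_order=$(sort "$tmpdir/numbers.txt")

# sort -n: lines are compared by their numeric value
numeric_order=$(sort -n "$tmpdir/numbers.txt")

echo "sort:"
echo "$text_order"
echo "sort -n:"
echo "$numeric_order"

# Clean up
rm -r "$tmpdir"
```

Plain sort prints 10, 2, 9, while sort -n prints 2, 9, 10. This is why -n matters when sorting line counts.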

Instead of printing an entire file to the terminal, you can print only part of it.

Let’s print the first line of the new file sorted.txt:

head -1 sorted.txt
   9 gas_methane.pdb
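
As a small illustration of printing only part of a file, the sketch below builds a made-up five-line file and prints pieces of it. Note two assumptions: head -n 2 is a more portable spelling of the head -2 form, and tail, which prints from the end of a file, is not covered in this section and is shown only as the counterpart of head.

```shell
# Throwaway directory and a made-up five-line sample file
tmpdir=$(mktemp -d)
printf 'line 1\nline 2\nline 3\nline 4\nline 5\n' > "$tmpdir/sample.txt"

# First two lines of the file
first_two=$(head -n 2 "$tmpdir/sample.txt")

# Last line of the file
last_one=$(tail -n 1 "$tmpdir/sample.txt")

echo "$first_two"
echo "$last_one"

# Clean up
rm -r "$tmpdir"
```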

Pipes

Another form of redirection is the Bash pipe (|). Instead of sending the output to a file, the pipe passes it as the input of another command. This is very convenient because it allows you to chain multiple commands without having to create files or variables to save intermediate results.

For instance, we could run the three commands we ran previously at once, without the creation of the two intermediate files:

wc -l *.pdb | sort -n | head -1
   9 gas_methane.pdb

In each case, the output of the command on the left-hand side (LHS) is passed as the input of the command on the right-hand side (RHS).
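
To check that a pipeline really gives the same result as the version with intermediate files, both can be run on the same data and compared. The sketch below does this on three made-up files in a temporary directory (the names one.txt, two.txt, three.txt, lengths, and sorted are chosen for the example only), and uses head -n 1, which is equivalent to head -1.

```shell
# Throwaway directory with three made-up files of 1, 2, and 3 lines
tmpdir=$(mktemp -d)
printf 'a\n'       > "$tmpdir/one.txt"
printf 'a\nb\n'    > "$tmpdir/two.txt"
printf 'a\nb\nc\n' > "$tmpdir/three.txt"

# Version 1: each step saves its output in an intermediate file
# (the intermediate files have no .txt extension, so *.txt does not match them)
with_files=$(
  cd "$tmpdir" &&
  wc -l *.txt > lengths &&
  sort -n lengths > sorted &&
  head -n 1 sorted
)

# Version 2: each step passes its output directly to the next one
with_pipe=$(cd "$tmpdir" && wc -l *.txt | sort -n | head -n 1)

echo "with files: $with_files"
echo "with pipes: $with_pipe"

# Clean up
rm -r "$tmpdir"
```

Both versions report one.txt, the file with a single line. The pipeline simply never writes lengths or sorted to disk.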

Your turn:

In a directory, we want to find the 3 files that have the fewest lines. Which command would work for this?

  1. wc -l * > sort -n > head -3
  2. wc -l * | sort -n | head 1-3
  3. wc -l * | sort -n | head -3
  4. wc -l * | head -3 | sort -n

Here is a video of a previous version of this workshop.