Finding files

Author

Marie-Hélène Burle

Data for this section

For this section, we will play with files created by The Carpentries.

You can download them into a zip file called bash.zip with:

wget http://bit.ly/bashfile -O bash.zip

You can then unzip that file with:

unzip bash.zip

Finally, you can delete the zip file:

rm bash.zip

You should now have a data-shell directory with a molecules subdirectory.

cd into it:

cd data-shell/molecules

Command find

Search for files inside the current working directory:

find . -type f
./methane.pdb
./pentane.pdb
./sorted.txt
./propane.pdb
./lengths.txt
./cubane.pdb
./ethane.pdb
./octane.pdb

find . -type d will instead search for directories inside the current working directory.

Here are other examples:

find . -maxdepth 1 -type f     # depth 1 is the current directory
find . -mindepth 2 -type f     # current directory and one level down
find . -name haiku.txt      # finds specific file
ls data       # shows one.txt two.txt
find . -name *.txt      # still finds one file -- why? answer: expands *.txt to haiku.txt
find . -name '*.txt'    # finds all three files -- good!

Let’s wrap the last command into $()—called command substitution—as if it were a variable:

echo $(find . -name '*.txt')   # will print ./data/one.txt ./data/two.txt ./haiku.txt
ls -l $(find . -name '*.txt')   # will expand to ls -l ./data/one.txt ./data/two.txt ./haiku.txt
wc -l $(find . -name '*.txt')   # will expand to wc -l ./data/one.txt ./data/two.txt ./haiku.txt
grep elegant $(find . -name '*.txt')   # will look for 'elegant' inside all *.txt files

Your turn:

grep’s -v flag inverts pattern matching, so that only lines that do not match the pattern are printed.

Given that, which of the following commands will find all files in /data whose names end in ose.dat (e.g. sucrose.dat or maltose.dat), but do not contain the word temp?

  1. find /data -name '*.dat' | grep ose | grep -v temp
  2. find /data -name ose.dat | grep -v temp
  3. grep -v temp $(find /data -name '*ose.dat')
  4. None of the above

Here is a video of a previous version of this workshop.

Running a command on the results of find

Let’s say that you want to run a command on each of the files in the output of find. You can always do something using command substitution like this:

for f in $(find . -name "*.txt")
do
    command on $f
done

Alternatively, you can make it a one-liner:

find . -name "*.txt" -exec command {} \;

Another—perhaps more elegant—one-line alternative is to use xargs. In its simplest usage, xargs command lets you construct a list of arguments:

find . -name "*.txt"                   # returns multiple lines
find . -name "*.txt" | xargs           # use those lines to construct a list
find . -name "*.txt" | xargs command   # pass this list as arguments to `command`
command $(find . -name "*.txt")        # command substitution, achieving the same result (this is riskier!)
command `(find . -name "*.txt")`       # alternative syntax for command substitution

In these examples, xargs achieves the same result as command substitution, but it is safer in terms of memory usage and the length of lists you can pass.

When would you need to use this? A good example is with the command grep. grep takes a search stream (and not a list of files) as its standard input:

cat filename | grep pattern

To pass a list of files to grep, you can use xargs that takes that list from its standard input and converts it into a list of arguments that is then passed to grep:

find . -name "*.txt" | xargs grep pattern   # search for `pattern` inside all those files (`grep` does not take a list of files as standard input)

Here is a video of a previous version of this workshop.