Finding files
Data for this section
For this section, we will play with files created by The Carpentries.
You can download them into a zip file called bash.zip with:
wget http://bit.ly/bashfile -O bash.zip
You can then unzip that file with:
unzip bash.zip
Finally, you can delete the zip file:
rm bash.zip
You should now have a data-shell directory with a molecules subdirectory.
Use cd to move into it:
cd data-shell/molecules
Command find
Search for files inside the current working directory:
find . -type f
./methane.pdb
./pentane.pdb
./sorted.txt
./propane.pdb
./lengths.txt
./cubane.pdb
./ethane.pdb
./octane.pdb
Running find . -type d instead searches for directories inside the current working directory.
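To see the difference between -type f and -type d without the downloaded data, here is a minimal sketch that builds a throwaway tree with hypothetical file names (a.pdb and sub/b.pdb) in a temporary directory created by mktemp -d. LC_ALL=C sort is added only to make the output order predictable, since find itself does not sort its results.

```shell
# Throwaway tree with hypothetical names, removed when the script exits.
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
mkdir -p "$demo/sub"
touch "$demo/a.pdb" "$demo/sub/b.pdb"
cd "$demo" || exit 1

# Regular files only, at any depth.
files=$(find . -type f | LC_ALL=C sort)
echo "$files"     # ./a.pdb and ./sub/b.pdb

# Directories only; the starting point "." counts as a directory too.
dirs=$(find . -type d | LC_ALL=C sort)
echo "$dirs"      # . and ./sub
```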
Here are other examples:
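find . -maxdepth 1 -type f   # depth 1: only files directly inside the current directory
find . -mindepth 2 -type f   # skips depths 0 and 1: only files inside subdirectories
The next examples assume a directory containing haiku.txt and a data subdirectory with one.txt and two.txt:
find . -name haiku.txt       # finds this specific file
ls data                      # shows one.txt two.txt
find . -name *.txt           # still finds only one file. Why? The shell expands the unquoted *.txt to haiku.txt before find runs
find . -name '*.txt'         # quoting passes the pattern to find itself, which finds all three files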
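A sketch that recreates the layout assumed above (haiku.txt plus data/one.txt and data/two.txt, here as empty placeholder files in a temporary directory) makes the depth options and the quoting pitfall easy to check. The unquoted *.txt is deliberate: it shows the shell expanding the glob before find ever sees it.

```shell
# Recreate the assumed layout with empty placeholder files.
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
mkdir -p "$demo/data"
touch "$demo/haiku.txt" "$demo/data/one.txt" "$demo/data/two.txt"
cd "$demo" || exit 1

# Depth 1: files directly inside "." only.
shallow=$(find . -maxdepth 1 -type f)                 # ./haiku.txt
# Depth 2 or more: files inside subdirectories only.
deep=$(find . -mindepth 2 -type f | LC_ALL=C sort)    # ./data/one.txt ./data/two.txt

# Unquoted: the shell turns *.txt into haiku.txt, so find gets "-name haiku.txt".
unquoted=$(find . -name *.txt)                        # ./haiku.txt
# Quoted: find receives the pattern and matches it at every depth.
quoted=$(find . -name '*.txt' | LC_ALL=C sort)        # all three files

printf '%s\n' "$shallow" "$deep" "$unquoted" "$quoted"
```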
Let’s wrap the last command in $(), which is called command substitution, and use its output as if it were a variable:
echo $(find . -name '*.txt') # prints the paths on one line, e.g. ./data/one.txt ./data/two.txt ./haiku.txt (order may vary)
ls -l $(find . -name '*.txt') # will expand to ls -l ./data/one.txt ./data/two.txt ./haiku.txt
wc -l $(find . -name '*.txt') # will expand to wc -l ./data/one.txt ./data/two.txt ./haiku.txt
grep elegant $(find . -name '*.txt') # will look for 'elegant' inside all *.txt files
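The same commands can be tried as a small sketch with made-up file contents: the files below are hypothetical stand-ins, and the word elegant is planted in two of them so that grep has something to find. Because grep receives several file arguments, it prefixes each match with the file name.

```shell
# Hypothetical stand-in files with made-up contents.
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
mkdir -p "$demo/data"
printf 'an elegant line\nplain line\n' > "$demo/haiku.txt"
printf 'one\n' > "$demo/data/one.txt"
printf 'two\nelegant two\n' > "$demo/data/two.txt"
cd "$demo" || exit 1

# $(...) is replaced by find's output before wc runs, so wc sees three
# file arguments and also prints a "total" line.
wc -l $(find . -name '*.txt')

# Each match is printed as file:line because grep got several files.
matches=$(grep elegant $(find . -name '*.txt') | LC_ALL=C sort)
echo "$matches"
```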
Your turn:
grep’s -v flag inverts pattern matching, so that only lines that do not match the pattern are printed.
Given that, which of the following commands will find all files in /data whose names end in ose.dat (e.g. sucrose.dat or maltose.dat), but do not contain the word temp?
- find /data -name '*.dat' | grep ose | grep -v temp
- find /data -name ose.dat | grep -v temp
- grep -v temp $(find /data -name '*ose.dat')
- None of the above
Here is a video of a previous version of this workshop.
Running a command on the results of find
Let’s say that you want to run a command on each of the files in the output of find. You can always do something using command substitution like this:
for f in $(find . -name "*.txt")
do
    wc -l "$f"   # replace wc -l with the command you want to run on each file
done
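One caveat of this loop: the unquoted $(find ...) is split on whitespace, so a file name containing a space becomes two loop iterations. The sketch below uses a hypothetical file called my notes.txt to show this, followed by one common workaround (reading find's output line by line with while IFS= read -r). That workaround is a substitute technique, not part of the loop above, and it still breaks on file names that contain newlines.

```shell
# Scratch directory; "my notes.txt" is a hypothetical name with a space.
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
touch "$demo/plain.txt" "$demo/my notes.txt"
cd "$demo" || exit 1

# Word splitting turns "./my notes.txt" into "./my" and "notes.txt".
split_count=0
for f in $(find . -name "*.txt")
do
    split_count=$((split_count + 1))
    echo "loop got: $f"
done
echo "for loop iterations: $split_count"   # 3, not 2

# One path per line, read whole, then quoted when used.
line_count=$(find . -name "*.txt" |
    while IFS= read -r f; do
        [ -f "$f" ] && echo "$f"           # "$f" names a real file
    done | wc -l | tr -d ' ')
echo "while loop files: $line_count"       # 2
```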
Alternatively, you can make it a one-liner:
find . -name "*.txt" -exec wc -l {} \;   # {} stands for each file found; \; ends the command
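A minimal sketch of -exec in a scratch directory with three hypothetical empty files. It uses echo as the command so that the number of invocations is visible, and contrasts \; with the + terminator, which makes find pass many paths to a single invocation.

```shell
# Three hypothetical empty files.
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
touch "$demo/a.txt" "$demo/b.txt" "$demo/c.txt"
cd "$demo" || exit 1

# "\;" runs echo once per file: three separate lines.
per_file=$(find . -name "*.txt" -exec echo {} \; | wc -l | tr -d ' ')

# "+" appends as many paths as fit to one echo: a single line here.
batched=$(find . -name "*.txt" -exec echo {} + | wc -l | tr -d ' ')

echo "lines with \\;: $per_file"   # 3
echo "lines with +: $batched"      # 1
```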
Another, perhaps more elegant, one-line alternative is to use xargs. In its simplest usage, the xargs command lets you construct a list of arguments:
find . -name "*.txt"                 # returns multiple lines, one path per line
find . -name "*.txt" | xargs         # joins those lines into one list (with no command given, xargs runs echo)
find . -name "*.txt" | xargs wc -l   # passes this list as arguments to wc -l (or any other command)
wc -l $(find . -name "*.txt")        # command substitution, achieving the same result (this is riskier!)
wc -l `find . -name "*.txt"`         # older backtick syntax for command substitution
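A small sketch confirming that a bare xargs simply echoes its input as one line, and that $(...) and backticks hand echo the same arguments. The file names are hypothetical, and sort is there only because find's output order is not guaranteed.

```shell
# Two hypothetical empty files.
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
touch "$demo/a.txt" "$demo/b.txt"
cd "$demo" || exit 1

# find prints one path per line...
find . -name "*.txt" | LC_ALL=C sort
# ...and xargs with no command runs echo on them, giving one line.
joined=$(find . -name "*.txt" | LC_ALL=C sort | xargs)
echo "$joined"   # ./a.txt ./b.txt

# Both command substitution syntaxes produce the same argument list.
via_dollar=$(echo $(find . -name "*.txt" | LC_ALL=C sort))
via_backticks=$(echo `find . -name "*.txt" | LC_ALL=C sort`)
```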
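In these examples, xargs achieves the same result as command substitution, but it is safer with long lists. Command substitution puts every file name on a single command line, which fails with "Argument list too long" once that line exceeds the system limit (ARG_MAX); xargs instead splits the list into as many command invocations as needed.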
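To make the batching visible, the sketch below prints the system limit with getconf ARG_MAX and uses xargs -n 2 to cap each invocation at two arguments. It also shows that plain xargs splits on blanks; the -print0 option of find and the -0 option of xargs (supported by GNU and BSD implementations) keep a hypothetical file named my notes.txt in one piece.

```shell
# Four hypothetical files, one with a space in its name.
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
touch "$demo/a.txt" "$demo/b.txt" "$demo/c.txt" "$demo/my notes.txt"
cd "$demo" || exit 1

# Maximum size, in bytes, of the arguments passed to a new command.
getconf ARG_MAX

# Plain xargs splits on blanks: "./my notes.txt" becomes two arguments.
naive=$(find . -name "*.txt" | xargs -n 1 echo | wc -l | tr -d ' ')
# NUL-separated names keep each path whole.
safe=$(find . -name "*.txt" -print0 | xargs -0 -n 1 echo | wc -l | tr -d ' ')
# At most two arguments per echo, so four files need two invocations.
batches=$(find . -name "*.txt" -print0 | xargs -0 -n 2 echo | wc -l | tr -d ' ')

echo "naive: $naive, safe: $safe, batches: $batches"   # naive: 5, safe: 4, batches: 2
```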
When would you need to use this? A good example is grep. When no file is named, grep searches the text arriving on its standard input, which is a stream of text and not a list of files:
cat filename | grep pattern
To make grep search a list of files, you can use xargs, which reads that list from its standard input and converts it into arguments that are then passed to grep:
find . -name "*.txt" | xargs grep pattern # search for `pattern` inside all those files (`grep` does not take a list of files as standard input)
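A sketch with hypothetical files and contents showing why xargs is needed here: piping the names straight into grep searches the names themselves, whereas xargs grep opens each file and searches its contents. grep -l is used so that only the names of matching files are printed.

```shell
# Hypothetical files; "pattern" appears inside two of them, in no file name.
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
printf 'pattern here\nnothing\n' > "$demo/a.txt"
printf 'no match\n' > "$demo/b.txt"
printf 'another pattern\n' > "$demo/c.txt"
cd "$demo" || exit 1

# Without xargs, grep only sees the list of names on standard input.
# grep exits with status 1 when nothing matches, hence "|| true".
names_hit=$(find . -name "*.txt" | grep -c pattern || true)
echo "matches among the names: $names_hit"    # 0

# With xargs, the names become arguments and grep searches each file.
files_hit=$(find . -name "*.txt" | xargs grep -l pattern | LC_ALL=C sort)
echo "$files_hit"                             # ./a.txt and ./c.txt
```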
Here is a video of a previous version of this workshop.