Searching a version-controlled project

Author

Marie-Hélène Burle

What is the point of creating all these commits if you are unable to make use of them because you can’t find the information you need in them?

In this workshop, we will learn how to search:

  • your files (at any of their versions) and
  • your commit logs.

By the end of the workshop, you should be able to retrieve anything you need from your versioned project.

Prerequisites:

This special Git topic is suitable for people who already use Git.

You don’t need to be an expert, but we expect that you are able to run basic Git commands in the command line.

Installation

MacOS & Linux users:

Install Git from the official website.

Windows users:

Install Git for Windows. This will also install “Git Bash”, a Bash emulator.

Using Git

We will use Git from the command line throughout this workshop.

MacOS users:    open “Terminal”.
Windows users:   open “Git Bash”.
Linux users:    open the terminal emulator of your choice.

Practice repo

Get a repo

You are welcome to use a repository of yours to follow this workshop. Alternatively, you can clone a practice repo I have on GitHub:

  1. Navigate to an appropriate location:
cd /path/to/appropriate/location
  2. Clone the repo:
# If you have set up SSH for your GitHub account
git clone git@github.com:prosoitos/practice_repo.git
# If you haven't set up SSH
git clone https://github.com/prosoitos/practice_repo.git
  3. Enter the repo:
cd practice_repo

Searching files

The first thing that can happen is that you are looking for a certain pattern somewhere in your project (for instance a certain function or a certain word).

git grep

The main command to look through versioned files is git grep.

You might be familiar with the command-line utility grep, which allows you to search for lines matching a certain pattern in files. git grep does a similar job with these differences:

  • it is much faster since all files under version control are already indexed by Git,
  • you can easily search any commit without having to check it out,
  • it has features lacking in grep such as, for instance, pattern arithmetic or tree search using globs.

Let’s try it

By default, git grep searches recursively through the tracked files in the working directory (that is, the current version of the tracked files).

First, let’s look for the word test in the current version of the tracked files in the practice repo:

git grep test
adrian.txt:Adrian's test text file.
formerlyadrian.txt:Adrian's test text file.
ms/protocol.md:This is my test.
ms/smabraha.txt:This is a test file that I wanted to make, then push it somehow
redone17.txt:this is a test file from redone17
src/test_manuel.py:def test(model, device, test_loader):
src/test_manuel.py:    test_loss = 0
src/test_manuel.py:        for data, target in test_loader:
src/test_manuel.py:            test_loss += F.nll_loss(output, target, reduction='sum').item()  # sum up batch loss
src/test_manuel.py:    test_loss /= len(test_loader.dataset)
src/test_manuel.py:        test_loss, correct, len(test_loader.dataset),
src/test_manuel.py:        100. * correct / len(test_loader.dataset)))
src/test_manuel.py:    test_data = datasets.MNIST(
src/test_manuel.py:    test_loader = torch.utils.data.DataLoader(test_data, batch_size=100)
src/test_manuel.py:        test(model, device, test_loader)
testAV1.txt:This is a test
text-collab.txt:This is the collaboration testing

Let’s add blank lines between the results of each file for better readability:

git grep --break test
adrian.txt:Adrian's test text file.

formerlyadrian.txt:Adrian's test text file.

ms/protocol.md:This is my test.

ms/smabraha.txt:This is a test file that I wanted to make, then push it somehow

redone17.txt:this is a test file from redone17

src/test_manuel.py:def test(model, device, test_loader):
src/test_manuel.py:    test_loss = 0
src/test_manuel.py:        for data, target in test_loader:
src/test_manuel.py:            test_loss += F.nll_loss(output, target, reduction='sum').item()  # sum up batch loss
src/test_manuel.py:    test_loss /= len(test_loader.dataset)
src/test_manuel.py:        test_loss, correct, len(test_loader.dataset),
src/test_manuel.py:        100. * correct / len(test_loader.dataset)))
src/test_manuel.py:    test_data = datasets.MNIST(
src/test_manuel.py:    test_loader = torch.utils.data.DataLoader(test_data, batch_size=100)
src/test_manuel.py:        test(model, device, test_loader)

testAV1.txt:This is a test

text-collab.txt:This is the collaboration testing

Let’s also put the file names on separate lines:

git grep --break --heading test
adrian.txt
Adrian's test text file.

formerlyadrian.txt
Adrian's test text file.

ms/protocol.md
This is my test.

ms/smabraha.txt
This is a test file that I wanted to make, then push it somehow

redone17.txt
this is a test file from redone17

src/test_manuel.py
def test(model, device, test_loader):
    test_loss = 0
        for data, target in test_loader:
            test_loss += F.nll_loss(output, target, reduction='sum').item()  # sum up batch loss
    test_loss /= len(test_loader.dataset)
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))
    test_data = datasets.MNIST(
    test_loader = torch.utils.data.DataLoader(test_data, batch_size=100)
        test(model, device, test_loader)

testAV1.txt
This is a test

text-collab.txt
This is the collaboration testing

We can display the line numbers for the results with the -n flag:

git grep --break --heading -n test
adrian.txt
1:Adrian's test text file.

formerlyadrian.txt
1:Adrian's test text file.

ms/protocol.md
9:This is my test.

ms/smabraha.txt
1:This is a test file that I wanted to make, then push it somehow

redone17.txt
1:this is a test file from redone17

src/test_manuel.py
50:def test(model, device, test_loader):
52:    test_loss = 0
55:        for data, target in test_loader:
58:            test_loss += F.nll_loss(output, target, reduction='sum').item()  # sum up batch loss
62:    test_loss /= len(test_loader.dataset)
65:        test_loss, correct, len(test_loader.dataset),
66:        100. * correct / len(test_loader.dataset)))
84:    test_data = datasets.MNIST(
90:    test_loader = torch.utils.data.DataLoader(test_data, batch_size=100)
97:        test(model, device, test_loader)

testAV1.txt
1:This is a test

text-collab.txt
1:This is the collaboration testing

Notice how the results for the file src/test_manuel.py involve functions. It would be very convenient to have the names of the functions in which test appears.

We can do this with the -p flag:

git grep --break --heading -p test src/test_manuel.py
src/test_manuel.py
def train(model, device, train_loader, optimizer, epoch):
def test(model, device, test_loader):
    test_loss = 0
        for data, target in test_loader:
            test_loss += F.nll_loss(output, target, reduction='sum').item()  # sum up batch loss
    test_loss /= len(test_loader.dataset)
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))
def main():
    test_data = datasets.MNIST(
    test_loader = torch.utils.data.DataLoader(test_data, batch_size=100)
        test(model, device, test_loader)

We added the argument src/test_manuel.py to limit the search to that file.

We can now see that the word test appears in the functions test and main.
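The path argument does not have to be a single file. As a minimal sketch (the throwaway repo and its file names are made up for illustration), a quoted glob placed after -- limits the search to the matching tracked files, and -l prints only the names of the files with matches:

```shell
# Build a throwaway repo (hypothetical file names) so the example is self-contained.
repo=$(mktemp -d)
git -C "$repo" init -q
mkdir -p "$repo/src"
printf 'def test():\n    pass\n' > "$repo/src/model.py"
printf 'This is a test\n' > "$repo/notes.txt"
git -C "$repo" add .

# Everything after "--" is a pathspec; quote the glob so that Git,
# not the shell, expands it.
py_matches=$(git -C "$repo" grep -l test -- '*.py')
echo "$py_matches"   # → src/model.py

rm -rf "$repo"
```

notes.txt also contains test, but it is left out because it does not match the glob.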

Now, instead of printing all the matching lines, let’s print the number of matches per file:

git grep -c test
adrian.txt:1
formerlyadrian.txt:1
ms/protocol.md:1
ms/smabraha.txt:1
redone17.txt:1
src/test_manuel.py:10
testAV1.txt:1
text-collab.txt:1

More complex patterns

git grep in fact searches for regular expressions. test is a regular expression matching test, but we can look for more complex patterns.

Let’s look for image:

git grep image

No output means that the search is not returning any result.

Let’s make this search case insensitive:

git grep -i image
src/new_file.py:from PIL import Image
src/new_file.py:berlin1_lr = Image.open("/home/marie/parvus/pwg/wtm/slides/static/img/upscaling/lr/berlin_1945_1.jpg")
src/new_file.py:berlin1_hr = Image.open("/home/marie/parvus/pwg/wtm/slides/static/img/upscaling/hr/berlin_1945_1.png")

We are now getting some results as Image was present in three lines of the file src/new_file.py.

Let’s now search for data:

git grep data
.gitignore:data/
ms/protocol.md:Collected and analyzed amazing data
src/new_file.py:from datasets import load_dataset
src/new_file.py:set5 = load_dataset('eugenesiow/Set5', 'bicubic_x4', split='validation')
src/test_manuel.py:from torchvision import datasets, transforms
src/test_manuel.py:    for batch_idx, (data, target) in enumerate(train_loader):
src/test_manuel.py:        data, target = data.to(device), target.to(device)
src/test_manuel.py:        output = model(data)
src/test_manuel.py:                epoch, batch_idx * len(data), len(train_loader.dataset),
src/test_manuel.py:        for data, target in test_loader:
src/test_manuel.py:            data, target = data.to(device), target.to(device)
src/test_manuel.py:            output = model(data)
src/test_manuel.py:    test_loss /= len(test_loader.dataset)
src/test_manuel.py:        test_loss, correct, len(test_loader.dataset),
src/test_manuel.py:        100. * correct / len(test_loader.dataset)))
src/test_manuel.py:    train_data = datasets.MNIST(
src/test_manuel.py:        '~/parvus/pwg/wtm/tml/data',
src/test_manuel.py:        # '~/projects/def-sponsor00/data',
src/test_manuel.py:    test_data = datasets.MNIST(
src/test_manuel.py:        '~/parvus/pwg/wtm/tml/data',
src/test_manuel.py:        # '~/projects/def-sponsor00/data',
src/test_manuel.py:    train_loader = torch.utils.data.DataLoader(train_data, batch_size=50)
src/test_manuel.py:    test_loader = torch.utils.data.DataLoader(test_data, batch_size=100)

We are getting results for the word data, but also for the pattern data in longer expressions such as train_data or dataset. If we only want results for the word data, we can use the -w flag:

git grep -w data
.gitignore:data/
ms/protocol.md:Collected and analyzed amazing data
src/test_manuel.py:    for batch_idx, (data, target) in enumerate(train_loader):
src/test_manuel.py:        data, target = data.to(device), target.to(device)
src/test_manuel.py:        output = model(data)
src/test_manuel.py:                epoch, batch_idx * len(data), len(train_loader.dataset),
src/test_manuel.py:        for data, target in test_loader:
src/test_manuel.py:            data, target = data.to(device), target.to(device)
src/test_manuel.py:            output = model(data)
src/test_manuel.py:        '~/parvus/pwg/wtm/tml/data',
src/test_manuel.py:        # '~/projects/def-sponsor00/data',
src/test_manuel.py:        '~/parvus/pwg/wtm/tml/data',
src/test_manuel.py:        # '~/projects/def-sponsor00/data',
src/test_manuel.py:    train_loader = torch.utils.data.DataLoader(train_data, batch_size=50)
src/test_manuel.py:    test_loader = torch.utils.data.DataLoader(test_data, batch_size=100)

Now, let’s use a more complex regular expression. We want the counts for the pattern ".*_.*" (i.e. any line containing an underscore, as in snake case names such as train_loader):

git grep -c ".*_.*"
.gitignore:4
src/new_file.py:22
src/test_manuel.py:29

Let’s print the first 3 results per file:

git grep -m 3 ".*_.*"
.gitignore:hidden_file
.gitignore:search_cache/
.gitignore:ws_search_cache/html
src/new_file.py:from datasets import load_dataset
src/new_file.py:set5 = load_dataset('eugenesiow/Set5', 'bicubic_x4', split='validation')
src/new_file.py:set5.column_names
src/test_manuel.py:from torch.optim.lr_scheduler import StepLR
src/test_manuel.py:    def __init__(self):
src/test_manuel.py:        super(Net, self).__init__()

As you can see, our results also include __init__ which is not what we were looking for. So let’s exclude __:

git grep -m 3 -e ".*_.*" --and --not -e "__"
.gitignore:hidden_file
.gitignore:search_cache/
.gitignore:ws_search_cache/html
src/new_file.py:from datasets import load_dataset
src/new_file.py:set5 = load_dataset('eugenesiow/Set5', 'bicubic_x4', split='validation')
src/new_file.py:set5.column_names
src/test_manuel.py:from torch.optim.lr_scheduler import StepLR
src/test_manuel.py:        x = F.max_pool2d(x, 2)
src/test_manuel.py:        output = F.log_softmax(x, dim=1)

For simple searches, you don’t have to use the -e flag before the pattern you are searching for. Here however, our command has gotten complex enough that we have to use it before each pattern.

Let’s make sure this worked as expected:

git grep -c ".*_.*"
echo "---"
git grep -c "__"
echo "---"
git grep -ce ".*_.*" --and --not -e "__"
.gitignore:4
src/new_file.py:22
src/test_manuel.py:29
---
src/test_manuel.py:2
---
.gitignore:4
src/new_file.py:22
src/test_manuel.py:27

There were 2 lines matching __ in src/test_manuel.py and we have indeed excluded them from our search.

Extended regular expressions are also supported, with the -E flag.
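As an illustration of -E, here is a small sketch on a throwaway repo (the file name and its content are invented for the example). With extended regular expressions, grouping and alternation can be written without backslashes:

```shell
# Throwaway repo with one hypothetical file.
repo=$(mktemp -d)
git -C "$repo" init -q
printf 'train_data = 1\ntrain_loader = 2\ntest_data = 3\n' > "$repo/vars.py"
git -C "$repo" add vars.py

# With -E, ( | ) act as grouping and alternation.
ere_count=$(git -C "$repo" grep -c -E 'train_(data|loader)')
echo "$ere_count"   # → vars.py:2

rm -rf "$repo"
```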

Searching other trees

So far, we have searched the current version of tracked files, but we can just as easily search files at any commit.

Let’s search for test in the tracked files 20 commits ago:

git grep test HEAD~20
HEAD~20:adrian.txt:Adrian's test text file.
HEAD~20:formerlyadrian.txt:Adrian's test text file.
HEAD~20:ms/protocol.md:This is my test.
HEAD~20:ms/smabraha.txt:This is a test file that I wanted to make, then push it somehow
HEAD~20:redone17.txt:this is a test file from redone17
HEAD~20:testAV1.txt:This is a test
HEAD~20:text-collab.txt:This is the collaboration testing

As you can see, the file src/test_manuel.py is not in the results. Either it didn’t exist or it didn’t have the word test at that commit.
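To tell these two cases apart, one option is to ask Git whether the file exists at that commit. The sketch below assumes a throwaway repo with invented file names; git cat-file -e exits with a non-zero status when the object does not exist:

```shell
# Throwaway repo: b.py only appears in the second commit.
repo=$(mktemp -d)
git -C "$repo" init -q
commit () { git -C "$repo" -c user.name=Demo -c user.email=demo@example.com commit -q -m "$1"; }
echo "This is a test" > "$repo/a.txt"
git -C "$repo" add a.txt && commit "first"
echo "def test(): pass" > "$repo/b.py"
git -C "$repo" add b.py && commit "second"

# <commit>:<path> names the file as it was at that commit.
if git -C "$repo" cat-file -e HEAD~1:b.py 2>/dev/null; then
    status="present"
else
    status="absent"
fi
echo "b.py at HEAD~1: $status"   # → b.py at HEAD~1: absent

rm -rf "$repo"
```

In the practice repo, the equivalent check would be git cat-file -e HEAD~20:src/test_manuel.py.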

If you want to search tracked files AND untracked files, you need to use the --untracked flag.

Let’s create a new (thus untracked) file with some content including the word test:

echo "This is a test" > newfile

Now compare the following:

git grep -c test
adrian.txt:1
formerlyadrian.txt:1
ms/protocol.md:1
ms/smabraha.txt:1
redone17.txt:1
src/test_manuel.py:10
testAV1.txt:1
text-collab.txt:1

with:

git grep -c --untracked test
adrian.txt:1
formerlyadrian.txt:1
ms/protocol.md:1
ms/smabraha.txt:1
redone17.txt:1
src/test_manuel.py:10
testAV1.txt:1
text-collab.txt:1
ws_search.rmarkdown:41

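This last search also covered untracked files, which is why ws_search.rmarkdown (an untracked file present in the working directory where this output was generated) now shows up. Note that newfile is missing from the results: it is listed in the repo’s .gitignore, and --untracked alone skips ignored files.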

If you want to search untracked and ignored files (meaning all your files), use the flags --untracked --no-exclude-standard.

Let’s see what the .gitignore file contains:

cat .gitignore
data/
output/
hidden_file
search_cache/
search.qmd
newfile
img
ws_search_cache/html
ws_search.qmd

The directory data is in .gitignore. This means that it is not under version control and it thus doesn’t exist in our repo (since we cloned our repo, we only have the version-controlled files). Let’s create it:

mkdir data

Now, let’s create a file in it that contains test:

echo "And another test" > data/file

We can rerun our previous two searches to verify that files excluded from version control are not searched:

git grep -c test
adrian.txt:1
formerlyadrian.txt:1
ms/protocol.md:1
ms/smabraha.txt:1
redone17.txt:1
src/test_manuel.py:10
testAV1.txt:1
text-collab.txt:1

git grep -c --untracked test
adrian.txt:1
formerlyadrian.txt:1
ms/protocol.md:1
ms/smabraha.txt:1
redone17.txt:1
src/test_manuel.py:10
testAV1.txt:1
text-collab.txt:1
ws_search.rmarkdown:41

And now, let’s try:

git grep -c --untracked --no-exclude-standard test
adrian.txt:1
data/file:1
formerlyadrian.txt:1
ms/protocol.md:1
ms/smabraha.txt:1
newfile:1
redone17.txt:1
src/test_manuel.py:10
testAV1.txt:1
text-collab.txt:1
ws_search.qmd:41
ws_search.rmarkdown:41

data/file and newfile, despite being excluded from version control, are now searched as well.

Searching all commits

We saw that git grep <pattern> <commit> can search a pattern in any commit. Now, what if we want to search all commits for a pattern?

For this, we pass the expression $(git rev-list --all) in lieu of <commit>.

git rev-list --all lists all the commits in a form that can be used as an argument to other commands. The $() syntax (command substitution) runs the expression inside it and passes the result as an argument.
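Here is a small sketch of what these two pieces produce, using a throwaway repo with two invented commits:

```shell
# Throwaway repo with two commits.
repo=$(mktemp -d)
git -C "$repo" init -q
commit () { git -C "$repo" -c user.name=Demo -c user.email=demo@example.com commit -q -m "$1"; }
echo "This is a test" > "$repo/a.txt"
git -C "$repo" add a.txt && commit "first"
echo "Another test" > "$repo/b.txt"
git -C "$repo" add b.txt && commit "second"

# One full hash per line, for every commit reachable from any ref.
git -C "$repo" rev-list --all
n_commits=$(git -C "$repo" rev-list --all --count)

# Left unquoted on purpose: the shell splits the list so that each hash
# becomes a separate <commit> argument to git grep.
n_matches=$(git -C "$repo" grep test $(git -C "$repo" rev-list --all) | wc -l)
echo "$n_commits commits, $n_matches matching lines"

rm -rf "$repo"
```

The first commit contributes one matching line and the second contributes two, since each commit is searched as a full snapshot. On a repository with a very long history, the expanded list of hashes can exceed the maximum command-line length of the shell; piping the list through xargs (git rev-list --all | xargs git grep test) is a common workaround.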

To search for test in all the commits, we thus run:

git grep "test" $(git rev-list --all)

I am not running this command as it has a huge output. Instead, I will limit the search to the last two commits:

git grep "test" $(git rev-list --all -2)
388fdc13de66537cac2169253cb385dfd409e710:adrian.txt:Adrian's test text file.
388fdc13de66537cac2169253cb385dfd409e710:formerlyadrian.txt:Adrian's test text file.
388fdc13de66537cac2169253cb385dfd409e710:ms/protocol.md:This is my test.
388fdc13de66537cac2169253cb385dfd409e710:ms/smabraha.txt:This is a test file that I wanted to make, then push it somehow
388fdc13de66537cac2169253cb385dfd409e710:redone17.txt:this is a test file from redone17
388fdc13de66537cac2169253cb385dfd409e710:src/test_manuel.py:def test(model, device, test_loader):
388fdc13de66537cac2169253cb385dfd409e710:src/test_manuel.py:    test_loss = 0
388fdc13de66537cac2169253cb385dfd409e710:src/test_manuel.py:        for data, target in test_loader:
388fdc13de66537cac2169253cb385dfd409e710:src/test_manuel.py:            test_loss += F.nll_loss(output, target, reduction='sum').item()  # sum up batch loss
388fdc13de66537cac2169253cb385dfd409e710:src/test_manuel.py:    test_loss /= len(test_loader.dataset)
388fdc13de66537cac2169253cb385dfd409e710:src/test_manuel.py:        test_loss, correct, len(test_loader.dataset),
388fdc13de66537cac2169253cb385dfd409e710:src/test_manuel.py:        100. * correct / len(test_loader.dataset)))
388fdc13de66537cac2169253cb385dfd409e710:src/test_manuel.py:    test_data = datasets.MNIST(
388fdc13de66537cac2169253cb385dfd409e710:src/test_manuel.py:    test_loader = torch.utils.data.DataLoader(test_data, batch_size=100)
388fdc13de66537cac2169253cb385dfd409e710:src/test_manuel.py:        test(model, device, test_loader)
388fdc13de66537cac2169253cb385dfd409e710:testAV1.txt:This is a test
388fdc13de66537cac2169253cb385dfd409e710:text-collab.txt:This is the collaboration testing
423f454765d45e21e0ae401da0b3dec2d84113ce:adrian.txt:Adrian's test text file.
423f454765d45e21e0ae401da0b3dec2d84113ce:formerlyadrian.txt:Adrian's test text file.
423f454765d45e21e0ae401da0b3dec2d84113ce:ms/protocol.md:This is my test.
423f454765d45e21e0ae401da0b3dec2d84113ce:ms/smabraha.txt:This is a test file that I wanted to make, then push it somehow
423f454765d45e21e0ae401da0b3dec2d84113ce:redone17.txt:this is a test file from redone17
423f454765d45e21e0ae401da0b3dec2d84113ce:src/test_manuel.py:def test(model, device, test_loader):
423f454765d45e21e0ae401da0b3dec2d84113ce:src/test_manuel.py:    test_loss = 0
423f454765d45e21e0ae401da0b3dec2d84113ce:src/test_manuel.py:        for data, target in test_loader:
423f454765d45e21e0ae401da0b3dec2d84113ce:src/test_manuel.py:            test_loss += F.nll_loss(output, target, reduction='sum').item()  # sum up batch loss
423f454765d45e21e0ae401da0b3dec2d84113ce:src/test_manuel.py:    test_loss /= len(test_loader.dataset)
423f454765d45e21e0ae401da0b3dec2d84113ce:src/test_manuel.py:        test_loss, correct, len(test_loader.dataset),
423f454765d45e21e0ae401da0b3dec2d84113ce:src/test_manuel.py:        100. * correct / len(test_loader.dataset)))
423f454765d45e21e0ae401da0b3dec2d84113ce:src/test_manuel.py:    test_data = datasets.MNIST(
423f454765d45e21e0ae401da0b3dec2d84113ce:src/test_manuel.py:    test_loader = torch.utils.data.DataLoader(test_data, batch_size=100)
423f454765d45e21e0ae401da0b3dec2d84113ce:src/test_manuel.py:        test(model, device, test_loader)
423f454765d45e21e0ae401da0b3dec2d84113ce:testAV1.txt:This is a test
423f454765d45e21e0ae401da0b3dec2d84113ce:text-collab.txt:This is the collaboration testing

In combination with the fuzzy finder tool fzf, this can make finding a particular commit extremely easy.

For instance, the code below allows you to dynamically search in the result through incremental completion:

git grep "test" $(git rev-list --all) | fzf --cycle -i -e

Or even better, you can automatically copy the short form of the hash of the selected commit to the clipboard so that you can use it with git show, git checkout, etc.:

git grep "test" $(git rev-list --all) |
    fzf --cycle -i -e |
    cut -c 1-7 |
    xclip -r -selection clipboard

Here, I am using xclip to copy to the clipboard as I am on Linux. Depending on your OS you might need to use a different tool.
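As a rough sketch, a small helper can pick whichever clipboard command is available. The function name clipboard_cmd is made up for this example, and the tool names are assumptions about common setups (pbcopy on macOS, clip.exe on Windows, wl-copy or xclip on Linux):

```shell
# Print the first clipboard command found on this machine.
# Returns a non-zero status if none of the assumed tools is installed.
clipboard_cmd () {
    if command -v pbcopy >/dev/null 2>&1; then
        echo "pbcopy"
    elif command -v clip.exe >/dev/null 2>&1; then
        echo "clip.exe"
    elif command -v wl-copy >/dev/null 2>&1; then
        echo "wl-copy"
    elif command -v xclip >/dev/null 2>&1; then
        echo "xclip -r -selection clipboard"
    else
        echo "none"
        return 1
    fi
}

tool=$(clipboard_cmd) || echo "no clipboard tool found" >&2
echo "clipboard tool: $tool"
```

With such a helper, the last step of the pipeline above could be written as $(clipboard_cmd) instead of the xclip command.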

Of course, you can put such code in a function in your .bashrc file so that you don’t have to type it each time:

grep_all_commits () {
    git grep "$1" $(git rev-list --all) |
        fzf --cycle -i -e |
        cut -c 1-7 |
        xclip -r -selection clipboard
}

Alternatively, you can pass the result directly into whatever git command you want to use that commit for.

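Here is an example with git show. Because git show expects the commit as an argument rather than on standard input, the pipeline is wrapped in $():

git show "$(git grep "test" $(git rev-list --all) |
    fzf --cycle -i -e |
    cut -c 1-7)"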

And if you wanted to get really fancy, you could go with:

git grep "test" $(git rev-list --all) |
    fzf --cycle -i -e --no-multi \
        --ansi --preview="$_viewGitLogLine" \
        --header "enter: view, C-c: copy hash" \
        --bind "enter:execute:$_viewGitLogLine | less -R" \
        --bind "ctrl-c:execute:$_gitLogLineToHash |
        xclip -r -selection clipboard"

Wrapped in a function:

grep_all_commits_preview () {
    git grep "$1" $(git rev-list --all) |
        fzf --cycle -i -e --no-multi \
            --ansi --preview="$_viewGitLogLine" \
            --header "enter: view, C-c: copy hash" \
            --bind "enter:execute:$_viewGitLogLine |
              less -R" \
            --bind "ctrl-c:execute:$_gitLogLineToHash |
        xclip -r -selection clipboard"
}

This last function lets you search through all the results incrementally while displaying a preview of the selected commit (the changes made at that particular commit). If you want to see more of the diff than the preview displays, press <enter> (then q to quit the pager). If you want to copy the hash of a commit, press C-c (Control + c).

With this function, you can now instantly get a preview of the changes made to any line containing an expression for any file, at any commit, and copy the hash of the selected commit. This is really powerful.
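The fzf snippets above rely on two shell variables, $_viewGitLogLine and $_gitLogLineToHash, which are not defined on this page. The definitions below are only an assumption about what they could look like (they would need to be set, for instance in your .bashrc, before running the functions): the first extracts a short hash from the selected line, the second shows the corresponding commit.

```shell
# Hypothetical definitions (not taken from this workshop): fzf replaces {}
# with the currently selected line, quoted for the shell.
_gitLogLineToHash="echo {} | grep -o '[a-f0-9]\{7\}' | head -1"
_viewGitLogLine="$_gitLogLineToHash | xargs -I % git show --color=always %"

# Check the hash extraction without fzf by substituting a sample line for {}.
line='388fdc13de66537cac2169253cb385dfd409e710:adrian.txt:test'
short_hash=$(sh -c "$(printf '%s' "$_gitLogLineToHash" | sed "s/{}/'$line'/")")
echo "$short_hash"   # → 388fdc1
```

The extraction keeps the first run of seven hexadecimal characters, which is the start of the hash both in git grep results and in git log --oneline lines.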

Aliases

If you don’t want to type a series of flags all the time, you can configure aliases for Git. For instance, Alex Razoumov uses the alias git search for git grep --break --heading -n -i.

Let’s add to it the -p flag. Here is how you would set this alias:

git config --global alias.search 'grep --break --heading -n -i -p'

This setting gets added to your main Git configuration file (on Linux, by default, at ~/.gitconfig).
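To see what this looks like without touching your real configuration, here is a sketch that writes the same alias to a temporary file (standing in for ~/.gitconfig) with the --file option and reads it back:

```shell
# A temporary file plays the role of the global configuration file.
cfg=$(mktemp)
git config --file "$cfg" alias.search 'grep --break --heading -n -i -p'

# Read the alias back, then show the section Git wrote to the file.
alias_value=$(git config --file "$cfg" --get alias.search)
echo "$alias_value"   # → grep --break --heading -n -i -p
cat "$cfg"

rm -f "$cfg"
```

On your actual setup, git config --global --get alias.search prints the alias if it has been set.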

From there on, you can use your alias with:

git search test

Searching logs

The second thing that can happen is that you are looking for some pattern in your version control logs.

git log

git log lets you get information from the commit logs.

By default, it outputs all the commits of the current branch.

Let’s show the logs of the last 3 commits:

git log -3
commit 388fdc13de66537cac2169253cb385dfd409e710
Author: Marie-Helene Burle <marie.burle@westdri.ca>
Date:   Thu Dec 14 20:55:30 2023 -0800

    update gitignore

commit 423f454765d45e21e0ae401da0b3dec2d84113ce
Merge: 7342af5 818c32a
Author: Marie-Helene Burle <marie.burle@westdri.ca>
Date:   Tue Dec 12 17:30:07 2023 -0800

    Merge branch 'main' of github.com:prosoitos/practice_repo

commit 7342af5dfff53dc51dfaf99da1e29448fd253e03
Author: Marie-Helene Burle <marie.burle@westdri.ca>
Date:   Tue Dec 12 17:29:59 2023 -0800

    update gitignore

The output can be customized thanks to a plethora of options.

For instance, here are the logs of the last 15 commits, in a graph, with one line per commit:

git log --graph --oneline -n 15
* 388fdc1 update gitignore
*   423f454 Merge branch 'main' of github.com:prosoitos/practice_repo
|\  
| * 818c32a Delete ml_models directory
| * b3c2414 Created using Colaboratory
* | 7342af5 update gitignore
|/  
* e3cfb2e Update gitignore with Quarto files
* 15fdec6 Update README.org
* 15d4ee9 change values training
* 06efa34 add lots of code
* 1457143 remove stupid line
* 711e1dc add real py content to test_manual.py
* 90016aa adding new python file
*   2c0f612 Merge branch 'main' of github.com:prosoitos/git_workshop_collab
|\  
| *   6f7d03d Merge branch 'main' of https://github.com/prosoitos/git_workshop_collab into main
| |\  
| * \   3c53269 Merge branch 'main' of https://github.com/prosoitos/git_workshop_collab into main
| |\ \  

But git log also has flags that allow you to search for patterns.

Searching commit messages

One of the reasons it is so important to write informative commit messages is that they are key to finding commits later on.

To look for a pattern in all your commit messages, use git log --grep=<pattern>.

Let’s look for test in the commit messages and limit the output to 3 commits:

git log --grep=test -3
commit 711e1dc53011e5071b17dc7c35b516f6e066f396
Author: Marie-Helene Burle <marie.burle@westgrid.ca>
Date:   Tue Mar 15 11:52:48 2022 -0700

    add real py content to test_manual.py

commit a55ca0d60d82578c94bd49fc4ca987727b851216
Author: Manuelhrokr <zl.manuel@protonmail.com>
Date:   Thu Feb 17 15:19:42 2022 -0700

    new comment add just as test

commit ea74e46f487fba09c31524a110fdf060796e3cf8
Author: mpkin <mikin@physics.ubc.ca>
Date:   Thu Sep 23 14:51:24 2021 -0700

    Add test_mk.txt

For a more compact output:

git log --grep="test" -3 --oneline
711e1dc add real py content to test_manual.py
a55ca0d new comment add just as test
ea74e46 Add test_mk.txt
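By default, --grep is case sensitive. The sketch below (a throwaway repo with invented commit messages) shows how the -i flag of git log changes the result:

```shell
# Throwaway repo with three empty commits.
repo=$(mktemp -d)
git -C "$repo" init -q
commit () { git -C "$repo" -c user.name=Demo -c user.email=demo@example.com commit -q --allow-empty -m "$1"; }
commit "Add Test file"
commit "update readme"
commit "fix test"

# --format=%s prints one subject line per matching commit.
case_sensitive=$(git -C "$repo" log --grep=test --format=%s | wc -l)
case_insensitive=$(git -C "$repo" log -i --grep=test --format=%s | wc -l)
echo "without -i: $case_sensitive, with -i: $case_insensitive"

rm -rf "$repo"
```

Only "fix test" matches without -i, while "Add Test file" is also found with it.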

Here too, you can combine this with fzf, for instance:

git log --grep="test" | fzf --cycle -i -e

Or:

git log --grep="test" --oneline |
    fzf --cycle -i -e --no-multi \
        --ansi --preview="$_viewGitLogLine" \
        --header "enter: view, C-c: copy hash" \
        --bind "enter:execute:$_viewGitLogLine | less -R" \
        --bind "ctrl-c:execute:$_gitLogLineToHash |
        xclip -r -selection clipboard"

Changes made to a pattern

Remember that a function named test is defined in the file src/test_manuel.py. To see when a function was first created and each time it was modified afterwards, we can use the -L flag, which follows the function whose name matches a pattern in a given file:

git log -L :<pattern>:<file>

In our case:

git log -L :test:src/test_manuel.py
commit 711e1dc53011e5071b17dc7c35b516f6e066f396
Author: Marie-Helene Burle <marie.burle@westgrid.ca>
Date:   Tue Mar 15 11:52:48 2022 -0700

    add real py content to test_manual.py

diff --git a/src/test_manuel.py b/src/test_manuel.py
--- a/src/test_manuel.py
+++ b/src/test_manuel.py
@@ -1,1 +50,19 @@
-test
+def test(model, device, test_loader):
+    model.eval()
+    test_loss = 0
+    correct = 0
+    with torch.no_grad():
+        for data, target in test_loader:
+            data, target = data.to(device), target.to(device)
+            output = model(data)
+            test_loss += F.nll_loss(output, target, reduction='sum').item()  # sum up batch loss
+            pred = output.argmax(dim=1, keepdim=True)  # get the index of the max log-probability
+            correct += pred.eq(target.view_as(pred)).sum().item()
+
+    test_loss /= len(test_loader.dataset)
+
+    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
+        test_loss, correct, len(test_loader.dataset),
+        100. * correct / len(test_loader.dataset)))
+
+

commit 90016aa3ed3a6cf71e206392bbf10adfe1a14c17
Author: Manuelhrokr <zl.manuel@protonmail.com>
Date:   Thu Feb 17 15:33:03 2022 -0700

    adding new python file

diff --git a/src/test_manuel.py b/src/test_manuel.py
--- /dev/null
+++ b/src/test_manuel.py
@@ -0,0 +1,1 @@
+test

This is very useful if you want to see, for instance, changes made to a function in a script.
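As a minimal sketch on a throwaway repo (the file, functions, and commit messages are invented), the example below checks that -L only reports the commits touching the selected function:

```shell
# Throwaway repo: three commits, only the first two touch test().
repo=$(mktemp -d)
git -C "$repo" init -q
commit () { git -C "$repo" add f.py && git -C "$repo" -c user.name=Demo -c user.email=demo@example.com commit -q -m "$1"; }
printf 'def train():\n    return 1\n\ndef test():\n    return 2\n' > "$repo/f.py"
commit "add functions"
printf 'def train():\n    return 1\n\ndef test():\n    return 3\n' > "$repo/f.py"
commit "change test"
printf 'def train():\n    return 10\n\ndef test():\n    return 3\n' > "$repo/f.py"
commit "change train"

# Count the commits reported for the function whose name matches "test".
n_test_commits=$(git -C "$repo" log -L :test:f.py --format='commit: %s' | grep -c '^commit: ')
echo "$n_test_commits"   # → 2

rm -rf "$repo"
```

The commit "change train" is not listed since it only modifies lines outside of test.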

Changes in number of occurrences of a pattern

Now, if we want to list all commits that created a change in the number of occurrences of test in our project, we run:

git log -S test --oneline
818c32a Delete ml_models directory
b3c2414 Created using Colaboratory
711e1dc add real py content to test_manual.py
90016aa adding new python file
652faa5 delete my file
b684eac Deleted file
6717236 For collab
ca1845d delete alex.txt
6b56198 editing adrians text file
01a7358 test dtrad
e44a454 Create testAV1.txt
5ee88e6 For collab
cf3d4ea Collab-test
13faa1e test, test
0366115 Adrian's test file
9ebd3ce This is my test
6dfefa8 create redone17.txt
e43163c added alex.txt

This can be useful to identify the commit you need.
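Note that -S only reports commits in which the number of occurrences changes. As a point of comparison, the sketch below (a throwaway repo with invented content) contrasts it with the related -G flag of git log, which reports commits whose diff contains an added or removed line matching the pattern:

```shell
# Throwaway repo: the second commit edits a line containing "test"
# without changing how many times "test" occurs.
repo=$(mktemp -d)
git -C "$repo" init -q
commit () { git -C "$repo" add f.py && git -C "$repo" -c user.name=Demo -c user.email=demo@example.com commit -q -m "$1"; }
echo "test_loss = 0" > "$repo/f.py"
commit "add test_loss"
echo "test_loss = 1" > "$repo/f.py"
commit "change initial value"

pickaxe_s=$(git -C "$repo" log -S test --format=%s)
pickaxe_g=$(git -C "$repo" log -G test --format=%s)
echo "-S: $pickaxe_s"   # → -S: add test_loss
echo "-G:"
echo "$pickaxe_g"

rm -rf "$repo"
```

Here -S lists only the commit that introduced test, while -G lists both commits.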

TL;DR

Here are the search commands you are the most likely to use:

  • Search for a pattern in the current version of your tracked files:
git grep <pattern>
  • Search for a pattern in your files at a certain commit:
git grep <pattern> <commit>
  • Search for a pattern in your files in all the commits:
git grep <pattern> $(git rev-list --all)
  • Search for a pattern in your commit messages:
git log --grep=<pattern>

Now you should be able to find pretty much anything in your projects and their histories.