Searching a version-controlled project
What is the point of creating all these commits if you are unable to make use of them because you can’t find the information you need in them?
In this workshop, we will learn how to search:
- your files (at any of their versions) and
- your commit logs.
By the end of the workshop, you should be able to retrieve anything you need from your versioned project.
Prerequisites:
This special Git topic is suitable for people who already use Git.
You don’t need to be an expert, but we expect that you are able to run basic Git commands in the command line.
Installation
MacOS & Linux users:
Install Git from the official website.
Windows users:
Install Git for Windows. This will also install “Git Bash”, a Bash emulator.
Using Git
We will use Git from the command line throughout this workshop.
MacOS users: open “Terminal”.
Windows users: open “Git Bash”.
Linux users: open the terminal emulator of your choice.
Practice repo
Get a repo
You are welcome to use a repository of yours to follow this workshop. Alternatively, you can clone a practice repo I have on GitHub:
- Navigate to an appropriate location:
cd /path/to/appropriate/location
- Clone the repo:
# If you have set SSH for your GitHub account
git clone git@github.com:prosoitos/practice_repo.git
# If you haven't set SSH
git clone https://github.com/prosoitos/practice_repo.git
- Enter the repo:
cd practice_repo
Searching files
The first thing that can happen is that you are looking for a certain pattern somewhere in your project (for instance a certain function or a certain word).
git grep
The main command to look through versioned files is git grep.
You might be familiar with the command-line utility grep, which searches files for lines matching a certain pattern. git grep does a similar job with these differences:
- it is much faster since all files under version control are already indexed by Git,
- you can easily search any commit without having to check it out,
- it has features lacking in grep such as, for instance, pattern arithmetic or tree search using globs.
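As an illustration of the last point, here is a minimal sketch of a search restricted with globs. It runs in a throwaway repository so that it is self-contained; the file names are made up and are not part of the practice repo.

```shell
# Throwaway repository with hypothetical files (not the practice repo)
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"
mkdir src docs
echo "def test(): pass" > src/model.py
echo "This is a test" > docs/notes.md
git add .
git commit -q -m "Add demo files"

# Only search Python files (quote the glob so that the shell does not expand it)
py_matches=$(git grep test -- '*.py')
echo "$py_matches"

# Search everything except the docs directory
no_docs=$(git grep test -- . ':(exclude)docs')
echo "$no_docs"
```

The double dash separates the pattern from the paths to search.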
Let’s try it
By default, git grep searches recursively through the tracked files in the working directory (that is, the current version of the tracked files).
First, let’s look for the word test in the current version of the tracked files in the test repo:
git grep test
adrian.txt:Adrian's test text file.
formerlyadrian.txt:Adrian's test text file.
ms/protocol.md:This is my test.
ms/smabraha.txt:This is a test file that I wanted to make, then push it somehow
redone17.txt:this is a test file from redone17
src/test_manuel.py:def test(model, device, test_loader):
src/test_manuel.py: test_loss = 0
src/test_manuel.py: for data, target in test_loader:
src/test_manuel.py: test_loss += F.nll_loss(output, target, reduction='sum').item() # sum up batch loss
src/test_manuel.py: test_loss /= len(test_loader.dataset)
src/test_manuel.py: test_loss, correct, len(test_loader.dataset),
src/test_manuel.py: 100. * correct / len(test_loader.dataset)))
src/test_manuel.py: test_data = datasets.MNIST(
src/test_manuel.py: test_loader = torch.utils.data.DataLoader(test_data, batch_size=100)
src/test_manuel.py: test(model, device, test_loader)
testAV1.txt:This is a test
text-collab.txt:This is the collaboration testing
Let’s add blank lines between the results of each file for better readability:
git grep --break test
adrian.txt:Adrian's test text file.

formerlyadrian.txt:Adrian's test text file.

ms/protocol.md:This is my test.

ms/smabraha.txt:This is a test file that I wanted to make, then push it somehow

redone17.txt:this is a test file from redone17

src/test_manuel.py:def test(model, device, test_loader):
src/test_manuel.py: test_loss = 0
src/test_manuel.py: for data, target in test_loader:
src/test_manuel.py: test_loss += F.nll_loss(output, target, reduction='sum').item() # sum up batch loss
src/test_manuel.py: test_loss /= len(test_loader.dataset)
src/test_manuel.py: test_loss, correct, len(test_loader.dataset),
src/test_manuel.py: 100. * correct / len(test_loader.dataset)))
src/test_manuel.py: test_data = datasets.MNIST(
src/test_manuel.py: test_loader = torch.utils.data.DataLoader(test_data, batch_size=100)
src/test_manuel.py: test(model, device, test_loader)

testAV1.txt:This is a test

text-collab.txt:This is the collaboration testing
Let’s also put the file names on separate lines:
git grep --break --heading test
adrian.txt
Adrian's test text file.
formerlyadrian.txt
Adrian's test text file.
ms/protocol.md
This is my test.
ms/smabraha.txt
This is a test file that I wanted to make, then push it somehow
redone17.txt
this is a test file from redone17
src/test_manuel.py
def test(model, device, test_loader):
test_loss = 0
for data, target in test_loader:
test_loss += F.nll_loss(output, target, reduction='sum').item() # sum up batch loss
test_loss /= len(test_loader.dataset)
test_loss, correct, len(test_loader.dataset),
100. * correct / len(test_loader.dataset)))
test_data = datasets.MNIST(
test_loader = torch.utils.data.DataLoader(test_data, batch_size=100)
test(model, device, test_loader)
testAV1.txt
This is a test
text-collab.txt
This is the collaboration testing
We can display the line numbers for the results with the -n flag:
git grep --break --heading -n test
adrian.txt
1:Adrian's test text file.
formerlyadrian.txt
1:Adrian's test text file.
ms/protocol.md
9:This is my test.
ms/smabraha.txt
1:This is a test file that I wanted to make, then push it somehow
redone17.txt
1:this is a test file from redone17
src/test_manuel.py
50:def test(model, device, test_loader):
52: test_loss = 0
55: for data, target in test_loader:
58: test_loss += F.nll_loss(output, target, reduction='sum').item() # sum up batch loss
62: test_loss /= len(test_loader.dataset)
65: test_loss, correct, len(test_loader.dataset),
66: 100. * correct / len(test_loader.dataset)))
84: test_data = datasets.MNIST(
90: test_loader = torch.utils.data.DataLoader(test_data, batch_size=100)
97: test(model, device, test_loader)
testAV1.txt
1:This is a test
text-collab.txt
1:This is the collaboration testing
Notice how the results for the file src/test_manuel.py involve functions. It would be very convenient to have the names of the functions in which test appears.
We can do this with the -p flag:
git grep --break --heading -p test src/test_manuel.py
src/test_manuel.py
def train(model, device, train_loader, optimizer, epoch):
def test(model, device, test_loader):
test_loss = 0
for data, target in test_loader:
test_loss += F.nll_loss(output, target, reduction='sum').item() # sum up batch loss
test_loss /= len(test_loader.dataset)
test_loss, correct, len(test_loader.dataset),
100. * correct / len(test_loader.dataset)))
def main():
test_data = datasets.MNIST(
test_loader = torch.utils.data.DataLoader(test_data, batch_size=100)
test(model, device, test_loader)
We added the argument src/test_manuel.py to limit the search to that file.
We can now see that the word test appears in the functions test and main.
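If the name of the function is not enough and you would like to see its whole body, git grep also has the -W (--function-context) flag. Here is a minimal sketch in a throwaway repository; the file model.py and its content are made up for the example.

```shell
# Throwaway repository with a hypothetical Python file
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"
cat > model.py << 'EOF'
def train(values):
    return max(values)

def evaluate(values):
    count = len(values)
    return sum(values) / count
EOF
git add model.py
git commit -q -m "Add model.py"

# -W prints the whole function surrounding each match,
# not only the matching line
context=$(git grep -n -W sum -- model.py)
echo "$context"
```

Only the last line of evaluate matches sum, yet the rest of that function is printed as context while train is left out.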
Now, instead of printing all the matching lines, let’s print the number of matches per file:
git grep -c test
adrian.txt:1
formerlyadrian.txt:1
ms/protocol.md:1
ms/smabraha.txt:1
redone17.txt:1
src/test_manuel.py:10
testAV1.txt:1
text-collab.txt:1
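If you only care about which files match, the -l flag prints the file names alone, and -L prints the files without any match. A minimal sketch in a throwaway repository (the three files are made up):

```shell
# Throwaway repository with hypothetical files
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"
echo "first test" > a.txt
echo "second test" > b.txt
echo "nothing here" > c.txt
git add .
git commit -q -m "Add demo files"

# -l (--files-with-matches): names of the files containing the pattern
with_match=$(git grep -l test)
echo "$with_match"

# -L (--files-without-match): names of the files that do not contain it
without_match=$(git grep -L test)
echo "$without_match"
```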
More complex patterns
git grep in fact searches for regular expressions. test is a regular expression matching test, but we can look for more complex patterns.
Let’s look for image:
git grep image
No output means that the search is not returning any result.
Let’s make this search case insensitive:
git grep -i image
src/new_file.py:from PIL import Image
src/new_file.py:berlin1_lr = Image.open("/home/marie/parvus/pwg/wtm/slides/static/img/upscaling/lr/berlin_1945_1.jpg")
src/new_file.py:berlin1_hr = Image.open("/home/marie/parvus/pwg/wtm/slides/static/img/upscaling/hr/berlin_1945_1.png")
We are now getting some results as Image was present in three lines of the file src/new_file.py.
Let’s now search for data:
git grep data
.gitignore:data/
ms/protocol.md:Collected and analyzed amazing data
src/new_file.py:from datasets import load_dataset
src/new_file.py:set5 = load_dataset('eugenesiow/Set5', 'bicubic_x4', split='validation')
src/test_manuel.py:from torchvision import datasets, transforms
src/test_manuel.py: for batch_idx, (data, target) in enumerate(train_loader):
src/test_manuel.py: data, target = data.to(device), target.to(device)
src/test_manuel.py: output = model(data)
src/test_manuel.py: epoch, batch_idx * len(data), len(train_loader.dataset),
src/test_manuel.py: for data, target in test_loader:
src/test_manuel.py: data, target = data.to(device), target.to(device)
src/test_manuel.py: output = model(data)
src/test_manuel.py: test_loss /= len(test_loader.dataset)
src/test_manuel.py: test_loss, correct, len(test_loader.dataset),
src/test_manuel.py: 100. * correct / len(test_loader.dataset)))
src/test_manuel.py: train_data = datasets.MNIST(
src/test_manuel.py: '~/parvus/pwg/wtm/tml/data',
src/test_manuel.py: # '~/projects/def-sponsor00/data',
src/test_manuel.py: test_data = datasets.MNIST(
src/test_manuel.py: '~/parvus/pwg/wtm/tml/data',
src/test_manuel.py: # '~/projects/def-sponsor00/data',
src/test_manuel.py: train_loader = torch.utils.data.DataLoader(train_data, batch_size=50)
src/test_manuel.py: test_loader = torch.utils.data.DataLoader(test_data, batch_size=100)
We are getting results for the word data, but also for the pattern data in longer expressions such as train_data or dataset. If we only want results for the word data, we can use the -w flag:
git grep -w data
.gitignore:data/
ms/protocol.md:Collected and analyzed amazing data
src/test_manuel.py: for batch_idx, (data, target) in enumerate(train_loader):
src/test_manuel.py: data, target = data.to(device), target.to(device)
src/test_manuel.py: output = model(data)
src/test_manuel.py: epoch, batch_idx * len(data), len(train_loader.dataset),
src/test_manuel.py: for data, target in test_loader:
src/test_manuel.py: data, target = data.to(device), target.to(device)
src/test_manuel.py: output = model(data)
src/test_manuel.py: '~/parvus/pwg/wtm/tml/data',
src/test_manuel.py: # '~/projects/def-sponsor00/data',
src/test_manuel.py: '~/parvus/pwg/wtm/tml/data',
src/test_manuel.py: # '~/projects/def-sponsor00/data',
src/test_manuel.py: train_loader = torch.utils.data.DataLoader(train_data, batch_size=50)
src/test_manuel.py: test_loader = torch.utils.data.DataLoader(test_data, batch_size=100)
Now, let’s use a more complex regular expression. We want the counts for the pattern ".*_.*" (i.e. any line containing an underscore, as in snake case names such as train_loader):
git grep -c ".*_.*"
.gitignore:2
src/new_file.py:22
src/test_manuel.py:29
Let’s print the first 3 results per file (the -m flag requires Git 2.38 or later):
git grep -m 3 ".*_.*"
.gitignore:hidden_file
.gitignore:search_cache/
src/new_file.py:from datasets import load_dataset
src/new_file.py:set5 = load_dataset('eugenesiow/Set5', 'bicubic_x4', split='validation')
src/new_file.py:set5.column_names
src/test_manuel.py:from torch.optim.lr_scheduler import StepLR
src/test_manuel.py: def __init__(self):
src/test_manuel.py: super(Net, self).__init__()
As you can see, our results also include __init__, which is not what we were looking for. So let’s exclude the lines containing __:
git grep -m 3 -e ".*_.*" --and --not -e "__"
.gitignore:hidden_file
.gitignore:search_cache/
src/new_file.py:from datasets import load_dataset
src/new_file.py:set5 = load_dataset('eugenesiow/Set5', 'bicubic_x4', split='validation')
src/new_file.py:set5.column_names
src/test_manuel.py:from torch.optim.lr_scheduler import StepLR
src/test_manuel.py: x = F.max_pool2d(x, 2)
src/test_manuel.py: output = F.log_softmax(x, dim=1)
For simple searches, you don’t have to use the -e flag before the pattern you are searching for. Here however, our command has gotten complex enough that we have to use it before each pattern.
Let’s make sure this worked as expected:
git grep -c ".*_.*"
echo "---"
git grep -c "__"
echo "---"
git grep -ce ".*_.*" --and --not -e "__"
.gitignore:2
src/new_file.py:22
src/test_manuel.py:29
---
src/test_manuel.py:2
---
.gitignore:2
src/new_file.py:22
src/test_manuel.py:27
There were 2 lines matching __ in src/test_manuel.py and we have indeed excluded them from our search.
Extended regular expressions are also supported, with the -E flag.
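Here is a minimal sketch of what -E allows, in a throwaway repository (the file train.py and its content are made up). With extended regular expressions, parentheses and the vertical bar can be used directly for grouping and alternation.

```shell
# Throwaway repository with a hypothetical file
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"
cat > train.py << 'EOF'
test_loss = 0
test_data = 1
test_loader = 2
EOF
git add train.py
git commit -q -m "Add train.py"

# Match test_loss or test_data, but not test_loader
matches=$(git grep -E "test_(loss|data)")
echo "$matches"
```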
Searching other trees
So far, we have searched the current version of tracked files, but we can just as easily search files at any commit.
Let’s search for test
in the tracked files 20 commits ago:
git grep test HEAD~20
HEAD~20:adrian.txt:Adrian's test text file.
HEAD~20:formerlyadrian.txt:Adrian's test text file.
HEAD~20:ms/protocol.md:This is my test.
HEAD~20:ms/smabraha.txt:This is a test file that I wanted to make, then push it somehow
HEAD~20:redone17.txt:this is a test file from redone17
HEAD~20:testAV1.txt:This is a test
HEAD~20:text-collab.txt:This is the collaboration testing
As you can see, the file src/test_manuel.py is not in the results. Either it didn’t exist or it didn’t have the word test at that commit.
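The commit can be given in any form that Git understands (a hash, a tag, a branch name, etc.) and it can be combined with a path to narrow the search down. A minimal sketch in a throwaway repository with two commits (file names made up):

```shell
# Throwaway repository with two hypothetical commits
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"
echo "This is a test" > notes.txt
git add notes.txt
git commit -q -m "Add notes"
mkdir src
echo "def test(): pass" > src/run.py
git add src
git commit -q -m "Add script"

# One commit ago: src/run.py did not exist yet
before=$(git grep -l test HEAD~1)
echo "$before"

# Latest commit, restricted to the src directory
in_src=$(git grep -l test HEAD -- src)
echo "$in_src"
```

Each result is prefixed with the commit that was searched, as in the output above.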
If you want to search tracked files AND untracked files, you need to use the --untracked flag.
Let’s create a new (thus untracked) file with some content including the word test:
echo "This is a test" > newfile
Now compare the following:
git grep -c test
adrian.txt:1
formerlyadrian.txt:1
ms/protocol.md:1
ms/smabraha.txt:1
redone17.txt:1
src/test_manuel.py:10
testAV1.txt:1
text-collab.txt:1
with:
git grep -c --untracked test
adrian.txt:1
formerlyadrian.txt:1
ms/protocol.md:1
ms/smabraha.txt:1
newfile:1
redone17.txt:1
src/test_manuel.py:10
testAV1.txt:1
text-collab.txt:1
This last result also returned our untracked file newfile.
If you want to search untracked and ignored files (meaning all your files), use the flags --untracked --no-exclude-standard.
Let’s see what the .gitignore file contains:
cat .gitignore
data/
output/
hidden_file
search_cache/
search.qmd
search.rmarkdown
The directory data is in .gitignore. This means that it is not under version control and it thus doesn’t exist in our repo (since we cloned our repo, we only have the version-controlled files). Let’s create it:
mkdir data
Now, let’s create a file in it that contains test:
echo "And another test" > data/file
We can rerun our previous two searches to verify that files excluded from version control are not searched:
git grep -c test
adrian.txt:1
formerlyadrian.txt:1
ms/protocol.md:1
ms/smabraha.txt:1
redone17.txt:1
src/test_manuel.py:10
testAV1.txt:1
text-collab.txt:1
git grep -c --untracked test
adrian.txt:1
formerlyadrian.txt:1
ms/protocol.md:1
ms/smabraha.txt:1
newfile:1
redone17.txt:1
src/test_manuel.py:10
testAV1.txt:1
text-collab.txt:1
And now, let’s try:
git grep -c --untracked --no-exclude-standard test
adrian.txt:1
data/file:1
formerlyadrian.txt:1
ms/protocol.md:1
ms/smabraha.txt:1
newfile:1
redone17.txt:1
search.qmd:41
search.rmarkdown:41
src/test_manuel.py:10
testAV1.txt:1
text-collab.txt:1
data/file, despite being excluded from version control, is also searched.
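One more variation on which files get searched: with the --cached flag, git grep looks at the version of the files stored in the staging area rather than at the working directory. A minimal sketch in a throwaway repository (file name and content made up):

```shell
# Throwaway repository with a hypothetical file
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"
echo "draft" > notes.txt
git add notes.txt
git commit -q -m "Add notes"

echo "staged test" > notes.txt
git add notes.txt                  # the staged version contains "test"
echo "final wording" > notes.txt   # the working directory version does not

# Default: working directory version of the tracked files
# (no match here, so git grep exits with 1, hence the "|| true")
worktree=$(git grep -c test || true)

# --cached: version stored in the staging area
staged=$(git grep --cached -c test)

echo "working directory: '$worktree', staging area: '$staged'"
```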
Searching all commits
We saw that git grep <pattern> <commit> can search a pattern in any commit. Now, what if we want to search all commits for a pattern?
For this, we pass the expression $(git rev-list --all) in lieu of <commit>.
git rev-list --all lists all the commits reachable from any reference, in a form that can be used as an argument to other commands. The $() construct (command substitution) runs the expression inside it and passes the result as an argument.
To search for test in all the commits, we thus run:
git grep "test" $(git rev-list --all)
I am not running this command here as it has a huge output. Instead, I will limit the search to the last two commits:
git grep "test" $(git rev-list --all -2)
e3cfb2ebbcb77e52851c32fbba1f6b8b0c788a29:adrian.txt:Adrian's test text file.
e3cfb2ebbcb77e52851c32fbba1f6b8b0c788a29:formerlyadrian.txt:Adrian's test text file.
e3cfb2ebbcb77e52851c32fbba1f6b8b0c788a29:ms/protocol.md:This is my test.
e3cfb2ebbcb77e52851c32fbba1f6b8b0c788a29:ms/smabraha.txt:This is a test file that I wanted to make, then push it somehow
e3cfb2ebbcb77e52851c32fbba1f6b8b0c788a29:redone17.txt:this is a test file from redone17
e3cfb2ebbcb77e52851c32fbba1f6b8b0c788a29:src/test_manuel.py:def test(model, device, test_loader):
e3cfb2ebbcb77e52851c32fbba1f6b8b0c788a29:src/test_manuel.py: test_loss = 0
e3cfb2ebbcb77e52851c32fbba1f6b8b0c788a29:src/test_manuel.py: for data, target in test_loader:
e3cfb2ebbcb77e52851c32fbba1f6b8b0c788a29:src/test_manuel.py: test_loss += F.nll_loss(output, target, reduction='sum').item() # sum up batch loss
e3cfb2ebbcb77e52851c32fbba1f6b8b0c788a29:src/test_manuel.py: test_loss /= len(test_loader.dataset)
e3cfb2ebbcb77e52851c32fbba1f6b8b0c788a29:src/test_manuel.py: test_loss, correct, len(test_loader.dataset),
e3cfb2ebbcb77e52851c32fbba1f6b8b0c788a29:src/test_manuel.py: 100. * correct / len(test_loader.dataset)))
e3cfb2ebbcb77e52851c32fbba1f6b8b0c788a29:src/test_manuel.py: test_data = datasets.MNIST(
e3cfb2ebbcb77e52851c32fbba1f6b8b0c788a29:src/test_manuel.py: test_loader = torch.utils.data.DataLoader(test_data, batch_size=100)
e3cfb2ebbcb77e52851c32fbba1f6b8b0c788a29:src/test_manuel.py: test(model, device, test_loader)
e3cfb2ebbcb77e52851c32fbba1f6b8b0c788a29:testAV1.txt:This is a test
e3cfb2ebbcb77e52851c32fbba1f6b8b0c788a29:text-collab.txt:This is the collaboration testing
15fdec6afb552e4ba2ec5f7a0b371543c9966c27:adrian.txt:Adrian's test text file.
15fdec6afb552e4ba2ec5f7a0b371543c9966c27:formerlyadrian.txt:Adrian's test text file.
15fdec6afb552e4ba2ec5f7a0b371543c9966c27:ms/protocol.md:This is my test.
15fdec6afb552e4ba2ec5f7a0b371543c9966c27:ms/smabraha.txt:This is a test file that I wanted to make, then push it somehow
15fdec6afb552e4ba2ec5f7a0b371543c9966c27:redone17.txt:this is a test file from redone17
15fdec6afb552e4ba2ec5f7a0b371543c9966c27:src/test_manuel.py:def test(model, device, test_loader):
15fdec6afb552e4ba2ec5f7a0b371543c9966c27:src/test_manuel.py: test_loss = 0
15fdec6afb552e4ba2ec5f7a0b371543c9966c27:src/test_manuel.py: for data, target in test_loader:
15fdec6afb552e4ba2ec5f7a0b371543c9966c27:src/test_manuel.py: test_loss += F.nll_loss(output, target, reduction='sum').item() # sum up batch loss
15fdec6afb552e4ba2ec5f7a0b371543c9966c27:src/test_manuel.py: test_loss /= len(test_loader.dataset)
15fdec6afb552e4ba2ec5f7a0b371543c9966c27:src/test_manuel.py: test_loss, correct, len(test_loader.dataset),
15fdec6afb552e4ba2ec5f7a0b371543c9966c27:src/test_manuel.py: 100. * correct / len(test_loader.dataset)))
15fdec6afb552e4ba2ec5f7a0b371543c9966c27:src/test_manuel.py: test_data = datasets.MNIST(
15fdec6afb552e4ba2ec5f7a0b371543c9966c27:src/test_manuel.py: test_loader = torch.utils.data.DataLoader(test_data, batch_size=100)
15fdec6afb552e4ba2ec5f7a0b371543c9966c27:src/test_manuel.py: test(model, device, test_loader)
15fdec6afb552e4ba2ec5f7a0b371543c9966c27:testAV1.txt:This is a test
15fdec6afb552e4ba2ec5f7a0b371543c9966c27:text-collab.txt:This is the collaboration testing
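On a repository with a long history, $(git rev-list --all) can expand to more arguments than the shell accepts on a single command line. A possible workaround, sketched below in a throwaway repository with made-up commits, is to let xargs feed the commits to git grep. The sketch also shows one way to reduce the output to the list of commits that contain the pattern.

```shell
# Throwaway repository with three hypothetical commits
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"
echo "hello" > notes.txt
git add notes.txt
git commit -q -m "First commit"
echo "This is a test" >> notes.txt
git commit -q -am "Second commit"
echo "more text" >> notes.txt
git commit -q -am "Third commit"

# xargs passes the commits to git grep, in several batches if needed;
# -l prints one "<hash>:<file>" line per matching file;
# cut and sort -u keep a single line per commit hash
commits=$(git rev-list --all | xargs git grep -l test | cut -d: -f1 | sort -u)
echo "$commits"
```

Here, only the second and third commits contain test, so two hashes are printed.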
In combination with the fuzzy finder tool fzf, this can make finding a particular commit extremely easy.
For instance, the code below allows you to dynamically search in the result through incremental completion:
git grep "test" $(git rev-list --all) | fzf --cycle -i -e
Or even better, you can automatically copy the short form of the hash of the selected commit to the clipboard so that you can use it with git show, git checkout, etc.:
git grep "test" $(git rev-list --all) |
fzf --cycle -i -e |
cut -c 1-7 |
xclip -r -selection clipboard
Here, I am using xclip to copy to the clipboard as I am on Linux. Depending on your OS you might need to use a different tool.
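As a rough guide, here is a small helper that picks a clipboard command from the name of the operating system. The function name clipboard_cmd is made up for this example, and the tools listed are only the usual candidates (pbcopy on macOS, clip in Git Bash on Windows): adapt them to what is installed on your machine.

```shell
# Hypothetical helper: print a clipboard command for the current OS.
# The optional argument overrides the OS name returned by "uname -s".
clipboard_cmd () {
    case "${1:-$(uname -s)}" in
        Darwin)               echo "pbcopy" ;;
        Linux)                echo "xclip -r -selection clipboard" ;;
        MINGW*|MSYS*|CYGWIN*) echo "clip" ;;
        *)                    echo "cat" ;;   # fallback: print to the terminal
    esac
}

# Show which command would be used on this machine
clipboard_cmd
```

You could then end the pipeline above with $(clipboard_cmd) instead of the xclip line.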
Of course, you can create a function in your .bashrc file with such code so that you wouldn’t have to type it each time:
grep_all_commits () {
git grep "$1" $(git rev-list --all) |
fzf --cycle -i -e |
cut -c 1-7 |
xclip -r -selection clipboard
}
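Alternatively, you can pass the result directly to whatever Git command you want to use that commit for. Since Git commands such as git show expect the commit as an argument rather than on standard input, we use xargs to turn the piped hash into an argument.
Here is an example with git show:
git grep "test" $(git rev-list --all) |
fzf --cycle -i -e |
cut -c 1-7 |
xargs git show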
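And if you wanted to get really fancy, you could go with the code below. It assumes that you have defined two shell variables elsewhere (for instance in your .bashrc), which are not defined in this workshop: $_viewGitLogLine, holding a command that displays the commit of the selected line, and $_gitLogLineToHash, holding a command that extracts its hash: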
git grep "test" $(git rev-list --all) |
fzf --cycle -i -e --no-multi \
--ansi --preview="$_viewGitLogLine" \
--header "enter: view, C-c: copy hash" \
--bind "enter:execute:$_viewGitLogLine | less -R" \
--bind "ctrl-c:execute:$_gitLogLineToHash |
xclip -r -selection clipboard"
Wrapped in a function:
grep_all_commits_preview () {
git grep "$1" $(git rev-list --all) |
fzf --cycle -i -e --no-multi \
--ansi --preview="$_viewGitLogLine" \
--header "enter: view, C-c: copy hash" \
--bind "enter:execute:$_viewGitLogLine |
less -R" \
--bind "ctrl-c:execute:$_gitLogLineToHash |
xclip -r -selection clipboard"
}
This last function allows you to search through all the results in an incremental fashion while displaying a preview of the selected diff (the changes made at that particular commit). If you want to see more of the diff than the preview displays, press <enter> (then q to quit the pager). If you want to copy the hash of a commit, press C-c (Control + c).
With this function, you can now instantly get a preview of the changes made to any line containing an expression for any file, at any commit, and copy the hash of the selected commit. This is really powerful.
Aliases
If you don’t want to type a series of flags all the time, you can configure aliases for Git. For instance, Alex Razoumov uses the alias git search for git grep --break --heading -n -i.
Let’s add to it the -p flag. Here is how you would set this alias:
git config --global alias.search 'grep --break --heading -n -i -p'
This setting gets added to your main Git configuration file (on Linux, by default, at ~/.gitconfig).
From there on, you can use your alias with:
git search test
adrian.txt
1:Adrian's test text file.
formerlyadrian.txt
1:Adrian's test text file.
ms/protocol.md
6=using our awesome Rscript.
9:This is my test.
ms/smabraha.txt
1:This is a test file that I wanted to make, then push it somehow
redone17.txt
1:this is a test file from redone17
src/test_manuel.py
35=def train(model, device, train_loader, optimizer, epoch):
50:def test(model, device, test_loader):
52: test_loss = 0
55: for data, target in test_loader:
58: test_loss += F.nll_loss(output, target, reduction='sum').item() # sum up batch loss
62: test_loss /= len(test_loader.dataset)
64: print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
65: test_loss, correct, len(test_loader.dataset),
66: 100. * correct / len(test_loader.dataset)))
69=def main():
84: test_data = datasets.MNIST(
90: test_loader = torch.utils.data.DataLoader(test_data, batch_size=100)
97: test(model, device, test_loader)
testAV1.txt
1:This is a test
text-collab.txt
1:This is the collaboration testing
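If you would like to try an alias without touching your main configuration, or to check what an alias expands to, here is a minimal sketch. It sets the alias for a single throwaway repository only (no --global flag); the file is made up.

```shell
# Throwaway repository with a hypothetical file
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"
echo "This is a Test" > notes.txt
git add notes.txt
git commit -q -m "Add notes"

# Without --global, the alias is stored in this repo's .git/config only
git config alias.search 'grep --break --heading -n -i -p'

# Check what the alias expands to
expansion=$(git config --get alias.search)
echo "$expansion"

# Use the alias (the -i flag makes "test" match "Test")
result=$(git search test)
echo "$result"
```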
Searching logs
The second thing that can happen is that you are looking for some pattern in your version control logs.
git log
git log lets you retrieve information from the commit logs.
By default, it outputs all the commits of the current branch.
Let’s show the logs of the last 3 commits:
git log -3
commit e3cfb2ebbcb77e52851c32fbba1f6b8b0c788a29
Author: Marie-Helene Burle <marie.burle@westdri.ca>
Date: Sat Jan 7 22:32:23 2023 -0800
Update gitignore with Quarto files
commit 15fdec6afb552e4ba2ec5f7a0b371543c9966c27
Author: Marie-Helene Burle <marie.burle@westgrid.ca>
Date: Fri Jan 6 10:18:28 2023 -0800
Update README.org
commit 15d4ee937db18fb26f84d17ec4be3f0c81614a1c
Author: Marie-Helene Burle <marie.burle@westgrid.ca>
Date: Wed Mar 16 10:55:28 2022 -0700
change values training
The output can be customized thanks to a plethora of options.
For instance, here are the logs of the last 15 commits, in a graph, with one line per commit:
git log --graph --oneline -n 15
* e3cfb2e Update gitignore with Quarto files
* 15fdec6 Update README.org
* 15d4ee9 change values training
* 06efa34 add lots of code
* 1457143 remove stupid line
* 711e1dc add real py content to test_manual.py
* 90016aa adding new python file
* 2c0f612 Merge branch 'main' of github.com:prosoitos/git_workshop_collab
|\
| * 6f7d03d Merge branch 'main' of https://github.com/prosoitos/git_workshop_collab into main
| |\
| * \ 3c53269 Merge branch 'main' of https://github.com/prosoitos/git_workshop_collab into main
| |\ \
| * \ \ eef5b78 Merge branch 'main' of https://github.com/prosoitos/git_workshop_collab into main
| |\ \ \
| * | | | a55ca0d new comment add just as test
* | | | | dedc94f Merge branch 'main' of github.com:prosoitos/git_workshop_collab
|\ \ \ \ \
| | |_|_|/
| |/| | |
| * | | | b861a65 Merge branch 'main' of https://github.com/prosoitos/git_workshop_collab
| |\ \ \ \
| | | |_|/
| | |/| |
| | * | | 35e8d5a Merge branch 'main' of github.com:prosoitos/git_workshop_collab
| | |\ \ \
| | | | |/
| | | |/|
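As another example of customization, the --pretty=format: option lets you choose exactly which fields to print for each commit. A minimal sketch in a throwaway repository (the commits are made up):

```shell
# Throwaway repository with two hypothetical commits
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"
echo "one" > notes.txt
git add notes.txt
git commit -q -m "First commit"
echo "two" >> notes.txt
git commit -q -am "Second commit"

# %h: short hash, %an: author name, %ad: author date, %s: subject line
git log --pretty=format:'%h | %an | %ad | %s' --date=short

# Keep only the subject lines, one per commit
subjects=$(git log --pretty=format:'%s')
echo "$subjects"
```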
But git log also has flags that let you search for patterns.
Searching commit messages
One of the reasons it is so important to write informative commit messages is that they are key to finding commits later on.
To look for a pattern in all your commit messages, use git log --grep=<pattern>.
Let’s look for test in the commit messages and limit the output to 3 commits:
git log --grep=test -3
commit 711e1dc53011e5071b17dc7c35b516f6e066f396
Author: Marie-Helene Burle <marie.burle@westgrid.ca>
Date: Tue Mar 15 11:52:48 2022 -0700
add real py content to test_manual.py
commit a55ca0d60d82578c94bd49fc4ca987727b851216
Author: Manuelhrokr <zl.manuel@protonmail.com>
Date: Thu Feb 17 15:19:42 2022 -0700
new comment add just as test
commit ea74e46f487fba09c31524a110fdf060796e3cf8
Author: mpkin <mikin@physics.ubc.ca>
Date: Thu Sep 23 14:51:24 2021 -0700
Add test_mk.txt
For a more compact output:
git log --grep="test" -3 --oneline
711e1dc add real py content to test_manual.py
a55ca0d new comment add just as test
ea74e46 Add test_mk.txt
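A few flags can refine such a search: -i makes it case insensitive and, when several --grep are given, --all-match only keeps the commits matching all of them. A minimal sketch in a throwaway repository (the commit messages are made up):

```shell
# Throwaway repository with three hypothetical commits
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"
echo 1 > notes.txt
git add notes.txt
git commit -q -m "Add test file"
echo 2 >> notes.txt
git commit -q -am "Fix Test loader"
echo 3 >> notes.txt
git commit -q -am "Update README"

# --grep is case sensitive by default
sensitive=$(git log --format=%s --grep=test)
echo "$sensitive"

# -i makes the search case insensitive
insensitive=$(git log --format=%s -i --grep=test)
echo "$insensitive"

# Several --grep list the commits matching ANY of the patterns;
# --all-match only keeps the commits matching ALL of them
both=$(git log --format=%s --grep=test --grep=Add --all-match)
echo "$both"
```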
Here too, you can use this in combination with fzf, for instance with:
git log --grep="test" | fzf --cycle -i -e
Or:
git log --grep="test" --oneline |
fzf --cycle -i -e --no-multi \
--ansi --preview="$_viewGitLogLine" \
--header "enter: view, C-c: copy hash" \
--bind "enter:execute:$_viewGitLogLine | less -R" \
--bind "ctrl-c:execute:$_gitLogLineToHash |
xclip -r -selection clipboard"
Changes made to a pattern
Remember that test was present in the file src/test_manuel.py. If we want to see when a function was first created and then each time it was modified, we use the -L flag, followed by a regular expression matching the name of the function and by the name of the file, in this fashion:
git log -L :<pattern>:<file>
In our case:
git log -L :test:src/test_manuel.py
commit 711e1dc53011e5071b17dc7c35b516f6e066f396
Author: Marie-Helene Burle <marie.burle@westgrid.ca>
Date: Tue Mar 15 11:52:48 2022 -0700
add real py content to test_manual.py
diff --git a/src/test_manuel.py b/src/test_manuel.py
--- a/src/test_manuel.py
+++ b/src/test_manuel.py
@@ -1,1 +50,19 @@
-test
+def test(model, device, test_loader):
+ model.eval()
+ test_loss = 0
+ correct = 0
+ with torch.no_grad():
+ for data, target in test_loader:
+ data, target = data.to(device), target.to(device)
+ output = model(data)
+ test_loss += F.nll_loss(output, target, reduction='sum').item() # sum up batch loss
+ pred = output.argmax(dim=1, keepdim=True) # get the index of the max log-probability
+ correct += pred.eq(target.view_as(pred)).sum().item()
+
+ test_loss /= len(test_loader.dataset)
+
+ print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
+ test_loss, correct, len(test_loader.dataset),
+ 100. * correct / len(test_loader.dataset)))
+
+
commit 90016aa3ed3a6cf71e206392bbf10adfe1a14c17
Author: Manuelhrokr <zl.manuel@protonmail.com>
Date: Thu Feb 17 15:33:03 2022 -0700
adding new python file
diff --git a/src/test_manuel.py b/src/test_manuel.py
--- /dev/null
+++ b/src/test_manuel.py
@@ -0,0 +1,1 @@
+test
This is very useful if you want to see, for instance, changes made to a function in a script.
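The -L flag also accepts a range of line numbers instead of a function name, in the form -L <start>,<end>:<file>. A minimal sketch in a throwaway repository (file and commits made up):

```shell
# Throwaway repository with three hypothetical commits
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"
printf 'one\ntwo\nthree\nfour\nfive\n' > notes.txt
git add notes.txt
git commit -q -m "Add notes"
printf 'one\nTWO\nthree\nfour\nfive\n' > notes.txt
git commit -q -am "Edit line two"
printf 'one\nTWO\nthree\nfour\nFIVE\n' > notes.txt
git commit -q -am "Edit line five"

# Follow lines 2 to 3 of notes.txt through the history
history=$(git log -L 2,3:notes.txt --format='commit: %s')
echo "$history"

# Keep only the commit subjects from that output
touching=$(echo "$history" | grep '^commit: ')
echo "$touching"
```

The commit that only edited line 5 is not listed since it did not touch the lines we are following.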
Changes in number of occurrences of a pattern
Now, if we want to list all commits that created a change in the number of occurrences of test in our project, we run:
git log -S test --oneline
711e1dc add real py content to test_manual.py
90016aa adding new python file
652faa5 delete my file
b684eac Deleted file
6717236 For collab
ca1845d delete alex.txt
6b56198 editing adrians text file
01a7358 test dtrad
e44a454 Create testAV1.txt
5ee88e6 For collab
cf3d4ea Collab-test
13faa1e test, test
0366115 Adrian's test file
9ebd3ce This is my test
6dfefa8 create redone17.txt
e43163c added alex.txt
This can be useful to identify the commit you need.
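A related flag is -G, which lists the commits whose diff contains an added or removed line matching a regular expression, whether or not the number of occurrences changed. The sketch below, in a throwaway repository with made-up commits, shows the difference between the two flags.

```shell
# Throwaway repository with three hypothetical commits
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"
echo "test = 1" > config.py
git add config.py
git commit -q -m "Add setting"
echo "test = 2" > config.py
git commit -q -am "Change value"
echo "other = 3" > config.py
git commit -q -am "Rename setting"

# -S: commits that changed the NUMBER of occurrences of the string
count_changed=$(git log -S test --format=%s)
echo "$count_changed"

# -G: commits with an added or removed line matching the regular expression
line_changed=$(git log -G test --format=%s)
echo "$line_changed"
```

The commit "Change value" modified a line containing test without changing how many times test appears, so only -G reports it.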
TL;DR
Here are the search commands you are the most likely to use:
- Search for a pattern in the current version of your tracked files:
git grep <pattern>
- Search for a pattern in your files at a certain commit:
git grep <pattern> <commit>
- Search for a pattern in your files in all the commits:
git grep <pattern> $(git rev-list --all)
- Search for a pattern in your commit messages:
git log --grep=<pattern>
Now you should be able to find pretty much anything in your projects and their histories.