Install Git for Windows. This will also install “Git Bash”, a Bash emulator.
Using Git
We will use Git from the command line throughout this workshop.
macOS users: open “Terminal”.
Windows users: open “Git Bash”.
Linux users: open the terminal emulator of your choice.
Practice repo
Get a repo
You are welcome to use a repository of yours to follow this workshop. Alternatively, you can clone a practice repo I have on GitHub:
Navigate to an appropriate location:
cd /path/to/appropriate/location
Clone the repo:
# If you have set SSH for your GitHub account
git clone git@github.com:prosoitos/practice_repo.git
# If you haven't set SSH
git clone https://github.com/prosoitos/practice_repo.git
Enter the repo:
cd practice_repo
Searching files
A common need is to find a certain pattern somewhere in your project (for instance a function name or a particular word).
git grep
The main command to look through versioned files is git grep.
You might be familiar with the command-line utility grep, which lets you search files for lines matching a certain pattern. git grep does a similar job, with these differences:
it is often faster, since Git already knows which files are under version control and only searches those,
you can easily search any commit without having to check it out.
By default, git grep searches recursively through the tracked files in the working directory (that is, the current version of the tracked files).
First, let’s look for the word test in the current version of the tracked files in the practice repo:
git grep test
adrian.txt:Adrian's test text file.
formerlyadrian.txt:Adrian's test text file.
ms/protocol.md:This is my test.
ms/smabraha.txt:This is a test file that I wanted to make, then push it somehow
redone17.txt:this is a test file from redone17
src/test_manuel.py:def test(model, device, test_loader):
src/test_manuel.py: test_loss = 0
src/test_manuel.py: for data, target in test_loader:
src/test_manuel.py: test_loss += F.nll_loss(output, target, reduction='sum').item() # sum up batch loss
src/test_manuel.py: test_loss /= len(test_loader.dataset)
src/test_manuel.py: test_loss, correct, len(test_loader.dataset),
src/test_manuel.py: 100. * correct / len(test_loader.dataset)))
src/test_manuel.py: test_data = datasets.MNIST(
src/test_manuel.py: test_loader = torch.utils.data.DataLoader(test_data, batch_size=100)
src/test_manuel.py: test(model, device, test_loader)
testAV1.txt:This is a test
text-collab.txt:This is the collaboration testing
Let’s add blank lines between the results of each file for better readability:
git grep --break test
adrian.txt:Adrian's test text file.

formerlyadrian.txt:Adrian's test text file.

ms/protocol.md:This is my test.

ms/smabraha.txt:This is a test file that I wanted to make, then push it somehow

redone17.txt:this is a test file from redone17

src/test_manuel.py:def test(model, device, test_loader):
src/test_manuel.py: test_loss = 0
src/test_manuel.py: for data, target in test_loader:
src/test_manuel.py: test_loss += F.nll_loss(output, target, reduction='sum').item() # sum up batch loss
src/test_manuel.py: test_loss /= len(test_loader.dataset)
src/test_manuel.py: test_loss, correct, len(test_loader.dataset),
src/test_manuel.py: 100. * correct / len(test_loader.dataset)))
src/test_manuel.py: test_data = datasets.MNIST(
src/test_manuel.py: test_loader = torch.utils.data.DataLoader(test_data, batch_size=100)
src/test_manuel.py: test(model, device, test_loader)

testAV1.txt:This is a test

text-collab.txt:This is the collaboration testing
Let’s also put the file names on separate lines:
git grep --break --heading test
adrian.txt
Adrian's test text file.

formerlyadrian.txt
Adrian's test text file.

ms/protocol.md
This is my test.

ms/smabraha.txt
This is a test file that I wanted to make, then push it somehow

redone17.txt
this is a test file from redone17

src/test_manuel.py
def test(model, device, test_loader):
test_loss = 0
for data, target in test_loader:
test_loss += F.nll_loss(output, target, reduction='sum').item() # sum up batch loss
test_loss /= len(test_loader.dataset)
test_loss, correct, len(test_loader.dataset),
100. * correct / len(test_loader.dataset)))
test_data = datasets.MNIST(
test_loader = torch.utils.data.DataLoader(test_data, batch_size=100)
test(model, device, test_loader)

testAV1.txt
This is a test

text-collab.txt
This is the collaboration testing
We can display the line numbers for the results with the -n flag:
git grep --break --heading -n test
adrian.txt
1:Adrian's test text file.

formerlyadrian.txt
1:Adrian's test text file.

ms/protocol.md
9:This is my test.

ms/smabraha.txt
1:This is a test file that I wanted to make, then push it somehow

redone17.txt
1:this is a test file from redone17

src/test_manuel.py
50:def test(model, device, test_loader):
52: test_loss = 0
55: for data, target in test_loader:
58: test_loss += F.nll_loss(output, target, reduction='sum').item() # sum up batch loss
62: test_loss /= len(test_loader.dataset)
65: test_loss, correct, len(test_loader.dataset),
66: 100. * correct / len(test_loader.dataset)))
84: test_data = datasets.MNIST(
90: test_loader = torch.utils.data.DataLoader(test_data, batch_size=100)
97: test(model, device, test_loader)

testAV1.txt
1:This is a test

text-collab.txt
1:This is the collaboration testing
Notice how the results for the file src/test_manuel.py involve functions. It would be very convenient to have the names of the functions in which test appears.
We can do this with the -p flag:
git grep --break --heading -p test src/test_manuel.py
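The output of this command is not reproduced here. As a minimal sketch of what -p does, the block below builds a throwaway repository and searches it. The file demo.py and its contents are hypothetical and are not part of the practice repo.

```shell
# Throwaway repo with hypothetical content (not the practice repo)
repo=$(mktemp -d) && cd "$repo" && git init -q
printf 'def evaluate(model):\n    loss = 0\n    test_loss = loss\n' > demo.py
git add demo.py

# -p (--show-function) also prints the closest preceding line that looks like
# a function header, so each match comes with the function that contains it
with_function=$(git grep -n -p test_loss)
echo "$with_function"
```

In this sketch, the def evaluate(model): line should be printed just before the matching line, even though it does not contain test_loss itself.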
We are getting results for the word data, but also for the pattern data in longer expressions such as train_data or dataset. If we only want results for the word data, we can use the -w flag:
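The exact command is not shown here, so the following is only a sketch of the difference -w makes, run in a throwaway repository. The file demo.py and its three lines are hypothetical, chosen to mimic the data, train_data and dataset cases mentioned above.

```shell
# Throwaway repo with hypothetical content (not the practice repo)
repo=$(mktemp -d) && cd "$repo" && git init -q
printf 'data = load()\ntrain_data = 2\ndataset = 3\n' > demo.py
git add demo.py

# Without -w: any line containing the characters "data" matches
all_matches=$(git grep -n data)
echo "$all_matches"

# With -w: "data" must stand alone as a word, so train_data and dataset
# are skipped (letters, digits and underscores count as word characters)
word_matches=$(git grep -n -w data)
echo "$word_matches"
```

The first search should report all three lines, the second only the first line.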
For simple searches, you don’t have to use the -e flag before the pattern you are searching for. Here however, our command has gotten complex enough that we have to use it before each pattern.
There were 2 lines matching __ in src/test_manuel.py and we have indeed excluded them from our search.
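The command that produced this result is not shown here. One way to combine patterns like this, offered as an assumption rather than as the command used in the workshop, is to join several -e patterns with --and and --not. The sketch below uses a throwaway repository with a hypothetical demo.py.

```shell
# Throwaway repo with hypothetical content (not the practice repo)
repo=$(mktemp -d) && cd "$repo" && git init -q
printf 'data = 1\n__data__ = 2\nother = 3\n' > demo.py
git add demo.py

# Keep lines that contain "data" AND do NOT contain "__";
# each pattern needs its own -e once patterns are combined
filtered=$(git grep -n -e data --and --not -e __)
echo "$filtered"
```

Only the first line of demo.py should remain: the second contains data but is excluded because it also contains __.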
Extended regular expressions are also supported, with the -E flag.
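As a small illustration of -E, the sketch below uses alternation, which needs extended syntax when written with plain parentheses and a bare |. It runs in a throwaway repository, and the file demo.py and the variable names in it are hypothetical.

```shell
# Throwaway repo with hypothetical content (not the practice repo)
repo=$(mktemp -d) && cd "$repo" && git init -q
printf 'test_loss = 0\ntrain_loss = 0\nloss = 0\n' > demo.py
git add demo.py

# -E enables extended regular expressions: (a|b) means "a or b"
matches=$(git grep -n -E '(test|train)_loss')
echo "$matches"
```

This should match test_loss and train_loss but not the bare loss on the third line.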
Searching other trees
So far, we have searched the current version of tracked files, but we can just as easily search files at any commit.
Let’s search for test in the tracked files 20 commits ago:
git grep test HEAD~20
HEAD~20:adrian.txt:Adrian's test text file.
HEAD~20:formerlyadrian.txt:Adrian's test text file.
HEAD~20:ms/protocol.md:This is my test.
HEAD~20:ms/smabraha.txt:This is a test file that I wanted to make, then push it somehow
HEAD~20:redone17.txt:this is a test file from redone17
HEAD~20:testAV1.txt:This is a test
HEAD~20:text-collab.txt:This is the collaboration testing
As you can see, the file src/test_manuel.py is not in the results. Either it didn’t exist or it didn’t have the word test at that commit.
If you want to search tracked files AND untracked files, you need to use the --untracked flag.
Let’s create a new (thus untracked) file with some content including the word test:
The directory data is in .gitignore. This means that it is not under version control and it thus doesn’t exist in our repo (since we cloned our repo, we only have the version-controlled files). Let’s create it:
mkdir data
Now, let’s create a file in it that contains test:
echo "And another test" > data/file
We can rerun our previous two searches to verify that files excluded from version control are not searched:
data/file, despite being excluded from version control, is also searched.
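The commands for these searches are not shown here. The sketch below illustrates the three levels of search in a throwaway repository, assuming that the flag used to include ignored files is --no-exclude-standard (a git grep option that is only meaningful together with --untracked). All file names are hypothetical.

```shell
# Throwaway repo with hypothetical content (not the practice repo)
repo=$(mktemp -d) && cd "$repo" && git init -q
echo "tracked test" > tracked.txt
echo "data/" > .gitignore
git add tracked.txt .gitignore
echo "untracked test" > new.txt            # untracked, not ignored
mkdir data
echo "And another test" > data/file        # untracked and ignored

# -l lists only the names of the files that contain a match
tracked_only=$(git grep -l test)
with_untracked=$(git grep -l --untracked test)
with_ignored=$(git grep -l --untracked --no-exclude-standard test)

echo "$tracked_only"
echo "$with_untracked"
echo "$with_ignored"
```

The first search should list only tracked.txt, the second should add new.txt, and only the third should reach data/file.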
Searching all commits
We saw that git grep <pattern> <commit> can search for a pattern in any commit. Now, what if we want to search all commits for a pattern?
For this, we pass the expression $(git rev-list --all) in lieu of <commit>.
git rev-list --all creates a list of all the commits in a form that can be used as arguments to other commands. The $() syntax (command substitution) runs the expression inside it and passes the result as an argument.
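To make these two pieces concrete, here is a minimal sketch in a throwaway repository with two commits. The file name, commit messages and identity settings are hypothetical, and git show is used only as an example of a command that accepts a list of commits.

```shell
# Throwaway repo with two hypothetical commits (not the practice repo)
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.name "Workshop Demo" && git config user.email "demo@example.com"
echo "first line" > notes.txt && git add notes.txt && git commit -q -m "add notes"
echo "second line" >> notes.txt && git commit -q -a -m "extend notes"

# git rev-list --all prints one full commit hash per line
all_commits=$(git rev-list --all)
echo "$all_commits"

# $(...) runs the inner command first and pastes its output (the hashes)
# into the outer command line; here git show prints one summary per commit
summaries=$(git show -s --format='%h %s' $(git rev-list --all))
echo "$summaries"
```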
To search for test in all the commits, we thus run:
git grep "test" $(git rev-list --all)
I am not running this command as it has a huge output. Instead, I will limit the search to the last two commits:
git grep "test" $(git rev-list --all -2)
388fdc13de66537cac2169253cb385dfd409e710:adrian.txt:Adrian's test text file.
388fdc13de66537cac2169253cb385dfd409e710:formerlyadrian.txt:Adrian's test text file.
388fdc13de66537cac2169253cb385dfd409e710:ms/protocol.md:This is my test.
388fdc13de66537cac2169253cb385dfd409e710:ms/smabraha.txt:This is a test file that I wanted to make, then push it somehow
388fdc13de66537cac2169253cb385dfd409e710:redone17.txt:this is a test file from redone17
388fdc13de66537cac2169253cb385dfd409e710:src/test_manuel.py:def test(model, device, test_loader):
388fdc13de66537cac2169253cb385dfd409e710:src/test_manuel.py: test_loss = 0
388fdc13de66537cac2169253cb385dfd409e710:src/test_manuel.py: for data, target in test_loader:
388fdc13de66537cac2169253cb385dfd409e710:src/test_manuel.py: test_loss += F.nll_loss(output, target, reduction='sum').item() # sum up batch loss
388fdc13de66537cac2169253cb385dfd409e710:src/test_manuel.py: test_loss /= len(test_loader.dataset)
388fdc13de66537cac2169253cb385dfd409e710:src/test_manuel.py: test_loss, correct, len(test_loader.dataset),
388fdc13de66537cac2169253cb385dfd409e710:src/test_manuel.py: 100. * correct / len(test_loader.dataset)))
388fdc13de66537cac2169253cb385dfd409e710:src/test_manuel.py: test_data = datasets.MNIST(
388fdc13de66537cac2169253cb385dfd409e710:src/test_manuel.py: test_loader = torch.utils.data.DataLoader(test_data, batch_size=100)
388fdc13de66537cac2169253cb385dfd409e710:src/test_manuel.py: test(model, device, test_loader)
388fdc13de66537cac2169253cb385dfd409e710:testAV1.txt:This is a test
388fdc13de66537cac2169253cb385dfd409e710:text-collab.txt:This is the collaboration testing
423f454765d45e21e0ae401da0b3dec2d84113ce:adrian.txt:Adrian's test text file.
423f454765d45e21e0ae401da0b3dec2d84113ce:formerlyadrian.txt:Adrian's test text file.
423f454765d45e21e0ae401da0b3dec2d84113ce:ms/protocol.md:This is my test.
423f454765d45e21e0ae401da0b3dec2d84113ce:ms/smabraha.txt:This is a test file that I wanted to make, then push it somehow
423f454765d45e21e0ae401da0b3dec2d84113ce:redone17.txt:this is a test file from redone17
423f454765d45e21e0ae401da0b3dec2d84113ce:src/test_manuel.py:def test(model, device, test_loader):
423f454765d45e21e0ae401da0b3dec2d84113ce:src/test_manuel.py: test_loss = 0
423f454765d45e21e0ae401da0b3dec2d84113ce:src/test_manuel.py: for data, target in test_loader:
423f454765d45e21e0ae401da0b3dec2d84113ce:src/test_manuel.py: test_loss += F.nll_loss(output, target, reduction='sum').item() # sum up batch loss
423f454765d45e21e0ae401da0b3dec2d84113ce:src/test_manuel.py: test_loss /= len(test_loader.dataset)
423f454765d45e21e0ae401da0b3dec2d84113ce:src/test_manuel.py: test_loss, correct, len(test_loader.dataset),
423f454765d45e21e0ae401da0b3dec2d84113ce:src/test_manuel.py: 100. * correct / len(test_loader.dataset)))
423f454765d45e21e0ae401da0b3dec2d84113ce:src/test_manuel.py: test_data = datasets.MNIST(
423f454765d45e21e0ae401da0b3dec2d84113ce:src/test_manuel.py: test_loader = torch.utils.data.DataLoader(test_data, batch_size=100)
423f454765d45e21e0ae401da0b3dec2d84113ce:src/test_manuel.py: test(model, device, test_loader)
423f454765d45e21e0ae401da0b3dec2d84113ce:testAV1.txt:This is a test
423f454765d45e21e0ae401da0b3dec2d84113ce:text-collab.txt:This is the collaboration testing
In combination with the fuzzy finder tool fzf, this can make finding a particular commit extremely easy.
For instance, the code below allows you to dynamically search in the result through incremental completion:
Or even better, you can automatically copy the short form of the hash of the selected commit to clipboard so that you can use it with git show, git checkout, etc.:
This last function allows you to search through all the results in an incremental fashion while displaying a preview of the selected diff (the changes made at that particular commit). If you want to see more of the diff than the preview displays, press <enter> (then q to quit the pager). If you want to copy the hash of a commit, press C-c (Control + c).
With this function, you can now instantly get a preview of the changes made to any line containing an expression for any file, at any commit, and copy the hash of the selected commit. This is really powerful.
Aliases
If you don’t want to type a series of flags all the time, you can configure aliases for Git. For instance, Alex Razoumov uses the alias git search for git grep --break --heading -n -i.
Let’s add to it the -p flag. Here is how you would set this alias:
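The original command is not shown here. Git aliases are set with git config alias.<name>, so a plausible version is sketched below. To avoid touching your real configuration, the sketch sets the alias locally in a throwaway repository with a hypothetical demo.py; adding --global to the git config command would instead make the alias available in all your repositories.

```shell
# Throwaway repo with hypothetical content (not the practice repo)
repo=$(mktemp -d) && cd "$repo" && git init -q
printf 'def evaluate(model):\n    Test_loss = 0\n' > demo.py
git add demo.py

# Define the alias for this repo only (use "git config --global ..." to
# define it for every repo)
git config alias.search 'grep --break --heading -n -i -p'

# "git search" now expands to the full git grep command;
# -i makes the search case-insensitive, so test_loss matches Test_loss
result=$(git search test_loss)
echo "$result"
```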
Remember that test was present in the file src/test_manuel.py. If we want to see when the pattern was first created and then each time it was modified, we use the -L flag in this fashion:
git log -L :<pattern>:<file>
In our case:
git log -L :test:src/test_manuel.py
commit 711e1dc53011e5071b17dc7c35b516f6e066f396
Author: Marie-Helene Burle <marie.burle@westgrid.ca>
Date: Tue Mar 15 11:52:48 2022 -0700
add real py content to test_manual.py
diff --git a/src/test_manuel.py b/src/test_manuel.py
--- a/src/test_manuel.py
+++ b/src/test_manuel.py
@@ -1,1 +50,19 @@
-test
+def test(model, device, test_loader):
+ model.eval()
+ test_loss = 0
+ correct = 0
+ with torch.no_grad():
+ for data, target in test_loader:
+ data, target = data.to(device), target.to(device)
+ output = model(data)
+ test_loss += F.nll_loss(output, target, reduction='sum').item() # sum up batch loss
+ pred = output.argmax(dim=1, keepdim=True) # get the index of the max log-probability
+ correct += pred.eq(target.view_as(pred)).sum().item()
+
+ test_loss /= len(test_loader.dataset)
+
+ print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
+ test_loss, correct, len(test_loader.dataset),
+ 100. * correct / len(test_loader.dataset)))
+
+
commit 90016aa3ed3a6cf71e206392bbf10adfe1a14c17
Author: Manuelhrokr <zl.manuel@protonmail.com>
Date: Thu Feb 17 15:33:03 2022 -0700
adding new python file
diff --git a/src/test_manuel.py b/src/test_manuel.py
--- /dev/null
+++ b/src/test_manuel.py
@@ -0,0 +1,1 @@
+test
This is very useful if you want to see, for instance, changes made to a function in a script.
Changes in number of occurrences of a pattern
Now, if we want to list all the commits that changed the number of occurrences of test in our project, we run:
git log -S test --oneline
818c32a Delete ml_models directory
b3c2414 Created using Colaboratory
711e1dc add real py content to test_manual.py
90016aa adding new python file
652faa5 delete my file
b684eac Deleted file
6717236 For collab
ca1845d delete alex.txt
6b56198 editing adrians text file
01a7358 test dtrad
e44a454 Create testAV1.txt
5ee88e6 For collab
cf3d4ea Collab-test
13faa1e test, test
0366115 Adrian's test file
9ebd3ce This is my test
6dfefa8 create redone17.txt
e43163c added alex.txt
This can be useful to identify the commit you need.
TL;DR
Here are the search commands you are most likely to use:
Search for a pattern in the current version of your tracked files:
git grep <pattern>
Search for a pattern in your files at a certain commit:
git grep <pattern> <commit>
Search for a pattern in your files in all the commits:
git grep <pattern> $(git rev-list --all)
Search for a pattern in your commit messages:
git log --grep=<pattern>
Now you should be able to find pretty much anything in your projects and their histories.