Searching a version-controlled project

Author

Marie-Hélène Burle

What is the point of creating all these commits if you are unable to make use of them because you can’t find the information you need in them?

In this workshop, we will learn how to search:

  • your files (at any of their versions) and
  • your commit logs.

By the end of the workshop, you should be able to retrieve anything you need from your versioned project.

Prerequisites:

This special Git topic is suitable for people who already use Git.

You don’t need to be an expert, but we expect that you are able to run basic Git commands in the command line.

Installation

macOS & Linux users:

Install Git from the official website.

Windows users:

Install Git for Windows. This will also install “Git Bash”, a Bash emulator.

Using Git

We will use Git from the command line throughout this workshop.

macOS users:    open “Terminal”.
Windows users:   open “Git Bash”.
Linux users:    open the terminal emulator of your choice.

Practice repo

Get a repo

You are welcome to use a repository of yours to follow this workshop. Alternatively, you can clone a practice repo I have on GitHub:

  1. Navigate to an appropriate location:
cd /path/to/appropriate/location
  2. Clone the repo:
# If you have set up SSH for your GitHub account
git clone git@github.com:prosoitos/practice_repo.git
# If you haven't set up SSH
git clone https://github.com/prosoitos/practice_repo.git
  3. Enter the repo:
cd practice_repo

Searching files

A common situation is that you are looking for a certain pattern somewhere in your project (for instance a certain function or a certain word).

git grep

The main command to look through versioned files is git grep.

You might be familiar with the command-line utility grep, which lets you search files for lines matching a certain pattern. git grep does a similar job, with these differences:

  • it is much faster since all files under version control are already indexed by Git,
  • you can easily search any commit without having to check it out,
  • it has features lacking in grep such as, for instance, pattern arithmetic or tree search using globs.
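As a minimal sketch of the last two points, the commands below build a throwaway repository (the directory, file name, and commit messages are made up for this illustration), then search an older commit without checking it out and combine two patterns with --and:

```shell
# Hypothetical throwaway repository, created only for this illustration
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

# First commit: notes.txt mentions "alpha"
echo "alpha version of the notes" > notes.txt
git add notes.txt
git commit -q -m "Add notes"

# Second commit: "alpha" is gone from the current version
echo "beta version of the notes" > notes.txt
git commit -q -am "Update notes"

# Current version: no match (git grep prints nothing and exits with status 1)
current=$(git grep alpha || true)

# Previous commit, searched directly without checking it out
previous=$(git grep alpha HEAD~1)
echo "$previous"

# Pattern combination: lines containing both "beta" and "notes"
both=$(git grep -e beta --and -e notes)
echo "$both"
```

When a revision is given, each result is prefixed with that revision, which makes it clear which version of the file matched.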

Let’s try it

By default, git grep searches recursively through the tracked files in the working directory (that is, the current version of the tracked files).
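To illustrate what "tracked files" means here, this small sketch uses a throwaway repository (all names are hypothetical) with one committed file and one file that was never added. The --untracked flag extends the search to the latter:

```shell
# Hypothetical throwaway repository, created only for this illustration
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

# One file under version control
echo "a test line" > tracked.txt
git add tracked.txt
git commit -q -m "Add tracked file"

# One file that Git does not track
echo "another test line" > untracked.txt

# Default behaviour: only tracked files are searched
default_hits=$(git grep test)
echo "$default_hits"

# With --untracked, untracked files are searched as well
all_hits=$(git grep --untracked test)
echo "$all_hits"
```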

First, let’s look for the word test in the current version of the tracked files in the test repo:

git grep test
intro_aliases.qmd:Now, let's build an alias for a more complex command: `git grep "test" $(git rev-list --all)`. This example
intro_aliases.qmd:from the *"Searching a Git project"* section below will search for the string "test" in all previous
intro_aliases.qmd:commits. There are two problems with this command: (1) it takes an argument (the string "test"), and (2) it
intro_aliases.qmd:git search test
intro_aliases.qmd:should search the entire current Git project history for "test".
intro_branches.qmd:git branch test
intro_branches.qmd:git switch test
intro_branches.qmd:* test
intro_branches.qmd:The `*` shows the branch you are currently on (i.e. the branch to which `HEAD` points to). In our example, the project has two branches and we are on the branch `test`.
intro_branches.qmd:git diff main test
intro_branches.qmd:When you are happy with the changes you made on your test branch, you can merge it into `main`.
intro_branches.qmd:If you have only created new commits on the branch `test`, the merge is called a "fast-forward merge" because `main` and `test` have not diverged: it is simply a question of having `main` catch up to `test`.
intro_branches.qmd:git merge test
intro_branches.qmd:Then, usually, you delete the branch `test` as it has served its purpose:
intro_branches.qmd:git branch -d test
intro_branches.qmd:Alternatively, you can switch back to `test` and do the next bit of experimental work on it. This allows to keep `main` free of mishaps and bad developments.
intro_branches.qmd:Let's go back to our situation before we created the branch `test`:
intro_branches.qmd:This time, you create a branch called `test2`:
intro_branches.qmd:To merge your branch `test2` into `main`, a new commit is now required. Git will create this new commit automatically. As long as there is no conflict, it is just as easy as a fast-forward merge:
intro_branches.qmd:git merge test2
intro_branches.qmd:After which, you can delete the (now useless) test branch (with `git branch -d test2`):
intro_branches.qmd:>>>>>>> test2
intro_intro_old.qmd:The pointer `HEAD`, which normally points to the branch `main` which itself points to latest commit, can be moved around. By moving `HEAD` to any commit, you can revisit the state of your project at that particular version.
intro_intro_old.qmd:Instead of working on your branch `main`, you create a test branch and work on it (so `HEAD` is on the branch `test` and both move along as you create commits):
intro_intro_old.qmd:When you are happy with the changes you made on your test branch, you decide to merge `main` onto it.
intro_intro_old.qmd:Then you do the fast-forward merge from `main` onto `test` (so `main` catches up to `test`):
intro_intro_old.qmd:Then, usually, you delete the branch `test` as it has served its purpose (with `git branch -d test`). Alternatively, you can switch back to it and do the next bit of experimental work in it.
intro_intro_old.qmd:This allows to keep `main` free of possible mishaps and bad developments (if you aren't happy with the work you did on your test branch, you can simply delete it and Git will clean the commits that are on it but not on `main` during the next garbage collection.
intro_intro_old.qmd:You create a test branch and switch to it:
intro_intro_old.qmd:To merge your main branch and your test branch, a new commit is now required (note that the command is the same as in the case of a fast-forward merge: `git merge`. Git will create the new commit automatically. As long as there is no conflict, it is just as easy as a fast-forward merge. We will talk later about resolving conflicts).
intro_intro_old.qmd:After which, you can delete the (now useless) test branch (with `git branch -d test2`):
intro_remotes.qmd:Click on the `Code` green drop-down button, select SSH [if you have set SSH for your GitHub account](https://docs.github.com/en/free-pro-team@latest/github/authenticating-to-github/connecting-to-github-with-ssh) or HTTPS and copy the address.
intro_revisiting_old_commits_alternate.qmd:The pointer `HEAD`, which normally points to the branch `main` which itself points to latest commit, can be moved around. By moving `HEAD` to any commit, you can revisit the state of your project at that particular version.
intro_undo.qmd:Here is a common scenario: you make a commit, then realize that you forgot to include some changes in that commit; or you aren't happy with the commit message; or both. You can edit your latest commit with the `--amend` flag:
ws_collab.qmd:Click on the `Code` green drop-down button, select SSH [if you have set SSH for your GitHub account](https://docs.github.com/en/free-pro-team@latest/github/authenticating-to-github/connecting-to-github-with-ssh) or HTTPS and copy the address.

Let’s add blank lines between the results of each file for better readability:

git grep --break test
intro_aliases.qmd:Now, let's build an alias for a more complex command: `git grep "test" $(git rev-list --all)`. This example
intro_aliases.qmd:from the *"Searching a Git project"* section below will search for the string "test" in all previous
intro_aliases.qmd:commits. There are two problems with this command: (1) it takes an argument (the string "test"), and (2) it
intro_aliases.qmd:git search test
intro_aliases.qmd:should search the entire current Git project history for "test".

intro_branches.qmd:git branch test
intro_branches.qmd:git switch test
intro_branches.qmd:* test
intro_branches.qmd:The `*` shows the branch you are currently on (i.e. the branch to which `HEAD` points to). In our example, the project has two branches and we are on the branch `test`.
intro_branches.qmd:git diff main test
intro_branches.qmd:When you are happy with the changes you made on your test branch, you can merge it into `main`.
intro_branches.qmd:If you have only created new commits on the branch `test`, the merge is called a "fast-forward merge" because `main` and `test` have not diverged: it is simply a question of having `main` catch up to `test`.
intro_branches.qmd:git merge test
intro_branches.qmd:Then, usually, you delete the branch `test` as it has served its purpose:
intro_branches.qmd:git branch -d test
intro_branches.qmd:Alternatively, you can switch back to `test` and do the next bit of experimental work on it. This allows to keep `main` free of mishaps and bad developments.
intro_branches.qmd:Let's go back to our situation before we created the branch `test`:
intro_branches.qmd:This time, you create a branch called `test2`:
intro_branches.qmd:To merge your branch `test2` into `main`, a new commit is now required. Git will create this new commit automatically. As long as there is no conflict, it is just as easy as a fast-forward merge:
intro_branches.qmd:git merge test2
intro_branches.qmd:After which, you can delete the (now useless) test branch (with `git branch -d test2`):
intro_branches.qmd:>>>>>>> test2

intro_intro_old.qmd:The pointer `HEAD`, which normally points to the branch `main` which itself points to latest commit, can be moved around. By moving `HEAD` to any commit, you can revisit the state of your project at that particular version.
intro_intro_old.qmd:Instead of working on your branch `main`, you create a test branch and work on it (so `HEAD` is on the branch `test` and both move along as you create commits):
intro_intro_old.qmd:When you are happy with the changes you made on your test branch, you decide to merge `main` onto it.
intro_intro_old.qmd:Then you do the fast-forward merge from `main` onto `test` (so `main` catches up to `test`):
intro_intro_old.qmd:Then, usually, you delete the branch `test` as it has served its purpose (with `git branch -d test`). Alternatively, you can switch back to it and do the next bit of experimental work in it.
intro_intro_old.qmd:This allows to keep `main` free of possible mishaps and bad developments (if you aren't happy with the work you did on your test branch, you can simply delete it and Git will clean the commits that are on it but not on `main` during the next garbage collection.
intro_intro_old.qmd:You create a test branch and switch to it:
intro_intro_old.qmd:To merge your main branch and your test branch, a new commit is now required (note that the command is the same as in the case of a fast-forward merge: `git merge`. Git will create the new commit automatically. As long as there is no conflict, it is just as easy as a fast-forward merge. We will talk later about resolving conflicts).
intro_intro_old.qmd:After which, you can delete the (now useless) test branch (with `git branch -d test2`):

intro_remotes.qmd:Click on the `Code` green drop-down button, select SSH [if you have set SSH for your GitHub account](https://docs.github.com/en/free-pro-team@latest/github/authenticating-to-github/connecting-to-github-with-ssh) or HTTPS and copy the address.

intro_revisiting_old_commits_alternate.qmd:The pointer `HEAD`, which normally points to the branch `main` which itself points to latest commit, can be moved around. By moving `HEAD` to any commit, you can revisit the state of your project at that particular version.

intro_undo.qmd:Here is a common scenario: you make a commit, then realize that you forgot to include some changes in that commit; or you aren't happy with the commit message; or both. You can edit your latest commit with the `--amend` flag:

ws_collab.qmd:Click on the `Code` green drop-down button, select SSH [if you have set SSH for your GitHub account](https://docs.github.com/en/free-pro-team@latest/github/authenticating-to-github/connecting-to-github-with-ssh) or HTTPS and copy the address.

Let’s also put the file names on separate lines:

git grep --break --heading test
intro_aliases.qmd
Now, let's build an alias for a more complex command: `git grep "test" $(git rev-list --all)`. This example
from the *"Searching a Git project"* section below will search for the string "test" in all previous
commits. There are two problems with this command: (1) it takes an argument (the string "test"), and (2) it
git search test
should search the entire current Git project history for "test".

intro_branches.qmd
git branch test
git switch test
* test
The `*` shows the branch you are currently on (i.e. the branch to which `HEAD` points to). In our example, the project has two branches and we are on the branch `test`.
git diff main test
When you are happy with the changes you made on your test branch, you can merge it into `main`.
If you have only created new commits on the branch `test`, the merge is called a "fast-forward merge" because `main` and `test` have not diverged: it is simply a question of having `main` catch up to `test`.
git merge test
Then, usually, you delete the branch `test` as it has served its purpose:
git branch -d test
Alternatively, you can switch back to `test` and do the next bit of experimental work on it. This allows to keep `main` free of mishaps and bad developments.
Let's go back to our situation before we created the branch `test`:
This time, you create a branch called `test2`:
To merge your branch `test2` into `main`, a new commit is now required. Git will create this new commit automatically. As long as there is no conflict, it is just as easy as a fast-forward merge:
git merge test2
After which, you can delete the (now useless) test branch (with `git branch -d test2`):
>>>>>>> test2

intro_intro_old.qmd
The pointer `HEAD`, which normally points to the branch `main` which itself points to latest commit, can be moved around. By moving `HEAD` to any commit, you can revisit the state of your project at that particular version.
Instead of working on your branch `main`, you create a test branch and work on it (so `HEAD` is on the branch `test` and both move along as you create commits):
When you are happy with the changes you made on your test branch, you decide to merge `main` onto it.
Then you do the fast-forward merge from `main` onto `test` (so `main` catches up to `test`):
Then, usually, you delete the branch `test` as it has served its purpose (with `git branch -d test`). Alternatively, you can switch back to it and do the next bit of experimental work in it.
This allows to keep `main` free of possible mishaps and bad developments (if you aren't happy with the work you did on your test branch, you can simply delete it and Git will clean the commits that are on it but not on `main` during the next garbage collection.
You create a test branch and switch to it:
To merge your main branch and your test branch, a new commit is now required (note that the command is the same as in the case of a fast-forward merge: `git merge`. Git will create the new commit automatically. As long as there is no conflict, it is just as easy as a fast-forward merge. We will talk later about resolving conflicts).
After which, you can delete the (now useless) test branch (with `git branch -d test2`):

intro_remotes.qmd
Click on the `Code` green drop-down button, select SSH [if you have set SSH for your GitHub account](https://docs.github.com/en/free-pro-team@latest/github/authenticating-to-github/connecting-to-github-with-ssh) or HTTPS and copy the address.

intro_revisiting_old_commits_alternate.qmd
The pointer `HEAD`, which normally points to the branch `main` which itself points to latest commit, can be moved around. By moving `HEAD` to any commit, you can revisit the state of your project at that particular version.

intro_undo.qmd
Here is a common scenario: you make a commit, then realize that you forgot to include some changes in that commit; or you aren't happy with the commit message; or both. You can edit your latest commit with the `--amend` flag:

ws_collab.qmd
Click on the `Code` green drop-down button, select SSH [if you have set SSH for your GitHub account](https://docs.github.com/en/free-pro-team@latest/github/authenticating-to-github/connecting-to-github-with-ssh) or HTTPS and copy the address.

We can display the line numbers for the results with the -n flag:

git grep --break --heading -n test
intro_aliases.qmd
48:Now, let's build an alias for a more complex command: `git grep "test" $(git rev-list --all)`. This example
49:from the *"Searching a Git project"* section below will search for the string "test" in all previous
50:commits. There are two problems with this command: (1) it takes an argument (the string "test"), and (2) it
68:git search test
71:should search the entire current Git project history for "test".

intro_branches.qmd
54:git branch test
72:git switch test
99:* test
102:The `*` shows the branch you are currently on (i.e. the branch to which `HEAD` points to). In our example, the project has two branches and we are on the branch `test`.
109:git diff main test
116:When you are happy with the changes you made on your test branch, you can merge it into `main`.
120:If you have only created new commits on the branch `test`, the merge is called a "fast-forward merge" because `main` and `test` have not diverged: it is simply a question of having `main` catch up to `test`.
135:git merge test
140:Then, usually, you delete the branch `test` as it has served its purpose:
143:git branch -d test
148:Alternatively, you can switch back to `test` and do the next bit of experimental work on it. This allows to keep `main` free of mishaps and bad developments.
154:Let's go back to our situation before we created the branch `test`:
158:This time, you create a branch called `test2`:
182:To merge your branch `test2` into `main`, a new commit is now required. Git will create this new commit automatically. As long as there is no conflict, it is just as easy as a fast-forward merge:
185:git merge test2
190:After which, you can delete the (now useless) test branch (with `git branch -d test2`):
215:>>>>>>> test2

intro_intro_old.qmd
904:The pointer `HEAD`, which normally points to the branch `main` which itself points to latest commit, can be moved around. By moving `HEAD` to any commit, you can revisit the state of your project at that particular version.
1219:Instead of working on your branch `main`, you create a test branch and work on it (so `HEAD` is on the branch `test` and both move along as you create commits):
1227:When you are happy with the changes you made on your test branch, you decide to merge `main` onto it.
1233:Then you do the fast-forward merge from `main` onto `test` (so `main` catches up to `test`):
1237:Then, usually, you delete the branch `test` as it has served its purpose (with `git branch -d test`). Alternatively, you can switch back to it and do the next bit of experimental work in it.
1238:This allows to keep `main` free of possible mishaps and bad developments (if you aren't happy with the work you did on your test branch, you can simply delete it and Git will clean the commits that are on it but not on `main` during the next garbage collection.
1250:You create a test branch and switch to it:
1270:To merge your main branch and your test branch, a new commit is now required (note that the command is the same as in the case of a fast-forward merge: `git merge`. Git will create the new commit automatically. As long as there is no conflict, it is just as easy as a fast-forward merge. We will talk later about resolving conflicts).
1274:After which, you can delete the (now useless) test branch (with `git branch -d test2`):

intro_remotes.qmd
45:Click on the `Code` green drop-down button, select SSH [if you have set SSH for your GitHub account](https://docs.github.com/en/free-pro-team@latest/github/authenticating-to-github/connecting-to-github-with-ssh) or HTTPS and copy the address.

intro_revisiting_old_commits_alternate.qmd
3:The pointer `HEAD`, which normally points to the branch `main` which itself points to latest commit, can be moved around. By moving `HEAD` to any commit, you can revisit the state of your project at that particular version.

intro_undo.qmd
16:Here is a common scenario: you make a commit, then realize that you forgot to include some changes in that commit; or you aren't happy with the commit message; or both. You can edit your latest commit with the `--amend` flag:

ws_collab.qmd
52:Click on the `Code` green drop-down button, select SSH [if you have set SSH for your GitHub account](https://docs.github.com/en/free-pro-team@latest/github/authenticating-to-github/connecting-to-github-with-ssh) or HTTPS and copy the address.

Notice how the results for the file src/test_manuel.py involve functions. It would be very convenient to have the names of the functions in which test appears.

We can do this with the -p flag:

git grep --break --heading -p test src/test_manuel.py
fatal: ambiguous argument 'src/test_manuel.py': unknown revision or path not in the working tree.
Use '--' to separate paths from revisions, like this:
'git <command> [<revision>...] -- [<file>...]'

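We added the argument src/test_manuel.py to limit the search to that file.

The error shown above means that this file does not exist in the repository where the command was run. In a repository that contains it (such as the practice repo), the output shows that the word test appears in the functions test and main.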
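Here is a small self-contained sketch of the -p flag on a path-limited search. The repository and the file src/demo.py are made up for this illustration and are not part of the practice repo. The `--` separator tells Git that what follows is a path rather than a revision:

```shell
# Hypothetical throwaway repository, created only for this illustration
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

# A small Python script with two functions (made-up content)
mkdir src
cat > src/demo.py <<'EOF'
def greet():
    message = "test greeting"
    return message

def main():
    print(greet())
EOF
git add src/demo.py
git commit -q -m "Add demo script"

# -p shows the function header preceding each match; -n adds line numbers
out=$(git grep -n -p test -- src/demo.py)
echo "$out"
```

The matching line sits inside greet, so the header of that function is printed just before the match.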

Now, instead of printing all the matching lines, let’s print the number of matches per file:

git grep -c test
intro_aliases.qmd:5
intro_branches.qmd:17
intro_intro_old.qmd:9
intro_remotes.qmd:1
intro_revisiting_old_commits_alternate.qmd:1
intro_undo.qmd:1
ws_collab.qmd:1

More complex patterns

git grep in fact searches for regular expressions. test is a regular expression matching test, but we can look for more complex patterns.

Let’s look for image:

git grep image
intro_ignore.qmd:- Non-text files (e.g. images, office documents)

If the command prints nothing, it means that the search has no match.

Let’s make this search case insensitive:

git grep -i image
intro_ignore.qmd:- Non-text files (e.g. images, office documents)

In the practice repo, this search returns additional results because Image is present in three lines of the file src/new_file.py.
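The effect of -i can be checked on a tiny example. The sketch below uses a throwaway repository with a made-up file and compares the match counts with and without the flag:

```shell
# Hypothetical throwaway repository, created only for this illustration
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

# Three lines with different capitalizations of the same word
printf 'Image one\nimage two\nIMAGE three\n' > pics.txt
git add pics.txt
git commit -q -m "Add pics"

# Case-sensitive count: only the all-lowercase line matches
sensitive=$(git grep -c image)
echo "$sensitive"

# Case-insensitive count: all three lines match
insensitive=$(git grep -c -i image)
echo "$insensitive"
```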

Let’s now search for data:

git grep data
intro_changes.qmd:Remember that HEAD is a pointer pointing at a branch, that a branch is itself a pointer pointing at a commit, and finally that a commit is a Git object pointing at compressed blobs containing data about your project at a certain commit. When the HEAD pointer moves around, whatever commit it points to populates the [HEAD]{.emph} tree with the corresponding data.
intro_first_steps.qmd:Alternatively, you can download the file with this button: 
intro_first_steps.qmd:data/
intro_first_steps.qmd:data
intro_first_steps.qmd:data
intro_first_steps.qmd:./data:
intro_first_steps.qmd:dataset.csv
intro_first_steps.qmd:├── data
intro_first_steps.qmd:│   └── dataset.csv
intro_first_steps.qmd:This is our very exciting data set:
intro_first_steps.qmd:cat data/dataset.csv
intro_first_steps.qmd:df = pd.read_csv('../data/dataset.csv')
intro_first_steps.qmd:data
intro_first_steps.qmd:        data/
intro_first_steps.qmd:        data/
intro_first_steps.qmd:Remember that each commit contains the following metadata:
intro_first_steps.qmd:        data/
intro_ignore.qmd:- Your initial data
intro_ignore.qmd:Notice how `data/` is not listed in the untracked files anymore.
intro_ignore.qmd:git commit -m "Add .gitignore file with data and results"
intro_ignore.qmd:[main a1df8e5] Add .gitignore file with data and results
intro_intro_old.qmd:mkdir chapter3/src chapter3/ms chapter3/data chapter3/results
intro_intro_old.qmd:df <- data.frame(
intro_intro_old.qmd:data
intro_intro_old.qmd:Each commit is identified by a unique *hash* and contains these metadata:
intro_intro_old.qmd:@@ -7,3 +7,5 @@ df <- data.frame(
intro_intro_old.qmd:@@ -7,3 +7,5 @@ df <- data.frame(
intro_intro_old.qmd:Not everything should be under version control. For instance, you don't want to put under version control non-text files or your initial data. You also shouldn't put under version control documents that can be easily recreated such as graphs and script outputs.
intro_intro_old.qmd:echo "/data/
intro_intro_old.qmd:This creates a `.gitignore` file with two entries (`/data/` and `/results/`) and from now on, any file in either of these directories will be ignored by Git.
intro_intro_old.qmd:git commit -m "Add .gitignore file with data and results"
intro_intro_old.qmd:[main a1df8e5] Add .gitignore file with data and results
intro_intro_old.qmd:    Add .gitignore file with data and results
intro_intro_old.qmd:a1df8e5 (HEAD -> main) Add .gitignore file with data and results
intro_intro_old.qmd:|     Add .gitignore file with data and results
intro_intro_old.qmd:* a1df8e5 88 seconds ago  (HEAD -> main)Add .gitignore file with data and results xxx@xxx
intro_intro_old.qmd:In addition to displaying the commit metadata, this also displays the difference with the previous commit.
intro_intro_old.qmd:    Add .gitignore file with data and results
intro_intro_old.qmd:+/data/
intro_intro_old.qmd:@@ -7,3 +7,5 @@ df <- data.frame(
intro_intro_old.qmd:    Add .gitignore file with data and results
intro_intro_slides.qmd:The data is stored as blobs, doesn't create unnecessary copies (unchanged files are referenced from old blobs), and uses excellent compression
intro_intro_slides.qmd:Each commit has a unique *hash* and contains the following metadata:
intro_logs.qmd:    Add .gitignore file with data and results
intro_logs.qmd:c4ab5e7 Add .gitignore file with data and results
intro_logs.qmd:|     Add .gitignore file with data and results
intro_logs.qmd:* c4ab5e7 34 minutes ago Add .gitignore file with data and results xxx@xxx
intro_logs.qmd:+df = pd.read_csv('../data/dataset.csv')
intro_logs.qmd:    Add .gitignore file with data and results
intro_logs.qmd:+/data/
intro_logs.qmd:In addition to displaying the commit metadata, `git show` also displays the diff of that commit with its parent commit.
intro_remotes.qmd:## Getting data from a remote
intro_remotes.qmd:If you collaborate on a project, you have to get the data added by your teammates to keep your local project up to date.
intro_remotes.qmd:To download new data from a remote, you have 2 options:
intro_remotes.qmd:*Fetching* downloads the data from a remote that you don't already have in your local version of the project:
intro_remotes.qmd:Uploading data to the remote is called *pushing*:
intro_undo.qmd:As you just experienced, this command leads to data loss. \
Binary file project.zip matches
wb_dvc.qmd:title: Version control for data science and machine learning with DVC

We are getting results for the word data, but also for the pattern data in longer expressions such as train_data or dataset. If we only want results for the word data, we can use the -w flag:

git grep -w data
intro_changes.qmd:Remember that HEAD is a pointer pointing at a branch, that a branch is itself a pointer pointing at a commit, and finally that a commit is a Git object pointing at compressed blobs containing data about your project at a certain commit. When the HEAD pointer moves around, whatever commit it points to populates the [HEAD]{.emph} tree with the corresponding data.
intro_first_steps.qmd:Alternatively, you can download the file with this button: 
intro_first_steps.qmd:data/
intro_first_steps.qmd:data
intro_first_steps.qmd:data
intro_first_steps.qmd:./data:
intro_first_steps.qmd:├── data
intro_first_steps.qmd:This is our very exciting data set:
intro_first_steps.qmd:cat data/dataset.csv
intro_first_steps.qmd:df = pd.read_csv('../data/dataset.csv')
intro_first_steps.qmd:data
intro_first_steps.qmd:        data/
intro_first_steps.qmd:        data/
intro_first_steps.qmd:        data/
intro_ignore.qmd:- Your initial data
intro_ignore.qmd:Notice how `data/` is not listed in the untracked files anymore.
intro_ignore.qmd:git commit -m "Add .gitignore file with data and results"
intro_ignore.qmd:[main a1df8e5] Add .gitignore file with data and results
intro_intro_old.qmd:mkdir chapter3/src chapter3/ms chapter3/data chapter3/results
intro_intro_old.qmd:df <- data.frame(
intro_intro_old.qmd:data
intro_intro_old.qmd:@@ -7,3 +7,5 @@ df <- data.frame(
intro_intro_old.qmd:@@ -7,3 +7,5 @@ df <- data.frame(
intro_intro_old.qmd:Not everything should be under version control. For instance, you don't want to put under version control non-text files or your initial data. You also shouldn't put under version control documents that can be easily recreated such as graphs and script outputs.
intro_intro_old.qmd:echo "/data/
intro_intro_old.qmd:This creates a `.gitignore` file with two entries (`/data/` and `/results/`) and from now on, any file in either of these directories will be ignored by Git.
intro_intro_old.qmd:git commit -m "Add .gitignore file with data and results"
intro_intro_old.qmd:[main a1df8e5] Add .gitignore file with data and results
intro_intro_old.qmd:    Add .gitignore file with data and results
intro_intro_old.qmd:a1df8e5 (HEAD -> main) Add .gitignore file with data and results
intro_intro_old.qmd:|     Add .gitignore file with data and results
intro_intro_old.qmd:* a1df8e5 88 seconds ago  (HEAD -> main)Add .gitignore file with data and results xxx@xxx
intro_intro_old.qmd:    Add .gitignore file with data and results
intro_intro_old.qmd:+/data/
intro_intro_old.qmd:@@ -7,3 +7,5 @@ df <- data.frame(
intro_intro_old.qmd:    Add .gitignore file with data and results
intro_intro_slides.qmd:The data is stored as blobs, doesn't create unnecessary copies (unchanged files are referenced from old blobs), and uses excellent compression
intro_logs.qmd:    Add .gitignore file with data and results
intro_logs.qmd:c4ab5e7 Add .gitignore file with data and results
intro_logs.qmd:|     Add .gitignore file with data and results
intro_logs.qmd:* c4ab5e7 34 minutes ago Add .gitignore file with data and results xxx@xxx
intro_logs.qmd:+df = pd.read_csv('../data/dataset.csv')
intro_logs.qmd:    Add .gitignore file with data and results
intro_logs.qmd:+/data/
intro_remotes.qmd:## Getting data from a remote
intro_remotes.qmd:If you collaborate on a project, you have to get the data added by your teammates to keep your local project up to date.
intro_remotes.qmd:To download new data from a remote, you have 2 options:
intro_remotes.qmd:*Fetching* downloads the data from a remote that you don't already have in your local version of the project:
intro_remotes.qmd:Uploading data to the remote is called *pushing*:
intro_undo.qmd:As you just experienced, this command leads to data loss. \
Binary file project.zip matches
wb_dvc.qmd:title: Version control for data science and machine learning with DVC
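To see exactly what -w filters out, here is a sketch on a made-up file in a throwaway repository. Note that the underscore counts as a word character, so train_data is not a match for the word data, while a path such as data/ would be:

```shell
# Hypothetical throwaway repository, created only for this illustration
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

# Four lines: two with the whole word "data", two with longer expressions
printf 'data\ndataset\ntrain_data\nraw data here\n' > words.txt
git add words.txt
git commit -q -m "Add words"

# Without -w, every line containing the pattern is counted
plain=$(git grep -c data)
echo "$plain"

# With -w, only lines where "data" stands as a whole word are counted
whole=$(git grep -c -w data)
echo "$whole"
```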

Now, let’s use a more complex regular expression. We want the counts for the pattern ".*_.*" (i.e. any line containing an underscore, as in snake case names such as train_loader):

git grep -c ".*_.*"
img/01.png:16
img/02.png:32
img/03.png:31
img/04.png:26
img/05.png:31
img/06.png:32
img/07.png:30
img/08.png:34
img/09.png:35
img/10.png:41
img/11.png:47
img/12.png:40
img/13.png:39
img/14.png:32
img/15.png:38
img/16.png:43
img/17.png:34
img/18.png:35
img/19.png:30
img/20.png:33
img/21.png:40
img/22.png:41
img/23.png:47
img/24.png:64
img/25.png:66
img/26.png:50
img/27.png:60
img/28.png:57
img/29.png:33
img/30.png:39
img/31.png:14
img/32.png:16
img/33.png:18
img/34.png:16
img/35.png:20
img/36.png:18
img/37.png:18
img/51.png:55
img/52.png:46
img/53.png:55
img/collab.jpg:178
img/git_graph.png:121
img/gitout.png:42
img/logo_git.png:4
img/vc.jpg:259
index.qmd:4
intro_documentation.qmd:1
intro_first_steps.qmd:4
intro_install.qmd:2
intro_intro.qmd:1
intro_intro_old.qmd:8
intro_intro_slides.qmd:5
intro_logs.qmd:1
intro_tags.qmd:5
intro_time_travel.qmd:1
top_intro.qmd:2
top_ws.qmd:3
wb_dvc.qmd:1
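Because the search is not anchored, ".*_.*" matches every line that contains an underscore, binary files included. One possible refinement, shown here as a sketch on made-up files rather than as the workshop's own approach, is to skip binary files with -I and to require word characters on both sides of the underscore with an extended regular expression (-E):

```shell
# Hypothetical throwaway repository, created only for this illustration
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

# A text file: one snake case name, one plain line, one lone underscore
printf 'train_loader = 1\nplain line\n_\n' > code.txt
# A file containing a NUL byte, which Git treats as binary
printf 'a_b\0\001binary' > blob.bin
git add code.txt blob.bin
git commit -q -m "Add text and binary files"

# -I skips binary files; a bare underscore still counts as a match here
loose=$(git grep -I -c "_")
echo "$loose"

# -E enables extended regular expressions: require characters on both sides
tight=$(git grep -I -c -E "[[:alnum:]]+_[[:alnum:]]+")
echo "$tight"
```

With -I, blob.bin does not appear in the counts at all, and the tighter pattern no longer counts the line that contains only an underscore.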

Let’s print the first 3 results per file:

git grep -m 3 ".*_.*"
Binary file img/01.png matches
Binary file img/02.png matches
Binary file img/03.png matches
Binary file img/04.png matches
Binary file img/05.png matches
Binary file img/06.png matches
Binary file img/07.png matches
Binary file img/08.png matches
Binary file img/09.png matches
Binary file img/10.png matches
Binary file img/11.png matches
Binary file img/12.png matches
Binary file img/13.png matches
Binary file img/14.png matches
Binary file img/15.png matches
Binary file img/16.png matches
Binary file img/17.png matches
Binary file img/18.png matches
Binary file img/19.png matches
Binary file img/20.png matches
Binary file img/21.png matches
Binary file img/22.png matches
Binary file img/23.png matches
Binary file img/24.png matches
Binary file img/25.png matches
Binary file img/26.png matches
Binary file img/27.png matches
Binary file img/28.png matches
Binary file img/29.png matches
Binary file img/30.png matches
Binary file img/31.png matches
Binary file img/32.png matches
Binary file img/33.png matches
Binary file img/34.png matches
Binary file img/35.png matches
Binary file img/36.png matches
Binary file img/37.png matches
Binary file img/51.png matches
Binary file img/52.png matches
Binary file img/53.png matches
Binary file img/collab.jpg matches
Binary file img/git_graph.png matches
Binary file img/gitout.png matches
Binary file img/logo_git.png matches
Binary file img/vc.jpg matches
index.qmd:  Version control & collaboration with &nbsp;[![](img/logo_git.png){width="1.3em" fig-alt="noshadow"}](https://git-scm.com/)
index.qmd:[Getting started with &nbsp;![](img/logo_git.png){width="1.2em" fig-alt="noshadow"}](top_intro.qmd){.card-title2 .stretched-link}
index.qmd:[Workshops](top_ws.qmd){.card-title2 .stretched-link}
intro_documentation.qmd:All these methods lead to the same thing: the manual page corresponding to the command will open in a pager (usually [less](https://en.wikipedia.org/wiki/Less_(Unix))). A pager is a program which makes it easier to read documents in the command line.
intro_first_steps.qmd:  - first_steps.html
intro_first_steps.qmd:wget --no-check-certificate 'https://docs.google.com/uc?export=download&id=1SJV5mRGexf91lNyFwdS_JmuAXX0xS4pE' -O project.zip
intro_first_steps.qmd:df = pd.read_csv('../data/dataset.csv')
intro_install.qmd:Git is built for Unix-like systems (Linux and MacOS). In order to use Git from the command line on Windows, you need a Unix shell such as [Bash](https://en.wikipedia.org/wiki/Bash_(Unix_shell)). To make this very easy, Git for Windows comes with its Bash emulator.
intro_install.qmd:git config user.email "your_other@email"
intro_intro.qmd:[Slides](intro_intro_slides.html){.btn .btn-outline-primary} [(Click and wait: the presentation might take a few instants to load)]{.inlinenote}
intro_intro_old.qmd:  - intro_old.html
intro_intro_old.qmd:<script type="text/javascript" src="https://ssl.gstatic.com/trends_nrtr/3045_RC01/embed_loader.js"></script> <script type="text/javascript"> trends.embed.renderExploreWidget("TIMESERIES", {"comparisonItem":[{"keyword":"/m/05vqwg","geo":"","time":"2004-01-01 2022-10-03"},{"keyword":"/m/08441_","geo":"","time":"2004-01-01 2022-10-03"},{"keyword":"/m/012ct9","geo":"","time":"2004-01-01 2022-10-03"},{"keyword":"/m/09d6g","geo":"","time":"2004-01-01 2022-10-03"}],"category":0,"property":""}, {"exploreQuery":"date=all&q=%2Fm%2F05vqwg,%2Fm%2F08441_,%2Fm%2F012ct9,%2Fm%2F09d6g","guestPath":"https://trends.google.com:443/trends/embed/"}); </script>
intro_intro_old.qmd:git config user.email "your_other@email"
intro_intro_slides.qmd:  - intro_slides.html
intro_intro_slides.qmd:frontpic: "img/git_graph.png"
intro_intro_slides.qmd:    logo: /img/logo_sfudrac.png
intro_logs.qmd:+df = pd.read_csv('../data/dataset.csv')
intro_tags.qmd:git tag J_Climate_2009
intro_tags.qmd:git show J_Climate_2009
intro_tags.qmd:git checkout J_Climate_2009
intro_time_travel.qmd:  - time_travel.html
top_intro.qmd:description: An introductory course to version control with &nbsp;[![](img/logo_git.png){width="1.3em" fig-alt="noshadow"}](https://git-scm.com/)
top_intro.qmd:[[Start course ➤](intro_intro.qmd)]{.topinline}
top_ws.qmd:[Searching a Git project](practice_repo/ws_search.qmd){.card-title-ws .stretched-link}
top_ws.qmd:[Collaborating through Git](ws_collab.qmd){.card-title-ws .stretched-link}
top_ws.qmd:[Contributing to projects](ws_contrib.qmd){.card-title-ws .stretched-link}
wb_dvc.qmd:[As DVC is a popular tool in machine learning, **please find this webinar [in the AI section](/ai/wb_dvc.html){.stretched-link}**.]{.btn-redirect}

As you can see, our results also include __init__ which is not what we were looking for. So let’s exclude __:

git grep -m 3 -e ".*_.*" --and --not -e "__"
Binary file img/01.png matches
Binary file img/02.png matches
Binary file img/03.png matches
Binary file img/04.png matches
Binary file img/05.png matches
Binary file img/06.png matches
Binary file img/07.png matches
Binary file img/08.png matches
Binary file img/09.png matches
Binary file img/10.png matches
Binary file img/11.png matches
Binary file img/12.png matches
Binary file img/13.png matches
Binary file img/14.png matches
Binary file img/15.png matches
Binary file img/16.png matches
Binary file img/17.png matches
Binary file img/18.png matches
Binary file img/19.png matches
Binary file img/20.png matches
Binary file img/21.png matches
Binary file img/22.png matches
Binary file img/23.png matches
Binary file img/24.png matches
Binary file img/25.png matches
Binary file img/26.png matches
Binary file img/27.png matches
Binary file img/28.png matches
Binary file img/29.png matches
Binary file img/30.png matches
Binary file img/31.png matches
Binary file img/32.png matches
Binary file img/33.png matches
Binary file img/34.png matches
Binary file img/35.png matches
Binary file img/36.png matches
Binary file img/37.png matches
Binary file img/51.png matches
Binary file img/52.png matches
Binary file img/53.png matches
Binary file img/collab.jpg matches
Binary file img/git_graph.png matches
Binary file img/gitout.png matches
Binary file img/logo_git.png matches
Binary file img/vc.jpg matches
index.qmd:  Version control & collaboration with &nbsp;[![](img/logo_git.png){width="1.3em" fig-alt="noshadow"}](https://git-scm.com/)
index.qmd:[Getting started with &nbsp;![](img/logo_git.png){width="1.2em" fig-alt="noshadow"}](top_intro.qmd){.card-title2 .stretched-link}
index.qmd:[Workshops](top_ws.qmd){.card-title2 .stretched-link}
intro_documentation.qmd:All these methods lead to the same thing: the manual page corresponding to the command will open in a pager (usually [less](https://en.wikipedia.org/wiki/Less_(Unix))). A pager is a program which makes it easier to read documents in the command line.
intro_first_steps.qmd:  - first_steps.html
intro_first_steps.qmd:wget --no-check-certificate 'https://docs.google.com/uc?export=download&id=1SJV5mRGexf91lNyFwdS_JmuAXX0xS4pE' -O project.zip
intro_first_steps.qmd:df = pd.read_csv('../data/dataset.csv')
intro_install.qmd:Git is built for Unix-like systems (Linux and MacOS). In order to use Git from the command line on Windows, you need a Unix shell such as [Bash](https://en.wikipedia.org/wiki/Bash_(Unix_shell)). To make this very easy, Git for Windows comes with its Bash emulator.
intro_install.qmd:git config user.email "your_other@email"
intro_intro.qmd:[Slides](intro_intro_slides.html){.btn .btn-outline-primary} [(Click and wait: the presentation might take a few instants to load)]{.inlinenote}
intro_intro_old.qmd:  - intro_old.html
intro_intro_old.qmd:<script type="text/javascript" src="https://ssl.gstatic.com/trends_nrtr/3045_RC01/embed_loader.js"></script> <script type="text/javascript"> trends.embed.renderExploreWidget("TIMESERIES", {"comparisonItem":[{"keyword":"/m/05vqwg","geo":"","time":"2004-01-01 2022-10-03"},{"keyword":"/m/08441_","geo":"","time":"2004-01-01 2022-10-03"},{"keyword":"/m/012ct9","geo":"","time":"2004-01-01 2022-10-03"},{"keyword":"/m/09d6g","geo":"","time":"2004-01-01 2022-10-03"}],"category":0,"property":""}, {"exploreQuery":"date=all&q=%2Fm%2F05vqwg,%2Fm%2F08441_,%2Fm%2F012ct9,%2Fm%2F09d6g","guestPath":"https://trends.google.com:443/trends/embed/"}); </script>
intro_intro_old.qmd:git config user.email "your_other@email"
intro_intro_slides.qmd:  - intro_slides.html
intro_intro_slides.qmd:frontpic: "img/git_graph.png"
intro_intro_slides.qmd:    logo: /img/logo_sfudrac.png
intro_logs.qmd:+df = pd.read_csv('../data/dataset.csv')
intro_tags.qmd:git tag J_Climate_2009
intro_tags.qmd:git show J_Climate_2009
intro_tags.qmd:git checkout J_Climate_2009
intro_time_travel.qmd:  - time_travel.html
top_intro.qmd:description: An introductory course to version control with &nbsp;[![](img/logo_git.png){width="1.3em" fig-alt="noshadow"}](https://git-scm.com/)
top_intro.qmd:[[Start course ➤](intro_intro.qmd)]{.topinline}
top_ws.qmd:[Searching a Git project](practice_repo/ws_search.qmd){.card-title-ws .stretched-link}
top_ws.qmd:[Collaborating through Git](ws_collab.qmd){.card-title-ws .stretched-link}
top_ws.qmd:[Contributing to projects](ws_contrib.qmd){.card-title-ws .stretched-link}
wb_dvc.qmd:[As DVC is a popular tool in machine learning, **please find this webinar [in the AI section](/ai/wb_dvc.html){.stretched-link}**.]{.btn-redirect}

For simple searches, you don’t have to use the -e flag before the pattern you are searching for. Here however, our command has gotten complex enough that we have to use it before each pattern.

Let’s make sure this worked as expected:

git grep -c ".*_.*"
echo "---"
git grep -c "__"
echo "---"
git grep -ce ".*_.*" --and --not -e "__"
img/01.png:16
img/02.png:32
img/03.png:31
img/04.png:26
img/05.png:31
img/06.png:32
img/07.png:30
img/08.png:34
img/09.png:35
img/10.png:41
img/11.png:47
img/12.png:40
img/13.png:39
img/14.png:32
img/15.png:38
img/16.png:43
img/17.png:34
img/18.png:35
img/19.png:30
img/20.png:33
img/21.png:40
img/22.png:41
img/23.png:47
img/24.png:64
img/25.png:66
img/26.png:50
img/27.png:60
img/28.png:57
img/29.png:33
img/30.png:39
img/31.png:14
img/32.png:16
img/33.png:18
img/34.png:16
img/35.png:20
img/36.png:18
img/37.png:18
img/51.png:55
img/52.png:46
img/53.png:55
img/collab.jpg:178
img/git_graph.png:121
img/gitout.png:42
img/logo_git.png:4
img/vc.jpg:259
index.qmd:4
intro_documentation.qmd:1
intro_first_steps.qmd:4
intro_install.qmd:2
intro_intro.qmd:1
intro_intro_old.qmd:8
intro_intro_slides.qmd:5
intro_logs.qmd:1
intro_tags.qmd:5
intro_time_travel.qmd:1
top_intro.qmd:2
top_ws.qmd:3
wb_dvc.qmd:1
---
img/01.png:1
img/02.png:2
img/03.png:2
img/04.png:1
img/05.png:3
img/06.png:3
img/07.png:1
img/08.png:1
img/09.png:1
img/10.png:1
img/11.png:1
img/12.png:1
img/13.png:2
img/14.png:2
img/15.png:3
img/16.png:1
img/17.png:1
img/18.png:2
img/19.png:1
img/20.png:1
img/21.png:1
img/22.png:2
img/23.png:4
img/24.png:2
img/25.png:1
img/26.png:1
img/27.png:2
img/28.png:3
img/29.png:2
img/30.png:1
img/31.png:1
img/51.png:1
img/52.png:2
img/53.png:1
img/collab.jpg:1
img/git_graph.png:2
img/gitout.png:1
---
img/01.png:15
img/02.png:30
img/03.png:29
img/04.png:25
img/05.png:28
img/06.png:29
img/07.png:29
img/08.png:33
img/09.png:34
img/10.png:40
img/11.png:46
img/12.png:39
img/13.png:37
img/14.png:30
img/15.png:35
img/16.png:42
img/17.png:33
img/18.png:33
img/19.png:29
img/20.png:32
img/21.png:39
img/22.png:39
img/23.png:43
img/24.png:62
img/25.png:65
img/26.png:49
img/27.png:58
img/28.png:54
img/29.png:31
img/30.png:38
img/31.png:13
img/32.png:16
img/33.png:18
img/34.png:16
img/35.png:20
img/36.png:18
img/37.png:18
img/51.png:54
img/52.png:44
img/53.png:54
img/collab.jpg:177
img/git_graph.png:119
img/gitout.png:41
img/logo_git.png:4
img/vc.jpg:259
index.qmd:4
intro_documentation.qmd:1
intro_first_steps.qmd:4
intro_install.qmd:2
intro_intro.qmd:1
intro_intro_old.qmd:8
intro_intro_slides.qmd:5
intro_logs.qmd:1
intro_tags.qmd:5
intro_time_travel.qmd:1
top_intro.qmd:2
top_ws.qmd:3
wb_dvc.qmd:1

There were 2 lines matching __ in src/test_manuel.py and we have indeed excluded them from our search.

Extended regular expressions are also supported, with the -E flag.
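
Here is a small sketch of -E, run in a throwaway repository so that it does not depend on the practice repo. The file script.py and its content are made up for the illustration:

```shell
# Throwaway repository with a hypothetical file.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q

printf 'train_loader = 1\ntest_loader = 2\nmodel = 3\n' > script.py
git add script.py

# With -E, the operators ( ) | and + can be used without backslashes.
# This counts the lines containing train_ or test_ followed by lowercase letters.
extended=$(git grep -c -E '(train|test)_[a-z]+')
echo "$extended"
```

Two of the three lines match, so the count reported for script.py is 2.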

Searching other trees

So far, we have searched the current version of tracked files, but we can just as easily search files at any commit.

Let’s search for test in the tracked files 20 commits ago:

git grep test HEAD~20
HEAD~20:intro_aliases.qmd:Now, let's build an alias for a more complex command: `git grep "test" $(git rev-list --all)`. This example
HEAD~20:intro_aliases.qmd:from the *"Searching a Git project"* section below will search for the string "test" in all previous
HEAD~20:intro_aliases.qmd:commits. There are two problems with this command: (1) it takes an argument (the string "test"), and (2) it
HEAD~20:intro_aliases.qmd:git search test
HEAD~20:intro_aliases.qmd:should search the entire current Git project history for "test".
HEAD~20:intro_branches.qmd:git branch test
HEAD~20:intro_branches.qmd:git switch test
HEAD~20:intro_branches.qmd:* test
HEAD~20:intro_branches.qmd:The `*` shows the branch you are currently on (i.e. the branch to which `HEAD` points to). In our example, the project has two branches and we are on the branch `test`.
HEAD~20:intro_branches.qmd:git diff main test
HEAD~20:intro_branches.qmd:When you are happy with the changes you made on your test branch, you can merge it into `main`.
HEAD~20:intro_branches.qmd:If you have only created new commits on the branch `test`, the merge is called a "fast-forward merge" because `main` and `test` have not diverged: it is simply a question of having `main` catch up to `test`.
HEAD~20:intro_branches.qmd:git merge test
HEAD~20:intro_branches.qmd:Then, usually, you delete the branch `test` as it has served its purpose:
HEAD~20:intro_branches.qmd:git branch -d test
HEAD~20:intro_branches.qmd:Alternatively, you can switch back to `test` and do the next bit of experimental work on it. This allows to keep `main` free of mishaps and bad developments.
HEAD~20:intro_branches.qmd:Let's go back to our situation before we created the branch `test`:
HEAD~20:intro_branches.qmd:This time, you create a branch called `test2`:
HEAD~20:intro_branches.qmd:To merge your branch `test2` into `main`, a new commit is now required. Git will create this new commit automatically. As long as there is no conflict, it is just as easy as a fast-forward merge:
HEAD~20:intro_branches.qmd:git merge test2
HEAD~20:intro_branches.qmd:After which, you can delete the (now useless) test branch (with `git branch -d test2`):
HEAD~20:intro_branches.qmd:>>>>>>> test2
HEAD~20:intro_intro_old.qmd:The pointer `HEAD`, which normally points to the branch `main` which itself points to latest commit, can be moved around. By moving `HEAD` to any commit, you can revisit the state of your project at that particular version.
HEAD~20:intro_intro_old.qmd:Instead of working on your branch `main`, you create a test branch and work on it (so `HEAD` is on the branch `test` and both move along as you create commits):
HEAD~20:intro_intro_old.qmd:When you are happy with the changes you made on your test branch, you decide to merge `main` onto it.
HEAD~20:intro_intro_old.qmd:Then you do the fast-forward merge from `main` onto `test` (so `main` catches up to `test`):
HEAD~20:intro_intro_old.qmd:Then, usually, you delete the branch `test` as it has served its purpose (with `git branch -d test`). Alternatively, you can switch back to it and do the next bit of experimental work in it.
HEAD~20:intro_intro_old.qmd:This allows to keep `main` free of possible mishaps and bad developments (if you aren't happy with the work you did on your test branch, you can simply delete it and Git will clean the commits that are on it but not on `main` during the next garbage collection.
HEAD~20:intro_intro_old.qmd:You create a test branch and switch to it:
HEAD~20:intro_intro_old.qmd:To merge your main branch and your test branch, a new commit is now required (note that the command is the same as in the case of a fast-forward merge: `git merge`. Git will create the new commit automatically. As long as there is no conflict, it is just as easy as a fast-forward merge. We will talk later about resolving conflicts).
HEAD~20:intro_intro_old.qmd:After which, you can delete the (now useless) test branch (with `git branch -d test2`):
HEAD~20:intro_remotes.qmd:Click on the `Code` green drop-down button, select SSH [if you have set SSH for your GitHub account](https://docs.github.com/en/free-pro-team@latest/github/authenticating-to-github/connecting-to-github-with-ssh) or HTTPS and copy the address.
HEAD~20:intro_revisiting_old_commits_alternate.qmd:The pointer `HEAD`, which normally points to the branch `main` which itself points to latest commit, can be moved around. By moving `HEAD` to any commit, you can revisit the state of your project at that particular version.
HEAD~20:intro_undo.qmd:Here is a common scenario: you make a commit, then realize that you forgot to include some changes in that commit; or you aren't happy with the commit message; or both. You can edit your latest commit with the `--amend` flag:
HEAD~20:ws_collab.qmd:Click on the `Code` green drop-down button, select SSH [if you have set SSH for your GitHub account](https://docs.github.com/en/free-pro-team@latest/github/authenticating-to-github/connecting-to-github-with-ssh) or HTTPS and copy the address.

As you can see, the file src/test_manuel.py is not in the results. Either it didn’t exist or it didn’t have the word test at that commit.
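
The case of a file that did not exist yet at an older commit can be reproduced in a sandbox. This is only a sketch: the repository, the file names and the commit() helper are hypothetical, and the identity is passed inline so that the example does not rely on your Git configuration:

```shell
# Throwaway repository with two commits.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q

# Hypothetical helper: commit quietly with an inline identity.
commit() { git -c user.name="Demo" -c user.email="demo@example.com" commit -q -m "$1"; }

echo "no match here" > notes.txt
git add notes.txt && commit "First commit"

echo "This is a test" > src_test.txt
git add src_test.txt && commit "Add a file containing the word test"

# At HEAD, the new file matches (-l lists file names only).
now=$(git grep -l test HEAD)
echo "now: $now"

# One commit earlier, the file did not exist: nothing matches.
# git grep exits with status 1 when there is no match, hence the || true.
before=$(git grep -l test HEAD~1 || true)
echo "before: $before"
```

Any revision that Git understands (a hash, a tag, a branch name, HEAD~20) can take the place of HEAD or HEAD~1 here.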

If you want to search tracked files AND untracked files, you need to use the --untracked flag.

Let’s create a new (thus untracked) file with some content including the word test:

echo "This is a test" > newfile

Now compare the following:

git grep -c test
intro_aliases.qmd:5
intro_branches.qmd:17
intro_intro_old.qmd:9
intro_remotes.qmd:1
intro_revisiting_old_commits_alternate.qmd:1
intro_undo.qmd:1
ws_collab.qmd:1

with:

git grep -c --untracked test
index.html:1
intro_aliases.html:4
intro_aliases.qmd:5
intro_branches.html:18
intro_branches.qmd:17
intro_changes.html:1
intro_documentation.html:1
intro_first_steps.html:1
intro_ignore.html:1
intro_install.html:1
intro_intro.html:1
intro_intro_old.qmd:9
intro_intro_slides.html:18
intro_logs.html:1
intro_remotes.html:2
intro_remotes.qmd:1
intro_reset.html:1
intro_resources.html:1
intro_revisiting_old_commits_alternate.html:2
intro_revisiting_old_commits_alternate.qmd:1
intro_stash.html:1
intro_tags.html:1
intro_three_trees.html:1
intro_time_travel.html:1
intro_tools.html:1
intro_undo.html:2
intro_undo.qmd:1
newfile:1
top_intro.html:1
top_ws.html:1
wb_dvc.html:1
ws_collab.html:2
ws_collab.qmd:1
ws_contrib.html:1
ws_search.rmarkdown:41

This last result also returned our untracked file newfile.

If you want to search untracked and ignored files (meaning all your files), use the flags --untracked --no-exclude-standard.

Let’s see what the .gitignore file contains:

cat .gitignore
cat: .gitignore: No such file or directory

The directory data is in .gitignore. This means that it is not under version control and it thus doesn’t exist in our repo (since we cloned our repo, we only have the version-controlled files). Let’s create it:

mkdir data

Now, let’s create a file in it that contains test:

echo "And another test" > data/file
bash: line 1: data/file: No such file or directory

We can rerun our previous two searches to verify that files excluded from version control are not searched:

git grep -c test
intro_aliases.qmd:5
intro_branches.qmd:17
intro_intro_old.qmd:9
intro_remotes.qmd:1
intro_revisiting_old_commits_alternate.qmd:1
intro_undo.qmd:1
ws_collab.qmd:1
git grep -c --untracked test
index.html:1
intro_aliases.html:4
intro_aliases.qmd:5
intro_branches.html:18
intro_branches.qmd:17
intro_changes.html:1
intro_documentation.html:1
intro_first_steps.html:1
intro_ignore.html:1
intro_install.html:1
intro_intro.html:1
intro_intro_old.qmd:9
intro_intro_slides.html:18
intro_logs.html:1
intro_remotes.html:2
intro_remotes.qmd:1
intro_reset.html:1
intro_resources.html:1
intro_revisiting_old_commits_alternate.html:2
intro_revisiting_old_commits_alternate.qmd:1
intro_stash.html:1
intro_tags.html:1
intro_three_trees.html:1
intro_time_travel.html:1
intro_tools.html:1
intro_undo.html:2
intro_undo.qmd:1
newfile:1
top_intro.html:1
top_ws.html:1
wb_dvc.html:1
ws_collab.html:2
ws_collab.qmd:1
ws_contrib.html:1
ws_search.rmarkdown:41

And now, let’s try:

git grep -c --untracked --no-exclude-standard test
index.html:1
intro_aliases.html:4
intro_aliases.qmd:5
intro_branches.html:18
intro_branches.qmd:17
intro_changes.html:1
intro_documentation.html:1
intro_first_steps.html:1
intro_ignore.html:1
intro_install.html:1
intro_intro.html:1
intro_intro_old.qmd:9
intro_intro_slides.html:18
intro_logs.html:1
intro_remotes.html:2
intro_remotes.qmd:1
intro_reset.html:1
intro_resources.html:1
intro_revisiting_old_commits_alternate.html:2
intro_revisiting_old_commits_alternate.qmd:1
intro_stash.html:1
intro_tags.html:1
intro_three_trees.html:1
intro_time_travel.html:1
intro_tools.html:1
intro_undo.html:2
intro_undo.qmd:1
newfile:1
top_intro.html:1
top_ws.html:1
wb_dvc.html:1
ws_collab.html:2
ws_collab.qmd:1
ws_contrib.html:1
ws_search.rmarkdown:41

data/file, despite being excluded from version control, is also searched.
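
To compare the three searches side by side without depending on the state of your own repo, here is a minimal sketch in a throwaway repository. The files tracked.txt, newfile and data/file are made up for the example, and .gitignore is created explicitly so that the data directory is ignored:

```shell
# Throwaway repository: one tracked file, one untracked file, one ignored file.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q

echo "/data/" > .gitignore
echo "A tracked test" > tracked.txt
git add .gitignore tracked.txt

echo "This is a test" > newfile          # untracked
mkdir data
echo "And another test" > data/file      # ignored through .gitignore

# Tracked files only.
tracked=$(git grep -l test)
echo "tracked:"; echo "$tracked"

# Tracked and untracked files, still honouring .gitignore.
untracked=$(git grep -l --untracked test)
echo "untracked:"; echo "$untracked"

# Tracked, untracked and ignored files.
everything=$(git grep -l --untracked --no-exclude-standard test)
echo "everything:"; echo "$everything"
```

Only the last search lists data/file, while newfile appears as soon as --untracked is used.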

Searching all commits

We saw that git grep <pattern> <commit> can search for a pattern in any commit. Now, what if we want to search all commits for a pattern?

For this, we pass the expression $(git rev-list --all) in lieu of <commit>.

git rev-list --all creates a list of all the commits in a form that can be used as an argument to other commands. The $() syntax runs the expression inside it and passes the result as an argument.

To search for test in all the commits, we thus run:

git grep "test" $(git rev-list --all)

I am not running this command as it has a huge output. Instead, I will limit the search to the last two commits:

git grep "test" $(git rev-list --all -2)
f1802fb9273fdbaad5fa0f1381ff8b18a84a15ce:intro_aliases.qmd:Now, let's build an alias for a more complex command: `git grep "test" $(git rev-list --all)`. This example
f1802fb9273fdbaad5fa0f1381ff8b18a84a15ce:intro_aliases.qmd:from the *"Searching a Git project"* section below will search for the string "test" in all previous
f1802fb9273fdbaad5fa0f1381ff8b18a84a15ce:intro_aliases.qmd:commits. There are two problems with this command: (1) it takes an argument (the string "test"), and (2) it
f1802fb9273fdbaad5fa0f1381ff8b18a84a15ce:intro_aliases.qmd:git search test
f1802fb9273fdbaad5fa0f1381ff8b18a84a15ce:intro_aliases.qmd:should search the entire current Git project history for "test".
f1802fb9273fdbaad5fa0f1381ff8b18a84a15ce:intro_branches.qmd:git branch test
f1802fb9273fdbaad5fa0f1381ff8b18a84a15ce:intro_branches.qmd:git switch test
f1802fb9273fdbaad5fa0f1381ff8b18a84a15ce:intro_branches.qmd:* test
f1802fb9273fdbaad5fa0f1381ff8b18a84a15ce:intro_branches.qmd:The `*` shows the branch you are currently on (i.e. the branch to which `HEAD` points to). In our example, the project has two branches and we are on the branch `test`.
f1802fb9273fdbaad5fa0f1381ff8b18a84a15ce:intro_branches.qmd:git diff main test
f1802fb9273fdbaad5fa0f1381ff8b18a84a15ce:intro_branches.qmd:When you are happy with the changes you made on your test branch, you can merge it into `main`.
f1802fb9273fdbaad5fa0f1381ff8b18a84a15ce:intro_branches.qmd:If you have only created new commits on the branch `test`, the merge is called a "fast-forward merge" because `main` and `test` have not diverged: it is simply a question of having `main` catch up to `test`.
f1802fb9273fdbaad5fa0f1381ff8b18a84a15ce:intro_branches.qmd:git merge test
f1802fb9273fdbaad5fa0f1381ff8b18a84a15ce:intro_branches.qmd:Then, usually, you delete the branch `test` as it has served its purpose:
f1802fb9273fdbaad5fa0f1381ff8b18a84a15ce:intro_branches.qmd:git branch -d test
f1802fb9273fdbaad5fa0f1381ff8b18a84a15ce:intro_branches.qmd:Alternatively, you can switch back to `test` and do the next bit of experimental work on it. This allows to keep `main` free of mishaps and bad developments.
f1802fb9273fdbaad5fa0f1381ff8b18a84a15ce:intro_branches.qmd:Let's go back to our situation before we created the branch `test`:
f1802fb9273fdbaad5fa0f1381ff8b18a84a15ce:intro_branches.qmd:This time, you create a branch called `test2`:
f1802fb9273fdbaad5fa0f1381ff8b18a84a15ce:intro_branches.qmd:To merge your branch `test2` into `main`, a new commit is now required. Git will create this new commit automatically. As long as there is no conflict, it is just as easy as a fast-forward merge:
f1802fb9273fdbaad5fa0f1381ff8b18a84a15ce:intro_branches.qmd:git merge test2
f1802fb9273fdbaad5fa0f1381ff8b18a84a15ce:intro_branches.qmd:After which, you can delete the (now useless) test branch (with `git branch -d test2`):
f1802fb9273fdbaad5fa0f1381ff8b18a84a15ce:intro_branches.qmd:>>>>>>> test2
f1802fb9273fdbaad5fa0f1381ff8b18a84a15ce:intro_intro_old.qmd:The pointer `HEAD`, which normally points to the branch `main` which itself points to latest commit, can be moved around. By moving `HEAD` to any commit, you can revisit the state of your project at that particular version.
f1802fb9273fdbaad5fa0f1381ff8b18a84a15ce:intro_intro_old.qmd:Instead of working on your branch `main`, you create a test branch and work on it (so `HEAD` is on the branch `test` and both move along as you create commits):
f1802fb9273fdbaad5fa0f1381ff8b18a84a15ce:intro_intro_old.qmd:When you are happy with the changes you made on your test branch, you decide to merge `main` onto it.
f1802fb9273fdbaad5fa0f1381ff8b18a84a15ce:intro_intro_old.qmd:Then you do the fast-forward merge from `main` onto `test` (so `main` catches up to `test`):
f1802fb9273fdbaad5fa0f1381ff8b18a84a15ce:intro_intro_old.qmd:Then, usually, you delete the branch `test` as it has served its purpose (with `git branch -d test`). Alternatively, you can switch back to it and do the next bit of experimental work in it.
f1802fb9273fdbaad5fa0f1381ff8b18a84a15ce:intro_intro_old.qmd:This allows to keep `main` free of possible mishaps and bad developments (if you aren't happy with the work you did on your test branch, you can simply delete it and Git will clean the commits that are on it but not on `main` during the next garbage collection.
f1802fb9273fdbaad5fa0f1381ff8b18a84a15ce:intro_intro_old.qmd:You create a test branch and switch to it:
f1802fb9273fdbaad5fa0f1381ff8b18a84a15ce:intro_intro_old.qmd:To merge your main branch and your test branch, a new commit is now required (note that the command is the same as in the case of a fast-forward merge: `git merge`. Git will create the new commit automatically. As long as there is no conflict, it is just as easy as a fast-forward merge. We will talk later about resolving conflicts).
f1802fb9273fdbaad5fa0f1381ff8b18a84a15ce:intro_intro_old.qmd:After which, you can delete the (now useless) test branch (with `git branch -d test2`):
f1802fb9273fdbaad5fa0f1381ff8b18a84a15ce:intro_remotes.qmd:Click on the `Code` green drop-down button, select SSH [if you have set SSH for your GitHub account](https://docs.github.com/en/free-pro-team@latest/github/authenticating-to-github/connecting-to-github-with-ssh) or HTTPS and copy the address.
f1802fb9273fdbaad5fa0f1381ff8b18a84a15ce:intro_revisiting_old_commits_alternate.qmd:The pointer `HEAD`, which normally points to the branch `main` which itself points to latest commit, can be moved around. By moving `HEAD` to any commit, you can revisit the state of your project at that particular version.
f1802fb9273fdbaad5fa0f1381ff8b18a84a15ce:intro_undo.qmd:Here is a common scenario: you make a commit, then realize that you forgot to include some changes in that commit; or you aren't happy with the commit message; or both. You can edit your latest commit with the `--amend` flag:
f1802fb9273fdbaad5fa0f1381ff8b18a84a15ce:ws_collab.qmd:Click on the `Code` green drop-down button, select SSH [if you have set SSH for your GitHub account](https://docs.github.com/en/free-pro-team@latest/github/authenticating-to-github/connecting-to-github-with-ssh) or HTTPS and copy the address.
397ef976e18724c06713ffbf7ebe205b7016a35f:intro_aliases.qmd:Now, let's build an alias for a more complex command: `git grep "test" $(git rev-list --all)`. This example
397ef976e18724c06713ffbf7ebe205b7016a35f:intro_aliases.qmd:from the *"Searching a Git project"* section below will search for the string "test" in all previous
397ef976e18724c06713ffbf7ebe205b7016a35f:intro_aliases.qmd:commits. There are two problems with this command: (1) it takes an argument (the string "test"), and (2) it
397ef976e18724c06713ffbf7ebe205b7016a35f:intro_aliases.qmd:git search test
397ef976e18724c06713ffbf7ebe205b7016a35f:intro_aliases.qmd:should search the entire current Git project history for "test".
397ef976e18724c06713ffbf7ebe205b7016a35f:intro_branches.qmd:git branch test
397ef976e18724c06713ffbf7ebe205b7016a35f:intro_branches.qmd:git switch test
397ef976e18724c06713ffbf7ebe205b7016a35f:intro_branches.qmd:* test
397ef976e18724c06713ffbf7ebe205b7016a35f:intro_branches.qmd:The `*` shows the branch you are currently on (i.e. the branch to which `HEAD` points to). In our example, the project has two branches and we are on the branch `test`.
397ef976e18724c06713ffbf7ebe205b7016a35f:intro_branches.qmd:git diff main test
397ef976e18724c06713ffbf7ebe205b7016a35f:intro_branches.qmd:When you are happy with the changes you made on your test branch, you can merge it into `main`.
397ef976e18724c06713ffbf7ebe205b7016a35f:intro_branches.qmd:If you have only created new commits on the branch `test`, the merge is called a "fast-forward merge" because `main` and `test` have not diverged: it is simply a question of having `main` catch up to `test`.
397ef976e18724c06713ffbf7ebe205b7016a35f:intro_branches.qmd:git merge test
397ef976e18724c06713ffbf7ebe205b7016a35f:intro_branches.qmd:Then, usually, you delete the branch `test` as it has served its purpose:
397ef976e18724c06713ffbf7ebe205b7016a35f:intro_branches.qmd:git branch -d test
397ef976e18724c06713ffbf7ebe205b7016a35f:intro_branches.qmd:Alternatively, you can switch back to `test` and do the next bit of experimental work on it. This allows to keep `main` free of mishaps and bad developments.
397ef976e18724c06713ffbf7ebe205b7016a35f:intro_branches.qmd:Let's go back to our situation before we created the branch `test`:
397ef976e18724c06713ffbf7ebe205b7016a35f:intro_branches.qmd:This time, you create a branch called `test2`:
397ef976e18724c06713ffbf7ebe205b7016a35f:intro_branches.qmd:To merge your branch `test2` into `main`, a new commit is now required. Git will create this new commit automatically. As long as there is no conflict, it is just as easy as a fast-forward merge:
397ef976e18724c06713ffbf7ebe205b7016a35f:intro_branches.qmd:git merge test2
397ef976e18724c06713ffbf7ebe205b7016a35f:intro_branches.qmd:After which, you can delete the (now useless) test branch (with `git branch -d test2`):
397ef976e18724c06713ffbf7ebe205b7016a35f:intro_branches.qmd:>>>>>>> test2
397ef976e18724c06713ffbf7ebe205b7016a35f:intro_intro_old.qmd:The pointer `HEAD`, which normally points to the branch `main` which itself points to latest commit, can be moved around. By moving `HEAD` to any commit, you can revisit the state of your project at that particular version.
397ef976e18724c06713ffbf7ebe205b7016a35f:intro_intro_old.qmd:Instead of working on your branch `main`, you create a test branch and work on it (so `HEAD` is on the branch `test` and both move along as you create commits):
397ef976e18724c06713ffbf7ebe205b7016a35f:intro_intro_old.qmd:When you are happy with the changes you made on your test branch, you decide to merge `main` onto it.
397ef976e18724c06713ffbf7ebe205b7016a35f:intro_intro_old.qmd:Then you do the fast-forward merge from `main` onto `test` (so `main` catches up to `test`):
397ef976e18724c06713ffbf7ebe205b7016a35f:intro_intro_old.qmd:Then, usually, you delete the branch `test` as it has served its purpose (with `git branch -d test`). Alternatively, you can switch back to it and do the next bit of experimental work in it.
397ef976e18724c06713ffbf7ebe205b7016a35f:intro_intro_old.qmd:This allows to keep `main` free of possible mishaps and bad developments (if you aren't happy with the work you did on your test branch, you can simply delete it and Git will clean the commits that are on it but not on `main` during the next garbage collection.
397ef976e18724c06713ffbf7ebe205b7016a35f:intro_intro_old.qmd:You create a test branch and switch to it:
397ef976e18724c06713ffbf7ebe205b7016a35f:intro_intro_old.qmd:To merge your main branch and your test branch, a new commit is now required (note that the command is the same as in the case of a fast-forward merge: `git merge`. Git will create the new commit automatically. As long as there is no conflict, it is just as easy as a fast-forward merge. We will talk later about resolving conflicts).
397ef976e18724c06713ffbf7ebe205b7016a35f:intro_intro_old.qmd:After which, you can delete the (now useless) test branch (with `git branch -d test2`):
397ef976e18724c06713ffbf7ebe205b7016a35f:intro_remotes.qmd:Click on the `Code` green drop-down button, select SSH [if you have set SSH for your GitHub account](https://docs.github.com/en/free-pro-team@latest/github/authenticating-to-github/connecting-to-github-with-ssh) or HTTPS and copy the address.
397ef976e18724c06713ffbf7ebe205b7016a35f:intro_revisiting_old_commits_alternate.qmd:The pointer `HEAD`, which normally points to the branch `main` which itself points to latest commit, can be moved around. By moving `HEAD` to any commit, you can revisit the state of your project at that particular version.
397ef976e18724c06713ffbf7ebe205b7016a35f:intro_undo.qmd:Here is a common scenario: you make a commit, then realize that you forgot to include some changes in that commit; or you aren't happy with the commit message; or both. You can edit your latest commit with the `--amend` flag:
397ef976e18724c06713ffbf7ebe205b7016a35f:ws_collab.qmd:Click on the `Code` green drop-down button, select SSH [if you have set SSH for your GitHub account](https://docs.github.com/en/free-pro-team@latest/github/authenticating-to-github/connecting-to-github-with-ssh) or HTTPS and copy the address.

In combination with the fuzzy finder tool fzf, this can make finding a particular commit extremely easy.

For instance, the code below allows you to dynamically search in the result through incremental completion:

git grep "test" $(git rev-list --all) | fzf --cycle -i -e

Or even better, you can automatically copy the short form of the hash of the selected commit to clipboard so that you can use it with git show, git checkout, etc.:

git grep "test" $(git rev-list --all) |
    fzf --cycle -i -e |
    cut -c 1-7 |
    xclip -r -selection clipboard

Here, I am using xclip to copy to the clipboard as I am on Linux. Depending on your OS you might need to use a different tool.
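
As a rough sketch of what this could look like, the hypothetical function below (to_clipboard is a name made up for this example) sends its standard input to the first clipboard tool it finds: pbcopy (macOS), wl-copy (Linux with Wayland), xclip (Linux with X11), or clip.exe (Windows). Adjust the list to the tools installed on your machine:

```shell
# Hypothetical helper (not part of Git or fzf):
# copy standard input to the clipboard with the first tool available
to_clipboard () {
    if command -v pbcopy >/dev/null 2>&1; then
        pbcopy                            # macOS
    elif command -v wl-copy >/dev/null 2>&1; then
        wl-copy                           # Linux (Wayland)
    elif command -v xclip >/dev/null 2>&1; then
        xclip -r -selection clipboard     # Linux (X11)
    elif command -v clip.exe >/dev/null 2>&1; then
        clip.exe                          # Windows
    else
        echo "no clipboard tool found" >&2
        return 1
    fi
}
```

With such a function in your .bashrc, the last step of the pipeline above could be to_clipboard instead of xclip -r -selection clipboard.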

Of course, you can create a function in your .bashrc file with such code so that you don’t have to type it each time:

grep_all_commits () {
    git grep "$1" $(git rev-list --all) |
        fzf --cycle -i -e |
        cut -c 1-7 |
        xclip -r -selection clipboard
}

Alternatively, you can pass the result directly into whatever git command you want to use that commit for.

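Here is an example with git show. Since git show does not read commits from standard input, xargs is used to pass the selected hash as an argument:

git grep "test" $(git rev-list --all) |
    fzf --cycle -i -e |
    cut -c 1-7 |
    xargs git show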

And if you wanted to get really fancy, you could go with:

git grep "test" $(git rev-list --all) |
    fzf --cycle -i -e --no-multi \
        --ansi --preview="$_viewGitLogLine" \
        --header "enter: view, C-c: copy hash" \
        --bind "enter:execute:$_viewGitLogLine | less -R" \
        --bind "ctrl-c:execute:$_gitLogLineToHash |
        xclip -r -selection clipboard"

Wrapped in a function:

grep_all_commits_preview () {
    git grep "$1" $(git rev-list --all) |
        fzf --cycle -i -e --no-multi \
            --ansi --preview="$_viewGitLogLine" \
            --header "enter: view, C-c: copy hash" \
            --bind "enter:execute:$_viewGitLogLine | less -R" \
            --bind "ctrl-c:execute:$_gitLogLineToHash | xclip -r -selection clipboard"
}

This last function allows you to search through all the results in an incremental fashion while displaying a preview of the selected diff (the changes made at that particular commit). If you want to see more of the diff than the preview displays, press <enter> (then q to quit the pager). If you want to copy the hash of a commit, press C-c (Control + c).

With this function, you can now instantly get a preview of the changes made to any line containing an expression for any file, at any commit, and copy the hash of the selected commit. This is really powerful.
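
The fzf commands above expand two shell variables, $_viewGitLogLine and $_gitLogLineToHash, which need to be set in your shell (for instance in your .bashrc) before these commands can work. Their definitions are not shown in this section, so the version below is only a plausible sketch. It assumes that the first variable should display the commit of the selected line and that the second should extract its short hash:

```shell
# Assumed definitions (not taken from this workshop).
# fzf replaces {} with the currently selected line.

# Extract the first run of 7 hexadecimal characters from the selected line
_gitLogLineToHash="echo {} | grep -o '[a-f0-9]\{7\}' | head -1"

# Show the commit matching that hash, with colours for the fzf preview
_viewGitLogLine="$_gitLogLineToHash | xargs -I % git show --color=always %"
```

Because each line returned by git grep or git log --oneline starts with the commit hash, the first match is the beginning of that hash.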

Aliases

If you don’t want to type a series of flags all the time, you can configure aliases for Git. For instance, Alex Razoumov uses the alias git search for git grep --break --heading -n -i.

Let’s add the -p flag to it. Here is how you would set this alias:

git config --global alias.search 'grep --break --heading -n -i -p'

This setting gets added to your main Git configuration file (on Linux, by default, at ~/.gitconfig).

From there on, you can use your alias with:

git search test
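
If Git answers that 'search' is not a git command, the alias is not defined in the configuration that Git is reading. As a quick check, the hypothetical function below (show_alias is a name chosen for this example) prints the definition of an alias, or a message if it is not set:

```shell
# Hypothetical helper: print the definition of a Git alias
# usage: show_alias <alias>
show_alias () {
    git config --get "alias.$1" || echo "alias '$1' is not set"
}
```

Running show_alias search after the git config command above should print grep --break --heading -n -i -p.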

Searching logs

The second thing that can happen is that you are looking for some pattern in your version control logs.

git log

git log displays information about your commits.

By default, it outputs all the commits of the current branch.

Let’s show the logs of the last 3 commits:

git log -3
commit f1802fb9273fdbaad5fa0f1381ff8b18a84a15ce
Author: Marie-Helene Burle <marie.burle@westdri.ca>
Date:   Wed Apr 24 12:26:52 2024 -0700

    update site

commit 397ef976e18724c06713ffbf7ebe205b7016a35f
Author: Marie-Helene Burle <marie.burle@westdri.ca>
Date:   Wed Apr 24 12:26:43 2024 -0700

    styles: improve callouts

commit e24aebab9ca82555effde05942503ba677df36e3
Author: Marie-Helene Burle <marie.burle@westdri.ca>
Date:   Wed Apr 24 12:25:09 2024 -0700

    minor edits numpy

The output can be customized thanks to a plethora of options.

For instance, here are the logs of the last 15 commits, in a graph, with one line per commit:

git log --graph --oneline -n 15
* f1802fb9 update site
* 397ef976 styles: improve callouts
* e24aebab minor edits numpy
* 212577bc minor edit benchmark
* 84337761 update site
* b44b9816 big improvements benchmark
* 45cd8e29 update site
* 4d9e0e27 correct static function explanation + make graph bigger
* 949277e2 jx numpy: change order headers
* e0ccfb29 update site
* e79d100d edit jax parallel
* 5e7e0a0b update site
* 7b202e6f small edits to jax parallel
* 66f50cb4 add jax parallel to navbar
* c9d7da6e update site

But git log also has flags that let you search for patterns.

Searching commit messages

One of the reasons it is so important to write informative commit messages is that they are key to finding commits later on.

To look for a pattern in all your commit messages, use git log --grep=<pattern>.

Let’s look for test in the commit messages and limit the output to 3 commits:

git log --grep=test -3
commit 6f07fd90be8045378b482d5ca0175446b42797c8
Author: Marie-Helene Burle <marie.burle@westdri.ca>
Date:   Tue Feb 27 20:32:43 2024 -0800

    add test csv data file into the site

commit 7167606e3188e9497768761963af0c4bdc7aad90
Author: Marie-Helene Burle <marie.burle@westdri.ca>
Date:   Tue Feb 27 20:16:20 2024 -0800

    add test csv data file

commit 87f11c6715a5da31888dc6b92645156e6738d207
Author: Marie-Helene Burle <marie.burle@westdri.ca>
Date:   Mon Dec 18 14:06:51 2023 -0800

    test blockquote media for phones

For a more compact output:

git log --grep="test" -3 --oneline
6f07fd90 add test csv data file into the site
7167606e add test csv data file
87f11c67 test blockquote media for phones

Here too, you can combine this with fzf, for instance:

git log --grep="test" | fzf --cycle -i -e

Or:

git log --grep="test" --oneline |
    fzf --cycle -i -e --no-multi \
        --ansi --preview="$_viewGitLogLine" \
        --header "enter: view, C-c: copy hash" \
        --bind "enter:execute:$_viewGitLogLine | less -R" \
        --bind "ctrl-c:execute:$_gitLogLineToHash |
        xclip -r -selection clipboard"
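
git log --grep can be combined with other git log flags. As a small sketch, the hypothetical function below (log_grep_all is a name invented for this example) adds --all, to search the commits of all branches rather than only the current one, and -i, to ignore case:

```shell
# Hypothetical helper: search the commit messages of all branches,
# ignoring case, and print one line per matching commit
# usage: log_grep_all <pattern>
log_grep_all () {
    git log --all -i --oneline --grep="$1"
}
```

For instance, log_grep_all test would also list a commit whose message contains Test or TEST.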

Changes made to a pattern

Remember that test was present in the file src/test_manuel.py. If we want to see when the function whose name matches a pattern was first created and then each time it was modified, we use the -L flag in this fashion:

git log -L :<pattern>:<file>

In our case:

git log -L :test:src/test_manuel.py

This is very useful if you want to see, for instance, changes made to a function in a script.
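
-L also accepts a range of line numbers instead of a function name, in the form -L <start>,<end>:<file>. A minimal sketch, with a hypothetical function name (log_lines):

```shell
# Hypothetical helper: show the history of a range of lines in a file
# usage: log_lines <start> <end> <file>
log_lines () {
    git log -L "$1,$2:$3"
}
```

For instance, log_lines 1 20 src/test_manuel.py would show the history of what are currently the first 20 lines of that file, assuming you run it from the root of the repo.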

Changes in number of occurrences of a pattern

Now, if we want to list all commits that created a change in the number of occurrences of test in our project, we run:

git log -S test --oneline
84337761 update site
45cd8e29 update site
93bb9017 add jax benchmark section
112a5403 update site
1c85d9fe edits to install
5a3afe3a jax parallel: remove asynchronous dispatch moved to benchmarking section
7da77a9a update site
f0f936cc finish jax webinar slides presentation and embed resources
a847af69 update site
cd5ac347 update site after embedding resources JAX webinar slides
1639488f update site
e5b6373a update site
fc1e5925 improve jax jit
c8da0b3d update site
d395fe8b add intro to ipython + better formatting with tabset (rather than columns)
b1e330ba add JAX section on jobs
67a51585 add JAX section on installation
361c27bf update title and abstract jax webinar
eba13650 update title and abstract jax course
f84c38bc update site
8877228e update site
cd4d8869 update site
4a6b96cb update site
b3e68d9b update site
f9abd0f1 update site
f162c6a9 update site
b0f28e7b update site
cc696002 update site
7f4d8801 update site
373d2f49 add content in intro ml nlp slides
93503a04 update site after full render
0fa44f29 update site
ed3df92b update site
8976cb6a update site
7b1d418e update site
c813fca7 update site
4c2cca59 update site
4ec83107 update
289d7eee update
2e9865d5 update
db5eebf7 update
c59f0926 update
84016b9c update
62af7b93 update
88bcbaba update
428ee1db update
97cf5731 update
27bb78e3 add qmd files from molecules
8e70f8b6 update site
e395a204 update
c0a93fbd add prefix (intro_, ws_, wb_) to bash, git, and tools sections
63ba55fc update
dbded239 update
1ca5c158 update
79ebb58b minor fixes dvc slides
ac8afe70 update
525ca03c dvc slides before another big change
d287f33c update
a6dc00eb update
a4e14d34 update
91d6403f update
a38386a2 update
413a323d update
e126a321 update after render
fee353f5 update
e934a832 update site
7e6fe93b update site after render
10277778 update
e2640dc6 update
69ab00ba update site
0cd6525f update site
484054e8 update site
7d4a19bb finish draft stateless
f4bae193 update site
c2aadc75 minor edits
e9f983a8 update site
8aa9f4e5 finish quick draft of parallel section
719274ed update site
b648de49 combine datasets loading from 3 options in one section
ba698653 update site
169845ab version with: loading datasets with Hugging Face
5376cca0 update site
7aaf321f add state draft
4d98a5b0 update site after full render
4fbc9736 update site
fbad50d0 update site
7c19d98d update site
92dad162 update site
4a5ef1ea update site
f763bc1a update site
3b955ea5 update site
27e1a097 update site
3158ff17 update site
7c132d5e update site
141204ed add flax abstract
52152ef3 update JAX abstract after removing dl part with flax
8f250d57 update site
255b3205 update freeze
b61b6df9 update site
99a77db3 finish jit draft
57d56299 update site
d7e0bcb9 finish jx numpy section
d344ad8a update site
858e5422 JAX: big revamp course structure
62c400b2 update site
1b2e846f jx principles: add async dispatch
8c8116e0 update site
09080851 update site
8f32ea61 update site
cd78f159 udpate site
4a003c52 update site
4b3bec75 update site
e8e73865 update site
c545c684 update site
0b628db2 finish hpc data partition chapter
1a5c59f7 save a version of hpc_partition before modifying it
08552b77 update site
c7fccd5b update site
d85a0541 edit running htop on local machine
b5fa9d35 update site
acb1a4ff rename and heavily modify the foreach chapter
4536fc63 add pics
7c318e74 f4 and f5: do testing on the cluster and remove quarto comment
2aa72b9b update site
22192dbf add example releasing memory
08d8f896 add jx profiling page
a841844f improve hpc optimization
bde35ec8 improve intro indexing
f1780432 jx why: replace gtrend embedded by img (keeps breaking), improve graph color, add abstract, add a bit of content
78c3b7b9 add colors to jax intro diagrams
fb5662b6 add a number of jx early drafts
c54f6ed7 add new jx sections and update site using a virtual env for Python
dbcfca2a add prefix for various sections in julia and r, prevent old and bk files from being executed/rendered, prevent webscraping files from being created
43b79abd update site
64ddfd7d small edits to hpc r before course
9204ff32 add jax top intro
09f64aee add info on profiling
0cce7374 move jax below PyTorch as it is a more advanced course
f33135e1 update gcc and r module versions
96c23b77 update site
279f6937 update site after full render
084d5221 embed resources
ae9b67be update site
4e9c611e more improvements and little tweaks frameworks slides. Add more info
d58ae469 update site
445bb296 improvements frameworks slides
21882057 update site
fd17244c update formatting framework slides
185e051a update site
bead0b06 many tweaks of formatting
c68c4611 first very rough draft of frameworks slides
09f1fe51 replace mermaid diagram in file system exercise with a graphviz one
c6d4a350 filesystem: add exercise with a diagram
2798c6a1 minor fix
7a08552d update skl workflow
ac76eea3 update site
8a504230 add sklearn workflow
ea800fde add an sklearn serie
c7ac7301 update site
ccf1a85c added content to aliases.qmd
f17c8c95 added aliases.qmd
dc7aae4e update site
74a4e08b finish logs
0ffe3e1a add logs draft
23827256 add project.zip
c8afe95a add downloadthis extension
fe3f956d add abstract to documentation section
bb145734 edits intro slides
40571641 embed resources intro slides
ffa2c749 total revamp of git intro with simply link to slides
28032012 update ml course
043c6cf4 minor edits r course
4888db47 edit sections on how to run r
e16c9467 update site
44ceba8e update site
0ece9c2e update site
51b8534e replace old webscraping (Python) workshop to new version from DHSI
fecc161b add webscraping old to gitignore
4bb8faf0 little improvements web scraping R
b784f17d update site
91fa0247 update site
3bbe8ac5 add (bad) intro blurb to Python course
3a3361af update site
f589a660 rename the ext section into talks
73dfeda0 improvements collections section
6617a5bc update site
622631fc fix and improve pandas section
3eaacd19 update site
f8c47d0a add index for new big section (talks)
e29ae75f update site
9e6a1b8a edit scripts
0f60d0db finish redirections
583f27c1 add filesystem section
a7c08558 gis slides: fix typo
fec32a19 makie: add content in html below video
954eb10d makie slides: minor improvements
90adee86 more info in workflow section
2250d59a minor edits workflow
39f737d4 add workflow section
2cd8c6f3 update navbar by moving data, model, and training in a single section
5f33a182 make backup of autograd in autograd_old and start to make new version of autograd (not complete)
fc853e13 some edits to training, but still not complete
01f0f732 finish tensor section
24b059f9 finalize parallel loops
e5401892 again many changes to parallel loops before changing yet again
1428b5fa many changes to parallel loops before making yet many new alternative changes
6abe68e0 move copy on modify from basics to indexing and make it better
fd1d1408 update site
0649af93 move concepts to reading and create a new intro section with slides
ba7938d1 update site
2525e9e8 finish function section
b71c7614 finish control flow section
7f226584 finalize plotting section
302627a3 add plotting section
a477bd4d add publishing page with links to quarto workshop and webinar
a30a0759 add data structures section
c5a92ed6 add blurb basics
68489a39 basics: change title + move a lot of content into section specific pages
58e586ec packages: add blurb
848f4362 update navbar
b7284048 minor edits bash intro
075dd527 update site
a519d857 create wildcards section
42f51d4a rename file from search to find
7df872bf add videos: 4 workshops for HSS series + staff to staff webinar + regular webinar
b2e565c7 update site
590bb505 quarto: add installation links
eca80cdb make slides less wordy based on the s2s webinar given on quarto
43d1ea7f update site
f424bba7 add slides for quarto staff to staff webinar
1f33a777 update site
99c7e56f add new minor optimization
dd71a56a turn the parallel loop lesson from the webinar version to the workshop version using batch jobs
07871c1f add 2nd optimization by louis
55e7c61a update site
abb1ced7 update site
f758b8c1 update site
61f3b8ec add function suggested by workshop participant
649f8258 update site
bafb7696 remove profiler from performance section
176e5ba4 finish optimization section
3d42df1c add section on memory
1ba9cc4a important commit: remove "avoid type conversions" in R hpc optimizations section as this doesn't change the timing consistently
b64856dd first draft optimizations
a563fa97 re-render site
95086e88 update site
473d722b many edits r resource page
e0ab5c78 tweak all heading levels
934f9a86 update site
37cef298 remove front page for workshops and webinars + add logo image for front page for intro and hpc courses + re-shuffle a few sections + move most ml topics into a course + minor edits (abstract, etc.)
fc82cf5d update site
3bcf2df8 rename first git section of git course to match structure of other topics
628525d5 fix how to download bash data
952d027c create front pages for Bash and cards on main topic page
c6ea3e02 add buttons to r main page
f3e70692 several improvements to web scraping
c5258d5f split parallel r section into 3 section and add improvements and edits
046b607d minor fixes hss slides
cda4fc9d add missing image and very minor edits ml hss slides
b9f8b5f5 embed resources intro hss ml slides
898cdd93 update site
4c07296e add intro ml for hss slides
38a825da many formatting edits all reveal.js presentations
8688a717 replace workshop by webinar in all webinars
20224c1e update site
50fcdc72 finish script section (shortened. Need to add more content)
5ef06d8d finish function section
bf2653fa move control flow, script, and search to molecules folder
bad3da30 transfer: add globus and abstract
e7412c33 finish redirections and move it to the molecules repo to run code
23bed49d add html_children
9ff42fbe add html section
9c2f06ef add delay at each iteration to reduce risk of being blocked
246c64d1 web scraping: minor edits and improvements
0d08979d add explanations and comments web scraping
4599ed35 disable cache for webscraping as it conflicts with rvest
2703353f minor edit nav titles
5cd69306 change rstudio server time to 1.5h and remove jh option image as it is not the right one
8159f9b2 minor edits: add some explanations, improve code a bit
b78b4b6d add first decent draft webscraping with more or less all code and some explanations
1c251034 add alliance wiki page for r in intro hss resources
cf851391 first draft bash redirections and pipes
03b9e332 first draft bash script
20a392b6 add a little content to intro r
5c3cfb03 add 2 new sections (not covered by alex)
fb69b01d add draft content to intro hss r
6c6d0cf3 add bash empty chapters for online course
b900e688 intro hpc slides by re-embeding resources
66bca7e2 fix typo git search
d72b1706 finish hpc r slides
1f14f6be open link to hpc slides
95f20745 finish control flow chapter
12d248d8 add many little things in list and make a correction for strings
c9d2e1dd update site
a65bd054 remove out of the package section everything that can live elsewhere
173d128f move content jh instructions to a new chapter on running Python
b259e525 remove alex acknowledgement
b4b87e69 minor edit git front page
bb2c6fe2 add acknowledgement of alex content
dd132ab7 supress redundancy between basic and functions
6f2f0ae4 finish list section (Python)
db33cd1f finish basic chapter (Python)
7ac57447 first draft collections
04db2fcc edit basics
fd677ff1 add first draft Python intro hpc copied from Julia. Still needs lots of work
7464d141 add first draft Python functions (not far from ready)
1ccb1569 add first draft web scraping in Python
663e401f yml: uncomment Python tab and add first two Python workshops
4450be7c add alias
479f1aa6 many small edits search + add fzf example for searching the logs
2556ab4e update site
c53e0f51 add a big info block with more fancy searches using fzf
1752db62 first final version of search workshop
fab10006 minor updates hpc r slides
534a0da6 first draft hpc in r slides
2f647bd8 remove unnecessary jupyter: python3 in 2 ml revealjs
919d2061 git lessons: adjust img new names + fixes, corrections, improvements
7a3d5b13 rename all git diagram img in some sensible order
16ca4f6a finish branches and add it to the nav
2725f415 improve front page image
115aa01f edits many julia files: remove unnecessary jupyter: julia-1.8, small additions, small fixes, small improvements
9db0e744 add preliminary draft of branches (git) and commented out nav entry
4ca06a6a add distributed (julia)
a181d66f add symlink for search.qmd which is in the nested git repo
bdb30cb3 edit control flow
da7332f8 add remotes
774a9d45 rename git main workshop
8f8eb031 add tags and its img
c1d8385f add multithreading
a31c5aad edits, additions, fixes
2ba97085 remove from basics elements moved to other files
8472020c remove from intro hpc everything that is intro julia (move it in various other sections)
a8f09800 turn arrays to collections and add content
0cc0c59a add julia functions
1d65319d add julia control flow
5ec2bccf add julia basics
22494671 add julia types
4b09f00e add julia performance
55813908 add non interactive julia
f9dd47ca add julia arrays
ea059d51 edit paths and remove shadow img three trees
38e370fe add undo
29eb1ac1 add tools
81fa8648 add stashing
73ee1b14 add ignore
d8970df8 add three trees of git
8148a866 move all top levels to h2 instead of h1 following chat with quarto developpers
4637b686 julia intro: change header levels + reformat all code blocks
c072c6c9 add packages
1601205a add r resources
3e81b34a add contrib workshop
5cc2efa0 move section about collaboration through git to new workshop
96108ed4 uncomment grid section with wider body width
cddb972c add ml hpc.qmd
6fa885a4 add quarto link to about
08e488af make all page start with h1 instead of h2 + add author where missing + move intro to def block
f09e60bf update site
9f97ec82 add 5 new ml workshops
50a465a0 add note about revealjs presentations slow to load in all links
2a6a3faf improvement to mnist: small additions + run code
e2056339 add choosing frameworks
2e281131 add concept workshop in ml
68183ee7 add autograd to workshops
8e6c61c4 add mnist to workshops
e8b21668 add intro scripting workshop
6e5faa1f add julia intro hpc workshop
2ab3deb6 add julia covid plotting
7e19653b add torchtensor slides
515a4ddb fix R logo (not good on light bg) and add logos for all other sections
6a781ea0 rename ml intro hss
06496189 finish formatting upscaling slides
44bc8530 reduce high res pics upscaling because GitHub's limit is reach with revealjs with self embedded option
7989e1a5 add gis mapping slides
0454b612 add upscaling slides
d9e4e106 turn link to slides into buttons
82356532 add _site to vc to solve publish issue on GitHub
eb7f3b17 delete publish.yml for GitHub actions
82781458 update freeze
f7f08ca0 update freeze
74174941 update freeze
ccc30a73 update freeze
9ef79da4 update freeze
f23a4123 remove in code lengthy comment and add note instead
b75bf0d5 add custom title-slide.html with partial template for revealjs title slide
16ebe722 re-add makie slides *after* having rebuilt the site with freeze true
348b7acb remove makie slides
f131770c makie webinar: re-add slides
36539180 remove makie slides for now for gh actions to build
ecda9e1d update freeze with julia makie slides
ff2ec56b add makie slides
6d406585 front page: switch buttons to cards and readjust content accordingly
fc203393 update freeze
4ceac573 add outputs of quarto demos so as not to have to run them all the time (annoying with latex). works with blocking rendering of that dir in yml
0bbc3117 update freeze with computations from r the basics
71c4ab68 add r the basics from autumn school 22 to r workshops
b2e73700 add all quarto example files
358bfb88 add quarto webinar
818cb8c7 first commit with _freeze (for the quarto examples)
8b21abf0 add all ml webinars
feafdc57 front page: finalize title and add aside about main site
13dad000 big changes to about page
3dd050c0 add publish.yml file for GitHub actions
b6fc959e add 2022_git_sfu.qmd

This can be useful to identify the commit you need.
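
On a repo with a long history, such a list can get long. One way to shorten it, sketched below with a hypothetical function name (pickaxe), is to restrict the search to some files by adding a pathspec after --:

```shell
# Hypothetical helper: list the commits that changed the number of
# occurrences of a string, only looking at files matching a pathspec
# usage: pickaxe <string> <pathspec>
pickaxe () {
    git log --oneline -S"$1" -- "$2"
}
```

For instance, pickaxe test '*.py' would only list the commits that changed the number of occurrences of test in Python files. Adding the -p flag to git log would also print the corresponding diffs.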

TL;DR

Here are the search commands you are most likely to use:

  • Search for a pattern in the current version of your tracked files:
git grep <pattern>
  • Search for a pattern in your files at a certain commit:
git grep <pattern> <commit>
  • Search for a pattern in your files in all the commits:
git grep <pattern> $(git rev-list --all)
  • Search for a pattern in your commit messages:
git log --grep=<pattern>

Now you should be able to find pretty much anything in your projects and their histories.