Git is an awesome tool 😄
Lost files can be recovered from the server.
Because the code lives on a server as well as on your computer, it can be accessed from anywhere in the world.
Snapshots of a project are stored and are easy to access.
Recorded snapshots make it easy to undo mistakes and go back to a working version.
A bug can be traced back to its origin by finding out exactly when it started occurring. Git has a cool tool for binary search over commits (`git bisect`).
- Many developers can work on the same code.
- Several features can be worked on simultaneously, and you can switch between them.
- An idea can be easily discarded if it turns out to be a bad one.
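
As a rough illustration of the `git bisect` search mentioned above, the sketch below builds a throwaway repository in which the sixth of ten commits introduces a "bug", then lets Git find it. The file names, commit messages, and the `grep` check are all hypothetical stand-ins for a real project and its test command.

```shell
#!/bin/sh
# Demo repository in a temporary directory (all names here are hypothetical).
repo=$(mktemp -d)
cd "$repo" || exit 1
git -c init.defaultBranch=master init -q
git config user.name "Example User"
git config user.email "user@example.com"

# Ten commits; from the sixth one on, status.txt says "fail" instead of "ok".
for i in 1 2 3 4 5 6 7 8 9 10; do
    if [ "$i" -ge 6 ]; then echo "fail" > status.txt; else echo "ok" > status.txt; fi
    echo "$i" > counter.txt
    git add status.txt counter.txt
    git commit -q -m "commit $i"
done

# Mark HEAD as bad and the very first commit as good.
first=$(git rev-list --max-parents=0 HEAD)
git bisect start HEAD "$first" >/dev/null

# Let Git run the check at each step: exit code 0 means good, 1 means bad.
git bisect run grep -q '^ok$' status.txt >/dev/null

# refs/bisect/bad now points at the first bad commit.
culprit=$(git log -1 --format=%s refs/bisect/bad)
echo "first bad commit: $culprit"

# Return to the branch we started from.
git bisect reset >/dev/null 2>&1
```

In a real project the `grep` call would be replaced by whatever command tells a working version from a broken one, for example a test script.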
Let's start with the definition of some important terms that are used to create the Git world.
A repository ("repo") is a directory that stores the content of a project, and some additional metadata. The repo stores the files themselves, the versions of those files, commits, deletions, and more.
A fork is a copy of a repository at a certain point in time.
Origin is the conventional name for the primary version of a repository on the remote server.
A branch is a version of the repository that diverges from the main working project.
A commit is a snapshot of the repo at a certain time.
An upstream branch is the branch on the remote repository that the local branch tracks.
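
To make the upstream relationship concrete, here is a small sketch that uses a local bare repository as a stand-in for the remote server. The paths and the branch name `feature/demo` are made up for the example.

```shell
#!/bin/sh
# A bare repository plays the role of the remote server (hypothetical setup).
work=$(mktemp -d)
git init -q --bare "$work/remote.git"

git -c init.defaultBranch=master init -q "$work/local"
cd "$work/local" || exit 1
git config user.name "Example User"
git config user.email "user@example.com"
git remote add origin "$work/remote.git"

echo "hello" > file.txt
git add file.txt
git commit -q -m "Initial commit"

# Create a branch and push it; -u records origin/feature/demo as its upstream.
git checkout -q -b feature/demo
git push -q -u origin feature/demo

# Ask Git which remote branch the current local branch tracks.
upstream=$(git rev-parse --abbrev-ref '@{u}')
echo "upstream: $upstream"
```

Once the upstream is set, a plain `git push` or `git pull` on that branch knows which remote branch to use.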
At some point we want to bring changes from one branch into another; in Git terminology this is called merging. Git tracks changes per line and compares three versions when merging to work out what should be in the merged version. (@TODO: what are the names of the different versions, I always confuse BASE and the other ones. Also, maybe add a screenshot of managing conflicts.) It will try to perform most merges automatically, but when two versions of a file have changed the same line, it presents the user with a conflict, which must be resolved manually.
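
The following sketch reproduces a conflict in a throwaway repository and then resolves it by hand. It also lists the three versions Git keeps in its index during a conflicted merge: stage 1 is the common ancestor (often called BASE), stage 2 is the version on the current branch ("ours"), and stage 3 is the version on the branch being merged ("theirs"). The file contents and branch names are hypothetical.

```shell
#!/bin/sh
repo=$(mktemp -d)
cd "$repo" || exit 1
git -c init.defaultBranch=master init -q
git config user.name "Example User"
git config user.email "user@example.com"

# Common starting point for both branches.
printf 'line one\nline two\n' > notes.txt
git add notes.txt
git commit -q -m "Base version"

# The feature branch changes the second line...
git checkout -q -b feature/demo
printf 'line one\nline two (feature)\n' > notes.txt
git commit -q -am "Feature edit"

# ...and master changes the same line differently.
git checkout -q master
printf 'line one\nline two (master)\n' > notes.txt
git commit -q -am "Master edit"

# The merge cannot be completed automatically.
if git merge feature/demo >/dev/null 2>&1; then
    merge_result=clean
else
    merge_result=conflict
fi

# The file now contains conflict markers, and the index holds three stages.
markers=$(grep -c '^<<<<<<<' notes.txt)
stages=$(git ls-files -u | awk '{print $3}' | tr '\n' ' ')
echo "merge: $merge_result, stages: $stages"

# Resolve manually: write the intended content, stage it, and commit.
printf 'line one\nline two (master and feature)\n' > notes.txt
git add notes.txt
git commit -q -m "Merge feature/demo into master"
```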
There are several types of merging, as elaborated below.
A PR is a request for merging all commits from one branch to another, enabling multiple collaborators to discuss the proposed changes before integrating them into the official project. While this could be done between any two branches, the term is generally used when finishing a feature and merging that branch upstream.
Merging from an upstream branch into a feature branch. This is usually done when another feature has finished development while you are working on the current feature.
There are two ways to work with GitHub: HTTPS and SSH. Working with HTTPS was probably simpler until GitHub stopped accepting account passwords for Git operations and required personal access tokens, which can carry an expiration date. An explanation of token creation can be found here. The recommended way is using SSH. It is easy to configure, and once you configure it you don't need to think about it anymore. To use SSH you need to generate a key and add it to GitHub, as explained here.
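
A minimal sketch of the key-generation step, assuming OpenSSH's `ssh-keygen` is installed. The key is written to a temporary directory with an empty passphrase so the example has no side effects; the file name and e-mail address are placeholders, and for real use you would normally keep the key under `~/.ssh` and protect it with a passphrase.

```shell
#!/bin/sh
# Throwaway location so the demo does not touch ~/.ssh (hypothetical paths).
keydir=$(mktemp -d)

# Generate an Ed25519 key pair; -N "" sets an empty passphrase for the demo.
ssh-keygen -q -t ed25519 -C "user@example.com" -N "" -f "$keydir/id_ed25519_demo"

# The public half (.pub) is what gets pasted into GitHub's SSH key settings.
cat "$keydir/id_ed25519_demo.pub"

# After adding the key on GitHub, the connection can be checked with:
#   ssh -T git@github.com
```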
GitHub has a strict file size limit of 100MB. Files that you add to a repository via a browser are limited to 25MB per file; you can add larger files, up to 100MB each, via the command line. This is not an issue if you are just uploading lines of code, but you shouldn't upload data and binaries to GitHub (there are some ways to overcome this restriction, but doing so is considered bad practice).
In addition, Git is great for comparing and tracking changes line by line in text files. This doesn't work as effectively for binaries, trained models, etc. For those, use a file-hosting service such as Google Drive or Dropbox.
Sometimes you may not want to track all your files with Git. These include local caches, files generated automatically by an IDE, local configuration files, etc.
Configuring a `.gitignore` file makes Git ignore these files and not show them during staging. GitHub can generate a `.gitignore` for you when you create a repo.
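
As a small illustration, the sketch below writes a `.gitignore` into a throwaway repository and checks which files Git still reports. The ignored entries (a Python cache directory, an IDE folder, a local config file) are only examples; the right patterns depend on your project and tools.

```shell
#!/bin/sh
repo=$(mktemp -d)
cd "$repo" || exit 1
git -c init.defaultBranch=master init -q

# Example patterns only; adapt them to your own project.
cat > .gitignore <<'EOF'
# Local caches
__pycache__/
# IDE-generated files
.idea/
# Local configuration
local_config.ini
EOF

# Create some files that should be ignored and one that should not.
mkdir -p __pycache__ .idea
touch __pycache__/mod.pyc .idea/workspace.xml local_config.ini main.py

# Only .gitignore and main.py show up as untracked.
git status --porcelain --untracked-files=all

# check-ignore reports which pattern (if any) matches a given path.
git check-ignore -v local_config.ini
```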
A readme file is the place to describe the project in free-form text. This includes installation instructions, usage instructions, and any other information useful to someone viewing the repo without prior knowledge. GitHub renders a nice preview of this file on the repo's page. The file itself is written in Markdown.
TODO
When working on a project with Git, you want to follow a workflow that will help you to enjoy your work:
- [RECOMMENDED][ONCE] When working on an open-source project, fork it. This is especially recommended if there's any chance you'd like to change something in the codebase.
- [ONCE] Clone the repo.
- Never commit to `master`; always check out another branch before you start writing code. It's recommended to choose meaningful names for branches, as the name of the PR will be the same as the branch name. The convention is `<purpose>/<name>`, for example `feature/<name>`.
- [OPT] If you want to work on another feature based on the current one before merging it to master, check out a new branch from the current one.
- After every change you make, or at the end of a work day, commit your changes. Always use a meaningful commit message.
- After every commit, push your changes to the branch you're working on in the origin to back up your work. Pushing shouldn't be dangerous or have side effects. If it does in your setup, you should revisit your configuration.
- If you have tests, make sure to run them.
- When you finish working on a feature, open a pull request to `master`. If you are working with someone else on the project, ask for a review and fix the code according to the comments before merging.
- [OPT] If you have conflicts, pull the changes from the branch you want to merge into, and resolve them. Then commit the changes.
- Merge the PR, check out `master`, pull the changes, and so on (back to step 3: check out a new branch for the next feature).
- Keep a clean `git status` at all times. Don't settle for having tons of files in there when you don't remember where they belong or where they came from. Maintain a strong `.gitignore` file to achieve this. `git status` should clearly tell you what you're working on right now, and should be clean after every commit and push.
- [TIP] Use a Git-aware shell (e.g., fish) to display the branch that you're on, whether you have untracked changes, etc.
- [TIP] If you switch computers and want to fetch new branches from the origin, use `git pull`. Then you can check out the branch you worked on before and continue from the same place.
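
The core of the workflow above can be sketched as a short shell session. A local bare repository stands in for GitHub, so the pull request itself (opening, review, merge) is not reproduced; the branch name `feature/greeting`, the file names, and the commit messages are invented for the example.

```shell
#!/bin/sh
work=$(mktemp -d)

# Setup only: create a stand-in for the project on the server.
git -c init.defaultBranch=master init -q "$work/seed"
cd "$work/seed" || exit 1
git config user.name "Example User"
git config user.email "user@example.com"
echo "# Demo project" > README.md
git add README.md
git commit -q -m "Initial commit"
git clone -q --bare "$work/seed" "$work/origin.git"

# [ONCE] Clone the repo.
git clone -q "$work/origin.git" "$work/project"
cd "$work/project" || exit 1
git config user.name "Example User"
git config user.email "user@example.com"

# Never commit to master: branch first, using <purpose>/<name>.
git checkout -q -b feature/greeting

# Work, then commit with a meaningful message.
echo 'echo "hello"' > greet.sh
git add greet.sh
git commit -q -m "Add greeting script"

# Push after every commit to back up the work.
git push -q -u origin feature/greeting

# git status should be clean after every commit and push.
status_after_push=$(git status --porcelain)

# (Open, review, and merge the pull request on GitHub here.)

# Afterwards, return to master and pull before starting the next branch.
git checkout -q master
git pull -q
```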
Git has commands that let you mess with the structure of the tree, erase versions, change HEAD pointers, and more. You'll see some (dubious) advice online directing you to use them when you run into problems, such as accidentally committing a file. Gabi is strongly against these solutions, especially if you're not sure what you're doing. Often just pushing a new version of the file resolves the issue much more cleanly (what do we care if the tree is "contaminated"?). @Saifun has a different view on this? In any case, you should approach these commands with the utmost caution:
- `git reset`
- @Saifun, more?
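
One way to "fix forward" after accidentally committing a file, without touching the history, is sketched below. It is only an illustration with made-up file names: the file stays on disk, stops being tracked, and the mistake remains visible in earlier commits (so this is not enough if the file contained a secret).

```shell
#!/bin/sh
repo=$(mktemp -d)
cd "$repo" || exit 1
git -c init.defaultBranch=master init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "real work" > main.txt
git add main.txt
git commit -q -m "Add main file"

# The accident: a local log file gets committed.
echo "scratch output" > debug.log
git add debug.log
git commit -q -m "Add debug.log by mistake"

# The fix is simply another commit: untrack the file and ignore it from now on.
git rm -q --cached debug.log
echo "debug.log" >> .gitignore
git add .gitignore
git commit -q -m "Stop tracking debug.log"

# History is intact (three commits) and debug.log is no longer tracked.
commits=$(git rev-list --count HEAD)
git ls-files
```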
Some of the explanations are taken from here. You may find this guide useful.
- `git init` - initialize an existing directory as a Git repository
- `git clone <URL>` - locally retrieve an entire repository from a remote location
- `git branch` - list all local branches. An asterisk marks the current branch
- `git checkout` - switch to another branch and check it out into the working directory
- `git checkout -b <branch-name>` - create a new branch at the current commit and switch to it
- `git status` - show modified files in the working directory

- `git add` - add a file as it looks now to the next commit
- `git commit -m "<message>"` - commit the staged changes with the specified commit message
- `git diff [--staged]` - show the diff of the changes that are not staged (`--staged` shows the diff for staged but not yet committed changes)

- `git pull [origin <branch>]` - fetch and merge commits from the remote branch
- `git push [origin <branch>, -u <upstream>]` - transmit local branch commits to the remote repository branch
- `git merge` - merge the specified branch into the current branch
- `git rm`; `git mv` - delete/move a file and stage the change for commit
- `git stash [pop]` - save modified and staged changes (`pop` applies the top of the stash stack to the working directory and removes it from the stack)
- `git log` - show the commit history for the currently active branch
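
To see how `git diff`, `git diff --staged`, and `git stash` relate, here is a short sketch in a throwaway repository. The file names are arbitrary, and the variables exist only to capture what each command reports.

```shell
#!/bin/sh
repo=$(mktemp -d)
cd "$repo" || exit 1
git -c init.defaultBranch=master init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "a1" > a.txt
echo "b1" > b.txt
git add a.txt b.txt
git commit -q -m "Add a and b"

# Change both files, but stage only a.txt.
echo "a2" >> a.txt
git add a.txt
echo "b2" >> b.txt

# Plain diff shows unstaged changes; --staged shows what the next commit holds.
unstaged=$(git diff --name-only)
staged=$(git diff --staged --name-only)
echo "unstaged: $unstaged, staged: $staged"

# Stash saves both kinds of change and leaves a clean working directory.
git stash >/dev/null
after_stash=$(git status --porcelain)

# pop re-applies the changes; without --index they all come back unstaged.
git stash pop >/dev/null
restored=$(git diff --name-only | tr '\n' ' ')
echo "restored: $restored"
```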