@saifun
Last active December 1, 2021 10:06
Git basics for SLAB

Git Tutorial

Git is an awesome tool 😄

Motivation

Backup and Accessibility

Lost files can be recovered from the server.

image

Because the code is not located only locally on the computer, but also on a server, it can be accessed from anywhere in the world.

image

Version control

Snapshots of the project are stored and are easy to access.

Mistakes happen

Recorded snapshots make it easy to undo mistakes and go back to a working version. A bug can be traced back to its origin by finding out exactly when it started occurring. Git even has a tool that runs a binary search over commits (git bisect).

Branching

  • Many developers can work on the same code.
  • Several features can be developed simultaneously, and you can switch between them.
  • An idea can be easily discarded if it turns out to be a bad one.

image

Some Important Terms

Let's start with the definitions of some important terms that make up the Git world.

General

Repository

A repository ("repo") is a directory that stores the content of a project, and some additional metadata. The repo stores the files themselves, the versions of those files, commits, deletions, and more.

image

Fork

A fork is a copy of a repository at a certain point in time.

Origin

Origin is the conventional name for the primary version of a repository on the remote server, that is, the remote your local copy was cloned from.

Workflow

Branch

A branch is a version of the repository that diverges from the main working project.

Commit

A commit is a snapshot of the repo at a certain time.

Upstream

An upstream branch is the branch tracked on the remote repository by the local branch.
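To make "origin" and "upstream" concrete, here is a small sketch in which a local bare repository plays the role of the remote server. The paths and the branch name feature/demo are assumptions made for the demo.

```shell
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# A bare repository plays the role of the remote server.
git init -q --bare "$tmp/server.git"
git clone -q "$tmp/server.git" "$tmp/work" 2>/dev/null
cd "$tmp/work"
git config user.name "Demo User"
git config user.email "demo@example.com"

git checkout -q -b feature/demo
echo "hello" > hello.txt
git add hello.txt
git commit -q -m "Add hello.txt"

# -u records origin/feature/demo as the upstream of the local branch.
git push -q -u origin feature/demo

upstream=$(git rev-parse --abbrev-ref --symbolic-full-name "@{u}")
echo "upstream of feature/demo: $upstream"
git remote -v        # "origin" is the name git clone gave to the remote
```

Once the upstream is set, a plain git pull or git push on that branch knows which remote branch to talk to.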

Merging

At some point we want to bring the changes of one branch into another. In Git terminology this is called merging. Git tracks changes per line and compares three versions when merging (the two branch tips and their common ancestor) to work out what belongs in the merged version. It will try to perform most merges automatically, but when both versions of a file have changed the same line, it cannot decide which change to keep. This is called a conflict, and it needs to be resolved manually.

image
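A conflict is easiest to understand by provoking one. The sketch below uses a throwaway repository with hypothetical file and branch names: two branches change the same line, the merge stops, and the conflict is resolved by hand.

```shell
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "Hello" > greeting.txt
git add greeting.txt
git commit -q -m "Add greeting"
git branch -M main

# Two branches change the same line in different ways.
git checkout -q -b feature/polite
echo "Hello, dear reader" > greeting.txt
git commit -q -am "Make greeting polite"

git checkout -q main
echo "Hi" > greeting.txt
git commit -q -am "Shorten greeting"

# Git cannot pick a winner, so the merge stops and reports a conflict.
git merge feature/polite >/dev/null 2>&1 || echo "merge stopped: conflict"
conflicted=$(git diff --name-only --diff-filter=U)
cat greeting.txt            # shows the <<<<<<< ======= >>>>>>> markers

# Resolve by writing the final content, then stage and commit.
echo "Hi, dear reader" > greeting.txt
git add greeting.txt
git commit -q -m "Merge feature/polite into main"
```

If the two branches had edited different lines or different files, the same git merge would have completed on its own.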

There are several types of merging, as elaborated below.

Pull Request (PR)

A PR is a request to merge all commits from one branch into another, enabling multiple collaborators to discuss the proposed changes before integrating them into the official project. While this can be done between any two branches, the term is generally used when finishing a feature and merging that branch upstream.

image

Rebase

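Rebasing re-applies the commits of a feature branch on top of the latest commits of its upstream branch, so the feature branch picks up the upstream changes. This is usually done when another feature finished development and was merged while you were still working on the current one. If this is the current state of the branches: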

image

Then a regular merge command creates a new “merge commit”, resulting in a branch structure that looks like this:

image

And a rebase command replays the commits of the current branch on top of main, which keeps the history linear:

image
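The difference between the two pictures can be reproduced locally. This is a sketch with hypothetical branch and file names: the same diverged feature branch is updated once with merge and once with rebase, and the resulting histories are printed.

```shell
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.com"

commit_file() {   # helper: create a file and commit it
  echo "$1" > "$1.txt"
  git add "$1.txt"
  git commit -q -m "Add $1"
}

commit_file base
git branch -M main

# The feature branch and main each gain one commit, so they diverge.
git checkout -q -b feature
commit_file feature-work
git checkout -q main
commit_file main-work

# Two copies of the feature branch, one per strategy.
git branch feature-merge feature
git branch feature-rebase feature

git checkout -q feature-merge
git merge -q --no-edit main        # adds a merge commit with two parents

git checkout -q feature-rebase
git rebase -q main                 # replays the feature commit on top of main

git log --oneline --graph feature-merge
git log --oneline --graph feature-rebase
```

The first graph contains a merge commit joining two lines of history. The second is a single straight line in which the feature commit sits on top of the latest commit of main.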

Healthy Workflow

Working with SSH

There are two ways to work with GitHub: HTTPS and SSH. Working over HTTPS was probably simpler until GitHub stopped accepting account passwords for Git operations and started requiring personal access tokens, which usually carry an expiration date. An explanation of token creation can be found here. The recommended way is SSH: it is easy to configure, and once you configure it you don't need to think about it anymore. To use SSH you need to generate a key and add it to GitHub, as explained here.
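Key generation itself is a single command. The sketch below writes a demo key into a temporary directory with an empty passphrase so that it can run unattended. The email address, file name, and empty passphrase are assumptions for the demo, not recommendations.

```shell
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Demo key in a throwaway directory with an empty passphrase (-N "").
# For real use, drop -f and -N so ssh-keygen prompts for the default
# location (~/.ssh/id_ed25519) and a passphrase.
key="$tmp/id_ed25519_demo"
ssh-keygen -q -t ed25519 -C "you@example.com" -f "$key" -N ""

# The public half (.pub) is what gets pasted into the SSH keys page on GitHub.
cat "$key.pub"

# Once the key is registered, the connection can be checked with:
#   ssh -T git@github.com
# and repositories are cloned through the SSH URL form:
#   git clone git@github.com:<user>/<repo>.git
```

Only the .pub file is ever shared. The private key stays on your machine.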

What should we track with Git?

GitHub has a strict file size limit of 100MB: files added through the browser are limited to 25MB each, and larger files, up to 100MB each, can be added via the command line. This is not an issue if you are just uploading lines of code, but you shouldn't upload data and binaries to GitHub (there are some ways to overcome this restriction, but it is considered bad practice).

In addition, Git is great for comparing and tracking changes line by line in text files. This doesn't work as effectively for binaries, trained models, and the like. For those, use a storage service such as Google Drive or Dropbox.

Special files

.gitignore

Sometimes you may not want to track all your files with Git, for example local caches, files generated automatically by the IDE, and local configuration. Configuring a .gitignore file makes Git ignore these files and not show them as candidates for staging. GitHub can generate a .gitignore for you when you create a repo.
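Here is a small sketch of a .gitignore in action, assuming a hypothetical Python project with a bytecode cache, an IDE directory, and a local configuration file. The patterns are examples only.

```shell
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
git init -q .

# Hypothetical project: one source file plus files we do not want tracked.
mkdir -p __pycache__ .idea
echo "print('hi')" > main.py
echo "cache" > __pycache__/main.cpython-39.pyc
echo "ide" > .idea/workspace.xml
echo "SECRET=1" > local.env

cat > .gitignore <<'EOF'
# Python bytecode caches
__pycache__/
# IDE settings
.idea/
# Local configuration
*.env
EOF

# Only .gitignore and main.py are reported as untracked.
git status --short

# check-ignore reports which pattern, if any, matches a path.
git check-ignore -v local.env
```

The .gitignore file itself is committed like any other file, so everyone working on the repo shares the same rules.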

README.md

A README file is the place to describe the project in free-form text: installation instructions, usage instructions, and any other information useful to someone viewing the repo with no prior knowledge. GitHub renders this file as a formatted preview on the repository page. The file itself is written in Markdown.

LICENSE.md

TODO

Git working cycle

When working on a project with Git, you want to follow a workflow that will help you to enjoy your work:

  • [RECOMMENDED][ONCE] When working on an open-source project, fork it. This is especially recommended if there's any chance you'd like to change something in the codebase.

  • [ONCE] Clone the repo

  • Never commit to master: always check out another branch before you start writing code. It's recommended to choose meaningful branch names, as the branch name may end up as the default title of the PR. The convention is <purpose>/<name>, for example feature/<name>.

  • [OPT] If you want to work on another feature based on the current one before it is merged to master, create a new branch from the current one.

  • After every change you make or at the end of a work day, commit your changes. Always use a meaningful commit message.

  • After every commit, push your changes to the branch you're working on in the origin to back up your work. Pushing shouldn't be dangerous or have side effects. If it does in your setup, you should revisit your configuration.

  • If you have tests, make sure to run them.

  • When you finish working on a feature, open a pull request to master. If you are working with someone else on the project, ask for a review and fix the code according to the comments before merging.

  • [OPT] If you have conflicts, pull the changes from the branch you want to merge into, and resolve them. Then commit the changes.

  • Merge the PR, checkout to master, pull the changes, and so on (back to step 3 - checkout to a new branch for the next feature).

  • Keep a clean git status at all times. Don't settle for having tons of files in there when you don't remember where they belong or where they came from. Maintain a thorough .gitignore file to achieve this. git status should clearly tell you what you're working on right now, and should be clean after every commit and push.

  • [TIP] Use a git-aware shell (e.g., fish), to display the branch that you're on and whether you have untracked changes, etc.

  • [TIP] If you switch to another computer and want to get new branches from the origin, use git fetch (or git pull, which fetches and then merges). Then you can check out the branch you worked on before and continue from the same place.
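The cycle above can be walked through end to end on one machine. In this sketch a local bare repository stands in for the GitHub origin, and a local merge imitates merging the PR. The repository layout, the branch name feature/add-greeting, and the stand-in test are all hypothetical.

```shell
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# [ONCE] A local bare repository stands in for the GitHub origin.
git init -q --bare "$tmp/origin.git"
git clone -q "$tmp/origin.git" "$tmp/project" 2>/dev/null
cd "$tmp/project"
git config user.name "Demo User"
git config user.email "demo@example.com"

# Seed master so there is something to branch from.
echo "# Demo project" > README.md
git add README.md
git commit -q -m "Add README"
git branch -M master
git push -q -u origin master

# Start the feature on its own branch, never on master.
git checkout -q -b feature/add-greeting
echo "echo hello" > greet.sh
git add greet.sh
git commit -q -m "Add greeting script"

# Push after committing, so the work is backed up on the origin.
git push -q -u origin feature/add-greeting

# Run the tests if there are any (here: a trivial stand-in check).
sh greet.sh | grep -q hello

# On GitHub the next step is a PR plus review. The local merge below
# only imitates the result of pressing "Merge" on that PR.
git checkout -q master
git merge -q --no-ff --no-edit feature/add-greeting
git push -q origin master

# Back on master and up to date: ready for the next feature branch.
git pull -q origin master
git status --short --branch
```

With a real GitHub repository, only the clone URL and the PR step differ. The branch, commit, and push commands stay the same.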

⚠️ Danger zone ⚠️

image

Git has commands that let you mess with the structure of the tree, erase versions, move the HEAD pointer, and more. You'll see some (dubious) advice online directing you to use them when you run into problems, such as accidentally committing a file. Gabi is strongly against these solutions, especially if you're not sure what you're doing. Often just pushing a new version of the file resolves the issue much more cleanly (what do we care if the tree is "contaminated"?). That said, these commands shouldn't be avoided altogether: every developer needs them under some circumstances. Developers simply need to be aware that some of them can have irreversible consequences, and should understand what those consequences are. In any case, approach with the utmost caution when you see these commands:

  • git reset
  • git revert
  • git push --force
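To see why these commands deserve care, it helps to try two of them in a throwaway repository. This sketch uses hypothetical file names and commit messages to contrast git revert, which adds a commit, with git reset --hard, which removes one from the branch.

```shell
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "v1" > app.txt
git add app.txt
git commit -q -m "Good version"
echo "v2 with a mistake" > app.txt
git commit -q -am "Bad change"

# git revert keeps history: it adds a new commit that undoes the bad one.
git revert --no-edit HEAD >/dev/null
after_revert=$(git rev-list --count HEAD)      # 3 commits: good, bad, revert

# git reset --hard rewrites the branch: the last commit disappears from it,
# and any uncommitted edits in the working tree are thrown away.
git reset -q --hard HEAD~1
after_reset=$(git rev-list --count HEAD)       # 2 commits: good, bad

echo "commits after revert: $after_revert, after reset: $after_reset"
cat app.txt
```

If the removed commit had already been pushed, the branch could only be published again with git push --force, which rewrites history for everyone else using that branch.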

Useful Git Commands

Some of the explanations are taken from here. You may find this guide useful.

  • git init - initialize an existing directory as a Git repository
  • git clone <URL> - locally retrieve an entire repository from a remote location
  • git branch - list all local branches. An asterisk marks the current branch
  • git checkout <branch-name> - switch to another branch and check it out into the working directory
  • git checkout -b <branch-name> - create a new branch at the current commit and switch to it
  • git status - show modified files in working directory

image

  • git add <file> - stage a file as it looks now for the next commit
  • git commit -m "<message>" - commit the staged changes with the specified commit message
  • git diff [--staged] - show the diff of changes that are not yet staged (--staged shows the diff of changes that are staged but not yet committed)

image

  • git pull [origin <branch>] - fetch and merge commits from the remote branch
  • git push [-u] [origin <branch>] - transmit local branch commits to the remote repository branch (-u also sets that remote branch as the upstream)
  • git merge <branch> - merge the given branch into the current branch
  • git rm <file>; git mv <old> <new> - delete/move a file and stage the change for commit
  • git stash [pop] - save modified and staged changes and restore a clean working directory (pop re-applies the top of the stash stack to the working directory)
  • git log - show the commit history for the currently active branch

image
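Several of the commands above can be tried together in a short session. This is a sketch in a throwaway repository with a hypothetical notes.txt file.

```shell
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"

git init -q .                          # turn the directory into a repository
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "first line" > notes.txt
git status --short                     # "??" marks an untracked file
git add notes.txt                      # stage the file as it looks now
git diff --staged --stat               # what the next commit will contain
git commit -q -m "Add notes"

echo "second line" >> notes.txt
git diff --stat                        # unstaged change to notes.txt
git stash -q                           # shelve it; the working tree is clean
clean=$(git status --porcelain)
git stash pop -q                       # bring the change back

git branch                             # the asterisk marks the current branch
git log --oneline                      # history of the current branch
```

Running git status between the steps is a cheap way to see how each command moves a change between the working directory, the staging area, and the history.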
