@eterekhin
Created October 8, 2023 11:12
Pro Git

Version control systems before Git

  1. Local. All changes remain on the developer's machine. Change tracking is handled by a local database that stores the history of a file as a series of revisions (or diff deltas). RCS (Revision Control System) is an example.
  2. Centralized. There is one remote server that stores the code and its history. Clients (developers) check out files from it and send their changes back, but they do not keep the history on their local machines. The downside of this approach is the single point of failure: if the server's data is corrupted, you may lose the entire history, and if the server goes down for a while, nobody can use version control until it is back. Examples of this approach are CVS, Subversion and Perforce.
  3. Distributed. Every client mirrors the full repository, including its history. If the remote server is down, you can keep working locally and push your changes once it is back. If the data on the server is lost, it can be restored from any client's copy (it may not be the very latest version, since somebody could have pushed after your last pull, but that is far better than in the centralized scenario). Examples of distributed VCSs are Git, Mercurial and Darcs.

Git

Checksum

Git addresses everything it stores not by name but by the hash of its content. Files and directories are checksummed when they are added to Git. A checksum is a 40-character string of hexadecimal characters (0-9 and a-f), calculated from the content of a file or the structure of a directory, so any change to a file or directory is detected. Here is an example of a hash: 24b9da6552252987aa493b52f8696cd6d3b00373. Git uses the SHA-1 algorithm for checksumming.
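To make the "hash of the content" idea concrete, here is a small sketch that asks Git for the ID of a file's content and then reproduces the same value by hand. It assumes a repository-independent run with Git's default SHA-1 object format and that sha1sum (GNU coreutils) is installed; the sample content "hello" is an arbitrary choice for the example.

```shell
# Content to hash; "hello" is an arbitrary example, not taken from the notes.
content='hello'

# Ask Git for the object ID of a blob with this content (the file name plays no role).
git_hash=$(printf '%s\n' "$content" | git hash-object --stdin)

# Reproduce it by hand: SHA-1 over the header "blob <size>\0" followed by the content.
size=$(printf '%s\n' "$content" | wc -c | tr -d ' ')
manual_hash=$(printf 'blob %s\0%s\n' "$size" "$content" | sha1sum | cut -d ' ' -f 1)

echo "git hash-object: $git_hash"
echo "manual SHA-1:    $manual_hash"
```

Both lines should print the same 40-character value, which shows that the ID depends only on the content, not on where the file is stored or what it is called.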

Working tree

The Git directory (.git) is where Git stores the metadata and the object database of your project; it is the part that gets copied when you clone a repository from a remote. The working tree, in turn, is a single checkout of one version of the project. These files are pulled out of the compressed database in the Git directory and placed on disk for you to use or modify.

States of a file

A file can be in one or more of the following states (some of them combine, for example a file can be staged and then modified again):

  1. Untracked. You have just created the file; Git does not track it until you stage it. A newly added file that you unstage before it was ever committed also goes back to the Untracked state.
  2. Unmodified. The file is tracked and has not changed since the last commit. Right after cloning a repository, all files are tracked and Unmodified.
  3. Modified. You have changed a tracked file but have not staged the change yet.
  4. Staged. You have modified a file and staged the change. Git stores the staging area (everything that will go into the next commit) in your .git directory. Until you edit the file again, the working copy matches the staged version.
  5. Committed. You have modified the file, staged the change and committed it. If you have just cloned a repository, all files are in the Committed state. After the commit the file counts as Unmodified again.

Git workflow

Modify some files -> Stage some of those files -> Commit the staged files. Sometimes you go around a loop (Modify some files -> Stage some of those files -> Modify some files -> ...) and commit only after a few iterations. This happens when you review your changes one by one: you stage each file once you have reviewed it, and then continue reviewing and editing the remaining files.
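The loop described above can be sketched in a throwaway repository. The file names, their contents and the identity passed with -c are made up for the example; -c only applies a setting to that one command, so nothing is written to your configuration.

```shell
# Throwaway repository; nothing here touches an existing project.
repo=$(mktemp -d) && cd "$repo" && git init -q

echo "first draft" > a.txt          # modify
git add a.txt                       # stage the file you have reviewed
echo "first draft" > b.txt          # keep editing other files
echo "second thoughts" >> a.txt     # a.txt changes again after being staged
git add a.txt b.txt                 # stage the next round

# Commit once everything reviewed so far is staged.
git -c user.name="Example" -c user.email="example@example.com" \
    commit -q -m "Add a.txt and b.txt"

commits=$(git rev-list --count HEAD)   # number of commits reachable from HEAD
pending=$(git status -s)               # empty when nothing is left to stage or commit
echo "commits: $commits, pending changes: '$pending'"
```

Several rounds of editing and staging end up in a single commit, and git status -s prints nothing afterwards.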

Installing Git

  1. Linux. On Debian-based distributions run sudo apt install git-all
  2. macOS. The easiest way is to install the Xcode Command Line Tools.
  3. Windows. Install "Git for Windows", a port of Git to Windows. Git is built on the POSIX API, which Windows does not implement, so Git for Windows additionally ships POSIX tools such as bash, perl, sed, awk and tr to make Git work on the Windows platform. It uses MSYS2 to achieve this.

First Git setup

When you first run Git, you need to configure the user.name and user.email settings. Git records them in every commit you make, so that anyone going through the repository history can see who made each change.

You can override settings defined in a broader scope (System-wide for example) in a narrower one (User-wide or Project-wide).

To explore all Git settings on your system, run git config --list. To also see where each value comes from, pass the --show-origin argument: git config --list --show-origin

The output of git config --list can contain the same key more than once. This happens when a value is defined in one scope (see the scopes below) and overridden in another. To find out which value Git actually uses for a specific key, run git config <key>. To see the project settings, run the command inside the project directory.

You should see something like this:

file:/etc/gitconfig     aliases.test=echo 1
file:/home/Evgeny.Terekhin/.gitconfig   user.name=Evgeny.Terekhin
file:/home/Evgeny.Terekhin/.gitconfig   user.email=[email protected]
file:.git/config        core.repositoryformatversion=0
file:.git/config        core.filemode=true
file:.git/config        core.bare=false
file:.git/config        core.logallrefupdates=true

Git stores settings in several places:

  1. System-wide. /etc/gitconfig. Settings defined here apply to every user on the system. Pass the --system argument to read or write these values.
  2. User-wide. /home/Evgeny.Terekhin/.gitconfig (on Windows the file lives in C:\Users\$USER). Settings defined here apply to all repositories of that user. Pass the --global argument to read or write these values.
  3. Project-wide. /home/Evgeny.Terekhin/Documents/SomeProjectSettings/.git/config. Settings defined here apply to that single project. Pass the --local argument to read or write these values.
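The precedence between scopes can be checked with a short experiment. This sketch redirects HOME (and XDG_CONFIG_HOME) to a scratch directory so that the real user-wide configuration stays untouched; that redirection, the directory names and the two sample values are assumptions made for the demo only.

```shell
# Use a scratch directory as "home" so the real ~/.gitconfig is not modified.
scratch=$(mktemp -d)
export HOME="$scratch" XDG_CONFIG_HOME="$scratch/.config"

git config --global user.name "Global Name"      # written to $HOME/.gitconfig

mkdir "$scratch/project" && cd "$scratch/project" && git init -q
git config --local user.name "Project Name"      # written to .git/config

# Both values are listed, each with the file it comes from...
git config --list --show-origin | grep 'user.name'

# ...but the narrower (project) scope wins when Git resolves the key.
effective=$(git config user.name)
echo "effective user.name: $effective"
```

The listing shows the key twice, while git config user.name returns only the project-level value.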

In order to set name and email you can use the following commands:

   git config --global user.name "Evgeny.Terekhin"
   git config --global user.email "[email protected]"

Both are set in the global scope (so every repository of yours inherits these settings).

If you don't use a graphical application for Git, you may want to configure the editor that Git opens when it needs you to type a message (a commit message, for example): git config --global core.editor "code --wait" (the --wait flag makes Git wait until you close the file in VS Code). If no editor is configured, Git uses the system default editor.

Default branch

By default Git names the first branch master when you run git init. Starting from Git 2.28 you can choose a different name for it by running: git config --global init.defaultBranch <default_branch_name>
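A quick way to see the setting in action without changing your global configuration is to pass it for a single command with -c. The branch name "trunk" is an arbitrary choice for this sketch, and Git 2.28 or newer is assumed.

```shell
# -c applies init.defaultBranch to this one command only.
repo=$(mktemp -d)
git -c init.defaultBranch=trunk init -q "$repo"

# HEAD already points at the (not yet born) initial branch, so its name can be read back.
branch=$(git -C "$repo" symbolic-ref --short HEAD)
echo "initial branch: $branch"
```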

Getting a Git Repository:

  1. Turning a project directory into a repository. Execute git init in the project directory. It creates a .git subdirectory with all the necessary repository files (a Git repository skeleton). At this point nothing in the project is tracked yet.
  2. Cloning a repository. Execute the git clone <url> command, for example git clone https://github.com/libgit2/libgit2. This creates a libgit2 directory and initializes a .git directory inside it. After that all the necessary data is pulled down and Git checks out a working copy of the latest version. If you need a directory with another name, you can specify it like this: git clone https://github.com/libgit2/libgit2 mylibgit. This example uses the HTTPS protocol, but SSH is also available (git@github.com:libgit2/libgit2.git).
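Both ways of getting a repository can be tried offline. In this sketch a local path stands in for the remote URL (git clone accepts either), and the directory names "source" and "mycopy", the README file and the identity passed with -c are made up for the example.

```shell
work=$(mktemp -d) && cd "$work"

# 1. Turn an existing directory into a repository.
mkdir source && cd source
git init -q
ls -a                     # the .git subdirectory is now present
echo "hello" > README
git add README
git -c user.name="Example" -c user.email="example@example.com" \
    commit -q -m "Initial commit"
cd ..

# 2. Clone it. "mycopy" overrides the default directory name ("source").
git clone -q source mycopy
git -C mycopy log --oneline   # the clone carries the full history
```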

Recording changes to the repository

Every file in the working directory is either Tracked or Untracked. Tracked files are the files that were in the last snapshot plus any newly created files that have been staged. They can be in the Unmodified, Modified or Staged state. Untracked files are everything else: any files in the working directory that were neither in the last snapshot nor staged.

Git tracks changes in tracked files and ignores any changes in untracked ones. The main tool for determining which files are in which state is the git status command. Right after cloning a repository you should see that there are no untracked or modified files:

    > git clone git@github.com:libgit2/libgit2.git
    > ...
    > git status
      On branch main
      Your branch is up to date with 'origin/main'.
      nothing to commit, working tree clean

This output tells us that the working tree is clean (there are no modified tracked files), shows the current branch (main), and says that the local main branch has not diverged from the remote one (Your branch is up to date with 'origin/main'): the commits on the local branch are identical to the commits on the remote branch. Any untracked files would also be listed here.

If we stage one new file and leave another one untracked, we will see something like this:

   > echo 1 >> 1.txt
   > echo 1 >> 2.txt && git add 2.txt
   > git status
     On branch master

     No commits yet

     Changes to be committed:
       (use "git rm --cached <file>..." to unstage)
             new file:   2.txt

     Untracked files:
       (use "git add <file>..." to include in what will be committed)
             1.txt

There is also a short form of this command, git status -s:

   > echo 1 >> 1.txt
   > echo 1 >> 2.txt && git add 2.txt 
   > echo 1 >> 3.txt && git add 3.txt && echo 1 >> 3.txt
   > git status -s
     A  2.txt
     AM 3.txt
     ?? 1.txt

A - a new file that has been added to the staging area
M - a modified file
? - an untracked file
Notice that the git status -s output has 2 columns. The left column shows the state of the staging area, the right column shows the state of the working tree.

Now, let's stage all files, commit them and repeat the operation:

   > git add . 
   > git commit -m init
   > ...
   > echo 1 >> 1.txt
   > echo 1 >> 2.txt && git add 2.txt 
   > echo 1 >> 3.txt && git add 3.txt && echo 1 >> 3.txt
   > git status -s
      M 1.txt
     M  2.txt
     MM 3.txt

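Now we don't see A since there are no new files in the staging area. We see M instead. M in the left column means that a file that is already part of the last snapshot was modified and the change was staged. M in the right column means that the file has changes in the working tree that are not staged yet.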

Ignoring files

Sometimes you don't want certain files to be added to Git when you execute git add . A typical case is build output, which is of no use when committed and shared between developers. To set rules for files that should never be staged, create a .gitignore file.

In a .gitignore file you can use glob patterns (simplified regular expressions of the kind shells use). Blank lines and lines starting with # are ignored. A pattern without a slash at the start, such as SomeDir, matches at every level below the .gitignore file. Start a pattern with a forward slash (/SomeDir) to match only at the level of the .gitignore file itself. End a pattern with a forward slash (SomeDir/) to make it match directories only. To negate a pattern, start it with an exclamation point (!). Here is a small example:

.gitignore file:

# ignore all .obj files
*.obj

# but allow 1.obj
!1.obj

# ignore Bin directory in the current directory
/Bin

# ignore all Output directories
Output/

   > touch 1.obj 2.obj
   > mkdir Bin
   > touch Bin/somefile
   > mkdir Nested
   > mkdir Nested/Bin
   > touch Nested/Bin/somefile
   > mkdir Output
   > mkdir Nested/Output
   > touch Nested/Output/somefile
   > git add .
   > git status --ignored=traditional
     On branch master

     No commits yet

     Changes to be committed:
       (use "git rm --cached <file>..." to unstage)
             new file:   .gitignore
             new file:   1.obj
             new file:   Nested/Bin/somefile

     Ignored files:
       (use "git add -f <file>..." to include in what will be committed)
             2.obj
             Bin/
             Nested/Output/

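Here we see that Bin was ignored only at the level of the .gitignore file, while the Bin directory inside Nested was staged. 2.obj was ignored, but 1.obj was staged. Output directories are ignored at every level (the top-level Output directory is empty, so it does not appear in the list).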
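When it is not obvious why a file is (or is not) ignored, git check-ignore can report the rule responsible. The sketch below recreates the rules from the example in a throwaway repository; the is_ignored helper is a hypothetical convenience function written for this demo, not a Git command.

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q

# The same four rules as in the .gitignore example.
printf '%s\n' '*.obj' '!1.obj' '/Bin' 'Output/' > .gitignore
mkdir -p Bin Nested/Bin Nested/Output
touch 1.obj 2.obj Bin/somefile Nested/Bin/somefile Nested/Output/somefile

# -v prints the .gitignore file, line number and pattern that matched each path.
git check-ignore -v 2.obj Bin/somefile Nested/Output/somefile

# -q prints nothing and only sets the exit status (0 = ignored); it takes one path.
is_ignored() { git check-ignore -q "$1" && echo "ignored" || echo "not ignored"; }

for path in 1.obj 2.obj Bin/somefile Nested/Bin/somefile Nested/Output/somefile; do
    echo "$path: $(is_ignored "$path")"
done
```

Only 1.obj and Nested/Bin/somefile should be reported as not ignored, which matches the git status output.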

You can also create a .gitignore file in a nested directory. Its rules apply to the files in that directory and in all directories below it, and they take precedence over the rules of the parent .gitignore.
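A minimal sketch of a nested .gitignore overriding its parent, again in a throwaway repository. The file names (app.log, keep.log) and the rules are invented for the example.

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q

echo '*.log' > .gitignore               # top level: ignore every .log file
mkdir Nested
echo '!keep.log' > Nested/.gitignore    # inside Nested: re-include keep.log

touch app.log Nested/app.log Nested/keep.log
git add .

# List what actually ended up in the staging area.
staged=$(git ls-files)
echo "$staged"
```

Both app.log files stay ignored, while Nested/keep.log is staged because the nested rule takes precedence for that directory.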
