Discover the importance of version control when working on data science projects and explore how to use Git to track files, compare differences, modify and save files, undo changes, and support collaborative development through branches. Learn the structure of a repository, how to create new repositories and clone existing ones, and how Git stores data. Build the skills to handle conflicting files.
By George Boorman, Analytics and Data Science Curriculum Manager, DataCamp
Resources: Git Cheatsheet
Learn what version control is and why it is essential for data projects. Discover what Git is and how to use it for a version control workflow.
- Version control is a group of systems and processes to manage changes made to documents, programs, and directories
- Why is version control important?
- Git is not GitHub, but it's common to use Git with GitHub
- Benefits of Git
- Using Git
- Repository
- Staging and committing
- Comparing with diff
$ git --version
$ git status
$ git add .
$ git commit -m "initial commit"
$ git diff -r HEAD filename
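The commands above can be strung together into one short session. The following is a minimal sketch, not part of the course material: the temporary directory, the file name `summary.csv`, and the demo identity are all assumptions made so the script runs on a fresh machine without touching any real project.

```shell
#!/bin/sh
set -e

# Work in a throwaway directory (assumption: nothing here is a real project).
demo_dir=$(mktemp -d)
cd "$demo_dir"

# Create an empty repository and set a local, hypothetical identity
# so that committing works even without any global configuration.
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

# Create a file, stage everything, and save the first commit.
printf 'mean,median\n' > summary.csv
git add .
git commit -q -m "initial commit"

# Edit the file; Git now reports it as modified but not staged.
printf '4.2,4.0\n' >> summary.csv
status_output=$(git status --short)

# Compare the working copy with the last commit (HEAD).
diff_output=$(git diff HEAD -- summary.csv)

echo "$status_output"
echo "$diff_output"
```

The diff marks the appended row with a leading `+`, which is how Git shows lines that exist in the working copy but not in the last commit.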
Examine how Git stores data, learn essential commands to compare files and repositories at different times, and understand the process for restoring earlier versions of files in your data projects.
- The commit structure - metadata, tree, blob
- Git log and hash
- What changed between two commits?
- Unstaging a file and restoring last version of file
- Customizing the log output
- Cleaning a repository
$ git log -2
$ git show c27fa856
$ git annotate report.md
$ git reset HEAD summary_statistics.csv
$ git checkout -- summary_statistics.csv
$ git checkout .
$ git log --since='Apr 2 2022' --until='Apr 11 2022'
$ git clean -n && git clean -f
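To see how these commands fit together, here is a small sketch that builds a two-commit history, inspects it, then unstages and discards an unwanted edit. The file names `report.md` and `scratch.tmp`, the commit messages, and the demo identity are hypothetical, and the hash is read from the repository rather than hard-coded because it differs on every run.

```shell
#!/bin/sh
set -e

# Throwaway repository with a local, hypothetical identity.
demo_dir=$(mktemp -d)
cd "$demo_dir"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

# Two commits so there is some history to look at.
printf 'v1\n' > report.md
git add report.md
git commit -q -m "Add report"
printf 'v2\n' > report.md
git commit -q -a -m "Update report"

# Show the two most recent commits, then inspect the latest one by hash.
git --no-pager log -2 --oneline
latest_hash=$(git rev-parse --short HEAD)
git --no-pager show --stat "$latest_hash"

# See which commit last touched each line of the file.
git --no-pager annotate report.md

# Make an unwanted edit and stage it, then undo both steps:
# reset removes it from the staging area, checkout restores the file.
printf 'oops\n' > report.md
git add report.md
git reset -q HEAD report.md
git checkout -- report.md
restored=$(cat report.md)

# Untracked files: preview what would be deleted (-n), then delete (-f).
touch scratch.tmp
git clean -n
git clean -f

commit_count=$(git rev-list --count HEAD)
echo "report.md now contains: $restored"
```

Running `git clean -n` first is a cheap safety check, since `git clean -f` deletes untracked files permanently.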
Tips and tricks for configuring Git to make you more efficient! Discover branches, identify how to create and switch to different branches, compare versions of files between branches, merge branches together, and deal with conflicting files across branches.
- Levels of settings - local repo, global and system
- Ignoring specific files
- Branches
- Creating, reporting, merging
- The difference between branches
- Switch between branches
- Handling conflicts
$ git config --list
$ git config --global user.name 'John Smith'
$ git config --global alias.ci 'commit -m'
$ git checkout -b report
$ git diff main summary-statistics
$ git checkout destination && git merge source
$ git mergetool
$ cat merge.txt
<<<<<<< HEAD
this is some content to mess with
content to append
=======
totally different content to merge later
>>>>>>> new_branch_to_merge_later
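The conflict shown above can be reproduced with a short script. This is a sketch under assumptions: it runs in a temporary directory with a hypothetical identity, names the initial branch `main` explicitly (the default varies between Git installations), and resolves the conflict by simply keeping the `main` side, which is only one of several reasonable resolutions.

```shell
#!/bin/sh
set -e

# Throwaway repository with a local, hypothetical identity.
demo_dir=$(mktemp -d)
cd "$demo_dir"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"
# Point HEAD at "main" before the first commit so the branch name is predictable.
git symbolic-ref HEAD refs/heads/main

# First commit on main.
printf 'this is some content to mess with\n' > merge.txt
git add merge.txt
git commit -q -m "Add merge.txt"

# Create a branch, switch to it, and rewrite the file there.
git checkout -q -b new_branch_to_merge_later
printf 'totally different content to merge later\n' > merge.txt
git commit -q -a -m "Rewrite merge.txt on the branch"

# Back on main, make a competing change to the same file.
git checkout -q main
printf 'content to append\n' >> merge.txt
git commit -q -a -m "Append to merge.txt on main"

# Compare the two branches.
git --no-pager diff main new_branch_to_merge_later

# Merge the branch into the current branch (main); record whether it conflicted.
if git merge new_branch_to_merge_later >/dev/null 2>&1; then
  merge_result=clean
else
  merge_result=conflict
fi
echo "merge result: $merge_result"
cat merge.txt

# Resolve by writing the desired content (here: keep main's version),
# then stage the file and commit to conclude the merge.
printf 'this is some content to mess with\ncontent to append\n' > merge.txt
git add merge.txt
git commit -q -m "Resolve conflict in merge.txt"
```

The text between `<<<<<<< HEAD` and `=======` is the version on the current branch, and the text between `=======` and `>>>>>>>` is the version from the branch being merged in.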
Introduction to remote repositories and how to work with them to synchronize content between the cloud and your local computer. Create new repositories and clone existing ones, and discover a workflow to minimize the risk of conflicts between local and remote repositories.
- Creating repos
- Remote repos
- Collaborating on Git projects
- Fetching from a remote
- Synchronizing content
- Pulling from a remote
- Pushing to a remote
- Resolving a conflict
$ git init
$ git init mental-health-workspace
$ git remote -v
$ git clone path-to-project-directory
$ git clone https://github.com/datacamp/project
$ git remote add george https://github.com/george_datacamp/repo
$ git fetch origin main
$ git merge origin/main
$ git pull origin main
$ git push remote local_branch
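The remote workflow can be tried without a GitHub account by letting a bare repository on disk stand in for the hosted remote. The sketch below makes several assumptions: the directory names `remote.git`, `alice`, and `bob`, the file `data.csv`, and the identity are all hypothetical, and the branch is named `main` explicitly.

```shell
#!/bin/sh
set -e

work=$(mktemp -d)
cd "$work"

# A bare repository plays the role of the hosted remote.
git init -q --bare remote.git
git --git-dir=remote.git symbolic-ref HEAD refs/heads/main

# First collaborator: create a repo, commit, register the remote, push.
git init -q alice
cd alice
git symbolic-ref HEAD refs/heads/main
git config user.name "Alice"
git config user.email "alice@example.com"
printf 'v1\n' > data.csv
git add data.csv
git commit -q -m "Add data"
git remote add origin "$work/remote.git"
git remote -v
git push -q origin main
cd ..

# Second collaborator: clone the remote into a new directory.
git clone -q "$work/remote.git" bob

# The first collaborator pushes another change.
cd alice
printf 'v2\n' >> data.csv
git commit -q -a -m "Add second row"
git push -q origin main
cd ..

# The second collaborator fetches the remote branch, then merges it.
cd bob
git fetch -q origin main
git merge -q origin/main

# git pull does the fetch and the merge in one step; here nothing is left to pull.
git pull -q origin main
cd ..

echo "bob's data.csv:"
cat bob/data.csv
```

Pulling before starting new work and pushing soon after committing keeps the local and remote histories close together, which reduces the chance of conflicts.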