As an idea develops, the codebase that implements it often grows beyond its original scope. In response, larger projects are frequently refactored into smaller submodules, some of which may later be reused in other codebases. To support this use case, the author has developed a procedure that converts a subdirectory of a given codebase directly into a submodule while preserving the change history of its files. The procedure is outlined in the sections that follow.
Before making any of the destructive changes described next, create a fresh clone of the source repository, that is, the repository containing the subdirectory you want to isolate into a submodule:
```shell
# $source_repo_url is the URL of the source Git repository
git clone "$source_repo_url"
# $source_repo_name is the name of the directory that was just cloned to
cd "$source_repo_name"
# $source_repo_branch is the name of the branch whose change history should be adapted
git switch "$source_repo_branch"
```

Once the source repository has been cloned, make sure a remote destination is ready to receive the adapted changes. This can be a bare repository somewhere on disk, a fresh GitHub repository, or any number of other things, depending on your workflow. As long as it can be used as a Git remote, the procedure works the same way. After initializing the destination, make sure it has an initial commit (for example, one containing a .gitignore and a README) so that the adapted changes have something to be rebased onto.
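The destination can be a plain repository on disk. Preparing one might look like the following minimal sketch. It assumes Git 2.28 or newer (for `--initial-branch`). The scratch paths, the `main` branch name, and the README and .gitignore contents are hypothetical placeholders. A placeholder commit identity is set in the scratch repository only when none is configured.

```shell
#!/bin/sh
# Sketch: prepare a destination repository on local disk.
# All paths, names, and file contents are hypothetical placeholders.
set -eu

work=$(mktemp -d)
dest_repo_url="$work/dest.git"   # bare repository acting as the remote
dest_repo_branch=main

# Create the empty bare repository (--initial-branch needs Git 2.28+).
git init -q --bare --initial-branch="$dest_repo_branch" "$dest_repo_url"

# Build the initial commit in a scratch working copy, then push it.
git init -q --initial-branch="$dest_repo_branch" "$work/seed"
cd "$work/seed"
# Use a placeholder identity only if none is configured already.
git config user.name >/dev/null || git config user.name "Example Author"
git config user.email >/dev/null || git config user.email "author@example.com"
printf '# my-submodule\n' > README.md
printf '*.o\n' > .gitignore
git add README.md .gitignore
git commit -qm "Initial commit"
git push -q "$dest_repo_url" "$dest_repo_branch"
```

For a hosted repository, creating it with a README already provides the initial commit. In that case, `$dest_repo_url` is simply that repository's clone URL.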
Now that the source and destination repositories are ready, run the following commands from the root of the fresh clone:

```shell
# point origin at the destination repository
git remote set-url origin "$dest_repo_url"
# $source_dir is the subdirectory of the source repository that should be isolated;
# this rewrites the current branch so that $source_dir becomes the repository root
git filter-branch --subdirectory-filter "$source_dir"
git branch -m temp/filtered
# $dest_repo_branch is the name of the default branch in the destination repository
git fetch origin "$dest_repo_branch"
# fetching creates no local branch, so rebase onto the remote-tracking branch
# (add --rebase-merges to keep merge commits instead of flattening them)
git rebase "origin/$dest_repo_branch"
# rename the result; -M also replaces any stale local branch of that name from the clone
git branch -M "$dest_repo_branch"
# push changes to destination repository
git push --force --set-upstream origin "$dest_repo_branch"
```

After running these commands, the destination repository should contain the adapted history. The contents of the original subdirectory now sit at the repository root, alongside the files from the destination's initial commit.
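Because `git filter-branch` rewrites history, it can be worth rehearsing this step on throwaway repositories first. The following sketch builds a hypothetical source repository with a `lib/` subdirectory and a destination with one initial commit. It then runs the filter-and-rebase commands against them, rebasing onto `origin/$dest_repo_branch` because `git fetch` only updates the remote-tracking branch. Everything apart from the Git commands themselves is made up for illustration. The sketch assumes Git 2.28 or newer and sets `FILTER_BRANCH_SQUELCH_WARNING=1`, which suppresses the warning and pause that recent Git versions show before running `filter-branch`.

```shell
#!/bin/sh
# Sketch: rehearse the filter-and-rebase step on throwaway repositories.
# All paths, file names, and commit messages are made-up examples.
set -eu
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip filter-branch's warning pause

work=$(mktemp -d)
# Set a placeholder identity in the current repo only if none is configured.
ident() {
  git config user.name >/dev/null || git config user.name "Example Author"
  git config user.email >/dev/null || git config user.email "author@example.com"
}

# Hypothetical source repository whose lib/ subdirectory will be split out.
git init -q --initial-branch=main "$work/source"
cd "$work/source"
ident
mkdir lib
echo one > lib/a.txt && echo app > app.txt
git add . && git commit -qm "Add lib/a.txt and app.txt"
echo two > lib/b.txt
git add . && git commit -qm "Add lib/b.txt"
echo more >> app.txt
git commit -qam "Touch app.txt only"

# Hypothetical destination: a bare repository with one initial commit.
git init -q --bare --initial-branch=main "$work/dest.git"
git init -q --initial-branch=main "$work/seed"
cd "$work/seed"
ident
echo "# lib" > README.md
git add README.md && git commit -qm "Initial commit"
git push -q "$work/dest.git" main

# The procedure itself, run in a fresh clone of the source.
source_dir=lib
dest_repo_url="$work/dest.git"
dest_repo_branch=main
git clone -q "$work/source" "$work/clone"
cd "$work/clone"
ident
git remote set-url origin "$dest_repo_url"
git filter-branch --subdirectory-filter "$source_dir"
git branch -m temp/filtered
git fetch origin "$dest_repo_branch"
git rebase "origin/$dest_repo_branch"
git branch -M "$dest_repo_branch"
git push --force --set-upstream origin "$dest_repo_branch"
```

Inspecting the destination afterwards (for example, with `git log --oneline --stat` in a clone of it) should show the initial commit followed by the two commits that touched `lib/`, with their files at the root. The commit that touched only `app.txt` is left out by the subdirectory filter.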
Finally, remove the subdirectory from the source repository and add the destination repository as a submodule in its place. Do this in your regular working copy of the source repository, not in the clone used above, whose origin now points at the destination:

```shell
# start from the branch that should receive the change, then branch off it
git switch "$source_repo_branch"
git switch -c temp/add-submodule
# $source_dir is the subdirectory of the source repository that was isolated
git rm -r "$source_dir"
git commit -m "Remove subdirectory sources"
# add the destination repository as a submodule at the same path
git submodule add "$dest_repo_url" "$source_dir"
git commit -m "Add submodule"
# fast-forward the original branch without checking out its old tree;
# switching back first fails because the submodule's files occupy $source_dir
git branch -f "$source_repo_branch" temp/add-submodule
git switch "$source_repo_branch"
git push
# cleanup!
git branch -d temp/add-submodule
```

Now all changes have been ported over to a submodule, with everything stored in the right place. Yay!
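The submodule swap can be rehearsed the same way. This sketch again uses hypothetical scratch repositories: a stand-in destination containing `a.txt`, and a source whose `lib/` directory still holds that file. It finishes with `main` fast-forwarded to the temporary branch through `git branch -f`, and never checks out the old tree. The `-c protocol.file.allow=always` option is only needed because the submodule URL here is a local path, which recent Git releases refuse for submodules by default.

```shell
#!/bin/sh
# Sketch: rehearse the submodule swap on throwaway repositories.
# All paths, file names, and commit messages are made-up examples.
set -eu
work=$(mktemp -d)
# Set a placeholder identity in the current repo only if none is configured.
ident() {
  git config user.name >/dev/null || git config user.name "Example Author"
  git config user.email >/dev/null || git config user.email "author@example.com"
}

# Stand-in for the destination repository produced by the previous step.
git init -q --initial-branch=main "$work/dest-seed"
cd "$work/dest-seed"
ident
echo one > a.txt
git add a.txt && git commit -qm "Add a.txt"
git clone -q --bare "$work/dest-seed" "$work/dest.git"

# Stand-in for the original working copy, with lib/ still inline.
git init -q --initial-branch=main "$work/source"
cd "$work/source"
ident
mkdir lib && echo one > lib/a.txt && echo app > app.txt
git add . && git commit -qm "Initial import"

source_dir=lib
source_repo_branch=main
dest_repo_url="$work/dest.git"

git switch -q "$source_repo_branch"
git switch -q -c temp/add-submodule
git rm -q -r "$source_dir"
git commit -qm "Remove subdirectory sources"
# Local-path submodule URLs need the file transport explicitly allowed.
git -c protocol.file.allow=always submodule add "$dest_repo_url" "$source_dir"
git commit -qm "Add submodule"
# Move main to the new commit without checking out the old lib/ tree.
git branch -f "$source_repo_branch" temp/add-submodule
git switch -q "$source_repo_branch"
git branch -d temp/add-submodule
```

Elsewhere, `git clone --recurse-submodules` (or `git submodule update --init` in an existing clone) populates the new submodule after the change is pulled.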