- GSoC: Refactor Temporary File Handling in Git
Git creates temporary files throughout the codebase. It creates these files to perform atomic operations[1] or to pass them to external tools[2]. Once it has used them, Git deletes the files from disk, or at least it should. Unfortunately, Git's source code does not currently have a unified method of creating and deleting temporary files, and some code paths leave supposedly "temporary" files on disk.
This project aims to unify temporary file handling in Git, ensuring that all temporary files behave in the same way, and that all of them are deleted before the program exits. For forensic purposes, the new temporary file API will also provide users the option to specify that temporary files should remain on disk in the case that Git encounters a fatal error during execution.
- Name: Brian Gesiak
- Email: [email protected]
- IRC nick: modocache
- Telephone: +81-XXX-XXXX-XXXX (Japan)
- Other contact methods: @modocache (Twitter), brian.gesiak (Skype)
- Country of residence: Japan
- Timezone: Japan Standard Time (UTC+09:00)
- Primary language: English
This project was originally featured on the idea page.
I propose extracting the temporary file handling functions used in `lockfile.lock_file` and `lockfile.remove_lock_file` into a separate interface, `tempfile.h`.
The `lockfile` functions maintain a linked list of temporary files. The `remove_lock_file` function is registered as an `atexit` exit handler. It traverses the list of temporary files and ensures they are all deleted when the program exits.
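The mechanism described above can be sketched in a few lines of C. This is a minimal illustration under stated assumptions, not Git's actual code: the names `struct tempfile`, `create_tempfile`, and `remove_tempfiles` are hypothetical, and the real `lockfile` implementation differs (for example, it also has to deal with signals).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical list node; must be zero-initialized (e.g. static storage). */
struct tempfile {
	struct tempfile *next;
	int on_list;      /* already linked into tempfile_list? */
	int active;       /* does the file currently exist on disk? */
	char path[4096];
};

static struct tempfile *tempfile_list;
static int cleanup_registered;

/* Exit handler: walk the list and unlink every file that is still active. */
static void remove_tempfiles(void)
{
	struct tempfile *t;

	for (t = tempfile_list; t; t = t->next) {
		if (t->active) {
			unlink(t->path);
			t->active = 0;
		}
	}
}

/*
 * Create a temporary file from a mkstemp() template (ending in "XXXXXX")
 * and record it on the list. Returns an open file descriptor, or -1.
 */
static int create_tempfile(struct tempfile *t, const char *tmpl)
{
	int fd;

	assert(t && tmpl);
	if (strlen(tmpl) >= sizeof(t->path))
		return -1;
	strcpy(t->path, tmpl);

	fd = mkstemp(t->path);
	if (fd < 0)
		return -1;
	t->active = 1;

	/* Link each node only once, so a reused node cannot form a cycle. */
	if (!t->on_list) {
		t->next = tempfile_list;
		tempfile_list = t;
		t->on_list = 1;
	}
	/* Register the exit handler the first time any file is created. */
	if (!cleanup_registered) {
		atexit(remove_tempfiles);
		cleanup_registered = 1;
	}
	return fd;
}
```

A caller would declare a `static struct tempfile`, pass it to `create_tempfile` with a template such as `"/tmp/example_XXXXXX"`, and rely on the exit handler to remove the file if the program exits before the caller does so itself.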
The project is composed of two main phases:
- Extract the `lockfile` functions into `tempfile.h`.
- Replace each custom implementation of temporary files in the Git codebase with `tempfile.h` functions.
Based on my investigation thus far, temporary files are ubiquitous in Git. There are, however, four main implementations. Below are the four functions which create temporary files (in order of frequency of use):
- `lockfile.lock_file`[3]
- `git-compat-util.odb_mkstemp`[4]
- `git-compat-util.odb_pack_keep`[5]
- `diff.prepare_temp_file`[6]
The project will involve replacing each of these implementations in turn. I will add tests as necessary to ensure each of the refactored implementations works as expected.
TODO: Examine the differences between each implementation, and perhaps draft an interface for `tempfile.h` that takes those into account.
- 4/22: [GSoC] Accepted student proposals announced.
- 4/22 - 4/28 (one week): Discuss interface for `tempfile.h` with mentor.
- 4/29 - 5/12 (two weeks): As a "proof of concept", experiment with replacing the implementation of `lockfile.c` with `tempfile.h` functions.
- 5/13 - 5/18 (one week): Present a diff with the "proof of concept" and request feedback from mentor.
- 5/19: [GSoC] Students begin coding.
- 5/19 - 6/9 (three weeks): Submit patches for replacing the `lockfile.c` implementation with `tempfile.h` functions as an RFC. The goal of this project is to unify the implementation of temporary files, but at this point I will have simply moved the implementation of `lockfile.c` elsewhere; that is, no unification will have occurred yet. Therefore, the patches submitted at this point are not meant to be merged; rather, they will be used to gather feedback from the community on the interface and implementation of `tempfile.h`.
- 6/10 - 6/23 (two weeks): Submit patches for replacing `diff.prepare_temp_file` with `tempfile.h` functions. These patches represent a tangible benefit for the project: two distinct implementations of temporary files will have been unified. My personal goal is to have these patches merged into `pu`, although depending on feedback and the position of the project maintainer, this might not be feasible.
- 6/24 - 6/28: [GSoC] Mentors and students submit mid-term evaluations.
- 6/29 - 7/19 (three weeks): Submit patches for replacing `git-compat-util.odb_mkstemp` with `tempfile.h` functions.
- 7/20 - 8/10 (three weeks): Submit patches for replacing `git-compat-util.odb_pack_keep` with `tempfile.h` functions.
- 8/11: [GSoC] Suggested 'pencils down' date.
- 8/12 - 8/18 (one week): Investigate and report on any remaining parts of the codebase that do not yet use the unified tempfile API.
- 8/19: [GSoC] Firm 'pencils down' date.
- 8/19 - 8/23: [GSoC] Mentors and students submit final evaluations.
Each of the above estimates includes the time necessary to refactor any related areas of the codebase, add tests as necessary, and respond to feedback provided via the mailing list.
- Core member of Kiwi, an Objective-C behavior-driven development framework, since 2013. Authored commits here.
- Submitted small patches to Git and libgit2.
- Author of several open-source libraries in Objective-C, Python, and Ruby, available on GitHub.
- Software Engineer at GREE, Inc. (Tokyo, Japan), 2011 - 2012. Developed mobile and web applications in Objective-C/PHP/JavaScript. Used SVN and Git for version control.
- Senior Software Engineer at ShopKeep POS (New York, USA), 2012 - 2013. Developed mobile and web applications in Objective-C/Ruby/JavaScript. Used Git for version control.
I completed a B.A. in Japanese Language and Literature in 2008, and am now a research student with a concentration in parallel and distributed computing at the University of Tokyo. I've only just begun my education, so I'm still in the process of finding a concrete research topic.
In an email from 2013, the Git organization administrator for GSoC, Shawn Pearce, wrote:
Git has been involved since 2007. In all of that time we have had very few student projects merge successfully into their upstream project...before the end of GSoC. Even fewer students have stuck around and remained active contributors.
I have several years of experience working on distributed teams and open source projects. I'm confident I can use that experience to ensure that I set realistic milestones that result in code getting merged into Git.
I'm hoping to use GSoC as an opportunity to begin making contributions to Git on a regular basis. The ideas page for this year's GSoC highlights the fact that there are many ways to get involved, and I'd love to do so.
I've used Git for many years. Contributing to the project itself has threefold benefits:
- I can contribute to the development of a tool I enjoy using.
- I can brag to my friends and colleagues, most of whom already use Git, that they're running code I helped write.
- Working on the implementation gives me a better understanding of how Git works, and thus makes me more comfortable using it from day to day.
[1] When writing pack files, Git writes to a temporary file first. Once this operation finishes successfully, the temporary file is atomically moved into place.
[2] For example, when displaying a diff using an external tool by running the `git diff --ext-diff` command, Git creates two temporary files and passes them to the tool. After the tool exits, Git deletes these temporary files.
[3] The function is used in `lock_file.hold_locked_index`, which in turn is used in countless places, including `cache-tree.write_cache_as_tree`, `merge-recursive.merge_recursive_generic`, `merge.checkout_fast_forward`, and so on.
[4] Used by `fast-import.start_packfile`, `pack-write.write_idx_file`, `pack-write.create_tmp_packfile`, and `index-pack.open_pack_file`. Each consumer is responsible for removing the file at some point during its execution. For example, `pack-objects.write_pack_file` creates a temporary file using `create_tmp_packfile`, and later renames this file to the appropriate destination once writing is complete (although there are plenty of opportunities for the program to halt prior to that point).
[5] Used by `fast-import.keep_pack` and `index-pack.final`.
[6] Used only internally by `diff.run_external_diff` and `diff.run_textconv`.
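The write-then-rename pattern described in footnote [1] can be illustrated with a short C sketch. It is a simplified illustration under stated assumptions, not Git's pack-writing code: `write_file_atomically` and the `.tmp_XXXXXX` suffix are hypothetical names chosen for this example, and the sketch relies on POSIX `rename` replacing the destination atomically when both paths are on the same filesystem.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Write len bytes from buf to dest by way of a temporary file in the same
 * directory. Readers see either the old contents of dest or the complete
 * new contents, never a partial write. Returns 0 on success, -1 on failure.
 */
static int write_file_atomically(const char *dest, const void *buf, size_t len)
{
	char tmp[4096];
	const char *p = buf;
	int n, fd;

	/* Build a mkstemp() template next to the destination. */
	n = snprintf(tmp, sizeof(tmp), "%s.tmp_XXXXXX", dest);
	if (n < 0 || (size_t)n >= sizeof(tmp))
		return -1;

	fd = mkstemp(tmp);
	if (fd < 0)
		return -1;

	/* write() may be partial, so loop until everything is written. */
	while (len > 0) {
		ssize_t written = write(fd, p, len);
		if (written < 0)
			goto fail;
		p += written;
		len -= (size_t)written;
	}
	if (fsync(fd) < 0)
		goto fail;
	if (close(fd) < 0) {
		unlink(tmp);
		return -1;
	}

	/* Move the finished file into place in one step. */
	if (rename(tmp, dest) < 0) {
		unlink(tmp);
		return -1;
	}
	return 0;

fail:
	/* Any early exit must remove the temporary file itself. */
	close(fd);
	unlink(tmp);
	return -1;
}
```

Every failure path in this sketch has to remember to unlink the temporary file, which is the kind of per-caller bookkeeping a unified tempfile API is meant to centralize.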