RepoQuest: The Repository as a Textbook

Analysis: The Weaknesses (and Strengths) of Textbooks

Textbooks have been used for learning for centuries. However, in general, and especially in our specific context — computing education — they suffer from some critical defects:

Textbooks are passive. The reader does not know how well they have actually understood something. (Some people, including us, have tried to create “active textbooks”, but these only partially ameliorate the other issues.)
Textbooks tend to be focused on conceptual learning. Often, students want and need hands-on, practical learning.
Textbooks are not an authentic learning environment. When students enter the workforce as programmers, they will be working with repositories, not textbooks.
In computing education, learners typically also build some software alongside their reading. However, this software sits completely apart from the textbook and becomes a second, independent entity they need to maintain. Moreover, the textbook itself knows nothing about the software, and hence cannot help the student structure that process.
Students are often more engaged, and learn better, when their learning environment feels challenging and gamified. Textbooks can simulate that only with difficulty, if at all.

At the same time, textbooks provide a structured, sequential learning process that guides a learner and lets them “bail out” of messy situations. They also work better for a wide variety of learners, rather than only for self-identified “hackers”.

Proposal: RepoQuest

Our response is to create a new model of textbook centered on the repository. We call it RepoQuest.

In RepoQuest, there is only one learning object: a repository. A student selects a topic they want to learn that is presented in RepoQuest format. Topics are structured as building towards an end goal: “build an asynchronous Web server”, “build an image classifier”, etc.

They clone a starter repository, in which they will do all their work.

At every step, a student is given one or more artifacts (we are intentionally eliding many technical details below):

First of all, they are given an issue. The issue tells them what needs to be done next. Initially there will likely be only one new issue at a time; over time there could be several, giving students a choice of paths to follow.
The issue also gives them access to their learning materials. These could be links to chapters of books already on the Web, blog posts, or even new text added to a running document in the repository itself, which records all these materials for later reference.
Students create a branch to work on the new feature.
When the student thinks they are done, they commit their work to the repository.
RepoQuest has already populated the repository with pre-commit hooks. These can perform the usual checks, such as formatting and passing test suites.
A student can also indicate that they are stuck. One option we will explore is incorporating GitHub Copilot to help the student at this stage. Of course, a student can also “give up”, at which point RepoQuest gives them the solution.
When everything passes, the student's work turns into a pull-request for the repo. This is merged in, and the branch can be deleted.
The student is now ready for their next challenge, which comes in the form of a new issue to resolve.
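
The per-step cycle above can be modelled as a small state machine. The Rust sketch below is purely illustrative: the `Stage` and `Event` names, and the choice to treat “giving up” as making the step ready to merge, are hypothetical assumptions rather than part of an existing RepoQuest framework. It captures only the ordering of the steps; a real implementation would drive Git and GitHub rather than an in-memory enum.

```rust
/// Stages of one RepoQuest step (a hypothetical model, not the actual framework).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Stage {
    IssueOpen,    // an issue describes the next task and links to materials
    OnBranch,     // the student is working on a feature branch
    ReadyToMerge, // hooks passed (or the solution was supplied); a pull-request can be opened
    Merged,       // the pull-request was merged and the branch deleted
}

/// Actions taken by the student or by RepoQuest (names are assumptions).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Event {
    CreateBranch,
    Commit { hooks_pass: bool }, // outcome of the pre-commit hooks (formatting, tests, ...)
    GiveUp,                      // assumed: RepoQuest supplies the solution
    Merge,
    NextIssue,
}

/// Returns the next stage, or an error for an action that is not valid yet.
fn step(stage: Stage, event: Event) -> Result<Stage, String> {
    match (stage, event) {
        (Stage::IssueOpen, Event::CreateBranch) => Ok(Stage::OnBranch),
        // A failing hook rejects the commit, so the student stays on the branch.
        (Stage::OnBranch, Event::Commit { hooks_pass: false }) => Ok(Stage::OnBranch),
        (Stage::OnBranch, Event::Commit { hooks_pass: true }) => Ok(Stage::ReadyToMerge),
        (Stage::OnBranch, Event::GiveUp) => Ok(Stage::ReadyToMerge),
        (Stage::ReadyToMerge, Event::Merge) => Ok(Stage::Merged),
        (Stage::Merged, Event::NextIssue) => Ok(Stage::IssueOpen),
        (s, e) => Err(format!("cannot apply {:?} in stage {:?}", e, s)),
    }
}

fn main() {
    // One step in which the first commit fails the hooks and the second passes.
    let events = [
        Event::CreateBranch,
        Event::Commit { hooks_pass: false },
        Event::Commit { hooks_pass: true },
        Event::Merge,
        Event::NextIssue,
    ];
    let mut stage = Stage::IssueOpen;
    for e in events {
        stage = step(stage, e).expect("valid transition");
        println!("{:?} -> {:?}", e, stage);
    }
}
```

Rejecting out-of-order actions (for example, merging before any commit has passed the hooks) mirrors the structure the workflow is meant to enforce on the student.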

Many educational materials actually take this form, but passively. A book asks a reader to imagine such a workflow, but most readers will not do anything to actively engage with it, and the few who do will have no tools with which to do so. In contrast, RepoQuest forces the learner to get their hands dirty, check their understanding as they go along, and learn valuable software engineering skills in the process.

Benefits of RepoQuest

When students are conscientious and build up a codebase alongside their reading of a book, the two are still disconnected. The book cannot detect what stage the code is at, pause, offer tips specific to the student's code, etc. RepoQuest does away with this distinction entirely.
Most students are not conscientious enough, or, more likely, not even aware enough, to co-develop systems alongside books. Thus, when they finally get around to programming, it is in the form of a homework assignment that comes well after, and may not be clearly connected to, the book. They find out that they did not learn enough from the book not immediately, but weeks or months later, in the form of an autograder failure.
Autograders feel inauthentic: like artifacts of being in a course and getting a grade. In contrast, RepoQuest surfaces the same issues as failures to satisfy CI, a genuine, real-world problem that everyone who has tried to contribute code to a repository has encountered.
The student's project is no longer some ad hoc collection of code. Rather, it is a neatly structured sequence of commits (merged pull-requests) that lets them use Git and the GitHub interface to easily time-travel through their solution and study its progression if, months later, they have forgotten how they got to the end.
Students develop a facility with repositories! Rather than having some artificial IDE experience in class and having to learn about repositories either on the job or in some upper-level software engineering course, they get early exposure to the best practices of modern software development: using issues, branches, pull-requests, and so on.
Students build up a portfolio! Students who are part of the “in crowd” (often those whose parents themselves work in computing) are conscious of the need to build up and maintain a portfolio of projects on a public GitHub page that they can share with employers. Students who are not in the know (which is most of them) are unaware of this. RepoQuest forces them to start building up a portfolio, thereby helping, in a small way, to level the playing field and improve equity.

Plans

Our goal is to develop:

A RepoQuest framework for implementing all of the above, using GitHub's CLI scripting capabilities.
We intend to develop 2–3 units of learning material to show that RepoQuest really works. A particular focus right now is teaching asynchrony in Rust. Rust is becoming one of the most popular and important programming languages for its mix of safety and performance. However, programmers complain that some of Rust's features are quite difficult to learn. We have already done extensive work [1, 2] on Rust pedagogy. This material has attracted over one million users in about a year, and it is linked from the standard Rust text [TRPL]. Therefore, we anticipate getting many users quickly.
We will build a lightweight authoring system for authors to build their material into RepoQuest. Our initial version will require the author to create and populate branches, etc., which assumes some knowledge of Git. However, we believe we can hide most of this, so that professors with great material but little knowledge of Git can still make it available to students.

Theory

RepoQuest is grounded in a substantial body of theory from learning and cognitive science, ranging from active learning to legitimate peripheral participation. We omit a discussion of it here.

Qualifications

The project is headed by Shriram Krishnamurthi, a professor of computer science at Brown. Shriram is a highly decorated researcher and educator whose honors include SIGPLAN's Robin Milner Young Researcher Award, SIGPLAN's Distinguished Educator Award (jointly), SIGSOFT's Influential Educator Award, and SIGPLAN's Software Award (jointly). He has built several systems that have had strong industrial influence, as well as many widely used educational products (all linked from his home page). The work will be done collaboratively with Will Crichton, who will soon also be a professor at Brown, and Gavin Gray, who will soon be a PhD student at Brown. Crichton and Gray have been collaborating with Krishnamurthi for nearly two years on, amongst other projects, the Rust educational materials described above.
