This document attempts to capture some detail about large projects on GitHub. That includes looking at what tools they are using, what processes they use, and how much interaction they have.
This document is intended to provide a high-level overview rather than dig into all of the details.
The projects that will be looked at are:
- Ruby on Rails: With over 3,400 contributors and many more users, Rails is one of the more popular and active projects on GitHub.
- Homebrew: The macOS package manager had so many contributions and so many contributors (over 6,000) that the project reworked its repository structure to become more maintainable.
- Bootstrap: The web UI library, started at Twitter, is one of the more popular UI libraries in use. With close to 1,000 contributors, it has dealt with about the same level of contributors as Kubernetes. One thing that makes Bootstrap interesting is its number of forks, which ranks #2 on all of GitHub.
- React: The UI library from Facebook has over 1,100 contributors. The project also started at about the same time as Kubernetes.
- Cloud Foundry: The popular PaaS is similar to Kubernetes in the way they organize the codebase. Leveraging multiple organizations and repositories for microservices is similar to the direction Kubernetes is going.
When comparisons are drawn between Kubernetes and projects that are primarily developed in a single repository, the Kubernetes repository is used. When comparisons are drawn to projects using multiple repositories, the comparison is to the Kubernetes organization. The document will attempt to note which is used when these arise.
Note that other infrastructure-heavy projects (e.g., Apache Mesos and OpenStack) tend to run their daily tasks on their own infrastructure rather than on GitHub. For these projects, GitHub is a mirror of the codebase.
Some quick stats on Rails, at the time the document was written:
- 3,442 contributors
- Peak contributions in a period were about 200 and only happened on occasion.
- Many change sets are small. For example, numerous developers have just shy of 50 commits with only hundreds or a couple thousand lines of code changed.
- 710 pull requests are open, 20,028 are closed
- 360 open issues, 10,654 are closed
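To put the open and closed counts above in proportion, the following is a minimal sketch that computes the share of pull requests and issues that have been closed. The helper name `closed_share` is made up for this illustration, and "closed" here covers both merged and rejected items, since the counts above do not distinguish them.

```python
def closed_share(open_count, closed_count):
    """Fraction of all items (open + closed) that have been closed."""
    total = open_count + closed_count
    return closed_count / total if total else 0.0


# Counts copied from the Rails stats above.
rails_prs = closed_share(710, 20_028)
rails_issues = closed_share(360, 10_654)

print(f"Rails pull requests closed: {rails_prs:.1%}")
print(f"Rails issues closed: {rails_issues:.1%}")
```

Both shares come out between 96% and 97%, which suggests the open backlog is small relative to the project's history.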
Rails uses a minimal set of CI tools via Travis CI and Code Climate. The Travis configuration and the tests that are run are checked in to the Rails git repository alongside the codebase. In addition to the testing tied into pull requests, Rails uses the Probot stale issue closer.
For reference, Probot is similar to Kubernetes Prow. Probot is a service run by GitHub. The issue closer provided by Probot is configured per repository, whereas the current Kubernetes one is configured for an entire organization.
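As a rough illustration of the idea behind a stale issue closer, and of per-repository versus organization-wide configuration, here is a minimal sketch. It is not the Probot or Prow implementation. The function names, the dictionary layout, and the thresholds (60 days until stale, 7 more days until close) are all assumptions made for this example.

```python
from datetime import datetime, timedelta

# Hypothetical organization-wide default; the values are illustrative only.
ORG_DEFAULT = {"days_until_stale": 60, "days_until_close": 7}


def settings_for(repo, per_repo_settings, org_default=ORG_DEFAULT):
    """Per-repository settings win when present; otherwise use the org default."""
    return per_repo_settings.get(repo, org_default)


def classify_issue(last_activity, now, settings):
    """Return 'active', 'stale', or 'close' based on time since last activity."""
    idle = now - last_activity
    stale_after = timedelta(days=settings["days_until_stale"])
    close_after = stale_after + timedelta(days=settings["days_until_close"])
    if idle >= close_after:
        return "close"
    if idle >= stale_after:
        return "stale"
    return "active"


# Hypothetical repository name with its own, stricter settings.
per_repo = {"example/repo": {"days_until_stale": 30, "days_until_close": 7}}
now = datetime(2018, 3, 5)
last = datetime(2018, 1, 1)  # 63 days of inactivity

print(classify_issue(last, now, settings_for("example/repo", per_repo)))
print(classify_issue(last, now, settings_for("other/repo", per_repo)))
```

With per-repository settings each repository can choose its own thresholds, while an organization-wide default applies one policy everywhere unless it is overridden.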
While Rails has a high number of contributors, those contributions typically come in smaller chunks compared to Kubernetes, and the number of pull requests processed per week (~25) is lower than for the Kubernetes repository (~145). Rails as a project has been around longer than Kubernetes and has grown its contributor base over time. The sizes of the codebases are also quite different: Rails is about 17% the size of Kubernetes.
Rails' usage of CI tools is in line with how projects in general use GitHub. The project does not deviate from standard practices.
Homebrew is broken into several repositories. The package manager itself is in one repository, the standard formulae that people can install are in another, some test automation scripts are in another, formulae for specific platforms (e.g., PHP) are in separate repositories, and the website is in yet another. Different repositories have different purposes.
Homebrew-core, the repository with the most contributors at over 6,700, consists mostly of small configuration files, called formulae, detailing how to install applications. These are short Ruby files. The repository has over 105,000 commits, can have several hundred pull requests merged in a month, and leverages some continuous integration customizations.
The continuous testing for the formulae happens in a Jenkins cluster. The scripts for that are in another repository. The Jenkins cluster enables testing on various macOS versions and supported configurations. The system is also able to make Bottles, which are packaged binary distributions of applications for different macOS versions and architectures, so that users can download them rather than build locally.
The way Jenkins interacts with GitHub is via standard CI channels.
The brew application, which manages the formulae, has had almost 590 contributors. For CI testing, brew leverages Travis CI in a standard setup.
It's worth noting that Homebrew generates so many clones against GitHub, as cloning is part of the workflow, that GitHub worked with Homebrew to optimize the way it interacts with git. In addition, Homebrew has caused GitHub to rework its infrastructure to support projects with the number of clones Homebrew has. Homebrew is an example of a project that GitHub adapted to in order to support it.
Like Ruby on Rails, Homebrew uses the Probot stale issues closer.
Bootstrap is, possibly, the most widely used web UI framework. In addition to being widely downloaded and used, Bootstrap has had 975 contributors.
There are a few places where Bootstrap is different from the Kubernetes repo:
- The codebase is far smaller
- Over the past 6 years there have been fewer than 18,000 commits. Kubernetes can see that many commits in a single year
- One committer, the project's lead, has nearly 1/3 of all commits to the project
For testing, Bootstrap leverages Travis CI in a standard setup.
In less than 5 years, React has become one of the more popular UI frameworks. While it started out targeting the web, it has moved into device UIs as well.
Some quick stats on React, at the time the document was written:
- 1,163 contributors
- 54 pull requests are open, 6,426 are closed
- 315 open issues, 5,202 are closed
- The top 10 contributors, by commit, have 3,950 commits, accounting for about 41% of the total commits. For reference, the top 10 Kubernetes contributors account for approximately 14% of the commits
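The commit concentration figures above can be turned into a rough estimate of React's total commit count and a comparison with Kubernetes. This is a minimal sketch: the variable and function names are invented for the example, and because the 41% and 14% figures are themselves rounded, the results are approximate.

```python
def implied_total(part, share):
    """Estimate a total from a part and the share of the total it represents."""
    return round(part / share)


# Figures copied from the stats above; both shares are rounded in the source.
REACT_TOP10_COMMITS = 3_950
REACT_TOP10_SHARE = 0.41
K8S_TOP10_SHARE = 0.14

react_total_commits = implied_total(REACT_TOP10_COMMITS, REACT_TOP10_SHARE)
concentration_ratio = REACT_TOP10_SHARE / K8S_TOP10_SHARE

print(f"Implied React commits: about {react_total_commits:,}")
print(f"React top-10 share is about {concentration_ratio:.1f}x that of Kubernetes")
```

Under these figures React has roughly 9,600 commits in total, and its top 10 contributors hold close to three times the share of commits that the top 10 hold in Kubernetes.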
For CI testing, React leverages CircleCI in a standard setup along with a typical pull request workflow.
Cloud Foundry is similar to Kubernetes in several regards.
- The project leverages multiple GitHub organizations
- There's a scheduler (Diego) that is an approximate competitor to Kubernetes
- Deals with Cloud Native Applications
- Is backed by big enterprises via a non-profit foundation
While Cloud Foundry is not a direct competitor in terms of features and market, there is overlap.
Because Cloud Foundry leverages multiple GitHub organizations, and multiple repositories within each organization, looking at pull requests, issue counts, and stars doesn't offer a good comparison to Kubernetes.
The main Cloud Foundry GitHub organization has 345 repositories on it. Other organizations, such as cloudfoundry-incubator can have many repositories as well. For example, cloudfoundry-incubator has 188 repositories.
Even individual projects can be broken into multiple repositories. Diego and Garden are two examples of that. There are even repositories just to hold the design information and discussion.
Concourse is a CI toolchain, similar to Travis CI and CircleCI, that came out of the Cloud Foundry community. The Cloud Foundry community operates an instance of Concourse and uses it for testing on numerous Cloud Foundry repositories, leveraging standard GitHub CI workflows.
While there are numerous repositories, typically used as microservices in a larger application, integration tests are also run. A dashboard, powered by Concourse, can be seen at https://release-integration.ci.cf-app.com.
When Concourse is used on a repository, the integration works using typical GitHub workflows.
This document is designed to fuel conversation rather than drive immediate conclusions.
Where further detail is needed or corrections are required, they can be added as comments, or the document may be revised with more detail.
Because this is being looked at through the eyes of Kubernetes development, it can be useful to provide some context and details about Kubernetes.
Lines of code for Kubernetes:
--------------------------------------------------------------------------------
Language                   Files       Lines       Blank     Comment        Code
--------------------------------------------------------------------------------
Go                         10160     3010475      326333      526328     2157814
JSON                         154      335193           8           0      335185
HTML                          67      267185        3929           1      263255
YAML                         746       34279         774        1847       31658
Markdown                     585       39322        9563           0       29759
Bourne Shell                 313       40362        5311       10497       24554
JavaScript                    19       13806        1559        2913        9334
Protobuf                      88       22437        3748       10655        8034
Plain Text                    25        4454         319           0        4135
Assembly                      36        3996         292          35        3669
Python                        21        4185         771         722        2692
Makefile                      85        4035         534        1710        1791
CSS                            4        1468           8           5        1455
Perl                           8        1128         142         139         847
C/C++ Header                   2        5613         401        4371         841
Autoconf                      16         669          10          45         614
Java                           2         318          47          71         200
XML                            3         141          18          24          99
C                              4         164          33          37          94
Ruby                           1          70          12           1          57
Toml                           3          91          18          24          49
PHP                            1          41           6           0          35
INI                            3          36           6           0          30
ASP.NET                        4          18           0           0          18
SQL                            1           8           1           0           7
--------------------------------------------------------------------------------
Total                      12351     3789494      353843      559425     2876226
--------------------------------------------------------------------------------
Excluding vendored dependencies, Kubernetes is:
--------------------------------------------------------------------------------
Language                   Files       Lines       Blank     Comment        Code
--------------------------------------------------------------------------------
Go                          6329     1448459      153478      194565     1100416
HTML                          67      267185        3929           1      263255
JSON                         143      217972           5           0      217967
Bourne Shell                 296       38942        5155       10139       23648
YAML                         648       23050         371        1690       20989
Markdown                     376       20337        4448           0       15889
JavaScript                    19       13806        1559        2913        9334
Protobuf                      53       16751        2935        8664        5152
Python                        20        4059         757         708        2594
Makefile                      61        3467         444        1479        1544
CSS                            4        1468           8           5        1455
Autoconf                      16         669          10          45         614
Plain Text                    10         281          16           0         265
Java                           2         318          47          71         200
XML                            3         141          18          24          99
C                              2         104          18          26          60
Ruby                           1          70          12           1          57
PHP                            1          41           6           0          35
INI                            2          24           4           0          20
ASP.NET                        4          18           0           0          18
SQL                            1           8           1           0           7
--------------------------------------------------------------------------------
Total                       8058     2057170      173221      220331     1663618
--------------------------------------------------------------------------------
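The two sets of totals above make it possible to estimate how much of the Kubernetes repository is vendored code. The following is a minimal sketch of that arithmetic, using the Code column of the Total and Go rows. The constant and function names are made up for this illustration.

```python
# Code-line counts copied from the Total and Go rows of the two tables above.
TOTAL_CODE_WITH_VENDOR = 2_876_226
TOTAL_CODE_WITHOUT_VENDOR = 1_663_618
GO_CODE_WITH_VENDOR = 2_157_814
GO_CODE_WITHOUT_VENDOR = 1_100_416


def vendored_share(with_vendor, without_vendor):
    """Fraction of code lines that come from vendored dependencies."""
    return (with_vendor - without_vendor) / with_vendor


overall = vendored_share(TOTAL_CODE_WITH_VENDOR, TOTAL_CODE_WITHOUT_VENDOR)
go_only = vendored_share(GO_CODE_WITH_VENDOR, GO_CODE_WITHOUT_VENDOR)

print(f"Vendored share of all code: {overall:.1%}")
print(f"Vendored share of Go code: {go_only:.1%}")
```

By this measure roughly 42% of all code lines, and about 49% of the Go code lines, come from vendored dependencies, which is worth keeping in mind when comparing codebase sizes with other projects.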