Currently the RubyGems index is stored as a gzipped file containing a marshalled array. Whenever Bundler needs to install a gem that is not yet installed, it downloads the index, gunzips it and unmarshals it. It then looks up dependencies, which are described in another file that is also gzipped and marshalled. Several issues arise from this setup:
- The full index must be downloaded and parsed, yet it does not contain dependency data, which must then be downloaded and parsed separately. This is a relatively time-consuming process.
- The index must be centralized.
Additionally, the gems themselves are currently centralized, since nothing in the metadata indicates where a gem should be downloaded from. However, to allow this it is important to find ways of keeping the index from being poisoned (is this an issue in the centralized system?)
I'd like to propose an alternate way of indexing RubyGems: let's use DNS.
Here's how this might work:
- The client sends a query to its local name server for all records (an ANY query) at rails.index.rubygems.org.
- The local name server does not have the records cached, so it resolves the name recursively, starting at the root name servers.
- The root name servers delegate to the .org name servers.
- The .org name servers delegate to the rubygems.org name servers.
- The rubygems.org name servers can either answer the query or delegate to another set of name servers. In this case they answer.
- The rubygems.org name servers respond with a CNAME record pointing to 1.0.3.rails.index.rubygems.org and all PTR records for 1.0.3.rails.index.rubygems.org. For example:

    rails.index.rubygems.org.        600 CNAME 1.0.3.rails.index.rubygems.org.
    1.0.3.rails.index.rubygems.org.  600 PTR   0.0.3.activesupport.index.rubygems.org.
    1.0.3.rails.index.rubygems.org.  600 PTR   0.0.3.actionpack.index.rubygems.org.
    1.0.3.rails.index.rubygems.org.  600 PTR   0.0.3.activerecord.index.rubygems.org.
    1.0.3.rails.index.rubygems.org.  600 PTR   0.0.3.activeresource.index.rubygems.org.
    1.0.3.rails.index.rubygems.org.  600 PTR   0.0.3.actionmailer.index.rubygems.org.
    1.0.3.rails.index.rubygems.org.  600 PTR   0.0.3.railties.index.rubygems.org.
    1.0.3.rails.index.rubygems.org.  600 PTR   1.bundler.index.rubygems.org.
Note that some PTR records point at canonical gem names, while others point at names that would themselves be CNAMEs to the appropriate canonical version. The last record is an example: 1.bundler.index.rubygems.org would likely be a CNAME to something like 7.0.1.bundler.index.rubygems.org, which is the reverse notation for bundler-1.0.7.
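The reverse-notation naming, and the step of turning a gem version's PTR records into a dependency list, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not an existing RubyGems or Bundler API. The function names are hypothetical, the in-memory `cnames` table stands in for real CNAME lookups, and the sketch assumes the gem name is the single label directly to the left of index.rubygems.org.

```python
# Hypothetical helpers illustrating the reverse-notation naming scheme.
INDEX_ZONE = "index.rubygems.org."


def gem_to_dns_name(gem: str, version: str) -> str:
    """rails 3.0.1 -> 1.0.3.rails.index.rubygems.org. (version labels reversed)."""
    reversed_version = ".".join(reversed(version.split(".")))
    return f"{reversed_version}.{gem}.{INDEX_ZONE}"


def dns_name_to_gem(name: str) -> tuple[str, str]:
    """Inverse of gem_to_dns_name. Assumes the gem name is the single
    label directly left of the index zone."""
    fqdn = name if name.endswith(".") else name + "."
    suffix = "." + INDEX_ZONE
    if not fqdn.endswith(suffix):
        raise ValueError(f"{name} is not under {INDEX_ZONE}")
    *version_labels, gem = fqdn[: -len(suffix)].split(".")
    return gem, ".".join(reversed(version_labels))


def follow_cnames(name: str, cnames: dict[str, str], max_hops: int = 8) -> str:
    """Follow aliases in an in-memory CNAME table until a canonical name is reached."""
    for _ in range(max_hops):
        if name not in cnames:
            return name
        name = cnames[name]
    raise ValueError(f"CNAME chain too long at {name}")


def resolve_dependencies(
    ptr_targets: list[str], cnames: dict[str, str]
) -> list[tuple[str, str]]:
    """Turn the PTR targets of one gem version into (gem, version) pairs."""
    return [dns_name_to_gem(follow_cnames(t, cnames)) for t in ptr_targets]


# Hypothetical data mirroring two of the records above.
cnames = {"1.bundler.index.rubygems.org.": "7.0.1.bundler.index.rubygems.org."}
ptrs = [
    "0.0.3.activesupport.index.rubygems.org.",
    "1.bundler.index.rubygems.org.",
]
print(resolve_dependencies(ptrs, cnames))
# [('activesupport', '3.0.0'), ('bundler', '1.0.7')]
```

Because a version number maps to labels from least to most significant, a partial version such as "1" for bundler sits naturally at a shorter name under the same gem. That name can then alias whichever full version currently satisfies it.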
In addition to dependency management, another interesting use of DNS is to record where gems can be downloaded from. Here is how this might work:
- The client sends a query to its local name server for all records (an ANY query) at rails.index.rubygems.org.
- The local name server does not have the records cached, so it resolves the name recursively, starting at the root name servers.
- The root name servers delegate to the .org name servers.
- The .org name servers delegate to the rubygems.org name servers.
- The rubygems.org name servers can either answer the query or delegate to another set of name servers. In this case they answer.
- The rubygems.org name servers respond with a CNAME record pointing to 1.0.3.rails.index.rubygems.org and all NAPTR records for 1.0.3.rails.index.rubygems.org. For example:

    rails.index.rubygems.org.        600 CNAME 1.0.3.rails.index.rubygems.org.
    1.0.3.rails.index.rubygems.org.  600 NAPTR 100 10 "U" "TCP+http" "!^.*$!http://rubygems.org/rails-3.0.1.gem!i" .
    1.0.3.rails.index.rubygems.org.  600 NAPTR 100 20 "U" "TCP+http" "!^.*$!http://backup.rubygems.org/rails-3.0.1.gem!i" .
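Note that no complex regex translation is needed to get the various URLs. Because the URLs are mapped directly to the canonical name of the gem, each expression simply matches the whole input (^.*$) and replaces it with a fixed URL. The order (100) and preference (10, 20) fields make the rubygems.org URL preferred over the backup.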
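A client consuming these NAPTR records might parse them, sort them by order and then preference, and apply each substitution expression to obtain the download URLs. The sketch below rests on several assumptions. It expects the record data as zone-file-style text and applies each expression to the queried name. It also ignores escaped delimiters inside an expression, which a complete implementation would need to handle. All names are hypothetical.

```python
import re
import shlex
from typing import NamedTuple


class Naptr(NamedTuple):
    order: int
    preference: int
    flags: str
    service: str
    regexp: str
    replacement: str


def parse_naptr(rdata: str) -> Naptr:
    """Parse: order preference "flags" "service" "regexp" replacement."""
    order, pref, flags, service, regexp, replacement = shlex.split(rdata)
    return Naptr(int(order), int(pref), flags, service, regexp, replacement)


def apply_regexp(regexp: str, subject: str) -> str:
    """Apply a delimited substitution such as !^.*$!http://host/file.gem!i.

    Simplification: assumes the delimiter never appears (escaped) inside
    the pattern or the replacement.
    """
    delim = regexp[0]
    _, pattern, repl, flags = regexp.split(delim)
    re_flags = re.IGNORECASE if "i" in flags else 0
    return re.sub(pattern, repl, subject, count=1, flags=re_flags)


def download_urls(query_name: str, rdatas: list[str]) -> list[str]:
    """Terminal ("U") HTTP URLs, lowest order first, then lowest preference."""
    records = sorted(
        (parse_naptr(r) for r in rdatas), key=lambda r: (r.order, r.preference)
    )
    return [
        apply_regexp(r.regexp, query_name)
        for r in records
        if r.flags.upper() == "U" and r.service.upper() == "TCP+HTTP"
    ]


rdatas = [
    '100 20 "U" "TCP+http" "!^.*$!http://backup.rubygems.org/rails-3.0.1.gem!i" .',
    '100 10 "U" "TCP+http" "!^.*$!http://rubygems.org/rails-3.0.1.gem!i" .',
]
for url in download_urls("1.0.3.rails.index.rubygems.org.", rdatas):
    print(url)
```

Even though the backup record is listed first here, the primary rubygems.org URL comes out first because its preference (10) is lower than the backup's (20). A client would try the URLs in that order and fall back on failure.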
To support multiple platforms (e.g. JRuby), the client first tries platform.z.y.x.gemname.index.rubygems.org. If this is not found, the client falls back to z.y.x.gemname.index.rubygems.org. If a platform gem is provided, CNAME records also need to be provided for all of the shorter variations, i.e. platform.y.x, platform.x and platform.
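The platform fallback, and the CNAME aliases it requires, can be expressed as small name-generating functions. This is a sketch under assumptions. The platform label ("jruby" here) is placed as the leftmost label exactly as described above, and `existing_names` stands in for real DNS lookups. The function names are hypothetical.

```python
INDEX_ZONE = "index.rubygems.org."


def reversed_labels(version: str) -> list[str]:
    """3.0.1 -> ["1", "0", "3"], i.e. z, y, x."""
    return list(reversed(version.split(".")))


def lookup_order(gem: str, version: str, platform: str) -> list[str]:
    """Names a client tries: platform-specific first, then the generic gem."""
    rev = ".".join(reversed_labels(version))
    return [f"{platform}.{rev}.{gem}.{INDEX_ZONE}", f"{rev}.{gem}.{INDEX_ZONE}"]


def platform_aliases(gem: str, version: str, platform: str) -> list[str]:
    """CNAME names a platform gem also needs: platform.y.x, platform.x, platform."""
    labels = reversed_labels(version)
    aliases = []
    for i in range(1, len(labels) + 1):
        # Drop the least significant version label on each step.
        prefix = ".".join([platform, *labels[i:]])
        aliases.append(f"{prefix}.{gem}.{INDEX_ZONE}")
    return aliases


def pick_name(gem: str, version: str, platform: str, existing_names: set[str]):
    """First name in lookup order that exists, or None (stand-in for DNS)."""
    for name in lookup_order(gem, version, platform):
        if name in existing_names:
            return name
    return None


print(lookup_order("rails", "3.0.1", "jruby"))
print(platform_aliases("rails", "3.0.1", "jruby"))
```

With only the generic 1.0.3.rails.index.rubygems.org published, `pick_name` falls back to it, which mirrors the "if this is not found" step.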
DNS provides the tools necessary to make this a decentralized system if we want one. This would be accomplished by delegating responsibility for gem names to DNS servers other than the rubygems.org servers. For example, if management of the Rails gem metadata were decentralized, the interaction might look like this:
- The client sends a query to its local name server for all records (an ANY query) at rails.index.rubygems.org.
- The local name server does not have the records cached, so it resolves the name recursively, starting at the root name servers.
- The root name servers delegate to the .org name servers.
- The .org name servers delegate to the rubygems.org name servers.
- The rubygems.org name servers respond with the following NS records:

    rails.index.rubygems.org.  600 NS ds1.rubyonrails.org.
    rails.index.rubygems.org.  600 NS ds2.rubyonrails.org.

- The query is then sent to one of the two name servers, which responds with a CNAME record pointing rails.index.rubygems.org to 1.0.3.rails.index.rubyonrails.org.
- From there, the rubyonrails.org name servers respond as shown in the scenarios above.
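Delegation means the client's resolver ends up talking to the name servers of the closest enclosing zone. The sketch below models only that outcome by picking the longest matching zone suffix from a hypothetical table. A real resolver discovers zone cuts one referral at a time. The rubygems.org name server is a placeholder, while the rubyonrails.org servers come from the example above.

```python
# Hypothetical delegation table: zone -> authoritative name servers.
ZONES = {
    "rubygems.org.": ["ns1.rubygems.example."],  # placeholder, not a real server
    "rails.index.rubygems.org.": ["ds1.rubyonrails.org.", "ds2.rubyonrails.org."],
}


def authoritative_zone(
    name: str, zones: dict[str, list[str]]
) -> tuple[str, list[str]]:
    """Return the closest enclosing zone of `name` and its name servers."""
    labels = name.rstrip(".").split(".")
    for i in range(len(labels)):  # most specific suffix first
        candidate = ".".join(labels[i:]) + "."
        if candidate in zones:
            return candidate, zones[candidate]
    raise LookupError(f"no zone in the table covers {name}")


print(authoritative_zone("1.0.3.rails.index.rubygems.org.", ZONES))
print(authoritative_zone("0.0.3.activerecord.index.rubygems.org.", ZONES)[0])
```

Every name under rails.index.rubygems.org, including the versioned ones, is answered by the rubyonrails.org servers. All other gems stay with rubygems.org, so delegating one gem does not affect the rest of the index.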
DNSSEC provides a means of signing DNS records so that a resolver can verify that the answers really come from the zone's authoritative name servers and have not been tampered with. This technology is not yet widely deployed. However, used in conjunction with a SHA digest of each gem, it has the potential to provide a layer of protection against gem poisoning. The SHA digest could also be stored in the name servers using a TXT or SIG record. This approach is still very experimental, but it could yield a highly trusted distribution system.
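Checking a downloaded gem against a digest published in DNS could look like the following. The proposal does not specify a record format or an algorithm, so this sketch assumes a TXT string of the form `sha256=<hex digest>` and uses SHA-256 as the SHA variant. A matching digest only means something if the TXT record itself is authenticated, which is the role DNSSEC would play here.

```python
import hashlib
import hmac


def parse_digest_txt(txt: str) -> str:
    """Extract the hex digest from a hypothetical "sha256=<hex>" TXT string."""
    algorithm, sep, hexdigest = txt.partition("=")
    if algorithm != "sha256" or not sep or not hexdigest:
        raise ValueError(f"unsupported digest record: {txt!r}")
    return hexdigest.strip().lower()


def verify_gem(gem_bytes: bytes, txt: str) -> bool:
    """True if the gem's SHA-256 digest matches the one published in DNS."""
    expected = parse_digest_txt(txt)
    actual = hashlib.sha256(gem_bytes).hexdigest()
    # Constant-time comparison; not strictly required for public digests.
    return hmac.compare_digest(actual, expected)


gem_bytes = b"example gem contents"  # stand-in for a downloaded .gem file
record = "sha256=" + hashlib.sha256(gem_bytes).hexdigest()
print(verify_gem(gem_bytes, record), verify_gem(b"tampered", record))
```

A client would refuse to install the gem whenever `verify_gem` returns False, regardless of which mirror served the file.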
DNS does not provide a mechanism for searching for records by part of a name. For example, there is no way in DNS to query for the term "active" and get back "activerecord", "activeresource", etc. This functionality would need to be provided by a protocol other than DNS.
I've removed the section on TXT records for holding metadata, since it's not clear whether DNS TXT records are the appropriate place for it. The goal is to focus on dependency resolution first and foremost.
Also, it may make sense to remove download routing from this as well, at least initially.