How much of the R package namespace is left?
# R only has one package namespace
# All packages exist in a single, non-hierarchical namespace
# Once a package name is used, it cannot be reused
# CRAN hosts most existing R packages; Bioconductor holds many more; once claimed on either service, a package name is reserved
# GitHub hosts many other packages with no "claim" to a given package name and there's no list of such packages
#
# "Writing R Extensions" says this about names:
## "The mandatory ‘Package’ field gives the name of the package.
## This should contain only (ASCII) letters, numbers and dot, have
## at least two characters and start with a letter and not end in
## a dot. If it needs explaining, this should be done in the
## ‘Description’ field (and not the ‘Title’ field)."
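#
# A minimal sketch of the rule quoted above as a regular expression. The
# helper name `is_valid_pkg_name` is hypothetical (not part of base R), and it
# checks syntax only, not whether a name is already taken.
is_valid_pkg_name <- function(x) {
  # a letter, then any run of letters/digits/dots, then a letter or digit
  grepl("^[A-Za-z][A-Za-z0-9.]*[A-Za-z0-9]$", x, perl = TRUE)
}
is_valid_pkg_name(c("ggplot2", "data.table", "a", "2fast", "trailing.", "under_score"))
# [1]  TRUE  TRUE FALSE FALSE FALSE FALSE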
#
# CRAN policy says this about names:
## Packages should be named in a way that does not conflict
## (irrespective of case) with any current or past CRAN package
## (the Archive area can be consulted), nor any current Bioconductor
## package. Package maintainers give the right to use that package
## name to CRAN when they submit, so the CRAN team may orphan a
## package and allow another maintainer to take it over.
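#
# A sketch of the "irrespective of case" rule above: compare names after
# lower-casing both sides. `name_conflicts` and the toy vector `taken_demo`
# are hypothetical, for illustration only.
name_conflicts <- function(candidate, taken) {
  tolower(candidate) %in% tolower(taken)
}
taken_demo <- c("MASS", "ggplot2", "Rcpp")
name_conflicts(c("mass", "GGplot2", "newpkg"), taken_demo)
# [1]  TRUE  TRUE FALSE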
#
# So, how much of the package namespace have we used up (on CRAN)?
#
# Packages on CRAN (as of 2015-12-27)
options("repos" = "https://cran.rstudio.com")
a <- available.packages()
p1 <- a[,"Package"]
length(p1)
# [1] 7666
# The count of current packages is an underestimate because archived packages retain a "claim" to their name (permanently held by CRAN).
# get all packages listed in the Archive (both current and former CRAN packages)
x <- paste0(readLines("http://cran.r-project.org/src/contrib/Archive/"), collapse = "")
# extract package names (this is a bit hacky)
p2 <- regmatches(x, gregexpr("(?<=\")[[:alnum:].]+(?=/\")", x, perl=TRUE))[[1]]
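# To see what the lookaround pattern above captures, here it is applied to a
# made-up fragment shaped like a directory listing. The exact HTML of the
# Archive index is an assumption; `html_demo` is a hypothetical example string.
html_demo <- '<a href="abc/">abc/</a> <a href="A3/">A3/</a> <a href="data.table/">data.table/</a>'
regmatches(html_demo, gregexpr("(?<=\")[[:alnum:].]+(?=/\")", html_demo, perl = TRUE))[[1]]
# [1] "abc"        "A3"         "data.table"
# Only text that sits between a double quote and a closing /" is kept, so the
# visible link text (preceded by ">") is not matched a second time.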
length(p2)
# [1] 7115
# combined names from current CRAN packages and the Archive
p3 <- unique(c(p1,p2))
length(p3)
# [1] 8920
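# CRAN's policy treats names as conflicting irrespective of case, so a
# case-insensitive count can be slightly lower than the case-sensitive one.
# A sketch with a hypothetical helper `n_claimed`, shown on toy data:
n_claimed <- function(pkgs) length(unique(tolower(pkgs)))
n_claimed(c("MASS", "mass", "Rcpp"))
# [1] 2
# n_claimed(p3) would give the case-insensitive total for the names above.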
# The number of possible names of each length (2 to 10 characters) is at most
# 26 * 37^(length - 1): 26 letters for the first character (case is ignored,
# per CRAN policy) and 37 options (26 letters, 10 digits, dot) for each later one.
# Every count is a slight overestimate because a name may not end in a dot.
# How many package names are available
possible <- 26 * 37^(1:9)
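# The counts above can be tightened: the last character may not be a dot, so
# it has 36 options rather than 37. A sketch, where `exact_names` is a
# hypothetical helper for names of length n >= 2 (case ignored):
exact_names <- function(n) 26 * 37^(n - 2) * 36
exact_names(2:4)
# [1]     936   34632 1281384
# brute-force check for two-character names: build every letter + character
# pair and drop the ones ending in a dot
two_char <- outer(letters, c(letters, 0:9, "."), paste0)
sum(!grepl("\\.$", two_char)) == exact_names(2)
# [1] TRUE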
# How many package names have we used?
table(nchar(p3))
#    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18   19   20   21   22   23   24   25   26   27
#   66  692 1010 1001 1171 1125  990  792  613  400  280  200  169  129   81   59   46   23   29   17   10    6    2    4    3    2
# count names of length 2 to 10 explicitly, so a length with no names yields 0 instead of shifting the positions
used <- as.vector(table(factor(nchar(p3), levels = 2:10)))
# percent of names used, by character length
cat(paste0("# ", 2:10, ": ", sprintf("%0.4f", 100*used/possible), "\n", collapse = ""))
# 2: 6.8607
# 3: 1.9441
# 4: 0.0766
# 5: 0.0021
# 6: 0.0001
# 7: 0.0000
# 8: 0.0000
# 9: 0.0000
# 10: 0.0000
# So, basically we're fine.
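#
# For a concrete view of what is left, the free two-character names can be
# enumerated directly. A sketch: `free_two_char` is a hypothetical helper, and
# matching is case-insensitive, following the CRAN policy quoted earlier.
free_two_char <- function(taken) {
  # a letter followed by a letter or digit (a trailing dot is not allowed)
  all_two <- as.vector(outer(letters, c(letters, 0:9), paste0))
  setdiff(all_two, tolower(taken))
}
length(free_two_char(c("sp", "R6", "ggplot2")))
# [1] 934
# length(free_two_char(p3)) would give the count for the names collected above.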