How much of the R package namespace is left?
# R only has one package namespace
# All packages exist in a single, non-hierarchical namespace
# Once a package name is used, it cannot be reused
# CRAN hosts most existing R packages; Bioconductor holds many more; once claimed on either service, a package name is reserved
# GitHub hosts many other packages with no "claim" to a given package name and there's no list of such packages
#
# "Writing R Extensions" says this about names:
## "The mandatory ‘Package’ field gives the name of the package.
## This should contain only (ASCII) letters, numbers and dot, have
## at least two characters and start with a letter and not end in
## a dot. If it needs explaining, this should be done in the
## ‘Description’ field (and not the ‘Title’ field)."
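# A minimal sketch of the naming rule quoted above, written as a regular expression.
# `is_valid_pkg_name` is a hypothetical helper introduced here, not part of base R:
# first character an ASCII letter, then letters/digits/dots, last character not a dot.
is_valid_pkg_name <- function(x) {
  grepl("^[A-Za-z][A-Za-z0-9.]*[A-Za-z0-9]$", x, perl = TRUE)
}
is_valid_pkg_name(c("ggplot2", "a", "2fast", "data.table", "end."))
# [1]  TRUE FALSE FALSE  TRUE FALSE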
#
# CRAN policy says this about names:
## Packages should be named in a way that does not conflict
## (irrespective of case) with any current or past CRAN package
## (the Archive area can be consulted), nor any current Bioconductor
## package. Package maintainers give the right to use that package
## name to CRAN when they submit, so the CRAN team may orphan a
## package and allow another maintainer to take it over.
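# A sketch of the case-insensitive conflict check the policy describes.
# `name_is_taken` is a hypothetical helper; `taken` is any character vector of claimed names.
name_is_taken <- function(candidate, taken) {
  tolower(candidate) %in% tolower(taken)
}
name_is_taken(c("GGplot2", "zzzz"), taken = c("ggplot2", "MASS"))
# [1]  TRUE FALSE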
#
# So, how much of the package namespace have we used up (on CRAN)?
#
# Packages on CRAN (as of 2015-12-27)
options("repos" = "https://cran.rstudio.com")
a <- available.packages()
p1 <- a[,"Package"]
length(p1)
# [1] 7666
# The count of current packages is an underestimate, because archived packages retain a "claim" to their name (permanently held by CRAN).
# get the Archive listing (it covers both current and former packages)
x <- paste0(readLines("http://cran.r-project.org/src/contrib/Archive/"), collapse = "")
# extract package names (this is a bit hacky)
p2 <- regmatches(x, gregexpr("(?<=\")[[:alnum:].]+(?=/\")", x, perl=TRUE))[[1]]
length(p2)
# [1] 7115
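# A toy check of the extraction regex on a made-up directory-listing fragment
# (`html` is an invented string, not the real Archive page): the lookbehind and
# lookahead keep only the text between an opening quote and a trailing /"
html <- '<a href="abc/">abc/</a> <a href="x.y2/">x.y2/</a>'
regmatches(html, gregexpr("(?<=\")[[:alnum:].]+(?=/\")", html, perl = TRUE))[[1]]
# [1] "abc"  "x.y2"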
# combined names currently from CRAN and Archived
p3 <- unique(c(p1,p2))
length(p3)
# [1] 8920
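# A sketch of two follow-up checks (outputs omitted; they depend on the CRAN snapshot):
# names that appear both in the current listing and in the Archive
length(intersect(p1, p2))
# distinct names when compared irrespective of case, as the CRAN policy requires
length(unique(tolower(c(p1, p2))))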
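# the total number of possible names of each length is approximately:
# 26 choices for the first character (a letter; CRAN compares names irrespective of case)
# times 37 choices (26 letters + 10 digits + dot) for each remaining character
# (every count is a slight overestimate, because a name cannot end in a dot)
# + all two character names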
n2 <- (26*(37^1))
# + all three character names
n3 <- (26*(37^2))
# + all four character names (this is actually an overestimate)
n4 <- (26*(37^3))
# + all five character names (this is actually an overestimate)
n5 <- (26*(37^4))
# + all six character names (this is actually an overestimate)
n6 <- (26*(37^5))
# + all seven character names (this is actually an overestimate)
n7 <- (26*(37^6))
# + all eight character names (this is actually an overestimate)
n8 <- (26*(37^7))
# + all nine character names (this is actually an overestimate)
n9 <- (26*(37^8))
# + all ten character names (this is actually an overestimate)
n10 <- (26*(37^9))
# How many package names are possible, by length (2 to 10 characters)?
possible <- c(n2, n3, n4, n5, n6, n7, n8, n9, n10)
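# A sketch of the exact counts under the quoted rule that a name cannot end in a dot:
# 26 first characters, 37 for each middle character, 36 for the last character.
# `possible_exact` is a name introduced here for illustration; nothing below depends on it.
possible_exact <- 26 * 37^(0:8) * 36
possible_exact[1:2]
# [1]   936 34632
# the exact count is 36/37 of the approximate count at every length
all.equal(possible_exact / possible, rep(36/37, 9))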
# How many package names have we used?
table(nchar(p3))
#    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18   19   20   21   22   23   24   25   26   27
#   66  692 1010 1001 1171 1125  990  792  613  400  280  200  169  129   81   59   46   23   29   17   10    6    2    4    3    2
# keep the counts for lengths 2 through 10 (the first nine entries of the table)
used <- as.vector(table(nchar(p3)))[1:9]
# percent of names used, by character length
cat(paste0("# ", 2:10, ": ", sprintf("%0.4f", 100*used/possible), "\n", collapse = ""))
# 2: 6.8607
# 3: 1.9441
# 4: 0.0766
# 5: 0.0021
# 6: 0.0001
# 7: 0.0000
# 8: 0.0000
# 9: 0.0000
# 10: 0.0000
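# A sketch of the overall picture for names of 2 to 10 characters, reusing `used` and
# `possible` from above (outputs omitted; they depend on the CRAN snapshot):
# percent of all such names in use
100 * sum(used) / sum(possible)
# number of such names still unclaimed, by the approximate count
sum(possible) - sum(used)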
# So, basically we're fine.