@sellisd
Created May 31, 2021 20:37
How much R code is in Rmd files
python -m pip install git+https://github.com/sellisd/gitrepodb.git # install gitrepodb tool
# download the 100 top-starred R-language repositories from GitHub
gitrepodb init
gitrepodb query --query "language:R,sort:stars-desc:archived=False" --head 100 --project Rmd
gitrepodb add --basepath ./r_repos
gitrepodb download --project Rmd
# count the total lines of R code in .R files
find ./ -name '*.R' | xargs wc -l # gives 198721 total
# count the total lines in Rmd files
find ./ -name '*.Rmd' | xargs wc -l # gives 192270 total
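# The find | xargs wc -l pattern above can miscount when file names contain
# spaces, or when the file list is long enough that xargs splits it into several
# wc calls (each prints its own "total"). A minimal sketch of a more robust
# count, assuming a find and xargs that support -print0 / -0 (GNU and BSD both
# do). count_lines is a hypothetical helper name, not part of gitrepodb.

```shell
# count_lines EXT [DIR]: total number of lines in all files named *.EXT under DIR
# (hypothetical helper, for illustration only)
count_lines() {
  ext=$1
  dir=${2:-.}
  # -print0 / -0 keep file names with spaces intact; piping everything through
  # cat into a single wc gives one number even if xargs runs cat in batches
  find "$dir" -type f -name "*.$ext" -print0 | xargs -0 cat | wc -l
}

count_lines R     # lines in .R files under the current directory
count_lines Rmd   # lines in .Rmd files under the current directory
```

# Summing over the concatenated files gives the same total as adding up the
# per-file wc -l counts, so the numbers stay comparable with the ones above.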
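# extract R code from Rmd files with knitr::purl, skipping files that fail
# (list.files takes a regular expression, not a shell glob, hence '[.]Rmd$')
Rscript -e "library(knitr); rmd_files <- list.files(path='.', pattern='[.]Rmd\$', recursive=TRUE); purl_skip_errors <- function(rmd_file){return(tryCatch(purl(rmd_file), error=function(e) NULL))}; invisible(lapply(rmd_files, purl_skip_errors))"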
# count the lines in .R files again, now including the extracted code
find ./ -name '*.R' | xargs wc -l # now gives 226498 total
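# So the extraction added 226498 - 198721 = 27777 lines of R code. In the top 100 repositories,
# ~14.4% of the lines in Rmd files are R code (27777 of 192270), which is about 14% of the
# R code in .R files (27777 of 198721).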
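# The percentages follow directly from the three totals reported above. A small
# sketch of the arithmetic, using awk for the floating-point division; the
# variable names and the pct helper are made up for illustration.

```shell
r_before=198721    # lines in .R files before extraction
rmd_total=192270   # lines in .Rmd files
r_after=226498     # lines in .R files after running purl

# lines of R code that came out of the Rmd files
extracted=$((r_after - r_before))

# pct N D: print N as a percentage of D, with two decimals
pct() { awk -v n="$1" -v d="$2" 'BEGIN { printf "%.2f", 100 * n / d }'; }

echo "extracted lines: $extracted"
echo "share of Rmd lines that are R code: $(pct "$extracted" "$rmd_total")%"
echo "relative to the R code in .R files: $(pct "$extracted" "$r_before")%"
```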