Created
May 31, 2021 20:37
How much R code is in Rmd files
python -m pip install git+https://github.com/sellisd/gitrepodb.git # install gitrepodb tool
# download the 100 top-starred R-language repositories from github
gitrepodb init
gitrepodb query --query "language:R,sort:stars-desc:archived=False" --head 100 --project Rmd
gitrepodb add --basepath ./r_repos
gitrepodb download --project Rmd
# count the total lines of R code
find ./ -name '*.R' | xargs wc -l # gives 198721 total
# count total lines of Rmd files
find ./ -name '*.Rmd' | xargs wc -l # gives 192270 total
# extract R code from Rmd files
Rscript -e "library(knitr); rmd_files <- list.files(path='.', pattern='*.Rmd', recursive=TRUE);purl_skip_errors <- function (rmd_file){return(tryCatch(purl(rmd_file), error=function(e) NULL))};lapply(rmd_files, purl_skip_errors)"
# count again
find ./ -name '*.R' | xargs wc -l # now I get 226498 total
# So in the top 100 repositories, the extracted R code (226498 - 198721 = 27777 lines) makes up ~14.4% of the lines in Rmd files, which is about 14% of the R code in .R files
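The closing estimate can be reproduced from the three totals reported above. The following is a minimal sketch of that arithmetic only; the variable names are hypothetical and not part of the original script, and the numbers are copied from the `wc -l` comments rather than recomputed.

```shell
# Totals copied from the wc -l comments above (not recomputed here)
r_lines_before=198721   # lines in .R files before extraction
rmd_lines=192270        # lines in .Rmd files
r_lines_after=226498    # lines in .R files after purl

# Lines of R code that purl extracted from the Rmd files
extracted=$((r_lines_after - r_lines_before))

# Shell arithmetic is integer-only, so use awk for the percentages
share_of_rmd=$(awk -v e="$extracted" -v t="$rmd_lines" 'BEGIN { printf "%.1f", 100 * e / t }')
share_of_r=$(awk -v e="$extracted" -v t="$r_lines_before" 'BEGIN { printf "%.1f", 100 * e / t }')

echo "extracted R lines: $extracted"
echo "share of Rmd lines: ${share_of_rmd}%"
echo "relative to .R lines: ${share_of_r}%"
```

Note that the extracted count includes whatever `purl` writes to its output files, so it is an approximation of the R code inside the Rmd files rather than an exact measure.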