@lbusett
Created March 13, 2018 18:56
script for importing publications from a "bibtex" file to a hugo-academic website
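#' @title bibtex_2academic
#' @description import publications from a bibtex file to a hugo-academic website
#' @param bibfile path to the bibtex file to import
#' @param outfold folder where the "md" files are written
#' @param abstract if TRUE, copy the abstract into the "md" file
#' @param overwrite if TRUE, overwrite "md" files that already exist
#' @author Lorenzo Busetto, PhD (2017) <[email protected]>
bibtex_2academic <- function(bibfile,
                             outfold,
                             abstract  = FALSE,
                             overwrite = FALSE) {

  # library() stops with an error if a package is missing, whereas require()
  # only returns FALSE and lets the script fail later
  library(RefManageR)
  library(dplyr)
  library(stringr)
  library(anytime)

  # Import the bibtex file and convert to data.frame
  mypubs <- ReadBib(bibfile, check = "warn", .Encoding = "UTF-8") %>%
    as.data.frame()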

  # assign "categories" to the different types of publications.
  # Some bibtex exports carry a "document_type" field; when it is missing,
  # fall back to the "bibtype" column created by ReadBib
  if (!"document_type" %in% names(mypubs)) {
    mypubs$document_type <- mypubs$bibtype
  }
  mypubs <- mypubs %>%
    dplyr::mutate(
      pubtype = dplyr::case_when(document_type == "Article" ~ "2",
                                 document_type == "Article in Press" ~ "2",
                                 document_type == "InProceedings" ~ "1",
                                 document_type == "Proceedings" ~ "1",
                                 document_type == "Conference" ~ "1",
                                 document_type == "Conference Paper" ~ "1",
                                 document_type == "MastersThesis" ~ "3",
                                 document_type == "PhdThesis" ~ "3",
                                 document_type == "Manual" ~ "4",
                                 document_type == "TechReport" ~ "4",
                                 document_type == "Book" ~ "5",
                                 document_type == "InCollection" ~ "6",
                                 document_type == "InBook" ~ "6",
                                 document_type == "Misc" ~ "0",
                                 TRUE ~ "0"))

  # create a function which populates the md template based on the info
  # about a publication
  create_md <- function(x) {

    # define a date and create filename by appending date and start of title
    if (!is.na(x[["year"]])) {
      x[["date"]] <- paste0(x[["year"]], "-01-01")
    } else {
      x[["date"]] <- "2999-01-01"
    }

    filename <- paste(x[["date"]], x[["title"]] %>%
                        str_replace_all(fixed(" "), "_") %>%
                        str_remove_all(fixed(":")) %>%
                        str_sub(1, 20) %>%
                        paste0(".md"), sep = "_")

    # start writing
    if (!file.exists(file.path(outfold, filename)) || overwrite) {

      fileConn <- file.path(outfold, filename)
      write("+++", fileConn)

      # Title and date
      write(paste0("title = \"", x[["title"]], "\""), fileConn, append = TRUE)
      write(paste0("date = \"", anydate(x[["date"]]), "\""), fileConn, append = TRUE)

      # Authors. Comma separated list, e.g. `["Bob Smith", "David Jones"]`.
      auth_hugo <- str_replace_all(x[["author"]], " and ", "\", \"")
      auth_hugo <- stringi::stri_trans_general(auth_hugo, "latin-ascii")
      write(paste0("authors = [\"", auth_hugo, "\"]"), fileConn, append = TRUE)

      # Publication type. Legend:
      # 0 = Uncategorized, 1 = Conference paper, 2 = Journal article
      # 3 = Manuscript, 4 = Report, 5 = Book, 6 = Book section
      write(paste0("publication_types = [\"", x[["pubtype"]], "\"]"),
            fileConn, append = TRUE)

      # Publication details: journal, volume, issue, page numbers and doi link.
      # Entries without a journal (e.g. theses, books) start from an empty
      # string instead of writing "NA"
      publication <- if (is.na(x[["journal"]])) "" else x[["journal"]]
      if (!is.na(x[["volume"]])) publication <- paste0(publication,
                                                        ", (", x[["volume"]], ")")
      if (!is.na(x[["number"]])) publication <- paste0(publication,
                                                        ", ", x[["number"]])
      if (!is.na(x[["pages"]])) publication <- paste0(publication,
                                                       ", _pp. ", x[["pages"]], "_")
      if (!is.na(x[["doi"]])) publication <- paste0(publication,
                                                     ", ", paste0("https://doi.org/",
                                                                  x[["doi"]]))

      write(paste0("publication = \"", publication, "\""), fileConn, append = TRUE)
      write(paste0("publication_short = \"", publication, "\""), fileConn, append = TRUE)

      # Abstract and optional shortened version.
      if (abstract) {
        write(paste0("abstract = \"", x[["abstract"]], "\""), fileConn, append = TRUE)
      } else {
        write("abstract = \"\"", fileConn, append = TRUE)
      }
      write("abstract_short = \"\"", fileConn, append = TRUE)

      # other possible fields are kept empty. They can be customized later by
      # editing the created md
      write("image_preview = \"\"", fileConn, append = TRUE)
      write("selected = false", fileConn, append = TRUE)
      write("projects = []", fileConn, append = TRUE)
      write("tags = []", fileConn, append = TRUE)

      # links
      write("url_pdf = \"\"", fileConn, append = TRUE)
      write("url_preprint = \"\"", fileConn, append = TRUE)
      write("url_code = \"\"", fileConn, append = TRUE)
      write("url_dataset = \"\"", fileConn, append = TRUE)
      write("url_project = \"\"", fileConn, append = TRUE)
      write("url_slides = \"\"", fileConn, append = TRUE)
      write("url_video = \"\"", fileConn, append = TRUE)
      write("url_poster = \"\"", fileConn, append = TRUE)
      write("url_source = \"\"", fileConn, append = TRUE)

      # other stuff
      write("math = true", fileConn, append = TRUE)
      write("highlight = true", fileConn, append = TRUE)

      # Featured image
      write("[header]", fileConn, append = TRUE)
      write("image = \"\"", fileConn, append = TRUE)
      write("caption = \"\"", fileConn, append = TRUE)

      write("+++", fileConn, append = TRUE)
    }
  }

  # apply the "create_md" function over the publications list to generate
  # the different "md" files. invisible() keeps apply()'s return value from
  # being printed to the console
  invisible(apply(mypubs, MARGIN = 1, FUN = create_md))
}
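
A possible way to call the function, as a minimal sketch: the file name "publications.bib" and the "content/publication" folder are placeholders assumed for illustration (the gist names neither), and the call only runs once the function above has been sourced and the bibtex file exists.

```r
# Usage sketch. Both paths are placeholders (assumptions), not from the gist.
bibfile <- "publications.bib"                    # hypothetical bibtex export
outfold <- file.path("content", "publication")   # assumed Hugo content folder

# write() does not create missing folders, so make sure the target exists
dir.create(outfold, recursive = TRUE, showWarnings = FALSE)

# Only run when the function above has been sourced and the file is present
if (exists("bibtex_2academic") && file.exists(bibfile)) {
  bibtex_2academic(bibfile   = bibfile,
                   outfold   = outfold,
                   abstract  = FALSE,
                   overwrite = FALSE)
}
```

With overwrite = FALSE, files that already exist in the output folder are left alone, so manual edits to earlier "md" files survive a re-run.
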
@dcava
dcava commented Apr 25, 2018

Hi thanks for this.

On my Mac, running against a google scholar bibtex export, ReadBib produces a column called "bibtype" not "document_type".

Not sure if that's universal, but had to change the var name in the mutate section to get it working.

@petzi53
petzi53 commented Jul 26, 2018

Thank you for this script!

I had the same problem on Mac OS X as @dcava: RefManageR produces a column "bibtype". Changing "document_type" to "bibtype" worked, BUT only with type "journal". It seems to me that you have focussed only on this publication type with the line publication <- x[["journal"]]. I looked at your website and understand that it worked for you, as you only transferred journal articles.
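
One way to handle entry types that have no journal, as a rough sketch and not part of the gist: look through a list of venue-like fields and take the first one that is filled. The helper name pick_venue and the field order are assumptions; booktitle, school, institution and publisher are standard BibTeX fields, but whether they fit a given site is a judgment call.

```r
# Hypothetical helper (not in the original script): return the first
# venue-like field that is present and not NA, or "" if none is found.
# `x` is a named character vector, as passed by apply(mypubs, 1, ...).
pick_venue <- function(x,
                       fields = c("journal", "booktitle", "school",
                                  "institution", "publisher")) {
  for (f in fields) {
    if (f %in% names(x) && !is.na(x[[f]])) {
      return(x[[f]])
    }
  }
  ""
}

# A conference paper has no journal but does have a booktitle
pick_venue(c(journal = NA, booktitle = "Proceedings of Some Conference"))
```

Inside create_md, publication <- pick_venue(x) would then replace the line that reads the journal directly.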

@seichter
seichter commented Oct 1, 2018

I'm getting an error Error: 'case_when' is not an exported object from 'namespace:dplyr'

What am I missing?

@seichter
seichter commented Oct 2, 2018

To answer this for others finding this gist - RStudio on the Mac was including an outdated version of dplyr. Using install.packages("dplyr",type="source") solved the problem.

@pppichler
Hi Lorenzo, thanks for the great script!

I have made some minor modifications to make the script work for me:

  • Make it work regardless of whether the document type is called bibtype or document_type.
  • Added some opportunistic string cleaning code (mostly removing {} and escapes).
  • Make it add each publication in a dedicated folder instead of a file.
  • Export a cite.bib file into each folder to make the entries directly citable on the website.

I have posted my modified version here:
https://pppichler.github.io/pepesblog/post/set-this-up/setting-up-this-site/

paul
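
The string cleaning mentioned in the list above could look roughly like the following. This is a sketch under assumptions, not the code from the linked post: the helper name clean_bib_string and the exact set of characters handled are made up for illustration.

```r
# Hypothetical helper (not from the gist or the linked post): tidy a BibTeX
# field before it is written between double quotes in the front matter.
clean_bib_string <- function(s) {
  s <- gsub("[{}]", "", s)               # drop BibTeX grouping braces
  s <- gsub("\\\\([&%_#$])", "\\1", s)   # turn \& \% \_ \# \$ into & % _ # $
  s <- gsub("\"", "\\\\\"", s)           # escape double quotes
  s
}

clean_bib_string("{The} R \\& D of {GIS}")
```

Applying it to the title and abstract before they are pasted into the write() calls avoids front matter that breaks on a stray quote or brace.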

@bshor
bshor commented Jul 1, 2020

I had the same issue with needing to change document_type to bibtype (which is what it was called in my bib file I used with JabRef). I had an additional issue that overwrite was not set in the inner function. I set it to create_md <- function(x, overwrite=T).

This should probably be changed on Lorenzo's blog post as that is most likely to be seen by people.

@pppichler Your version is down.

@pppichler
sorry @bshor, I moved my site. in case that's still useful, the file is now here.
http://www.pik-potsdam.de/~pichler/blog/post/set-this-up/setting-up-this-site/

@bshor
bshor commented Jul 2, 2020

That's really helpful, thanks!

@ylelkes
ylelkes commented Sep 1, 2020

@pppichler, my url_pdf always renders as NA. Would you mind posting your bibtex file so i can see what I'm doing wrong?
thanks!

@pppichler
hi, does it only render as NA, or is the "url_pdf" property empty in the generated index.md file? The url_pdf is taken from the "url" property in the bib file. I have just noticed that things can go wrong with Better BibTeX exports, so you might want to try the standard bibtex export if you are using Zotero or another reference manager. my entries look like this, for example:

@Article{lenzenEnvironmentalFootprintHealth2020,
title = {The environmental footprint of health care: a global assessment},
issn = {2542-5196},
url = {https://www.thelancet.com/journals/lanplh/article/PIIS2542-5196(20)30121-2/abstract},
doi = {10.1016/S2542-5196(20)30121-2},
language = {English},
urldate = {2020-08-06},
journal = {The Lancet Planetary Health},
author = {Manfred Lenzen and Arunima Malik and Mengyu Li and Jacob Fry and Helga Weisz and Peter-Paul Pichler and Leonardo Suveges Moreira Chaves and Anthony Capon and David Pencheon},
year = {2020},
volume = {4},
number = {7},
month = {jul},
pages = {e271--e279},
pmid = {32681898},
shorttitle = {The environmental footprint of health care},
pubtype = {2},
date = {2020-01-01},
}

hope this helps
paul
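
The mapping from "url" to url_pdf described above can be sketched as follows. The helper name url_pdf_line is an assumption, and this is not the code from the modified script; it only illustrates why a missing "url" field should end up as an empty string and not as NA.

```r
# Hypothetical helper: build the url_pdf line from the "url" field of one
# entry (`x` is a named character vector), leaving it empty when the field
# is absent or NA so that the literal text "NA" is never written.
url_pdf_line <- function(x) {
  url <- if ("url" %in% names(x) && !is.na(x[["url"]])) x[["url"]] else ""
  paste0("url_pdf = \"", url, "\"")
}

url_pdf_line(c(title = "An entry without a url field"))
```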

@ylelkes
ylelkes commented Sep 2, 2020 via email