@Btibert3
Last active August 5, 2018 22:15
This R function builds a rasa NLU training data file

About

This R script provides a helper function that takes a data frame and builds a rasa NLU training file in JSON format. It is helpful when you want to:

  • export records from a database
  • import the training data file into the webapp found here: https://rasahq.github.io/rasa-nlu-trainer/
  • tag your data with the tool and export
  • re-import the JSON into R and associate with the database ID

Why

The data is exported in row order so that the tagged records can be matched back to the original file. No ID is written to the JSON file; because the order is preserved, it is just a matter of aligning the rows once the data has been tagged in the webapp and read back in from the JSON file.

Example

Build a dataset and write a JSON data file that can be imported into https://rasahq.github.io/rasa-nlu-trainer/

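## keep text as character (the default is factor before R 4.0)
dat = data.frame(id = 1:3, text = c("test 1", "i like turtles", "where are you"), stringsAsFactors = FALSE)
## writes train.json to the current working directory
write_rasa_nlu(dat)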
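
To make the file layout concrete, here is a minimal sketch that builds the nested list for a single record and prints it as JSON. It assumes the jsonlite package is installed; the names entry and json_txt are illustrative only and are not part of the script.

```r
library(jsonlite)

# One example entry: the text to tag, plus an empty intent and no entities yet
entry = list(text = "i like turtles", intent = "", entities = list())

# Wrap it in the rasa_nlu_data / common_examples nesting used by write_rasa_nlu
rasa_json = list(rasa_nlu_data = list(common_examples = list(entry)))

# auto_unbox = TRUE writes length-one vectors as scalars rather than arrays
json_txt = toJSON(rasa_json, auto_unbox = TRUE, pretty = TRUE)
cat(json_txt, "\n")
```

Each row of the data frame becomes one such entry, in the same order as the rows.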

From here, tag your data in the webapp, read the exported file back into R, and align the records 1:1. Typically, you would save the raw file and match each tagged record to the corresponding row of that raw file.

library(jsonlite)  # fromJSON
library(dplyr)     # glimpse, rename, %>%

## bring in the data tagged in the webapp
x = fromJSON("~/Downloads/2018-07-train.json", flatten = TRUE)
y = x$rasa_nlu_data$common_examples
glimpse(y)

The code above assumes that you exported the JSON file from rasa-nlu-trainer to the default downloads directory on your local machine.

Then merge the tagged records with the original data:

z = y %>% rename(text2 = text)
msgs = cbind(dat, z)
glimpse(msgs)
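
Because cbind() pairs rows purely by position, it can be worth checking the alignment before merging. The following is a sketch only: the tagged data frame and its intent values are hypothetical stand-ins for a real export, and the text comparison assumes the trainer leaves the text unchanged.

```r
# Original records, in the order they were exported
dat = data.frame(id = 1:3,
                 text = c("test 1", "i like turtles", "where are you"),
                 stringsAsFactors = FALSE)

# Hypothetical stand-in for the tagged examples read back from the trainer export
y = data.frame(text = c("test 1", "i like turtles", "where are you"),
               intent = c("test", "smalltalk", "location"),
               stringsAsFactors = FALSE)

# Stop early if the two sides do not line up row for row
stopifnot(nrow(dat) == nrow(y))
stopifnot(identical(dat$text, y$text))

# Rename the duplicate column, then bind the two data frames side by side
names(y)[names(y) == "text"] = "text2"
msgs = cbind(dat, y)
print(msgs)
```

If either check fails, the export was probably reordered or edited, and the rows should not be bound by position.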
#' Generate rasa NLU training data file
#'
#' Take a data frame and export a JSON dataset in the rasa NLU format. The data frame must contain a column
#' called text, which holds the questions that you want to tag with intents and entities in the rasa NLU
#' training tool https://rasahq.github.io/rasa-nlu-trainer/.
#' Format reference: https://nlu.rasa.com/dataformat.html
#' @param dat The data frame holding the text data for questions
#' @param path character, the path/name of the file to be exported. Defaults to train.json in the current directory
#' @export
#' write_rasa_nlu
write_rasa_nlu = function(dat, path="train.json") {
  ## ensure that the text column is present
  stopifnot("text" %in% colnames(dat))
  ## ensure there is at least one row
  stopifnot(nrow(dat) > 0)
  ## keep only the text column
  dat2 = dplyr::select(dat, text)
  ## for each row, build the entry with an empty intent and no entities
  ## TODO: vectorize and remove for-loop
  rasa = list()
  for (i in seq_len(nrow(dat2))) {
    ## as.character() so a factor column is written as its label
    rasa[[i]] = list(text=as.character(dat2$text[i]), intent="", entities=list())
  }
  ## wrap the entries in the nesting rasa NLU expects
  rasa_json = list(rasa_nlu_data=list(common_examples=rasa))
  ## write the file
  jsonlite::write_json(rasa_json, path, auto_unbox=TRUE)
}
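
One possible way to address the TODO about removing the for-loop is to build the entries with lapply(). This is a sketch, not part of the original script: write_rasa_nlu_vec is a hypothetical name, and the function is intended to produce the same file while dropping the dplyr dependency.

```r
library(jsonlite)

# Hypothetical variant of write_rasa_nlu that builds the entries with lapply()
write_rasa_nlu_vec = function(dat, path = "train.json") {
  stopifnot("text" %in% colnames(dat))
  stopifnot(nrow(dat) > 0)
  # as.character() guards against the text column being a factor
  rasa = lapply(as.character(dat$text), function(txt) {
    list(text = txt, intent = "", entities = list())
  })
  rasa_json = list(rasa_nlu_data = list(common_examples = rasa))
  write_json(rasa_json, path, auto_unbox = TRUE)
}

# Usage: write to a temporary file and read it back to inspect the row order
dat = data.frame(id = 1:3, text = c("test 1", "i like turtles", "where are you"))
tmp = tempfile(fileext = ".json")
write_rasa_nlu_vec(dat, tmp)
back = fromJSON(tmp, simplifyVector = FALSE)
texts = vapply(back$rasa_nlu_data$common_examples, function(e) e$text, character(1))
print(texts)
```

lapply() returns an unnamed list in input order, so the entries are written as a JSON array in the same order as the rows of the data frame.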