@geotheory
Last active September 18, 2020 12:57
Function for rendering spaCy-processed text in the RStudio viewer, colouring each token by its part-of-speech tag and labelling it with its dependency relation
library(tidyverse)
# Other required packages: xml2, rvest, janitor, glue, shiny

# s:    tibble from spacyr::spacy_parse(..., dependency = TRUE, nounphrase = TRUE)
# cols: optional vector of colours, one per part-of-speech tag
render_pos = function(s, cols = NULL){
  # First call only: cache the lookup tables and viewer paths in the global environment
  if(!exists('.lookup_tbls')){
    message('initialising tool')
    .lookup_tbls <<- xml2::read_html('https://spacy.io/api/annotation') %>% rvest::html_table()
    .lookup_pos <<- as_tibble(.lookup_tbls[[1]]) %>% janitor::clean_names() %>% select(pos, pos_desc = description)
    .lookup_dep_rel <<- as_tibble(.lookup_tbls[[5]]) %>% janitor::clean_names() %>% select(dep_rel = label, dep_rel_desc = description)
    .tempDir <<- tempfile()
    dir.create(.tempDir)
    .htmlFile <<- file.path(.tempDir, "index.html")
    # Fall back to the system browser when not running inside RStudio
    .viewer <<- getOption("viewer", default = utils::browseURL)
  }
  # POS tags ordered by frequency; each tag is coloured by its position in this vector
  pos_types = s %>% count(pos, sort = TRUE) %>% pull(pos)
  # At least one colour per tag, so no token ends up with an NA colour
  if(is.null(cols)) cols = sample(rainbow(max(15, length(pos_types)), v = .8))
  s = left_join(s, .lookup_pos, by = 'pos') %>% left_join(.lookup_dep_rel, by = 'dep_rel')
  # The dependency label is drawn in small grey type above each token
  css = '<style>
  * { line-height: 2; font-family: "courier"; font-size: 18px;}
  span:before {
    font-size: 10px;
    color: grey;
    content: attr(data-pos);
    position: absolute;
    transform: translate(0, -8px);
  }
  </style>'
  # One <span> per token, one <p> per document
  body = s %>%
    rowwise() %>%
    mutate(html = as.character(shiny::span(token, style = glue::glue('color:{cols[match(pos, pos_types)]}'), `data-pos` = dep_rel,
      title = paste('pos:', pos_desc, '\ndep_rel:', dep_rel_desc, '\nnounphrase:', nounphrase, '\nlemma:', lemma)))) %>%
    group_by(doc_id) %>%
    summarise(html = paste(html, collapse = ' ') %>% paste('<p>',.,'</p>'), .groups = 'drop') %>%
    pull(html) %>% paste(collapse = '\n')
  # Write the page in one call instead of sink(), so an error above cannot leave output diverted
  writeLines(c(css, body), .htmlFile)
  .viewer(.htmlFile)
}

geotheory commented Sep 8, 2020

Example usage:

library(spacyr)
s = spacy_parse(sample(stringr::sentences, 700), dependency = TRUE, nounphrase = TRUE) %>% as_tibble()
render_pos(s)
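
The colouring step inside render_pos can be tried on its own, without spaCy or a network connection. The following is a minimal base-R sketch of that step only: the `tokens` and `pos` vectors are made-up stand-ins for the `token` and `pos` columns of a `spacy_parse()` result, and the palette is left unshuffled (the function itself uses `sample()`) so the result is reproducible.

```r
# Toy stand-ins for the `token` and `pos` columns of a spacy_parse() result (illustrative only)
tokens <- c("The", "dog", "and", "the", "cat", "saw", "birds", ".")
pos    <- c("DET", "NOUN", "CCONJ", "DET", "NOUN", "VERB", "NOUN", "PUNCT")

# Tags ordered by descending frequency, mirroring count(pos, sort = TRUE)
pos_types <- names(sort(table(pos), decreasing = TRUE))

# One colour per tag; unshuffled here, whereas render_pos() shuffles with sample()
cols <- rainbow(length(pos_types), v = 0.8)

# Each token takes the colour at the position of its tag in pos_types
token_cols <- cols[match(pos, pos_types)]

# Plain-string version of the spans (no HTML escaping, unlike shiny::span)
spans <- sprintf('<span style="color:%s">%s</span>', token_cols, tokens)
cat(paste(spans, collapse = " "), "\n")
```

Tokens that share a tag share a colour, and the most frequent tag takes the first colour in the palette.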
