Last active
August 29, 2015 14:16
-
-
Save aammd/d394cd38d7296722c425 to your computer and use it in GitHub Desktop.
An R function I wrote in the process of translating a list kept in a .docx file to a proper R dataframe. After moving it from .docx to .txt via pandoc, I needed to turn section headers into levels of a grouping factor
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
#' convert positional information to two columns | |
#' | |
#' Sometimes text is organized by position. This function | |
#' turns positional group labels (e.g headers ) into the levels of a grouping variable | |
#' @param x character vector containing group labels followed by group members | |
#' @param pattern regular expression that identifies the group labels | |
fill_down <- function(x, pattern){ | |
## find matches of the pattern | |
x <- as.character(x) | |
value_matches <- grepl(pattern = pattern, x = x) | |
## get their positions | |
loc <- which(value_matches) | |
## the first of these should be labelled | |
stopifnot(min(loc) == 1) | |
start <- loc | |
end <- c(loc[-1], length(x) + 1) | |
# measure the intervals between labels | |
intervals <- end - start - 1 | |
rps <- Map(f = rep_len, x[loc], intervals) | |
## combine replicated values in a single vector" | |
grps <- do.call(c, rps) | |
## get the values between labels | |
xvals <- x[!value_matches] | |
stopifnot(length(grps) == length(xvals)) | |
dplyr::data_frame(grps, xvals) | |
} | |
# imagine a list of letters divided into two sections: "A" and "B": | |
test <- c("A", "b", "c", "e", "B", "g", "h", "i") | |
fill_down(test, "A|B") | |
# | |
# grps xvals | |
# 1 A b | |
# 2 A c | |
# 3 A e | |
# 4 B g | |
# 5 B h | |
# 6 B i |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment