@mrdwab
Last active November 23, 2015 12:31
Extracts a given number of words from the start or end of a text string, or after skipping a given number of words.
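wordExtract <- function(instring, number, start = TRUE, after = NULL) {
  # Count the whitespace-delimited words in the input string
  len <- length(regmatches(instring, gregexpr("\\S+", instring))[[1]])
  # Words required: those extracted plus any skipped via `after`
  mlen <- if (is.null(after)) number else number + after
  if (len < mlen) stop("instring has fewer words than requested")
  # `after` always counts from the start, so it overrides start = FALSE
  if (!is.null(after) && !isTRUE(start)) {
    start <- TRUE
    message("start specified as FALSE but ignored")
  }
  if (!is.null(after)) {
    # Skip `after` words, then capture the next `number` words
    pat <- "^(?:\\S+\\s+){%d}((?:\\S+\\s+){%d}\\S+).*"
    pat <- sprintf(pat, after, number - 1)
  } else {
    # Capture `number` words from the start or from the end
    pat <- if (isTRUE(start)) "^((?:\\S+\\s+){%d}\\S+).*"
           else "^.*\\s+((?:\\S+\\s+){%d}\\S+)$"
    pat <- sprintf(pat, number - 1)
  }
  sub(pat, "\\1", instring, perl = TRUE)
}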

mrdwab commented Nov 23, 2015

Example:

x <- "this is a string that I want to extract words from"
wordExtract(x, 3)
# [1] "this is a"
wordExtract(x, 3, FALSE)
# [1] "extract words from"
wordExtract(x, 3, after = 5)
# [1] "I want to"
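
To see why the last call returns "I want to", it may help to build the pattern by hand. This is only a sketch of the `after` branch, using the same template string as the function; `skip` and `take` are names made up for the illustration and are not part of wordExtract().

```r
# Illustrative only: `skip` and `take` are names chosen for this sketch
x <- "this is a string that I want to extract words from"
skip <- 5  # plays the role of `after`
take <- 3  # plays the role of `number`

# Same template wordExtract() uses when `after` is given
pat <- sprintf("^(?:\\S+\\s+){%d}((?:\\S+\\s+){%d}\\S+).*", skip, take - 1)
cat(pat, "\n")
# ^(?:\S+\s+){5}((?:\S+\s+){2}\S+).*

# The capture group holds the words that sub() keeps
result <- sub(pat, "\\1", x, perl = TRUE)
result
# [1] "I want to"
```

The non-capturing group skips the first five words, the capture group takes two words plus their trailing whitespace and then one final word, and the trailing `.*` swallows the rest so that sub() replaces the whole string with the captured part.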
