@emhart
Created August 26, 2014 21:14
sapply gone awry
# sapply gone terribly terribly wrong, or used perfectly, I can't tell.
# Use:
# input: a vector of words; I want to know whether any of the words in input are in the vector out
# out: a vector of words that I want to check against, a master list of words
# I then want to return a TRUE/FALSE index indicating whether each word is in the master list
input <- c("Lorem","ipsum","testing")
out <- c("Loremipsum","qwerty","keyboard","dvorak")
dindex <- sapply(input, function(x,y){ind <- grep(x,y);ifelse(length(ind) > 0, for(i in ind){ifelse(x == y[i],return(TRUE),return(FALSE))},return(FALSE))},y=out)
@fmichonneau

I had a similar kind of problem that I posted about on SO, and some nice folks came up with a crazy (and very efficient) solution. It might be helpful: http://stackoverflow.com/questions/25130462/get-disjoint-sets-from-a-list-in-r

@richfitz

The description sounds like you want "word appears anywhere in the output" (partially or not):

sapply(input, function(x) any(grepl(x, out)))

Or "word is found exactly in the output" (matching the code):

sapply(input, `%in%`, out)

In any case, both are shorter and still use sapply.
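
To make the difference between the two readings concrete, here is a small sketch run on the data from the gist. The variable names `partial` and `exact` are mine, and `fixed = TRUE` is an addition I'm assuming is wanted so that each word is treated as a literal string rather than a regular expression.

```r
input <- c("Lorem", "ipsum", "testing")
out <- c("Loremipsum", "qwerty", "keyboard", "dvorak")

# Partial match: TRUE if the word occurs inside any element of `out`.
# fixed = TRUE (an assumption, not in the original) disables regex
# interpretation of the word.
partial <- sapply(input, function(x) any(grepl(x, out, fixed = TRUE)))

# Exact match: TRUE only if the word is equal to some element of `out`.
exact <- sapply(input, `%in%`, out)

print(partial)
print(exact)
```

With this data "Lorem" and "ipsum" both match partially (they sit inside "Loremipsum"), while none of the three words matches exactly.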

@pitakakariki

If you don't need a named result:

input %in% out
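
If the names turn out to be useful after all, they can be put back without sapply. A minimal sketch, assuming a master list with "testing" appended purely for illustration so that one word matches; `unnamed` and `named` are hypothetical variable names:

```r
input <- c("Lorem", "ipsum", "testing")
# "testing" is appended here only for illustration; it is not in the gist's `out`.
out <- c("Loremipsum", "qwerty", "keyboard", "dvorak", "testing")

# %in% is already vectorised over its left-hand side: one logical per word.
unnamed <- input %in% out

# Re-attach the words as names to mimic what sapply returns.
named <- setNames(unnamed, input)

print(named)
```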

@noamross

Here's another crack at partial matching. The stringi package has some very fast string search implementations. Use stri_detect_regex if you're doing regex matches.

stringi::stri_detect_fixed(paste(out, collapse=" "), input)
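
For comparison, a rough sketch of the same idea in base R, checked against stringi only when that package happens to be installed. The names `haystack`, `base_hits` and `stri_hits` are hypothetical. One caveat with collapsing into a single string: a search word that itself contains a space could match across the boundary between two master-list entries.

```r
input <- c("Lorem", "ipsum", "testing")
out <- c("Loremipsum", "qwerty", "keyboard", "dvorak")

# Join the master list into one string so each word is searched for once.
haystack <- paste(out, collapse = " ")

# Base R version: literal (non-regex) substring search for each word.
base_hits <- vapply(input, function(x) grepl(x, haystack, fixed = TRUE), logical(1))

# Optional cross-check against stringi, skipped if the package is missing.
if (requireNamespace("stringi", quietly = TRUE)) {
  stri_hits <- stringi::stri_detect_fixed(haystack, input)
  stopifnot(identical(stri_hits, unname(base_hits)))
}

print(base_hits)
```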
