Skip to content

Instantly share code, notes, and snippets.

@zh4ngx
Created September 26, 2013 08:39
Show Gist options
  • Save zh4ngx/6711476 to your computer and use it in GitHub Desktop.
Save zh4ngx/6711476 to your computer and use it in GitHub Desktop.
Given string and punctuation, tokenize string on punctuation and whitespace. Done using both split and scan.
def tokenize_query_split query, punctuation
punc_regex = /[#{punctuation}\s]+/
tokens = query.split(punc_regex)
tokens.each do |token|
p token
end
end
def tokenize_query_scan query, punctuation
token_regex = /[^#{punctuation}\s]+/
tokens = query.scan(token_regex)
tokens.each do |token|
p token
end
end
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment