Last active July 10, 2018 14:50
Gist bcoe/6505434
Sanitize a search query for Lucene. Useful when the original query raises an exception because it does not follow the query-string DSL. Taken from a discussion on Stack Overflow: http://stackoverflow.com/questions/16205341/symbols-in-query-string-for-elasticsearch
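The description frames the sanitizer as a fallback for queries that fail to parse. Below is a minimal sketch of that retry pattern. It assumes a hypothetical `run_search` method that stands in for a real Elasticsearch request and raises a hypothetical `QueryParseError` on unbalanced parentheses. `escape_specials` is a simplified, hypothetical stand-in for `sanitize_string` that only escapes the special characters.

```ruby
# Hypothetical error raised by the stand-in search call below.
class QueryParseError < StandardError; end

# Hypothetical stand-in for a real search request: it rejects queries with
# unbalanced, unescaped parentheses, roughly as a strict query parser would.
def run_search(query)
  depth = 0
  query.scan(/\\.|[()]/) do |token|
    next if token.start_with?('\\') # an escaped character is a literal
    depth += token == '(' ? 1 : -1
    raise QueryParseError, "unbalanced parentheses: #{query}" if depth.negative?
  end
  raise QueryParseError, "unbalanced parentheses: #{query}" unless depth.zero?
  "results for #{query}"
end

# Simplified stand-in for sanitize_string: backslash-escape Lucene's
# special characters only (no operator or quote handling).
LUCENE_SPECIALS = /[\\+\-&|!(){}\[\]^~*?:\/]/

def escape_specials(query)
  query.gsub(LUCENE_SPECIALS) { |char| "\\#{char}" }
end

# Try the raw query first; if it fails to parse, retry once with the
# special characters escaped.
def search_with_fallback(query)
  run_search(query)
rescue QueryParseError
  run_search(escape_specials(query))
end

puts search_with_fallback('(foo)') # prints: results for (foo)
puts search_with_fallback('(foo')  # prints: results for \(foo
```

Only one retry is attempted, so a query that still fails after escaping propagates the error to the caller. Trying the raw query first keeps intentional syntax (grouping, wildcards) working whenever it is valid.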
module ElasticSearchHelpers
  # Sanitize a search query for Lucene. Useful if the original query
  # raises an exception because it does not follow the query DSL.
  # Taken from here:
  #
  # http://stackoverflow.com/questions/16205341/symbols-in-query-string-for-elasticsearch
  #
  def self.sanitize_string(str)
    # Escape special characters
    # http://lucene.apache.org/core/old_versioned_docs/versions/2_9_1/queryparsersyntax.html#Escaping Special Characters
    escaped_characters = Regexp.escape('\\+-&|!(){}[]^~*?:\/')
    # Prefix every special character with a backslash.
    str = str.gsub(/([#{escaped_characters}])/, '\\\\\1')
    # AND, OR and NOT are used by Lucene as logical operators, so escape
    # them letter by letter (AND becomes \A\N\D).
    %w[AND OR NOT].each do |word|
      escaped_word = word.chars.map { |char| "\\#{char}" }.join
      # The block form inserts escaped_word verbatim, so gsub does not
      # reinterpret its backslashes.
      str = str.gsub(/\s*\b#{word}\b\s*/) { " #{escaped_word} " }
    end
    # Escape odd quotes: if the number of double quotes is odd, escape the
    # last one (the greedy first group runs up to it) so the rest pair up.
    quote_count = str.count('"')
    str = str.sub(/(.*)"(.*)/m, '\1\"\2') if quote_count.odd?
    str
  end
end
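To see what the operator and quote steps do on their own, here is a standalone sketch. `escape_operators_and_quotes` is a hypothetical name, and it locates the last quote with a lookahead (`"(?=[^"]*\z)`, a quote followed only by non-quote characters) instead of capture groups.

```ruby
# Hypothetical standalone helper: escape bare AND/OR/NOT operators and
# balance double quotes, as the last two steps of sanitize_string intend.
def escape_operators_and_quotes(str)
  %w[AND OR NOT].each do |word|
    escaped_word = word.chars.map { |char| "\\#{char}" }.join
    # Block form: the replacement is inserted literally, so gsub does not
    # reinterpret the backslashes in escaped_word.
    str = str.gsub(/\s*\b#{word}\b\s*/) { " #{escaped_word} " }
  end
  # An odd number of quotes leaves a phrase unterminated, so escape the
  # last quote in the string.
  str = str.sub(/"(?=[^"]*\z)/) { '\\"' } if str.count('"').odd?
  str
end

puts escape_operators_and_quotes('cats AND dogs') # prints: cats \A\N\D dogs
puts escape_operators_and_quotes('say "hello')    # prints: say \"hello
```

Matching is case-sensitive, so lowercase `and` is left alone, which fits Lucene's classic parser treating only uppercase operators as keywords.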