@brycesch
Created September 6, 2017 16:30
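String sanitizer for MySQL's legacy utf8 (utf8mb3) character set, which stores at most 3 bytes per character and so cannot hold 4-byte characters such as most emoji. You can avoid needing this by switching to utf8mb4.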
class MysqlStringSanitizer
  # Ranges to strip, applied in order. Note that the two broad ranges also
  # remove BMP characters that utf8 can store (e.g. Hiragana, Hangul).
  RANGES = [
    /[\u{1f600}-\u{1f64f}]/, # emoticons 1F600 - 1F64F
    /[\u{2702}-\u{27b0}]/,   # dingbats 2702 - 27B0
    /[\u{1f680}-\u{1f6ff}]/, # transport/map symbols 1F680 - 1F6FF
    /[\u{24C2}-\u{4DFF}]/,   # enclosed chars and everything else up to 4DFF
    /[\u{A000}-\u{1F251}]/,  # A000 - 1F251 (CJK ideographs 4E00 - 9FFF are kept)
    /[\u{1f300}-\u{1f5ff}]/, # symbols & pictographs 1F300 - 1F5FF
    /[\u{1f900}-\u{1f9ff}]/  # supplemental symbols & pictographs 1F900 - 1F9FF
  ].freeze

  def self.sanitize(str)
    # Work on a copy: force_encoding mutates its receiver and raises on
    # frozen strings, so the caller's string is left untouched.
    text = str.dup.force_encoding('utf-8').encode
    RANGES.reduce(text) { |clean_text, regex| clean_text.gsub(regex, '') }
  end
end
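
If the only goal is to drop what utf8 (utf8mb3) cannot store, a narrower approach may be enough: remove just the code points above U+FFFF, since those are the ones that need 4 bytes in UTF-8. The following is a minimal sketch under that assumption, not a replacement for the class above. The module name Utf8mb3Sketch and its methods are hypothetical and not part of the original gist.

```ruby
# Hypothetical helper, not part of the original gist.
module Utf8mb3Sketch
  # Code points above U+FFFF encode to 4 bytes in UTF-8,
  # which MySQL's utf8 (utf8mb3) cannot store.
  SUPPLEMENTARY = /[\u{10000}-\u{10FFFF}]/

  # Returns a UTF-8 copy of str with invalid byte sequences dropped.
  def self.normalize(str)
    # dup first: force_encoding mutates its receiver.
    str.dup.force_encoding(Encoding::UTF_8).scrub('')
  end

  # Returns a copy of str with every 4-byte character removed.
  def self.strip(str)
    normalize(str).gsub(SUPPLEMENTARY, '')
  end

  # True when str contains at least one character utf8mb3 cannot store.
  def self.unsafe?(str)
    normalize(str).match?(SUPPLEMENTARY)
  end
end
```

Unlike the range list above, this sketch leaves 3-byte symbols such as dingbats in place, so it suits cases where the aim is storage safety and not emoji removal in general.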