@agibralter
Created January 19, 2011 23:43
Using Oniguruma to split UTF-8 strings into words...
# ¿á you think á eh?
# "스마트폰을 가지고 있나? "  (Korean: "Do you have a smartphone?")
#
# reg = ORegexp.new( 'р(уби.*)', 'i', 'utf8' )
# matches = reg.match("Text: Ехал Грека Через Реку")  # Russian: "Greka rode across the river"
#
# 5ms) SHOW TABLES
# Sphinx Querying: '스마트폰을 | 가지고 | 있나'
# Sphinx (0.001932s) Found results
# Sphinx Sphinx Daemon returned error: index ...: syntax error, unexpected '|' near '스마트폰을 | 가지고 | 있나'
#
# ThinkingSphinx::SphinxError (
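require 'rubygems'
require 'oniguruma'
# Earlier approach (presumably what produced the Sphinx query logged above):
# query.gsub(/\W+/, ' ').split(/\s+/).select { |s| s.split('').length > 1 }.join(' | ')
a = "스마트폰을 가지고 있나? "
# Scan for UTF-8 word runs with Oniguruma, then print each match as its
# code points (U+xxxx), with matches separated by ' | '.
words = Oniguruma::ORegexp.new('\w+', :encoding => Oniguruma::ENCODING_UTF8).scan(a)
puts words.map { |m| m[0].unpack("U*").map { |e| "U+%04x" % e }.join }.join(' | ')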
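For comparison, here is a minimal sketch of the same string handling using only Ruby's built-in regexps. It assumes Ruby 1.9 or later (whose regexp engine is Oniguruma-based: Oniguruma in 1.9, its fork Onigmo from 2.0, so the oniguruma gem is not needed) and UTF-8 input. The method names `unicode_words`, `sphinx_or_query` and `code_points` are hypothetical. It only reproduces the splitting and joining; it does not address why Sphinx rejected the query in the log.

```ruby
# encoding: utf-8
# Sketch assuming Ruby 1.9+ and UTF-8 strings; method names are hypothetical.

# Split on Unicode word runs. [[:word:]] matches letters, marks, numbers and
# connector punctuation in any script, whereas a bare \w in Ruby 1.9+
# regexps matches only ASCII [a-zA-Z0-9_].
def unicode_words(text)
  text.scan(/[[:word:]]+/)
end

# Mirrors the commented-out one-liner: keep words longer than one
# character and join them with ' | '.
def sphinx_or_query(text)
  unicode_words(text).select { |w| w.length > 1 }.join(' | ')
end

# Render a word as space-separated code points in U+XXXX notation.
def code_points(word)
  word.codepoints.map { |cp| "U+%04X" % cp }.join(' ')
end

a = "스마트폰을 가지고 있나? "
puts sphinx_or_query(a)
puts unicode_words(a).map { |w| code_points(w) }.join(' | ')
```

Because `String#length` counts characters rather than bytes in Ruby 1.9+, the `s.split('').length > 1` workaround from the original one-liner reduces to `w.length > 1` here.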