@onesuper
Created December 7, 2015 06:19
Add or delete blanks between Chinese and English words.
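#!/usr/bin/env ruby
# Usage: deal_blanks.rb input.txt >out.txt
#
# Existing blanks are dropped unless they separate two English word
# characters, and one blank is inserted wherever an English word character
# meets a non-English one. The result goes to stdout; a change log
# ("+" for an added blank, "-" for deleted ones) goes to stderr.

# Punctuation that never gets a blank next to it, in half-width and
# full-width forms. Extend the list for other marks as needed.
SKIPS = ['(', ')', ',', ';', '?', '.', '(', ')', ',', ';', '?', '。']

# True for an ASCII letter, digit, or underscore (what \w matches in Ruby).
def isEn(char)
  !/\w/.match(char).nil?
end

# True if a blank belongs between the adjacent characters prev and curr.
def boundary?(prev, curr)
  return false if SKIPS.include?(prev) || SKIPS.include?(curr)
  isEn(prev) != isEn(curr)
end

abort "Usage: deal_blanks.rb input.txt >out.txt" if ARGV.empty?

File.open(ARGV[0], "r:UTF-8") do |file|
  blanks_del = 0
  blanks_add = 0
  file.each do |line|
    buf = []
    pending = 0 # blanks seen since the last kept character
    line.chomp.each_char do |curr|
      if curr == ' '
        pending += 1
        next
      end
      prev = buf.last
      # A blank between two English words is kept as it is.
      between_words = pending > 0 && !prev.nil? && isEn(prev) && isEn(curr)
      if between_words || (!prev.nil? && boundary?(prev, curr))
        buf << ' '
        if pending > 0
          pending -= 1 # reuse one of the existing blanks
        else
          $stderr.puts "+ #{prev}/#{curr}"
          blanks_add += 1
        end
      end
      if pending > 0
        $stderr.puts "- #{prev}/#{curr}"
        blanks_del += pending
      end
      pending = 0
      buf << curr
    end
    blanks_del += pending # blanks at the end of the line
    $stdout.puts buf.join
  end
  $stderr.puts "blanks add: " + blanks_add.to_s
  $stderr.puts "blanks del: " + blanks_del.to_s
end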