@masasakano
Last active November 14, 2019 18:08
Ruby sample code to highlight the differences among Japanese-character conversion methods
#!/usr/bin/env ruby
# -*- coding: utf-8 -*-
# Sample code for my answer:
# https://stackoverflow.com/questions/58829740/how-do-i-filter-out-invisible-characters-without-affecting-japanese-character-se/58863231#58863231
require 'nkf'
begin
  require 'iconv' # https://rubygems.org/gems/iconv/
rescue LoadError
  warn "WARNING: Gem 'iconv' is not installed, hence it is ignored."
end
orig = "b2◇〒α()あ相〜\u3000_8D━●★】$£€Ç♡㌔③\u2028ハンカク"
print "Orig:"; p orig
print "Enc: "; p orig.encode('ISO-2022-JP', undef: :replace, replace: '').encode('UTF-8')
print "NKF: "; p NKF.nkf('-w -E', NKF.nkf('-e', orig))
output = ''
if defined?(Iconv)  # Skip this part when the iconv gem failed to load above.
  Iconv.open('iso-2022-jp-2', 'utf-8') do |cd|
    cd.discard_ilseq = true
    # Note: cd.transliterate='?' raises Iconv::IllegalSequence
    # Convert the whole string, then append the reset sequence from iconv(nil).
    output = cd.iconv(orig) << cd.iconv(nil)
    ## The original way described on the iconv gem website
    # orig.each_char { |s| output << cd.iconv(s) }
    # output << cd.iconv(nil)
  end
  s2 = Iconv.conv('utf-8', 'iso-2022-jp-2', output)
  print "Icon:"; p s2
end
## Results (as of Ruby 2.6.5):
# Orig:"b2◇〒α()あ相〜 _8D━●★】$£€Ç♡㌔③\u2028ハンカク"
# Enc: "b2◇〒α()あ相〜 _8D━●★】$£"
# NKF: "b2◇〒α()あ相〜 _8D━●★】$£㌔③ハンカク"
# Icon:"b2◇〒α()あ相〜 _8D━●★】$£€Ç♡③ハンカク"
### For developers:
# p output # Sanity check.
# p s2.encoding # Sanity check.
# p s2.valid_encoding? # Sanity check.
# p Iconv.ctlmethods # List of the available instance methods of iconv Gem
## Some sources suggest the following would work, but neither seemed to as of Ruby 2.6.5.
# p Iconv.iconv('iso-2022-jp-2', 'utf-8//IGNORE', orig)
# p Iconv.conv('iso-2022-jp-2', 'utf-8//TRANSLIT', orig)
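## A minimal sketch, not part of the original answer: list the characters that
## a String#encode round trip (the "Enc" method above) drops from a string.
## The helper name `dropped_chars` and its `via` parameter are hypothetical,
## introduced here for illustration only.
def dropped_chars(str, via = 'ISO-2022-JP')
  # Characters undefined in the intermediate encoding are replaced with ''.
  kept = str.encode(via, undef: :replace, replace: '').encode('UTF-8')
  # Characters present in the input but absent from the round-tripped text.
  str.chars - kept.chars
end
# Usage (assuming `orig` as defined above):
# print "Lost:"; p dropped_chars(orig)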