@kreeger
Last active September 5, 2021 08:18
Converts a Unicode code point into a UTF-16 surrogate pair.
#!/usr/bin/env ruby
require 'fileutils'
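class Integer
  # Returns the code point as \U-escaped UTF-16 code units: a surrogate pair
  # for supplementary-plane code points, a single unit for BMP code points.
  def to_surrogate_pair
    raise RangeError, "#{self} is not a Unicode code point" unless between?(0, 0x10FFFF)
    return format('\U%04X', self) if self < 0x10000

    offset = self - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits of the offset
    low  = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits of the offset
    format('\U%04X\U%04X', high, low)
  end
end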
class String
  # Parses a "\UXXXXXXXX" escape into its integer code point.
  def to_hex
    sub(/\A\\U/i, '').to_i(16)
  end
end
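abort "Usage: #{$PROGRAM_NAME} FILE" if ARGV.empty?

# Read the raw bytes. Treat them as UTF-8 only if they are valid UTF-8 and
# contain no NUL bytes; otherwise assume the file is UTF-16LE.
file = File.binread(ARGV[0])
utf8 = file.dup.force_encoding('UTF-8')
encoded = if utf8.valid_encoding? && !file.include?("\x00")
            utf8
          else
            file.force_encoding('UTF-16LE').encode('UTF-8')
          end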
# Match a backslash, "U" (either case), and exactly eight hex digits.
regex = /\\U\h{8}/i
fixed = 0
revised = encoded.gsub(regex) do |mbc|
  fixed += 1
  mbc.to_hex.to_surrogate_pair
end
# Keep the original as FILE.old, then write the result back as UTF-16LE.
FileUtils.mv(ARGV[0], "#{ARGV[0]}.old")
File.open(ARGV[0], 'w:UTF-16LE') { |f| f.write(revised) }
puts "File written with #{fixed} corrections."
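
The surrogate arithmetic used by the script can be checked in isolation. The sketch below is not part of the original gist: `surrogate_pair` and `surrogate_pair_via_encode` are hypothetical helper names, and the second one simply asks Ruby's own UTF-16 encoder for the same answer so the two can be compared.

```ruby
# Hypothetical helper (not part of the script above): splits a
# supplementary-plane code point into its UTF-16 high and low surrogates.
def surrogate_pair(code_point)
  unless code_point.between?(0x10000, 0x10FFFF)
    raise ArgumentError, 'code point must be in 0x10000..0x10FFFF'
  end

  offset = code_point - 0x10000
  # The top 10 bits go into the high surrogate, the bottom 10 into the low.
  [0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)]
end

# Cross-check: let Ruby encode the character as UTF-16BE and read back
# the 16-bit code units.
def surrogate_pair_via_encode(code_point)
  [code_point].pack('U').encode('UTF-16BE').unpack('n*')
end

pair = surrogate_pair(0x1F600)
puts pair.map { |unit| format('\U%04X', unit) }.join  # prints \UD83D\UDE00
```

Comparing both helpers over a few code points is a quick way to confirm the shift-and-mask version before running the script on real files.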