@CesarChaMal
Forked from mike-neck/encodings.groovy
Created December 16, 2019 02:45
import java.nio.charset.Charset
import java.nio.charset.StandardCharsets

// Sample text containing Japanese characters.
def string = 'おっぱい - うほ'

// Helpers: decode bytes with a charset, and encode a string with a charset.
def decode = { byte[] bs, Charset cs -> new String(bs, cs) }
def encode = { String s, Charset cs -> s.getBytes(cs) }

// Charsets used to encode the sample text.
def charsets = [
    StandardCharsets.UTF_8, StandardCharsets.UTF_16, StandardCharsets.ISO_8859_1,
    Charset.forName('Shift-JIS'), Charset.forName('EUC-JP'), Charset.forName('ISO-2022-JP')
]
// Charsets used to (mis)decode the encoded bytes.
def iso8859_1 = [StandardCharsets.ISO_8859_1]

// Encode with each source charset, decode as ISO-8859-1, print one line per source charset.
charsets.collect { c1 ->
    iso8859_1.collect { c2 ->
        def result = c1 == c2 ? '--' : decode(encode(string, c1), c2)
        [from: c1.displayName(), to: c2.displayName(), result: result]
    }.collect { "${it.from} -> ${it.to} : ${it.result}" }.join(' | ')
}.each { println it }
UTF-8 -> ISO-8859-1 : おっぱい - うほ
UTF-16 -> ISO-8859-1 : þÿ0J0c0q0D - 0F0{
ISO-8859-1 -> ISO-8859-1 : --
Shift_JIS -> ISO-8859-1 : ‚¨‚Á‚Ï‚¢ - ‚¤‚Ù
EUC-JP -> ISO-8859-1 : ¤ª¤Ã¤Ñ¤¤ - ¤¦¤Û
ISO-2022-JP -> ISO-8859-1 : $B$*$C$Q$$(B - $B$&$[(B
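
The garbled lines above are not necessarily lost data. ISO-8859-1 maps every byte value to exactly one character, so a string that was wrongly decoded as ISO-8859-1 can usually be turned back into its original bytes and decoded again with the correct charset. The following is a minimal sketch of that round trip. The `repair` closure is a hypothetical helper that is not part of the original gist, and the sketch assumes the garbled string is still held in memory unchanged (it has not been printed and re-read through a terminal with a different encoding).

```groovy
import java.nio.charset.Charset
import java.nio.charset.StandardCharsets

// Hypothetical helper (not in the original gist): undo an ISO-8859-1 misdecoding
// by recovering the raw bytes and decoding them with the charset that produced them.
def repair = { String garbled, Charset actual ->
    new String(garbled.getBytes(StandardCharsets.ISO_8859_1), actual)
}

def original = 'おっぱい - うほ'
def charsets = [
    StandardCharsets.UTF_8, StandardCharsets.UTF_16,
    Charset.forName('Shift-JIS'), Charset.forName('EUC-JP'), Charset.forName('ISO-2022-JP')
]

charsets.each { cs ->
    // Reproduce the mistake from the gist: encode with cs, decode as ISO-8859-1.
    def garbled = new String(original.getBytes(cs), StandardCharsets.ISO_8859_1)
    // Reverse it and check that the original text comes back.
    assert repair(garbled, cs) == original
    println "${cs.displayName()}: recovered"
}
```

ISO-8859-1 itself is left out of the list on purpose: encoding Japanese text with it replaces the unmappable characters, so there is nothing left to recover in that direction.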