@benigumocom
Last active May 28, 2023 15:14
import kotlin.streams.asSequence

// Prints the string, its grapheme clusters, and the code points of each cluster.
fun dump(data: String) {
    println(data)
    // \X matches one extended grapheme cluster (a user-perceived character).
    val ss = Regex("\\X").findAll(data)
        .map { match -> match.value }
        .toList()
    println(ss)
    println(ss.size)
    println()
    ss.forEach { s ->
        println("$s ${s.toUtf16EscapeSequence()}")
        // List every code point that makes up this cluster.
        s.codePoints().forEach { cp ->
            val hcp = "0x%X".format(cp)
            val c = Character.toChars(cp).joinToString("")
            println(" ${cp.toUtf16EscapeSequence()} $hcp ($c)")
        }
    }
}

fun String.toUtf16EscapeSequence(): String {
    // String.chars() yields UTF-16 code units, each below 0x10000,
    // so a surrogate pair arrives as two values and needs no special handling.
    return this.chars()
        .asSequence()
        .joinToString("") { i -> "\\u%04X".format(i) }
}

fun Int.toUtf16EscapeSequence(): String {
    val cp = this // code point
    // Convert the code point to its UTF-16 chars, then escape each one.
    return Character.toChars(cp).joinToString("")
        .toUtf16EscapeSequence()
}
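
A minimal sketch of the counting idea behind `dump`, for comparison with the other two ways of measuring a string. `graphemeCount` is a hypothetical helper, not part of the gist, and it assumes a JVM whose regex engine supports `\X` (Java 9 or later). How longer emoji sequences (ZWJ sequences, flags) are grouped may vary with the JDK version, so the sample sticks to a combining mark and a single emoji.

```kotlin
// Hypothetical helper (not part of the gist): counts extended grapheme
// clusters, i.e. user-perceived characters, using the same \X regex as dump().
fun graphemeCount(text: String): Int =
    Regex("\\X").findAll(text).count()

fun main() {
    // 'e' + U+0301 (combining acute accent), then U+1F600 as a surrogate pair.
    val sample = "e\u0301\uD83D\uDE00"

    println(sample.length)                           // UTF-16 code units: 4
    println(sample.codePointCount(0, sample.length)) // Unicode code points: 3
    println(graphemeCount(sample))                   // grapheme clusters: 2
}
```

The three numbers differ because `length` counts UTF-16 code units, `codePointCount` merges each surrogate pair into one code point, and `\X` also folds the combining accent into its base letter.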
@benigumocom

[Kotlin] How to count the characters in a Unicode string containing emoji, and how each character is composed
👉 https://android.benigumo.com/20230529/kotlin-unicode/
