@watermint
Created April 27, 2013 01:40
Split a string into individual characters, treating each surrogate pair as a single character
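object SurrogatePair {
  // A UTF-16 code unit `x` together with the code unit that follows it, if any.
  case class Pair(x: Char, y: Option[Char] = None) {
    // True when `x` is a high surrogate and `y` is the matching low surrogate.
    lazy val surrogatePair = y match {
      case None => false
      case Some(yy) => Character.isSurrogatePair(x, yy)
    }
    // Text of one character: both code units for a surrogate pair, otherwise just `x`.
    lazy val text = surrogatePair match {
      case true => x.toString + y.get
      case false => x.toString
    }
  }

  // Splits `target` into one-character strings, keeping each surrogate pair together.
  def split(target: String): Seq[String] = {
    target.length match {
      case 0 => Seq()
      case 1 => Seq(target)
      case _ => {
        // Pair every code unit with its successor; the last code unit has none.
        val pairs = {
          (0 until target.length - 1) map {
            i =>
              Pair(target(i), Some(target(i + 1)))
          }
        } :+ Pair(target(target.length - 1))
        (0 until pairs.length) flatMap {
          i =>
            // Skip a low surrogate that was already emitted with the previous pair.
            if (i > 0 && pairs(i - 1).surrogatePair) {
              None
            } else {
              Some(pairs(i).text)
            }
        }
      }
    }
  }
}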
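
For comparison, here is a minimal sketch of the same split done by walking code points directly instead of building `Pair` objects. It is not part of the original gist: the object name `CodePointSplit` is hypothetical, and it relies only on the standard JDK methods `String.codePointAt` and `Character.charCount`. As far as I can tell it treats unpaired surrogates the same way as the code above, emitting each one as its own one-unit string.

```scala
// Hypothetical alternative, not the gist author's implementation.
object CodePointSplit {
  def split(target: String): Seq[String] = {
    val out = Vector.newBuilder[String]
    var i = 0
    while (i < target.length) {
      // codePointAt combines a high surrogate with a following low surrogate;
      // for any other code unit (including a lone surrogate) it returns that unit's value.
      val cp = target.codePointAt(i)
      // charCount is 2 for supplementary code points and 1 otherwise.
      val n = Character.charCount(cp)
      out += target.substring(i, i + n)
      i += n
    }
    out.result()
  }
}
```

Usage, assuming the object above is in scope: `CodePointSplit.split("a\uD842\uDFB7b")` yields three strings, where the middle one holds both code units of U+20BB7. The loop avoids the intermediate sequence of pairs, at the cost of a mutable index.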