Last active
December 6, 2018 10:36
Sorting and locale in R ref: https://qiita.com/nozma/items/4aea36022ce18a6aa5ca
## Prepare the data
d <- data.frame(
  kana = c("お", "エ", "う", "イ", "あ"),
  letter = c("A", "e", "C", "b", "Z"),
  stringsAsFactors = FALSE
)
## sort() orders the kana the way you would intuitively expect
sort(d$kana)
#> [1] "あ" "イ" "う" "エ" "お"
## Setting colCaseFirst puts uppercase letters first
str <- c("a", "A", "a", "A")
stringi::stri_sort(str, locale = "en@colCaseFirst=upper")
#> [1] "A" "A" "a" "a"
## The default sort is lexicographic
str2 <- c("A1", "A12", "A112")
stringi::stri_sort(str2, locale = "en")
#> [1] "A1" "A112" "A12"
## Sorting by numeric value (natural sort order) is also possible
stringi::stri_sort(str2, locale = "en@colNumeric=yes")
#> [1] "A1" "A12" "A112"
## In the "C" locale, sort() orders by byte value and non-ASCII output is shown as escaped bytes
Sys.setlocale(locale = "C")
sort(d$kana)
#> [1] "\343\201\202" "\343\201\206" "\343\201\212" "\343\202\244"
#> [5] "\343\202\250"
## Sort while still in the "C" locale, then switch back to ja_JP so the result prints as kana
sorted <- sort(d$kana)
Sys.setlocale(locale = "ja_JP")
sorted
#> [1] "あ" "う" "お" "イ" "エ"
## withr can switch only the collation order temporarily
withr::with_locale(c("LC_COLLATE" = "C"), sort(d$kana))
#> [1] "あ" "う" "お" "イ" "エ"
## with_collate() is a shorthand for setting LC_COLLATE only
withr::with_collate("C", sort(d$kana))
#> [1] "あ" "う" "お" "イ" "エ"
## stringi collates through ICU, so locale = "C" still yields the intuitive kana order
stringi::stri_sort(d$kana, locale = "C")
#> [1] "あ" "イ" "う" "エ" "お"
## The usual intuitive sort
stringi::stri_sort(d$letter, locale = "en_EN")
#> [1] "A" "b" "C" "e" "Z"
## A sort that behaves like locale = "C"
stringi::stri_sort(d$letter, locale = "C")
#> [1] "A" "C" "Z" "b" "e"
## With locale = "en", the default sort puts lowercase letters first
str <- c("a", "A", "a", "A")
stringi::stri_sort(str, locale = "en")
#> [1] "a" "a" "A" "A"