Created December 18, 2015 15:31
Multibyte non-UTF-8 locales
# ==== Creating UTF-8 strings ====
# This is how to create a string with UTF-8 encoding from its raw bytes. It
# should work regardless of the current locale settings, because Encoding()
# marks the string as UTF-8 instead of leaving it in the native encoding.
x <- rawToChar(as.raw(c(0xe5, 0x8d, 0x88)))
Encoding(x) <- "UTF-8"
x
# [1] "午"
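A quick way to confirm what was built: the string is marked as UTF-8, holds one character, and is stored as three bytes. The `\u` escape and `intToUtf8()` are two other base R ways to get the same UTF-8-marked string. This is an illustrative sketch using only base R.

```r
x <- rawToChar(as.raw(c(0xe5, 0x8d, 0x88)))
Encoding(x) <- "UTF-8"

Encoding(x)               # "UTF-8"
nchar(x, type = "chars")  # 1 character
nchar(x, type = "bytes")  # 3 bytes
utf8ToInt(x)              # 21320, i.e. code point U+5348

# Equivalent constructions that also yield a UTF-8-marked string.
y <- "\u5348"
z <- intToUtf8(0x5348)
identical(x, y) && identical(x, z)
```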
# Another string, 'Δ★😎'
pat <- rawToChar(as.raw(c(0xce, 0x94, 0xe2, 0x98, 0x85, 0xf0, 0x9f, 0x98, 0x8e)))
Encoding(pat) <- "UTF-8"
cat(pat)
# Δ★😎
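The same kind of check applies to `pat`: `utf8ToInt()` decodes it into three code points (U+0394, U+2605, U+1F60E), even though the string occupies nine bytes. Only base R functions are used; the variable name `cp` is just for illustration.

```r
pat <- rawToChar(as.raw(c(0xce, 0x94, 0xe2, 0x98, 0x85, 0xf0, 0x9f, 0x98, 0x8e)))
Encoding(pat) <- "UTF-8"

# Decode to Unicode code points and show them in U+XXXX form.
cp <- utf8ToInt(pat)
sprintf("U+%04X", cp)        # "U+0394" "U+2605" "U+1F60E"
nchar(pat, type = "chars")   # 3
nchar(pat, type = "bytes")   # 9

# The same string written with escapes (\U accepts up to 8 hex digits,
# which is needed for the emoji outside the Basic Multilingual Plane).
identical(pat, "\u0394\u2605\U0001F60E")
```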
# =======================
# Setting locale
# =======================
# By default, R on macOS and most Linux systems runs in a UTF-8 locale, but
# sometimes it's useful to use a multibyte, non-UTF-8 locale for testing.
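Before and after switching locales, it helps to check what R thinks the current locale is. `l10n_info()` reports whether the locale is multibyte (`MBCS`) and whether it is UTF-8. The name `is_mb_non_utf8` below is only illustrative.

```r
# Inspect the character-type locale R is currently using.
ctype <- Sys.getlocale("LC_CTYPE")
info <- l10n_info()

cat("LC_CTYPE: ", ctype, "\n", sep = "")
cat("Multibyte: ", info$MBCS, ", UTF-8: ", info[["UTF-8"]], "\n", sep = "")

# TRUE only in a multibyte locale that is not UTF-8 (for example
# ja_JP.SJIS or ja_JP.EUC-JP), which is the case being tested here.
is_mb_non_utf8 <- isTRUE(info$MBCS) && !isTRUE(info[["UTF-8"]])
is_mb_non_utf8
```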
# ==== Mac ====
# On a Mac, you can use a UTF-8 locale like so. (This is the default setting
# for US English.)
Sys.setlocale("LC_ALL", "en_US.UTF-8")
# To use a multibyte non-UTF-8 locale (Shift JIS):
Sys.setlocale("LC_ALL", "ja_JP.SJIS")
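Because `Sys.setlocale()` changes the whole session, a test may want to switch only temporarily. The sketch below uses a hypothetical helper, `with_locale()`, that sets `LC_CTYPE` (the category that controls character encoding), evaluates some code, and restores the previous value. It relies on `Sys.setlocale()` returning an empty string when the OS rejects a locale. Unlike the calls above, it does not touch the other `LC_ALL` categories.

```r
# Hypothetical helper: evaluate `code` with LC_CTYPE temporarily set to `locale`.
with_locale <- function(locale, code) {
  old <- Sys.getlocale("LC_CTYPE")
  # Restore the original locale however this function exits.
  on.exit(Sys.setlocale("LC_CTYPE", old), add = TRUE)
  # Sys.setlocale() returns "" (with a warning) if the locale is unavailable.
  res <- suppressWarnings(Sys.setlocale("LC_CTYPE", locale))
  if (!nzchar(res)) stop("Locale not available: ", locale)
  force(code)  # `code` is lazily evaluated here, under the new locale
}

# Usage: report whether Shift JIS is multibyte and non-UTF-8 on this machine,
# or return the error message if the locale is not installed.
tryCatch(
  with_locale("ja_JP.SJIS", unlist(l10n_info()[c("MBCS", "UTF-8")])),
  error = function(e) conditionMessage(e)
)
```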
# ==== Ubuntu ====
# On Ubuntu, you need to enable a multibyte non-UTF-8 locale, such as
# ja_JP.EUC-JP. As root, put the following lines in a new file,
# /var/lib/locales/supported.d/ja:
#   ja_JP.UTF-8 UTF-8
#   ja_JP.EUC-JP EUC-JP
# After adding that file, run this in a shell (not in R):
#   sudo dpkg-reconfigure locales
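To check from R whether `dpkg-reconfigure` actually generated the locale, one option is to compare against the output of `locale -a`. glibc lists names in a normalized form (for example `ja_JP.eucjp` rather than `ja_JP.EUC-JP`), so the hypothetical helper `locale_available()` below ignores case and punctuation when matching. It returns `NA` where the `locale` command is not available.

```r
# Hypothetical helper: is `locale` listed by the system's `locale -a`?
locale_available <- function(locale) {
  if (.Platform$OS.type != "unix" || !nzchar(Sys.which("locale"))) return(NA)
  installed <- system2("locale", "-a", stdout = TRUE)
  # Normalize so that "ja_JP.EUC-JP" and "ja_JP.eucjp" map to the same key.
  norm <- function(s) tolower(gsub("[-_.]", "", s))
  norm(locale) %in% norm(installed)
}

locale_available("ja_JP.EUC-JP")
```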
# Restart R if it's already running. Then you can run the same R code as above,
# but with the ja_JP.EUC-JP locale:
Sys.setlocale("LC_ALL", "ja_JP.EUC-JP")
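Independently of the session locale, `iconv()` shows what the EUC-JP encoding of `x` looks like, which is what `enc2native(x)` should produce once the `ja_JP.EUC-JP` locale is active. This is an illustrative check using base R only; the set of encoding names `iconv()` accepts can vary by platform.

```r
x <- rawToChar(as.raw(c(0xe5, 0x8d, 0x88)))
Encoding(x) <- "UTF-8"

# Encode the character as EUC-JP without changing the locale.
euc <- iconv(x, from = "UTF-8", to = "EUC-JP", toRaw = TRUE)[[1]]
euc  # two bytes in EUC-JP, versus three in UTF-8

# Round trip: decoding the EUC-JP bytes should give back the UTF-8 string.
back <- iconv(rawToChar(euc), from = "EUC-JP", to = "UTF-8")
identical(back, x)
```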