Last active
March 14, 2021 19:50
Dump Windows-31J extended chars
< NEC特殊漢字 >
       00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15
0x8740 ① ② ③ ④ ⑤ ⑥ ⑦ ⑧ ⑨ ⑩ ⑪ ⑫ ⑬ ⑭ ⑮ ⑯
0x8750 ⑰ ⑱ ⑲ ⑳ Ⅰ Ⅱ Ⅲ Ⅳ Ⅴ Ⅵ Ⅶ Ⅷ Ⅸ Ⅹ   ㍉
0x8760 ㌔ ㌢ ㍍ ㌘ ㌧ ㌃ ㌶ ㍑ ㍗ ㌍ ㌦ ㌣ ㌫ ㍊ ㌻ ㎜
0x8770 ㎝ ㎞ ㎎ ㎏ ㏄ ㎡                 ㍻
0x8780 〝 〟 № ㏍ ℡ ㊤ ㊥ ㊦ ㊧ ㊨ ㈱ ㈲ ㈹ ㍾ ㍽ ㍼
0x8790 ≒ ≡ ∫ ∮ ∑ √ ⊥ ∠ ∟ ⊿ ∵ ∩ ∪
< NEC選定IBM拡張文字 >
       00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15
0xed40 纊 褜 鍈 銈 蓜 俉 炻 昱 棈 鋹 曻 彅 丨 仡 仼 伀
0xed50 伃 伹 佖 侒 侊 侚 侔 俍 偀 倢 俿 倞 偆 偰 偂 傔
0xed60 僴 僘 兊 兤 冝 冾 凬 刕 劜 劦 勀 勛 匀 匇 匤 卲
0xed70 厓 厲 叝 﨎 咜 咊 咩 哿 喆 坙 坥 垬 埈 埇 﨏
0xed80 塚 增 墲 夋 奓 奛 奝 奣 妤 妺 孖 寀 甯 寘 寬 尞
0xed90 岦 岺 峵 崧 嵓 﨑 嵂 嵭 嶸 嶹 巐 弡 弴 彧 德 忞
0xeda0 恝 悅 悊 惞 惕 愠 惲 愑 愷 愰 憘 戓 抦 揵 摠 撝
0xedb0 擎 敎 昀 昕 昻 昉 昮 昞 昤 晥 晗 晙 晴 晳 暙 暠
0xedc0 暲 暿 曺 朎 朗 杦 枻 桒 柀 栁 桄 棏 﨓 楨 﨔 榘
0xedd0 槢 樰 橫 橆 橳 橾 櫢 櫤 毖 氿 汜 沆 汯 泚 洄 涇
0xede0 浯 涖 涬 淏 淸 淲 淼 渹 湜 渧 渼 溿 澈 澵 濵 瀅
0xedf0 瀇 瀨 炅 炫 焏 焄 煜 煆 煇 凞 燁 燾 犱
0xee00
0xee10
0xee20
0xee30
0xee40 犾 猤 猪 獷 玽 珉 珖 珣 珒 琇 珵 琦 琪 琩 琮 瑢
0xee50 璉 璟 甁 畯 皂 皜 皞 皛 皦 益 睆 劯 砡 硎 硤 硺
0xee60 礰 礼 神 祥 禔 福 禛 竑 竧 靖 竫 箞 精 絈 絜 綷
0xee70 綠 緖 繒 罇 羡 羽 茁 荢 荿 菇 菶 葈 蒴 蕓 蕙
0xee80 蕫 﨟 薰 蘒 﨡 蠇 裵 訒 訷 詹 誧 誾 諟 諸 諶 譓
0xee90 譿 賰 賴 贒 赶 﨣 軏 﨤 逸 遧 郞 都 鄕 鄧 釚 釗
0xeea0 釞 釭 釮 釤 釥 鈆 鈐 鈊 鈺 鉀 鈼 鉎 鉙 鉑 鈹 鉧
0xeeb0 銧 鉷 鉸 鋧 鋗 鋙 鋐 﨧 鋕 鋠 鋓 錥 錡 鋻 﨨 錞
0xeec0 鋿 錝 錂 鍰 鍗 鎤 鏆 鏞 鏸 鐱 鑅 鑈 閒 隆 﨩 隝
0xeed0 隯 霳 霻 靃 靍 靏 靑 靕 顗 顥 飯 飼 餧 館 馞 驎
0xeee0 髙 髜 魵 魲 鮏 鮱 鮻 鰀 鵰 鵫 鶴 鸙 黑     ⅰ
0xeef0 ⅱ ⅲ ⅳ ⅴ ⅵ ⅶ ⅷ ⅸ ⅹ ￢ ￤ ＇ ＂
< IBM拡張文字 >
       00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15
0xfa40 ⅰ ⅱ ⅲ ⅳ ⅴ ⅵ ⅶ ⅷ ⅸ ⅹ Ⅰ Ⅱ Ⅲ Ⅳ Ⅴ Ⅵ
0xfa50 Ⅶ Ⅷ Ⅸ Ⅹ ￢ ￤ ＇ ＂ ㈱ № ℡ ∵ 纊 褜 鍈 銈
0xfa60 蓜 俉 炻 昱 棈 鋹 曻 彅 丨 仡 仼 伀 伃 伹 佖 侒
0xfa70 侊 侚 侔 俍 偀 倢 俿 倞 偆 偰 偂 傔 僴 僘 兊
0xfa80 兤 冝 冾 凬 刕 劜 劦 勀 勛 匀 匇 匤 卲 厓 厲 叝
0xfa90 﨎 咜 咊 咩 哿 喆 坙 坥 垬 埈 埇 﨏 塚 增 墲 夋
0xfaa0 奓 奛 奝 奣 妤 妺 孖 寀 甯 寘 寬 尞 岦 岺 峵 崧
0xfab0 嵓 﨑 嵂 嵭 嶸 嶹 巐 弡 弴 彧 德 忞 恝 悅 悊 惞
0xfac0 惕 愠 惲 愑 愷 愰 憘 戓 抦 揵 摠 撝 擎 敎 昀 昕
0xfad0 昻 昉 昮 昞 昤 晥 晗 晙 晴 晳 暙 暠 暲 暿 曺 朎
0xfae0 朗 杦 枻 桒 柀 栁 桄 棏 﨓 楨 﨔 榘 槢 樰 橫 橆
0xfaf0 橳 橾 櫢 櫤 毖 氿 汜 沆 汯 泚 洄 涇 浯
0xfb00
0xfb10
0xfb20
0xfb30
0xfb40 涖 涬 淏 淸 淲 淼 渹 湜 渧 渼 溿 澈 澵 濵 瀅 瀇
0xfb50 瀨 炅 炫 焏 焄 煜 煆 煇 凞 燁 燾 犱 犾 猤 猪 獷
0xfb60 玽 珉 珖 珣 珒 琇 珵 琦 琪 琩 琮 瑢 璉 璟 甁 畯
0xfb70 皂 皜 皞 皛 皦 益 睆 劯 砡 硎 硤 硺 礰 礼 神
0xfb80 祥 禔 福 禛 竑 竧 靖 竫 箞 精 絈 絜 綷 綠 緖 繒
0xfb90 罇 羡 羽 茁 荢 荿 菇 菶 葈 蒴 蕓 蕙 蕫 﨟 薰 蘒
0xfba0 﨡 蠇 裵 訒 訷 詹 誧 誾 諟 諸 諶 譓 譿 賰 賴 贒
0xfbb0 赶 﨣 軏 﨤 逸 遧 郞 都 鄕 鄧 釚 釗 釞 釭 釮 釤
0xfbc0 釥 鈆 鈐 鈊 鈺 鉀 鈼 鉎 鉙 鉑 鈹 鉧 銧 鉷 鉸 鋧
0xfbd0 鋗 鋙 鋐 﨧 鋕 鋠 鋓 錥 錡 鋻 﨨 錞 鋿 錝 錂 鍰
0xfbe0 鍗 鎤 鏆 鏞 鏸 鐱 鑅 鑈 閒 隆 﨩 隝 隯 霳 霻 靃
0xfbf0 靍 靏 靑 靕 顗 顥 飯 飼 餧 館 馞 驎 髙
0xfc00
0xfc10
0xfc20
0xfc30
0xfc40 髜 魵 魲 鮏 鮱 鮻 鰀 鵰 鵫 鶴 鸙 黑
0xfc50
0xfc60
0xfc70
0xfc80
0xfc90
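The rows labelled 0x..00 through 0x..30 are blank in every table because the second byte of a two-byte Shift JIS code must lie in 0x40-0x7E or 0x80-0xFC. The sketch below, using hypothetical helper names that are not part of the gist, lists the rows that can never hold a character for a given lead byte. It says nothing about rows such as 0xfc50, which are blank simply because no character is assigned there.

```python
def is_valid_trail(byte):
    """True if byte can be the second byte of a two-byte Shift JIS code."""
    return 0x40 <= byte <= 0x7E or 0x80 <= byte <= 0xFC


def blank_rows(lead):
    """Row labels under a lead byte whose 16 columns are all invalid trail bytes."""
    rows = []
    for low in range(0x00, 0x100, 0x10):
        if not any(is_valid_trail(low + col) for col in range(16)):
            rows.append((lead << 8) | low)
    return rows


print([f"{row:#06x}" for row in blank_rows(0xEE)])
```

The same rule explains why column 15 of every 0x..70 row is empty: 0x7F is not a valid trail byte.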
def each_slice(seq, n):
    """Yield successive lists of up to n items taken from seq."""
    arr = []
    for i, c in enumerate(seq, 1):
        arr.append(c)
        if i % n == 0:
            yield arr
            arr = []
    # Emit the final, shorter slice if the input did not divide evenly.
    if len(arr) > 0:
        yield arr


chunk = each_slice


def cp932_chars(start, end):
    """Yield (code, decoded text) for every two-byte code in start..end."""
    for code in range(start, end+1):
        c = code.to_bytes(2, byteorder="big")
        yield (code, c.decode("cp932", errors="replace"))


def dump(start, end):
    """Print a 16-column table of the cp932 characters in start..end."""
    heading = " "*7 + " ".join([f'{col:02}' for col in range(16)])
    print(heading)
    for code_char_list in each_slice(cp932_chars(start, end), 16):
        codes, chars = zip(*code_char_list)
        # Codes that fail to decode contain U+FFFD; show them as a blank cell.
        fixed_chars = [c if "\ufffd" not in c else " " for c in chars]
        print(f"{codes[0]:#04x} {' '.join(fixed_chars)}")
    print()


print("< NEC特殊漢字 >")
dump(0x8740, 0x879e)
print("< NEC選定IBM拡張文字 >")
dump(0xed40, 0xeefc)
print("< IBM拡張文字 >")
dump(0xfa40, 0xfc9e)
#print("< 利用者定義領域(外字領域)>")
#dump(0xf040, 0xf9fc)
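To look up a single code rather than dump a whole range, the decoding step can be pulled out on its own. This is a minimal sketch; `decode_code` is a hypothetical name that does not appear in the script, and it relies on the same assumption the script makes, namely that a code with no assigned character decodes to text containing U+FFFD when `errors="replace"` is used.

```python
def decode_code(code):
    """Return the character for a two-byte cp932 code, or None if undefined."""
    raw = code.to_bytes(2, byteorder="big")
    text = raw.decode("cp932", errors="replace")
    # An undefined code yields U+FFFD, possibly followed by the
    # second byte decoded on its own, so test with "in".
    return None if "\ufffd" in text else text


for code in (0x8740, 0xED40, 0xEE00):
    print(f"{code:#06x}", decode_code(code))
```

The three codes correspond to the first cell of the NEC table, the first cell of the NEC-selected table, and a cell in one of the blank rows.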
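Which Windows-31J code points to use, taking duplicated characters into account (in priority order)
(1) JIS X 0208 (Level 1 and Level 2)
(2) For the 10 characters "¬∵≒≡∫√⊥∠∩∪", use the JIS X 0208 code points
(3) For the 13 characters "ⅠⅡⅢⅣⅤⅥⅦⅧⅨⅩ№℡㈱", which are duplicated between the NEC special characters (NEC特殊文字) and the IBM extended characters (IBM拡張文字), use the NEC special character code points
(4) Do not use the NEC-selected IBM extended characters (NEC選定IBM拡張文字)
It follows that these code points are prohibited:
NEC-selected IBM extended characters: all prohibited
NEC special characters: of 0x8740-0x879C (83 characters), 0x8790-0x8792, 0x8795-0x8797, and 0x879A-0x879C (9 characters) are prohibited
IBM extended characters: of 0xFA40-0xFC4B (388 characters), 0xFA4A-0xFA54 and 0xFA58-0xFA5B (15 characters) are prohibited
(References)
Qiita: "[Java] シフトJISの扱い" (Handling Shift JIS in Java)
Wikipedia: "Microsoftコードページ932" (Microsoft code page 932)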
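The prohibited ranges above can be turned into a simple byte-level check. The following is a rough sketch rather than a validated tool: all names are hypothetical, the 0xED40-0xEEFC range for the NEC-selected block is taken from the dump script rather than from this list, and the scanner assumes the input is well-formed Shift JIS in which every lead byte (0x81-0x9F or 0xE0-0xFC) is followed by a trail byte.

```python
# Ranges copied from the list above (inclusive). Names are hypothetical.
NEC_SELECTED_IBM = [(0xED40, 0xEEFC)]  # assumed from the dump script's range
NEC_SPECIAL_BANNED = [(0x8790, 0x8792), (0x8795, 0x8797), (0x879A, 0x879C)]
IBM_EXT_BANNED = [(0xFA4A, 0xFA54), (0xFA58, 0xFA5B)]
PROHIBITED = NEC_SELECTED_IBM + NEC_SPECIAL_BANNED + IBM_EXT_BANNED


def is_prohibited(code):
    """True if the two-byte code falls in one of the prohibited ranges."""
    return any(lo <= code <= hi for lo, hi in PROHIBITED)


def is_lead(byte):
    """True if byte starts a two-byte Shift JIS code."""
    return 0x81 <= byte <= 0x9F or 0xE0 <= byte <= 0xFC


def find_prohibited(data):
    """Return the prohibited two-byte codes found in Shift JIS bytes."""
    found = []
    i = 0
    while i < len(data):
        if is_lead(data[i]) and i + 1 < len(data):
            code = (data[i] << 8) | data[i + 1]
            if is_prohibited(code):
                found.append(code)
            i += 2  # skip both bytes of the two-byte code
        else:
            i += 1  # single-byte character (ASCII or half-width kana)
    return found


sample = b"A\xed\x40\x87\x54\xfa\x4a"
print([f"{code:#06x}" for code in find_prohibited(sample)])
```

In the sample, 0xED40 (NEC-selected) and 0xFA4A (the IBM copy of Ⅰ) are reported, while 0x8754 (the NEC copy of Ⅰ) passes.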