Regex for matching ALL Japanese common & uncommon Kanji (4e00 – 9faf) ~ The Big Kahuna!
([一-龯])
Regex for matching Hiragana or Katakana
([ぁ-んァ-ン])
Regex for matching Non-Hiragana or Non-Katakana
([^ぁ-んァ-ン])
Regex for matching Hiragana or Katakana or word characters (\w does not cover punctuation such as 、。’)
([ぁ-んァ-ン\w])
Regex for matching Hiragana or Katakana and random other characters
([ぁ-んァ-ン!:/])
Regex for matching Hiragana
([ぁ-ん])
Regex for matching full-width Katakana (zenkaku 全角)
([ァ-ン])
Regex for matching half-width Katakana (hankaku 半角)
([ｧ-ﾝﾞﾟ])
Regex for matching full-width Numbers (zenkaku 全角)
([０-９])
Regex for matching full-width Letters (zenkaku 全角)
([Ａ-ｚ])
Regex for matching Hiragana codespace characters (includes non phonetic characters)
([ぁ-ゞ])
Regex for matching full-width (zenkaku) Katakana codespace characters (includes non phonetic characters)
([ァ-ヶ])
Regex for matching half-width (hankaku) Katakana codespace characters (this is an old character set so the order is inconsistent with the hiragana)
([ｦ-ﾟ])
Regex for matching Japanese Post Codes
/^\d{3}\-\d{4}$/
/^\d{3}-\d{4}$|^\d{3}-\d{2}$|^\d{3}$/
Regex for matching Japanese mobile phone numbers (keitai bangou)
/^\d{3}-\d{4}-\d{4}$|^\d{11}$/
/^0\d0-\d{4}-\d{4}$/
Regex for matching Japanese fixed line phone numbers
/^[0-9-]{6,9}$|^[0-9-]{12}$/
/^\d{1,4}-\d{4}$|^\d{2,5}-\d{1,4}-\d{4}$/
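The basic character classes in the list can be tried out with a short script. This is only a minimal sketch in Python: the `CLASSES` dictionary and the `classify` helper are illustrative names that are not part of the gist, and only three of the listed patterns are used.

```python
import re

# Character classes copied from the list above.
# The dictionary keys are illustrative names, not part of the gist.
CLASSES = {
    "kanji": re.compile(r"[一-龯]"),
    "hiragana": re.compile(r"[ぁ-ん]"),
    "katakana": re.compile(r"[ァ-ン]"),
}


def classify(ch: str) -> str:
    """Return the name of the first class that matches the single character ch."""
    for name, pattern in CLASSES.items():
        if pattern.fullmatch(ch):
            return name
    return "other"


print([classify(c) for c in "漢字かなカナA"])
# ['kanji', 'kanji', 'hiragana', 'hiragana', 'katakana', 'katakana', 'other']
```

Note that `classify("ヴ")` returns `"other"`: ヴ (U+30F4) sorts after ン (U+30F3), so `[ァ-ン]` misses it, while the wider codespace range `[ァ-ヶ]` catches it.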
I'm working on Android, and \d matches ０ (U+FF10), too.
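The same behaviour can be reproduced outside Android. As a hedged illustration, Python 3 treats `\d` in `str` patterns as "any Unicode decimal digit" unless `re.ASCII` is set, so the first post code pattern from the gist accepts full-width digits too. The variable names below are made up for this sketch.

```python
import re

# First post code pattern from the gist, compiled two ways.
POSTCODE_UNICODE = re.compile(r"^\d{3}-\d{4}$")          # \d = any Unicode decimal digit
POSTCODE_ASCII = re.compile(r"^\d{3}-\d{4}$", re.ASCII)  # \d = [0-9] only

# Full-width digits 1-7 (U+FF11..U+FF17) around an ASCII hyphen.
full_width = "\uff11\uff12\uff13-\uff14\uff15\uff16\uff17"

print(bool(POSTCODE_UNICODE.match(full_width)))  # True
print(bool(POSTCODE_ASCII.match(full_width)))    # False
print(bool(POSTCODE_ASCII.match("123-4567")))    # True
```

If full-width input should be rejected, writing `[0-9]` explicitly instead of `\d` has the same effect and does not depend on a flag.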
This doesn't cover all kanji. Simple example: 𧓈
To be fair, those kanji are extremely rare and are not in use (they would not show up in dictionaries or Rikaichan-like extensions), and 99.99% of Japanese people would not know about them:
https://en.wikipedia.org/wiki/CJK_Unified_Ideographs_Extension_B
Now you can match them with: [𠀀-𪛟]
and to match everything you would simply do: [𠀀-𪛟]|[一-龯]
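A quick way to check the combined pattern, sketched in Python 3 (where a `str` pattern works on code points, so characters outside the BMP need no surrogate handling). The constant names are assumptions for this example. In JavaScript the same range would need the `u` flag.

```python
import re

# Ranges copied from the comments above; the constant names are illustrative.
BMP_KANJI = re.compile(r"[一-龯]")
ALL_KANJI = re.compile(r"[𠀀-𪛟]|[一-龯]")

rare = "𧓈"  # the Extension B example from the earlier comment

print(bool(BMP_KANJI.fullmatch(rare)))  # the original range misses it
print(bool(ALL_KANJI.fullmatch(rare)))  # the combined range matches it
print(bool(ALL_KANJI.fullmatch("漢")))   # common kanji still match
```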
@cb372, your list comes close to covering all the kana, but a few characters are still missing. You got 「ゞ」 but missed 「ゝ」 and 「ゟ」, and a few others. I believe this would cover all Hiragana and Katakana separately:
Hiragana = [ぁ-ゖ゛-ゟー]
Katakana = [゠-ヿ]
Combined Hiragana & Katakana would be:
Hiragana+Katakana = [ぁ-ゖ゛-ゟ゠-ヿ]
I used the above hiragana+katakana regex to validate the kana portions of the downloadable version of JMDICT and can confirm that apart from a few errors in the JMDICT data, the kana validation works.
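For reference, a validation along those lines might look like the following Python sketch. It is not the script that was run against JMDICT. The `is_kana` helper is a hypothetical name, and the ranges are written as code point escapes that correspond to the combined pattern above.

```python
import re

# Combined Hiragana+Katakana class from the comment above, written as escapes:
#   ぁ-ゖ = U+3041-U+3096, ゛-ゟ = U+309B-U+309F, ゠-ヿ = U+30A0-U+30FF
KANA = re.compile(r"[\u3041-\u3096\u309B-\u309F\u30A0-\u30FF]+")

# The narrower class from the gist, for comparison.
GIST_KANA = re.compile(r"[ぁ-んァ-ン]+")


def is_kana(text: str) -> bool:
    """Return True if text is non-empty and consists only of kana."""
    return KANA.fullmatch(text) is not None


print(is_kana("ひらがな"), is_kana("カタカナ"), is_kana("らーめん"))  # True True True
print(is_kana("漢字"))                                               # False
print(is_kana("ゝ"), bool(GIST_KANA.fullmatch("ゝ")))                # True False
```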
@Jaha96 That doesn't catch everything: the first year of an era is commonly written as 元年 instead of 1年. 令和元年 = year 2019, for example. Many libraries disregard this case, but in real life it is very common to see it written that way.
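One way to handle 元年 is to accept 元 as an alternative to the digits and treat it as 1. The sketch below is an assumption-laden example in Python, not the pattern under discussion: `ERA_YEAR` and `era_to_gregorian` are hypothetical names, only the five modern eras are listed, and only ASCII digits are accepted.

```python
import re

# Gregorian year in which each era's first year (元年) falls.
ERA_START = {"明治": 1868, "大正": 1912, "昭和": 1926, "平成": 1989, "令和": 2019}

# Hypothetical pattern: era name, then 元 or ASCII digits, then 年.
ERA_YEAR = re.compile(r"^(明治|大正|昭和|平成|令和)(元|[0-9]+)年$")


def era_to_gregorian(text: str) -> int:
    """Convert a string such as 令和元年 or 平成31年 to a Gregorian year."""
    m = ERA_YEAR.match(text)
    if m is None:
        raise ValueError(f"not an era year: {text!r}")
    era, year = m.groups()
    n = 1 if year == "元" else int(year)  # 元年 is year 1 of the era
    return ERA_START[era] + n - 1


print(era_to_gregorian("令和元年"))  # 2019
print(era_to_gregorian("平成31年"))  # 2019
print(era_to_gregorian("昭和64年"))  # 1989
```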