Last active July 22, 2018 14:44
elegantcoder/d871a47b8f231a1c659e0c35080bdc64
Hangul initial consonant (choseong) extractor
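```ruby
# Examples: 삼성전자 => ㅅㅅㅈㅈ, 유안타제3호스팩 => ㅇㅇㅌㅈ3ㅎㅅㅍ
def extract_korean_initials(keyword)
  # The 19 initial consonants, in Unicode syllable order
  initials = ['ㄱ', 'ㄲ', 'ㄴ', 'ㄷ', 'ㄸ', 'ㄹ', 'ㅁ', 'ㅂ', 'ㅃ', 'ㅅ', 'ㅆ', 'ㅇ', 'ㅈ', 'ㅉ', 'ㅊ', 'ㅋ', 'ㅌ', 'ㅍ', 'ㅎ']
  hangul_range = '가'.ord..'힣'.ord # U+AC00..U+D7A3, precomposed syllables
  hangul_first = hangul_range.first # 44032
  size = 588 # syllables per initial: '까'.ord - '가'.ord (21 vowels * 28 finals)
  keyword.each_char.map do |k|
    k_char_code = k.ord
    # Only index the table for Hangul syllables. Without this check, characters
    # just below '가' (e.g. Hanja) produce a negative index and pick a wrong initial.
    if hangul_range.cover?(k_char_code)
      initials[(k_char_code - hangul_first) / size]
    else
      k # digits, Latin letters, etc. pass through unchanged
    end
  end.join
end
```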
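The constant 588 follows from how Unicode lays out precomposed Hangul: each syllable's code point is 0xAC00 + (initial × 21 + medial) × 28 + final, so every initial consonant owns a block of 21 × 28 = 588 consecutive syllables. A minimal sketch of that arithmetic, splitting a syllable into all three indices and composing it back; the names `hangul_indices` and `compose_hangul` are hypothetical helpers for illustration, not part of the gist.

```ruby
HANGUL_BASE = 0xAC00 # '가'
INITIAL_COUNT = 19   # initial consonants
MEDIAL_COUNT = 21    # vowels
FINAL_COUNT = 28     # 27 final consonants plus "no final"
SYLLABLE_COUNT = INITIAL_COUNT * MEDIAL_COUNT * FINAL_COUNT # 11172

# Returns [initial, medial, final] indices for a precomposed syllable, or nil.
def hangul_indices(ch)
  offset = ch.ord - HANGUL_BASE
  return nil unless offset.between?(0, SYLLABLE_COUNT - 1)

  initial, rest = offset.divmod(MEDIAL_COUNT * FINAL_COUNT) # block of 588
  medial, final = rest.divmod(FINAL_COUNT)
  [initial, medial, final]
end

# Inverse of hangul_indices: builds the syllable from its three indices.
def compose_hangul(initial, medial, final = 0)
  (HANGUL_BASE + (initial * MEDIAL_COUNT + medial) * FINAL_COUNT + final).chr(Encoding::UTF_8)
end

p hangul_indices('삼')        # => [9, 0, 16]  (ㅅ + ㅏ + ㅁ)
p compose_hangul(9, 0, 16)    # => "삼"
```

The initial index alone, `offset / 588`, is exactly what `extract_korean_initials` computes.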
Using RoR's ActiveSupport::Multibyte::Chars would probably make this a bit easier. I tinkered with it briefly a while back, heh.
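A Unicode-aware approach can also be tried without Rails. The sketch below does not use ActiveSupport::Multibyte::Chars; it swaps in the standard library's `String#unicode_normalize(:nfd)` (Ruby 2.2+), which decomposes a precomposed syllable into conjoining jamo. The first jamo, in U+1100..U+1112, is the initial consonant, and those 19 code points follow the same order as the `initials` table. The method name `extract_korean_initials_nfd` and the constants are hypothetical.

```ruby
# Hypothetical alternative: find the initial via Unicode NFD (stdlib, Ruby 2.2+).
CHOSEONG_FIRST = 0x1100 # HANGUL CHOSEONG KIYEOK, first conjoining initial
COMPAT_INITIALS = %w[ㄱ ㄲ ㄴ ㄷ ㄸ ㄹ ㅁ ㅂ ㅃ ㅅ ㅆ ㅇ ㅈ ㅉ ㅊ ㅋ ㅌ ㅍ ㅎ].freeze

def extract_korean_initials_nfd(keyword)
  keyword.each_char.map do |ch|
    # NFD splits a syllable into initial + medial (+ final) jamo; other
    # characters come back unchanged or with combining marks appended.
    lead = ch.unicode_normalize(:nfd)[0]
    index = lead.ord - CHOSEONG_FIRST
    # U+1100..U+1112 map one-to-one onto COMPAT_INITIALS; keep anything else
    index.between?(0, COMPAT_INITIALS.size - 1) ? COMPAT_INITIALS[index] : ch
  end.join
end

puts extract_korean_initials_nfd('유안타제3호스팩') # => ㅇㅇㅌㅈ3ㅎㅅㅍ
```

This assumes precomposed (NFC) input; text that is already decomposed would need `unicode_normalize(:nfc)` first, since `each_char` would otherwise walk the individual jamo.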