Hebrew niqqud unicode point values for Python programmers
# To create a Hebrew letter with niqqud:
# Hebrew letter [+ optional shin dot if the letter is shin] [+ optional dagesh] [+ optional niqqud]
# example: print("ש" + chr(shin_dot_right_shin) + chr(dagesh) + chr(kmz_katan)) => שָּׁ
# The letter must come first. The order of the marks after it should not change how the
# letter renders, but it does change the underlying string, so normalize before comparing.
# print("ש" + chr(kmz_katan) + chr(shin_dot_left_sinn) + chr(dagesh)) => שָּׂ
# Each mark is a separate code point that attaches to the character before it.
# This is how you get things like the reverse of noël being l̈eon instead of lëon,
# as discussed in Edaqa Mortoray's two seminal blog posts:
# https://mortoray.com/we-dont-need-a-string-type/
# https://mortoray.com/the-string-type-is-broken/
# The following post by Joel Spolsky also helped me grok Unicode back in the Python 2.6+ days:
# https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/
# niqqud marks
shva = 1456 # שְ
segol_nach = 1457 # שֱ (hataf segol)
pth_nach = 1458 # שֲ (hataf patah)
kmz_nach = 1459 # שֳ (hataf qamats)
hirik = 1460 # שִ
zere = 1461 # שֵ
segol = 1462 # שֶ
pth = 1463 # שַ
kmz_katan = 1464 # שָ (Unicode name: HEBREW POINT QAMATS, despite the identifier)
holam = 1465 # שֹ
holam_hasser = 1466 # שֺ (holam haser for vav)
kubuz = 1467 # שֻ
dagesh = 1468 # שּ (dagesh or mapiq)
# 1469 שֽ (meteg)
# 1470 ־ (maqaf, punctuation)
# 1471 שֿ (rafe)
# 1472 ׀ (paseq, punctuation)
shin_dot_right_shin = 1473 # שׁ
shin_dot_left_sinn = 1474 # שׂ
# 1475 ׃ (sof pasuq, punctuation)
# 1476 שׄ (upper dot)
# 1477 שׅ (lower dot)
# 1478 ׆ (nun hafukha, punctuation)
kmz_gadol = 1479 # שׇ (Unicode name: HEBREW POINT QAMATS QATAN, despite the identifier)
# [1480, 1487] - unassigned
# Hebrew letters
aleph = 1488 # א
bet = 1489 # ב
gimel = 1490 # ג
dalet = 1491 # ד
heh = 1492 # ה
vav = 1493 # ו
zayin = 1494 # ז
het = 1495 # ח
tet = 1496 # ט
yod = 1497 # י
kaf_sofit = 1498 # ך
kaf = 1499 # כ
lamed = 1500 # ל
mem_sofit = 1501 # ם
mem = 1502 # מ
noon_sofit = 1503 # ן
noon = 1504 # נ
samech = 1505 # ס
ayin = 1506 # ע
peh_sofit = 1507 # ף
peh = 1508 # פ
zadi_sofit = 1509 # ץ
zadi = 1510 # צ
kof = 1511 # ק
reyish = 1512 # ר
shin = 1513 # ש
tav = 1514 # ת
# [1515, 1518] - unassigned
# 1519 (yod triangle, added in Unicode 11)
# 1520 װ (double vav ligature)
# 1521 ױ (vav yod ligature)
# 1522 ײ (double yod ligature)
geresh = 1523 # ׳
# 64331 וֹ (vav with holam, precomposed presentation form)
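The notes at the top of the file say the letter goes first and the marks can follow in any order, and that reversing a string with combining marks goes wrong. A minimal sketch of both points follows, using only the standard `unicodedata` module. The helper names (`compose`, `strip_marks`, `clusters`, `reverse_clusters`) are made up for this illustration and are not part of the gist, and the cluster splitting is a simplification that only glues combining marks to the preceding character rather than implementing full grapheme segmentation.

```python
import unicodedata

# Code points copied from the table in this gist.
shin = 1513
shin_dot_right_shin = 1473
dagesh = 1468
kmz_katan = 1464


def compose(letter, *marks):
    """Build a string from a base letter and combining marks (all code points)."""
    return chr(letter) + "".join(chr(m) for m in marks)


def strip_marks(text):
    """Drop every combining mark (niqqud included), keeping base characters."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def clusters(text):
    """Group each base character with the combining marks that follow it."""
    out = []
    for ch in text:
        if out and unicodedata.combining(ch):
            out[-1] += ch  # attach the mark to the previous base character
        else:
            out.append(ch)
    return out


def reverse_clusters(text):
    """Reverse a string while keeping marks attached to their base character."""
    return "".join(reversed(clusters(text)))


# Same letter, same marks, different mark order.
a = compose(shin, shin_dot_right_shin, dagesh, kmz_katan)
b = compose(shin, kmz_katan, shin_dot_right_shin, dagesh)

same_raw = a == b
same_normalized = unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)
print(same_raw, same_normalized)

# "noe" + combining diaeresis + "l": naive reversal moves the diaeresis onto the l.
noel = "noe\u0308l"
print(noel[::-1], reverse_clusters(noel))
```

The two shin strings are unequal as raw Python strings but equal after NFD normalization, because normalization sorts the marks by their canonical combining class. Normalizing before comparing or storing pointed text avoids depending on the order in which the marks were typed.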
Only after posting this did I find this page:
http://www.nashbell.com/technology/he-unicode.php
The things I went through to get to this gist instead of locating this page and ripping it off are truly embarrassing 😓