Last active
August 31, 2020 19:53
Text document containing all characters of the Multilingual European Subsets of Unicode and some other common Unicode subsets (and a small Java program to verify the file has not been garbled)
Common Unicode Subsets | |
====================== | |
ASCII | |
~~~~~ | |
ASCII is not usually described as a Unicode subset, but the Unicode character
set starts with ASCII, which makes it the smallest widely used subset of
Unicode.
|Latin uppercase letters |0041-5A(26)|ABCDEFGHIJKLMNOPQRSTUVWXYZ|-----| | |
|Latin lowercase letters |0061-7A(26)|abcdefghijklmnopqrstuvwxyz|-----| | |
|Decimal digits |0030-39(10)|0123456789|---------------------| | |
|Symbols and special characters |0020-2F(16)| !"#$%&'()*+,-./|---------------| | |
|'-> |003A-40,5B-60,7B-7E(17)|:;<=>?@[\]^_`{|}~|--------------| | |
ASCII is defined as: | |
>00 20-7E | |
>#95 | |
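The definition notation can be read as follows: each ">" line starts with the
high byte of the code points ("00" for U+00xx), followed by low bytes or
low-byte ranges, and the closing ">#" line gives the total number of
characters. A minimal Java sketch of that count, assuming ranges never overlap
(the class and method names are made up for this illustration):

```java
final class DefinitionCount {

    /** Sums the sizes of all low-byte ranges in lines such as ">00 20-7E A0-FF". */
    static int count(String... definitionLines) {
        int total = 0;
        for (String line : definitionLines) {
            // Drop the leading '>' and split into the high byte plus its ranges.
            String[] tokens = line.substring(1).trim().split(" ");
            // tokens[0] is the high byte; it does not affect the count.
            for (int i = 1; i < tokens.length; i++) {
                String[] bounds = tokens[i].split("-");
                int from = Integer.parseInt(bounds[0], 16);
                // A single value such as "AC" is a range of length one.
                int to = Integer.parseInt(bounds[bounds.length - 1], 16);
                total += to - from + 1;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(count(">00 20-7E")); // prints 95
    }
}
```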
Multilingual European Subset 1 (MES-1) | |
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | |
On top of ASCII, this charset contains common Latin letters and symbols used | |
in Europe (or by European character sets): | |
|Latin-1 symbols |00A0-BF(32)| ¡¢£¤¥¦§¨©ª«¬®¯°±²³´µ¶·¸¹º»¼½¾¿| | |
|Latin-1 uppercase letters |00C0-DF(32)|ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞß|
|Latin-1 lowercase letters |00E0-FF(32)|àáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ| | |
|Latin extended |0100-13(20)|ĀāĂăĄąĆćĈĉĊċČčĎďĐđĒē|-----------|
|'-> |0116-2B(22)|ĖėĘęĚěĜĝĞğĠġĢģĤĥĦħĨĩĪī|---------| | |
|'-> |012E-4D(32)|ĮįİıĲĳĴĵĶķĸĹĺĻļĽľĿŀŁłŃńŅņŇňŉŊŋŌō|
|'-> |0150-67(24)|ŐőŒœŔŕŖŗŘřŚśŜŝŞşŠšŢţŤťŦŧ|-------| | |
|'-> |0168-7E(23)|ŨũŪūŬŭŮůŰűŲųŴŵŶŷŸŹźŻżŽž|--------| | |
|Accents |02C7-C7(01)|ˇ|------------------------------| | |
|'-> |02D8-DB,DD-DD(05)|˘˙˚˛˝|--------------------------| | |
|Typographic special characters |2015-15(01)|―|------------------------------| | |
|'-> |2018-19,1C-1D(04)|‘’“”|---------------------------| | |
|Euro symbol |20AC-AC(01)|€|------------------------------| | |
|Trademark symbol |2122-22(01)|™|------------------------------| | |
|Ohm symbol |2126-26(01)|Ω|------------------------------| | |
|Vulgar fractions |215B-5E(04)|⅛⅜⅝⅞|---------------------------| | |
|Arrow symbols |2190-93(04)|←↑→↓|---------------------------| | |
|Musical note symbol |266A-6A(01)|♪|------------------------------| | |
MES-1 is defined as: | |
>00 20-7E A0-FF | |
>01 00-13 16-2B 2E-4D 50-7E | |
>02 C7 D8-DB DD | |
>20 15 18-19 1C-1D AC | |
>21 22 26 5B-5E 90-93 | |
>26 6A | |
>#335 | |
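Each table row above names its characters in the second column: a four-digit
start code point, the low byte of the end of the range, optional further
low-byte ranges in the same 256-character block, and the expected character
count in parentheses. The following Java sketch expands such a rule into its
characters and checks the count. It is only an illustration under those
assumptions; the class name RowRule and its method are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RowRule {

    // High byte, then comma-separated low-byte ranges, then the two-digit count.
    private static final Pattern RULE = Pattern.compile(
            "([0-9A-F]{2})([0-9A-F]{2}-[0-9A-F]{2}(?:,[0-9A-F]{2}-[0-9A-F]{2})*)\\(([0-9]{2})\\)");

    /** Expands a rule such as "02D8-DB,DD-DD(05)" into the characters it names. */
    static String expand(String rule) {
        Matcher m = RULE.matcher(rule);
        if (!m.matches())
            throw new IllegalArgumentException("Invalid rule: " + rule);
        int base = Integer.parseInt(m.group(1), 16) << 8;
        StringBuilder chars = new StringBuilder();
        for (String range : m.group(2).split(",")) {
            int from = Integer.parseInt(range.substring(0, 2), 16);
            int to = Integer.parseInt(range.substring(3, 5), 16);
            for (int low = from; low <= to; low++)
                chars.append((char) (base + low));
        }
        int declared = Integer.parseInt(m.group(3));
        if (chars.length() != declared)
            throw new IllegalArgumentException(
                    "Rule declares " + declared + " characters but names " + chars.length());
        return chars.toString();
    }

    public static void main(String[] args) {
        System.out.println(expand("0041-5A(26)")); // prints ABCDEFGHIJKLMNOPQRSTUVWXYZ
    }
}
```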
Multilingual European Subset 2 (MES-2) | |
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | |
On top of MES-1, this contains more "exotic" Latin European characters | |
as well as Greek and Cyrillic ones, and more symbols: | |
|Latin extended |0114-15(02)|Ĕĕ|-----------------------------| | |
|'-> |012C-2D,4E-4F(04)|ĬĭŎŏ|---------------------------| | |
|'-> |0192-92,FA-FF(07)|ƒǺǻǼǽǾǿ|------------------------| | |
|'-> |1E80-85,F2-F3(08)|ẀẁẂẃẄẅỲỳ|-----------------------| | |
|'-> (*) |01DE-EF(18)|ǞǟǠǡǢǣǤǥǦǧǨǩǪǫǬǭǮǯ|-------------| | |
|'-> (*) |0218-1B,1E-1F(06)|ȘșȚțȞȟ|-------------------------| | |
|'-> (*) |1E02-03,0A-0B,1E-1F,40-41(08)|ḂḃḊḋḞḟṀṁ|-----------------------| | |
|'-> (*) |1E56-57,60-61,6A-6B(06)|ṖṗṠṡṪṫ|-------------------------| | |
|More exotic Latin letters |017F-7F(01)|ſ|------------------------------| | |
|'-> (*) |018F-8F,B7-B7(02)|ƏƷ|-----------------------------| | |
|'-> (*) |0259-59,7C-7C,92-92(03)|əɼʒ|----------------------------| | |
|'-> (*) |1E9B-9B(01)|ẛ|------------------------------| | |
|Latin Modifier letters |02C6-C6(01)|ˆ|------------------------------| | |
|'-> |02C9-C9,DC-DC(02)|ˉ˜|-----------------------------| | |
|'-> (*) |02BB-BD,EE-EE(04)|ʻʼʽˮ|---------------------------| | |
|Greek uppercase letters |0391-A1(17)|ΑΒΓΔΕΖΗΘΙΚΛΜΝΞΟΠΡ|--------------| | |
|'-> |03A3-A9(07)|ΣΤΥΦΧΨΩ|------------------------| | |
|Greek lowercase letters |03B1-C9(25)|αβγδεζηθικλμνξοπρςστυφχψω|------| | |
|Greek extended |0384-8A(07)|΄΅Ά·ΈΉΊ|------------------------| | |
|'-> |038C-8C,8E-90(04)|ΌΎΏΐ|---------------------------| | |
|'-> |03AA-B0,CA-CE(12)|ΪΫάέήίΰϊϋόύώ|-------------------| | |
|'-> (*) |0374-75,7A-7A,7E-7E(04)|ʹ͵ͺ;|---------------------------| | |
|'-> (*) |03D7-D7,DA-E1(09)|ϗϚϛϜϝϞϟϠϡ|----------------------| | |
|'-> (*) |1F00-15,18-1D(28)|ἀἁἂἃἄἅἆἇἈἉἊἋἌἍἎἏἐἑἒἓἔἕἘἙἚἛἜἝ|---| | |
|'-> (*) |1F20-3F(32)|ἠἡἢἣἤἥἦἧἨἩἪἫἬἭἮἯἰἱἲἳἴἵἶἷἸἹἺἻἼἽἾἿ| | |
|'-> (*) |1F40-45,48-4D,50-57(20)|ὀὁὂὃὄὅὈὉὊὋὌὍὐὑὒὓὔὕὖὗ|-----------| | |
|'-> (*) |1F59-59,5B-5B,5D-5D(03)|ὙὛὝ|----------------------------| | |
|'-> (*) |1F5F-7D(31)|ὟὠὡὢὣὤὥὦὧὨὩὪὫὬὭὮὯὰάὲέὴήὶίὸόὺύὼώ|| | |
|'-> (*) |1F80-9F(32)|ᾀᾁᾂᾃᾄᾅᾆᾇᾈᾉᾊᾋᾌᾍᾎᾏᾐᾑᾒᾓᾔᾕᾖᾗᾘᾙᾚᾛᾜᾝᾞᾟ| | |
|'-> (*) |1FA0-B4(21)|ᾠᾡᾢᾣᾤᾥᾦᾧᾨᾩᾪᾫᾬᾭᾮᾯᾰᾱᾲᾳᾴ|----------| | |
|'-> (*) |1FB6-C4,C6-D3(29)|ᾶᾷᾸᾹᾺΆᾼ᾽ι᾿῀῁ῂῃῄῆῇῈΈῊΉῌ῍῎῏ῐῑῒΐ|--| | |
|'-> (*) |1FD6-DB,DD-EF(25)|ῖῗῘῙῚΊ῝῞῟ῠῡῢΰῤῥῦῧῨῩῪΎῬ῭΅`|------| | |
|'-> (*) |1FF2-F4,F6-FE(12)|ῲῳῴῶῷῸΌῺΏῼ´῾|-------------------| | |
|Cyrillic |0400-1F(32)|ЀЁЂЃЄЅІЇЈЉЊЋЌЍЎЏАБВГДЕЖЗИЙКЛМНОП| | |
|'-> |0420-3F(32)|РСТУФХЦЧШЩЪЫЬЭЮЯабвгдежзийклмноп| | |
|'-> |0440-5F(32)|рстуфхцчшщъыьэюяѐёђѓєѕіїјљњћќѝўџ| | |
|'-> |0490-91(02)|Ґґ|-----------------------------| | |
|'-> (*) |0492-B1(32)|ҒғҔҕҖҗҘҙҚқҜҝҞҟҠҡҢңҤҥҦҧҨҩҪҫҬҭҮүҰұ| | |
|'-> (*) |04B2-C4,C7-C8(21)|ҲҳҴҵҶҷҸҹҺһҼҽҾҿӀӁӂӃӄӇӈ|----------| | |
|'-> (*) |04CB-CC,D0-EB(30)|ӋӌӐӑӒӓӔӕӖӗӘәӚӛӜӝӞӟӠӡӢӣӤӥӦӧӨөӪӫ|-| | |
|'-> (*) |04EE-F5,F8-F9(10)|ӮӯӰӱӲӳӴӵӸӹ|---------------------| | |
|Typographic symbols |2013-14(02)|–—|-----------------------------| | |
|'-> |2017-17,1A-1B,1E-1E,20-22(07)|‗‚‛„†‡•|------------------------| | |
|'-> |2026-26,30-30,32-33,39-3A(06)|…‰′″‹›|-------------------------| | |
|'-> |203C-3C,3E-3E,44-44,7F-7F(04)|‼‾⁄ⁿ|---------------------------| | |
|'-> (*) |204A-4A,82-82(02)|⁊₂|-----------------------------| | |
|Currency symbols |20A3-A4(02)|₣₤|-----------------------------| | |
|'-> |20A7-A7(01)|₧|------------------------------| | |
|'-> (*) |20AF-AF(01)|₯|------------------------------| | |
|Business symbols |2105-05(01)|℅|------------------------------| | |
|'-> |2116-16(01)|№|------------------------------| | |
|Arrow symbols |2194-95(02)|↔↕|-----------------------------| | |
|'-> |21A8-A8(01)|↨|------------------------------| | |
|Mathematical symbols |2202-02(01)|∂|------------------------------| | |
|'-> |2206-06,0F-0F,11-12,19-1A(06)|∆∏∑−∙√|-------------------------| | |
|'-> |221E-1F,29-29,2B-2B(04)|∞∟∩∫|---------------------------| | |
|'-> |2248-48,60-61,64-65(05)|≈≠≡≤≥|--------------------------| | |
|'-> |2302-02,10-10,20-21(04)|⌂⌐⌠⌡|---------------------------| | |
|'-> (*) |2200-00,03-03,08-09(04)|∀∃∈∉|---------------------------| | |
|'-> (*) |2227-28,2A-2A,59-59(04)|∧∨∪≙|---------------------------| | |
|'-> (*) |2282-83,95-95,97-97(04)|⊂⊃⊕⊗|---------------------------| | |
|'-> (*) |2329-2A(02)|〈〉|-----------------------------|
|Box drawing characters |2500-00(01)|─|------------------------------| | |
|'-> |2502-02,0C-0C,10-10(03)|│┌┐|----------------------------| | |
|'-> |2514-14,18-18,1C-1C(03)|└┘├|----------------------------| | |
|'-> |2524-24,2C-2C,34-34,3C-3C(04)|┤┬┴┼|---------------------------| | |
|'-> |2550-6C(29)|═║╒╓╔╕╖╗╘╙╚╛╜╝╞╟╠╡╢╣╤╥╦╧╨╩╪╫╬|--| | |
|Block graphic characters |2580-80(01)|▀|------------------------------| | |
|'-> |2584-84,88-88,8C-8C(03)|▄█▌|----------------------------| | |
|'-> |2590-93(04)|▐░▒▓|---------------------------| | |
|Shapes |25A0-A0(01)|■|------------------------------| | |
|'-> |25AC-AC(01)|▬|------------------------------| | |
|'-> |25B2-B2,BA-BA,BC-BC,C4-C4(04)|▲►▼◄|---------------------------| | |
|'-> |25CA-CB,D8-D9(04)|◊○◘◙|---------------------------| | |
|Miscellaneous symbols |263A-3C(03)|☺☻☼|----------------------------| | |
|'-> |2640-40,42-42(02)|♀♂|-----------------------------| | |
|'-> |2660-60,63-63,65-66(04)|♠♣♥♦|---------------------------| | |
|'-> |266B-6B(01)|♫|------------------------------| | |
|Ligatures |FB01-02(02)|ﬁﬂ|-----------------------------|
|Replacement character (*) |FFFD-FD(01)|�|------------------------------| | |
MES-2 is defined as: | |
>00 20-7E A0-FF | |
>01 00-7F 8F 92 B7 DE-EF FA-FF | |
>02 18-1B 1E-1F 59 7C 92 BB-BD C6-C7 C9 D8-DD EE | |
>03 74-75 7A 7E 84-8A 8C 8E-A1 A3-CE D7 DA-E1 | |
>04 00-5F 90-C4 C7-C8 CB-CC D0-EB EE-F5 F8-F9 | |
>1E 02-03 0A-0B 1E-1F 40-41 56-57 60-61 6A-6B 80-85 9B F2-F3 | |
>1F 00-15 18-1D 20-45 48-4D 50-57 59 5B 5D 5F-7D 80-B4 B6-C4 C6-D3 D6-DB | |
>1F DD-EF F2-F4 F6-FE | |
>20 13-15 17-1E 20-22 26 30 32-33 39-3A 3C 3E 44 4A 7F 82 A3-A4 A7 AC AF | |
>21 05 16 22 26 5B-5E 90-95 A8 | |
>22 00 02-03 06 08-09 0F 11-12 19-1A 1E-1F 27-2B 48 59 60-61 64-65 82-83 95 97 | |
>23 02 10 20-21 29-2A | |
>25 00 02 0C 10 14 18 1C 24 2C 34 3C 50-6C 80 84 88 8C 90-93 A0 AC B2 BA BC C4 | |
>25 CA-CB D8-D9 | |
>26 3A-3C 40 42 60 63 65-66 6A-6B | |
>FB 01-02 | |
>FF FD | |
>#1052 | |
Windows Glyph List 4 (WGL-4) | |
~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | |
WGL-4 is a superset of MES-1 and mostly a subset of MES-2 (it contains every
MES-2 character not marked with (*) above), plus a few additional characters.
Microsoft defined it as the character set that should be displayable on all
major Windows versions without installing additional fonts.
|Special letters |2113-13(01)|ℓ|------------------------------| | |
|'-> |212E-2E(01)|℮|------------------------------| | |
|Special symbols |2215-15(01)|∕|------------------------------| | |
|'-> |25A1-A1,AA-AB,CF-CF,E6-E6(05)|□▪▫●◦|--------------------------| | |
WGL-4 is defined as:
>00 20-7E A0-FF | |
>01 00-7F 92 FA-FF | |
>02 C6-C7 C9 D8-DD | |
>03 84-8A 8C 8E-A1 A3-CE | |
>04 00-5F 90-91 | |
>1E 80-85 F2-F3 | |
>20 13-15 17-1E 20-22 26 30 32-33 39-3A 3C 3E 44 7F A3-A4 A7 AC | |
>21 05 13 16 22 26 2E 5B-5E 90-95 A8 | |
>22 02 06 0F 11-12 15 19-1A 1E-1F 29 2B 48 60-61 64-65 | |
>23 02 10 20-21 | |
>25 00 02 0C 10 14 18 1C 24 2C 34 3C 50-6C 80 84 88 8C 90-93 A0-A1 AA-AC B2 BA | |
>25 BC C4 CA-CB CF D8-D9 E6 | |
>26 3A-3C 40 42 60 63 65-66 6A-6B | |
>FB 01-02 | |
>#*655 | |
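The "few additional characters" can be recovered from the definitions
themselves by subtracting MES-2 from WGL-4. The Java sketch below does this
for rows 21, 22 and 25 only (the rows in which the two definitions differ by
such additions), using java.util.BitSet. The class and method names are
hypothetical and exist only for this illustration.

```java
import java.util.BitSet;

final class SubsetDifference {

    /** Expands definition lines such as ">21 05 13 5B-5E" into a set of code points. */
    static BitSet expand(String... definitionLines) {
        BitSet codePoints = new BitSet(0x10000);
        for (String line : definitionLines) {
            String[] tokens = line.substring(1).trim().split(" ");
            int base = Integer.parseInt(tokens[0], 16) << 8;
            for (int i = 1; i < tokens.length; i++) {
                String[] bounds = tokens[i].split("-");
                int from = Integer.parseInt(bounds[0], 16);
                int to = Integer.parseInt(bounds[bounds.length - 1], 16);
                codePoints.set(base + from, base + to + 1); // upper bound is exclusive
            }
        }
        return codePoints;
    }

    /** Code points of rows 21, 22 and 25 that are in WGL-4 but not in MES-2. */
    static BitSet wgl4Only() {
        BitSet wgl4 = expand(
                ">21 05 13 16 22 26 2E 5B-5E 90-95 A8",
                ">22 02 06 0F 11-12 15 19-1A 1E-1F 29 2B 48 60-61 64-65",
                ">25 00 02 0C 10 14 18 1C 24 2C 34 3C 50-6C 80 84 88 8C 90-93 A0-A1 AA-AC B2 BA",
                ">25 BC C4 CA-CB CF D8-D9 E6");
        BitSet mes2 = expand(
                ">21 05 16 22 26 5B-5E 90-95 A8",
                ">22 00 02-03 06 08-09 0F 11-12 19-1A 1E-1F 27-2B 48 59 60-61 64-65 82-83 95 97",
                ">25 00 02 0C 10 14 18 1C 24 2C 34 3C 50-6C 80 84 88 8C 90-93 A0 AC B2 BA BC C4",
                ">25 CA-CB D8-D9");
        wgl4.andNot(mes2);
        return wgl4;
    }

    public static void main(String[] args) {
        // Prints the eight code points listed in the WGL-4 table above, one per line.
        wgl4Only().stream().forEach(cp -> System.out.printf("U+%04X%n", cp));
    }
}
```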
Multilingual European Subset 3 (MES-3) and its variants | |
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | |
MES-3 contains even more characters. There are several versions of this
subset. MES-3A is an open subset (it may receive more characters when they are
added to the respective code ranges), so it is not included in this file.
MES-3B and MES-3KS are two fixed subsets. The latter omits some characters
that are not used by languages of European origin, and is therefore shown
first here (as the difference from MES-2 and WGL-4):
|MES-3KS |0180-81(02)|ƀƁ|-----------------------------| | |
|'-> |018B-8C(02)|Ƌƌ|-----------------------------| | |
|'-> |0195-95(01)|ƕ|------------------------------| | |
|'-> |019A-9B(02)|ƚƛ|-----------------------------| | |
|'-> |019E-9F(02)|ƞƟ|-----------------------------| | |
|'-> |01A2-A3(02)|Ƣƣ|-----------------------------| | |
|'-> |01A6-A6(01)|Ʀ|------------------------------| | |
|'-> |01AA-AB(02)|ƪƫ|-----------------------------| | |
|'-> |01B5-B6(02)|Ƶƶ|-----------------------------| | |
|'-> |01B8-BB(04)|Ƹƹƺƻ|---------------------------| | |
|'-> |01BE-CC(15)|ƾƿǀǁǂǃǄǅǆǇǈǉǊǋǌ|----------------|
|'-> |01D5-D6(02)|Ǖǖ|-----------------------------| | |
|'-> |01F0-F7(08)|ǰǱǲǳǴǵǶǷ|-----------------------|
|'-> |0200-17(24)|ȀȁȂȃȄȅȆȇȈȉȊȋȌȍȎȏȐȑȒȓȔȕȖȗ|-------| | |
|'-> |021C-1D(02)|Ȝȝ|-----------------------------| | |
|'-> |0224-27(04)|ȤȥȦȧ|---------------------------| | |
|'-> |022A-33(10)|ȪȫȬȭȮȯȰȱȲȳ|---------------------| | |
|'-> |0250-58(09)|ɐɑɒɓɔɕɖɗɘ|----------------------| | |
|'-> |025A-79(32)|ɚɛɜɝɞɟɠɡɢɣɤɥɦɧɨɩɪɫɬɭɮɯɰɱɲɳɴɵɶɷɸɹ| | |
|'-> |027A-7B(02)|ɺɻ|-----------------------------| | |
|'-> |027D-91(21)|ɽɾɿʀʁʂʃʄʅʆʇʈʉʊʋʌʍʎʏʐʑ|----------| | |
|'-> |0293-AD(27)|ʓʔʕʖʗʘʙʚʛʜʝʞʟʠʡʢʣʤʥʦʧʨʩʪʫʬʭ|----| | |
|'-> |02B0-BA(11)|ʰʱʲʳʴʵʶʷʸʹʺ|--------------------| | |
|'-> |02BE-C5(08)|ʾʿˀˁ˂˃˄˅|-----------------------| | |
|'-> |02C8-C8(01)|ˈ|------------------------------| | |
|'-> |02CA-D7(14)|ˊˋˌˍˎˏːˑ˒˓˔˕˖˗|-----------------| | |
|'-> |02DE-ED(16)|˞˟ˠˡˢˣˤ˥˦˧˨˩˪˫ˬ˭|---------------| | |
|'-> |0300-1F(32)|̛̖̗̘̙̜̝̞̟̀́̂̃̄̅̆̇̈̉̊̋̌̍̎̏̐̑̒̓̔̕̚| | |
|'-> |0320-3F(32)|̴̵̶̷̸̡̢̧̨̠̣̤̥̦̩̪̫̬̭̮̯̰̱̲̳̹̺̻̼̽̾̿| | |
|'-> |0340-4E(15)|͇͈͉͍͎̀́͂̓̈́͆͊͋͌ͅ|----------------| | |
|'-> |0360-62(03)|͢͠͡|----------------------------| | |
|'-> |03D0-D6(07)|ϐϑϒϓϔϕϖ|------------------------| | |
|'-> |03E2-F3(18)|ϢϣϤϥϦϧϨϩϪϫϬϭϮϯϰϱϲϳ|-------------| | |
|'-> |0460-7F(32)|ѠѡѢѣѤѥѦѧѨѩѪѫѬѭѮѯѰѱѲѳѴѵѶѷѸѹѺѻѼѽѾѿ| | |
|'-> |0480-86(07)|Ҁҁ҂҃҄҅҆|------------------------| | |
|'-> |0488-89(02)|҈҉|-----------------------------| | |
|'-> |048C-8F(04)|ҌҍҎҏ|---------------------------| | |
|'-> |04EC-ED(02)|Ӭӭ|-----------------------------| | |
|'-> |0531-50(32)|ԱԲԳԴԵԶԷԸԹԺԻԼԽԾԿՀՁՂՃՄՅՆՇՈՉՊՋՌՍՎՏՐ| | |
|'-> |0551-56(06)|ՑՒՓՔՕՖ|-------------------------| | |
|'-> |0559-5F(07)|ՙ՚՛՜՝՞՟|------------------------| | |
|'-> |0561-80(32)|աբգդեզէըթժիլխծկհձղճմյնշոչպջռսվտր| | |
|'-> |0581-87(07)|ցւփքօֆև|------------------------| | |
|'-> |0589-8A(02)|։֊|-----------------------------| | |
|'-> |10D0-EF(32)|აბგდევზთიკლმნოპჟრსტუფქღყშჩცძწჭხჯ| | |
|'-> |10F0-F6(07)|ჰჱჲჳჴჵჶ|------------------------| | |
|'-> |10FB-FB(01)|჻|------------------------------| | |
|'-> |1E00-01(02)|Ḁḁ|-----------------------------| | |
|'-> |1E04-09(06)|ḄḅḆḇḈḉ|-------------------------| | |
|'-> |1E0C-1D(18)|ḌḍḎḏḐḑḒḓḔḕḖḗḘḙḚḛḜḝ|-------------| | |
|'-> |1E20-3F(32)|ḠḡḢḣḤḥḦḧḨḩḪḫḬḭḮḯḰḱḲḳḴḵḶḷḸḹḺḻḼḽḾḿ| | |
|'-> |1E42-55(20)|ṂṃṄṅṆṇṈṉṊṋṌṍṎṏṐṑṒṓṔṕ|-----------| | |
|'-> |1E58-5F(08)|ṘṙṚṛṜṝṞṟ|-----------------------| | |
|'-> |1E62-69(08)|ṢṣṤṥṦṧṨṩ|-----------------------| | |
|'-> |1E6C-7F(20)|ṬṭṮṯṰṱṲṳṴṵṶṷṸṹṺṻṼṽṾṿ|-----------| | |
|'-> |1E86-9A(21)|ẆẇẈẉẊẋẌẍẎẏẐẑẒẓẔẕẖẗẘẙẚ|----------| | |
|'-> |2000-12(19)| ‐‑‒|------------| | |
|'-> |2016-16(01)|‖|------------------------------| | |
|'-> |201F-1F(01)|‟|------------------------------| | |
|'-> |2023-25(03)|‣․‥|----------------------------| | |
|'-> |2027-2F(09)|‧ |----------------------| | |
The previous line contains line and paragraph separators as well as
bidirectional control characters, so it may look strange.
|'-> |2031-31(01)|‱|------------------------------| | |
|'-> |2034-38(05)|‴‵‶‷‸|--------------------------| | |
|'-> |203B-3B(01)|※|------------------------------| | |
|'-> |203D-3D(01)|‽|------------------------------| | |
|'-> |203F-43(05)|‿⁀⁁⁂⁃|--------------------------| | |
|'-> |2045-46(02)|⁅⁆|-----------------------------| | |
|'-> |2048-49(02)|⁈⁉|-----------------------------| | |
|'-> |204B-4D(03)|⁋⁌⁍|----------------------------| | |
|'-> |206A-70(07)|⁰|------------------------| | |
|'-> |2074-7E(11)|⁴⁵⁶⁷⁸⁹⁺⁻⁼⁽⁾|--------------------| | |
|'-> |2080-81(02)|₀₁|-----------------------------| | |
|'-> |2083-8E(12)|₃₄₅₆₇₈₉₊₋₌₍₎|-------------------| | |
|'-> |20A0-A2(03)|₠₡₢|----------------------------| | |
|'-> |20A5-A6(02)|₥₦|-----------------------------| | |
|'-> |20A8-AB(04)|₨₩₪₫|---------------------------| | |
|'-> |20AD-AE(02)|₭₮|-----------------------------| | |
|'-> |20D0-E3(20)|⃒⃓⃘⃙⃚⃐⃑⃔⃕⃖⃗⃛⃜⃝⃞⃟⃠⃡⃢⃣|-----------| | |
|'-> |2100-04(05)|℀℁ℂ℃℄|--------------------------| | |
|'-> |2106-12(13)|℆ℇ℈℉ℊℋℌℍℎℏℐℑℒ|------------------| | |
|'-> |2114-15(02)|℔ℕ|-----------------------------| | |
|'-> |2117-21(11)|℗℘ℙℚℛℜℝ℞℟℠℡|--------------------| | |
|'-> |2123-25(03)|℣ℤ℥|----------------------------| | |
|'-> |2127-2D(07)|℧ℨ℩KÅℬℭ|------------------------|
|'-> |212F-3A(12)|ℯℰℱℲℳℴℵℶℷℸℹ℺|-------------------| | |
|'-> |2153-5A(08)|⅓⅔⅕⅖⅗⅘⅙⅚|-----------------------| | |
|'-> |215F-7E(32)|⅟ⅠⅡⅢⅣⅤⅥⅦⅧⅨⅩⅪⅫⅬⅭⅮⅯⅰⅱⅲⅳⅴⅵⅶⅷⅸⅹⅺⅻⅼⅽⅾ| | |
|'-> |217F-83(05)|ⅿↀↁↂↃ|--------------------------| | |
|'-> |2196-A7(18)|↖↗↘↙↚↛↜↝↞↟↠↡↢↣↤↥↦↧|-------------| | |
|'-> |21A9-C8(32)|↩↪↫↬↭↮↯↰↱↲↳↴↵↶↷↸↹↺↻↼↽↾↿⇀⇁⇂⇃⇄⇅⇆⇇⇈| | |
|'-> |21C9-E8(32)|⇉⇊⇋⇌⇍⇎⇏⇐⇑⇒⇓⇔⇕⇖⇗⇘⇙⇚⇛⇜⇝⇞⇟⇠⇡⇢⇣⇤⇥⇦⇧⇨| | |
|'-> |21E9-F3(11)|⇩⇪⇫⇬⇭⇮⇯⇰⇱⇲⇳|--------------------| | |
|'-> |2201-01(01)|∁|------------------------------| | |
|'-> |2204-05(02)|∄∅|-----------------------------| | |
|'-> |2207-07(01)|∇|------------------------------| | |
|'-> |220A-0E(05)|∊∋∌∍∎|--------------------------| | |
|'-> |2210-10(01)|∐|------------------------------| | |
|'-> |2213-14(02)|∓∔|-----------------------------| | |
|'-> |2216-18(03)|∖∗∘|----------------------------| | |
|'-> |221B-1D(03)|∛∜∝|----------------------------| | |
|'-> |2220-26(07)|∠∡∢∣∤∥∦|------------------------| | |
|'-> |222C-47(28)|∬∭∮∯∰∱∲∳∴∵∶∷∸∹∺∻∼∽∾∿≀≁≂≃≄≅≆≇|---| | |
|'-> |2249-58(16)|≉≊≋≌≍≎≏≐≑≒≓≔≕≖≗≘|---------------| | |
|'-> |225A-5F(06)|≚≛≜≝≞≟|-------------------------| | |
|'-> |2262-63(02)|≢≣|-----------------------------| | |
|'-> |2266-81(28)|≦≧≨≩≪≫≬≭≮≯≰≱≲≳≴≵≶≷≸≹≺≻≼≽≾≿⊀⊁|---| | |
|'-> |2284-94(17)|⊄⊅⊆⊇⊈⊉⊊⊋⊌⊍⊎⊏⊐⊑⊒⊓⊔|--------------| | |
|'-> |2296-96(01)|⊖|------------------------------| | |
|'-> |2298-B7(32)|⊘⊙⊚⊛⊜⊝⊞⊟⊠⊡⊢⊣⊤⊥⊦⊧⊨⊩⊪⊫⊬⊭⊮⊯⊰⊱⊲⊳⊴⊵⊶⊷| | |
|'-> |22B8-D7(32)|⊸⊹⊺⊻⊼⊽⊾⊿⋀⋁⋂⋃⋄⋅⋆⋇⋈⋉⋊⋋⋌⋍⋎⋏⋐⋑⋒⋓⋔⋕⋖⋗| | |
|'-> |22D8-F1(26)|⋘⋙⋚⋛⋜⋝⋞⋟⋠⋡⋢⋣⋤⋥⋦⋧⋨⋩⋪⋫⋬⋭⋮⋯⋰⋱|-----| | |
|'-> |2300-01(02)|⌀⌁|-----------------------------| | |
|'-> |2303-0F(13)|⌃⌄⌅⌆⌇⌈⌉⌊⌋⌌⌍⌎⌏|------------------| | |
|'-> |2311-1F(15)|⌑⌒⌓⌔⌕⌖⌗⌘⌙⌚⌛⌜⌝⌞⌟|----------------| | |
|'-> |2322-28(07)|⌢⌣⌤⌥⌦⌧⌨|------------------------| | |
|'-> |232B-4A(32)|⌫⌬⌭⌮⌯⌰⌱⌲⌳⌴⌵⌶⌷⌸⌹⌺⌻⌼⌽⌾⌿⍀⍁⍂⍃⍄⍅⍆⍇⍈⍉⍊| | |
|'-> |234B-6A(32)|⍋⍌⍍⍎⍏⍐⍑⍒⍓⍔⍕⍖⍗⍘⍙⍚⍛⍜⍝⍞⍟⍠⍡⍢⍣⍤⍥⍦⍧⍨⍩⍪| | |
|'-> |236B-7B(17)|⍫⍬⍭⍮⍯⍰⍱⍲⍳⍴⍵⍶⍷⍸⍹⍺⍻|--------------| | |
|'-> |237D-9A(30)|⍽⍾⍿⎀⎁⎂⎃⎄⎅⎆⎇⎈⎉⎊⎋⎌⎍⎎⎏⎐⎑⎒⎓⎔⎕⎖⎗⎘⎙⎚|-| | |
|'-> |2440-4A(11)|⑀⑁⑂⑃⑄⑅⑆⑇⑈⑉⑊|--------------------| | |
|'-> |2501-01(01)|━|------------------------------| | |
|'-> |2503-0B(09)|┃┄┅┆┇┈┉┊┋|----------------------| | |
|'-> |250D-0F(03)|┍┎┏|----------------------------| | |
|'-> |2511-13(03)|┑┒┓|----------------------------| | |
|'-> |2515-17(03)|┕┖┗|----------------------------| | |
|'-> |2519-1B(03)|┙┚┛|----------------------------| | |
|'-> |251D-23(07)|┝┞┟┠┡┢┣|------------------------| | |
|'-> |2525-2B(07)|┥┦┧┨┩┪┫|------------------------| | |
|'-> |252D-33(07)|┭┮┯┰┱┲┳|------------------------| | |
|'-> |2535-3B(07)|┵┶┷┸┹┺┻|------------------------| | |
|'-> |253D-4F(19)|┽┾┿╀╁╂╃╄╅╆╇╈╉╊╋╌╍╎╏|------------| | |
|'-> |256D-7F(19)|╭╮╯╰╱╲╳╴╵╶╷╸╹╺╻╼╽╾╿|------------| | |
|'-> |2581-83(03)|▁▂▃|----------------------------| | |
|'-> |2585-87(03)|▅▆▇|----------------------------| | |
|'-> |2589-8B(03)|▉▊▋|----------------------------| | |
|'-> |258D-8F(03)|▍▎▏|----------------------------| | |
|'-> |2594-95(02)|▔▕|-----------------------------| | |
|'-> |25A2-A9(08)|▢▣▤▥▦▧▨▩|-----------------------| | |
|'-> |25AD-B1(05)|▭▮▯▰▱|--------------------------| | |
|'-> |25B3-B9(07)|△▴▵▶▷▸▹|------------------------| | |
|'-> |25BB-BB(01)|▻|------------------------------| | |
|'-> |25BD-C3(07)|▽▾▿◀◁◂◃|------------------------| | |
|'-> |25C5-C9(05)|◅◆◇◈◉|--------------------------| | |
|'-> |25CC-CE(03)|◌◍◎|----------------------------| | |
|'-> |25D0-D7(08)|◐◑◒◓◔◕◖◗|-----------------------| | |
|'-> |25DA-E5(12)|◚◛◜◝◞◟◠◡◢◣◤◥|-------------------| | |
|'-> |25E7-F7(17)|◧◨◩◪◫◬◭◮◯◰◱◲◳◴◵◶◷|--------------| | |
|'-> |2600-13(20)|☀☁☂☃☄★☆☇☈☉☊☋☌☍☎☏☐☑☒☓|-----------| | |
|'-> |2619-38(32)|☙☚☛☜☝☞☟☠☡☢☣☤☥☦☧☨☩☪☫☬☭☮☯☰☱☲☳☴☵☶☷☸| | |
|'-> |2639-39(01)|☹|------------------------------| | |
|'-> |263D-3F(03)|☽☾☿|----------------------------| | |
|'-> |2641-41(01)|♁|------------------------------| | |
|'-> |2643-5F(29)|♃♄♅♆♇♈♉♊♋♌♍♎♏♐♑♒♓♔♕♖♗♘♙♚♛♜♝♞♟|--| | |
|'-> |2661-62(02)|♡♢|-----------------------------| | |
|'-> |2664-64(01)|♤|------------------------------| | |
|'-> |2667-69(03)|♧♨♩|----------------------------| | |
|'-> |266C-71(06)|♬♭♮♯♰♱|-------------------------| | |
|'-> |FB00-00(01)|ﬀ|------------------------------|
|'-> |FB03-06(04)|ﬃﬄﬅﬆ|---------------------------|
|'-> |FB13-17(05)|ﬓﬔﬕﬖﬗ|--------------------------| | |
|'-> |FE20-23(04)|︠︡︢︣|---------------------------| | |
|'-> |FFF9-FC(04)||---------------------------| | |
MES-3KS is defined as: | |
>00 20-7E A0-FF | |
>01 00-81 8B-8C 8F 92 95 9A-9B 9E-9F A2-A3 A6 AA-AB B5-BB BE-CC D5-D6 DE-F7 | |
>01 FA-FF | |
>02 00-1F 24-27 2A-33 50-AD B0-EE | |
>03 00-4E 60-62 74-75 7A 7E 84-8A 8C 8E-A1 A3-CE D0-D7 DA-F3 | |
>04 00-86 88-89 8C-8F 90-C4 C7-C8 CB-CC D0-ED EE-F5 F8-F9 | |
>05 31-56 59-5F 61-87 89-8A | |
>10 D0-F6 FB | |
>1E 00-9B F2-F3 | |
>1F 00-15 18-1D 20-45 48-4D 50-57 59 5B 5D 5F-7D 80-B4 B6-C4 C6-D3 D6-DB DD-EF | |
>1F F2-F4 F6-FE | |
>20 00-46 48-4D 6A-70 74-8E A0-AF D0-E3 | |
>21 00-3A 53-83 90-F3 | |
>22 00-F1 | |
>23 00-7B 7D-9A | |
>24 40-4A | |
>25 00-95 A0-F7 | |
>26 00-13 19-6F 70-71 | |
>FB 00-06 13-17 | |
>FE 20-23 | |
>FF F9-FD | |
>#2671 | |
Here are the characters that are missing from MES-3KS but are included in | |
MES-3B: | |
|MES-3B |0182-8A(09)|ƂƃƄƅƆƇƈƉƊ|----------------------| | |
|'-> |018D-8E(02)|ƍƎ|-----------------------------| | |
|'-> |0190-91(02)|ƐƑ|-----------------------------| | |
|'-> |0193-94(02)|ƓƔ|-----------------------------| | |
|'-> |0196-99(04)|ƖƗƘƙ|---------------------------| | |
|'-> |019C-9D(02)|ƜƝ|-----------------------------| | |
|'-> |01A0-A1(02)|Ơơ|-----------------------------| | |
|'-> |01A4-A5(02)|Ƥƥ|-----------------------------| | |
|'-> |01A7-A9(03)|ƧƨƩ|----------------------------| | |
|'-> |01AC-B4(09)|ƬƭƮƯưƱƲƳƴ|----------------------| | |
|'-> |01BC-BD(02)|Ƽƽ|-----------------------------| | |
|'-> |01CD-D4(08)|ǍǎǏǐǑǒǓǔ|-----------------------| | |
|'-> |01D7-DD(07)|ǗǘǙǚǛǜǝ|------------------------| | |
|'-> |01F8-F9(02)|Ǹǹ|-----------------------------| | |
|'-> |0222-23(02)|Ȣȣ|-----------------------------| | |
|'-> |0228-29(02)|Ȩȩ|-----------------------------| | |
|'-> |1EA0-BF(32)|ẠạẢảẤấẦầẨẩẪẫẬậẮắẰằẲẳẴẵẶặẸẹẺẻẼẽẾế| | |
|'-> |1EC0-DF(32)|ỀềỂểỄễỆệỈỉỊịỌọỎỏỐốỒồỔổỖỗỘộỚớỜờỞở| | |
|'-> |1EE0-F1(18)|ỠỡỢợỤụỦủỨứỪừỬửỮữỰự|-------------| | |
|'-> |1EF4-F9(06)|ỴỵỶỷỸỹ|-------------------------| | |
MES-3B is defined as: | |
>00 20-7E A0-FF | |
>01 00-FF | |
>02 00-1F 22-33 50-AD B0-EE | |
>03 00-4E 60-62 74-75 7A 7E 84-8A 8C 8E-A1 A3-CE D0-D7 DA-F3 | |
>04 00-86 88-89 8C-C4 C7-C8 CB-CC D0-F5 F8-F9 | |
>05 31-56 59-5F 61-87 89-8A | |
>10 D0-F6 FB | |
>1E 00-9B A0-F9 | |
>1F 00-15 18-1D 20-45 48-4D 50-57 59 5B 5D 5F-7D 80-B4 B6-C4 C6-D3 D6-DB DD-EF | |
>1F F2-F4 F6-FE | |
>20 00-46 48-4D 6A-70 74-8E A0-AF D0-E3 | |
>21 00-3A 53-83 90-F3 | |
>22 00-F1 | |
>23 00-7B 7D-9A | |
>24 40-4A | |
>25 00-95 A0-F7 | |
>26 00-13 19-71 | |
>FB 00-06 13-17 | |
>FE 20-23 | |
>FF F9-FD | |
>#2819 | |
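As a cross-check, the counts in parentheses of the MES-3B rows above should
add up to the difference between the two totals: 2819 - 2671 = 148. A small
Java sketch that sums those counts (the class name and the constant holding
the rule column are illustrative only):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RowCountSum {

    // Rule column of the MES-3B rows, copied from the table above.
    static final String[] MES_3B_ONLY = {
        "0182-8A(09)", "018D-8E(02)", "0190-91(02)", "0193-94(02)", "0196-99(04)",
        "019C-9D(02)", "01A0-A1(02)", "01A4-A5(02)", "01A7-A9(03)", "01AC-B4(09)",
        "01BC-BD(02)", "01CD-D4(08)", "01D7-DD(07)", "01F8-F9(02)", "0222-23(02)",
        "0228-29(02)", "1EA0-BF(32)", "1EC0-DF(32)", "1EE0-F1(18)", "1EF4-F9(06)"
    };

    private static final Pattern COUNT = Pattern.compile("\\(([0-9]{2})\\)");

    /** Adds up the two-digit counts given in parentheses at the end of each rule. */
    static int sum(String... rules) {
        int total = 0;
        for (String rule : rules) {
            Matcher m = COUNT.matcher(rule);
            if (!m.find())
                throw new IllegalArgumentException("No count in rule: " + rule);
            total += Integer.parseInt(m.group(1));
        }
        return total;
    }

    public static void main(String[] args) {
        int extra = sum(MES_3B_ONLY);
        System.out.println(extra);        // prints 148
        System.out.println(2671 + extra); // prints 2819
    }
}
```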
Some unrelated subsets: | |
~~~~~~~~~~~~~~~~~~~~~~~ | |
Adobe also uses two subsets, AGL and AGLFN, to define glyph names. Their
character tables are not shown here (yet); for now, only their definitions are
provided for reference:
AGLFN: | |
>00 20-7E A1-AC AE-B1 B4-B4 B6-B8 BA-FF | |
>01 00-7F 92-92 A0-A1 AF-B0 E6-E7 FA-FF | |
>02 18-19 BC-BD C6-C7 D8-DD | |
>03 00-01 03-03 09-09 23-23 84-8A 8C-8C 8E-A1 A3-CE D1-D2 D5-D6 | |
>04 01-0C 0E-4F 51-5C 5E-5F 62-63 72-75 90-91 D9-D9 | |
>05 B0-B9 BB-C3 D0-EA F0-F2 | |
>06 0C-0C 1B-1B 1F-1F 21-3A 40-52 60-6A 6D-6D 79-79 7E-7E 86-86 88-88 91-91 | |
>06 98-98 A4-A4 AF-AF BA-BA D2-D2 D5-D5 | |
>1E 80-85 F2-F3 | |
>20 0C-0F 12-15 17-1E 20-22 24-26 2C-2E 30-30 32-33 39-3A 3C-3C 44-44 A1-A1 | |
>20 A3-A4 A7-A7 AA-AC | |
>21 05-05 11-11 13-13 16-16 18-18 1C-1C 1E-1E 22-22 2E-2E 35-35 53-54 5B-5E | |
>21 90-95 A8-A8 B5-B5 D0-D4 | |
>22 00-00 02-03 05-05 07-09 0B-0B 0F-0F 11-12 17-17 1A-1A 1D-20 27-2B 34-34 | |
>22 3C-3C 45-45 48-48 60-61 64-65 82-84 86-87 95-95 97-97 A5-A5 C5-C5 | |
>23 02-02 10-10 20-21 29-2A | |
>25 00-00 02-02 0C-0C 10-10 14-14 18-18 1C-1C 24-24 2C-2C 34-34 3C-3C 50-6C | |
>25 80-80 84-84 88-88 8C-8C 90-93 A0-A1 AA-AC B2-B2 BA-BA BC-BC C4-C4 CA-CB | |
>25 CF-CF D8-D9 E6-E6 | |
>26 3A-3C 40-40 42-42 60-60 63-63 65-66 6A-6B | |
>#?835 | |
AGL: | |
>00 01-7F A0-FF | |
>01 00-F5 FA-FF | |
>02 00-19 50-61 63-69 6B-73 75-75 77-7F 81-8E 90-98 9A-9B 9D-9E A0-A8 B0-B2 | |
>02 B4-DE E0-E0 E3-E9 | |
>03 00-25 27-45 60-61 74-75 7A-7A 7E-7E 84-8A 8C-8C 8E-A1 A3-CE D0-D6 DA-DA | |
>03 DC-DC DE-DE E0-E0 E2-F3 | |
>04 01-0C 0E-4F | |
>05 31-87 89-89 90-C4 C7-C8 CB-CC D0-EB EE-F5 F8-F9 | |
>06 0C-0C 1B-1B 1F-1F 21-3A 40-52 60-6D 79-79 7E-7E 86-86 88-88 91-91 98-98 | |
>06 A4-A4 AF-AF BA-BA C1-C1 D1-D2 D5-D5 F0-F9 | |
>09 01-03 05-39 3C-4D 50-54 58-70 81-83 85-8C 8F-90 93-A8 AA-B0 B2-B2 B6-B9 | |
>09 BC-BC BE-C4 C7-C8 CB-CD D7-D7 DC-DD DF-E3 E6-FA | |
>0A 02-02 05-0A 0F-10 13-28 2A-30 32-32 35-36 38-39 3C-3C 3E-42 47-48 4B-4D | |
>0A 59-5C 5E-5E 66-74 81-83 85-8B 8D-8D 8F-91 93-A8 AA-B0 B2-B3 B5-B9 BC-BC | |
>0A BE-C5 C7-C9 CB-CD D0-D0 E0-E0 E6-EF | |
>0E 01-3A 3F-5B | |
>1E 00-9B A0-F9 | |
>20 02-02 0B-10 12-1E 20-22 24-26 2C-2E 30-30 32-33 35-35 39-3C 3E-3E 42-42 | |
>20 44-44 70-70 74-7A 7C-89 8D-8E A1-A4 A7-A7 A9-AC | |
>21 03-03 05-05 09-09 11-11 13-13 16-16 18-18 1C-1C 1E-1E 21-22 26-26 2B-2B | |
>21 2E-2E 35-35 53-54 5B-5E 60-6B 70-7B 90-99 A8-A8 B5-B5 BC-BC C0-C0 C4-C6 | |
>21 CD-CD CF-D4 DE-EA | |
>22 00-00 02-03 05-09 0B-0C 0F-0F 11-13 15-15 17-17 19-1A 1D-20 23-23 25-2C | |
>22 2E-2E 34-37 3C-3D 43-43 45-45 48-48 4C-4C 50-53 60-62 64-67 6A-6B 6E-73 | |
>22 76-77 79-7B 80-87 8A-8B 95-97 99-99 A3-A5 BF-BF C5-C5 CE-CF DA-DB EE-EE | |
>23 02-03 05-05 10-10 12-12 18-18 20-21 25-27 29-2B | |
>24 23-23 60-E9 | |
>25 00-00 02-02 0C-0C 10-10 14-14 18-18 1C-1C 24-24 2C-2C 34-34 3C-3C 50-6C | |
>25 80-80 84-84 88-88 8C-8C 90-93 A0-A1 A3-AC B2-B7 B9-BA BC-BD BF-C1 C3-C4 | |
>25 C6-CC CE-D1 D8-D9 E2-E6 EF-EF | |
>26 05-06 0E-0F 1C-1F 2F-2F 3A-3C 40-42 60-6D 6F-6F | |
>27 13-13 8A-92 9E-9E | |
>30 00-19 1C-1E 20-29 36-36 41-94 9B-9E A1-FE | |
>31 05-29 31-8E | |
>32 00-1C 20-40 42-43 60-7B 7F-7F 8A-90 94-94 96-96 98-99 9D-9E A3-A9 | |
>33 00-00 03-03 05-05 0D-0D 14-16 18-18 1E-1E 22-23 26-27 2A-2B 31-31 33-33 | |
>33 36-36 39-39 3B-3B 42-42 47-47 49-4A 4D-4E 51-51 57-57 7B-CB CD-D6 D8-D8 | |
>33 DB-DD | |
>53 44-44 | |
>F6 BE-C0 C3-FF | |
>F7 21-21 24-24 26-26 30-39 3F-3F 60-7A A1-A2 A8-A8 AF-AF B4-B4 B8-B8 BF-BF | |
>F7 E0-F6 F8-FF | |
>F8 84-99 E5-FF | |
>FB 00-04 1F-20 2A-36 38-3C 3E-3E 40-41 43-44 46-4F 57-59 67-69 6B-6D 7B-7D | |
>FB 89-89 8B-8B 8D-8D 93-95 9F-9F A4-A5 A7-A9 AF-AF | |
>FC 08-08 0B-0C 0E-0E 48-48 4B-4B 4E-4E 58-58 5E-62 6D-6D 73-73 8D-8D 94-94 | |
>FC 9F-9F A1-A2 A4-A4 C9-CC D1-D2 D5-D5 DD-DD | |
>FD 3E-3F 88-88 F2-F2 FA-FA | |
>FE 30-44 49-50 52-52 54-55 59-5F 61-66 69-6B 82-82 84-84 86-86 88-88 8A-8C | |
>FE 8E-8E 90-92 94-94 96-98 9A-9C 9E-A0 A2-A4 A6-A8 AA-AA AC-AC AE-AE B0-B0 | |
>FE B2-B4 B6-B8 BA-BC BE-C0 C2-C4 C6-C8 CA-CC CE-D0 D2-D4 D6-D8 DA-DC DE-E0 | |
>FE E2-E4 E6-E8 EA-EC EE-EE F0-F0 F2-FC FF-FF | |
>FF 01-5E 61-9F E0-E1 E3-E3 E5-E6 | |
>#?3548 |
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.*;

public class SubsetTextVerifier {

    // Line currently being checked and its number; static so that fail() can report them.
    static String line;
    static int lineNumber;

    public static void main(String[] args) throws IOException {
        // Path of the subsets file: first argument if given, else the original hard-coded location.
        String path = args.length > 0 ? args[0] : "C:\\Users\\Michi\\Desktop\\subsets.txt";
        // seen:    characters listed in any table row so far
        // nostars: characters listed in table rows that are not marked with (*)
        // current: characters of the definition block (">" lines) being read
        boolean[][] seen = new boolean[256][], nostars = new boolean[256][], current = new boolean[256][];
        Pattern rulePattern = Pattern.compile("([0-9A-F]{4}-[0-9A-F]{2}(?:,[0-9A-F]{2}-[0-9A-F]{2})*)\\(([0-9]{2})\\)");
        Pattern definitionPattern = Pattern.compile(">[0-9A-F]{2}( [0-9A-F]{2}(-[0-9A-F]{2})?)+");
        // Decode explicitly as UTF-8 (assumed file encoding); FileReader would use the platform default charset.
        try (BufferedReader br = Files.newBufferedReader(Paths.get(path), StandardCharsets.UTF_8)) {
            while ((line = br.readLine()) != null) {
                lineNumber++;
                if (line.startsWith("|")) {
                    // Table row: |label|rule(count)|characters|padding|
                    if (line.length() != 78)
                        fail("Invalid line length " + line.length());
                    String[] parts = line.split("\\|", 4);
                    boolean star = parts[1].contains("(*)");
                    Matcher m = rulePattern.matcher(parts[2]);
                    if (!m.matches())
                        fail("Invalid rule: '" + parts[2] + "'");
                    int count = Integer.parseInt(m.group(2));
                    StringBuilder chars = new StringBuilder();
                    String ranges = m.group(1);
                    // The first two hex digits are the high byte; each range is "XX-YY" plus a separator.
                    int base = Integer.parseInt(ranges.substring(0, 2), 16) * 0x100;
                    for (int i = 2; i < ranges.length(); i += 6) {
                        int from = Integer.parseInt(ranges.substring(i, i + 2), 16);
                        int to = Integer.parseInt(ranges.substring(i + 3, i + 5), 16);
                        for (int j = from; j <= to; j++) {
                            char ch = (char) (base + j);
                            chars.append(ch);
                            addChar(seen, ch);
                            if (!star)
                                addChar(nostars, ch);
                        }
                    }
                    if (chars.length() != count)
                        fail("Invalid count: " + count + " (should be " + chars.length() + ")");
                    // Rebuild the expected character column: characters, '|', dashes up to 32, closing '|'.
                    chars.append('|');
                    while (chars.length() < 32)
                        chars.append('-');
                    if (chars.length() < 33)
                        chars.append('|');
                    if (!chars.toString().equals(parts[3]))
                        fail("Invalid character list '" + parts[3] + "' should be '" + chars + "'");
                } else if (line.startsWith(">#")) {
                    // End of a definition block: ">#N" compares against all rows so far,
                    // ">#*N" against unstarred rows only, ">#?N" checks the count only.
                    boolean[][] check = seen;
                    int countStart = 2;
                    if (line.charAt(2) == '*') {
                        check = nostars;
                        countStart = 3;
                    } else if (line.charAt(2) == '?') {
                        check = current;
                        countStart = 3;
                    }
                    int count = Integer.parseInt(line.substring(countStart));
                    for (int i = 0; i < check.length; i++) {
                        if (check[i] == null ^ current[i] == null) {
                            // Present in the rows (check) means it is absent from the definition, and vice versa.
                            fail("U+" + Integer.toHexString(i) + "xx is missing from " + (check[i] != null ? "definitions" : "rules"));
                        }
                        if (check[i] == null)
                            continue;
                        for (int j = 0; j < check[i].length; j++) {
                            if (check[i][j] != current[i][j])
                                fail("U+" + Integer.toHexString(i * 0x100 + j) + " is missing from " + (check[i][j] ? "definitions" : "rules"));
                            if (check[i][j])
                                count--;
                        }
                    }
                    if (count != 0)
                        fail("Count off by " + count);
                    current = new boolean[256][];
                } else if (line.startsWith(">")) {
                    // Definition line: ">HH LL LL-LL ..." (high byte, then low bytes or low-byte ranges).
                    if (!definitionPattern.matcher(line).matches())
                        fail("Invalid definition");
                    int base = Integer.parseInt(line.substring(1, 3), 16) * 0x100;
                    for (int i = 4; i < line.length(); i += 3) {
                        int from = Integer.parseInt(line.substring(i, i + 2), 16), to = from;
                        if (i + 2 < line.length() && line.charAt(i + 2) == '-') {
                            i += 3;
                            to = Integer.parseInt(line.substring(i, i + 2), 16);
                        }
                        for (int j = from; j <= to; j++) {
                            addChar(current, (char) (base + j));
                        }
                    }
                }
            }
        }
    }

    // Marks ch in the given table (allocated lazily per high byte); a duplicate is an error.
    private static void addChar(boolean[][] flags, char ch) throws IOException {
        if (flags[ch >> 8] == null)
            flags[ch >> 8] = new boolean[256];
        if (flags[ch >> 8][ch & 0xFF])
            fail("Add char twice: U+" + Integer.toHexString(ch));
        flags[ch >> 8][ch & 0xFF] = true;
    }

    // Always throws; the message includes the offending line and its number.
    private static IOException fail(String message) throws IOException {
        throw new IOException("In line " + lineNumber + ": " + line + "\r\n" + message);
    }
}
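The fixed row width that the verifier enforces comes from how the last column
is built: the characters, a "|", dashes up to a width of 32, and a closing "|"
unless the row already holds 32 characters. The following stand-alone sketch
mirrors that padding step. RowLayout and characterColumn are names chosen for
this illustration and are not part of the program above.

```java
final class RowLayout {

    /** Builds the expected character column for the given row characters. */
    static String characterColumn(String chars) {
        StringBuilder column = new StringBuilder(chars).append('|');
        // Pad with dashes so that shorter rows line up with full 32-character rows.
        while (column.length() < 32)
            column.append('-');
        // Full rows already end with '|'; all others get a closing '|' after the dashes.
        if (column.length() < 33)
            column.append('|');
        return column.toString();
    }

    public static void main(String[] args) {
        System.out.println(characterColumn("0123456789"));
    }
}
```

With this layout every character column is exactly 33 characters long as long
as a row holds at most 32 characters, which is why the table never lists more
than 32 characters per row.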