Created
August 20, 2014 23:07
Julia Char predicates draft
{ | |
"metadata": { | |
"language": "Julia", | |
"name": "", | |
"signature": "sha256:e80d1a82f7cdd977679c61b2cbabab28b09bb50363bbf45c783529686408d275" | |
}, | |
"nbformat": 3, | |
"nbformat_minor": 0, | |
"worksheets": [ | |
{ | |
"cells": [ | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Character class predicates based on libmojibake/utf8proc.jl\n", | |
"### with reference to Haskell Char, Go unicode, and perl unicode\n", | |
"\n", | |
"Haskell references:\n", | |
"* http://hackage.haskell.org/package/base-4.7.0.1/docs/Data-Char.html \n", | |
"* http://hackage.haskell.org/package/base-4.7.0.1/docs/src/Data-Char.html#isMark \n", | |
"* http://hackage.haskell.org/package/base-4.7.0.1/docs/src/GHC-Unicode.html#isAlpha\n", | |
"\n", | |
"Go reference: http://golang.org/pkg/unicode/#IsPrint\n", | |
"\n", | |
"perl reference: http://search.cpan.org/~arc/perl-5.17.8/pod/perlunicode.pod\n", | |
"\n", | |
"\n", | |
"Most of the functions below are based on Unicode character categories. Exceptions: `isdigit`, `iscntrl`, `isspace`.\n", | |
"\n", | |
"The tests below add `isnumber`, which returns true for a broad range of numeric characters in contrast to the narrow range selected by `isdigit`. `isnumber` is a numeric-only counterpart to `isalnum`.\n", | |
"\n", | |
"Haskell Char provides other functions that might be of interest in Julia, for example `isMark` and `isSymbol`." | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### libmojibake utf8 category constants" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"\n", | |
"const UTF8PROC_CATEGORY_LU = 1 # Lu: Letter, Uppercase\n", | |
"const UTF8PROC_CATEGORY_LL = 2 # Ll: Letter, Lowercase\n", | |
"const UTF8PROC_CATEGORY_LT = 3 # Lt: Letter, Titlecase\n", | |
"const UTF8PROC_CATEGORY_LM = 4 # Lm: Letter, Modifier\n", | |
"const UTF8PROC_CATEGORY_LO = 5 # Lo: Letter, Other\n", | |
"const UTF8PROC_CATEGORY_MN = 6 # Mn: Mark, Non-Spacing\n", | |
"const UTF8PROC_CATEGORY_MC = 7 # Mc: Mark, Spacing Combining\n", | |
"const UTF8PROC_CATEGORY_ME = 8 # Me: Mark, Enclosing\n", | |
"const UTF8PROC_CATEGORY_ND = 9 # Nd: Number, Decimal\n", | |
"const UTF8PROC_CATEGORY_NL = 10 # Nl: Number, Letter\n", | |
"const UTF8PROC_CATEGORY_NO = 11 # No: Number, Other\n", | |
"const UTF8PROC_CATEGORY_PC = 12 # Pc: Punctuation, Connector\n", | |
"const UTF8PROC_CATEGORY_PD = 13 # Pd: Punctuation, Dash \n", | |
"const UTF8PROC_CATEGORY_PS = 14 # Ps: Punctuation, Open\n", | |
"const UTF8PROC_CATEGORY_PE = 15 # Pe: Punctuation, Close\n", | |
"const UTF8PROC_CATEGORY_PI = 16 # Pi: Punctuation, Initial Quote\n", | |
"const UTF8PROC_CATEGORY_PF = 17 # Pf: Punctuation, Final Quote\n", | |
"const UTF8PROC_CATEGORY_PO = 18 # Po: Punctuation, Other\n", | |
"const UTF8PROC_CATEGORY_SM = 19 # Sm: Symbol, Math\n", | |
"const UTF8PROC_CATEGORY_SC = 20 # Sc: Symbol, Currency\n", | |
"const UTF8PROC_CATEGORY_SK = 21 # Sk: Symbol, Modifier\n", | |
"const UTF8PROC_CATEGORY_SO = 22 # So: Symbol, Other\n", | |
"const UTF8PROC_CATEGORY_ZS = 23 # Zs: Separator, Space\n", | |
"const UTF8PROC_CATEGORY_ZL = 24 # Zl: Separator, Line\n", | |
"const UTF8PROC_CATEGORY_ZP = 25 # Zp: Separator, Paragraph\n", | |
"const UTF8PROC_CATEGORY_CC = 26 # Cc: Other, Control\n", | |
"const UTF8PROC_CATEGORY_CF = 27 # Cf: Other, Format\n", | |
"const UTF8PROC_CATEGORY_CS = 28 # Cs: Other, Surrogate\n", | |
"const UTF8PROC_CATEGORY_CO = 29 # Co: Other, Private Use\n", | |
"const UTF8PROC_CATEGORY_CN = 30 # Cn: Other, Not Assigned" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 1, | |
"text": [ | |
"30" | |
] | |
} | |
], | |
"prompt_number": 1 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"# variant of category_code that skips the special case for unassigned c\n", | |
"function category_code_assigned(c)\n", | |
" c > 0x10FFFF && return 0x0000 # see utf8proc_get_property docs\n", | |
" cat = unsafe_load(ccall(:utf8proc_get_property, Ptr{Uint16}, (Int32,), c))\n", | |
" # note: utf8proc returns 0, not UTF8PROC_CATEGORY_CN, for unassigned c\n", | |
" #cat == 0 ? 30 : cat # adds time and not needed by character class predicates\n", | |
"end" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 2, | |
"text": [ | |
"category_code_assigned (generic function with 1 method)" | |
] | |
} | |
], | |
"prompt_number": 2 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### islower / isupper -- follow Haskell \n", | |
"* TitleCase characters return true from `isupper`.\n", | |
"* Julia's isupper appears to follow this convention already." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"islower_moji(c::Char) = (category_code_assigned(c)==UTF8PROC_CATEGORY_LL)" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 3, | |
"text": [ | |
"islower_moji (generic function with 1 method)" | |
] | |
} | |
], | |
"prompt_number": 3 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"# follows Haskell's isUpper() -- uses uppercase + titlecase\n", | |
"function isupper_moji(c::Char) \n", | |
" ccode=category_code_assigned(c)\n", | |
" return ccode==UTF8PROC_CATEGORY_LU || ccode==UTF8PROC_CATEGORY_LT\n", | |
"end" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 4, | |
"text": [ | |
"isupper_moji (generic function with 1 method)" | |
] | |
} | |
], | |
"prompt_number": 4 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"isupper_moji('H'), isupper_moji('h'), isupper_moji('\u0394'), isupper_moji('\u03b4')" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 5, | |
"text": [ | |
"(true,false,true,false)" | |
] | |
} | |
], | |
"prompt_number": 5 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"#TitleCase example\n", | |
"DZ = char(0x0001C5)" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 6, | |
"text": [ | |
"'\u01c5'" | |
] | |
} | |
], | |
"prompt_number": 6 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"isupper(DZ), isupper_moji(DZ)" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 7, | |
"text": [ | |
"(true,true)" | |
] | |
} | |
], | |
"prompt_number": 7 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### isalpha, isdigit, isnumber, isalnum -- follow Haskell" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"# follows Haskell's isLetter()\n", | |
"function isalpha_moji(c::Char)\n", | |
" ccode=category_code_assigned(c)\n", | |
" return (UTF8PROC_CATEGORY_LU <= ccode <= UTF8PROC_CATEGORY_LO) \n", | |
"end" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 8, | |
"text": [ | |
"isalpha_moji (generic function with 1 method)" | |
] | |
} | |
], | |
"prompt_number": 8 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"isalpha_moji('H'), isalpha_moji(' '), isalpha_moji('4'), isalpha_moji('\u221a')" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 9, | |
"text": [ | |
"(true,false,false,false)" | |
] | |
} | |
], | |
"prompt_number": 9 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"# follows Haskell's isDigit() -- ASCII '0'-'9'\n", | |
"function isdigit_moji(c::Char)\n", | |
" return '0' <= c <= '9'\n", | |
"end" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 10, | |
"text": [ | |
"isdigit_moji (generic function with 1 method)" | |
] | |
} | |
], | |
"prompt_number": 10 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"isdigit_moji('3'), isdigit_moji('a'), isdigit_moji('\u03b4') " | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 11, | |
"text": [ | |
"(true,false,false)" | |
] | |
} | |
], | |
"prompt_number": 11 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"# follows Haskell's isNumber()\n", | |
"function isnumber_moji(c::Char)\n", | |
" ccode=category_code_assigned(c)\n", | |
" return (UTF8PROC_CATEGORY_ND <= ccode <= UTF8PROC_CATEGORY_NO) \n", | |
"end" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 12, | |
"text": [ | |
"isnumber_moji (generic function with 1 method)" | |
] | |
} | |
], | |
"prompt_number": 12 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"isnumber_moji('0'), isnumber_moji('g'), isnumber_moji('\u22d2') " | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 13, | |
"text": [ | |
"(true,false,false)" | |
] | |
} | |
], | |
"prompt_number": 13 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"arabicnum = char(0x0663)" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 14, | |
"text": [ | |
"'\u0663'" | |
] | |
} | |
], | |
"prompt_number": 14 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"isnumber_moji(arabicnum)" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 15, | |
"text": [ | |
"true" | |
] | |
} | |
], | |
"prompt_number": 15 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"# follows Haskell's isAlphaNum()\n", | |
"function isalnum_moji(c::Char)\n", | |
" ccode=category_code_assigned(c)\n", | |
" return (UTF8PROC_CATEGORY_LU <= ccode <= UTF8PROC_CATEGORY_LO) ||\n", | |
" (UTF8PROC_CATEGORY_ND <= ccode <= UTF8PROC_CATEGORY_NO)\n", | |
"end" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 16, | |
"text": [ | |
"isalnum_moji (generic function with 1 method)" | |
] | |
} | |
], | |
"prompt_number": 16 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"isalnum_moji('0'), isalnum_moji('B'), isalnum_moji('\u03a3'), isalnum_moji('\u221a'), " | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 17, | |
"text": [ | |
"(true,true,true,false)" | |
] | |
} | |
], | |
"prompt_number": 17 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### iscntrl, ispunct -- follow Haskell" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"#= \n", | |
"Haskell isControl: \"Selects control characters, which are the non-printing characters of the Latin-1 subset of Unicode.\"\n", | |
"\n", | |
"Go IsControl: \"IsControl reports whether the rune is a control character. \n", | |
" The C (Other) Unicode category includes more code points such as surrogates;\n", | |
" use Is(C, r) to test for them.\"\n", | |
"=#\n", | |
"function iscntrl_moji(c::Char)\n", | |
" #http://en.wikipedia.org/wiki/Control_characters\n", | |
" return (uint(c)<=0x1f || 0x7f<=uint(c)<=0x9f) # C0 controls 0x00-0x1f, DEL and C1 controls 0x7f-0x9f\n", | |
"end" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 18, | |
"text": [ | |
"iscntrl_moji (generic function with 1 method)" | |
] | |
} | |
], | |
"prompt_number": 18 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"iscntrl_moji('\\t'), iscntrl_moji('Z') " | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 19, | |
"text": [ | |
"(true,false)" | |
] | |
} | |
], | |
"prompt_number": 19 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"# follows Haskell's isPunctuation()\n", | |
"function ispunct_moji(c::Char)\n", | |
" ccode=category_code_assigned(c)\n", | |
" return (UTF8PROC_CATEGORY_PC <= ccode <= UTF8PROC_CATEGORY_PO) \n", | |
"end" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 20, | |
"text": [ | |
"ispunct_moji (generic function with 1 method)" | |
] | |
} | |
], | |
"prompt_number": 20 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"ispunct_moji(','), ispunct_moji('!'), ispunct_moji('8')" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 21, | |
"text": [ | |
"(true,true,false)" | |
] | |
} | |
], | |
"prompt_number": 21 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"upunct = char(0x002021)" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 22, | |
"text": [ | |
"'\u2021'" | |
] | |
} | |
], | |
"prompt_number": 22 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"ispunct(upunct), ispunct_moji(upunct)" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 23, | |
"text": [ | |
"(true,true)" | |
] | |
} | |
], | |
"prompt_number": 23 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### isspace: follows Go\n", | |
" Go includes next line (NEL, U+0085) and non-breaking space (NBSP, U+00A0), unlike Haskell and Julia's current isspace. \n", | |
" **This definition is a breaking change with respect to NEL and NBSP.**" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"const NEL = char(0x000085)\n", | |
"const NBSP = char(0x0000A0)" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 24, | |
"text": [ | |
"'\u00a0'" | |
] | |
} | |
], | |
"prompt_number": 24 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"# \n", | |
"# Haskell isSpace: Returns True for any Unicode space character, and the control characters \\t, \\n, \\r, \\f, \\v.\n", | |
"#= Go IsSpace\n", | |
"\"IsSpace reports whether the rune is a space character as defined by Unicode's White Space property; in the Latin-1 space this is\n", | |
"\n", | |
"'\\t', '\\n', '\\v', '\\f', '\\r', ' ', U+0085 (NEL), U+00A0 (NBSP).\n", | |
"Other definitions of spacing characters are set by category Z and property Pattern_White_Space.\"\n", | |
"=#\n", | |
"\n", | |
"function isspace_moji(c::Char)\n", | |
" return c in (' ','\\t','\\n','\\r','\\f','\\v', NEL, NBSP) || \n", | |
" category_code_assigned(c)==UTF8PROC_CATEGORY_ZS\n", | |
"end" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 25, | |
"text": [ | |
"isspace_moji (generic function with 1 method)" | |
] | |
} | |
], | |
"prompt_number": 25 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"category_code_assigned('\\t')==UTF8PROC_CATEGORY_ZS" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 26, | |
"text": [ | |
"false" | |
] | |
} | |
], | |
"prompt_number": 26 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"* Julia's isspace() currently returns false for NEL on both Windows and Linux\n", | |
"* For NBSP it returns true on Windows but false on Linux" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"isspace_moji(' '), isspace_moji('\\n'), isspace_moji('T') " | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 27, | |
"text": [ | |
"(true,true,false)" | |
] | |
} | |
], | |
"prompt_number": 27 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"isspace(NEL), isspace_moji(NEL), isspace(NBSP), isspace_moji(NBSP)" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 28, | |
"text": [ | |
"(false,true,true,true)" | |
] | |
} | |
], | |
"prompt_number": 28 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### isprint - follows perl and current Julia manual definition\n", | |
"* Julia's isprint() is currently buggy on Windows (e.g. '\\t' returns true)\n", | |
"* Haskell Char does not have `isGraph`. It does have `isPrint`. \n", | |
"* Go does have `IsGraphic`, and its Unicode docs are clearer for these. \n", | |
"* From perl: \"\\p{Print} This matches any character that is graphical or blank, except controls.\" " | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"* On Windows, isprint('\\t') returns true (incorrectly)\n", | |
"* On Linux, isprint('\\t') returns false (correctly)\n" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"#= Julia's isprint does not match any of these currently.\n", | |
" Haskell isPrint: \"Selects printable Unicode characters \n", | |
" (letters, numbers, marks, punctuation, symbols and spaces).\"\n", | |
" Go: \"IsPrint reports whether the rune is defined as printable by Go. \n", | |
" Such characters include letters, marks, numbers, punctuation, symbols, and the ASCII space character, \n", | |
" from categories L, M, N, P, S and the ASCII space character. \n", | |
" This categorization is the same as IsGraphic \n", | |
" except that the only spacing character is ASCII space, U+0020.\"\n", | |
"\n", | |
" From perl: \"\\p{Print} This matches any character that is graphical or blank, except controls.\"\n", | |
"=#\n", | |
"# Julia isprint: includes spaces\n", | |
"function isprint_moji(c::Char)\n", | |
" ccode=category_code_assigned(c)\n", | |
" return (UTF8PROC_CATEGORY_LU <= ccode <= UTF8PROC_CATEGORY_ZS) \n", | |
"end" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 29, | |
"text": [ | |
"isprint_moji (generic function with 1 method)" | |
] | |
} | |
], | |
"prompt_number": 29 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"isprint('a'), isprint(' '), isprint('\\t')" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 30, | |
"text": [ | |
"(true,true,true)" | |
] | |
} | |
], | |
"prompt_number": 30 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"isprint_moji('a'), isprint_moji(' '), isprint_moji('\\t')" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 31, | |
"text": [ | |
"(true,true,false)" | |
] | |
} | |
], | |
"prompt_number": 31 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### isgraph - follows perl and current Julia manual definition\n", | |
"* From perl: \"\\p{Graph} Matches any character that is graphic. \n", | |
" Theoretically, this means a character that on a printer would cause ink to be used.\"\n", | |
"* Julia isgraph: excludes spaces " | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"function isgraph_moji(c::Char)\n", | |
" ccode=category_code_assigned(c)\n", | |
" return (UTF8PROC_CATEGORY_LU <= ccode <= UTF8PROC_CATEGORY_SO) \n", | |
"end" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 32, | |
"text": [ | |
"isgraph_moji (generic function with 1 method)" | |
] | |
} | |
], | |
"prompt_number": 32 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"isgraph('a'), isgraph(' '), isgraph('\\t')" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 33, | |
"text": [ | |
"(true,false,false)" | |
] | |
} | |
], | |
"prompt_number": 33 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"isgraph_moji('a'), isgraph_moji(' '), isgraph_moji('\\t')" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"ename": "LoadError", | |
"evalue": "ccode not defined\nwhile loading In[34], in expression starting on line 1", | |
"output_type": "pyerr", | |
"traceback": [ | |
"ccode not defined\nwhile loading In[34], in expression starting on line 1", | |
" in isgraph_moji at In[32]:3" | |
] | |
} | |
], | |
"prompt_number": 34 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
";ipython nbconvert Character_Classes.ipynb" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"output_type": "stream", | |
"stream": "stderr", | |
"text": [ | |
"[NbConvertApp] Using existing profile dir: u'C:\\\\Users\\\\keithc\\\\.ipython\\\\profile_default'\r\n" | |
] | |
}, | |
{ | |
"output_type": "stream", | |
"stream": "stderr", | |
"text": [ | |
"[NbConvertApp] Converting notebook Character_Classes.ipynb to html\r\n", | |
"[NbConvertApp] Support files will be in Character_Classes_files\\\r\n" | |
] | |
}, | |
{ | |
"output_type": "stream", | |
"stream": "stderr", | |
"text": [ | |
"[NbConvertApp] Loaded template full.tpl\r\n" | |
] | |
}, | |
{ | |
"output_type": "stream", | |
"stream": "stderr", | |
"text": [ | |
"[NbConvertApp] Writing 231801 bytes to Character_Classes.html\r\n" | |
] | |
} | |
], | |
"prompt_number": 35 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [], | |
"prompt_number": 36 | |
} | |
], | |
"metadata": {} | |
} | |
] | |
} |
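The category-range predicates in the notebook can be sketched compactly for a current Julia release. This is only an illustration under stated assumptions: it targets Julia 1.x rather than the 0.3-era syntax used above, it relies on `Base.Unicode.category_code`, which is an unexported internal and may change between versions, and the `*_sketch` names and short constant names are hypothetical. The category numbers are copied from the constants table in the notebook.

```julia
# Assumption: Julia 1.x. Base.Unicode.category_code is an unexported
# internal function, so this sketch may need adjusting across versions.
const category = Base.Unicode.category_code

# Category numbers copied from the constants table in the notebook
# (short names are hypothetical, chosen for this sketch only).
const LU, LO = 1, 5      # letters: Lu .. Lo
const ND, NO = 9, 11     # numbers: Nd .. No
const PC, PO = 12, 18    # punctuation: Pc .. Po
const SO, ZS = 22, 23    # last symbol category, space separator

# Each predicate is a single range test on the category code.
isalpha_sketch(c::Char)  = LU <= category(c) <= LO
isnumber_sketch(c::Char) = ND <= category(c) <= NO
ispunct_sketch(c::Char)  = PC <= category(c) <= PO
isgraph_sketch(c::Char)  = LU <= category(c) <= SO   # excludes spaces
isprint_sketch(c::Char)  = LU <= category(c) <= ZS   # includes Zs spaces

# Print the five predicate results for a few sample characters.
for c in ('a', '3', '\u0663', ',', ' ', '\t')
    println(repr(c), " => ", (isalpha_sketch(c), isnumber_sketch(c),
                              ispunct_sketch(c), isgraph_sketch(c),
                              isprint_sketch(c)))
end
```

Because the categories are numbered contiguously by major class, each predicate needs only one lookup and one chained comparison, which is the same design the notebook's `*_moji` functions use.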