Andj andjc

  • Melbourne, Australia
###
# Downloads and parses https://lh.2xlibre.net/locales/ into a
# JSON file with the following fields:
# - code: locale code, e.g. 'en_GB'
# - suffix: locale code suffix, e.g. 'latin' from 'be_BY@latin'
# - name: locale language name, e.g. 'English' from 'en_GB'
# - country: locale country, title-cased, e.g. 'United Kingdom' from 'en_GB'
# Settings for where to save the HTML file and the locale file can be found below.
###
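A minimal sketch of how the `code` and `suffix` fields might be split out of a glibc-style locale code, assuming codes follow the `language[_TERRITORY][.codeset][@modifier]` pattern. The regex and the `split_locale_code` name are hypothetical and not taken from the original script; the `name` and `country` fields would still need lookup tables, which are not shown.

```python
import re

# Hypothetical pattern for glibc-style locale codes:
# language[_TERRITORY][.codeset][@modifier], e.g. "en_GB", "be_BY@latin".
LOCALE_RE = re.compile(
    r"^(?P<lang>[a-z]{2,3})"          # ISO 639 language code
    r"(?:_(?P<territory>[A-Z]{2}))?"  # optional ISO 3166 territory
    r"(?:\.(?P<codeset>[\w-]+))?"     # optional codeset, e.g. "UTF-8"
    r"(?:@(?P<suffix>\w+))?$"         # optional modifier, e.g. "latin"
)


def split_locale_code(code: str) -> dict:
    """Split a locale code into the code/suffix fields plus its language and territory."""
    match = LOCALE_RE.match(code)
    if match is None:
        raise ValueError(f"unrecognised locale code: {code!r}")
    return {
        "code": code,
        "language": match["lang"],
        "territory": match["territory"],
        "suffix": match["suffix"] or "",  # empty string when there is no @modifier
    }


print(split_locale_code("be_BY@latin"))
# {'code': 'be_BY@latin', 'language': 'be', 'territory': 'BY', 'suffix': 'latin'}
```

Codes such as `C` or `POSIX` do not match the pattern and raise `ValueError`, so a real scraper would need to decide whether to skip or special-case them.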
andjc / case_insensitive_sort.py
Last active January 6, 2025 08:56
Language and locale insensitive, case insensitive sort.
# Python functions to improve sorting of text in alphabetic scripts.
# Copyright 2025 Enabling Languages
#
# Permission is hereby granted, free of charge, to any person obtaining a copy of this software and
# associated documentation files (the “Software”), to deal in the Software without restriction, including
# without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is furnished to do so, subject
# to the following conditions:
#
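A minimal sketch of a locale-independent, case-insensitive sort key, assuming the aim is Unicode canonical caseless matching (NFD, then case folding, then NFD again) rather than a full collation algorithm. The `caseless_key` name is hypothetical; this is not the gist's own implementation.

```python
import unicodedata


def caseless_key(text: str) -> tuple:
    """Sort key based on Unicode canonical caseless matching: NFD(casefold(NFD(text))).

    The original string is the tie-breaker, so "Apple" and "apple" get a
    deterministic order instead of comparing equal.
    """
    folded = unicodedata.normalize("NFD", unicodedata.normalize("NFD", text).casefold())
    return (folded, text)


words = ["banana", "Apple", "cherry", "apple", "STRASSE", "Straße"]
print(sorted(words, key=caseless_key))
# ['Apple', 'apple', 'banana', 'cherry', 'STRASSE', 'Straße']
```

Because the key still compares code points, a decomposed accent sorts after plain letters (so "école" lands after "ezra"); dictionary-style ordering would need a UCA-based collator, for example through PyICU.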
andjc / marc8_eacc.md
Created May 22, 2024 07:05
MARC-8 and EACC
andjc / bidi_isolate.py
Created April 24, 2024 08:33
Repair malformed bidi strings and wrap them in a bidi isolate.
####################################################################################################
#
# Bidi isolation: bidiIsolate()
# Enabling Languages Python port of unicodeBidi.ts: https://github.com/signalapp/Signal-Desktop/blob/ce0fb220411b97722e1e080c14faa65d23165784/ts/util/unicodeBidi.ts
# Original code by Signal Messenger, LLC
# Released under the AGPL 3.0 license
#
####################################################################################################
import regex
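The gist is a port of Signal's `unicodeBidi.ts`. As an independent illustration of the same idea (not that port), here is a minimal standard-library sketch, assuming the input is a single paragraph and ignoring UAX #9's nesting limit of 125: drop unmatched PDF and PDI characters, close any initiators left open, then wrap the result in FSI and PDI. The function names are hypothetical.

```python
# Explicit directional formatting characters from UAX #9.
LRE, RLE, PDF, LRO, RLO = "\u202a", "\u202b", "\u202c", "\u202d", "\u202e"
LRI, RLI, FSI, PDI = "\u2066", "\u2067", "\u2068", "\u2069"

EMBEDDINGS = {LRE, RLE, LRO, RLO}  # closed by PDF
ISOLATES = {LRI, RLI, FSI}         # closed by PDI


def balance_bidi(text: str) -> str:
    """Drop unmatched closers and close any embeddings/isolates left open."""
    out, stack = [], []  # stack holds open initiators, innermost last
    for ch in text:
        if ch in EMBEDDINGS or ch in ISOLATES:
            stack.append(ch)
            out.append(ch)
        elif ch == PDF:
            # A PDF only closes an embedding opened inside the current isolate.
            if stack and stack[-1] in EMBEDDINGS:
                stack.pop()
                out.append(ch)
        elif ch == PDI:
            # A PDI closes the innermost isolate and any embeddings opened within it.
            if any(opener in ISOLATES for opener in stack):
                while stack.pop() not in ISOLATES:
                    pass
                out.append(ch)
        else:
            out.append(ch)
    # Close whatever is still open, innermost first.
    for opener in reversed(stack):
        out.append(PDF if opener in EMBEDDINGS else PDI)
    return "".join(out)


def bidi_isolate(text: str) -> str:
    """Wrap balanced text in a first-strong isolate (FSI ... PDI)."""
    return FSI + balance_bidi(text) + PDI
```

Wrapping in FSI rather than LRI or RLI lets the isolate take its direction from the first strong character of the inserted text, which suits user-supplied strings of unknown direction.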
andjc / installation_commands.txt
Last active April 21, 2024 13:40
Setup instructions for an Ubuntu 23.10 HashiCorp Vagrant box
sudo apt-get update
sudo apt-get install -y unzip git cmake python3-pip python3.11-venv libfreetype6-dev libharfbuzz-dev libfribidi-dev meson gtk-doc-tools libcairo2-dev libfontconfig-dev libjpeg-dev zlib1g-dev libpng-dev libtiff5-dev liblcms2-dev libwebp-dev libxcb1-dev
mkdir -p ~/tmp
cd ~/tmp
git clone https://github.com/HOST-Oman/libraqm.git
git clone https://github.com/ninja-build/ninja.git
cd ninja
./configure.py --bootstrap
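A hypothetical post-install check. It assumes the packages above are meant for building Pillow with libraqm support; the dependency list matches Pillow's build requirements, but the gist does not say so. The tool names come from the apt-get line; `ninja` is left out because the bootstrap build places the binary in `~/tmp/ninja`, which is not on `PATH` unless it is copied there later.

```python
import shutil

# Command-line tools installed by the apt-get line above (python3 is assumed present).
REQUIRED_TOOLS = ("git", "cmake", "meson", "unzip", "python3")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def pillow_has_raqm():
    """True/False from Pillow's feature check, or None if Pillow is not installed."""
    try:
        from PIL import features
    except ImportError:
        return None
    return features.check("raqm")


if __name__ == "__main__":
    print("missing tools:", missing_tools() or "none")
    print("Pillow raqm support:", pillow_has_raqm())
```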
andjc / localised_dataframe_persian.ipynb
Created April 14, 2024 20:52
localised_dataframe_persian.ipynb
andjc / localised_dataframe_persian.ipynb
Created April 13, 2024 12:46
localised_dataframe_persian.ipynb
andjc / rbbi.json
Last active March 22, 2024 01:05
Rule sets for custom break iterators
{
"din": "!!quoted_literals_only; $CR = [\\p{Grapheme_Cluster_Break = CR}]; $LF = [\\p{Grapheme_Cluster_Break = LF}]; $Control = [[\\p{Grapheme_Cluster_Break = Control}]]; $Extend = [[\\p{Grapheme_Cluster_Break = Extend}]]; $ZWJ = [\\p{Grapheme_Cluster_Break = ZWJ}]; $Regional_Indicator = [\\p{Grapheme_Cluster_Break = Regional_Indicator}]; $Prepend = [\\p{Grapheme_Cluster_Break = Prepend}]; $SpacingMark = [\\p{Grapheme_Cluster_Break = SpacingMark}]; $Virama = [\\p{Gujr}\\p{sc=Telu}\\p{sc=Mlym}\\p{sc=Orya}\\p{sc=Beng}\\p{sc=Deva}&\\p{Indic_Syllabic_Category=Virama}]; $LinkingConsonant = [\\p{Gujr}\\p{sc=Telu}\\p{sc=Mlym}\\p{sc=Orya}\\p{sc=Beng}\\p{sc=Deva}&\\p{Indic_Syllabic_Category=Consonant}]; $ExtCccZwj = [[\\p{gcb=Extend}-\\p{ccc=0}] \\p{gcb=ZWJ}]; $L = [\\p{Grapheme_Cluster_Break = L}]; $V = [\\p{Grapheme_Cluster_Break = V}]; $T = [\\p{Grapheme_Cluster_Break = T}]; $LV = [\\p{Grapheme_Cluster_Break = LV}]; $LVT = [\\p{Grapheme_Cluster_Break = LVT}]; $Extended_Pict = [:ExtPict:]; !!chain; 'AA'|'Aa'|
andjc / UAX_29.py
Created February 28, 2024 03:07 — forked from HughP/UAX_29.py
PyICU
# We start by loading up PyICU, whose bindings are imported as the "icu" module.
import icu
# Let's create a test text. Notice it contains some punctuation.
test = 'This is ("a") test!'
# We create a word break iterator. All break iterators in ICU are really RuleBasedBreakIterators,
# and we need to tell it which locale to take the word break rules from.
# Most locales use the same UAX #29 rules, so we will use English.
wb = icu.BreakIterator.createWordInstance(icu.Locale.getEnglish())
# An iterator is just that: it contains state, and we iterate over it.
# The state in this case is the text we want to break, so we set that.
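One way the truncated example might continue, sketched as a function: iterate over the boundaries, keep only segments whose rule status marks them as words, and slice the text by UTF-16 offsets so that supplementary-plane scripts such as Adlam are handled correctly. This assumes PyICU's `BreakIterator` yields successive boundaries when iterated and exposes `getRuleStatus()`; the threshold of 100 is ICU's `UBRK_WORD_NONE_LIMIT`. The function names are hypothetical.

```python
WORD_NONE_LIMIT = 100  # ICU's UBRK_WORD_NONE_LIMIT: lower statuses mark spaces and punctuation


def utf16_slice(text, start, end):
    """Slice text by UTF-16 code-unit offsets, as returned by ICU break iterators."""
    data = text.encode("utf-16-le")
    return data[2 * start:2 * end].decode("utf-16-le")


def words(text, locale="en"):
    """Return the word-like segments of text using ICU's UAX #29 word break rules."""
    import icu  # PyICU

    wb = icu.BreakIterator.createWordInstance(icu.Locale(locale))
    wb.setText(text)
    result = []
    start = wb.first()
    for end in wb:
        # The rule status describes the segment that ends at this boundary.
        if wb.getRuleStatus() >= WORD_NONE_LIMIT:
            result.append(utf16_slice(text, start, end))
        start = end
    return result
```

Slicing through UTF-16 bytes avoids a subtle bug: plain `text[start:end]` drifts out of step with ICU's offsets as soon as the text contains a character outside the Basic Multilingual Plane.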
andjc / african_script_fonts.md
Created December 6, 2023 05:29
List of fonts supporting African scripts.

African script fonts

Adlam

  • ADLaM Display – OFL 1.1; 1 file (Regular)
  • Ebrima – Commercial; 2 files (Regular, Bold)
  • Kigelia – Commercial; 6 files (Light, Light Italic, Regular, Italic, Bold, Bold Italic)
  • Noto Sans Adlam – OFL 1.1; 4 files (Regular, Medium, SemiBold, Bold); 1 variable font.
  • Noto Sans Adlam Unjoined – OFL 1.1; 4 files (Regular, Medium, SemiBold, Bold); 1 variable font.
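A minimal sketch for checking how much of the Adlam block a font covers, assuming fontTools is installed for reading the font's `cmap`. The set of assigned code points comes from Python's own `unicodedata`, so the total depends on the Unicode version bundled with the interpreter. The function names are hypothetical.

```python
import unicodedata

ADLAM_BLOCK = range(0x1E900, 0x1E960)  # Unicode Adlam block, U+1E900..U+1E95F


def assigned(block):
    """Code points in the block that this Python's Unicode database names."""
    return [cp for cp in block if unicodedata.name(chr(cp), None) is not None]


def coverage(cmap, block=ADLAM_BLOCK):
    """Fraction of the block's assigned code points present in a font cmap."""
    wanted = assigned(block)
    if not wanted:
        return 0.0
    return sum(cp in cmap for cp in wanted) / len(wanted)


def font_cmap(path):
    """Read a font's best Unicode cmap (code point -> glyph name) with fontTools."""
    from fontTools.ttLib import TTFont  # third-party: pip install fonttools

    return TTFont(path).getBestCmap()
```

For instance, `coverage(font_cmap(path))` for a local copy of one of the fonts listed above returns 1.0 when every assigned Adlam character has a glyph; the path is whatever the font file is called on your system.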