See here for how to add code that disconnects from the SQLite database.
The code below is a personal note on filtering an SQLite database with regular expressions by activating the REGEXP function in SQL. Accepted SQL regex patterns are available here. In general, it appears that SQL regex does not accept look-ahead and look-behind (?).
library(DBI)
#> Warning: package 'DBI' was built under R version 4.3.3
library(tidyverse)
library(RSQLite)
#> Warning: package 'RSQLite' was built under R version 4.3.3
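A minimal sketch of the idea, assuming that "activating REGEXP" refers to RSQLite's `initRegExp()` helper. The in-memory database, the `words` table, and the pattern are hypothetical and only serve as illustration.

```r
library(DBI)
library(RSQLite)

# Hypothetical in-memory database with a toy table
con <- dbConnect(SQLite(), ":memory:")
dbWriteTable(con, "words", data.frame(form = c("kaka", "koko", "pia")))

# Register the REGEXP operator for this connection
initRegExp(con)

# Keep only the forms that start with "k" and end with "o"
res <- dbGetQuery(con, "SELECT form FROM words WHERE form REGEXP '^k.*o$'")

# Disconnect once the query is done
dbDisconnect(con)
```

The REGEXP operator has to be registered again for every new connection, so the call to `initRegExp()` belongs right after `dbConnect()`.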
This gist provides my personal notes and experience (as a non-Python [but a long-term R] user) in trying to create a CLTS-conformant initial orthography profile from a CLDF dataset using the profile command that is part of the lingpy Python package.
- The tutorial for generating an initial orthography profile (after generating a valid CLDF dataset under the cldf directory) is available here
- I have created a valid cldf dataset for Enggano Holle List using R
The case is the Enolex repo I forked from engganolang
If the forked repo in gederajeg/enolex (main branch) is n-commits behind the upstream engganolang/enolex (main branch), and we want to sync, I'll do (with GitHub CLI):
gh repo sync gederajeg/enolex -b main
# results: ✓ Synced the "gederajeg:main" branch from "engganolang:main"
Tutorial: https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork
- open the PyCharm project (e.g., a working directory)
- on my Dell Windows machine, select the Python interpreter that is in the cldf folder (the one that has the .venv folder); selecting this interpreter already activates the .venv
- then in the terminal, click the dropdown arrow and select Command Prompt
- in the terminal, test by running cldfbench to check that it reads the cldfbench module (which it does)
Here is the link on how to copy selected (cherry-picked) commits from an old repo to a new repo:
In the transcription of Kähler (1987), the transcribers directly changed the original orthography into the standardised one. For instance, the original orthography for a nasalised long vowel like ȭ was changed into õõ using the Keyman keyboard.
When õõ is generated using Keyman, it contains four characters under the hood (i.e., two combinations of o + ◌̃).
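The four-character structure can be checked in base R. This is only a sketch: it assumes the Keyman output is the decomposed sequence o + U+0303 (combining tilde) typed twice, and the replacement target ȭ (U+022D) is chosen purely for illustration.

```r
# "õõ" as two o + combining-tilde pairs (assumed Keyman output)
decomposed <- "o\u0303o\u0303"

# Number of code points in the string
n_chars <- nchar(decomposed, type = "chars")

# The underlying code points: o, combining tilde, o, combining tilde
code_points <- utf8ToInt(decomposed)

# Search and replace the full sequence literally (fixed = TRUE),
# here with the single precomposed character U+022D as a hypothetical target
word <- "k\u006f\u0303\u006f\u0303"
replaced <- gsub(decomposed, "\u022d", word, fixed = TRUE)
```

Writing the pattern with `\u` escapes makes the base letter and the combining diacritic explicit, which avoids depending on how an editor renders or normalises the characters.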
In order to search and replace these multi-byte characters that combine letters and diacritics, we need to:
- RStudio doesn't like the Microsoft app installation of python
- So, install python via Anaconda
- install the reticulate package
- Find out the executable python in the system by opening the iPython interpreter from Anaconda, then run the code below:
This is a regex range for subscript numbers: "[\U2080-\U2089]".
- Source from here.
- Motivation: I was dealing with the Austronesian Comparative Dictionary CLDF data, whose forms contain homonym subscripts.
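As a small illustration of the range above, the sketch below strips subscript digits from a few made-up forms with base R. The example forms are hypothetical and are not taken from the dictionary data.

```r
# Hypothetical forms carrying homonym subscripts (U+2081, U+2082)
forms <- c("kaka\u2081", "kaka\u2082", "pia")

# Regex range covering the subscript digits 0-9 (U+2080 to U+2089)
subscript_rgx <- "[\u2080-\u2089]"

# Flag the forms that carry a subscript, then strip it
has_subscript <- grepl(subscript_rgx, forms)
cleaned <- gsub(subscript_rgx, "", forms)
```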
This is inspired by a post here on using the tidyverse to retrieve the column(s) whose contents match a certain value.
df |> select(where(function(x) any(grepl(",", x))))
The code above retrieves any column that has a comma in at least one of its values.
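A runnable version of the same call on a toy data frame. The column names and values are made up for illustration, and dplyr is loaded directly here instead of the full tidyverse.

```r
library(dplyr)

# Hypothetical data: only `gloss` contains a comma
df <- tibble(
  id    = 1:3,
  form  = c("a", "b", "c"),
  gloss = c("eat, consume", "sleep", "walk")
)

# Keep the columns where at least one value contains a comma
with_comma <- df |> select(where(function(x) any(grepl(",", x))))
selected_cols <- names(with_comma)
```

`grepl()` coerces non-character columns such as `id` to character before matching, so the predicate can be applied to every column without filtering by type first.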