Nay San fauxneticien
@fauxneticien
fauxneticien / README.md
Last active June 23, 2023 06:22
Research knowledge base with Zotero, Highlights, and Obsidian

Ever since I started working on my honours thesis in 2013, I have been tinkering with various workflows to manage references, PDFs, PDF annotations, and notes in a coherent way. The workflow I describe here is the latest one (as of writing, February 2021), and I think I've finally found something that satisfies many of my [admittedly very subjective] desiderata.

Knowledge provenance

Being [perhaps overly] wary of mis-citing something, I'd like to be able to quickly go back to the original source, and to the exact page and PDF highlight that I'm referring to. For the last couple of years, I have been using the Highlights app (macOS only, unfortunately, though there may be Windows/Linux equivalents). The two main features of Highlights are that:

  1. It automatically extracts PDF highlights and is able to keep them updated in a 'sidecar' file, so for a file like Ram_et_al_2020_Neural_Network.pdf, there'll be
fauxneticien / ah-zzz-16le-cz.fea
Last active September 27, 2020 15:35
phnrec test
fauxneticien / readme.md
Last active February 11, 2021 20:01
ubuntu-web-rstudio_sept2020

Launch RStudio server on Ubuntu 20.04

1. Launch server (DigitalOcean, Packet, etc.) with Ubuntu 20.04 and get IP XXX.XXX.XXX.XXX

Test that logging in works

ssh <user>@XXX.XXX.XXX.XXX
fauxneticien / intro.md
Last active June 24, 2020 19:58
Extending Allosaurus for Australian languages

Allosaurus

  • Allosaurus is a pretrained universal phone recognizer: https://github.com/xinjli/allosaurus
  • It has been trained on English, Japanese, Mandarin, Tagalog, Turkish, Vietnamese, German, Spanish, Amharic, Italian and Russian

Testing on Kaytetye

  • We test this off-the-shelf version on some Kaytetye data. The data are citation form headwords recorded by a female native speaker of Kaytetye in a music studio for a multimedia dictionary.
  • For 2,360 headwords, we have ~2 repetitions per word (e.g. palpalpe), and two transcriptions (t_1, t_2) by two independent human transcribers.
  • In the table below, int_t_dist is the inter-transcriber string distance and min_a_dist is the minimum string distance between the Allosaurus transcription and the two human transcriptions.
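The two distance columns described above can be sketched as follows. This is only a minimal illustration, assuming that "string distance" means plain Levenshtein edit distance (the source does not say which metric was used). The function names and the example transcriptions are hypothetical and are not taken from the Kaytetye data.

```python
from typing import Sequence, Tuple


def edit_distance(a: Sequence, b: Sequence) -> int:
    """Levenshtein distance between two sequences.

    Assumed metric: the source only says 'string distance'.
    """
    # prev[j] holds the distance between the processed prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        curr = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def transcription_distances(t_1: str, t_2: str, allosaurus: str) -> Tuple[int, int]:
    """Return (int_t_dist, min_a_dist) for one headword."""
    # Distance between the two human transcriptions
    int_t_dist = edit_distance(t_1, t_2)
    # Smaller of the two distances from the machine transcription to a human one
    min_a_dist = min(edit_distance(allosaurus, t_1), edit_distance(allosaurus, t_2))
    return int_t_dist, min_a_dist


# Made-up transcriptions, for illustration only
print(transcription_distances("palpalpe", "palpalpa", "balbalba"))
```

Because `edit_distance` accepts any sequence, passing lists of phone symbols instead of strings would compare phones rather than individual characters, which may matter when a phone is written with more than one character.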
fauxneticien / run-kayv-docker.sh
Last active June 29, 2019 10:46
Run and configure container for Kaytetye medial vowels analyses
# Get fauxneticien/kaytetye-medial-vowels
git clone https://github.com/fauxneticien/kaytetye-medial-vowels
# Set read/write permissions for all files
chmod -R 777 kaytetye-medial-vowels
# Run rocker/verse container in the background, name it 'kayv-docker', and
# set login password for 'rstudio' user to 'kayv' via the PASSWORD environment variable.
# The clone made above (in the current directory) is mounted as the rstudio home.
sudo docker run -it --rm -d -p 80:8787 -e PASSWORD=kayv -v "$PWD/kaytetye-medial-vowels":/home/rstudio --name kayv-docker rocker/verse
# Run setup.R inside the container (sudo to match the 'docker run' above)
sudo docker exec kayv-docker Rscript /home/rstudio/src/setup.R
fauxneticien / elpis-deploy.sh
Last active August 18, 2019 09:11
Elpis deploy script for ICLDC workshop
#!/bin/bash
# Commands taken from https://www.digitalocean.com/community/tutorials/how-to-install-and-use-docker-on-ubuntu-18-04
# Run using:
# bash <(curl -s https://gist.githubusercontent.com/fauxneticien/159529bf6c071f90b7fd70a481e6083b/raw/47c1d9e7c922797deee9fa2ce16387d6a3ed3624/elpis-deploy.sh)
sudo apt update
sudo apt install -y apt-transport-https ca-certificates curl software-properties-common
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu bionic stable"
sudo apt update
sudo apt install -y docker-ce
fauxneticien / install-docker-u1804.sh
Last active June 29, 2019 07:06
Install Docker on an instance of Ubuntu 18.04
#!/bin/bash
# Commands taken from https://www.digitalocean.com/community/tutorials/how-to-install-and-use-docker-on-ubuntu-18-04
sudo apt update
sudo apt install -y apt-transport-https ca-certificates curl software-properties-common
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu bionic stable"
sudo apt update
sudo apt install -y docker-ce
sudo systemctl start docker
fauxneticien / wav2formants_csvs.R
Created May 7, 2018 05:44
Extract formants from wav file in CSV format using Praat and wrassp::forest trackers
# install.packages(...) if you do not have the packages below
library(tidyverse)
library(stringr)
library(wrassp)
library(glue)
wav2formants_csvs <- function(in_file) {
  stopifnot(grepl("\\.wav$", in_file))
  stopifnot(file.exists(in_file))
fauxneticien / cons_comp_durs.R
Last active April 10, 2018 06:17
Plot mean durations of various components across multiple consonant types
library(tidyverse)
# Start with likely output of a group_by(...) %>% summarise(...) on raw data
# which computes the mean and error on various distributions of durations
# name each component-measure combination variable in a dot-separated form, e.g.
# clo.dur, clo.err (mean closure duration, standard error for closure duration)
tibble(
  consonant = c("Pre-stopped nasal", "Nasal", "Oral stop"),
  clo.dur = c(25, NA, 30),
  clo.err = c(5, NA, 6),
fauxneticien / rename_regex.R
Created April 5, 2018 03:21
Rename files with regex in R
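library(purrr)
library(stringr)

# Move the leading number to the end of the file name,
# e.g. a (hypothetical) "12_PDFsam_dbd-ABC.pdf" becomes "dbd-ABC_12.pdf"
list.files("~/git-repos/coedl/dbd/public/pdfs", pattern = "PDFsam", full.names = TRUE) %>%
  walk(function(path) {
    file.rename(
      path,
      # Escape the dot and anchor at the end so only a literal ".pdf" extension matches
      str_replace(path, "(\\d+)_PDFsam_(dbd-[A-Z]+)\\.pdf$", "\\2_\\1.pdf")
    )
  })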