Duncan Garmonsway nacnudus

@nacnudus
nacnudus / luxembourg-time-use-unpivotr.R
Created September 15, 2018 10:02
Tidy a spreadsheet of the Luxembourg Time Use Survey with unpivotr
# Inspired by http://www.brodrigues.co/blog/2018-09-11-human_to_machine/
# https://twitter.com/brodriguesco/status/1039604517287931904
# "You can find the data I will use here. Click on the “Time use” folder and you can download the workbook."
# http://statistiques.public.lu/stat/ReportFolders/ReportFolder.aspx?IF_Language=eng&MainTheme=3&FldrName=1&RFPath=14306
library(tidyverse)
library(tidyxl)
library(unpivotr)
library(lubridate)
nacnudus / luxembourg-time-use-unpivotr-experimental.R
Created September 15, 2018 10:08
Tidy a spreadsheet of the Luxembourg Time Use Survey with experimental unpivotr branch
# Inspired by http://www.brodrigues.co/blog/2018-09-11-human_to_machine/
# https://twitter.com/brodriguesco/status/1039604517287931904
# "You can find the data I will use here. Click on the “Time use” folder and you can download the workbook."
# http://statistiques.public.lu/stat/ReportFolders/ReportFolder.aspx?IF_Language=eng&MainTheme=3&FldrName=1&RFPath=14306
# This time using experimental unpivotr code to allow custom filtering of header cells, rather than having to reposition them.
# https://github.com/nacnudus/unpivotr/commit/0961ec3c3e17b34755f0fce94db7f5bf380d43ce
library(tidyverse)
library(tidyxl)
library(tidyverse)
library(ompr)
library(ompr.roi)
library(ROI.plugin.glpk)
M <- 3 # Volunteers (rows)
N <- 4 # Jobs (combination of role at given time and location) (columns)
# Jobs are:
# 1. Greet (8am-9am)
nacnudus / cursed_starwars_data.R
Created December 10, 2019 19:51 — forked from brooke-watson/cursed_starwars_data.R
cursed_data_challenge
# ---------------------------------------
# untidy data
# ---------------------------------------
# this dataset is a sample of the kind of data that might appear in the wild,
# particularly when dealing with government data,
# particularly when trying to convert an output table or individual report
# back into a raw data format that can be analyzed.
# in these test datasets, discrete observations are spread out across multiple rows.
nacnudus / as.list.environment.R
Last active May 3, 2020 19:22
Fastest way to count objects in an R environment
# Compare performance of ls() with as.list.environment() for counting objects
# length(as.list.environment()) is fastest
env <- environment(plot) # A handy, large environment
bench::mark(length(ls(env)), length(as.list(env)))
# # A tibble: 2 x 13
# expression min median `itr/sec` mem_alloc `gc/sec` n_itr n_gc total_time result memory time gc
# <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl> <int> <dbl> <bch:tm> <list> <list> <list> <list>
# 1 length(ls(env)) 4.69ms 4.96ms 198. NA 0 99 0 501ms <int [1]> <NULL> <bch:tm> <tibble [99 × 3]>
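The comparison above can be sanity-checked without `bench`. The following is a minimal sketch, assuming only base R: it builds a throwaway environment (`demo_env` and `n_objects` are hypothetical names, not from the gist), confirms that both approaches report the same count, and times them roughly with `system.time()`. Timings vary by machine, so none are shown here.

```r
# Build a small environment with a known number of objects
# (demo_env and n_objects are illustrative names, not from the gist)
n_objects <- 1000
demo_env <- new.env()
for (i in seq_len(n_objects)) {
  assign(paste0("x", i), i, envir = demo_env)
}

# Both approaches should report the same count
count_ls <- length(ls(demo_env))
count_list <- length(as.list(demo_env))

# Rough timings over repeated calls; results depend on the machine
time_ls <- system.time(for (i in 1:200) length(ls(demo_env)))
time_list <- system.time(for (i in 1:200) length(as.list(demo_env)))
print(c(ls = time_ls[["elapsed"]], as.list = time_list[["elapsed"]]))
```

Both `ls()` and `as.list()` on an environment skip names beginning with a dot by default (`all.names = FALSE`), which is why the two counts agree.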
nacnudus / parse-cancer-waiting-times-excel.R
Created November 25, 2020 18:40
Parse a UK government spreadsheet of cancer waiting times with R, tidyxl and unpivotr
library(tidyverse)
library(tidyxl)
library(unpivotr)
#' Convert relevant character values to dates
#'
#' @param cells Data frame derived from `tidyxl::xlsx_cells()` or
#' `readr::melt_csv()` or similar.
#' @param condition An expression that returns a logical value and
#' is defined in terms of the columns in `cells`. Similar to `dplyr::filter()`.
nacnudus / gmail_message_size_analysis.R
Created May 15, 2021 18:59
Script to analyse Gmail message sizes by from/to/subject, useful when you're running out of space in the free tier
# Script to analyse Gmail message sizes by from/to/subject
# 1. Download Gmail from Google Takeout, specifically the folders "Inbox",
# "Archived", "Sent", and "Bin".
# 2. Extract it into the working directory. It should create the folders
# Takeout/Mail
# 3. Run this script in the working directory
library(tidyverse)
library(tm.plugin.mail)
nacnudus / emst.jl
Created March 14, 2022 21:37
Euclidean Minimum Spanning Tree in Julia using mlpack
# The fastest implementation available is in mlpack (C++)
using Pkg
Pkg.add("mlpack")
using mlpack
x = rand(5,2)
x_emst = mlpack.emst(x)
# A far slower implementation is in EMST.jl. As of 2022-03-14 my fork has been updated to run with Julia v0.7+
using Pkg
Pkg.add(url="https://github.com/nacnudus/EMST.jl", rev="julia-v0.7")
nacnudus / log.sh
Last active March 17, 2022 13:05
Simple reminder to log my current activity.
#!/bin/bash
ENTRY=$(/usr/bin/yad --title "Activity log" --text="What are you doing?" --entry --auto-kill)
DATETIME=$(date --iso-8601=seconds)
[[ ! -z "$ENTRY" ]] && echo -e "${DATETIME}\t${ENTRY}">> /home/nacnudus/gds/log.log
# Schedule in crontab: every six minutes between 9am and 6pm on weekdays
# */6 9-17 * * 1-5 env DISPLAY=:0 /usr/bin/bash /path/to/this_file.sh
nacnudus / dfs-igraph.R
Last active May 3, 2022 19:21
Depth-first search in igraph
library(igraph)
library(tidygraph)
# Create an igraph
# c - d
# - e
# - f - b
ig <- graph_from_literal(f-+b, c-+e:f, c-+d)
ig
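The preview stops before the search itself. As a minimal sketch, assuming a recent igraph release in which `dfs()` takes a `mode` argument, a depth-first traversal of this graph starting from `c` might look like the following. The names `res` and `visit_order` are illustrative, and this is not necessarily how the rest of the gist proceeds.

```r
library(igraph)

# Same graph as above: f -> b, c -> e, c -> f, c -> d
ig <- graph_from_literal(f-+b, c-+e:f, c-+d)

# Depth-first search from "c", following edges in their direction
# (assumes an igraph version where the argument is called `mode`)
res <- dfs(ig, root = "c", mode = "out")

# Names of the vertices in the order they were visited
visit_order <- as_ids(res$order)
print(visit_order)
```

The order in which the neighbours of `c` are explored depends on igraph's internal vertex ordering, but `b` is always visited immediately after `f`, because `b` is reachable only through `f`.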