Thomas Sandmann tomsing1

Modern R Development Guide

This document captures current best practices for R development, emphasizing modern tidyverse patterns, performance, and style. Last updated: August 2025

Core Principles

Use modern tidyverse patterns - Prioritize dplyr 1.1+ features, native pipe, and current APIs
Profile before optimizing - Use profvis and bench to identify real bottlenecks
Write readable code first - Optimize only when necessary and after profiling
Follow tidyverse style guide - Consistent naming, spacing, and structure

schemaSpy

Here's how to use schemaSpy to generate schema relationship diagrams for PostgreSQL databases.

Prerequisites

Download the schemaSpy jar file (the current version is schemaSpy_5.0.0.jar)
- Note: There is a release candidate for version 6, but I couldn't get it to work with graphviz on macOS 10.13
Get the PostgreSQL JDBC driver (either the JDBC3 or JDBC4 jar file is fine)
Install graphviz
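
With the three prerequisites in place, a run could look roughly like the sketch below. It is a minimal example, not a definitive recipe: the host, database, user, schema, and output directory are hypothetical placeholders, and `postgresql-jdbc.jar` is an assumed file name for whichever JDBC driver jar you downloaded. The flags (`-t`, `-host`, `-db`, `-s`, `-u`, `-dp`, `-o`) are the schemaSpy 5 command-line options as I understand them, so check them against the documentation for your version. By default the script only prints the command, so nothing is executed until you ask for it.

```shell
#!/bin/sh
# Sketch: assemble a schemaSpy 5 command line for a PostgreSQL database.
# Every value below is a placeholder (assumption); override via environment.
set -eu

SCHEMASPY_JAR="${SCHEMASPY_JAR:-schemaSpy_5.0.0.jar}"  # jar from the prerequisites
JDBC_JAR="${JDBC_JAR:-postgresql-jdbc.jar}"            # hypothetical driver file name
DB_HOST="${DB_HOST:-localhost}"
DB_NAME="${DB_NAME:-mydb}"
DB_SCHEMA="${DB_SCHEMA:-public}"
DB_USER="${DB_USER:-postgres}"
OUT_DIR="${OUT_DIR:-schemaspy-output}"

# Any arguments are used as a command prefix:
#   schemaspy echo   prints the command (dry run)
#   schemaspy        runs java for real (graphviz's `dot` must be on PATH)
schemaspy() {
  "$@" java -jar "$SCHEMASPY_JAR" \
    -t pgsql \
    -host "$DB_HOST" \
    -db "$DB_NAME" \
    -s "$DB_SCHEMA" \
    -u "$DB_USER" \
    -dp "$JDBC_JAR" \
    -o "$OUT_DIR"
}

# Dry run: show what would be executed.
schemaspy echo
```

If the database requires a password, schemaSpy needs it passed as well (the `-p` option); it is left out here so that no credential ends up in a script. Once the printed command looks right, replacing the last line with a bare `schemaspy` call performs the actual run and writes the HTML report and diagrams into the output directory.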

	#!/usr/bin/env bash
	###
	# NB: You probably don't want this gist any more.
	# Instead, use this version from `fastsetup`:
	# https://github.com/fastai/fastsetup/blob/master/setup-conda.sh
	###

	set -e

	cd

	name: CI

	on:
	  push:
	    branches:
	      - main
	  pull_request:
	    branches:
	      - main

	library(tidyverse)
	library(patchwork)

	dat_wide <- tibble(
	  x = 1:3,
	  top = c(4.5, 4, 5.5),
	  middle = c(4, 4.75, 5),
	  bottom = c(3.5, 3.75, 4.5)
	)

	suppressPackageStartupMessages(library(plyranges))
	suppressPackageStartupMessages(library(AnnotationHub))
	suppressPackageStartupMessages(library(TxDb.Hsapiens.UCSC.hg19.knownGene))

	ah <- AnnotationHub()
	query(ah, c("K562","CTCF","unipk"))
	peaks <- ah[["AH22543"]]
	peaks <- peaks %>% keepStandardChromosomes(pruning.mode="coarse")

	txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene

	library(purrr)
	library(dplyr)

	na_set <- function(x, p) {
	  p <- as_mapper(p)
	  x[p(x)] <- NA
	  x
	}

	# or something like this using case_when

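	from typing import Any, Mapping, Sequence

	from sqlalchemy.dialects import postgresql
	from sqlalchemy.orm import Session

	# MyModel is assumed to be a declarative ORM model defined elsewhere.
	def bulk_upsert(session: Session,
	                items: Sequence[Mapping[str, Any]]):
	    # INSERT ... ON CONFLICT (id) DO UPDATE SET my_field = 'new_value'
	    session.execute(
	        postgresql.insert(MyModel.__table__)
	        .values(items)
	        .on_conflict_do_update(
	            index_elements=[MyModel.id],
	            set_={MyModel.my_field.name: 'new_value'},
	        )
	    )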

	## see also http://blog.revolutionanalytics.com/2017/06/doazureparallel-updated.html on how to run on Azure
	## and cloudyr project for AWS https://github.com/cloudyr/aws.ec2


	# now also in docs: https://cloudyr.github.io/googleComputeEngineR/articles/massive-parallel.html
	library(googleComputeEngineR)
	library(future)

	## auto auth to GCE via environment file arguments

	#!/bin/bash

	## How to install ascp, in a gist.

	## The URI below is not persistent!
	## Check for latest link: https://www.ibm.com/aspera/connect/
	wget -qO- https://ak-delivery04-mul.dhe.ibm.com/sar/CMA/OSA/0a07f/0/ibm-aspera-connect_4.1.0.46-linux_x86_64.tar.gz | tar xvz

	## run it
	chmod +x ibm-aspera-connect_4.1.0.46-linux_x86_64.sh