Sam Firke sfirke

  • City of Ann Arbor
  • Ann Arbor, MI
@sfirke
sfirke / airflow_health_monitor.sh
Last active October 24, 2024 07:54
Bash script to monitor health of Airflow scheduler container deployed with docker compose, restarting it if necessary
#!/bin/bash
# Requires the sendemail package to be installed on the host machine
# Get container health status
healthy=$(docker inspect -f '{{.State.Health.Status}}' airflow-airflow-scheduler-1)
# Check if healthy
if [[ $healthy != "healthy" ]]; then
# attempt to restart Airflow - the scheduler will stop if the Azure Postgres DB becomes unavailable due to maintenance
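
The preview above is cut off before the restart step. As a rough illustration of the same check-then-restart logic, here is a minimal Python sketch. The function names (`needs_restart`, `container_health`, `check_and_restart`) are hypothetical, and the plain `docker restart` call is an assumption, since the gist's actual restart command and its sendemail notification are not shown.

```python
import subprocess


def needs_restart(status):
    """Return True unless Docker reports the container as exactly 'healthy'."""
    return status.strip() != "healthy"


def container_health(name):
    """Read the health status string via `docker inspect`, like the script above."""
    result = subprocess.run(
        ["docker", "inspect", "-f", "{{.State.Health.Status}}", name],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout.strip()


def default_restart(name):
    # Assumption: a plain `docker restart`; the gist's real command is not shown.
    subprocess.run(["docker", "restart", name], check=False)


def check_and_restart(name, get_health=container_health, restart=default_restart):
    """Restart the container if it is not healthy; return True if a restart was attempted."""
    if needs_restart(get_health(name)):
        restart(name)
        return True
    return False
```

Passing `get_health` and `restart` as arguments keeps the decision logic testable without a running Docker daemon.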
sfirke / janitor_usage_analysis.R
Last active March 13, 2021 19:16
Analysis of janitor downloads
# Exploring download counts of a single package
x <- cranlogs::cran_downloads("janitor", from = "2016-10-03", to = "2021-03-12")
library(tidyverse)
library(lubridate)
library(tntpr) # from devtools::install_github("tntp/tntpr")
x$wday <- wday(x$date)
x$weekday <- ifelse(x$wday %in% c(1,7), "Weekend", "Weekday")
x$year <- tntpr::date_to_sy(x$date, as.Date("2016-10-02")) # segments into years using a cutoff date
sfirke / cran_downloads_jan_feb_2021.csv
Created March 4, 2021 14:59
Download counts of all R packages on CRAN for Jan-Feb 2021
package total_downloads
A3 2769
aaSEA 1254
AATtools 855
ABACUS 1298
abbyyR 1745
abc 3119
abc.data 3364
ABC.RAP 1412
abcADM 1135
sfirke / solaredge_retrieval.py
Created December 30, 2020 15:37
Retrieving SolarEdge solar panel generation data from the API using Python
import pandas as pd
import solaredge
import time
s = solaredge.Solaredge("YOUR-API-KEY")
site_id = YOUR-SITE-ID
# Edit this date range as you see fit
# If querying at the maximum resolution of 15 minute intervals, the API is limited to queries of a month at a time
# This script queries one day at a time, with a one-second pause per day that is polite but probably not necessary
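
The preview ends before the query loop. The day-at-a-time pattern described in the comments could be sketched roughly as follows. `daily_windows` and `fetch_range` are hypothetical helpers, and `fetch_day` stands in for whatever SolarEdge API call the full gist makes, which is not shown here.

```python
import time
from datetime import date, datetime, timedelta


def daily_windows(start, end):
    """Yield one (window_start, window_end) datetime pair per calendar day, including `end`."""
    day = start
    while day <= end:
        window_start = datetime.combine(day, datetime.min.time())
        window_end = window_start + timedelta(days=1) - timedelta(seconds=1)
        yield window_start, window_end
        day += timedelta(days=1)


def fetch_range(fetch_day, start, end, pause_seconds=1.0):
    """Call `fetch_day` once per day, pausing between calls to be polite to the API."""
    results = []
    for window_start, window_end in daily_windows(start, end):
        # `fetch_day` is a caller-supplied placeholder for the real API request.
        results.append(fetch_day(window_start, window_end))
        time.sleep(pause_seconds)
    return results
```

Each window runs from midnight to 23:59:59 of the same day, so a single query stays well under the one-month limit that applies at 15-minute resolution.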
sfirke / tidytext_wordclouds.R
Created March 9, 2018 14:05
Make wordclouds from a text column in R
library(pacman)
p_load(tidytext, wordcloud, janeaustenr, dplyr)
data("stop_words")
ppdf <- data.frame(prideprejudice, stringsAsFactors = FALSE)
# create a word cloud
create_word_cloud <- function(dat, col_name, exclude = "", max.words = 50, colors = "#034772", ...){
col <- deparse(substitute(col_name))
dat %>%
sfirke / split_tinker_combine_tidyverse.R
Created March 7, 2018 13:58
Using split with magrittr's %$% to reference the names of the listed data.frames
# I want to remove duplicate mpg rows where cylinder is 4
# Split, tinker with the data.frames by name, bind_rows
library(magrittr)
library(dplyr)
mtcars %>%
split(., .$cyl == 4) %$%
bind_rows(`FALSE`,
`TRUE` %>%
distinct(mpg, .keep_all = TRUE))
sfirke / render_keep_md.R
Last active May 22, 2020 17:49
Function to build R package vignettes, retaining both .md and .Rmd
# From https://stackoverflow.com/questions/45575971/compile-a-vignette-using-devtoolsbuild-vignette-so-that-md-is-kept-in-the-v
# Usage: render_keep_md("tabyls")
render_keep_md <- function(vignette_name){
# added the "encoding" argument so the oe character is passed through correctly to the resulting .md
rmarkdown::render(paste0("./vignettes/",vignette_name, ".Rmd"), clean=FALSE, encoding = 'UTF-8')
files_to_remove = paste0("./vignettes/",vignette_name, c(".html",".knit.md",".utf8.md"))
lapply(files_to_remove, file.remove)
}
sfirke / fix_surveymonkey_two_row_headers.R
Last active March 29, 2018 16:06
(roughly) handle SurveyMonkey exports where the variable names are split over the first two rows
# Fix dual-row names: if the first-row value is not NA and does not contain the word "response", use it as the column name
# Note: read your SurveyMonkey .csv with readr::read_csv, not read.csv - otherwise this may not work
library(dplyr)
library(janitor)
fix_SM_dual_row_names <- function(dat){
current_names <- names(dat)
row_1 <- unlist(dat[1, ])
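
The R function is truncated here. The rule stated in its opening comment can be illustrated with a small Python sketch on plain lists. `fix_dual_row_names` is a hypothetical name. Treating empty strings as missing and matching "response" case-insensitively are assumptions, not details confirmed by the gist.

```python
def fix_dual_row_names(current_names, row_1):
    """Prefer the first-row value as a column name unless it is missing or mentions 'response'."""
    fixed = []
    for name, first in zip(current_names, row_1):
        # Assumption: None and blank strings both count as missing (NA in the R version).
        is_missing = first is None or str(first).strip() == ""
        if not is_missing and "response" not in str(first).lower():
            fixed.append(str(first))
        else:
            fixed.append(name)
    return fixed
```

For example, with names `["Q1", "X2", "Q3"]` and a first row of `["Response", "Option A", None]`, only the middle column takes its name from the first row.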
sfirke / add_centered_title.R
Last active November 8, 2024 19:43
Center all of your ggplot2 titles over the whole plot using a function
library(ggplot2)
library(dplyr)
library(grid)
library(gridExtra)
add_centered_title <- function(p, text, font_size){
title.grob <- textGrob(
label = text,
gp = gpar(fontsize = font_size,
Package: janitor
Title: Simple Tools for Examining and Cleaning Dirty Data
Version: 0.3.0.9000
Authors@R: c(person("Sam", "Firke", email = "[email protected]", role = c("aut", "cre")),
person("Chris", "Haid", email = "[email protected]", role = "ctb"),
person("Ryan", "Knight", email = "[email protected]", role = "ctb"))
Description: The main janitor functions can: perfectly format data.frame column
names; provide quick one- and two-variable tabulations (i.e., frequency
tables and crosstabs); and isolate duplicate records. Other janitor functions
nicely format the tabulation results. These tabulate-and-report functions