Jason Heppler (hepplerj) — building the history web
hepplerj / census_cleanup.R
Created March 11, 2020 19:36
An example script for census data in R
library(tidyverse)
library(tidycensus)
# My recommendation is to use the tidycensus library to make getting this data
# easier than reading in the data from the Census website.
#
# Before you can begin, you'll need to get an API key from the Census Bureau,
# which you can request on the Census Bureau's website.
#
# Once you have the API key, run the following in RStudio:
hepplerj / messy.R
Created February 20, 2020 20:58
Messy data in R, for teaching the tidyverse
library(charlatan)
library(salty)
library(magrittr)
library(readr)
messydata <- ch_generate('name','job','phone_number', n = 200)
messydata <- messydata %>%
mutate(job = salt_capitalization(job)) %>%
mutate(phone_number = salt_na(phone_number)) %>%
hepplerj / frequency_to_list.R
Created December 11, 2019 18:11
Turn a frequency table into a list of individual items
library(tidyverse)
library(readxl)
data <- readxl::read_xlsx("data.xlsx")
reshaped <- data %>% gather(word, freq, 2:21)
reshaped <- reshaped %>% drop_na()
cleaned <- reshaped %>%
uncount(freq)
hepplerj / geofilter.R
Created November 8, 2019 15:12
Checking points and filtering incorrect or unneeded data.
library(tidyverse)
library(maps)
library(mapdata)
data <- read_csv("~/Desktop/nplsuperfund.csv")
names(data) <- c("lat","lon","date")
# Filter down to USA extent to remove extraneous points
tidy <- data %>%
  filter(lon < -67, lon > -125) %>% # continental US longitude range
hepplerj / hex_logo.R
Last active January 14, 2020 19:32
Hex logo generator for R User Group
library(hexSticker)
library(tidyverse)
library(tidycensus)
library(sf)
library(viridis)
options(tigris_use_cache = TRUE)
nebraska_raw <- get_acs(state = "NE",
geography = "tract",
hepplerj / pandas.py
Created May 23, 2018 14:45
An evolving set of pandas snippets I find useful
# Unique values in a dataframe column
df['column_name'].unique()
# Grab dataframe rows where column = value
df = df.loc[df.column == 'some_value']
# Grab dataframe rows where a column's value is present in a list
value_list = ['value1', 'value2', 'value3']
df = df.loc[df['column'].isin(value_list)]
# Or grab rows where a column's value is not present in the list
df = df.loc[~df['column'].isin(value_list)]
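These snippets can be exercised against a small, made-up DataFrame — the column names and values below are hypothetical, just to show the pattern:

```python
import pandas as pd

# Hypothetical DataFrame for illustration
df = pd.DataFrame({
    "column": ["value1", "value2", "value3", "value1"],
    "score": [10, 20, 30, 40],
})

# Unique values in a dataframe column
unique_vals = df["column"].unique()
print(sorted(unique_vals))  # ['value1', 'value2', 'value3']

# Rows where a column's value is present in a list
value_list = ["value1", "value2"]
subset = df.loc[df["column"].isin(value_list)]
print(len(subset))  # 3

# Rows where a column's value is NOT present in the list
rest = df.loc[~df["column"].isin(value_list)]
print(len(rest))  # 1
```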
hepplerj / README.md
Last active May 10, 2018 20:12
Add leaflet points on click

Click on the map to add points. See the console for lat/long output.
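The README doesn't include the code itself, but the behavior it describes — click the map, drop a marker, log the coordinates — can be sketched with the standard Leaflet API. The map div id, initial view, and tile source here are assumptions, not taken from the original gist:

```html
<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <link rel="stylesheet" href="https://unpkg.com/leaflet/dist/leaflet.css">
  <script src="https://unpkg.com/leaflet/dist/leaflet.js"></script>
</head>
<body>
  <div id="map" style="height: 400px"></div>
  <script>
    var map = L.map('map').setView([41.25, -96.0], 5);
    L.tileLayer('https://tile.openstreetmap.org/{z}/{x}/{y}.png').addTo(map);
    // On click, add a marker and log the coordinates to the console
    map.on('click', function(e) {
      L.marker(e.latlng).addTo(map);
      console.log(e.latlng.lat, e.latlng.lng);
    });
  </script>
</body>
</html>
```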

hepplerj / index.html
Created March 25, 2018 02:30
WebGL Mapping
<!DOCTYPE html>
<head>
<meta charset="utf-8">
<script src="https://d3js.org/d3.v4.min.js"></script>
<script src="http://www.webglearth.com/v2/api.js"></script>
<script>
function map() {
var options = { zoom: 1.5, position: [47.19537,8.524404] };
var earth = new WE.map('earth_div', options);
hepplerj / batch.sh
Created October 20, 2017 17:09
Batch compress PDFs for Omeka
# This requires the use of GhostScript
# On macOS, the easiest way to get started is to install it with Homebrew:
# brew install ghostscript
#
# This file should live in the directory that contains the PDFs. From
# the command line, just running `bash batch.sh` will compress the PDFs
# and fix any issues that might be present with JPEG2000 images. The
# compression process should preserve the OCR and will likely reduce the
# size of the PDF as well.
#
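The preview cuts off before the script itself, but a minimal sketch of such a batch loop might look like this — the `/ebook` quality preset and the `_compressed` output naming are assumptions, not taken from the original gist:

```shell
#!/usr/bin/env bash
# Sketch: compress every PDF in the current directory with Ghostscript's
# pdfwrite device, writing each result alongside the original.
set -eu

# Bail out gracefully if Ghostscript isn't installed
command -v gs >/dev/null || { echo "ghostscript not found; skipping"; exit 0; }

for f in ./*.pdf; do
  [ -e "$f" ] || continue  # no PDFs in this directory
  gs -sDEVICE=pdfwrite \
     -dCompatibilityLevel=1.4 \
     -dPDFSETTINGS=/ebook \
     -dNOPAUSE -dBATCH -dQUIET \
     -sOutputFile="${f%.pdf}_compressed.pdf" "$f"
done
```

Run it from the directory containing the PDFs with `bash batch.sh`.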
hepplerj / asc_crawler.py
Created October 20, 2017 15:33
Using tweepy to crawl for archives, special collections, and library users.
import tweepy
# OAuth is the preferred method for authenticating to Twitter
# Consumer keys are under the application's Details page at
# http://dev.twitter.com/apps
consumer_key = ""
consumer_secret = ""
# Access tokens are found on your application's Details page
# at http://dev.twitter.com/apps.