James Clawson jmclawson

jmclawson / topic_model.R
Created May 4, 2023 13:15
Functions for building a topic model and exploring it. Visualizations include document-level distributions (static and interactive), word distributions per topic, and topic word clouds.
library(wordcloud)
library(topicmodels)
library(plotly)
# Moves a table of texts through the necessary
# steps of preparation before building a topic
# model. The function applies these steps:
# 1. identifies text divisions by the `doc_id`
# column
# 2. divides each of the texts into same-sized
jmclawson / unnest_without_caps.R
Last active November 4, 2023 16:31
Applies tidytext's unnest_tokens() function but also filters out any word that appears in the text only with a capital letter. In English texts, this should be a quick way to remove all proper nouns.
unnest_without_caps <- function(
df,
column = "text") {
full <- df |>
tidytext::unnest_tokens(word, {{column}}, to_lower = FALSE)
big <- full |>
dplyr::filter(stringr::str_detect(word, "^[A-Z]")) |>
dplyr::pull(word)
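A minimal sketch of the filtering idea in the description above, written in base R so it runs without tidytext or dplyr. It is not the gist's own implementation, whose remaining lines are not shown here; the name `drop_capitalized_only` and the simple letters-and-apostrophes tokenizer are assumptions made for illustration.

```r
# Hypothetical helper (not from the gist): drop every word that occurs
# in the text only with a leading capital letter, then lowercase the rest.
drop_capitalized_only <- function(text) {
  # Split into word tokens, keeping the original case
  words <- unlist(regmatches(text, gregexpr("[[:alpha:]']+", text)))
  # Distinct tokens that start with a capital letter
  capitalized <- unique(words[grepl("^[A-Z]", words)])
  # A capitalized token is "capital-only" if its lowercase form never occurs
  only_caps <- capitalized[!(tolower(capitalized) %in% words)]
  # Remove the capital-only tokens and lowercase what remains
  tolower(words[!(words %in% only_caps)])
}

drop_capitalized_only("The dog saw Alice. Alice fed the dog.")
```

In this sketch "The" survives because "the" also appears in lowercase, while "Alice" is dropped because it never does. Sentence-initial words that never recur in lowercase would be dropped as well, which is a limitation of the heuristic.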
jmclawson / tidy_some_texts.R
Last active May 4, 2023 13:44
Reads all files in a directory that match a certain naming pattern, returning a one-word-per-row table, with stanza and line numbers for poetry and with paragraph numbers for prose. Set word=FALSE to retain one line per row.
# library(tidyverse)
# library(tidytext)
##### Use the following function for reading a folder of prose text files. #####
## Put all the text files you want to read in the same folder. If that folder's
## called, for example, "project2", here's the function in practice:
##
## my_table <- tidy_prose_texts(folder = "project2")
##
jmclawson / stylo_log.R
Last active January 27, 2023 21:57
Log details for replicable analyses using stylo, then re-run prior analyses.
##### stylo_log #####
# Pipe from stylo() directly into stylo_log()
# or wrap stylo() in stylo_log()
# Examples:
# stylo() |> stylo_log()
# stylo_log(stylo())
stylo_log <- function(
stylo_object,
log_label = NULL,
jmclawson / get_if_needed.R
Last active January 13, 2023 17:50
Downloads a url if it doesn't already exist locally
get_if_needed <- function(
# Url to be downloaded, necessary
url,
# destination filename (optional)
filename = NULL,
# destination directory (optional)
destdir = "data"
) {
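A minimal sketch of the behavior the description names: download a file only when no local copy exists. It is not the gist's own function body, which is not shown here. The name `fetch_if_missing`, the use of `basename(url)` as the default filename, and the returned file path are assumptions made for illustration; the three arguments mirror the signature above.

```r
# Hypothetical sketch (not from the gist): download only if the file is absent.
fetch_if_missing <- function(url, filename = NULL, destdir = "data") {
  # Assumption: fall back to the last path component of the URL
  if (is.null(filename)) filename <- basename(url)
  # Create the destination directory when it does not exist yet
  if (!dir.exists(destdir)) dir.create(destdir, recursive = TRUE)
  filepath <- file.path(destdir, filename)
  # Skip the download when a local copy is already present
  if (!file.exists(filepath)) {
    utils::download.file(url, destfile = filepath)
  }
  # Return the local path invisibly so the call can be piped or assigned
  invisible(filepath)
}
```

Returning the path lets a call such as `fetch_if_missing(url) |> readLines()` work whether or not the file had to be downloaded.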
jmclawson / clean_eebo.R
Last active July 21, 2022 15:14
Preprocess EEBO TCP full text, cleaning OCR blips and removing page references
##### Load libraries #####
library(dplyr)
library(stringi)
library(stringr)
library(tidyr)
library(stringdist)
library(tokenizers)
##### Set up replicable workflow #####
# Set the directories to be used. Both directories should exist in the project directory, and dir_start should contain the text files to be processed.
# import_bib.R
# To convert from Bibtex to a data frame for working with the data in R.
library(dplyr)
library(stringr)
library(tidyr)
# 0. Set filename for the bibfile
the_bibfile <- "~/path/to/my.bib"
jmclawson / recreationthursday_2021-07-15.R
Last active July 16, 2021 21:15
Code for #RecreationThursday for July 15
library(tidyverse)
# make a pinwheel: first set up directions. The blades are drawn in different orders for clockwise and counterclockwise
clockwise_t <- c(2, 1, 3, 4)
clockwise_f <- c(4, 3, 1, 2)
direction <- list(clockwise_t, clockwise_f)
# create a 4-color pinwheel with 4 blades facing the same direction
get_pinwheel <-
function(
jmclawson / bibtex_documentation.sty
Last active June 16, 2021 16:03
Introduce .bib-file data directly into Latex documentation using \bibcitem{citekey}
\usepackage{listings}
\usepackage{xcolor}
\let\oldaddbibresource\addbibresource
\renewcommand{\addbibresource}[1]{%
\oldaddbibresource{#1}%
\expandafter\newcommand\csname thebibfile\endcsname{#1}%
}
% \makeatletter
jmclawson / prepare_bib.R
Last active June 16, 2021 13:51
Pre-process a bib file for clean use in documentation
# To prepare it for use in documentation, import a .bib file, strip Bibdesk's extra fields and additions, and enclose each entry with code compatible with Latex's {listings} package.
library(dplyr)
library(stringr)
library(readr)
# 0. Set relative file path for the bibfile
# setwd()
# 1. read the bib file as a vector of lines