vpnagraj / README.md
Last active April 1, 2025 17:12
Brief demonstration of techniques to explore and analyze open text data in R

Overview

This Gist includes an example script that performs basic exploratory analysis on open text data with R. The script reads in a dataset, tokenizes the text, summarizes token counts per document, performs sentiment analysis, creates a document-term matrix, and runs a topic modeling procedure.

Data

The script references a file called simulated_emr_data.csv. These data were generated by ChatGPT (GPT-4o) using the following prompt:

Simulate EMR data. Include 6 columns: Encounter ID, Patient ID, Name, Age, Visit Date, Chief Complaint, Provider Notes (free text). Include some patients with repeat visits and ensure that repeated patients have matching IDs, names, and ages. The provider notes should be at least 25 words and should mix tone and style of notes. Create a csv file with 1000 rows.
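The steps listed in the overview can be sketched with the tidytext package. This is a minimal illustration, not the gist's own script. It assumes tidytext, tm, and topicmodels are installed, and it uses a small hypothetical three-row tibble with made-up `encounter_id` and `provider_notes` columns in place of simulated_emr_data.csv, whose actual column names may differ.

```r
library(dplyr)
library(tidytext)

## hypothetical stand-in for simulated_emr_data.csv (column names are assumptions)
notes <- tibble::tibble(
  encounter_id = c("E001", "E002", "E003"),
  provider_notes = c(
    "Patient reports mild headache, improving with rest. No fever noted.",
    "Severe chest pain radiating to left arm; patient anxious and uncomfortable.",
    "Follow-up visit. Patient doing well with good recovery and no complaints."
  )
)

## tokenize: one row per word per encounter, then drop common stop words
tokens <- notes %>%
  unnest_tokens(word, provider_notes) %>%
  anti_join(stop_words, by = "word")

## summarize the number of remaining tokens per document
token_counts <- tokens %>%
  count(encounter_id, name = "n_tokens")

## sentiment analysis with the Bing lexicon that ships with tidytext
sentiment <- tokens %>%
  inner_join(get_sentiments("bing"), by = "word") %>%
  count(encounter_id, sentiment)

## document-term matrix (cast_dtm() requires the tm package)
dtm <- tokens %>%
  count(encounter_id, word) %>%
  cast_dtm(encounter_id, word, n)

## two-topic LDA model; the seed makes the fit reproducible
lda_fit <- topicmodels::LDA(dtm, k = 2, control = list(seed = 1234))

## per-topic word probabilities in tidy form
topic_terms <- tidy(lda_fit, matrix = "beta")
```

With the real file, the tibble would be replaced by a call such as `readr::read_csv("simulated_emr_data.csv")` and the column names adjusted to match.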

vpnagraj / animation.R
Created October 8, 2024 02:45
Examples of animating data visualizations in R
## -------------------------------------------------------------------------------------------------------------------
library(tidyverse)
#remotes::install_github("hrbrmstr/cdcfluview")
library(cdcfluview)
library(MMWRweek)
library(plotly)
library(gganimate)
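The animation.R gist pulls influenza surveillance data with cdcfluview before animating it. As a minimal sketch of the gganimate pattern only (not the gist's own code), the example below swaps in ggplot2's built-in `economics` data so no download is needed, and uses `transition_reveal()` to draw a line progressively over time.

```r
library(ggplot2)
library(gganimate)

## line plot of US unemployment from ggplot2's built-in economics data
p <- ggplot(economics, aes(x = date, y = unemploy)) +
  geom_line() +
  labs(x = NULL, y = "Unemployed (thousands)") +
  ## reveal the line gradually along the date axis
  transition_reveal(date)

## rendering requires a renderer such as the gifski package
# anim <- animate(p, nframes = 100, fps = 10)
# anim_save("unemployment.gif", anim)
```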
vpnagraj / answer.R
Last active August 12, 2024 22:08
Example of using reprex in R for asking *and* answering questions. The question.R script generates the content for question.md, the answer.R script generates the content for answer.md, and the proof.R script applies the answer to the data in example.csv.
reprex::reprex({
library(tidyverse)
## define the example tibble
dat <-
tribble(
~individual, ~Q1_A, ~Q1_B, ~Q1_C, ~Q1_D, ~Q2_A, ~Q2_B, ~Q2_C, ~Q2_D,
"alice", NA, "cat", NA, NA, "tacos", NA, NA, NA,
"bob", "dog", NA, NA, NA, NA, NA, NA, "pizza"
)
## ... (remainder of answer.R not shown in this preview)
})
vpnagraj / anomalies.R
Last active May 14, 2024 18:35
Demo of anomaly detection in R with anomalize()
###############################################################################
## brief demo of anomaly detection in R using the timetk anomalize() function
## current as of 2024-05-13
###############################################################################
## set up
## load dplyr for data manipulation
library(dplyr)
## load timetk for anomaly detection functionality
library(timetk)
## load jsonlite to read in example data
library(jsonlite)
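For orientation, here is a minimal sketch of `anomalize()` applied to a single series from `walmart_sales_weekly`, an example dataset that ships with timetk. It uses default decomposition and outlier settings and is an illustration under those assumptions, not the data or parameters used in the gist (which reads its example data with jsonlite).

```r
library(dplyr)
library(timetk)

## one weekly sales series from timetk's bundled example data
sales <- walmart_sales_weekly %>%
  filter(id == "1_1") %>%
  select(Date, Weekly_Sales)

## decompose the series and flag outliers in the remainder (default settings)
res <- sales %>%
  anomalize(.date_var = Date, .value = Weekly_Sales)

## keep only the observations flagged as anomalies
flagged <- res %>%
  filter(anomaly == "Yes")

## plot observed values with anomalies highlighted
# plot_anomalies(res, .date_var = Date)
```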
vpnagraj / eda_tools.R
Created April 9, 2024 14:17
Simple (and non-exhaustive) demo of exploratory data analysis tools in R
###############################################################################
## brief demo of exploratory data analysis (EDA) tools for data frames in R
## NOTE: the code below is intended to preview the EDA tools ...
## ... it does not exhaustively demonstrate functionality for these tools ...
## ... and it is current as of 2024-04-09 ...
## ... for more information refer to the documentation for each package
###############################################################################
###############################################################################
## set up
vpnagraj / benchmark.R
Created July 4, 2018 15:45
benchmarking `svSocket` versus base `load` versus `redis`
library(microbenchmark)
library(redux)
library(svSocket)
# clear workspace
rm(list = ls())
# set up svSocket
startSocketServer()
con <- socketConnection(host = "localhost", port = 8888, blocking = FALSE)
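Two of the three approaches compared in benchmark.R need running services (an svSocket server and a Redis server). As a self-contained sketch of the `microbenchmark()` pattern only, the example below times base `load()` against `readRDS()`; the `readRDS()` arm is a swapped-in comparison that does not appear in the gist.

```r
library(microbenchmark)

## write the same vector to an .RData file and an .rds file in a temp directory
x <- rnorm(1e5)
rdata_file <- tempfile(fileext = ".RData")
rds_file <- tempfile(fileext = ".rds")
save(x, file = rdata_file)
saveRDS(x, file = rds_file)

## time each read 20 times; load() targets a fresh environment
mb <- microbenchmark(
  load = load(rdata_file, envir = new.env()),
  readRDS = readRDS(rds_file),
  times = 20
)

## summary statistics (min, median, max, etc.) per expression
mb_summary <- summary(mb)
```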
vpnagraj / scales.R
Created January 4, 2018 16:52
scale exploration
library(ggplot2)
library(tidyr)
dat <- data.frame(x = rnorm(n = 1000, mean = 2.8, sd = 0.05),
y1 = sample(64503:73034, size = 1000, replace = TRUE),
y2 = sample(18738:19602, size = 1000, replace = TRUE))
dat %>%
ggplot() +
geom_point(aes(x,y1)) +
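In the scales.R data, y1 (roughly 64,500 to 73,000) and y2 (roughly 18,700 to 19,600) span very different ranges. One common way to compare such series, sketched here as an assumption rather than the script's actual continuation, is to reshape with tidyr's `pivot_longer()` and give each series its own y-axis with `facet_wrap(scales = "free_y")`. The `set.seed()` call is added for reproducibility.

```r
library(ggplot2)
library(tidyr)

set.seed(1)
dat <- data.frame(x = rnorm(n = 1000, mean = 2.8, sd = 0.05),
                  y1 = sample(64503:73034, size = 1000, replace = TRUE),
                  y2 = sample(18738:19602, size = 1000, replace = TRUE))

## stack y1 and y2 into a single value column labeled by series
dat_long <- pivot_longer(dat, cols = c(y1, y2),
                         names_to = "series", values_to = "value")

## one panel per series, each with its own y-axis range
p <- ggplot(dat_long, aes(x, value)) +
  geom_point() +
  facet_wrap(~ series, scales = "free_y")
```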
vpnagraj / outbreak_animation_list.R
Created June 16, 2017 17:04
script to demonstrate outbreak animation with a subset of mers_korea_2015 data
# script to demonstrate outbreak animation with a subset of mers_korea_2015 data
# must install github release of threejs package
# devtools::install_github("bwlewis/rthreejs")
library(threejs)
library(outbreaks)
library(dplyr)
# use dplyr to subset results to only include hospital visit exposure
vpnagraj / scratch.R
Last active January 25, 2018 15:07
files for shiny workshop
# install.packages("babynames")
library(babynames)
# install.packages("tidyverse")
library(tidyverse)
# install.packages("ggplot2")
# install.packages("dplyr")
# let's take a look at the data
babynames %>%
View()