@tjvananne
tjvananne / process GloVe pre-trained word vector.R
Created May 4, 2017 14:45
How to read and process a downloaded pre-trained GloVe word-vector file (turning it into a data.frame) in base R
#' A set of word vectors is a large matrix with one row per word; each row is a numeric vector that represents
#' the semantic meaning of that word. This lets us discover relationships and analogies between words programmatically.
#' The classic example: the vector for "king" minus "man" plus "woman" is most similar to the vector for "queen".
# function definition --------------------------------------------------------------------------
# input: a .txt file; output: a list of lists of values and a character vector of names (words)
proc_pretrained_vec <- function(p_vec) {
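The body of `proc_pretrained_vec()` is not shown in this preview. As a rough illustration of the idea described in the comments, here is a minimal base-R sketch. It assumes the standard GloVe .txt layout (each line is a word followed by its space-separated numeric components). `read_glove_txt()`, `analogy()`, and the toy 3-dimensional vectors are hypothetical stand-ins, not the gist's own code; real GloVe files use many more dimensions (e.g. 50, 100, 200, or 300).

```r
# Hypothetical helpers; a tiny temp file stands in for a downloaded GloVe .txt file.
read_glove_txt <- function(path) {
  lines <- readLines(path, encoding = "UTF-8")
  # each line: word, then its numeric components, separated by single spaces
  parts <- strsplit(lines, " ", fixed = TRUE)
  words <- vapply(parts, function(p) p[1], character(1))
  dims  <- length(parts[[1]]) - 1
  # vapply builds a dims x n matrix; transpose so each row is one word
  vecs  <- t(vapply(parts, function(p) as.numeric(p[-1]), numeric(dims)))
  data.frame(word = words, vecs, stringsAsFactors = FALSE)
}

# Return the word whose vector has the highest cosine similarity to
# vec(a) - vec(b) + vec(c), excluding the three input words.
analogy <- function(glove_df, a, b, c) {
  m <- as.matrix(glove_df[, -1])
  rownames(m) <- glove_df$word
  target <- m[a, ] - m[b, ] + m[c, ]
  sims <- as.vector(m %*% target) /
    (sqrt(rowSums(m^2)) * sqrt(sum(target^2)))
  names(sims) <- rownames(m)
  sims <- sims[!names(sims) %in% c(a, b, c)]
  names(which.max(sims))
}

# toy 3-d "embeddings" (dimensions loosely: royalty, male, female)
tmp <- tempfile(fileext = ".txt")
writeLines(c("king 1 1 0", "queen 1 0 1", "man 0 1 0",
             "woman 0 0 1", "apple 0.1 0.2 0.1"), tmp)
glove_df <- read_glove_txt(tmp)
analogy(glove_df, "king", "man", "woman")  # returns "queen"
```

`readLines()` loads the whole file into memory at once, which is fine for the toy file but may be slow or memory-hungry for the full multi-gigabyte downloads; a chunked reader would be one way around that.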
@tjvananne
tjvananne / build_data_dictionary.R
Last active May 3, 2017 15:53
Generate a generic data dictionary in R. For each column, it counts the blanks, the NAs, and the unique values, calculates each of those counts as a percentage, and reports the top n (5 by default) unique values.
# generic data dictionary creation using base-R
#' A couple of notes: this could, of course, be done much faster using
#' third-party packages, but I like to provide base-R solutions before
#' branching out into packages, in case they aren't available.
#'
#' Also, this could be done in a much less verbose and less modular way,
#' but I also wanted to demonstrate the "Functional Programming"
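The rest of this gist is not shown in the preview. A minimal sketch of the same idea in the base-R, functional (`lapply`) style the notes describe might look like the following. `build_dict()` and its output column names are hypothetical, and "blank" is assumed to mean an empty or whitespace-only string in a character or factor column.

```r
# Hypothetical sketch: one output row per input column.
build_dict <- function(df, top_n = 5) {
  summarise_col <- function(x) {
    n <- length(x)
    n_na <- sum(is.na(x))
    # assumption: "blank" = empty or whitespace-only string
    n_blank <- if (is.character(x) || is.factor(x)) {
      sum(!is.na(x) & trimws(as.character(x)) == "")
    } else 0L
    n_unique <- length(unique(x))  # NA counts as one distinct value
    # most frequent non-NA values first
    counts <- sort(table(x, useNA = "no"), decreasing = TRUE)
    counts <- counts[counts > 0]
    data.frame(
      n_na = n_na, pct_na = round(100 * n_na / n, 2),
      n_blank = n_blank, pct_blank = round(100 * n_blank / n, 2),
      n_unique = n_unique, pct_unique = round(100 * n_unique / n, 2),
      top_values = paste(head(names(counts), top_n), collapse = ", "),
      stringsAsFactors = FALSE
    )
  }
  # apply the summary to every column, then stack the one-row results
  res <- data.frame(column = names(df),
                    do.call(rbind, lapply(df, summarise_col)),
                    stringsAsFactors = FALSE)
  rownames(res) <- NULL
  res
}

toy <- data.frame(id = 1:4, name = c("a", "", NA, "a"),
                  score = c(1.5, NA, NA, 2), stringsAsFactors = FALSE)
build_dict(toy)
```

Keeping the per-column logic in its own small function and mapping it with `lapply()` is what makes the result easy to extend: a new statistic is one more field in `summarise_col()`.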
@tjvananne
tjvananne / ggplot2_heatmap_simple.R
Last active August 6, 2017 18:22
Create a Heat Map in R using ggplot2 with viridis Color Scale
# references:
# https://rud.is/b/2016/02/14/making-faceted-heatmaps-with-ggplot2/
# This is basically the TL;DR version, and it uses a built-in dataset (airquality) for reproducibility
library(ggplot2)
library(viridis)
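# map Day to the x-axis, Month to the y-axis, and fill each tile by temperature
gg <- ggplot(airquality, aes(x = Day, y = Month, fill = Temp))
# thin white borders between tiles; ggplot2 >= 3.4.0 uses `linewidth`
# (use `size = 0.1` on older versions)
gg <- gg + geom_tile(color = "white", linewidth = 0.1)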
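In keeping with the base-R-first approach of the other snippets, a comparable heat map can also be sketched without ggplot2. This assumes R >= 3.6.0, where `grDevices::hcl.colors()` offers a "viridis" palette; `temp_mat` is just an illustrative name. Each cell of the Day × Month matrix corresponds to one `geom_tile()` tile.

```r
# Base-R sketch (assumes R >= 3.6.0 for hcl.colors()).
# One cell per (Day, Month) pair: 31 rows (days) x 5 columns (months 5-9).
temp_mat <- tapply(airquality$Temp,
                   list(Day = airquality$Day, Month = airquality$Month),
                   mean)
# days that don't exist in a month (e.g. June 31) stay NA and are left unpainted
image(x = as.integer(rownames(temp_mat)),
      y = as.integer(colnames(temp_mat)),
      z = temp_mat,
      col = hcl.colors(20, "viridis"),
      xlab = "Day", ylab = "Month", main = "airquality: Temp")
```

Because each (Day, Month) pair occurs once in `airquality`, `mean` simply returns the single recorded temperature for that cell.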