mrdwab’s gists

mrdwab / stratified.R

Last active April 27, 2024 19:57

Stratified random sampling from a `data.frame` in R

	stratified <- function(df, group, size, select = NULL,
	replace = FALSE, bothSets = FALSE) {
	if (is.null(select)) {
	df <- df
	} else {
	if (is.null(names(select))) stop("'select' must be a named list")
	if (!all(names(select) %in% names(df)))
	stop("Please verify your 'select' argument")
	temp <- sapply(names(select),
	function(x) df[[x]] %in% select[[x]])

mrdwab / Cbind.R

Last active April 12, 2020 22:51

Cbind for unequal length vectors.

	padNA <- function (mydata, rowsneeded, first = TRUE)
	{
	temp1 = colnames(mydata)
	rowsneeded = rowsneeded - nrow(mydata)
	temp2 = setNames(
	data.frame(matrix(rep(NA, length(temp1) * rowsneeded),
	ncol = length(temp1))), temp1)
	if (isTRUE(first)) rbind(mydata, temp2)
	else rbind(temp2, mydata)
	}

mrdwab / concat.split.DT.R

Last active December 24, 2015 22:29

Bringing some speed to `concat.split`

	#' Split concatenated cells in a \code{data.frame} or a \code{data.table}
	#'
	#' A variation of the \code{concat.split} family of functions designed for
	#' large rectangular datasets.
	#'
	#' While the general \code{concat.split} functions are able to handle
	#' "unbalanced" datasets (for example, where the number of fields in a given
	#' column might differ from row to row) because of the nature of \code{fread}
	#' from the "data.table" package, this function does not support such data
	#' types.

mrdwab / Factor.R

Last active December 25, 2015 18:19

The `factor` function in R doesn't work nicely with duplicated levels, but there is a workaround using the `levels` function. This is a wrapper function to combine those two steps into one.

	#' Factor vectors with multiple levels
	#'
	#' \code{\link{factor}} does not let you use duplicated levels nicely. It results
	#' in an ugly warning message and you need to use \code{\link{droplevels}} to get
	#' the desired output.
	#'
	#' The "solution" is to first factor the vector, and then use a named \code{list}
	#' with the \code{\link{levels}} function. This function is a wrapper around
	#' those steps.
	#'

mrdwab / moveme.R

Created October 27, 2013 15:34

Shuffle items in a character string.

	moveme <- function(invec, movecommand) {
	movecommand <- lapply(strsplit(strsplit(movecommand, ";")[[1]], ",\|\\s+"),
	function(x) x[x != ""])
	movelist <- lapply(movecommand, function(x) {
	Where <- x[which(x %in% c("before", "after", "first", "last")):length(x)]
	ToMove <- setdiff(x, Where)
	list(ToMove, Where)
	})
	myVec <- invec
	for (i in seq_along(movelist)) {

mrdwab / CUT.R

Created October 31, 2013 15:43

Makes the output of `cut` a `list` with the values of `cut` and a `data.frame` with the lower and upper values of each interval.

	CUT <- function (x, breaks, labels = NULL, include.lowest = FALSE, right = TRUE,
	dig.lab = 3L, ordered_result = FALSE, ...)
	{
	if (!is.numeric(x))
	stop("'x' must be numeric")
	if (length(breaks) == 1L) {
	if (is.na(breaks) \|\| breaks < 2L)
	stop("invalid number of intervals")
	nb <- as.integer(breaks + 1)
	dx <- diff(rx <- range(x, na.rm = TRUE))

mrdwab / Interleave.R

Created November 5, 2013 04:24

`interleave` from "gdata" with a `list` as an input.

	Interleave <- function(myList, append.source = TRUE, sep = ": ", drop = FALSE) {
	sources <- myList
	sources[sapply(sources, is.null)] <- NULL
	sources <- lapply(sources, function(x) if (is.matrix(x) \|\|
	is.data.frame(x))
	x
	else t(x))
	nrows <- sapply(sources, nrow)
	mrows <- max(nrows)
	if (any(nrows != mrows & nrows != 1))

mrdwab / dailyCalendar.R

Last active December 28, 2015 14:29

Defines a "WeekDays" constant and a "dailyCalendar" function

	WeekDays <- function(startOn = "Monday", abbreviate = FALSE) {
	WD <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")
	x <- match(startOn, WD)
	WD <- WD[c(x:7, setdiff(1:7, x:7))]
	if (isTRUE(abbreviate)) {
	substring(WD, 0, 3)
	} else WD
	}

mrdwab / helpExtract.R

Last active December 29, 2015 00:29

Extracts the requested section of a function's help files from R to the console.

	helpExtract <- function(Function, section = "Usage", type = "m_code") {
	A <- deparse(substitute(Function))
	x <- capture.output(tools:::Rd2txt(utils:::.getHelpFile(utils::help(A)),
	options = list(sectionIndent = 0)))
	B <- grep("^_", x) ## section start lines
	x <- gsub("_\b", "", x, fixed = TRUE) ## remove "_\b"
	X <- rep(FALSE, length(x))
	X[B] <- 1
	out <- split(x, cumsum(X))
	out <- out[[which(sapply(out, function(x)

mrdwab / dummy_bench.R

Last active January 2, 2016 03:19

Results of a benchmark for creating dummy variables from a single column.

	library(microbenchmark)


	## Change "n" to experiment with different sized `data.frame`s
	set.seed(1)
	n = 100000

	## I couldn't think of other constants off the top of my head
	## This should give us 26+26+50+50 "dummy variable" columns.
	example <- data.frame(strcol = sample(

Ananda Mahto mrdwab