@rewinfrey
Created August 21, 2015 00:41
R Utility Functions for cleaning, randomizing, sorting and creating test / training sets
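# Drop every row that contains at least one NA.
clean_data <- function(dataset) {
  na.omit(dataset)
}

# Add a `randomized` column holding one uniform random number per row.
randomize <- function(dataset) {
  dataset$randomized <- runif(nrow(dataset))
  dataset
}

# Shuffle the rows by sorting on the `randomized` column.
randomize_ordered <- function(dataset) {
  randomized_dataset <- randomize(dataset)
  randomized_dataset[order(randomized_dataset$randomized), ]
}

# Return the first half of the rows (first_half = 1) or the rest.
# floor() keeps the split point an integer, so with an odd number of
# rows the extra row goes to the second half instead of being dropped.
split_dataset <- function(dataset, first_half = 1) {
  n <- nrow(dataset)
  half <- floor(n / 2)
  if (first_half == 1) {
    dataset[seq_len(half), ]
  } else {
    dataset[seq_len(n - half) + half, ]
  }
}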
# Example
example_dataset <- read.csv("example_data.csv", header = TRUE, na.strings = "")
# Shuffle once and reuse the result so the two halves are complementary.
# Shuffling separately for each call would let rows land in both sets.
shuffled <- randomize_ordered(clean_data(example_dataset))
training <- split_dataset(shuffled, first_half = 1)
test <- split_dataset(shuffled, first_half = 0)
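
# Alternative sketch (not part of the original gist): the same
# shuffle-then-split idea done in one step, so training and test are
# guaranteed to be disjoint. The helper name `split_train_test` and its
# `train_fraction` and `seed` parameters are hypothetical, chosen only for
# illustration. It uses base R only and adds no helper column to the data.
split_train_test <- function(dataset, train_fraction = 0.5, seed = NULL) {
  # Optionally fix the RNG state so the split is reproducible.
  if (!is.null(seed)) set.seed(seed)
  n <- nrow(dataset)
  # One random permutation of the row indices, shared by both subsets.
  shuffled <- sample.int(n)
  n_train <- floor(n * train_fraction)
  list(
    training = dataset[shuffled[seq_len(n_train)], , drop = FALSE],
    test = dataset[shuffled[seq_len(n - n_train) + n_train], , drop = FALSE]
  )
}

# Usage on a small made-up data frame (the values are illustrative only).
demo <- data.frame(id = 1:5, value = c(2.5, 3.1, 4.7, 1.2, 9.9))
parts <- split_train_test(demo, train_fraction = 0.5, seed = 42)
# parts$training has floor(5 * 0.5) = 2 rows; parts$test has the other 3.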