Created
June 15, 2015 13:33
---
title: "Data/Structure Validation"
author: "agstudy"
---
This post is an answer to a [Stack Overflow question](http://stackoverflow.com/questions/30844363/data-structure-validation-for-r#comment49735091_30844363) about creating a well-typed data structure in R.

I think that the only way to define a typed data structure in R is to use an `S4 class`. I should note that even S4 classes are *not strongly typed*, since you can declare a slot as `list`.
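
To illustrate the caveat about `list` slots, here is a minimal sketch. The class `LooseData` and its slot `values` are hypothetical names used only for this example; they are not part of the answer itself.

```{r}
library(methods)

# Hypothetical class whose only slot is declared as a plain list
LooseData <- setClass("LooseData", representation(values = "list"))

# The slot type is checked (it must be a list), but the elements are not:
# a list slot accepts elements of any type
a <- LooseData(values = list(1, "two", TRUE))
sapply(a@values, class)
```

The object is accepted even though its elements have three different classes, which is why a `list` slot gives no guarantee about the content it holds.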

I create an S4 class **TypedData**:

* I define 4 slots and set the type of each one. S4 then checks that every created object respects this typing.
* Then I add a validation function that checks the slots against some conditions. Here, for example, age and weight must be positive values.
* You can also set default values for the slots.

```{r}
# Create the TypedData class
TypedData <- setClass(
  # Set the name of the class
  "TypedData",
  # Define the slots and their types
  representation(
    date   = "Date",
    age    = "numeric",
    weight = "numeric",
    job    = "character"
  ),
  # Set the default values for the slots
  prototype = list(
    job = NA_character_
  ),
  # Validity function: returns TRUE, or a message describing the problem
  validity = function(object) {
    if (length(object@age) == 0 || any(object@age <= 0))
      return("Age should be > 0")
    if (length(object@weight) == 0 || any(object@weight <= 0))
      return("Weight should be > 0")
    TRUE
  }
)
```
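
To see what the slot types and the `validity` function do in practice, here is a small sketch. So that the chunk runs on its own, it uses a hypothetical one-slot class `Measure` with the same kind of check as `TypedData`; the helper `try_msg` is also an assumption made for this example.

```{r}
library(methods)

# Hypothetical one-slot class with the same kind of check as TypedData
Measure <- setClass(
  "Measure",
  representation(weight = "numeric"),
  validity = function(object) {
    if (length(object@weight) == 0 || any(object@weight <= 0))
      return("Weight should be > 0")
    TRUE
  }
)

# Hypothetical helper: returns "ok" or the error message instead of stopping
try_msg <- function(expr) {
  tryCatch({ expr; "ok" }, error = function(e) conditionMessage(e))
}

try_msg(Measure(weight = "heavy"))  # wrong slot type: rejected by the slot declaration
try_msg(Measure(weight = -1))       # right type, bad value: rejected by validity
try_msg(Measure(weight = 70))       # passes both checks

# Direct slot assignment checks the type but does not rerun validity,
# so call validObject() after modifying an object
m <- Measure(weight = 70)
m@weight <- -5
try_msg(validObject(m))
```

The last call is the point worth remembering: validity runs when the object is created, so an object modified through `@<-` should be rechecked with `validObject()`.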

Implement an S3 method to convert the S4 object to a `data.frame`:

```{r}
as.data.frame.TypedData <-
  function(x, row.names = NULL, optional = FALSE, ...)
  {
    # The date slot determines the number of rows
    n <- length(x@date)
    value <- setNames(
      lapply(slotNames(x), function(y) {
        col <- slot(x, y)
        # Replace a slot shorter than date with NAs of the slot's own type
        if (length(col) < n)
          col <- rep(col[NA_integer_], n)
        col
      }),
      slotNames(x))
    attr(value, "row.names") <- as.character(seq_len(n))
    class(value) <- "data.frame"
    value
  }
```

Create some data.
I use the vector interpretation of the S4 class to create an example: each slot holds a whole column.
Manipulating columns is better for performance.

```{r}
pers <- new("TypedData",
            date = seq(as.Date("2015/1/1"), as.Date("2015/3/1"), by = "months"),
            age = c(20, 30, 50),
            weight = c(80, 50, 64))
pers
```

Now we convert the S4 object to a data.frame and check that the result is well typed.

```{r}
as.data.frame(pers)
str(as.data.frame(pers))
```
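
The visual check with `str()` could also be done programmatically. The sketch below assumes a hypothetical helper `check_column_types`, and it builds a stand-in data frame by hand with the same four columns so that the chunk runs on its own.

```{r}
# Hypothetical helper: TRUE if every expected column exists
# and inherits from the expected class
check_column_types <- function(df, expected) {
  all(vapply(names(expected), function(col) {
    col %in% names(df) && inherits(df[[col]], expected[[col]])
  }, logical(1)))
}

# Stand-in for as.data.frame(pers), built by hand for this example
df <- data.frame(
  date   = seq(as.Date("2015-01-01"), as.Date("2015-03-01"), by = "month"),
  age    = c(20, 30, 50),
  weight = c(80, 50, 64),
  job    = NA_character_,
  stringsAsFactors = FALSE
)

expected <- c(date = "Date", age = "numeric", weight = "numeric", job = "character")
check_column_types(df, expected)
```

Note that `inherits()` compares against `class()`, so an integer column would not count as `"numeric"` here; `is()` from the methods package is an alternative if integers should be accepted.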