Skip to content

Instantly share code, notes, and snippets.

@peterdalle
Created March 20, 2018 17:51
Show Gist options
  • Save peterdalle/e61909427ebe48dbba578ecd1911e523 to your computer and use it in GitHub Desktop.
Save peterdalle/e61909427ebe48dbba578ecd1911e523 to your computer and use it in GitHub Desktop.
Create a data.frame with a fixed correlation between x and y
create_correlation <- function(n=1000, r=0.5) {
# Based on caracal:s answer at
# https://stats.stackexchange.com/questions/15011/generate-a-random-variable-with-a-defined-correlation-to-an-existing-variables/15040#15040
theta <- acos(r) # corresponding angle
x1 <- rnorm(n, 1, 1) # fixed given data
x2 <- rnorm(n, 2, 0.5) # new random data
X <- cbind(x1, x2) # matrix
Xctr <- scale(X, center=TRUE, scale=FALSE) # centered columns (mean 0)
Id <- diag(n) # identity matrix
Q <- qr.Q(qr(Xctr[ , 1, drop=FALSE])) # QR-decomposition, just matrix Q
P <- tcrossprod(Q) # = Q Q' # projection onto space defined by x1
x2o <- (Id-P) %*% Xctr[ , 2] # x2ctr made orthogonal to x1ctr
Xc2 <- cbind(Xctr[ , 1], x2o) # bind to matrix
Y <- Xc2 %*% diag(1/sqrt(colSums(Xc2^2))) # scale columns to length 1
y <- Y[ , 2] + (1 / tan(theta)) * Y[ , 1] # final new vector
return(data.frame(x=x1, y=y))
}
df <- create_correlation()
cor.test(df$x, df$y)
plot(df)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment