Skip to content

Instantly share code, notes, and snippets.

View MJacobs1985's full-sized avatar

Marc Jacobs MJacobs1985

View GitHub Profile
@MJacobs1985
MJacobs1985 / Data Exploration
Created September 30, 2021 08:15
Time-Series Analysis in SAS
/** DATA EXPLORATION **/
ods graphics / reset width=7in height=5in imagemap imagefmt=svg;
proc sgplot data=WORK.want;
series x=date y=total_all ;
series x=date y=total_male ;
series x=date y=total_female ;
xaxis label='Date' type=time grid;
yaxis label="Total Mortality" grid;
title "Total Mortality in the Netherlands over Time"; run;
ods graphics / reset width=10in height=5in imagemap imagefmt=svg;
@MJacobs1985
MJacobs1985 / Decomposition analysis
Last active September 30, 2021 08:53
Time-series analysis in SAS
/** Decomposition Analysis**/
ods graphics / reset width=10in height=8in imagefmt=svg;
proc sort data=WORK.MONTHLONG out=Work.preProcessedData;by variable date;run;
proc timeseries data=Work.preProcessedData seasonality=12
plots=(series histogram cycles spectrum corr acf wn decomp sa pcsa tcc sc ic)
print=(descstats seasons decomp);
where variable in ('65_80_all' '65_80_male' '65_80_female');
by variable;
id date interval=month;
var value / accumulate=none transform=none dif=0 sdif=0;
@MJacobs1985
MJacobs1985 / UCM Models
Created September 30, 2021 08:54
Time-series analysis in SAS
/* Female 65-80 */
ods noproctitle;
ods graphics / imagemap=on;
proc sort data=WORK.MONTH out=Work.preProcessedData;by date;run;
proc ucm data=Work.preProcessedData;
id date interval=month;
deplag lags=(9 14 19 20);
cycle order=2 plot=smooth;
model '65_80_female'n;
level plot=smooth;
@MJacobs1985
MJacobs1985 / Libraries
Created September 30, 2021 18:24
How many samples do you need? -- Simulation in R
rm(list=ls())
#### LIBRARY ####
library(readr)
library(rms)
library(lattice)
library(dplyr)
library(car)
library(boot)
library(reshape)
@MJacobs1985
MJacobs1985 / Data import & exploration
Created September 30, 2021 18:25
How many samples do you need?
#### DATA IMPORT & MANAGEMENT ####
ARCdata <- read_csv("ARC_crosssectional_presentation.csv")
ARCdata$gain<-ARCdata$End-ARCdata$Start
str(ARCdata)
ARCdata<-as.data.frame(ARCdata)
ARCdata$Diet<-as.factor(ARCdata$Diet)
ARCdata$Tank<-as.factor(ARCdata$Tank)
#### DATA EXPLORATION ####
## Look at data
print(tapply(ARCdata$gain, ARCdata$Tank, mean, na.rm=TRUE))
@MJacobs1985
MJacobs1985 / Model building
Created September 30, 2021 18:26
How many samples do you need?
#### MODEL BUILDING ####
fit<-lm(gain~Diet,data=ARCdata)
summary(fit)
unlist(summary(aov(fit)))[[9]] # significance of F value from ANOVA -->0.2288
unlist(summary(aov(fit)))[[7]] # F value from ANOVA
@MJacobs1985
MJacobs1985 / Post-hoc power analysis
Last active September 30, 2021 18:32
How many samples do you need
#### POST-HOC POWER ANALYSIS ####
## To obtain a power analysis, I need to look at the probability of obtaining a significant p-value (a<=0.05)
## What I could do is use bootstrapping to estimate the population mean and standard deviation (sd)
## Once I have the population mean and sd, I can then simulate 1000 trials, and look at the achieved 'power'
k=(length(unique(ARCdata$Tank)))
n=10000
popmean<-matrix(data=NA, nrow=n, ncol=k)
popsd<-matrix(data=NA, nrow=n, ncol=k)
for (i in 1:n){
ARCdatanew<-lapply(split(ARCdata, ARCdata$Tank),
@MJacobs1985
MJacobs1985 / Simulate new trials
Created September 30, 2021 18:31
How many samples do you need
## We now simulate 10k new trials, using population mean and sd, and look for the power -35 samples per time
set.seed(1155)
ARCdatasim<-ARCdata
n=10000
x=matrix(data=NA, nrow=n, ncol=1)
power <- function(x){ sum(x<=0.05)/n *100}
for (i in 1:n){
ARCdatasim$gain[ARCdatasim$Tank==120]<-rnorm(35, mean=popmean[1], sd=popsd[1])
ARCdatasim$gain[ARCdatasim$Tank==121]<-rnorm(35, mean=popmean[2], sd=popsd[2])
ARCdatasim$gain[ARCdatasim$Tank==122]<-rnorm(35, mean=popmean[3], sd=popsd[3])
@MJacobs1985
MJacobs1985 / Full simulation
Created September 30, 2021 18:35
How many samples do you need
#### SIMULATION ESTIMATING THE INFLUENCE OF RANDOMLY SELECTING FISH TAKING INTO ACCOUNT THE EFFECT SIZE ####
## Now, lets see what happens if we increase the population mean of a diet (so the true mean, not in a sample)
## We keep the variation proportionate to the mean using the coefficient of variation (sd / mean)
# First, change columns names of population mean and population sd to make sure we do not make mistakes
samplemean<-tapply(ARCdata$gain, ARCdata$Tank, mean, na.rm=TRUE)
samplesd<-tapply(ARCdata$gain, ARCdata$Tank, sd, na.rm=TRUE)
popmean<-as.array(popmean)
popsd<-as.array(popsd)
for (i in 1:12){
rownames(popmean)[i]<-rownames(samplemean)[i]
@MJacobs1985
MJacobs1985 / Simulation check
Created September 30, 2021 18:35
How many samples do you need
# Check if analyses went well, by looking at meas at different levels
apply(x,3, mean) # if different then okay
## Summarize power values
power <- function(x){sum(x<=0.05)/n *100}
powervalue<-as.data.frame(apply(x,c(2:3), FUN=power)) # provides for each effect size the power of each k
powervalue<-powervalue[,c(0:51)]
powervalue<-setDT(powervalue, keep.rownames = TRUE)[]
powervalue<-as.data.frame(powervalue)
colnames(powervalue)[1]<-"N"