Marc Jacobs MJacobs1985

Scientist. PhD in psychology. Builder of models, and enthousiast of statistics, research, epidemiology, probability, and simulations for 10+ years.

MJacobs1985 / Data Exploration

Created September 30, 2021 08:15

Time-Series Analysis in SAS

	/ DATA EXPLORATION /
	ods graphics / reset width=7in height=5in imagemap imagefmt=svg;
	proc sgplot data=WORK.want;
	series x=date y=total_all ;
	series x=date y=total_male ;
	series x=date y=total_female ;
	xaxis label='Date' type=time grid;
	yaxis label="Total Mortality" grid;
	title "Total Mortality in the Netherlands over Time"; run;
	ods graphics / reset width=10in height=5in imagemap imagefmt=svg;

MJacobs1985 / Decomposition analysis

Last active September 30, 2021 08:53

Time-series analysis in SAS

	/ Decomposition Analysis/
	ods graphics / reset width=10in height=8in imagefmt=svg;
	proc sort data=WORK.MONTHLONG out=Work.preProcessedData;by variable date;run;
	proc timeseries data=Work.preProcessedData seasonality=12
	plots=(series histogram cycles spectrum corr acf wn decomp sa pcsa tcc sc ic)
	print=(descstats seasons decomp);
	where variable in ('65_80_all' '65_80_male' '65_80_female');
	by variable;
	id date interval=month;
	var value / accumulate=none transform=none dif=0 sdif=0;

MJacobs1985 / UCM Models

Created September 30, 2021 08:54

Time-series analysis in SAS

	/* Female 65-80 */
	ods noproctitle;
	ods graphics / imagemap=on;
	proc sort data=WORK.MONTH out=Work.preProcessedData;by date;run;
	proc ucm data=Work.preProcessedData;
	id date interval=month;
	deplag lags=(9 14 19 20);
	cycle order=2 plot=smooth;
	model '65_80_female'n;
	level plot=smooth;

MJacobs1985 / Libraries

Created September 30, 2021 18:24

How many samples do you need? -- Simulation in R

MJacobs1985 / Data import & exploration

Created September 30, 2021 18:25

How many samples do you need?

	#### DATA IMPORT & MANAGEMENT ####
	ARCdata <- read_csv("ARC_crosssectional_presentation.csv")
	ARCdata$gain<-ARCdata$End-ARCdata$Start
	str(ARCdata)
	ARCdata<-as.data.frame(ARCdata)
	ARCdata$Diet<-as.factor(ARCdata$Diet)
	ARCdata$Tank<-as.factor(ARCdata$Tank)
	#### DATA EXPLORATION ####
	## Look at data
	print(tapply(ARCdata$gain, ARCdata$Tank, mean, na.rm=TRUE))

MJacobs1985 / Model building

Created September 30, 2021 18:26

How many samples do you need?

	#### MODEL BUILDING ####
	fit<-lm(gain~Diet,data=ARCdata)
	summary(fit)
	unlist(summary(aov(fit)))[[9]] # significance of F value from ANOVA -->0.2288
	unlist(summary(aov(fit)))[[7]] # F value from ANOVA

MJacobs1985 / Post-hoc power analysis

Last active September 30, 2021 18:32

How many samples do you need

	#### POST-HOC POWER ANALYSIS ####
	## To obtain a power analysis, I need to look at the probability of obtaining a significant p-value (a<=0.05)
	## What I could do is use bootstrapping to estimate the population mean and standard deviation (sd)
	## Once I have the population mean and sd, I can then simulate 1000 trials, and look at the achieved 'power'
	k=(length(unique(ARCdata$Tank)))
	n=10000
	popmean<-matrix(data=NA, nrow=n, ncol=k)
	popsd<-matrix(data=NA, nrow=n, ncol=k)
	for (i in 1:n){
	ARCdatanew<-lapply(split(ARCdata, ARCdata$Tank),

MJacobs1985 / Simulate new trials

Created September 30, 2021 18:31

How many samples do you need

	## We now simulate 10k new trials, using population mean and sd, and look for the power -35 samples per time
	set.seed(1155)
	ARCdatasim<-ARCdata
	n=10000
	x=matrix(data=NA, nrow=n, ncol=1)
	power <- function(x){ sum(x<=0.05)/n *100}
	for (i in 1:n){
	ARCdatasim$gain[ARCdatasim$Tank==120]<-rnorm(35, mean=popmean[1], sd=popsd[1])
	ARCdatasim$gain[ARCdatasim$Tank==121]<-rnorm(35, mean=popmean[2], sd=popsd[2])
	ARCdatasim$gain[ARCdatasim$Tank==122]<-rnorm(35, mean=popmean[3], sd=popsd[3])

MJacobs1985 / Full simulation

Created September 30, 2021 18:35

How many samples do you need

	#### SIMULATION ESTIMATING THE INFLUENCE OF RANDOMLY SELECTING FISH TAKING INTO ACCOUNT THE EFFECT SIZE ####
	## Now, lets see what happens if we increase the population mean of a diet (so the true mean, not in a sample)
	## We keep the variation proportionate to the mean using the coefficient of variation (sd / mean)
	# First, change columns names of population mean and population sd to make sure we do not make mistakes
	samplemean<-tapply(ARCdata$gain, ARCdata$Tank, mean, na.rm=TRUE)
	samplesd<-tapply(ARCdata$gain, ARCdata$Tank, sd, na.rm=TRUE)
	popmean<-as.array(popmean)
	popsd<-as.array(popsd)
	for (i in 1:12){
	rownames(popmean)[i]<-rownames(samplemean)[i]

MJacobs1985 / Simulation check

Created September 30, 2021 18:35

How many samples do you need

	# Check if analyses went well, by looking at meas at different levels
	apply(x,3, mean) # if different then okay

	## Summarize power values
	power <- function(x){sum(x<=0.05)/n *100}
	powervalue<-as.data.frame(apply(x,c(2:3), FUN=power)) # provides for each effect size the power of each k
	powervalue<-powervalue[,c(0:51)]
	powervalue<-setDT(powervalue, keep.rownames = TRUE)[]
	powervalue<-as.data.frame(powervalue)
	colnames(powervalue)[1]<-"N"