walterst’s gists

walterst / A_linear_mixed_models_microbiome.Rmd

Last active May 31, 2024 08:12

Linear mixed models with metadata filtering and data transformation

	# The initial part of this script has settings for filepaths, parameters, and metadata.
	# Many parts may need to be altered based on changes to the input data, the metadata fields used, etc.

	library('data.table')
	library('dtplyr')
	library('tidyverse')
	library('glmmTMB')
	library('ggplot2')
	library('broom')
	library('DHARMa')

walterst / MME_R_script_growth_modeling.txt

Last active November 15, 2023 14:58

This is an R script for fitting and plotting infants' growth (weight and height) from ages 0-3 with a modified Michaelis-Menten equation.

	# This code will read in the STARR height and weight data that accompanied the article:
	# "A modified Michaelis-Menten equation estimates growth from birth to 3 years in healthy babies in the US"
	# The filepaths will need to be modified to match the local filesystem. The dplyr, ggplot2, gplots, and gridExtra
	# libraries are needed. Interpolation of weights/heights at a given age in days
	# can be done with the predict() function, passing the fitted model and a dataframe of days.
	# Subjects that fail to fit because of nls() errors will be plotted as raw data.
	# Increase the default number_of_subjects_to_fit to 100 to see an example.

	library(dplyr)
	library(ggplot2)

walterst / parse_ipod_to_metadata.py

Last active March 15, 2019 10:11

Custom script used to parse tab-delimited Ipod data, match up dates from tab-delimited QIIME mapping data, and write averages of the data from multiple days (the sample day and the days prior to it) as metadata columns for the QIIME metadata samples. This script uses a QIIME 1.9X environment for the parse_mapping_file function.

	#!/usr/bin/env python
	from __future__ import division
	# USAGE: python parse_ipod_to_metadata.py mapping_file days_to_consider ipod_tab_delim_file raw_output_file qiime_compatible_output_file
	# where days_to_consider counts the same day as one of the days, and comma-separated columns need to be
	# an exact match to the field label in the ipod data file, e.g. Gastrointestinal_issues
	# All dates must be in the format of DD/MM/YY in the ipod source tab delimited data.


	from sys import argv
	from operator import itemgetter
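The windowed averaging described above can be sketched as follows. This is a minimal Python 3 illustration, not the gist's own code: the function name `average_over_window` and the dictionary input are hypothetical, and only the DD/MM/YY date format and the rule that the sample day counts as one of the days come from the usage notes.

```python
from datetime import datetime, timedelta

DATE_FORMAT = "%d/%m/%y"  # DD/MM/YY, as required by the usage notes


def average_over_window(daily_values, sample_date, days_to_consider):
    """Average the values logged on sample_date and the days before it.

    daily_values is a hypothetical dict mapping DD/MM/YY strings to numbers.
    The sample day itself counts as one of the days. Days with no entry are
    skipped; None is returned if no day in the window has data.
    """
    end = datetime.strptime(sample_date, DATE_FORMAT)
    found = []
    for offset in range(days_to_consider):
        key = (end - timedelta(days=offset)).strftime(DATE_FORMAT)
        if key in daily_values:
            found.append(daily_values[key])
    return sum(found) / len(found) if found else None


# Hypothetical daily scores for one metadata field.
scores = {"28/02/19": 10.0, "01/03/19": 2.0, "02/03/19": 4.0}
print(average_over_window(scores, "02/03/19", 2))
```

Skipping days without an entry is one possible policy; treating them as zero or as missing data for the whole window are alternatives the original script may handle differently.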

walterst / random_subsample_fastq.py

Created December 17, 2018 16:08

Randomly subsamples a directory of fastq.gz files and writes the subsampled fastq files to an output directory.

	#!/usr/bin/env python

	from sys import argv
	from random import random

	#from gzip import open as gz_open
	from glob import glob

	import gzip
	import os
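One way to subsample reads, sketched here in Python 3 under the assumption that every record is exactly four lines, is to keep each record independently with a fixed probability. The function names and the `keep_fraction` and `seed` parameters are hypothetical and are not taken from the gist.

```python
import gzip
import io
import random


def subsample_records(handle, keep_fraction, rng):
    """Yield 4-line fastq records, keeping each with probability keep_fraction."""
    while True:
        record = [handle.readline() for _ in range(4)]
        if not record[0]:  # readline() returns "" at end of file
            break
        if rng.random() < keep_fraction:
            yield record


def subsample_file(in_path, out_path, keep_fraction, seed=None):
    """Write a random subset of reads from a fastq.gz file to a plain fastq file."""
    rng = random.Random(seed)
    with gzip.open(in_path, "rt") as reads, open(out_path, "w") as out:
        for record in subsample_records(reads, keep_fraction, rng):
            out.writelines(record)


# In-memory demonstration with three hypothetical reads.
demo_reads = "".join("@read%d\nACGT\n+\nIIII\n" % i for i in range(3))
kept = list(subsample_records(io.StringIO(demo_reads), 0.5, random.Random(0)))
print("kept %d of 3 reads" % len(kept))
```

Per-read sampling gives an output size that varies around the target fraction. For paired-end data, the same seed would have to be applied to both files so that mates stay in sync.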

walterst / find_fastq_errors.py

Last active April 17, 2018 12:57

Very simple fastq parser/checker that tries to detect errors. Assumes each record is exactly four lines (@Label, sequence, +, quality scores). Checks for the expected characters at the start of the label and optional-label lines, and for equal sequence and quality lengths.

	#!/usr/bin/env python

	# Used to find fastq seqs in gzipped files, write first error, if any, to a log file
	# Usage: python find_fastq_errors.py fastq_folder log_file
	# where fastq_folder has all of the fastq files in it (subdirectories will be searched)

	from sys import argv
	from glob import glob

	import gzip
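The checks named in the description (leading "@" on the label line, leading "+" on the separator line, equal sequence and quality lengths) could look roughly like this. It is a sketch in Python 3 with a hypothetical function name, not the gist's implementation.

```python
def first_fastq_error(lines):
    """Return a description of the first malformed record, or None if all pass.

    Assumes strict 4-line records: @label, sequence, +[label], quality.
    """
    lines = [line.rstrip("\r\n") for line in lines]
    for start in range(0, len(lines), 4):
        record = lines[start:start + 4]
        number = start // 4 + 1
        if len(record) < 4:
            return "record %d: truncated (%d of 4 lines)" % (number, len(record))
        label, seq, plus, qual = record
        if not label.startswith("@"):
            return "record %d: label line does not start with '@'" % number
        if not plus.startswith("+"):
            return "record %d: third line does not start with '+'" % number
        if len(seq) != len(qual):
            return "record %d: sequence and quality lengths differ" % number
    return None


good = ["@r1\n", "ACGT\n", "+\n", "IIII\n"]
bad = good + ["@r2\n", "ACGT\n", "+\n", "III\n"]
print(first_fastq_error(good))
print(first_fastq_error(bad))
```

To apply it to a compressed file, the lines can come from `gzip.open(path, "rt")`.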

walterst / record_singletons.py

Created April 3, 2018 08:59

Use to count the number of singletons present in a QIIME OTU mapping file and write these sequence IDs to an output file.

	#!/usr/bin/env python

	"""Usage: python record_singletons.py X Y
	where X is the input OTU mapping file and Y is the output singleton sequence ID file.
	"""

	from sys import argv

	otu_mapping = open(argv[1], "U")
	singletons_out = open(argv[2], "w")
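A minimal sketch of the singleton search, assuming the usual QIIME 1.X OTU map layout (an OTU ID followed by tab-separated sequence IDs on each line) and treating an OTU with exactly one sequence ID as a singleton. The function name is hypothetical.

```python
def find_singletons(otu_map_lines):
    """Return the sequence IDs of OTUs that contain exactly one sequence."""
    singletons = []
    for line in otu_map_lines:
        fields = line.strip().split("\t")
        # fields[0] is the OTU ID; a singleton has one sequence ID after it.
        if len(fields) == 2:
            singletons.append(fields[1])
    return singletons


# Hypothetical OTU map with one two-member OTU and two singletons.
example_map = [
    "denovo0\tsampleA_1\tsampleB_7\n",
    "denovo1\tsampleA_2\n",
    "denovo2\tsampleC_5\n",
]
singleton_ids = find_singletons(example_map)
print("%d singletons" % len(singleton_ids))
```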

walterst / parse_otu_mapping_from_uc.py

Created April 3, 2018 08:02

Parses data from .uc files (tested with vsearch, should work with uclust/usearch too) to create a QIIME 1.X OTU mapping file.

	#!/usr/bin/env python

	""" This is modified from the bfillings usearch app controller

	usage: python parse_otu_mapping_from_uc.py X Y
	where X is the input .uc file, Y is the output OTU mapping file"""


	from sys import argv
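The conversion can be sketched as below. This assumes the standard ten-column, tab-separated .uc layout, where the first column is the record type ("S" for a seed/centroid, "H" for a hit), the ninth column is the query label, and the tenth is the target label. Using the centroid label as the OTU ID is an assumption made for illustration, and the function names are hypothetical; the gist's own parser may differ.

```python
def clusters_from_uc(uc_lines):
    """Group sequence labels by centroid using the S and H records of a .uc file."""
    clusters = {}
    for line in uc_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        record_type, query, target = fields[0], fields[8], fields[9]
        if record_type == "S":    # new centroid: it is a member of its own cluster
            clusters.setdefault(query, []).append(query)
        elif record_type == "H":  # hit: the query joins the target's cluster
            clusters.setdefault(target, []).append(query)
        # Other record types (e.g. "C" cluster summaries) are ignored here.
    return clusters


def format_otu_map(clusters):
    """Render clusters as OTU map lines: OTU ID, then tab-separated members."""
    return ["\t".join([centroid] + members) for centroid, members in clusters.items()]


# Hypothetical .uc records.
example_uc = [
    "S\t0\t250\t*\t*\t*\t*\t*\tseqA\t*\n",
    "H\t0\t250\t99.0\t+\t0\t0\t250M\tseqB\tseqA\n",
    "S\t1\t250\t*\t*\t*\t*\t*\tseqC\t*\n",
    "C\t0\t2\t*\t*\t*\t*\t*\tseqA\t*\n",
]
example_clusters = clusters_from_uc(example_uc)
print("\n".join(format_otu_map(example_clusters)))
```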

walterst / get_rank_sorted_data.py

Created January 31, 2018 13:10

Generates rank/frequency (and log-transformed) data for OTU counts, to match the approach described in the article listed in the script text.

	#!/usr/bin/env python

	from sys import argv

	from operator import itemgetter
	from scipy.stats import rankdata
	from numpy import log

	from biom import load_table
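A rank/frequency table of this kind can be sketched with the standard library alone. Note the differences from the imports above: this version assigns ordinal ranks (ties get consecutive ranks) rather than calling `scipy.stats.rankdata`, takes a plain list of counts rather than a BIOM table, and drops zero counts so the logarithm is defined. The function name is hypothetical.

```python
import math


def rank_frequency(counts):
    """Return (rank, count, log_rank, log_count) tuples, most abundant first.

    Natural logarithms are used; zero counts are dropped.
    """
    ordered = sorted((c for c in counts if c > 0), reverse=True)
    return [
        (rank, count, math.log(rank), math.log(count))
        for rank, count in enumerate(ordered, start=1)
    ]


# Hypothetical per-OTU total counts.
table = rank_frequency([5, 0, 1, 20])
for rank, count, log_rank, log_count in table:
    print("%d\t%d\t%.3f\t%.3f" % (rank, count, log_rank, log_count))
```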

walterst / filter_barcode_header.py

Last active November 16, 2017 22:28

Filters a barcode header to remove target characters, e.g. "+" character. Splits on target identifiers.

	#!/usr/bin/env python


	# Usage: python filter_barcode_header.py original_barcode_seqs.fastq new_barcode_seqs.fastq
	# WARNING: the second file specified will be overwritten if it exists!

	bc_start_indicator = "1:N:0:"
	chars_to_strip = ["+"]

	from sys import argv
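A possible reading of the header filtering, sketched in Python 3: split the header at the `bc_start_indicator`, strip the unwanted characters from the barcode portion that follows it, and rejoin. The function name and the choice to leave headers without the indicator unchanged are assumptions, not behavior confirmed by the gist.

```python
def filter_header(header, bc_start_indicator="1:N:0:", chars_to_strip=("+",)):
    """Strip unwanted characters from the barcode that follows bc_start_indicator."""
    prefix, indicator, barcode = header.partition(bc_start_indicator)
    if not indicator:  # indicator not present: leave the header unchanged
        return header
    for char in chars_to_strip:
        barcode = barcode.replace(char, "")
    return prefix + indicator + barcode


# Hypothetical dual-index header.
print(filter_header("@M01:1:000:1:1101:15000:1000 1:N:0:ACGT+TGCA"))
```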

walterst / count_zipped_fastq_reads.py

Created September 14, 2017 08:08

	#!/usr/bin/env python

	# Used to count fastq seqs in gzipped files, write counts and file name to log file
	# Usage: python count_zipped_fastq_reads.py fastq_folder log_file
	# where fastq_folder has all of the fastq files in it (doesn't search subdirectories)

	from sys import argv
	from glob import glob

	from cogent.parse.fastq import MinimalFastqParser
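For environments without PyCogent, a standard-library alternative is to count lines and divide by four. This sketch is not the gist's method (which uses `MinimalFastqParser`), assumes strict 4-line records, and uses a hypothetical function name.

```python
import gzip
import io


def count_fastq_reads(source):
    """Count reads in a gzipped fastq file, assuming 4 lines per record.

    source may be a path or an open binary file object.
    """
    with gzip.open(source, "rt") as handle:
        line_count = sum(1 for _ in handle)
    return line_count // 4


# In-memory demonstration with two hypothetical reads.
demo = gzip.compress(b"@r1\nACGT\n+\nIIII\n@r2\nGGCC\n+\nIIII\n")
print(count_fastq_reads(io.BytesIO(demo)))
```

Line counting is fast but does not validate records, so a truncated or multi-line fastq file would give a misleading count.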

Tony walterst