@walterst
walterst / A_linear_mixed_models_microbiome.Rmd
Last active May 31, 2024 08:12
Linear mixed models with metadata filtering and data transformation
# The initial part of this script has settings for filepaths, parameters, and metadata.
# Many parts may need to be altered when the input data or the metadata fields used change.
library('data.table')
library('dtplyr')
library('tidyverse')
library('glmmTMB')
library('ggplot2')
library('broom')
library('DHARMa')
@walterst
walterst / MME_R_script_growth_modeling.txt
Last active November 15, 2023 14:58
This is an R script for fitting and plotting infants' growth (weight and height) from ages 0-3 with a modified Michaelis-Menten equation.
# This code will read in the STARR height and weight data that accompanied the article:
# "A modified Michaelis-Menten equation estimates growth from birth to 3 years in healthy babies in the US"
# The filepaths will need to be changed to the correct local paths. The dplyr, ggplot2, gplots, and gridExtra
# libraries are needed. Interpolation of weights/heights for a given age in days
# would be done through the predict() function, passing the fitted model and a dataframe of days.
# Subjects whose fits fail because of nls() errors are plotted as raw data.
# Increase the default number_of_subjects_to_fit to 100 to see an example.
library(dplyr)
library(ggplot2)
@walterst
walterst / parse_ipod_to_metadata.py
Last active March 15, 2019 10:11
Custom script used to parse tab-delimited Ipod data, match up dates from tab-delimited QIIME mapping data, and write averages of data from multiple days on and prior to the QIIME metadata sample dates as metadata columns. This script uses a QIIME 1.9X environment for the parse_mapping_file function.
#!/usr/bin/env python
from __future__ import division
# USAGE: python parse_ipod_to_metadata.py mapping_file days_to_consider ipod_tab_delim_file raw_output_file qiime_compatible_output_file
# where days_to_consider counts the same day as one of the days, and the comma-separated columns need to be
# exact matches to the field labels in the ipod data file, e.g. Gastrointestinal_issues
# All dates in the tab-delimited ipod source data must be in DD/MM/YY format.
from sys import argv
from operator import itemgetter
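The windowed averaging described above can be sketched roughly as follows. This is only an illustration, not the gist's code: the function names `parse_ddmmyy` and `average_prior_days` and the dict-of-dates layout are hypothetical, and it assumes `days_to_consider` counts the sample day itself, as the usage comment states.

```python
from datetime import datetime, timedelta


def parse_ddmmyy(text):
    """Parse a DD/MM/YY date string into a date object."""
    return datetime.strptime(text, "%d/%m/%y").date()


def average_prior_days(values_by_date, sample_date, days_to_consider):
    """Average the values recorded on sample_date and the preceding days.

    values_by_date: dict mapping date -> numeric value (hypothetical layout).
    days_to_consider counts the sample day itself as one of the days.
    Returns None when no values fall inside the window.
    """
    window = [sample_date - timedelta(days=offset)
              for offset in range(days_to_consider)]
    found = [values_by_date[day] for day in window if day in values_by_date]
    return sum(found) / len(found) if found else None
```

Days with no recorded value are simply skipped here, so the average is taken over the days that do have data; the original script may treat missing days differently.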
@walterst
walterst / random_subsample_fastq.py
Created December 17, 2018 16:08
Randomly subsamples a directory of fastq.gz files and writes the subsampled fastq files to an output directory.
#!/usr/bin/env python
from sys import argv
from random import random
#from gzip import open as gz_open
from glob import glob
import gzip
import os
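As a rough sketch of the subsampling idea for a single file (the preview above is truncated, so this is not the gist's implementation): each four-line record is kept with a fixed probability. The function name `subsample_fastq`, the `fraction` and `seed` parameters, and the choice to write uncompressed output are all assumptions made for illustration.

```python
import gzip
import random


def subsample_fastq(in_path, out_path, fraction, seed=None):
    """Keep each 4-line FASTQ record with probability `fraction`.

    Reads a gzipped FASTQ file and writes the kept records as plain text.
    Returns (records_read, records_written).
    """
    rng = random.Random(seed)
    read = written = 0
    with gzip.open(in_path, "rt") as src, open(out_path, "w") as dst:
        while True:
            # A FASTQ record is assumed to be exactly four lines.
            record = [src.readline() for _ in range(4)]
            if not record[0]:
                break
            read += 1
            if rng.random() < fraction:
                dst.writelines(record)
                written += 1
    return read, written
```

To cover a whole directory, this could be called once per path returned by `glob(os.path.join(input_dir, "*.fastq.gz"))`.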
@walterst
walterst / find_fastq_errors.py
Last active April 17, 2018 12:57
Very simple fastq parser/checker that tries to detect errors. Assumes each record is exactly four lines (@Label, sequence, +, quality scores). Checks for the expected characters at the start of the label and optional label lines, and for equal sequence and quality lengths.
#!/usr/bin/env python
# Used to find fastq seqs in gzipped files, write first error, if any, to a log file
# Usage: python find_fastq_errors.py fastq_folder log_file
# where fastq_folder has all of the fastq files in it (subdirectories are searched as well)
from sys import argv
from glob import glob
import gzip
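The checks listed in the description could look something like the sketch below. It is a guess at the logic, not the gist's code: `first_fastq_error` is a hypothetical name, and the error messages are made up for illustration.

```python
def first_fastq_error(lines):
    """Return a description of the first malformed record, or None.

    `lines` is any iterable of text lines. Records are assumed to be
    exactly four lines: @label, sequence, +[label], quality scores.
    """
    record = []
    count = 0
    for line in lines:
        record.append(line.rstrip("\r\n"))
        if len(record) < 4:
            continue
        count += 1
        label, seq, plus, qual = record
        if not label.startswith("@"):
            return "record %d: label does not start with '@'" % count
        if not plus.startswith("+"):
            return "record %d: third line does not start with '+'" % count
        if len(seq) != len(qual):
            return "record %d: sequence and quality lengths differ" % count
        record = []
    if record:
        # Leftover lines mean the file ended in the middle of a record.
        return "record %d: truncated record" % (count + 1)
    return None
```

For a gzipped file, the handle from `gzip.open(path, "rt")` can be passed directly as `lines`.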
@walterst
walterst / record_singletons.py
Created April 3, 2018 08:59
Counts the singletons present in a QIIME OTU mapping file and writes their sequence IDs to an output file.
#!/usr/bin/env python
"""Usage: python record_singletons.py X Y
where X is the input OTU mapping file and Y is the output singleton sequence ID file.
"""
from sys import argv
otu_mapping = open(argv[1], "U")
singletons_out = open(argv[2], "w")
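A minimal sketch of the singleton search, assuming the usual QIIME OTU mapping layout of one OTU per line, with the OTU ID followed by tab-separated sequence IDs. The function name `find_singletons` is hypothetical and the rest of the original script is not shown in the preview.

```python
def find_singletons(otu_map_lines):
    """Return sequence IDs from OTUs that contain exactly one sequence."""
    singletons = []
    for line in otu_map_lines:
        fields = line.strip().split("\t")
        # fields[0] is the OTU ID; the remaining fields are member sequence IDs.
        if len(fields) == 2:
            singletons.append(fields[1])
    return singletons
```

The singleton count is then `len(find_singletons(otu_mapping))`, and the IDs can be written one per line to the output file.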
@walterst
walterst / parse_otu_mapping_from_uc.py
Created April 3, 2018 08:02
Parses data from .uc files (tested with vsearch, should work with uclust/usearch too) to create a QIIME 1.X OTU mapping file.
#!/usr/bin/env python
""" This is modified from the bfillings usearch app controller
usage: python parse_otu_mapping_from_uc.py X Y
where X is the input .uc file, Y is the output OTU mapping file"""
from sys import argv
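The conversion can be sketched as below. This is not the bfillings-derived code itself: it assumes the standard ten-column tab-separated .uc layout (record type in column 1, cluster number in column 2, query label in column 9), handles only S (seed) and H (hit) records, and uses the cluster number as the OTU ID, which is a simplification chosen for illustration. The function names are hypothetical.

```python
from collections import OrderedDict


def otu_map_from_uc(uc_lines):
    """Group query labels by cluster number from S and H records."""
    clusters = OrderedDict()
    for line in uc_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 10:
            continue
        record_type, cluster_id, query = fields[0], fields[1], fields[8]
        # S records are cluster seeds, H records are hits to a seed;
        # other record types (e.g. C summaries, N no-hits) are ignored here.
        if record_type in ("S", "H"):
            clusters.setdefault(cluster_id, []).append(query)
    return clusters


def format_otu_map(clusters):
    """Return OTU mapping lines: OTU ID, then tab-separated sequence IDs."""
    return ["\t".join([otu_id] + members)
            for otu_id, members in clusters.items()]
```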
@walterst
walterst / get_rank_sorted_data.py
Created January 31, 2018 13:10
Generates rank/frequency (and log-transformed) data for OTU counts, matching the approach described in the article cited in the script text.
#!/usr/bin/env python
from sys import argv
from operator import itemgetter
from scipy.stats import rankdata
from numpy import log
from biom import load_table
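For orientation only, a generic rank/frequency table can be built as in the sketch below. The article's exact method is not visible in this preview, so this is an assumption-laden stand-in: it uses the standard library instead of scipy/numpy/biom, gives rank 1 to the most abundant OTU, breaks ties by order (whereas `scipy.stats.rankdata` averages tied ranks by default), and drops zero counts so the logarithm is defined. The name `rank_frequency` is hypothetical.

```python
from math import log


def rank_frequency(counts):
    """Return (rank, count, log_rank, log_count) rows, most abundant first."""
    ordered = sorted((c for c in counts if c > 0), reverse=True)
    return [(rank, count, log(rank), log(count))
            for rank, count in enumerate(ordered, start=1)]
```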
@walterst
walterst / filter_barcode_header.py
Last active November 16, 2017 22:28
Filters a barcode header to remove target characters, e.g. the "+" character. Splits on target identifiers.
#!/usr/bin/env python
# Usage: python filter_barcode_header.py original_barcode_seqs.fastq new_barcode_seqs.fastq
# WARNING: the second file specified will be overwritten if it exists!
bc_start_indicator = "1:N:0:"
chars_to_strip = ["+"]
from sys import argv
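Using the two settings shown above, the header filtering might work along these lines. This is a sketch under assumptions rather than the gist's code: `filter_barcode_header` is a hypothetical function, and it assumes the barcode is everything after the first occurrence of the start indicator in the header line.

```python
BC_START_INDICATOR = "1:N:0:"
CHARS_TO_STRIP = ["+"]


def filter_barcode_header(header):
    """Remove CHARS_TO_STRIP from the text after BC_START_INDICATOR."""
    prefix, found, barcode = header.partition(BC_START_INDICATOR)
    if not found:
        # No barcode section in this header; leave it untouched.
        return header
    for char in CHARS_TO_STRIP:
        barcode = barcode.replace(char, "")
    return prefix + found + barcode
```

Only the label line of each four-line record should be passed through this function, so that the "+" separator line of the record is left alone.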
#!/usr/bin/env python
# Used to count fastq seqs in gzipped files, write counts and file name to log file
# Usage: python count_zipped_fastq_reads.py fastq_folder log_file
# where fastq_folder has all of the fastq files in it (doesn't search subdirectories)
from sys import argv
from glob import glob
from cogent.parse.fastq import MinimalFastqParser
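A dependency-free sketch of the counting step is shown below. It swaps the `MinimalFastqParser` import above for simple line counting, which assumes every record is exactly four lines; the function name `count_fastq_reads` and the `*.gz` glob pattern are assumptions, not taken from the script.

```python
import gzip
import os
from glob import glob


def count_fastq_reads(fastq_folder):
    """Return {file name: read count} for gzipped FASTQ files in a folder.

    Subdirectories are not searched. Each record is assumed to span
    exactly four lines, so the read count is the line count divided by 4.
    """
    counts = {}
    for path in sorted(glob(os.path.join(fastq_folder, "*.gz"))):
        with gzip.open(path, "rt") as handle:
            line_count = sum(1 for _ in handle)
        counts[os.path.basename(path)] = line_count // 4
    return counts
```

Writing the log file is then a matter of looping over the returned dict and emitting one "file name, count" line per entry.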