Alexander Shenkin ashenkin

@ashenkin
ashenkin / coauthor_list_generator.r
Last active March 22, 2023 21:11
R script to generate co-author lists often required by funding agencies
# Alexander Shenkin 2023.
# License: CC BY 4.0: https://creativecommons.org/licenses/by/4.0/. TL;DR: Share and adapt with attribution.
#
# This script reads in a csv file of publications, and produces a list of coauthors
# 1) To get the csv of publications, download a bibtex or other archive of desired pubs from ORCID or another publication search tool.
# Note: don't use Google Scholar! They limit the number of co-authors exported.
# 2) Import that bibtex or other format archive into Zotero
# 3) Export those imported references in a .csv format.
#
# This does not include institutions of the coauthors, unfortunately. That still has to be done manually.
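The full script is not shown in this preview, so the following is only a minimal sketch of the idea in the steps above. It assumes the exported .csv has an "Author" column in which names are separated by semicolons; the titles, names, and the `me` variable are made up for illustration.

```r
# Hypothetical stand-in for a Zotero .csv export (made-up titles and names).
csv_lines <- c(
  'Title,Author',
  '"Paper A","Lee, Sam; Doe, Jane"',
  '"Paper B","Doe, Jane; Lee, Sam; Roe, Richard"'
)
pubs <- read.csv(text = csv_lines, stringsAsFactors = FALSE)

# Assumption: authors within one publication are separated by ";".
authors <- trimws(unlist(strsplit(pubs$Author, ";", fixed = TRUE)))

# Drop your own name and collapse duplicates to get the coauthor list.
me <- "Lee, Sam"
coauthors <- sort(unique(authors[authors != me]))
print(coauthors)
```

With real data, replace the `text =` argument with the path to the exported file. Name variants of the same person would still need to be merged by hand.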
ashenkin / unscale.r
Last active June 12, 2017 10:57
A function to reverse the process of scale()'ing a vector in R
unscale <- function(scaled, scale, center) {
    # provide either scale & center, or a scaled vector with the proper attributes
    if (missing(scale) || missing(center)) {
        stopifnot( c("scaled:center", "scaled:scale") %in% names(attributes(scaled)) )
        scale = attr(scaled, "scaled:scale")
        center = attr(scaled, "scaled:center")
        attr(scaled, "scaled:scale") <- NULL
        attr(scaled, "scaled:center") <- NULL
    }
    unscaled = scaled * scale + center
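The preview above is cut off before the function ends. As a hedged illustration of how such a function might be used, here is a self-contained round trip. The final line of the function (returning the rescaled values) is an assumption, since the original ending is not visible.

```r
# Sketch of an unscale() function; the return line is assumed, not taken from the gist.
unscale <- function(scaled, scale, center) {
  if (missing(scale) || missing(center)) {
    # Fall back to the attributes that base R's scale() attaches to its result.
    stopifnot(c("scaled:center", "scaled:scale") %in% names(attributes(scaled)))
    scale <- attr(scaled, "scaled:scale")
    center <- attr(scaled, "scaled:center")
    attr(scaled, "scaled:scale") <- NULL
    attr(scaled, "scaled:center") <- NULL
  }
  scaled * scale + center
}

x <- c(2, 4, 6, 8)
z <- scale(x)            # one-column matrix carrying the centering/scaling attributes
restored <- unscale(z)   # uses the attributes stored on z
print(all.equal(as.numeric(restored), x))
```

This sketch, like the description, targets a single vector. A multi-column matrix would need column-wise handling, for example with sweep().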
ashenkin / set_dev_contrasts.r
Created July 28, 2016 11:04
Deviation (sum to zero) contrasts are important to use in models when one is interested in grand mean parameters and per-level deviations from that mean. However, the usual method of coding these contrasts in R ends up throwing away the associated variable names. As long as you understand what deviation contrasts do, there is no danger in mainta…
contr.sum.keepnames <- function(...) {
    # make deviation contrasts that don't lose the names of the factors in the model results
    # from https://stackoverflow.com/questions/10808853/why-does-changing-contrast-type-change-row-labels-in-r-lm-summary
    conS <- contr.sum(...)
    colnames(conS) = rownames(conS)[-length(rownames(conS))]
    conS
}
set_dev_contrasts <- function(df, colname = "site") {
    # Set contrasts to "deviation coding" so site effects are as compared to overall mean across all sites. I.e., sites together should have a mean 0 effect.
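The body of set_dev_contrasts() is not visible in this preview. As a hedged sketch of what named deviation contrasts give you, the example below applies contr.sum.keepnames() directly to a factor in a toy data frame (made-up data) and fits a linear model.

```r
# Same helper as in the preview, repeated so the example is self-contained.
contr.sum.keepnames <- function(...) {
  conS <- contr.sum(...)
  colnames(conS) <- rownames(conS)[-nrow(conS)]
  conS
}

# Toy, balanced data set (values invented for illustration).
df <- data.frame(
  site = factor(c("a", "b", "c", "a", "b", "c")),
  y    = c(1, 2, 3, 1.5, 2.5, 3.5)
)

# Pass the level names so the contrast matrix has row names to copy from.
contrasts(df$site) <- contr.sum.keepnames(levels(df$site))

fit <- lm(y ~ site, data = df)
print(coef(fit))  # coefficients are labelled by site name rather than by number
```

In this balanced example the intercept is the grand mean of the site means, and each site coefficient is that site's deviation from it. The last level's deviation is not reported directly; it is minus the sum of the others.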
ashenkin / predict_scaled_glmer.r
Created July 27, 2016 09:51
How to predict results from lme4's glmer when fit with scaled data
# We often fit LMMs/GLMMs with scaled variables. However, making predictions from those models isn't straightforward (at least to me!)
# It turns out that you have to rescale your prediction data using the same centering and scaling parameters that were applied to the data frame used to fit the model.
# See below, and pay special attention to the section where the new data are rescaled.
library(lme4)
library(VGAM)
reps = 3000
dbh = rexp(reps); dbh = dbh/max(dbh) * 100
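The rest of the script is not shown here. The rescaling step it describes can be sketched with base R alone. To keep the example free of package dependencies, it swaps in a plain glm() for lme4's glmer(); the simulated data and variable names are illustrative assumptions.

```r
set.seed(1)
n <- 200
dbh <- rexp(n); dbh <- dbh / max(dbh) * 100
y <- rbinom(n, 1, plogis(-1 + 0.05 * dbh))   # simulated binary response

# Scale the predictor and keep the parameters that scale() used.
dbh_scaled <- scale(dbh)
ctr <- attr(dbh_scaled, "scaled:center")
scl <- attr(dbh_scaled, "scaled:scale")

fit_df <- data.frame(y = y, dbh_s = as.numeric(dbh_scaled))
fit <- glm(y ~ dbh_s, family = binomial, data = fit_df)  # stand-in for glmer()

# New data arrive on the original scale: rescale them with the SAME center and
# scale as the fitting data, not with their own mean and standard deviation.
new_dbh <- c(10, 50, 90)
new_df <- data.frame(dbh_s = (new_dbh - ctr) / scl)
pred <- predict(fit, newdata = new_df, type = "response")
print(pred)
```

Calling scale() on the new data alone would center and scale by the new data's own statistics and give predictions for the wrong predictor values.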
ashenkin / make_elevation_profile_swaths.r
Last active March 18, 2016 15:31
You have a digital elevation model (DEM), and you want to make an elevation profile between two or more points. You want that elevation profile to reflect an average elevation across a swath, and not just a single point from a line. Here's one way to do it in R. Thanks to Forrest Stevens and other folks from the R-sig-geo list.
# Thanks to Forrest Stevens. Some of the code here is borrowed from his script: https://github.com/ForrestStevens/Scratch/blob/master/swath_slices.R
library(raster)
library(rgdal)
library(sp)
library(rgeos)
library(gtools)
library(ggplot2)
library(plyr)
library(zoo)
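The spatial code itself is not visible in this preview. The core idea of a swath profile can be shown on a plain matrix without any of the packages above. This is a deliberately simplified sketch: the DEM is a made-up matrix, the transect is assumed to run along one row of the grid, and the swath is a band of neighbouring rows. A real transect between arbitrary points needs the geometry handling the original script does with the spatial packages.

```r
# Made-up DEM: elevation rises along columns and varies non-linearly across rows.
dem <- outer(1:5, 1:6, function(r, c) 100 + 10 * c + r^2)

centre_row <- 3   # row of the grid the transect follows (assumption)
half_width <- 1   # number of rows on each side included in the swath

swath_rows <- (centre_row - half_width):(centre_row + half_width)

# Single-line profile: elevations along the centre row only.
line_profile <- dem[centre_row, ]

# Swath profile: mean elevation across the band at each position along the transect.
swath_profile <- colMeans(dem[swath_rows, , drop = FALSE])

print(rbind(line = line_profile, swath = swath_profile))
```

The two profiles differ wherever the terrain is not symmetric about the centre line, which is the reason to average over a swath in the first place.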
ashenkin / query_higher_taxa_classes.r
Created November 29, 2015 10:14
Efficiently get higher taxon names (family, order, and subdivision) when you know genus or family. Merges NCBI and ITIS database returns.
query_higher_taxa_classes <- function(species_list, known = "genus", order = c("dataframe", "unique_sp")) {
    # Pass in a character vector of species, genera, families, or whatever (the search is flexible)
    # Returns a dataframe with the columns: query, db, family, order, subdivision
    # The dataframe returned is guaranteed to be in the same order as the species list passed in if order is "dataframe"
    order = match.arg(order)
    library(taxize)
    library(plyr)
    species_list = sub("^([^ ]*).*$","\\1",species_list) # just take the top level name before the space
    # remove short names that clog the taxon query - replace later
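The database queries are not shown in this preview, so the sketch below covers only the two base R steps described in the comments: trimming each entry to its first word, and returning results in the same order as the input. Looking up each unique name once and then expanding the result with match() is an assumption about how the order guarantee could be met. The `lookup` table is a hand-written stand-in for what a taxonomic database would return.

```r
species_list <- c("Quercus robur", "Pinus sylvestris", "Quercus alba", "Fagaceae")

# Keep only the top-level name before the first space (same regex as the preview).
top_names <- sub("^([^ ]*).*$", "\\1", species_list)

# Query each distinct name only once.
unique_names <- unique(top_names)

# Hypothetical stand-in for the merged database return, one row per unique name.
lookup <- data.frame(
  query  = unique_names,
  family = c("Fagaceae", "Pinaceae", "Fagaceae"),
  stringsAsFactors = FALSE
)

# Expand back to one row per input entry, in the original input order.
result <- lookup[match(top_names, lookup$query), ]
rownames(result) <- NULL
print(result)
```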
ashenkin / na.omit.somecols.r
Created November 29, 2015 10:10
When you want to omit rows in your dataframe that have NAs in particular columns.
na.omit.somecols <- function(data, noNAsInTheseCols, allOutputCols = names(data)) {
    # usage: na.omit.somecols(my_dataframe, c("col1", "col2")). You can also supply a vector of names (allOutputCols) if you just want certain columns returned.
    # drop = FALSE keeps a data frame even when only one column is selected
    completeVec <- complete.cases(data[, noNAsInTheseCols, drop = FALSE])
    return(data[completeVec, allOutputCols, drop = FALSE])
}
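A short usage example, with made-up data, may make the behaviour concrete. The function is repeated so the block runs on its own. The drop = FALSE arguments are an addition in this sketch, so that a data frame comes back even when a single column is requested.

```r
na.omit.somecols <- function(data, noNAsInTheseCols, allOutputCols = names(data)) {
  completeVec <- complete.cases(data[, noNAsInTheseCols, drop = FALSE])
  data[completeVec, allOutputCols, drop = FALSE]
}

# Toy data frame (invented values) with NAs scattered across three columns.
df <- data.frame(
  col1 = c(1, NA, 3, 4),
  col2 = c("a", "b", NA, "d"),
  col3 = c(NA, 2, 3, 4)
)

# Only col1 and col2 are checked, so row 1 survives despite its NA in col3.
out <- na.omit.somecols(df, c("col1", "col2"))
print(out)

# Restrict the returned columns as well.
out_small <- na.omit.somecols(df, "col1", allOutputCols = "col3")
print(out_small)
```

By contrast, base R's na.omit(df) would drop every row that has an NA in any column, which here leaves only row 4.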