Robert M Flight rmflight

@rmflight
rmflight / fit_predict_lm.R
Created May 24, 2019 15:10
fitting and predicting models using just coefficients
fit_square_root = function(x, y){
  sr_x = sqrt(x)
  # assumes you have an intercept, so need to add the ones to the matrix
  X = matrix(c(rep(1, length(x)), sr_x), nrow = length(x), ncol = 2, byrow = FALSE)
  sr_fit = stats::lm.fit(X, y)
  names(sr_fit$coefficients) = NULL
  sr_fit$coefficients
}
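The gist title mentions predicting as well as fitting, but the preview only shows the fit. A minimal sketch of the prediction half, assuming the coefficients come back as intercept first and slope second (as they do for the design matrix above); `predict_square_root` and the toy data are hypothetical names introduced here, not part of the original gist:

```r
# Refit helper repeated so the example is self-contained.
fit_square_root = function(x, y){
  X = cbind(1, sqrt(x))                 # intercept column plus sqrt(x)
  unname(stats::lm.fit(X, y)$coefficients)
}

# Hypothetical companion: predict from the bare coefficient vector.
predict_square_root = function(coefs, x){
  coefs[1] + coefs[2] * sqrt(x)
}

# Toy data that follows y = 2 + 3 * sqrt(x) exactly.
x = c(1, 4, 9, 16, 25)
y = 2 + 3 * sqrt(x)

coefs = fit_square_root(x, y)
prediction = predict_square_root(coefs, 36)
```

Keeping only the coefficient vector avoids storing the full `lm` object, at the cost of having to re-apply the same transformation by hand at prediction time.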
rmflight / purrr_vs_furrr.R
Created April 24, 2019 02:11
using purrr or furrr
# so I have this code in two different packages now, and I'm thinking of making a single
# package that they (and other packages) could easily depend on, that lets the developer
# enable the use of furrr::future_map when a user has multi-processing available.
#
# basically the way this works right now, is after loading the package, if you want to use furrr::future_map
# you do:
# set_internal_map(furrr::future_map)
# plan(multiprocess)
# and magically you have multiprocessing everywhere I have
# internal_map$map_function()
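The comments above describe a swappable map function held in a package-level object. A minimal sketch of that pattern, assuming `internal_map` is an environment and using base `lapply` as the default so the example runs without extra packages (the original presumably defaults to a purrr function; `reversed_map` is a hypothetical stand-in for `furrr::future_map`):

```r
# Environment holding the currently selected map implementation.
internal_map = new.env()
internal_map$map_function = lapply   # assumed default for this sketch

# Replace the map implementation used everywhere internally.
set_internal_map = function(map_function){
  assign("map_function", map_function, envir = internal_map)
  invisible(map_function)
}

# Internal code always calls through the environment.
squares = internal_map$map_function(1:3, function(.x) .x^2)

# Swap in a different implementation; results reversed so the swap is visible.
reversed_map = function(.x, .f, ...) rev(lapply(.x, .f, ...))
set_internal_map(reversed_map)
reversed = internal_map$map_function(1:3, function(.x) .x^2)
```

Because an environment is modified in place, every caller that looks up `internal_map$map_function` at call time picks up the new function without being redefined.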
rmflight / remove_duplicate_rows_columns.R
Last active November 11, 2018 19:16
removing duplicate entries across rows and columns
ex_data = data.frame(A = c("A", "C", "E", "F", "G", "H", "I"),
                     B = c("B", "D", "A", "E", "I", "J", "K"),
                     C = "C",
                     stringsAsFactors = FALSE)
irow = 2
consider_cols = c("A", "B")
all_entries = unlist(ex_data[1, consider_cols], use.names = FALSE)
while (irow <= nrow(ex_data)) {
  message(c(irow, nrow(ex_data)))
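The preview cuts off inside the loop, so the author's exact rule is not visible. One possible reading, sketched under the assumption that a row is dropped when any of its entries in `consider_cols` already appeared in an earlier kept row; `remove_duplicate_entries` is a hypothetical name and this is not necessarily the original algorithm:

```r
ex_data = data.frame(A = c("A", "C", "E", "F", "G", "H", "I"),
                     B = c("B", "D", "A", "E", "I", "J", "K"),
                     C = "C",
                     stringsAsFactors = FALSE)
consider_cols = c("A", "B")

# Assumed rule: keep a row only if none of its entries in consider_cols
# were seen in a previously kept row.
remove_duplicate_entries = function(data, consider_cols){
  seen = character(0)
  keep = logical(nrow(data))
  for (irow in seq_len(nrow(data))) {
    entries = unlist(data[irow, consider_cols], use.names = FALSE)
    if (!any(entries %in% seen)) {
      keep[irow] = TRUE
      seen = c(seen, entries)
    }
  }
  data[keep, , drop = FALSE]
}

dedup_data = remove_duplicate_entries(ex_data, consider_cols)
```

Under this rule, entries from a dropped row are not added to `seen`, so a later row that repeats only those entries is still kept.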
x = rnorm(500)
library(microbenchmark)
microbenchmark(
  replicate(5000, sample(x)),
  do.call(c, purrr::map(seq(1, 5000), function(.x){sample(x)}))
)
#Unit: milliseconds
#expr
#replicate(5000, sample(x))
rmflight / last_modified_files.R
Created March 28, 2018 02:24
display X modified files
#!/usr/bin/Rscript
#
# Installation:
#
# Copy this file to an accessible location, and then do a chmod u+x last_modified_files
#
# Make sure you have docopt installed: install.packages("docopt")
#
# License: MIT. Copyright Robert M Flight, 2018.
#
rmflight / social_media_nsf.md
Last active September 19, 2017 15:29
Social media activities for NSF Synergistic Activities

Communication and Education via Social Media

I actively maintain Twitter and GitHub accounts, as well as a blog, to interact with other scientists, both junior and senior. I use them to share links to works that may be useful to others and to answer questions about programming in the R statistical language, often advising users I do not know who use the #rstats hashtag. Interactions on Twitter led directly to my (and others') suggesting improvements to the draft version of Ten Simple Rules for Taking Advantage of Git and GitHub in PLOS Comp Bio, resulting in my co-authorship of a paper with 19,000 views and 9 citations. My Twitter activities also led directly to my involvement in the ROpenSci organization. I was accepted to attend the ROpenSci un-conference in May 2017, which resulted in the creation of the testRmd R package (authored with 3 others), as well as my open review of the gitlabr R package, which is expected to improve an R package already used by many to interface with the GitLab softwa

rmflight / non_working.Rmd
Created August 14, 2017 19:22
working vs non-working line passing Rmd documents
---
title: "Vignette Title"
author: "Vignette Author"
package: PackageName
output:
  BiocStyle::html_document2
vignette: >
  %\VignetteIndexEntry{Vignette Title}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
rmflight / .gitlab-ci.yml
Last active June 27, 2017 13:40
gitlab-ci-failing-r-tests
testit:
  script:
    - R CMD INSTALL .
    - Rscript run_tests.R
rmflight / tidy_colMeans.R
Created October 5, 2016 18:58
tidy colMeans
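library(dplyr)
# as_tibble() replaces the deprecated tbl_df()
data <- as_tibble(data.frame(values = rnorm(100), id = rep(c("a", "b"), 50)))
data
# mean of values within each id group
group_by(data, id) %>% summarise(mean = mean(values))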
> library(UpSetR)
> library(org.Hs.eg.db)
> all_genes <- keys(org.Hs.eg.db)
> n_gene <- c(2000, 500, 1000, 900)
> # create a list, where each entry is the vector of Gene IDs that were diff
> # expressed in that condition