@infotroph
infotroph / git-biggest-blobs.md
Last active February 14, 2019 20:44
Find the largest objects in a Git repository

So your Git repository is getting unwieldy. What's causing it? Are the problem files still being updated, or are they the deleted ghosts of old binaries?

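# Each batch-check line is "<object id> <type> <size in bytes>";
# sort by size and keep the ten largest, biggest last.
git cat-file --batch-check --batch-all-objects --unordered \
  | sort --numeric-sort --key=3 \
  | tail -n10 \
  | xargs -L1 sh -c \
    'pth=$(git describe --always "$0"); \
     printf "%s %s %s %s\n" "$0" "$1" "$2" "$pth"'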
infotroph / gist:378bce83b589bf993b847e88e7566a14
Last active October 28, 2018 16:21
Filter all timepoints from uninteresting parameters

Have: several treatments measured across time, many parameters recorded

dat <- expand.grid(
	time = 1:3,
	trt = letters[1:2],
	param = LETTERS[1:3])
dat$value <- c(
	rep(1.0, 6), # A: constant, should drop
	rep(c(1.0, 1.5, 2.0), 2), # B: trts identical within days, should drop
infotroph / gist:4067c8371236baefdff7c4fe5e3150a1
Last active October 18, 2018 18:16
How prevalent is the practice of committing Rd files to Git?
# Do R packages on GitHub commit their Rd files?
# Searched all of GitHub for "RoxygenNote", ordered by recently indexed,
# and clicked through to every DESCRIPTION file in the first five pages of results.
# For a fork, clicked through to the source repository; for a source repository,
# checked whether its man/ directory contains Rd files.
pik-piam/madrat, n (Rd files are gitignored)
hemberg-lab/scfind, y
daewoooo/primatR, y
dleutnant/muasdown, y
# selflibrary/showsearch.R:
showsearch_lib <- function(){
print(search())
library(selflibrary)
print(search())
}
$ R -q -e 'selflibrary::showsearch_lib()' chrisb@morus:~/Desktop/selflibrary
> selflibrary::showsearch_lib()
[1] ".GlobalEnv" "package:stats" "package:graphics"
infotroph / gist:bf522f5959a0ff709377919131bf23d8
Created August 20, 2018 09:41
R package installation errors inside parallel make
root@7c782f3c2892:~# cat Makefile
all:
	Rscript -e 'install.packages("getPass", type = "source")'
root@7c782f3c2892:~# make
Rscript -e 'install.packages("getPass", type = "source")'
Installing package into '/usr/local/lib/R/site-library'
(as 'lib' is unspecified)
trying URL 'https://cloud.r-project.org/src/contrib/getPass_0.2-2.tar.gz'
Content type 'application/x-gzip' length 252439 bytes (246 KB)
infotroph / PEcAn_dependencies.md
Created June 28, 2018 13:52
Investigating dependencies for a large suite of tightly-coupled, non-CRAN R packages

What software do I need to have installed for a working copy of PEcAn?

Great question. Let's find out, with two big caveats.

  1. This approach will find components formally required by one or more of the PEcAn R packages. It will not tell us what dependencies are missing from the package descriptions, nor about any of PEcAn's non-R dependencies -- notably, the list it produces will not contain Postgres or any of the components of Bety. But we will get a list of the system libraries needed by each R package (e.g. RCurl depends on your OS's libcurl), at least to the extent that the packages declare them.

  2. Ironically, it only works on a system that already has all of PEcAn installed. If your machine is already in dependency hell, this probably won't help because R won't know how to find and recursively check the dependencies it doesn't yet have. But with some refinements, this approach could probably autogenerate a list of dependencies so that we can, say, mention new ones in the changelog.
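
As a rough illustration of the approach described above, here is a minimal sketch that asks R for the recursive dependencies of every installed package whose name starts with `PEcAn`, then reads whatever each one declares in its `SystemRequirements` field. The helper name `list_deps()` and the `^PEcAn` naming pattern are my assumptions for illustration, not necessarily what the full write-up uses, and like the caveats say it only sees packages that are already installed.

```r
# Hypothetical helper (name is an assumption): recursive R dependencies of
# `pkgs`, looked up in the locally installed package database only.
list_deps <- function(pkgs, db = installed.packages()) {
  deps <- tools::package_dependencies(
    pkgs,
    db = db,
    which = c("Depends", "Imports", "LinkingTo"),
    recursive = TRUE
  )
  sort(unique(as.character(unlist(deps, use.names = FALSE))))
}

installed <- installed.packages()

# Assumption: PEcAn packages can be recognized by a "PEcAn" name prefix.
pecan_pkgs <- grep("^PEcAn", rownames(installed), value = TRUE)

# R packages that PEcAn needs but that are not themselves PEcAn packages.
r_deps <- setdiff(list_deps(pecan_pkgs, installed), pecan_pkgs)

# System libraries, to the extent that installed packages declare them.
to_check <- intersect(union(pecan_pkgs, r_deps), rownames(installed))
sysreqs <- vapply(
  to_check,
  function(p) {
    x <- utils::packageDescription(p, fields = "SystemRequirements")
    if (is.na(x)) "" else x
  },
  character(1)
)
sysreqs <- sysreqs[nzchar(sysreqs)]
```

On a machine with no PEcAn packages installed, `pecan_pkgs` is empty and both `r_deps` and `sysreqs` come back empty, which is the second caveat in action.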

For th

# OS X 10.13.4, Postgres 10.4
0> curl -o bety.sql.gz http://pecan.ncsa.illinois.edu/dump/betydump.psql.gz
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 167M 100 167M 0 0 7790k 0 0:00:21 0:00:21 --:--:-- 8289k
infotroph / gist:19513e684c97b576e24c8b1058b082ee
Last active January 28, 2018 19:44
Why not .data in sql filter?
library(tidyverse)
# local df works as expected with or without .data
mtcars %>% select(mpg) %>% filter(mpg > 33)
# mpg
# 1 33.9
mtcars %>% select(.data$mpg) %>% filter(.data$mpg > 33)
# mpg
# 1 33.9
infotroph / join_suffix.R
Last active January 14, 2018 13:32
join hangs on empty suffix
library(dplyr)
a = data.frame(id=1:3, x=1:3)
## local join: works as expected
inner_join(a, a, by="id", suffix=c(".1", ".2"))
# id x.1 x.2
# 1 1 1 1
# 2 2 2 2
infotroph / combos.R
Last active December 17, 2017 02:02
Find all combinations with cost in range
library(dplyr)
# invent some data
budget_min = 2e4
budget_max = 2.5e4
costs = round(rnorm(n=20, mean=1e4, sd=3e3))
names(costs) = LETTERS[1:20]