

@PeteHaitch
PeteHaitch / resursiveIntersection.r
Created August 25, 2013 22:38
An R function to recursively find the intersection of a list of GRanges objects. It will likely also work for lists of other objects, but this has not been tested.
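#### Function to do a recursive intersection of a list of GRanges (x) ####
library(GenomicRanges)  # provides the intersect() method for GRanges
recursiveIntersect <- function(x){
  if (length(x) > 2){
    print(paste0('Recursing (length = ', length(x), ')'))
    # Intersect the first two elements and append the result to the remaining ones
    z <- intersect(x[[1]], x[[2]])
    x <- x[seq.int(3, length(x))]
    x <- c(x, list(z))  # wrap in list() so z is appended as one element, even if it is an atomic vector
    recursiveIntersect(x)
  } else if (length(x) == 2){
    print(paste0('Base case (length = ', length(x), ')'))
    intersect(x[[1]], x[[2]])
  }
}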
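Because set intersection is associative and commutative, the pairwise recursion above should return the same set as a plain left fold, which base R's `Reduce()` expresses directly. A minimal sketch on integer vectors, so it runs without Bioconductor; `reduceIntersect()` is a hypothetical name. With GenomicRanges loaded, the same `Reduce(intersect, x)` call on a list of GRanges dispatches to the GRanges `intersect()` method. For plain vectors the element order of the result can differ between the two approaches.

```r
# Hypothetical helper: intersect every element of a list with a left fold.
reduceIntersect <- function(x) {
  stopifnot(is.list(x), length(x) >= 1L)
  Reduce(intersect, x)
}

x <- list(1:10, c(2L, 4L, 6L, 8L, 10L, 12L), 4:12, c(4L, 8L, 9L, 10L))
res <- reduceIntersect(x)
res
```

The fold avoids both the recursion and the repeated copying of the list that the recursive version performs at each step.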
#### Modified version of Dirk's improved ls() function and shortcut (http://stackoverflow.com/questions/1358003/tricks-to-manage-the-available-memory-in-an-r-session) ####
# improved list of objects
.ls.objects <- function (pos = 1, pattern, order.by,
                         decreasing=FALSE, head=FALSE, n=5) {
  napply <- function(names, fn) sapply(names, function(x)
    fn(get(x, pos = pos)))
  names <- ls(pos = pos, pattern = pattern)
  obj.class <- napply(names, function(x) as.character(class(x))[1])
  obj.mode <- napply(names, mode)
  obj.type <- ifelse(is.na(obj.class), obj.mode, obj.class)
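The function above is cut off after it computes each object's class, mode and type; its remaining arguments (`order.by`, `decreasing`, `head`, `n`) suggest it goes on to sort and truncate the listing. A much simpler sketch of the same idea using only base R follows. `lsSizes()` is a hypothetical name, and it looks objects up in an environment rather than by search-list position.

```r
# Hypothetical, simplified ls()-style summary: class and size (in bytes) of
# every object in an environment, largest first.
lsSizes <- function(envir = globalenv(), decreasing = TRUE) {
  nms <- ls(envir = envir)
  if (length(nms) == 0L) {
    return(data.frame(class = character(), size = numeric()))
  }
  objs <- mget(nms, envir = envir)
  out <- data.frame(
    class = vapply(objs, function(o) class(o)[1L], character(1L)),
    size = vapply(objs, function(o) as.numeric(object.size(o)), numeric(1L)),
    row.names = nms,
    stringsAsFactors = FALSE
  )
  out[order(out$size, decreasing = decreasing), , drop = FALSE]
}

e <- new.env()
assign("small", 1:10, envir = e)
assign("big", rnorm(1e5), envir = e)
res <- lsSizes(e)
res
```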
PeteHaitch / faster_duplicated_method.R
Last active August 29, 2015 14:01
Reproducible example for my question to the "R and C++" Google Group
## Create some test data (a matrix)
# The matrix, x, has at least 3 columns, all of which contain integers.
# The first column is an integer-encoding of a chromosome and so there are approximately 20-30 unique values.
# The second column is an integer-encoding of a genomic strand and so there are at most 3 unique values (representing positive, negative or unknown/irrelevant).
# The remaining columns are genomic positions, which are integers in the range of approximately 1-250,000,000.
# sim_data adds 'd' duplicates as the last 'd' rows
# n is the number of rows
# m + 2 is the number of columns. m = 1 is the minimum.
# d is the number of duplicates added to the end of the matrix
# sim_strand is whether the strand is simulated (column 2 of the matrix)
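The body of `sim_data()` is not shown. Based only on the comments above, a sketch might look like the following; the function body, the integer codes used for chromosome (1 to 25) and strand (1 to 3), using code 3 for "unknown" when the strand is not simulated, and treating `n` as the total row count including the duplicates are all assumptions. The usage then checks the appended rows with base R's `duplicated()`, whose matrix method compares whole rows.

```r
# Hypothetical sketch of sim_data(), reconstructed from the comments only.
sim_data <- function(n, m = 1L, d = 0L, sim_strand = TRUE) {
  stopifnot(m >= 1L, d >= 0L, d < n)
  nu <- n - d                                    # rows before duplicates are appended
  chr <- sample.int(25L, nu, replace = TRUE)     # ~20-30 chromosome codes (assumed 25)
  strand <- if (sim_strand) {
    sample.int(3L, nu, replace = TRUE)           # assumed codes for +, -, unknown
  } else {
    rep(3L, nu)                                  # assumed code for "unknown"
  }
  pos <- matrix(sample.int(250000000L, nu * m, replace = TRUE), ncol = m)
  x <- cbind(chr, strand, pos)
  if (d > 0L) {
    # Copy d randomly chosen earlier rows onto the end of the matrix
    x <- rbind(x, x[sample.int(nu, d, replace = TRUE), , drop = FALSE])
  }
  unname(x)
}

set.seed(1)
x <- sim_data(n = 1000L, m = 2L, d = 10L)
dup <- duplicated(x)   # one logical per row
which(dup)
```

Base R's matrix method for `duplicated()` works by pasting each row into a single string before comparing, which helps explain why a faster approach was worth asking about for matrices with millions of rows.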
PeteHaitch / rafa_par.R
Created May 15, 2014 06:25
Rafael Irizarry's default `par` settings for R, from a discussion on the Simply Statistics blog about the paste0 function: http://disq.us/8ier3p
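library(RColorBrewer)
# a, b: number of rows and columns of plot panels (passed to par(mfrow = ...))
# brewer.n, brewer.name: size and name of the RColorBrewer palette to use
# ...: further graphical parameters passed to par()
mypar <- function(a = 1, b = 1, brewer.n = 8, brewer.name = "Dark2", ...){
  # Narrow margins (lines: bottom, left, top, right); axis title and labels closer to the axes
  par(mar = c(2.5, 2.5, 1.6, 1.1), mgp = c(1.5, .5, 0))
  par(mfrow = c(a, b), ...)
  palette(brewer.pal(brewer.n, brewer.name))
}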
PeteHaitch / mergeDT.R
Created June 25, 2014 08:37
Merge a list of data.tables
# Could be generalised to handle the full arguments of merge.data.table but I've kept it simple.
# mergeDT based on http://r.789695.n4.nabble.com/merge-multiple-data-frames-td4331089.html
# Takes a (named) list of data.tables (lodt) where all columns are common to all data.tables
# The key of each table is the same but is only a subset of the columns, e.g. (chr, pos1, pos2)
# The remaining columns of each data.table are the "counts", e.g. (MM, MU, UM, UU)
# We append the name of each sample (the names of lodt) to its "counts" columns so that we can keep
# track of which sample the counts came from.
mergeAll <- function(lodt) {
  dotNames <- lapply(lodt, names)
  repNames <- Reduce(intersect, dotNames)
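The approach described in the comments (suffix each sample's count columns with the sample name, then fold `merge()` over the list) can be sketched with base R data frames, which keeps the example free of the data.table dependency. `mergeCounts()` is a hypothetical name, the key columns `chr`, `pos1`, `pos2` and count columns `MM`, `UU` are illustrative, and using an outer join (`all = TRUE`) is an assumption.

```r
# Hypothetical sketch: merge a named list of data frames sharing key columns,
# renaming the count columns so each records which sample it came from.
mergeCounts <- function(lodf, key) {
  stopifnot(!is.null(names(lodf)), all(nzchar(names(lodf))))
  renamed <- Map(function(df, sample) {
    counts <- setdiff(names(df), key)
    names(df)[match(counts, names(df))] <- paste0(counts, "_", sample)
    df
  }, lodf, names(lodf))
  # Outer join on the key columns, folding left to right over the samples
  Reduce(function(a, b) merge(a, b, by = key, all = TRUE), renamed)
}

s1 <- data.frame(chr = "chr1", pos1 = c(10L, 20L), pos2 = c(15L, 25L),
                 MM = c(3L, 1L), UU = c(0L, 2L))
s2 <- data.frame(chr = "chr1", pos1 = c(20L, 30L), pos2 = c(25L, 35L),
                 MM = c(4L, 5L), UU = c(1L, 0L))
merged <- mergeCounts(list(A = s1, B = s2), key = c("chr", "pos1", "pos2"))
merged
```

Positions missing from a sample get `NA` counts for that sample; replacing them with zero would be a further design choice.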
PeteHaitch / 0_reuse_code.js
Created July 22, 2014 07:00
Here are some things you can do with Gists in GistBox.
// Use Gists to store code you would like to remember later on
console.log(window); // log the "window" object to the console
## Necessary packages (BioC-devel)
library(GenomicRanges)
library(S4Vectors)
## Class A using a DataFrameOrNULL in internalPos slot
setClassUnion(name = "DataFrameOrNULL", members = c("DataFrame", "NULL"))
setClass("A",
         contains = "GRanges",
         representation(
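The `DataFrameOrNULL` class union lets a slot hold either a `DataFrame` or `NULL`. A minimal sketch of the same pattern using base classes only, so it runs without Bioconductor; `numericOrNULL` and the class `B` are hypothetical names.

```r
library(methods)

# A class union: a slot of this type accepts either a numeric vector or NULL
setClassUnion(name = "numericOrNULL", members = c("numeric", "NULL"))

setClass("B",
         representation(
           pos = "numericOrNULL"
         ))

b1 <- new("B", pos = c(1, 5, 9))   # slot holds a numeric vector
b2 <- new("B", pos = NULL)         # slot holds NULL
is.null(b2@pos)
```

Any value outside the union, such as a character vector, is rejected when the object is created.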
PeteHaitch / SummarizedExperiment_assay_names.R
Last active August 29, 2015 14:10
Benchmarking access to the names of the assays in a SummarizedExperiment object.
library(GenomicRanges)
library(microbenchmark)
nrows <- 2000000; ncols <- 6
counts <- matrix(runif(nrows * ncols, 1, 1e4), nrows)
rowData <- GRanges(rep(c("chr1", "chr2"), c(0.25 * nrows, 0.75 * nrows)),
                   IRanges(floor(runif(nrows, 1e5, 1e6)), width=100),
                   strand=sample(c("+", "-"), nrows, TRUE))
colData <- DataFrame(Treatment=rep(c("ChIP", "Input"), 3),
                     row.names=LETTERS[1:6])
PeteHaitch / combining-SummarizedExperiment-objects.md
Last active July 22, 2023 11:06
Developing a method for combining SummarizedExperiment objects

Combining SummarizedExperiment objects

Peter Hickey
20 October 2015

Motivation

I often find myself with multiple SE objects (I use "SE" as shorthand for both the SummarizedExperiment0 and RangedSummarizedExperiment classes), where the objects may share some samples and may have non-overlapping features/ranges. Currently, these objects are difficult to combine: rbind() can only combine objects that have the same samples but distinct features/ranges, and cbind() can only combine objects that have the same features/ranges but distinct samples. I think it would be useful to have a "combine" method for SE objects that handles the most general case, in which the objects may share samples and may have non-overlapping features/ranges.
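To make the general case concrete, here is a sketch of the same idea for plain matrices rather than SE objects: rows (features) and columns (samples) are matched by name, the result covers the union of both, and cells present in neither input are `NA`. `combineMatrices()` is a hypothetical helper, and requiring overlapping cells to agree is an assumed conflict rule, not a description of any existing SE method.

```r
# Hypothetical sketch: combine two matrices with possibly shared samples (columns)
# and possibly non-overlapping features (rows), matching both by dimnames.
combineMatrices <- function(x, y) {
  stopifnot(!is.null(rownames(x)), !is.null(colnames(x)),
            !is.null(rownames(y)), !is.null(colnames(y)))
  rows <- union(rownames(x), rownames(y))
  cols <- union(colnames(x), colnames(y))
  out <- matrix(NA, nrow = length(rows), ncol = length(cols),
                dimnames = list(rows, cols))
  out[rownames(x), colnames(x)] <- x
  # Cells present in both inputs must agree (assumed conflict rule)
  shared <- out[rownames(y), colnames(y), drop = FALSE]
  ok <- is.na(shared) | is.na(y) | shared == y
  if (!all(ok)) stop("x and y disagree on overlapping cells")
  out[rownames(y), colnames(y)] <- ifelse(is.na(y), shared, y)
  out
}

a <- matrix(1:4, nrow = 2, dimnames = list(c("f1", "f2"), c("s1", "s2")))
b <- matrix(c(2L, 5L, 7L, 8L), nrow = 2, dimnames = list(c("f2", "f3"), c("s1", "s3")))
m <- combineMatrices(a, b)
m
```

Here `rbind()` fails because the sample sets differ and `cbind()` fails because the feature sets differ, while the combined matrix has three features and three samples.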

PeteHaitch / bioc-mirror-troubleshoot.md
Last active October 19, 2015 01:42
Troubleshooting Bioconductor GitHub mirror setup
# (1) Using existing repo
git clone git@github.com:PeteHaitch/GenomicTuples.git
cd GenomicTuples
curl -O https://raw.githubusercontent.com/Bioconductor/mirror/master/update_remotes.sh
bash update_remotes.sh
# Bump Version and Date in DESCRIPTION (v1.5.2)
# NOTE: Version isn't up-to-date with SVN; why?
git add DESCRIPTION
git commit -m "Bump version number"