Thomas Sandmann tomsing1

tomsing1 / s3_permissions.R
Last active August 31, 2022 17:07
Using the paws R package to simulate / check get, put and delete actions on AWS S3
#' Retrieve temporary AWS access key, secret key and session token
#'
#' Saturn Cloud instances obtain temporary access to AWS resources (e.g. S3
#' buckets) via web authentication rather than environment variables. Some
#' tools, however, don't support web authentication and instead expect the
#' standard `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY` and
#' `AWS_SESSION_TOKEN` environment variables, e.g. the `aws.s3` and `paws` R
#' packages. This function retrieves the temporary credentials but does not
#' export them as environment variables.
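Once the temporary credentials are in hand, they can be exported under the standard variable names so that `aws.s3` or `paws` can pick them up. The following is a minimal base-R sketch and is not the gist's own code. It assumes the credentials arrive as a named list with the hypothetical fields `key`, `secret` and `token`, because the gist's retrieval function and its return format are not reproduced here.

```r
# Hypothetical helper (not part of the gist): export temporary AWS credentials
# as the standard environment variables read by tools such as aws.s3 and paws.
# Assumes `creds` is a named list with elements `key`, `secret` and `token`.
export_aws_credentials <- function(creds) {
  required <- c("key", "secret", "token")
  missing <- setdiff(required, names(creds))
  if (length(missing) > 0) {
    stop("Missing credential field(s): ", paste(missing, collapse = ", "))
  }
  Sys.setenv(
    AWS_ACCESS_KEY_ID = creds$key,
    AWS_SECRET_ACCESS_KEY = creds$secret,
    AWS_SESSION_TOKEN = creds$token
  )
  invisible(creds)
}

# Example with placeholder (non-functional) values
export_aws_credentials(list(key = "AKIAEXAMPLE", secret = "example-secret",
                            token = "example-token"))
Sys.getenv("AWS_ACCESS_KEY_ID")
```

The variables are set only for the current R session. They disappear when the session ends, which fits the short lifetime of temporary tokens.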
tomsing1 / rentrez_geo_biosample.R
Created June 3, 2022 00:50
Querying NCBI GEO and BioSample databases from R using the rentrez package
library(glue)
library(purrr)
library(rentrez)
library(snakecase)
library(tidyr)
library(xml2)
rentrez::entrez_dbs() # for reference: available databases
# query NCBI GEO for information about Series GSE178265
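GEO Series can be looked up through the Entrez "gds" (GEO DataSets) database with `rentrez::entrez_search()`. The sketch below builds such a query term. The `[ACCN]` (accession) and `[ETYP]` (entry type) field tags are assumptions about how one might restrict the search; they are not taken from the gist. The function names are also hypothetical. The network call is wrapped in a function so that nothing is fetched when the block runs.

```r
# Hypothetical helper: build an Entrez query term that restricts a GEO
# DataSets ("gds") search to a single Series accession.
geo_series_term <- function(accession) {
  stopifnot(grepl("^GSE[0-9]+$", accession))
  sprintf("%s[ACCN] AND gse[ETYP]", accession)
}

# Sketch of the lookup itself (requires the rentrez package and network
# access, so it is only defined here, not called).
lookup_geo_series <- function(accession) {
  hits <- rentrez::entrez_search(db = "gds", term = geo_series_term(accession))
  if (length(hits$ids) == 0) return(NULL)
  rentrez::entrez_summary(db = "gds", id = hits$ids)
}

geo_series_term("GSE178265")  # returns "GSE178265[ACCN] AND gse[ETYP]"
```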
tomsing1 / rootograms.R
Created June 1, 2022 02:02
R script demonstrating how to plot rootograms in R
library(vcd)
library(patchwork)
set.seed(123)
# we take 4 independent random samples from a Poisson
# distribution, fit a Poisson distribution to each one
# and then draw a rootogram to compare observed
# and fitted frequencies.
plots <- lapply(1:4, function(draw) {
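The vcd package provides `goodfit()` for fitting the distribution and `rootogram()` for plotting. The base-R sketch below is an illustration, not the gist's code. It computes the quantities a hanging rootogram shows for a single Poisson sample. Bars of height sqrt(observed) hang from the sqrt(expected) curve, so a bar whose bottom falls below zero marks a count that the fitted model under-predicts.

```r
# Sketch: compute what a hanging rootogram displays for Poisson data,
# using base R only (vcd::rootogram() handles the fitting and plotting).
set.seed(123)
x <- rpois(200, lambda = 3)

lambda_hat <- mean(x)                                 # ML estimate of the rate
counts <- 0:max(x)
observed <- tabulate(x + 1, nbins = length(counts))   # frequency of each count
expected <- length(x) * dpois(counts, lambda_hat)     # fitted frequencies

rootogram_df <- data.frame(
  count = counts,
  sqrt_expected = sqrt(expected),                # curve the bars hang from
  sqrt_observed = sqrt(observed),                # bar heights
  bar_bottom = sqrt(expected) - sqrt(observed)   # deviation from the zero line
)
rootogram_df
```

Taking square roots stabilises the variance of the counts. This makes deviations at small and large frequencies easier to compare by eye.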
tomsing1 / reactome_dplyr.R
Created April 6, 2022 23:10
Extracting gene sets from the Bioconductor reactome.db R package's SQLite backend with dplyr
#' Retrieve Reactome sets of Entrez identifiers for a selected species
#'
#' @param species Scalar character, the species of interest, e.g. `Homo sapiens`
#' @importFrom dplyr tbl right_join select collect mutate check_dbplyr
#' @importFrom glue glue_sql glue
#' @importFrom checkmate assert_choice
#' @export
#' @return A named list of Entrez identifiers
#' @examples
#' ReactomeSets("Mycobacterium tuberculosis")
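The documented return value is a named list of Entrez identifiers. The last step, going from a long pathway-to-gene table to that list, can be sketched in base R. The code below uses made-up toy data and hypothetical column names. It also assumes that pathway names carry a `"<species>: "` prefix, which is stripped so that the list names are just the pathway names. It is not the gist's dplyr implementation.

```r
# Sketch with made-up toy data: turn a long pathway-to-gene table (such as the
# result of a database join) into a named list of Entrez identifiers for one
# species. Column names and values are hypothetical.
pathway_genes <- data.frame(
  pathway_name = c("Homo sapiens: Pathway A", "Homo sapiens: Pathway A",
                   "Homo sapiens: Pathway B", "Mus musculus: Pathway A"),
  species = c("Homo sapiens", "Homo sapiens", "Homo sapiens", "Mus musculus"),
  entrez_id = c("101", "102", "201", "901"),
  stringsAsFactors = FALSE
)

sets_for_species <- function(df, species) {
  keep <- df[df$species == species, , drop = FALSE]
  # assumed naming scheme: drop the "<species>: " prefix from pathway names
  set_names <- sub("^[^:]+: ", "", keep$pathway_name)
  lapply(split(keep$entrez_id, set_names), unique)
}

sets_for_species(pathway_genes, "Homo sapiens")
```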
tomsing1 / kegg_and_go_gene_sets.R
Created March 24, 2022 23:34
Listifying KEGG and GO gene sets
library(limma)
library(AnnotationDbi)
library(GO.db)
library(org.Hs.eg.db)
# KEGG
kegg.names <- getKEGGPathwayNames("hsa", remove.qualifier = TRUE)
kegg.sets <- getGeneKEGGLinks("hsa", convert = TRUE)
kegg.gsc <- with(kegg.sets, split(GeneID, PathwayID))
names(kegg.gsc) <- kegg.names[
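To attach readable names to the split gene sets, one approach is to look up each list name in the pathway table with `match()`. This keeps the names aligned with the list order even when the lookup table is sorted differently. The sketch below uses toy data in place of the limma output; the IDs and gene identifiers are made up. It is not necessarily how the gist completes this step.

```r
# Sketch with toy data: name a list of gene sets keyed by pathway ID, using
# match() so names line up with the list order regardless of table sorting.
pathway_names <- data.frame(
  PathwayID = c("hsa00020", "hsa00010"),
  Description = c("Citrate cycle (TCA cycle)", "Glycolysis / Gluconeogenesis"),
  stringsAsFactors = FALSE
)
gene_links <- data.frame(
  GeneID = c("101", "102", "201", "202"),  # made-up identifiers
  PathwayID = c("hsa00010", "hsa00010", "hsa00020", "hsa00020"),
  stringsAsFactors = FALSE
)

gene_sets <- with(gene_links, split(GeneID, PathwayID))   # keyed by PathwayID
idx <- match(names(gene_sets), pathway_names$PathwayID)   # row of each ID
names(gene_sets) <- pathway_names$Description[idx]
gene_sets
```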
tomsing1 / ena_rest.R
Last active March 10, 2022 22:56
Accessing ENA's REST APIs from R
library(checkmate)
library(dplyr)
library(glue)
library(htmltidy)
library(httr)
library(purrr)
library(xml2)
# https://ena-docs.readthedocs.io/en/latest/submit/general-guide/accessions.html
identify_accession_type <- function(accessions) {
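The linked ENA accession guide lists a regular-expression pattern for each object type. The sketch below classifies accessions by prefix. The patterns were written from memory of that guide and should be checked against it; the function name `classify_accession` is hypothetical, and the code is not the gist's `identify_accession_type()`.

```r
# Hypothetical sketch: classify ENA/INSDC accessions by prefix pattern.
# Covers common object types only; verify patterns against the ENA guide.
accession_patterns <- c(
  project    = "^PRJ[EDN][A-Z][0-9]+$",
  study      = "^[EDS]RP[0-9]{6,}$",
  biosample  = "^SAM[EDN][A-Z]?[0-9]+$",
  sample     = "^[EDS]RS[0-9]{6,}$",
  experiment = "^[EDS]RX[0-9]{6,}$",
  run        = "^[EDS]RR[0-9]{6,}$",
  analysis   = "^[EDS]RZ[0-9]{6,}$"
)

classify_accession <- function(accessions) {
  vapply(accessions, function(acc) {
    # names of all patterns that match this accession
    hit <- names(accession_patterns)[
      vapply(accession_patterns, grepl, logical(1), x = acc)]
    if (length(hit) == 0) NA_character_ else hit[1]
  }, character(1))
}

classify_accession(c("PRJEB12345", "SRR1234567", "SAMEA123456", "foo"))
```

The result is a character vector named by the input accessions. Unrecognised strings map to `NA`.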
tomsing1 / ena_apis.md
Created March 10, 2022 02:03
Notes on ENA's REST APIs for programmatic retrieval of NGS metadata

ENA APIs

The ENA has multiple APIs. The most important ones are:

  1. ENA Portal API: search ENA's databases using (potentially complex) queries.
  2. ENA Browser API: retrieve entire records programmatically.

In addition, quick summaries of metadata and file download locations can be retrieved
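The two APIs are queried through plain HTTPS URLs. The sketch below only constructs request URLs and makes no network calls. It assumes the Portal API `search` endpoint under `https://www.ebi.ac.uk/ena/portal/api/` and the Browser API XML endpoint under `https://www.ebi.ac.uk/ena/browser/api/xml/`. The helper names are hypothetical. The resulting URLs could then be fetched with, for example, `httr::GET()`.

```r
# Hypothetical helpers that only build request URLs for the ENA Portal and
# Browser APIs (no network access). Query values are percent-encoded.
ena_portal_search_url <- function(result, query, fields, format = "tsv") {
  params <- c(result = result, query = query,
              fields = paste(fields, collapse = ","), format = format)
  encoded <- vapply(params, utils::URLencode, character(1), reserved = TRUE)
  paste0("https://www.ebi.ac.uk/ena/portal/api/search?",
         paste(names(params), encoded, sep = "=", collapse = "&"))
}

ena_browser_xml_url <- function(accession) {
  paste0("https://www.ebi.ac.uk/ena/browser/api/xml/", accession)
}

# Portal API: search for runs of a (placeholder) study, requesting two fields
ena_portal_search_url("read_run", 'study_accession="PRJEB12345"',
                      c("run_accession", "fastq_ftp"))
# Browser API: full XML record for the same placeholder accession
ena_browser_xml_url("PRJEB12345")
```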

tomsing1 / read_write_sqlite_with_dm.R
Created February 7, 2022 02:28
Reading and writing a SQLite database in R with the dm R package
library(DiagrammeR) # must be v 1.0.6.1 https://github.com/cynkra/dm/issues/823
library(dm)
library(RSQLite)
# download and decompress the chinook example SQLite database
zip_file <- tempfile(fileext = ".zip")
download.file(
  "https://www.sqlitetutorial.net/wp-content/uploads/2018/03/chinook.zip",
  destfile = zip_file)
chinook <- unzip(zip_file, exdir = tempdir())
tomsing1 / mapping_read_with_subread_or_sublong.R
Created January 11, 2022 01:11
Mapping long reads to a genome in R with the Rsubread Bioconductor package
library(BSgenome.Hsapiens.UCSC.hg38)
library(Biostrings)
library(Rsubread)
library(GenomicRanges)
library(parallel)
library(GenomicAlignments)
kCores <- parallel::detectCores() - 1L
kQuery <- GRanges(seqnames = "chr12", IRanges(40263807, 40264221))
tomsing1 / read_write_h5ad.R
Created November 5, 2021 19:12
Writing and reading h5ad files from R using the zellkonverter Bioconductor R package
libraries <- c("zellkonverter", "SingleCellExperiment")
for (lib in libraries) {
  suppressPackageStartupMessages(library(lib, character.only = TRUE,
                                         quietly = TRUE))
}
# dummy SingleCellExperiment
ncells <- 100
u <- matrix(rpois(20000, 5), ncol = ncells)
v <- log2(u + 1)