Leandro Hermida hermidalc

cancer data science / bioinformatics / computational biology / genomics

hermidalc / tcga_impute_sandbox.R

Created September 6, 2024 16:06

Testing imputation methods on GDC TCGA clinical data

	library(missForest)
	library(mice)
	library(ggplot2)
	library(ggmice)

	input_df <- gdc_case_meta[
	    c("project_id", "gender", "age_at_diagnosis", "tumor_stage")
	]

	input_df$project_id <- factor(input_df$project_id)

hermidalc / ml_tmm_tpm.md

Last active September 3, 2024 11:48

Using sklearn-extensions to perform edgeR TMM-TPM normalization in your Python ML code

Download and install Mambaforge

On Linux:

	wget https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-Linux-x86_64.sh
	bash Mambaforge-Linux-x86_64.sh

hermidalc / column_selector.py

Last active September 3, 2024 11:48

scikit-learn compatible ColumnSelector class

	# Selecting columns by feature name requires X to be a pandas DataFrame.
	# In a Pipeline this must be the first step; otherwise, as long as no
	# feature selection step precedes it, column indices can still be used.

	import warnings
	import numpy as np
	from sklearn.base import BaseEstimator
	from sklearn.utils import check_X_y
	from sklearn.feature_selection import SelectorMixin
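
The preview above stops at the imports, so the class body is not shown. As a minimal sketch of how a selector built on `SelectorMixin` could behave as the comments describe, assuming a hypothetical class name `ColumnSelectorSketch` and a hypothetical `cols` parameter (neither is taken from the gist):

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator
from sklearn.feature_selection import SelectorMixin
from sklearn.utils.validation import check_is_fitted


class ColumnSelectorSketch(SelectorMixin, BaseEstimator):
    """Hypothetical selector: keep columns given by name or by index."""

    def __init__(self, cols=None):
        self.cols = cols

    def fit(self, X, y=None):
        n_features = X.shape[1]
        if self.cols is None:
            # No selection requested: keep every column.
            mask = np.ones(n_features, dtype=bool)
        elif all(isinstance(c, str) for c in self.cols):
            # Name-based selection only works on a pandas DataFrame.
            if not hasattr(X, "columns"):
                raise ValueError("selecting by name requires a pandas DataFrame")
            mask = np.asarray(X.columns.isin(self.cols))
        else:
            # Index-based selection also works on plain arrays.
            mask = np.zeros(n_features, dtype=bool)
            mask[np.asarray(self.cols, dtype=int)] = True
        if hasattr(X, "columns"):
            # Record the names seen during fit so transform can check them.
            self.feature_names_in_ = np.asarray(X.columns, dtype=object)
        self.n_features_in_ = n_features
        self.mask_ = mask
        return self

    def _get_support_mask(self):
        # SelectorMixin builds get_support() and transform() on this mask.
        check_is_fitted(self, "mask_")
        return self.mask_


# Usage on a small illustrative frame (values are made up).
X_df = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0], "c": [5.0, 6.0]})
by_name = ColumnSelectorSketch(cols=["a", "c"]).fit(X_df)
by_index = ColumnSelectorSketch(cols=[1]).fit(X_df.to_numpy())
X_by_name = by_name.transform(X_df)
X_by_index = by_index.transform(X_df.to_numpy())
```

With the default output configuration, `transform` returns a NumPy array, which is why a later step in a Pipeline can only select by index.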

hermidalc / analyze_rna_seq_ml_nested_cv.py

Last active September 3, 2024 11:48

Building and evaluating ML models of RNA-seq count data using nested CV

	import atexit
	import os
	import re
	import sys
	from argparse import ArgumentParser
	from decimal import Decimal
	from glob import glob
	from pprint import pprint
	from shutil import rmtree
	from tempfile import mkdtemp, gettempdir
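
Only the imports of this script appear in the preview. As a rough illustration of what nested CV means in scikit-learn terms, the sketch below tunes a hyperparameter in an inner loop and scores the tuned model in an outer loop. The synthetic data, the logistic regression model, and the parameter grid are all assumptions chosen for brevity; they stand in for RNA-seq counts and for whatever models the script actually uses.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data, not real expression values.
X, y = make_classification(
    n_samples=120, n_features=20, n_informative=5, random_state=777
)

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Inner loop: choose hyperparameters using only the outer training folds.
inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=777)
search = GridSearchCV(
    pipe, {"clf__C": [0.01, 0.1, 1.0, 10.0]}, scoring="roc_auc", cv=inner_cv
)

# Outer loop: estimate performance of the whole tuning procedure.
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=777)
outer_scores = cross_val_score(search, X, y, scoring="roc_auc", cv=outer_cv)

print(outer_scores.mean())
```

Because the outer test folds are never seen during tuning, the outer scores estimate how well the full procedure generalizes rather than how well one chosen hyperparameter setting fits.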

hermidalc / analyze_rna_seq_gsea_preranked.R

Created May 16, 2021 22:01

RNA-seq differential gene expression GSEA preranked analysis

	suppressPackageStartupMessages({
	    library(Biobase)
	    library(data.table)
	    library(edgeR)
	    library(fgsea)
	    library(msigdbr)
	    library(ggplot2)
	})

	set.seed(777)

hermidalc / analyze_rna_seq_batch_effects.R

Last active September 3, 2024 11:48

RNA-seq batch effect analysis, plotting, and batch effect removal with DESeq2, edgeR, limma

	suppressPackageStartupMessages({
	    library(Biobase)
	    library(DESeq2)
	    library(EDASeq)
	    library(edgeR)
	    library(limma)
	    library(RColorBrewer)
	})

	fig_dim <- 5

hermidalc / analyze_rna_seq_diff_expr.R

Last active September 3, 2024 11:49

RNA-seq normalization, differential expression, transformation, volcano plotting with DESeq2, edgeR, limma-voom

	suppressPackageStartupMessages({
	    library(Biobase)
	    library(DESeq2)
	    library(edgeR)
	    library(EnhancedVolcano)
	    library(limma)
	})

	fc <- 1.0
	lfc <- log2(fc)

hermidalc / keybase.md

Created September 11, 2020 18:00

I hereby claim:

To claim this, I am signing this object:

hermidalc / transform_feature_meta.py

Last active September 3, 2024 11:47

Inspect any scikit-learn fitted Pipeline to transform a feature metadata pandas DataFrame through the Pipeline and add model interpretation.

	def transform_feature_meta(pipe, feature_meta):
	    transformed_feature_meta = None
	    for estimator in pipe:
	        if isinstance(estimator, ColumnTransformer):
	            for _, trf_pipe, trf_columns in estimator.transformers_:
	                if isinstance(trf_pipe, str) and trf_pipe == 'drop':
	                    trf_feature_meta = feature_meta.iloc[
	                        ~feature_meta.index.isin(trf_columns)]
	                elif ((isinstance(trf_columns, slice)
	                       and (isinstance(trf_columns.start, str)