Skip to content

Instantly share code, notes, and snippets.

View simon-anders's full-sized avatar

Simon Anders simon-anders

  • University of Heidelberg
  • Heidelberg, Germany
View GitHub Profile
---
format:
html:
embed-resources: true
---
# Density estimation for single-cell gene expression with MCMC
Simon Anders, 2024-10-23
library( rstan )
options(mc.cores=6)
n <- 1000
s <- round( 10^rnorm( n, 3, .5 ) )
fracs <- 10^ifelse( runif(n)<.7, rnorm( n, -3.3, .4 ), rnorm( n, -1.7, .2 ) )
k <- rpois( n, fracs*s )
model <- stan_model( model_code="
data {
# Make example data
n <- 15
x <- runif( n, 0, 10 )
y <- sin(x) + rnorm( n, sd=.1 )
# Make grid to plot smooth curves
xg <- seq( 0, 10, length.out=1000 )
# Determine knot positions:
import gzip, random
# Load FASTQ file for Chromosome 10 from GRCm38
with gzip.open("data/Mus_musculus.GRCm38.dna.chromosome.10.fa.gz") as f:
firstline = f.readline()
assert firstline.startswith(b'>')
chrom_seq = b"".join(l.rstrip() for l in f)
# This here is the file from papagei:mnt/raid/scnmt_data/CpG_filtered
cpg = scipy.sparse.load_npz( "data/CpG_10.npz" ).tocoo()
## Beispiel für Plot-Größe
Tidyverse laden:
```{r}
library( tidyverse )
```
Wir benutzen `mtcars`, eine Standard-Beispiel-Tabelle von R mit technischen Daten
für (recht alte) Autos:
@simon-anders
simon-anders / pca.R
Created April 28, 2021 14:24
Imputation via PCA
library( irlba )
m <- 10000 # nbr of features (rows)
n <- 5000 # nbr of cells (colums)
r <- 5 # nbr of latent components
## The true latent values
# True importance of latent factors
true_importance <- c( 1, .8, .4, .2, .1 )
# Wir haben 300 Stämme
m <- 300
# Die wahre mittlere Floureszenz der Stämme ist
true_mu <- exp( rnorm( m, 3, 2) )
# Die mittlere Hintergrund-Floureszenz ist
true_bg <- 10
# Die Hintergrund-Floureszenz schwankt mit einer Standardabweichung von
# So zieht man 10000 Werte mit Mittelwert 178 und Standardabweichung 7:
rnorm( 10000, 178, 7 ) -> x
# Und so plotted man das Histogramm aller Werte in x
library( tidyverse)
tibble(x) %>% ggplot + geom_histogram(aes(x))
library( jrc )
library( rlc )
myfun <- function(x) { print( paste( "user clicked on", x ) ) }
rlc::openPage(useViewer=FALSE)
jrc::allowFunctions( "myfun" )
genes <- c( "gene1", "gene2", "gene3" )

In order to estimate transduction efficiency from sc-RNA-Seq data, we use the following model: We assume that a non-transduced cell expresses NeoR such that an expected fraction $\mu^\text{R}$ of its UMIs maps to this gene. For each individual cell $j$, the actual expression strength of the gene varies around this expectation with some coefficient of variation, which we denote $\alpha^R$.