Anand Mayakonda PoisonAlien

  • Heidelberg
---
title: "Sequencing run summary"
date: "Generated on: `r Sys.Date()`"
output:
  html_document:
    toc: true
    toc_depth: 3
    toc_float: true
    self_contained: yes
    theme: sandstone
#!/usr/bin/env python3
#A simple script to predict gnomAD ancestry using PCA loadings trained on gnomAD V3 datasets
#See here for details: https://gnomad.broadinstitute.org/news/2021-09-using-the-gnomad-ancestry-principal-components-analysis-loadings-and-random-forest-classifier-on-your-dataset/
#Author: Anand Mayakonda
import sys
import os.path
import shutil
import argparse
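The body of the script is not shown in this preview. As a rough illustration of what projecting a sample onto pre-trained PCA loadings involves, here is a minimal pure-Python sketch. The function name, the input layout, and the allele-frequency normalisation are assumptions made for illustration, not the script's actual code.

```python
import math


def project_sample(alt_counts, allele_freqs, loadings):
    """Project one sample onto principal components (illustrative sketch).

    alt_counts   -- alt-allele count per variant (0, 1, 2), or None if missing
    allele_freqs -- alt-allele frequency per variant from the training data
    loadings     -- per-variant list of loadings, one value per PC

    Assumption: genotypes are centred by 2*af and scaled by
    sqrt(n_variants * 2 * af * (1 - af)) before being weighted by the loadings.
    """
    n_variants = len(loadings)
    n_pcs = len(loadings[0])
    scores = [0.0] * n_pcs
    for gt, af, load in zip(alt_counts, allele_freqs, loadings):
        # Skip missing genotypes and monomorphic sites (zero variance)
        if gt is None or af <= 0.0 or af >= 1.0:
            continue
        norm_gt = (gt - 2.0 * af) / math.sqrt(n_variants * 2.0 * af * (1.0 - af))
        for k in range(n_pcs):
            scores[k] += load[k] * norm_gt
    return scores


# Toy input with two variants and two PCs (made-up numbers)
scores = project_sample([2, 0], [0.5, 0.5], [[1.0, 0.0], [0.0, 1.0]])
```

The resulting scores would then be passed to a classifier, such as the random forest described in the linked gnomAD post.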
@PoisonAlien
PoisonAlien / bwview.sh
Created July 27, 2023 09:13
subset a bigWig file
#!/usr/bin/env bash
#Script to subset a bigWig file for user-specified loci
#MIT License (Anand Mayakonda; [email protected])
function usage (){
echo "Subset a bigWig file for genomic loci.
Requires UCSC kentutils bigWigToBedGraph and bedGraphToBigWig to be installed
Binaries available from: https://hgdownload.soe.ucsc.edu/admin/exe/
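The rest of bwview.sh is not shown here. The two-step approach the usage text implies (extract a region with bigWigToBedGraph, then rebuild a bigWig with bedGraphToBigWig) could be sketched in Python as below. The function name, argument names, and the intermediate file naming are hypothetical; the `-chrom`, `-start` and `-end` options are those listed in bigWigToBedGraph's usage.

```python
def subset_commands(bigwig, chrom, start, end, chrom_sizes, out_bigwig):
    """Build the two UCSC commands needed to subset a bigWig to one locus.

    Returns a list of argument lists; nothing is executed here.
    The intermediate bedGraph name is an arbitrary choice for this sketch.
    """
    bedgraph = out_bigwig + ".bedGraph"
    to_bedgraph = [
        "bigWigToBedGraph",
        f"-chrom={chrom}",
        f"-start={start}",
        f"-end={end}",
        bigwig,
        bedgraph,
    ]
    # bedGraphToBigWig needs a chrom.sizes file for the genome build
    to_bigwig = ["bedGraphToBigWig", bedgraph, chrom_sizes, out_bigwig]
    return [to_bedgraph, to_bigwig]


commands = subset_commands(
    "in.bw", "chr1", 1000, 2000, "hg38.chrom.sizes", "out.bw"
)
```

Each command can then be run with `subprocess.run(cmd, check=True)`, provided the UCSC binaries are on the `PATH`.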
pipeline_dir="./"
echo "Downloading VEP cache.." 1>&2
mkdir -p ${pipeline_dir}/resources/vep_cache/
cd ${pipeline_dir}/resources/vep_cache/
curl -O https://ftp.ensembl.org/pub/release-107/variation/indexed_vep_cache/homo_sapiens_vep_107_GRCh38.tar.gz
tar -xzf homo_sapiens_vep_107_GRCh38.tar.gz -C ./
wget https://ftp.ensembl.org/pub/release-107/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz
gunzip -c Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz | bgzip > Homo_sapiens.GRCh38.dna.primary_assembly.fa.bgzip
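The two download URLs above share a release number. A small helper like the following could keep them in sync when switching releases. The function name and its parameters are hypothetical, and it assumes other Ensembl releases follow the same directory layout as release 107.

```python
ENSEMBL_FTP = "https://ftp.ensembl.org/pub"


def vep_resource_urls(release=107, species="homo_sapiens", assembly="GRCh38"):
    """Return the VEP cache and reference FASTA URLs for one Ensembl release.

    Assumption: the path layout matches the release-107 URLs used above.
    """
    base = f"{ENSEMBL_FTP}/release-{release}"
    cache = (
        f"{base}/variation/indexed_vep_cache/"
        f"{species}_vep_{release}_{assembly}.tar.gz"
    )
    fasta = (
        f"{base}/fasta/{species}/dna/"
        f"{species.capitalize()}.{assembly}.dna.primary_assembly.fa.gz"
    )
    return {"cache": cache, "fasta": fasta}


urls = vep_resource_urls(107)
```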
@PoisonAlien
PoisonAlien / createproject.sh
Last active November 28, 2021 11:41
A minimal project template directory structure that I use for my bioinformatics projects
#!/usr/bin/env bash
#A minimal project template structure that I use for my bioinformatics projects
#MIT License (Anand Mayakonda; [email protected])
function usage() {
echo "createproject.sh - Create a project template directory structure
Usage: createproject.sh [option] <project_name>
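The directory layout that createproject.sh produces is not visible in this preview. As a sketch of the general idea, the Python function below creates a project skeleton; the subdirectory names and the README stub are placeholders chosen for illustration, not the script's actual template.

```python
from pathlib import Path

# Hypothetical layout; the real template's directories are not shown above
SUBDIRS = ("data", "scripts", "results", "docs")


def create_project(project_name, root=".", subdirs=SUBDIRS):
    """Create <root>/<project_name> with a fixed set of subdirectories."""
    project = Path(root) / project_name
    project.mkdir(parents=True, exist_ok=True)
    for sub in subdirs:
        (project / sub).mkdir(exist_ok=True)
    # Add a README stub only if one does not already exist
    readme = project / "README.md"
    if not readme.exists():
        readme.write_text(f"# {project_name}\n")
    return project
```

Calling `create_project("my_analysis")` creates the skeleton in the current directory and is safe to re-run, since existing directories and an existing README are left untouched.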
#Wrapper around goseq
#'@param assayedGenes total gene IDs that were measured
#'@param deGenes DE gene IDs
#'@param source_id Can be `ensGene` or `geneSymbol`
#'@param hyperGeo Default TRUE. Set to FALSE for RNA-seq data
goseq_wrapper = function(assayedGenes, deGenes, source_id = "ensGene", hyperGeo = TRUE){
  gene_vector = as.integer(assayedGenes %in% deGenes)
  names(gene_vector) = assayedGenes
  pwf = suppressWarnings(suppressMessages(goseq::nullp(DEgenes = gene_vector, genome = "hg19", id = source_id, plot.fit = FALSE)))
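The remainder of the wrapper is cut off here. For orientation, the plain hypergeometric test that the `hyperGeo` option refers to can be written out directly. The sketch below is a generic over-representation p-value in Python with hypothetical argument names, not goseq's implementation. For RNA-seq data goseq is normally run with its length-bias correction instead, which this sketch does not cover.

```python
from math import comb


def hypergeom_enrichment_p(n_assayed, n_de, n_in_category, n_de_in_category):
    """Upper-tail hypergeometric p-value, P(X >= n_de_in_category).

    n_assayed        -- total number of assayed genes
    n_de             -- number of DE genes among them
    n_in_category    -- assayed genes annotated to the category
    n_de_in_category -- DE genes annotated to the category
    """
    upper = min(n_de, n_in_category)
    total = comb(n_assayed, n_de)
    tail = sum(
        comb(n_in_category, k) * comb(n_assayed - n_in_category, n_de - k)
        for k in range(n_de_in_category, upper + 1)
    )
    return tail / total


# 10 assayed genes, 3 DE, 4 in the category, 2 of them DE
p_value = hypergeom_enrichment_p(10, 3, 4, 2)
```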
####################################################################################
#
# Best-practice 450k/EPIC QC and preprocessing workflow for the PPCG project
#
# creator: Pavlo Lutsik
#
# 30.01.2021
####################################################################################
library(RnBeads)
@PoisonAlien
PoisonAlien / ntcounts.c
Last active October 15, 2021 12:56
Tool to extract nucleotide counts at user-specified loci
//A minimal program to extract nucleotide counts at selected genomic loci from a BAM file
//gcc -g -O3 -pthread ntcounts.c -lhts -Ihts -o ntcounts
//MIT License
//Copyright (c) 2021 Anand Mayakonda <[email protected]>
#include <unistd.h>
#include <stdio.h>
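The C source is truncated after its first includes. To illustrate what "nucleotide counts at a locus" means, here is a toy Python sketch that tallies bases over simplified reads. It assumes ungapped reads given as (start, sequence) pairs and does not parse BAM files or use htslib; the function name and input format are hypothetical.

```python
from collections import Counter


def nt_counts(reads, position):
    """Count the bases covering one reference position.

    reads    -- iterable of (start, sequence) pairs; each read is assumed to
                align without gaps, with `start` in the same coordinate
                system as `position`
    position -- reference coordinate to query
    """
    counts = Counter({base: 0 for base in "ACGTN"})
    for start, seq in reads:
        offset = position - start
        if 0 <= offset < len(seq):
            base = seq[offset].upper()
            # Anything other than A/C/G/T is tallied as N
            counts[base if base in "ACGT" else "N"] += 1
    return dict(counts)


# Made-up reads for illustration
toy_reads = [(100, "ACGT"), (101, "CGTA"), (103, "TTTT"), (200, "AAAA")]
counts_at_103 = nt_counts(toy_reads, 103)
```

A real implementation also has to handle CIGAR operations, base and mapping qualities, and duplicate or secondary alignments.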
# Get the COSMIC variant file from here: https://cancer.sanger.ac.uk/cosmic/download (e.g. CosmicCompleteTargetedScreensMutantExport.tsv.gz)
# You will have to register and sign in
# Read in only these selected columns: `Gene name`, `GENOMIC_MUTATION_ID`, `Mutation AA`, `Mutation Description`, `Mutation genome position`, `SNP`, `FATHMM prediction`, `HGVSG`
cosm = data.table::fread(cmd = "zcat CosmicCompleteTargetedScreensMutantExport.tsv.gz | cut -f 1,17,21,22,26,28,30,40 | sed 1d | sort -k1,2", header = FALSE)
csom = cosm[!V2 %in% ""]
csom = csom[!V4 %in% "Substitution - coding silent"] #Remove silent variants
csom = csom[!V4 %in% ""] #Remove variants with no mutation description
csom[, id := paste0(V2, ":", V3)]
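The filtering steps above (drop rows without a genomic mutation ID, drop silent or undescribed variants, then build an `id` from the mutation ID and amino-acid change) can be mirrored in plain Python. The sketch below assumes rows already reduced to the eight selected columns in the same order; the function name and the example rows are made up for illustration.

```python
SILENT = "Substitution - coding silent"


def filter_cosmic_rows(rows):
    """Apply the same filters as the data.table code to lists of 8 fields.

    Column order (0-based): gene, GENOMIC_MUTATION_ID, Mutation AA,
    Mutation Description, genome position, SNP, FATHMM prediction, HGVSG.
    """
    kept = []
    for row in rows:
        mutation_id, aa_change, description = row[1], row[2], row[3]
        if mutation_id == "":
            continue  # no genomic mutation ID
        if description in ("", SILENT):
            continue  # undescribed or silent variant
        # Equivalent of id := paste0(V2, ":", V3)
        kept.append(list(row) + [f"{mutation_id}:{aa_change}"])
    return kept


# Placeholder rows, not real COSMIC records
example = [
    ["GENE1", "ID1", "p.A1T", "Substitution - Missense", "", "", "", ""],
    ["GENE1", "", "p.A2T", "Substitution - Missense", "", "", "", ""],
    ["GENE2", "ID3", "p.L3=", SILENT, "", "", "", ""],
    ["GENE2", "ID4", "p.?", "", "", "", "", ""],
]
filtered = filter_cosmic_rows(example)
```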