Anand Mayakonda PoisonAlien

@PoisonAlien
PoisonAlien / createproject.sh
Last active November 28, 2021 11:41
A minimal project template directory structure that I use for my bioinformatics projects
#!/usr/bin/env bash
#A minimal project template structure that I use for my bioinformatics projects
#MIT License (Anand Mayakonda; [email protected])
function usage() {
echo "createproject.sh - Create a project template directory structure
Usage: createproject.sh [option] <project_name>
#Wrapper around goseq
#'@param assayedGenes total gene IDs that were measured
#'@param deGenes DE gene IDs
#'@param source_id Can be `ensGene` or `geneSymbol`
#'@param hyperGeo Default TRUE. Set to FALSE for RNA-seq data
goseq_wrapper = function(assayedGenes, deGenes, source_id = "ensGene", hyperGeo = TRUE){
gene_vector = as.integer(assayedGenes %in% deGenes)
names(gene_vector)= assayedGenes
pwf = suppressWarnings(suppressMessages(goseq::nullp(DEgenes = gene_vector, genome = "hg19", id = source_id, plot.fit = FALSE)))
####################################################################################
#
# Best-practice 450k/EPIC QC and preprocessing workflow for the PPCG project
#
# creator: Pavlo Lutsik
#
# 30.01.2021
####################################################################################
library(RnBeads)
PoisonAlien / ntcounts.c
Last active October 15, 2021 12:56
Tool to extract nucleotide counts at user-specified loci
//A minimal program to extract nucleotide counts at selected genomic loci from a BAM file
//gcc -g -O3 -pthread ntcounts.c -lhts -Ihts -o ntcounts
//MIT License
//Copyright (c) 2021 Anand Mayakonda <[email protected]>
#include <unistd.h>
#include <stdio.h>
# Get the COSMIC variant file from here: https://cancer.sanger.ac.uk/cosmic/download (e.g. CosmicCompleteTargetedScreensMutantExport.tsv.gz)
# You will have to register and sign in
# Read in only these selected columns: `Gene name GENOMIC_MUTATION_ID Mutation AA Mutation Description Mutation genome position SNP FATHMM prediction HGVSG`
cosm = data.table::fread(cmd = "zcat CosmicCompleteTargetedScreensMutantExport.tsv.gz | cut -f 1,17,21,22,26,28,30,40 | sed 1d | sort -k1,2", header = FALSE)
csom = cosm[!V2 %in% ""]
csom = csom[!V4 %in% "Substitution - coding silent"] #Remove silent variants
csom = csom[!V4 %in% ""] #Remove variants with no mutation description
csom[, id := paste0(V2, ":", V3)]
PoisonAlien / compile_bwtool
Last active November 29, 2020 07:55
Compile bwtool
git clone 'https://github.com/CRG-Barcelona/bwtool'
git clone 'https://github.com/CRG-Barcelona/libbeato'
git clone https://github.com/madler/zlib
cd libbeato/
git checkout 0c30432af9c7e1e09ba065ad3b2bc042baa54dc2
./configure
make
cd ..
PoisonAlien / geneCloud.r
Created May 21, 2020 09:39
Plots wordcloud from MAF object
#' Plots wordcloud.
#'
#' @description Plots word cloud of mutated genes or altered cytobands with size proportional to the event frequency.
#' @param input an \code{\link{MAF}} or \code{\link{GISTIC}} object generated by \code{\link{read.maf}} or \code{\link{readGistic}}
#' @param minMut Minimum number of samples in which a gene is required to be mutated.
#' @param col vector of colors to choose from.
#' @param top Plot only the top n mutated genes.
#' @param genesToIgnore Ignore these genes.
#' @param ... Other options passed to \code{\link{wordcloud}}
#' @return nothing.
PoisonAlien / stat_problems.r
Last active April 17, 2020 08:29
Answers in R to frequently discussed statistical questions. Mostly derived from Coursera `An Intuitive Introduction to Probability`
#In a room of n people, what is the probability that at least one person has a birthday today?
#This can be solved as 1 - probability that nobody has a birthday today
# probability that one person does not have a birthday today = 364/365
# for n people this probability is the same for everyone, hence the probabilities get multiplied: (364/365)^n
# 1 - the above probability is the answer
#For n = 253, this probability reaches 50%
birthday_today = function(n){
1 - ((364/365)^n)
}
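The 50% threshold quoted in the comments above can be checked by solving the same formula for n. This is only a sketch of the arithmetic, under the assumptions the snippet already makes implicitly: 365 equally likely birthdays, no leap years, and independent people.

```latex
% Assumptions (not stated in the original gist): 365 equally likely days,
% independent birthdays, leap years ignored.
\begin{align*}
P(n) &= 1 - \left(\frac{364}{365}\right)^{n} \ge \frac{1}{2} \\
\left(\frac{364}{365}\right)^{n} &\le \frac{1}{2} \\
n &\ge \frac{\ln 2}{\ln(365/364)} \approx 252.65
\end{align*}
% The smallest integer satisfying the inequality is n = 253:
% P(252) \approx 0.499 and P(253) \approx 0.5005.
```

In R terms, `birthday_today(252)` falls just below 0.5 and `birthday_today(253)` just above it, which is where the n = 253 figure comes from.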
PoisonAlien / somatictoolkit.sh
Created February 11, 2020 16:21
a command line toolkit for WGS/WXS data processing
#!/bin/bash
#----------------------------------------- Required files and binaries------------------------------------------------------------------------------
#for fq2bam
GATK="/home/csipk/NGS/gatk-protected-local.jar"
REGIONS="/usr/share/ref_genomes/hg19/agilent_v5_v3_merged_baits.bed"
REFFILE="/usr/share/ref_genomes/hg19/hg19.fa"
# one can obtain these from the GATK FTP (known as the GATK resource bundle)
PoisonAlien / parse_rc.rs
Last active June 11, 2019 13:57
Parse output from bam-readcount
use std::io::{BufRead, BufReader};
use std::fs::File;
use std::env;
use std::process;
fn main() {
let args: Vec<String> = env::args().collect();
if args.len() < 2{